
14.3. Operation History

A node’s resource history is held in the lrm_resources tag (a child of the lrm tag). It records enough information for the cluster to stop the resource safely if it is removed from the configuration section: specifically, the resource’s id, class, type and provider.

Example 14.3. A record of the apcstonith resource

<lrm_resource id="apcstonith" type="apcmastersnmp" class="stonith"/>
Additionally, we store the last job for every combination of resource, action and interval. The concatenation of the values in this tuple is used to create the id of the lrm_rsc_op object.
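
As a minimal sketch of that concatenation, the Python snippet below joins the three values with underscores. The underscore separator is an assumption inferred from the id apcstonith_monitor_0 shown in Example 14.4, and the function name op_history_id is hypothetical, not part of any Pacemaker API.

```python
def op_history_id(resource_id: str, operation: str, interval_ms: int) -> str:
    """Join resource id, action and interval into an lrm_rsc_op id.

    Assumption: the parts are joined with underscores, as suggested by
    the id "apcstonith_monitor_0" in Example 14.4.
    """
    return f"{resource_id}_{operation}_{interval_ms}"


# A one-off (interval 0) monitor of the apcstonith resource.
print(op_history_id("apcstonith", "monitor", 0))
```

Because the id is derived from the tuple, a newer job for the same resource, action and interval would replace the previous entry rather than add a second one.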

Table 14.3. Contents of an lrm_rsc_op job

Field
Description
id
Identifier for the job constructed from the resource’s id, operation and interval.
call-id
The job’s ticket number. Used as a sort key to determine the order in which the jobs were executed.
operation
The action the resource agent was invoked with.
interval
The frequency, in milliseconds, at which the operation will be repeated. A one-off job is indicated by 0.
op-status
The job’s status. Generally this will be either 0 (done) or -1 (pending). Rarely used; rc-code is preferred.
rc-code
The job’s result. Refer to the Resource Agents section of Pacemaker Administration for details on what the values here mean and how they are interpreted.
last-run
Machine-local date/time, in seconds since epoch, at which the job was executed. For diagnostic purposes.
last-rc-change
Machine-local date/time, in seconds since epoch, at which the job first returned the current value of rc-code. For diagnostic purposes.
exec-time
Time, in milliseconds, that the job was running for. For diagnostic purposes.
queue-time
Time, in seconds, that the job was queued for in the LRMd. For diagnostic purposes.
crm_feature_set
The version which this job description conforms to. Used when processing op-digest.
transition-key
A concatenation of the job’s graph action number, the graph number, the expected result and the UUID of the controller instance that scheduled it. This is used to construct transition-magic (below).
transition-magic
A concatenation of the job’s op-status, rc-code and transition-key. Guaranteed to be unique for the life of the cluster (which ensures it is part of CIB update notifications) and contains all the information needed for the controller to correctly analyze and process the completed job. Most importantly, the decomposed elements tell the controller if the job entry was expected and whether it failed.
op-digest
An MD5 sum representing the parameters passed to the job. Used to detect changes to the configuration so that resources can be restarted if necessary.
crm-debug-origin
The origin of the current values. For diagnostic purposes.
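
The relationship between transition-key and transition-magic described in the table can be sketched in Python as follows. The separators (colons within the key, a colon and a semicolon in the magic string) are assumptions inferred from the values in Example 14.4, and both function names are hypothetical.

```python
def make_transition_key(action: int, graph: int, expected_rc: int,
                        controller_uuid: str) -> str:
    """Concatenate graph action number, graph number, expected result and
    controller UUID (assumed layout: "action:graph:expected_rc:uuid")."""
    return f"{action}:{graph}:{expected_rc}:{controller_uuid}"


def make_transition_magic(op_status: int, rc_code: int,
                          transition_key: str) -> str:
    """Concatenate op-status, rc-code and transition-key
    (assumed layout: "op_status:rc_code;transition_key")."""
    return f"{op_status}:{rc_code};{transition_key}"


key = make_transition_key(22, 2, 7, "2668bbeb-06d5-40f9-936d-24cb7f87006a")
magic = make_transition_magic(0, 7, key)
print(key)
print(magic)
```

With the values from Example 14.4, the two strings produced match the transition-key and transition-magic attributes recorded there.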

14.3.1. Simple Operation History Example

Example 14.4. A monitor operation (determines current state of the apcstonith resource)

<lrm_resource id="apcstonith" type="apcmastersnmp" class="stonith">
  <lrm_rsc_op id="apcstonith_monitor_0" operation="monitor" call-id="2"
    rc-code="7" op-status="0" interval="0"
    crm-debug-origin="do_update_resource" crm_feature_set="3.0.1"
    op-digest="2e3da9274d3550dc6526fb24bfcbcba0"
    transition-key="22:2:7:2668bbeb-06d5-40f9-936d-24cb7f87006a"
    transition-magic="0:7;22:2:7:2668bbeb-06d5-40f9-936d-24cb7f87006a"
    last-run="1239008085" last-rc-change="1239008085" exec-time="10" queue-time="0"/>
</lrm_resource>
In the above example, the job is a non-recurring monitor operation, often referred to as a "probe", for the apcstonith resource.
The cluster schedules probes for every configured resource on a node when the node first starts, in order to determine the resource’s current state before it takes any further action.
From the transition-key, we can see that this was the 22nd action of the 2nd graph produced by this instance of the controller (2668bbeb-06d5-40f9-936d-24cb7f87006a).
The third field of the transition-key contains a 7, which indicates that the job expects to find the resource inactive. By looking at the rc-code property, we see that this was the case.
As that is the only job recorded for this node, we can conclude that the cluster started the resource elsewhere.
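
The reading of Example 14.4 given above can be reproduced with a short Python sketch that uses only the standard library. It parses an abbreviated copy of the example, orders the jobs by call-id, splits each transition-key into its four fields and compares the expected result with rc-code. The function name summarize_ops and the dictionary layout are illustrative assumptions, not part of any Pacemaker tool.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of Example 14.4 (only the attributes used below).
HISTORY_XML = """
<lrm_resource id="apcstonith" type="apcmastersnmp" class="stonith">
  <lrm_rsc_op id="apcstonith_monitor_0" operation="monitor" call-id="2"
    rc-code="7" op-status="0" interval="0"
    transition-key="22:2:7:2668bbeb-06d5-40f9-936d-24cb7f87006a"/>
</lrm_resource>
"""


def summarize_ops(xml_text):
    """Return one summary dict per lrm_rsc_op, in execution order."""
    resource = ET.fromstring(xml_text)
    # call-id is the sort key for the order in which jobs were executed.
    ops = sorted(resource.findall("lrm_rsc_op"),
                 key=lambda op: int(op.get("call-id")))
    summaries = []
    for op in ops:
        # Assumed layout: action:graph:expected_rc:controller_uuid
        action, graph, expected_rc, uuid = op.get("transition-key").split(":", 3)
        rc_code = int(op.get("rc-code"))
        summaries.append({
            "id": op.get("id"),
            "action": int(action),
            "graph": int(graph),
            "controller": uuid,
            "expected_rc": int(expected_rc),
            "rc_code": rc_code,
            "as_expected": int(expected_rc) == rc_code,
        })
    return summaries


for summary in summarize_ops(HISTORY_XML):
    print(summary)
```

For the probe in the example, the summary reports action 22 of graph 2, with the expected result and rc-code both equal to 7, so the job completed as the controller expected.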