CLOMD (Cluster Level Object Manager Daemon) plays a key role in the operation of a vSAN cluster. It runs on every ESXi host and is responsible for creating new objects, initiating repair of existing objects after failures, all types of data moves and evacuations (for example, Enter Maintenance Mode or evacuating data when a disk is removed from vSAN), maintaining balance and triggering rebalancing when needed, and implementing policy changes.
It does not actually participate in the data path, but it triggers data path operations and as such is a critical component during a number of management workflows and failure handling scenarios.
Two less obvious operations that require CLOMD are virtual machine power-on and Storage vMotion to vSAN. Both require the creation of a swap object, and object creation requires CLOMD.
Similarly, starting with vSAN 6.0, memory snapshots are maintained as objects, so taking a snapshot with memory state also requires CLOMD.
Cluster health – CLOMD liveness check:
This checks if the Cluster Level Object Manager (CLOMD) daemon is alive or not. It does so by first checking that the service is running on all ESXi hosts, and then contacting the service to retrieve run-time statistics to verify that CLOMD can respond to inquiries.
Note: This does not ensure that all of the functions discussed above (for example, object creation and rebalancing) actually work, but it gives a first-level assessment of the health of CLOMD.
CLOMD ERROR:
If any of the ESXi hosts are disconnected, the CLOMD liveness state of the disconnected host is shown as unknown. If the Health service is not installed on a particular ESXi host, the CLOMD liveness state of all the ESXi hosts is also reported as unknown.
If the CLOMD service is not running on a particular ESXi host, the CLOMD liveness state of that host is reported as abnormal.
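The reporting rules above can be sketched as a small decision function. This is only an illustration of the logic as described here, not the health service's actual implementation. The `HostStatus` type, its field names, and the "healthy" label for the normal state are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class HostStatus:
    """Hypothetical per-host inputs; the field names are illustrative only."""
    name: str
    connected: bool
    health_service_installed: bool
    clomd_running: bool


def clomd_liveness(hosts: List[HostStatus]) -> Dict[str, str]:
    """Map each host name to 'unknown', 'abnormal', or 'healthy'.

    'healthy' is an assumed label for the normal state.
    """
    # A missing health service on any one host makes every host 'unknown'.
    if any(not h.health_service_installed for h in hosts):
        return {h.name: "unknown" for h in hosts}

    states: Dict[str, str] = {}
    for h in hosts:
        if not h.connected:
            # A disconnected host cannot be queried.
            states[h.name] = "unknown"
        elif not h.clomd_running:
            states[h.name] = "abnormal"
        else:
            states[h.name] = "healthy"
    return states
```

Note the asymmetry: a stopped CLOMD service or a disconnected host affects only that host's state, while a missing health service on a single host turns the whole cluster's result to unknown.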
For this test to succeed, the health service needs to be installed on the ESXi host and the CLOMD service needs to be running. To check the status of the CLOMD service, run this command on the ESXi host:
If the CLOMD health check is still failing after these steps or if the CLOMD health check continues to fail on a regular basis, open a support request with VMware Support.
In the /var/run/log/clomd.log file, you see logs similar to:
2017-04-19T03:59:32.403Z 120360 (482850097440)(opID:1804289387)CLOMProcessWorkItem: Op REPAIR starts:1804289387
2017-04-19T03:59:32.403Z 120360 (482850097440)(opID:1804289387)CLOMReconfigure: Reconfiguring ae9cf268-cd5e-abc4-448d-050010d45c96 workItem type REPAIR
2017-04-19T03:59:32.408Z 120360 (482850097440)(opID:1804289387)CLOMReplacementPreWorkRepair: Repair needed. 1 absent/degraded data components for ae9cf268-cd5e-abc4-448d-050010d45c96 found
^^^ Here, CLOMD crashed while attempting to repair the object with UUID ae9cf268-cd5e-abc4-448d-050010d45c96, which is a zero-sized object. The vSAN health check reports a CLOMD liveness issue. Restarting CLOMD does not help, because each time it is restarted it fails again while attempting to repair the same zero-sized object. Swap objects are the only vSAN objects that can be zero-sized, so this issue can occur only with swap objects.
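To find which object CLOMD was last trying to repair, the log can be scanned for the "Repair needed" entry. The following is a minimal sketch that assumes the line format matches the excerpt above. The function name and regular expression are illustrative, not part of any VMware tool. The last repair target logged before a crash is only a candidate for the problem object and should be confirmed against the rest of the log.

```python
import re
from typing import Optional

# Matches the CLOMReplacementPreWorkRepair entry format shown in the excerpt.
# The pattern is an assumption based on that single sample line.
REPAIR_NEEDED_RE = re.compile(
    r"CLOMReplacementPreWorkRepair: Repair needed\. "
    r"(?P<count>\d+) absent/degraded data components for "
    r"(?P<uuid>[0-9a-f-]+) found"
)


def last_repair_target(log_text: str) -> Optional[str]:
    """Return the object UUID from the last 'Repair needed' entry, or None."""
    target = None
    for line in log_text.splitlines():
        match = REPAIR_NEEDED_RE.search(line)
        if match:
            # Keep overwriting so the final value is the most recent entry.
            target = match.group("uuid")
    return target


# Sample input: the 'Repair needed' line from the excerpt above.
sample = (
    "2017-04-19T03:59:32.408Z 120360 (482850097440)(opID:1804289387)"
    "CLOMReplacementPreWorkRepair: Repair needed. 1 absent/degraded data "
    "components for ae9cf268-cd5e-abc4-448d-050010d45c96 found"
)
print(last_repair_target(sample))
```

On an ESXi host, the same function could be given the contents of /var/run/log/clomd.log instead of the sample string.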