This is a common issue in vSAN.
Examples of this alert are:
LSOM Memory Congestion State: Exceeded. Congestion Threshold: 200 Current Congestion: 201.
LSOM SSD Congestion State: Exceeded. Congestion Threshold: 200 Current Congestion: 201.
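When these alerts need to be picked out of a log, the two example messages above can be parsed with a short script. The following is a minimal sketch in Python, assuming the alert text matches the examples exactly; the function name parse_congestion_alert and the dictionary keys are hypothetical and not part of any vSAN tool.

```python
import re

# Pattern built from the two example alerts above; the exact wording of
# real alerts is an assumption and may differ between vSAN versions.
ALERT_RE = re.compile(
    r"LSOM (?P<kind>\w+) Congestion State: (?P<state>\w+)\. "
    r"Congestion Threshold: (?P<threshold>\d+) "
    r"Current Congestion: (?P<current>\d+)\."
)


def parse_congestion_alert(line):
    """Return the alert fields as a dict, or None if the line does not match."""
    match = ALERT_RE.search(line)
    if match is None:
        return None
    return {
        "kind": match.group("kind"),            # e.g. "Memory" or "SSD"
        "state": match.group("state"),          # e.g. "Exceeded"
        "threshold": int(match.group("threshold")),
        "current": int(match.group("current")),
    }


if __name__ == "__main__":
    sample = ("LSOM SSD Congestion State: Exceeded. "
              "Congestion Threshold: 200 Current Congestion: 201.")
    print(parse_congestion_alert(sample))
```

A caller could then compare the "current" field with the "threshold" field to decide which alerts need attention first.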
Congestion in vSAN occurs when the I/O rate of the lower layers of the storage subsystem fails to keep up with the I/O rate of the higher layers.
Local Log Structured Object Management (LSOM) is an internal vSAN component that works at the physical disk level (both flash devices and magnetic disks). LSOM also handles read caching and write buffering for the components.
The SSD is the cache device for a vSAN disk group.
The LSOM memory congestion state and LSOM SSD congestion state occur when vSAN artificially introduces latency for the virtual machines in order to slow down writes to the flash device layer or layers.
During an observed congestion period, higher virtual machine latencies occur.
Short periods of congestion might occur as vSAN uses a throttling mechanism to ensure that all layers run at the same I/O rate.
Smaller congestion values are preferable, because a higher value signifies higher latency. Sustained congestion is not usual; in most cases, congestion should be close to zero.
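The idea of throttling by adding latency can be illustrated with a generic back-pressure function. This is only a sketch of the general technique and not vSAN's actual algorithm: the linear mapping, the function name added_write_delay_ms, the maximum congestion value of 255, and the 50 ms delay ceiling are all assumptions chosen for illustration.

```python
def added_write_delay_ms(congestion, max_congestion=255, max_delay_ms=50.0):
    """Hypothetical linear back-pressure: the higher the congestion value,
    the more delay is added to each write.

    max_congestion and max_delay_ms are illustrative assumptions, not
    values taken from vSAN.
    """
    if not 0 <= congestion <= max_congestion:
        raise ValueError("congestion must be between 0 and max_congestion")
    return max_delay_ms * congestion / max_congestion


if __name__ == "__main__":
    for value in (0, 100, 201, 255):
        print(value, round(added_write_delay_ms(value), 2))
```

With a mapping of this shape, a congestion value near zero adds almost no delay, which matches the statement above that healthy systems should stay close to zero.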
If virtual machines perform a high number of write operations, write buffers can fill up on flash cache devices. In hybrid configurations, these buffers must be de-staged to magnetic disks, and de-staging can proceed only as fast as the magnetic disks can absorb the writes.
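The arithmetic behind a filling write buffer can be sketched as follows. This is a simplified model that assumes constant incoming and de-staging rates; the function name seconds_until_buffer_full and the example figures (100 GB buffer, 500 MB/s incoming, 300 MB/s de-staging) are hypothetical and not measurements from a real vSAN cluster.

```python
def seconds_until_buffer_full(buffer_gb, ingest_mb_s, destage_mb_s):
    """Estimate how long an empty write buffer takes to fill.

    Returns None when de-staging keeps up with incoming writes, in which
    case the buffer never fills under this simplified constant-rate model.
    """
    net_fill_mb_s = ingest_mb_s - destage_mb_s
    if net_fill_mb_s <= 0:
        return None
    return buffer_gb * 1024 / net_fill_mb_s


if __name__ == "__main__":
    # Hypothetical figures for illustration only.
    print(seconds_until_buffer_full(100, 500, 300))  # 512.0
    print(seconds_until_buffer_full(100, 250, 300))  # None
```

In the first case the buffer gains 200 MB every second and fills in 512 seconds, after which writes can proceed no faster than the magnetic disks de-stage them. In the second case de-staging outpaces the incoming writes and the buffer does not fill.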
Other reasons for congestion could be related to:
>>Corrupted or incorrectly functioning drivers or firmware
>>Insufficient I/O controller queue depths