- SSD Congestion: SSD congestion is typically raised when the active working set of write IOs for a specific disk group is much larger than the cache tier of that disk group. In both hybrid and all-flash vSAN clusters, data is first written to the write cache (also known as the write buffer). A process known as de-staging then moves the data from the write buffer to the capacity disks. The write cache absorbs a high write rate, so that write performance is not limited by the capacity disks. However, if a benchmark fills the write cache very quickly, de-staging may not keep pace with the incoming IO rate. In such cases, SSD congestion is raised to signal the vSAN DOM client layer to slow IOs down to a rate that the disk group can handle.
Remedy: To avoid SSD congestion, tune the size of the VM disks that the benchmark uses. For the best results, we recommend that the total size of the VM disks (the active working set) be no larger than 40% of the cumulative size of the write caches across all disk groups. Keep in mind that in a hybrid vSAN cluster, the write cache is 30% of the size of the cache tier disk. In an all-flash cluster, the write cache is the full size of the cache tier disk, up to a maximum of 600GB.
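The sizing rule above can be sketched as a small calculation. This is only an illustration of the arithmetic described in the remedy; the function names and the example cluster layout (eight disk groups with 800GB cache disks) are hypothetical and not taken from any vSAN tool.

```python
# Percentages are kept as integers so the arithmetic stays exact.
HYBRID_WRITE_CACHE_PCT = 30     # hybrid: write cache is 30% of the cache tier disk
ALL_FLASH_WRITE_CAP_GB = 600    # all-flash: whole cache disk, but at most 600GB
WORKING_SET_PCT = 40            # recommended ceiling for the active working set


def write_cache_gb(cache_disk_gb, all_flash):
    """Write cache size (GB) of one disk group, per the rule in the remedy."""
    if all_flash:
        return min(cache_disk_gb, ALL_FLASH_WRITE_CAP_GB)
    return cache_disk_gb * HYBRID_WRITE_CACHE_PCT / 100


def max_working_set_gb(cache_disks_gb, all_flash):
    """Recommended upper bound (GB) for the total size of the benchmark's VM disks.

    cache_disks_gb holds one cache disk size per disk group in the cluster.
    """
    total_write_cache = sum(write_cache_gb(s, all_flash) for s in cache_disks_gb)
    return total_write_cache * WORKING_SET_PCT / 100


if __name__ == "__main__":
    # Hypothetical cluster: 4 hosts x 2 disk groups, each with an 800GB cache disk.
    cache_disks = [800] * 8
    print("all-flash:", max_working_set_gb(cache_disks, all_flash=True), "GB")
    print("hybrid:   ", max_working_set_gb(cache_disks, all_flash=False), "GB")
```

For this hypothetical layout the bound works out to 1920GB for all-flash (8 x 600GB x 40%) and 768GB for hybrid (8 x 240GB x 40%).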
- Log Congestion: Log congestion is typically raised when the vSAN LSOM logs (which store the metadata of IO operations that have not yet been de-staged) consume significant space in the write cache.
Typically, a large volume of small writes to a small working set generates a large number of vSAN LSOM log entries and causes this type of congestion. Additionally, if the benchmark does not issue 4K-aligned IOs, the number of IOs on the vSAN stack is inflated to account for 4K alignment. The higher number of IOs can lead to log congestion.
Remedy: First, check whether your benchmark aligns IO requests on 4K boundaries; if it does not, configure it to do so. Next, check whether your benchmark uses a very small working set, meaning that the total size of the accessed VM disks is less than 10% of the size of the caching tier (see the SSD congestion remedy above for how to calculate the size of the caching tier). If it does, increase the working set to 40% of the size of the caching tier. If neither of these two conditions applies, reduce write traffic by either lowering the number of outstanding IOs that your benchmark issues or decreasing the number of VMs that the benchmark creates.
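The checks in this remedy can be sketched as follows. The helper names are hypothetical, and `blocks_touched` is only a simplified model of why an unaligned IO spans more 4K blocks than an aligned one; it does not claim to reproduce how the vSAN stack actually splits IOs.

```python
BLOCK = 4096  # 4K alignment boundary, in bytes


def is_4k_aligned(offset, length):
    """True if an IO both starts on a 4K boundary and covers whole 4K blocks."""
    return offset % BLOCK == 0 and length % BLOCK == 0


def blocks_touched(offset, length):
    """Number of 4K blocks overlapped by the byte range [offset, offset + length)."""
    first = offset // BLOCK
    last = (offset + length - 1) // BLOCK
    return last - first + 1


def log_congestion_advice(aligned, working_set_gb, cache_tier_gb):
    """Walk through the remedy's checks in order and return the suggested action."""
    if not aligned:
        return "align IO requests on 4K boundaries"
    if working_set_gb * 100 < cache_tier_gb * 10:      # working set below 10%
        target = cache_tier_gb * 40 / 100              # grow it to 40%
        return f"increase working set to about {target:g} GB"
    return "reduce outstanding IOs or the number of VMs"


if __name__ == "__main__":
    print(blocks_touched(0, 4096))      # aligned 4K write: one block
    print(blocks_touched(512, 4096))    # same size, shifted by 512 bytes: two blocks
    print(log_congestion_advice(True, working_set_gb=50, cache_tier_gb=1000))
```

In this model, a 4K write that starts 512 bytes off a boundary overlaps two blocks instead of one, which illustrates how unaligned benchmarks can inflate the IO count.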
- Component Congestion (Comp-Congestion): This congestion indicates that some components have a large volume of outstanding commit operations because the IO requests to those components are being queued. This can increase latency. Typically, a heavy volume of writes to a few VM disks causes this congestion.
Remedy: Increase the number of VM disks that your benchmark uses. Make sure that your benchmark does not concentrate its IOs on only a few VM disks.
- Memory and Slab Congestion: Memory and slab congestion usually means that the vSAN LSOM layer is running out of heap memory space or slab space to maintain its internal data structures. vSAN provisions a certain amount of system memory for its internal operations. However, if a benchmark aggressively issues IOs without any throttling, it can lead to vSAN using up all of its allocated memory space.
Remedy: Reduce the working set of your benchmark. Alternatively, increase the following settings while experimenting with benchmarks to increase the amount of memory reserved for the vSAN LSOM layer. Please note that these settings are per disk group. Also, we do not recommend using these settings on a production cluster. These settings can be changed via esxcli (see KB 1038578) as follows:
/LSOM/blPLOGCacheLines, default=128K, increase to 512K
/LSOM/blPLOGLsnCacheLines, default=4K, increase to 32K
/LSOM/blLLOGCacheLines, default=128, increase to 32K
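As a rough aid for applying the values above, the sketch below expands the "K" shorthand into integers and prints candidate commands. It rests on two assumptions that should be verified against KB 1038578 before use: that "K" here means 1024, and that these options are set as integer advanced settings through `esxcli system settings advanced set`. The helper names are hypothetical, and the caution about production clusters still applies.

```python
# Target values copied from the list above (test clusters only).
TUNED = {
    "/LSOM/blPLOGCacheLines": "512K",
    "/LSOM/blPLOGLsnCacheLines": "32K",
    "/LSOM/blLLOGCacheLines": "32K",
}


def parse_lines(value):
    """Expand a value such as '512K' to an integer, ASSUMING K means 1024."""
    value = value.strip().upper()
    if value.endswith("K"):
        return int(value[:-1]) * 1024
    return int(value)


def esxcli_commands(settings):
    """Build one command string per option; nothing is executed here."""
    return [
        f"esxcli system settings advanced set -o {option} -i {parse_lines(value)}"
        for option, value in settings.items()
    ]


if __name__ == "__main__":
    for command in esxcli_commands(TUNED):
        print(command)
```

The script only prints the commands so that they can be reviewed first; checking each option's current value with `esxcli system settings advanced list -o <option>` before changing it makes it easier to restore the defaults afterwards.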
The types of congestion and remedies for each type are listed below: