Split brain syndrome, in a clustering context, is a state in which a cluster of nodes gets divided (or partitioned) into smaller clusters of equal numbers of nodes, each of which believes it is the only active cluster.

Believing that the other clusters are dead, each cluster may simultaneously access the same application data or disks, which can lead to data corruption. A split brain situation arises during cluster reformation. When one or more nodes fail in a cluster, the cluster reforms itself with the available nodes. During this reformation, instead of forming a single cluster, multiple fragments of the cluster with an equal number of nodes may be formed. Each cluster fragment assumes that it is the only active cluster, and that the other fragments are dead, and starts accessing the data or disk. Since more than one cluster is accessing the disk, the data gets corrupted.

 

Here’s how it works in more detail:

  • Let’s say there are 5 nodes, A, B, C, D and E, which form a cluster, TEST.
  • Now a node (say A) fails.
  • Cluster reformation takes place. Ideally, the remaining nodes B, C, D and E should form cluster TEST.
  • But a split brain situation may occur, leading to the formation of two clusters: TEST1 (containing B and C) and TEST2 (containing D and E).
  • Both TEST1 and TEST2 think that they are the only active cluster. Both clusters start accessing the data or disk, leading to data corruption.
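
The scenario above can be sketched as a small simulation. This is only an illustrative toy, not how any real clustering tool reforms a cluster: the function `reform`, the `links` map (which surviving nodes can still reach each other) and the `shared_disk` list are all hypothetical names made up for this example.

```python
# Toy simulation of the TEST cluster example (hypothetical, for illustration only).

def reform(survivors, links):
    """Group surviving nodes into fragments of mutually reachable nodes."""
    unassigned = set(survivors)
    fragments = []
    while unassigned:
        fragment = set()
        stack = [min(unassigned)]
        while stack:
            node = stack.pop()
            if node in fragment:
                continue
            fragment.add(node)
            # Follow only links to surviving nodes not yet placed in this fragment.
            stack.extend(
                peer for peer in links.get(node, ())
                if peer in unassigned and peer not in fragment
            )
        unassigned -= fragment
        fragments.append(sorted(fragment))
    return sorted(fragments)

# Node A has failed; B and C can reach each other, and so can D and E,
# but the two pairs cannot reach one another (assumed for this example).
survivors = ["B", "C", "D", "E"]
links = {"B": {"C"}, "C": {"B"}, "D": {"E"}, "E": {"D"}}

fragments = reform(survivors, links)

# Each fragment assumes it is the only active cluster and writes to the disk.
shared_disk = []
for fragment in fragments:
    shared_disk.append(("write", tuple(fragment)))

writers = len(shared_disk)
print(fragments, "->", writers, "independent writers on the same disk")
```

With two fragments writing independently, nothing coordinates their updates, which is the condition under which the data gets corrupted.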

All HA clusters are vulnerable to split brain syndrome and should use some mechanism to avoid it. Clustering tools, such as Pacemaker, HP Serviceguard, CMAN and Linux-HA, generally include such mechanisms.
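
One widely used mechanism of this kind is a majority quorum: a fragment may keep running only if it holds more than half of the configured nodes, so at most one fragment can ever qualify. The sketch below is a minimal illustration of that rule applied to the TEST example. The function name `has_quorum` is hypothetical, and the code does not reflect how any of the tools named above implement it.

```python
# Minimal majority-quorum check (illustrative sketch, hypothetical names).

def has_quorum(fragment_size, configured_nodes):
    """Return True only if the fragment holds a strict majority of all configured nodes."""
    return fragment_size > configured_nodes // 2

CONFIGURED_NODES = 5  # A, B, C, D and E in the TEST example

# TEST1 (B, C) and TEST2 (D, E) each hold 2 of 5 nodes: neither may access the disk.
test1_active = has_quorum(2, CONFIGURED_NODES)
test2_active = has_quorum(2, CONFIGURED_NODES)

# A correctly reformed cluster of B, C, D and E holds 4 of 5 nodes and stays active.
reformed_active = has_quorum(4, CONFIGURED_NODES)

print(test1_active, test2_active, reformed_active)
```

Under this rule an even split leaves no fragment active, which trades availability for data safety.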