19.4. The Software Side of High Availability

The key software aspects of high availability solutions are described below.

19.4.1. heartbeat

heartbeat is a package that monitors all the nodes in the cluster. It exchanges “heartbeats” over the network interfaces of the cluster members to determine which nodes are active. A node that fails stops emitting heartbeats. In this case, heartbeat ensures that another node takes over the failed node's tasks and identity and makes the failover known within the network, so the cluster remains consistent. At present, the heartbeat failover function is limited to two nodes.
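
The detection principle can be illustrated with a minimal sketch. This is not the code or the configuration of the heartbeat package; the class HeartbeatMonitor, its methods, and the dead_time parameter are hypothetical names chosen for this example. A node counts as failed once no heartbeat has been received from it for longer than dead_time seconds.

```python
class HeartbeatMonitor:
    """Toy failure detector (hypothetical, not the heartbeat package itself)."""

    def __init__(self, nodes, dead_time, now):
        # dead_time: seconds of silence after which a node counts as failed.
        self.dead_time = dead_time
        # Assume every node was seen at start-up time `now`.
        self.last_seen = {node: now for node in nodes}

    def record_heartbeat(self, node, now):
        """Remember when the latest heartbeat from `node` arrived."""
        self.last_seen[node] = now

    def failed_nodes(self, now):
        """Return the nodes that have been silent for longer than dead_time."""
        return sorted(
            node
            for node, seen in self.last_seen.items()
            if now - seen > self.dead_time
        )

    def takeover_needed(self, local_node, now):
        """True if a node other than local_node has failed."""
        return any(node != local_node for node in self.failed_nodes(now))


# Two-node cluster; time stamps are passed in explicitly to keep the sketch deterministic.
monitor = HeartbeatMonitor(["node1", "node2"], dead_time=10.0, now=0.0)
monitor.record_heartbeat("node1", 12.0)   # node2 stays silent
if monitor.takeover_needed("node1", 12.0):
    print("node1 takes over the tasks and identity of:", monitor.failed_nodes(12.0))
```

A real cluster would also have to announce the takeover on the network, which this sketch leaves out.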

19.4.2. RAID

RAID (redundant array of independent disks) combines several hard disk partitions into one large virtual hard disk. RAID can be used to optimize the performance and data security of your system. RAID levels 1 and 5 offer protection against the failure of a disk because the data is stored redundantly across several disks: RAID 1 keeps a complete copy of the data on another disk, and RAID 5 stores parity information from which the contents of a failed disk can be reconstructed. This ensures that the complete data record remains available should a disk fail. Find more information about RAID with SUSE LINUX in Section 3.11. “Soft RAID”.
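
How parity allows a lost disk to be rebuilt can be shown with a short sketch. It is a simplification under stated assumptions: three data blocks and one parity block stand in for four disks, the function name xor_blocks is hypothetical, and a real RAID 5 array distributes the parity across all disks rather than keeping it on one.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR several equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: three data blocks plus the parity block computed from them.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Simulate the loss of the second disk and rebuild its block
# from the surviving data blocks and the parity block.
recovered = xor_blocks([data[0], data[2], parity])
print(recovered == data[1])
```

The rebuild works because XOR-ing the parity with all surviving blocks cancels them out and leaves only the missing block.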

19.4.3. rsync

rsync can be used to synchronize large amounts of data between a server and its backup. rsync has sophisticated mechanisms for transferring only the changes to files. This applies not only to text files, but also to binary files. To identify the differences between files, rsync divides the files into blocks and calculates checksums for these blocks. Find more information about rsync in Section 23.6. “Introduction to rsync”.
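
The block checksum idea can be sketched as follows. This is a deliberately simplified illustration and not the algorithm rsync implements: it compares blocks only at identical offsets, whereas rsync additionally uses a rolling checksum to find matching blocks at arbitrary positions. The names block_checksums and changed_blocks and the tiny block size are assumptions made for this example.

```python
import hashlib

BLOCK_SIZE = 4  # unrealistically small, chosen so the example is easy to follow


def block_checksums(data, block_size=BLOCK_SIZE):
    """Split data into fixed-size blocks and return one checksum per block."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old, new, block_size=BLOCK_SIZE):
    """Return the indexes of the blocks in `new` that differ from `old`."""
    old_sums = block_checksums(old, block_size)
    new_sums = block_checksums(new, block_size)
    return [
        index
        for index, checksum in enumerate(new_sums)
        if index >= len(old_sums) or old_sums[index] != checksum
    ]


old = b"aaaabbbbcccc"
new = b"aaaaXbbbcccc"
# Only the blocks listed here would have to be transferred to the backup.
print(changed_blocks(old, new))
```

Because the comparison works on raw bytes, it makes no difference whether the file contains text or binary data.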

19.4.4. DRBD

Distributed replicated block device (drbd) mirrors partitions and logical volumes (data areas) in the manner of RAID1, but over a normal TCP/IP network. Each node has a particular drbd resource active and all changes are mirrored as secure transactions.

Compared with RAID1 for local disks, drbd has additional features: it minimizes the resynchronization time after the two nodes have been disconnected briefly, and it performs a robust check after various malfunctions to establish which side has the latest, consistent data.
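
One way to keep resynchronization short after a brief disconnection is to remember which blocks were written while the peer was unreachable and to copy only those blocks afterwards. The following sketch illustrates this idea under that assumption. It is not drbd code, and the class MirroredDevice with its attributes is hypothetical.

```python
class MirroredDevice:
    """Toy model of a block device mirrored to a peer (not drbd itself)."""

    def __init__(self, block_count):
        self.local = [b""] * block_count
        self.peer = [b""] * block_count
        self.connected = True
        self.dirty = set()  # blocks written while the peer was unreachable

    def write(self, index, data):
        """Write locally; mirror at once if connected, else mark the block dirty."""
        self.local[index] = data
        if self.connected:
            self.peer[index] = data
        else:
            self.dirty.add(index)

    def reconnect(self):
        """Copy only the dirty blocks to the peer and return their indexes."""
        self.connected = True
        resynced = sorted(self.dirty)
        for index in resynced:
            self.peer[index] = self.local[index]
        self.dirty.clear()
        return resynced


device = MirroredDevice(block_count=8)
device.write(0, b"boot")        # mirrored immediately
device.connected = False        # the network link is interrupted
device.write(3, b"log1")
device.write(5, b"log2")
print(device.reconnect())       # only the blocks changed during the outage
```

The time needed for the resynchronization then depends on the amount of data changed during the outage and not on the size of the whole device.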