The Joy of Server Clusters
25 September 2009

A while ago I was given responsibility for server maintenance. Not just any servers: server clusters. We use DRBD 8 and Heartbeat for our clusters. When it's working, it's a thing of beauty; setting it up, however, can be challenging.

I have been working with one of our new clients who wanted to do their own server setup but couldn't get DRBD disk syncing to work. I initially helped them get it to the point where it was synchronizing over eth0 (eth1 was preferred), and then a couple of days later nothing was working.

It turned out that DRBD had gone split-brain at some point, and now both nodes would come up as Secondary and disconnected. The trick was getting them back together. Fortunately, this was a new client, so there was no risk of losing any data.
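Before trying a fix, it helps to confirm what state each node is really in. Below is a minimal sketch of a hypothetical `drbd_status` shell function (not something that ships with DRBD) that reads /proc/drbd, or a saved copy of it, and prints each device's connection state and roles. It assumes the field layout of the 8.2.6 output shown later in this post, where roles are labelled `st:`; it also accepts `ro:`, which I believe later 8.x releases use instead. Flagging only StandAlone and WFConnection is my own choice.

```shell
#!/bin/sh
# Hypothetical helper: summarise each DRBD device's connection state and
# roles from /proc/drbd, or from a saved copy passed as $1.
drbd_status() {
    file="${1:-/proc/drbd}"
    # Device lines look like: " 0: cs:Connected st:Primary/Secondary ds:..."
    # DRBD 8.2 labels the roles "st:"; later 8.x releases use "ro:".
    awk '
        /^ *[0-9]+: cs:/ {
            minor = $1; sub(":", "", minor)
            cs = ""; roles = ""
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^cs:/) cs = substr($i, 4)
                if ($i ~ /^(st|ro):/) roles = substr($i, 4)
            }
            # Flag the two states a node is typically left in after a split brain.
            flag = (cs == "StandAlone" || cs == "WFConnection") ? "  <-- not connected" : ""
            printf "drbd%s %s %s%s\n", minor, cs, roles, flag
        }
    ' "$file"
}
```

Run it on both nodes. In the situation described above, both would report the Secondary role together with a disconnected state.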

Through the usual Google searches I discovered that I needed to do the following:
1. Stop DRBD on the secondary
# /etc/init.d/drbd stop

2. Restart DRBD on the primary
# /etc/init.d/drbd restart

3. Make sure the primary really is Primary. In this case it would start as Primary and then drop back to Secondary during startup, so it had to be promoted manually.
# drbdadm primary [resource]

4. On the secondary, invalidate the local data. This forces it to re-sync from the primary.
# modprobe drbd    # the drbd kernel module must be loaded first
# drbdadm invalidate [resource]
# drbdadm adjust [resource]
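The four steps can also be wrapped in a small script so they are run in the right order on the right node. This is a hypothetical sketch, not a DRBD tool: the phase names (stop, primary, resync) and the DRY_RUN variable are my own inventions. By default it only prints the commands, since invalidating the wrong node throws away that node's data; set DRY_RUN=0 to actually run them. Run the phases in order: stop on the secondary, primary on the primary, then resync back on the secondary.

```shell
#!/bin/sh
# Hypothetical wrapper around the four recovery steps described above.
# It only prints the commands unless DRY_RUN=0 is set, because running
# "invalidate" on the wrong node discards that node's data.
# Usage: DRY_RUN=0 ./drbd-recover.sh stop|primary|resync <resource>
set -e

# Print the command in dry-run mode (the default); execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

recover() {
    phase="$1"
    res="$2"
    if [ -z "$res" ]; then
        echo "usage: $0 stop|primary|resync <resource>" >&2
        return 2
    fi
    case "$phase" in
        stop)     # step 1, on the secondary node
            run /etc/init.d/drbd stop
            ;;
        primary)  # steps 2 and 3, on the primary node
            run /etc/init.d/drbd restart
            run drbdadm primary "$res"
            ;;
        resync)   # step 4, back on the secondary node
            run modprobe drbd
            run drbdadm invalidate "$res"
            run drbdadm adjust "$res"
            ;;
        *)
            echo "unknown phase: $phase" >&2
            return 2
            ;;
    esac
}

# Only act when arguments are given, so the file can also be sourced.
if [ $# -gt 0 ]; then
    recover "$@"
fi
```

For example, `./drbd-recover.sh primary r0` prints the restart and promote commands for a resource named r0 (a placeholder name) without executing anything.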

At this point, /proc/drbd should show the synchronization progress:
[user@Primary ~]# cat /proc/drbd
version: 8.2.6 (api:88/proto:86-88)
GIT-hash: 3e69822d3bb4920a8c1bfdf7d647169eba7d2eb4 build by buildsvn@c5-i386-build, 2008-10-03 11:43:01
 0: cs:SyncSource st:Primary/Secondary ds:UpToDate/Inconsistent C r---
    ns:78160576 nr:0 dw:0 dr:78160576 al:0 bm:4770 lo:0 pe:0 ua:0 ap:0 oos:395131820
    [==>.................] sync'ed: 16.6% (385870/462199)M
    finish: 10:28:23 speed: 10,436 (10,232) K/sec
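Rather than reading the whole of /proc/drbd each time, a hypothetical `sync_progress` function like the sketch below can pull out just the percentage and the estimated time remaining. It assumes the `sync'ed:` and `finish:` line layout from the 8.2.6 output above, and it reports only the last syncing device it finds, which is fine for a single resource.

```shell
#!/bin/sh
# Hypothetical helper: extract the resync percentage and the estimated
# time remaining from /proc/drbd, or from a saved copy passed as $1.
sync_progress() {
    file="${1:-/proc/drbd}"
    awk '
        # e.g. "[==>.................] sync'"'"'ed: 16.6% (385870/462199)M"
        /sync.ed:/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^sync.ed:$/) pct = $(i + 1)
        }
        # e.g. "finish: 10:28:23 speed: 10,436 (10,232) K/sec"
        /finish:/ {
            for (i = 1; i <= NF; i++)
                if ($i == "finish:") eta = $(i + 1)
        }
        END {
            if (pct != "") printf "synced %s, about %s to go\n", pct, eta
            else print "no resync in progress"
        }
    ' "$file"
}
```

Called with no argument it reads /proc/drbd directly. To simply follow the resync, `watch -n 10 cat /proc/drbd` works too.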

That's what it took to fix my situation. You might not need exactly the same solution, but hopefully it helps.