Thread: DRBD & Heartbeat not quite working as expected

  1. #1
    DougWare is offline Loyal Member
    Join Date
    Dec 2007
    Location
    Raleigh, NC
    Posts
    91
    Rep Power
    7

    After several days (heartbeat and DRBD are new to me) I've gotten Zimbra working with heartbeat, mostly.

    If Zimbra is running on Server-B and Server-B goes down, Zimbra fails over to Server-A. The problem is that the servers reboot so quickly during a test (in less than a minute) that Zimbra is only about 90% started on Server-A when it receives a heartbeat command to transfer back to Server-B. Server-A takes a while to unmount /opt, DRBD ends up Secondary/Secondary on both servers, and the shared IP is never assigned again. I end up rebooting both servers, and then everything comes back up.

    auto_failback is set to off on both servers, and heartbeat is set to prefer Server-A to start with.

    I've been pulling my hair out on this one, and these are new servers.
    2.66 GHz 64-bit Pentium D CPUs
    1 GB of RAM
    1 mailbox (I was still testing heartbeat and haven't set up the mailboxes yet)

    Does anyone know what I need to tweak?

    Doug
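
The Secondary/Secondary state described in this post can be detected from a script rather than by eye. The following is a minimal sketch, assuming the DRBD 8.x text format of /proc/drbd, where the per-device line carries the roles as `st:Local/Peer` (8.0 to 8.2) or `ro:Local/Peer` (8.3 and later). The function names are hypothetical and the exact line format should be checked against the installed DRBD version.

```python
import re

# Per-device line of /proc/drbd, e.g.
#  0: cs:Connected st:Secondary/Secondary ds:UpToDate/UpToDate C r---
# Assumption: roles are labelled "st:" (DRBD 8.0-8.2) or "ro:" (8.3+).
ROLE_LINE = re.compile(
    r"^\s*(?P<minor>\d+):\s+cs:(?P<cstate>\S+)\s+"
    r"(?:st|ro):(?P<local>\w+)/(?P<peer>\w+)"
)


def drbd_roles(proc_drbd_text):
    """Return {minor: (connection_state, local_role, peer_role)}."""
    roles = {}
    for line in proc_drbd_text.splitlines():
        match = ROLE_LINE.match(line)
        if match:
            roles[int(match.group("minor"))] = (
                match.group("cstate"),
                match.group("local"),
                match.group("peer"),
            )
    return roles


def devices_without_primary(roles):
    """List device minors where neither side is reported as Primary."""
    return [
        minor
        for minor, (_cstate, local, peer) in sorted(roles.items())
        if local != "Primary" and peer != "Primary"
    ]
```

On a node, the text would come from reading /proc/drbd, for example `devices_without_primary(drbd_roles(open("/proc/drbd").read()))`. A non-empty result after a failover test would correspond to the stuck state above, where no node has promoted the device and the filesystem cannot be mounted.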

  2. #2
    mmorse is offline Moderator
    Join Date
    May 2006
    Location
    USA
    Posts
    6,242
    Rep Power
    20

    Originally posted by DougWare:
        The problem is that the servers reboot so quickly during a test (less than a minute) that Zimbra is about 90% started on Server-A when it receives a heartbeat command to transfer back to Server-B.

    Did you remove zimbra from your runlevels on Server-A? (/etc/rc#.d/S99zimbra)
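
A quick way to confirm whether any such start links are still present is to scan the runlevel directories. This is a minimal sketch, assuming SysV-style /etc/rc#.d directories as in the path above; the function name and its parameters are hypothetical.

```python
import glob
import os


def zimbra_start_links(etc_dir="/etc", service="zimbra"):
    """Return the S##<service> start links found under etc_dir/rc#.d."""
    # S[0-9][0-9] matches the two-digit start priority, e.g. S99zimbra.
    pattern = os.path.join(etc_dir, "rc[0-9].d", "S[0-9][0-9]" + service)
    return sorted(glob.glob(pattern))
```

Calling `zimbra_start_links()` on each node should return an empty list if heartbeat alone is meant to start Zimbra. Any path it returns is a link that would start Zimbra at boot, independently of heartbeat. Removing the links is usually done with the distribution's own tool (for example `update-rc.d` on Debian-based systems or `chkconfig` on Red Hat-based ones) rather than by hand.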

  3. #3
    DougWare is offline Loyal Member

    I did, but then I reinstalled Zimbra on Server-B.

    I guess I forgot to remove the runlevel links again. I've removed them now and am restarting to see if that corrects the problem.

    Thank you for pointing that out!

    Doug

  4. #4
    DougWare is offline Loyal Member

    Same outcome....

    Dec 17 20:23:48 mailserver1B heartbeat: [2506]: info: mailserver1a wants to go standby [foreign]
    Dec 17 20:23:49 mailserver1B heartbeat: [2506]: info: standby: acquire [foreign] resources from mailserver1a
    Dec 17 20:23:49 mailserver1B heartbeat: [2842]: info: acquire local HA resources (standby).
    Dec 17 20:23:49 mailserver1B heartbeat: [2842]: info: local HA resource acquisition completed (standby).
    Dec 17 20:23:49 mailserver1B heartbeat: [2506]: info: Standby resource acquisition done [foreign].
    Dec 17 20:23:49 mailserver1B heartbeat: [2506]: info: remote resource transition completed.

    Doug

  5. #5
    DougWare is offline Loyal Member

    Here's the corresponding output from Server-A:

    Dec 17 20:23:11 mailserver1A heartbeat: [2498]: WARN: T_STARTING received during takeover.
    Dec 17 20:23:11 mailserver1A heartbeat: [2498]: info: remote resource transition completed.
    Dec 17 20:23:13 mailserver1A ResourceManager[18922]: info: Running /etc/ha.d/resource.d/IPaddr 192.168.2.20/24/bond0 stop
    Dec 17 20:23:13 mailserver1A IPaddr[24657]: INFO: ifconfig bond0:0 down
    Dec 17 20:23:13 mailserver1A IPaddr[24628]: INFO: Success
    Dec 17 20:23:13 mailserver1A ResourceManager[18922]: info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /opt reiserfs stop
    Dec 17 20:23:13 mailserver1A Filesystem[24719]: INFO: Running stop for /dev/drbd0 on /opt
    Dec 17 20:23:13 mailserver1A Filesystem[24719]: INFO: Trying to unmount /opt
    Dec 17 20:23:13 mailserver1A Filesystem[24719]: ERROR: Couldn't unmount /opt; trying cleanup with SIGTERM
    Dec 17 20:23:14 mailserver1A Filesystem[24719]: INFO: Some processes on /opt were signalled
    Dec 17 20:23:15 mailserver1A Filesystem[24719]: INFO: unmounted /opt successfully
    Dec 17 20:23:15 mailserver1A Filesystem[24708]: INFO: Success
    Dec 17 20:23:15 mailserver1A ResourceManager[18922]: info: Running /etc/ha.d/resource.d/drbddisk r0 stop
    Dec 17 20:23:15 mailserver1A kernel: drbd0: role( Primary -> Secondary )
    Dec 17 20:23:15 mailserver1A kernel: drbd0: Writing meta data super block now.
    Dec 17 20:23:15 mailserver1A heartbeat: [18896]: info: local HA resource acquisition completed (standby).
    Dec 17 20:23:15 mailserver1A heartbeat: [2498]: info: Standby resource acquisition done [all].
    Dec 17 20:23:15 mailserver1A harc[24828]: info: Running /etc/ha.d/rc.d/status status
    Dec 17 20:23:15 mailserver1A mach_down[24844]: info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
    Dec 17 20:23:15 mailserver1A mach_down[24844]: info: mach_down takeover complete for node mailserver1b.
    Dec 17 20:23:15 mailserver1A heartbeat: [2498]: info: mach_down takeover complete.
    Dec 17 20:23:15 mailserver1A harc[24878]: info: Running /etc/ha.d/rc.d/status status
    Dec 17 20:23:15 mailserver1A harc[24894]: info: Running /etc/ha.d/rc.d/status status
    Dec 17 20:23:15 mailserver1A harc[24910]: info: Running /etc/ha.d/rc.d/status status
    Dec 17 20:23:45 mailserver1A hb_standby[24946]: Going standby [foreign].
    Dec 17 20:23:45 mailserver1A heartbeat: [2498]: info: mailserver1a wants to go standby [foreign]
    Dec 17 20:23:45 mailserver1A heartbeat: [2498]: info: standby: mailserver1b can take our foreign resources
    Dec 17 20:23:45 mailserver1A heartbeat: [24960]: info: give up foreign HA resources (standby).
    Dec 17 20:23:45 mailserver1A heartbeat: [24960]: info: foreign HA resource release completed (standby).
    Dec 17 20:23:45 mailserver1A heartbeat: [2498]: info: Local standby process completed [foreign].
    Dec 17 20:23:46 mailserver1A heartbeat: [2498]: WARN: 1 lost packet(s) for [mailserver1b] [46:48]
    Dec 17 20:23:46 mailserver1A heartbeat: [2498]: info: remote resource transition completed.
    Dec 17 20:23:46 mailserver1A heartbeat: [2498]: info: No pkts missing from mailserver1b!
    Dec 17 20:23:46 mailserver1A heartbeat: [2498]: info: Other node completed standby takeover of foreign resources.
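
The ResourceManager lines in the log above show which resource scripts heartbeat ran and in what order. Heartbeat stops resources in the reverse of the order they are listed in haresources, so the stop sequence hints at how the resource line is laid out. The following sketch extracts those calls and reverses the stops into haresources-style `script::arg` entries. It is an inference from this log excerpt only, not the poster's actual file, and the function names are hypothetical.

```python
import re

# Matches lines such as:
#   ... ResourceManager[18922]: info: Running /etc/ha.d/resource.d/drbddisk r0 stop
RESOURCE_CALL = re.compile(
    r"ResourceManager\[\d+\]: info: Running /etc/ha\.d/resource\.d/"
    r"(?P<script>\S+)\s+(?P<args>.*?)\s*(?P<action>start|stop)\s*$"
)


def resource_calls(log_text):
    """Return [(script, [args...], action)] in the order they were logged."""
    calls = []
    for line in log_text.splitlines():
        match = RESOURCE_CALL.search(line)
        if match:
            calls.append(
                (match.group("script"), match.group("args").split(), match.group("action"))
            )
    return calls


def inferred_start_order(calls):
    """Reverse the logged stop calls into haresources-style entries."""
    stops = [(script, args) for script, args, action in calls if action == "stop"]
    return ["::".join([script] + args) for script, args in reversed(stops)]
```

Applied to the excerpt above, the inferred start order would be `drbddisk::r0`, then `Filesystem::/dev/drbd0::/opt::reiserfs`, then `IPaddr::192.168.2.20/24/bond0`. The Zimbra init script does not appear in this excerpt, so its position in the resource line cannot be read from it.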

  6. #6
    tibby is offline Senior Member
    Join Date
    May 2010
    Location
    Budapest
    Posts
    56
    Rep Power
    4

    Can you please tell me what's in your /etc/heartbeat/haresources file?
    I can't get heartbeat to mount the DRBD device and start Zimbra.

    Thanks,
    Tibby
