We have difficulties setting up Zimbra in combination with DRBD and Heartbeat. This is our situation:
We are setting up Zimbra, using DRBD and Heartbeat to make it highly available. We hired two dedicated servers in two separate datacenters, so that Zimbra will still be available when either network goes down.
In all the Heartbeat & DRBD documentation I've seen, the two machines are on the same local network. A virtual IP address is then used so that the active machine is always reachable at the same address, and the domain name for the Zimbra interface (e.g. mail.domain.com) resolves to that virtual IP address.
However, in our situation we can't use a virtual IP address, since it would only be known locally, and our servers are in two completely separate networks.
The first solution we came up with was to run our own DNS server and update the entry for the Zimbra interface (I'll stick with mail.domain.com for the rest of this post) whenever the primary server fails. In more detail:
server 1:
ip: 130.0.0.45
primary server, running zimbra
server 2:
ip: 145.0.0.50
secondary server, waiting until server 1 fails
mail.domain.com points to 130.0.0.45
(the ip addresses used aren't the real ips, they're just there to show that the servers are in different networks)
Now, when server 1 fails, server 2 notices this through Heartbeat, mounts the DRBD partition, starts Zimbra, and updates the A record for mail.domain.com to point to 145.0.0.50. This all works fine. The problem is that DNS entries are cached client-side, so a person visiting mail.domain.com will still be directed to 130.0.0.45. We tried setting the time-to-live value in BIND really low (120 seconds), but most of the clients we tested seem to ignore it (AFAIK DNS clients aren't required to throw away their cached values after the TTL has passed).
After a failover, it can take hours before clients notice that the destination of mail.domain.com has changed, which defeats the purpose of a high-availability setup: we want such a setup because it offers fast recovery of the Zimbra service after a failure, and if recovery is going to take hours we might just as well use a single machine.
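To make the DNS update step concrete, here is a minimal sketch of what such an update could look like. It is an illustration under assumptions, not our exact script: it assumes BIND accepts dynamic updates via nsupdate authenticated with a TSIG key, and the name server ns1.domain.com, the key file path and the function names are made up for the example.

```python
import subprocess


def build_nsupdate_script(dns_server, zone, record, new_ip, ttl=120):
    """Return nsupdate commands that repoint an A record to new_ip."""
    # nsupdate expects fully qualified names, so make sure of the trailing dot.
    fqdn = record if record.endswith(".") else record + "."
    lines = [
        f"server {dns_server}",                 # name server that receives the update
        f"zone {zone}",                         # zone that holds the record
        f"update delete {fqdn} A",              # drop the old A record(s)
        f"update add {fqdn} {ttl} A {new_ip}",  # add the new one with a short TTL
        "send",
    ]
    return "\n".join(lines) + "\n"


def push_update(script, key_file):
    """Feed the commands to nsupdate on stdin, signing with a TSIG key file."""
    subprocess.run(["nsupdate", "-k", key_file], input=script, text=True, check=True)


if __name__ == "__main__":
    # Hypothetical values: ns1.domain.com is an assumed name server.
    script = build_nsupdate_script(
        "ns1.domain.com", "domain.com", "mail.domain.com", "145.0.0.50"
    )
    print(script)
    # On the surviving node this would be followed by something like:
    # push_update(script, "/etc/bind/failover.key")  # assumed key path
```

A script like this could be called from the Heartbeat resource that runs on takeover. It only changes what the authoritative server answers; it does nothing about the client-side caching described above.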
Another option we thought of was to create a virtual LAN, so that we could use a virtual IP address and follow the setup shown in most examples. The problem with this option is that a single external IP address has to be forwarded to the virtual IP address of the active server. If the network containing the router that does this forwarding goes down, our server still becomes unavailable, which makes it dependent on a single network again. We chose to hire two geographically separated servers so that we could recover quickly not only from hardware failures, but also from network failures. So again, this is not an option.
(Also, we would need the cooperation of both our hosts to create a virtual LAN and it's quite complex, so we're not really eager to use such a setup.)
Now, our question:
How can we set up DRBD and Heartbeat in such a way that when either a network failure or a hardware failure occurs, Zimbra is made available again quickly?
A solution that only requires us to tweak our DNS-update setup a little would be great, but suggestions that need a total rearrangement are also welcome.