Hi,
Having a cold standby server in a different location would be nice, but in that case you can't use clustering anymore. So I spent some time and built a cold standby. Here is how I did it.
System: ZCS 4.5.6 NE on Ubuntu 6.06.1
Step one:
Create a 1:1 copy of your server. You can use whatever you prefer: LVM snapshot, physical hard drive copy, rsync. I used rsync as I don't have physical access to the server. Important for this step: zimbra needs to be down to make sure it's really 1:1.
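If you go the rsync route, a minimal sketch of the initial copy could look like the following. It assumes Zimbra lives under /opt/zimbra and that the base OS is already installed on the standby. SRC_HOST is a placeholder for whatever name or IP reaches your primary, and the option set is my suggestion, not necessarily what was used here. By default the script only prints the command (DRY_RUN=1) so you can review it first.

```shell
#!/bin/bash
# Sketch only: SRC_HOST is a placeholder, not a name from this setup.
SRC_HOST="${SRC_HOST:-primary.example.tld}"
DRY_RUN="${DRY_RUN:-1}"

initial_copy() {
    # -a keeps owners, permissions and times; -H keeps hard links;
    # --numeric-ids avoids remapping uid/gid by name; --delete removes
    # files on the standby that no longer exist on the primary.
    local cmd="rsync -aH --numeric-ids --delete root@${SRC_HOST}:/opt/zimbra/ /opt/zimbra/"
    if [ "$DRY_RUN" = "1" ]; then
        echo "$cmd"      # print only, nothing is copied
    else
        $cmd             # unquoted on purpose: split into command and arguments
    fi
}

initial_copy
```

Run it once with the default to see the command, then with DRY_RUN=0 while zimbra is stopped on both machines.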
Step two:
Adjusting DNS. As zimbra won't start if the DNS is not correct, we have to fake a bit.
Let's say the primary server is zmail.mydomain.tld. Add an additional DNS entry for this server: zmail2.mydomain.tld
Install a local DNS server on your cold standby server - I used dnsmasq.
We need this DNS server so the cold standby server can use the FQDN of your primary server while having a different IP. To do this I added this line in /etc/dnsmasq.conf:
Code:
address=/zmail.mydomain.tld/192.168.1.100
and changed /etc/hosts to:
Code:
127.0.0.1 localhost.localdomain localhost
192.168.1.100 zmail.mydomain.tld zmail
Now the cold standby server can use "zmail.mydomain.tld" to run a local zimbra configured for your primary server and still access the primary server using zmail2.mydomain.tld.
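Before starting zimbra on the standby it may be worth checking that the hosts file really maps the primary's FQDN to the standby's own address. A small sketch, where hosts_ip_for is a hypothetical helper of mine that just reads a hosts-format file:

```shell
#!/bin/bash
# Print the first IP that a hosts-format file maps to the given name.
hosts_ip_for() {
    local name="$1" file="${2:-/etc/hosts}"
    awk -v n="$name" '
        /^[ \t]*#/ { next }    # skip comment lines
        { for (i = 2; i <= NF; i++) if ($i == n) { print $1; exit } }
    ' "$file"
}

# Example check on the standby (192.168.1.100 is the address used above):
#   [ "$(hosts_ip_for zmail.mydomain.tld)" = "192.168.1.100" ] && echo "hosts entry OK"
```

This only looks at /etc/hosts; it does not query dnsmasq, so treat it as a quick sanity check rather than a full DNS test.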
Step three:
Configure passwordless ssh using ssh keys. We need this to use rsync in a cron job.
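For anyone who has not set up key-based login before, a rough sketch follows. It assumes root on the standby pulls from the primary, and that root's default key path is used so rsync picks it up without extra options. make_sync_key is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: create a passphrase-less RSA key pair if none exists yet.
KEY="${KEY:-$HOME/.ssh/id_rsa}"

make_sync_key() {
    local key="$1"
    mkdir -p "$(dirname "$key")"
    # -N "" sets an empty passphrase so cron can use the key unattended
    if [ ! -f "$key" ]; then
        ssh-keygen -q -t rsa -N "" -f "$key"
    fi
}

# On the standby, as root:
#   make_sync_key "$KEY"
#   cat "$KEY.pub" | ssh root@zmail2.mydomain.tld 'mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys'
#   ssh root@zmail2.mydomain.tld true    # should now work without a password prompt
```

A key without a passphrase gives root access to the primary to anyone who can read it, so keep the private key readable by root only.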
Step four:
To be sure nothing is going wrong while syncing the backups I moved /opt/zimbra/backup to /zmailbackup.
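The move itself is a one-liner, but here is a slightly more careful sketch that refuses to overwrite an existing target. The chown to the zimbra user is my assumption (the restore later runs as zimbra and has to read the files); relocate_backup is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: move a backup directory and optionally change its owner.
relocate_backup() {
    local src="$1" dst="$2" owner="$3"
    if [ ! -d "$src" ]; then
        echo "missing $src" >&2
        return 1
    fi
    if [ -e "$dst" ]; then
        echo "$dst already exists" >&2
        return 1
    fi
    mv "$src" "$dst"
    # only change ownership when an owner was given
    if [ -n "$owner" ]; then
        chown -R "$owner" "$dst"
    fi
    return 0
}

# On the standby, as root, with zimbra stopped:
#   relocate_backup /opt/zimbra/backup /zmailbackup zimbra:zimbra
```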
Step five:
Create some scripts to control the sync / restore and whether your server is in "cold standby" or "live" mode.
/root/coldstandby is just a file. I use it to check if the server is in "cold standby" mode or live.
/root/change.zimbra.status.sh is used to change the function from "cold standby" to "live". If the server is live you don't want to sync with your primary server anymore.
Code:
#!/bin/bash
case $1 in
"status")
if [ -f /root/coldstandby ]; then
echo "server is in cold standby modus"
else
echo "server is LIVE"
fi
;;
"cold")
echo "switching into cold modus"
echo "... remove start scripts"
update-rc.d -f zimbra remove
echo "... activate sync, backup, restore"
echo "if this file is missing server is live" > /root/coldstandby
echo "... stop zimbra"
/etc/init.d/zimbra stop
echo "... done"
;;
"hot")
echo "switching into live modus"
echo "... install start scripts"
update-rc.d zimbra defaults 99
echo "... deactivate sync, backup, restore"
rm /root/coldstandby
echo "... check if a restore is running"
RESTORE=`ps fax | grep -i java | grep -i restore | wc -l`
if [ $RESTORE -gt 0 ]; then
while [ $RESTORE -gt 0 ]; do
echo "!!! FOUND ACTIVE RESTORE PROCESS PLEASE WAIT UNTIL FINISHED !!!"
echo "... waiting for 5 min, and check again"
sleep 300
RESTORE=`ps fax | grep -i java | grep -i restore | wc -l`
done
fi
echo "... no restore running anymore ... going live now"
/etc/init.d/zimbra stop
/etc/init.d/zimbra start
;;
*)
echo "help..."
echo "switch to live modus: ./change.zimbra.status.sh hot"
echo "switch to cold modus: ./change.zimbra.status.sh cold"
echo "query status: ./change.zimbra.status.sh status"
;;
esac
/root/sync.live.server.sh syncs the backup folder with the primary server and starts the restore.
Code:
#!/bin/bash
if [ -f /root/coldstandby ];
then
echo "`date`: start syncing backups" >> /var/log/zimbra.cold.log
rsync -a root@zmail2.mydomain.tld:/opt/zimbra/backup/* /zmailbackup/
echo "`date`: start restoring backups" >> /var/log/zimbra.cold.log
su - zimbra -c /opt/zimbra/cold.restore.sh
echo "`date`: restore finished" >> /var/log/zimbra.cold.log
else
echo "`date`: no sync/restore done server considered to be LIVE" >> /var/log/zimbra.cold.log
fi
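One thing to watch with the sync script: if a restore ever takes longer than the cron interval, a second run would start on top of the first. A possible guard, as a sketch only: it uses a lock directory because mkdir either creates it or fails, so only one caller can win. LOCKDIR and run_locked are names I made up for illustration.

```shell
#!/bin/bash
# Sketch: run a command only if no other run currently holds the lock.
LOCKDIR="${LOCKDIR:-/var/run/zimbra.cold.sync.lock}"   # hypothetical path

run_locked() {
    # mkdir is atomic: it fails if the directory already exists
    if ! mkdir "$LOCKDIR" 2>/dev/null; then
        echo "previous sync/restore still running, skipping"
        return 1
    fi
    local rc=0
    "$@" || rc=$?
    rmdir "$LOCKDIR"
    return $rc
}

# A cron wrapper could then call:
#   run_locked /root/sync.live.server.sh
```

If the machine reboots or the script is killed mid-run, the lock directory stays behind and has to be removed by hand.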
/opt/zimbra/cold.restore.sh restores the backup
Code:
#!/bin/bash
zmcontrol stop
LABEL=`zmrestoreldap -lbs -t /zmailbackup | sed -n 1p`
zmrestoreldap -lb $LABEL -t /zmailbackup
zmmailboxctl start
zmrestore -a "all" --ignoreRedoErrors -t /zmailbackup
zmcontrol stop
I had trouble with RedoErrors and the only way I could get it to work was using the "--ignoreRedoErrors" option. I also tried to use zmrestoreoffline - but this did not work at all for me. The "zmcontrol stop" at the beginning and the end are just for safety.
The only thing that is left is a cron job. I have this line in my /etc/crontab:
Code:
55 */2 * * * root /root/sync.live.server.sh
For me this is 30 minutes after the server does its backup, which works fine for me.
I know that the "cold standby" server is never 100% up to date, but this is OK for me.
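In case the cron expression is not obvious: "*/2" in the hour field means every second hour starting at 0, so the job fires at minute 55 of every even hour. A tiny sketch that prints those times (cron_run_times is just an illustrative name):

```shell
#!/bin/bash
# Print the daily times at which "55 */2 * * *" fires.
cron_run_times() {
    local h
    for h in 0 2 4 6 8 10 12 14 16 18 20 22; do
        printf '%02d:55\n' "$h"
    done
}

cron_run_times
```

Adjust the minute and hour fields so the sync starts after your own backup window has finished.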
Cheers
Andre