  #1 (permalink)  
Old 03-08-2010, 01:29 AM
Loyal Member
 
Posts: 82
Inconsistent backup of Zimbra

Hi all !

I am using one of the scripts found in the wiki to back up my Zimbra server.
I have two servers (one in production and one on my LAN), both are running CentOS 5.4 and Zimbra 6.0.5.
I used the method described in Open Source Edition Backup Procedure : 1.2 More elaborated script, using LVM and rsync
I just modified the rsync options, because rsync -aAK complained about ACL errors (I don't understand why this "A" option was there, so I removed it and used -aHK).

The process is :
  • At 01:00, on production server :
    • Stop zimbra
    • Create Snapshot LV of /opt
    • Start zimbra
    • rsync -aHK /opt-snapshot/zimbra to /opt.bak/zimbra.01h00
    • Remove Snapshot LV (process takes less than 10 minutes)
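  • At 02:00, on backup server :
    • Stop zimbra
    • rsync /opt.bak/zimbra.01h00/ from the production server to /opt/zimbra/
    • Start zimbra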

When I checked my local backup today, it was stopped and could not start :
Code:
        Starting ldap...Done.
Failed.
Failed to start slapd.  Attempting debug start to determine error.
hdb_db_open: database "": db_open(/opt/zimbra/data/ldap/hdb/db/id2entry.bdb) failed: Invalid argument (22).
backend_startup_one (type=hdb, suffix=""): bi_db_open failed! (22)
bdb_db_close: database "": alock_close failed
So I guess my backup is NOT consistent, even though I stopped zimbra before creating the snapshot.

Any idea of what went wrong here ?
As a side note, here is the email I received from the latest cron job (indicating that everything ran fine apparently) :
Code:
2010-03-08 01:00:01 zimbra backup: backup started
2010-03-08 01:00:01 zimbra backup: stopping the Zimbra services, this may take some time
2010-03-08 01:00:32 zimbra backup: creating a LV called LogVolOptSnapshot
  Logical volume "LogVolOptSnapshot" created
2010-03-08 01:00:33 zimbra backup: starting the Zimbra services in the background.....
2010-03-08 01:00:33 zimbra backup: creating mountpoint for the LV
2010-03-08 01:00:33 zimbra backup: mounting the snapshot LogVolOptSnapshot
2010-03-08 01:00:33 zimbra backup: rsyncing the snapshot to the backup directory 
2010-03-08 01:01:28 zimbra backup: unmounting the snapshot
2010-03-08 01:01:28 zimbra backup: pausing 1s and syncing before removing the snapshot from LVM
2010-03-08 01:01:30 zimbra backup: removing the snapshot
  Logical volume "LogVolOptSnapshot" successfully removed
2010-03-08 01:01:31 zimbra backup: backup ended
2010-03-08 01:03:59 zimbra backup: services background startup completed

I post the complete scripts below :

Backup script on production server :
Code:
#!/bin/bash
#
#    Script to backup a Zimbra installation (open source version)
#    by installing the Zimbra on a separate LVM Logical Volume,
#    taking a snapshot of that partition after stopping Zimbra,
#    restarting Zimbra services, then rsyncing the snapshot to a
#    separate backup point.

#    This script was originally based on a script found on the Zimbra wiki
#    http://wiki.zimbra.com/index.php?title=Open_Source_Edition_Backup_Procedure
#    and totally rewritten since then.

#    Copyright (C) 2007 Serge van Ginderachter <svg@ginsys.be>
#
#    This program is free software; you can redistribute it and/or modify
#    it under the terms of the GNU General Public License version 2 as
#    published by the Free Software Foundation.
#
#    This program is distributed in the hope that it will be useful,
#    but WITHOUT ANY WARRANTY; without even the implied warranty of
#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#    GNU General Public License for more details.
#
#    You should have received a copy of the GNU General Public License
#    along with this program; if not, write to the Free Software
#    Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
#    Or download it from http://www.gnu.org/licenses/old-licenses/gpl-2.0.html

####################################################################################

# Read config
source /root/zmbackup/zmbackup_01h00.config

##########################################
# Do not change anything beyond this point
##########################################

pause() {
        if [ -n "$debug" ]; then
        echo "Press Enter to execute this step..";
        read input;
        fi
        }

say() {
        MESSAGE_PREFIX="zimbra backup:"
        MESSAGE="$1"
        TIMESTAMP=$(date +"%F %T")
        echo -e "$TIMESTAMP $MESSAGE_PREFIX $MESSAGE"
        logger -t $log_tag -p $log_facility.$log_level "$MESSAGE"
        logger -t $log_tag -p $log_facility_mail.$log_level "$MESSAGE"
        pause
        }

error ()  {
        MESSAGE_PREFIX="zimbra backup:"
        MESSAGE="$1"
        TIMESTAMP=$(date +"%F %T")
        echo -e $TIMESTAMP $MESSAGE >&2
        logger -t $log_tag -p $log_facility.$log_level_err "$MESSAGE"
        logger -t $log_tag -p $log_facility_mail.$log_level_err "$MESSAGE"
        echo $TIMESTAMP $MESSAGE | mail -s "Zimbra Backup - Erreur" arnaud.lesauvage@codata.eu
        exit 1
        }

# load kernel module to enable LVM snapshots
/sbin/modprobe dm-snapshot || error "Error loading dm-snapshot module"

# Output date
say "backup started"

# Stop the Zimbra services
say "stopping the Zimbra services, this may take some time"
/etc/init.d/zimbra stop || error "error stopping Zimbra"
[ "$(ps -u zimbra -o "pid=")" ] && kill -9 $(ps -u zimbra -o "pid=") #added as a workaround to zimbra bug 18653

# Create a snapshot logical volume called $zm_snapshot
say "creating a LV called $zm_snapshot"
$LVCREATE -L $zm_snapshot_size -s -n $zm_snapshot /dev/$zm_vg/$zm_lv  || error "error creating snapshot, exiting"

# Start the Zimbra services
say "starting the Zimbra services in the background....."
(/etc/init.d/zimbra start && say "services background startup completed") || error "services background startup FAILED" &

# Create a mountpoint to mount the logical volume to
say "creating mountpoint for the LV"
mkdir -p $zm_snapshot_path || error "error creating snapshot mount point $zm_snapshot_path"

# Mount the logical volume snapshot to the mountpoint
say "mounting the snapshot $zm_snapshot"
mount /dev/$zm_vg/$zm_snapshot $zm_snapshot_path

# Create the current backup
say "rsyncing the snapshot to the backup directory $backup_dir"
rsync -aHK$V --delete --inplace $zm_snapshot_path/$zm_path/ $zm_backup_path || say "error during rsync but continuing the backup script"

# Unmount $zm_snapshot from $zm_snapshot_mnt
say "unmounting the snapshot"
umount $zm_snapshot_path || error "error unmounting snapshot"

# Delete the snapshot mount dir
rmdir $zm_snapshot_path

# Remove the snapshot volume
# https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.15/+bug/71567
say "pausing 1s and syncing before removing the snapshot from LVM"
sleep 1 ; sync
say "removing the snapshot"
$LVREMOVE --force /dev/$zm_vg/$zm_snapshot  || say "error removing the snapshot"

# Done!
say "backup ended"
date >$zm_backup_path/lastsync
Backup Script configuration file (/root/zmbackup/zmbackup_01h00.config) :
Code:
#!/bin/bash
#
#    Copyright (C) 2007 Serge van Ginderachter <svg@ginsys.be>
#
#    This program is free software; you can redistribute it and/or modify
#    it under the terms of the GNU General Public License version 2 as
#    published by the Free Software Foundation.
#
#    This program is distributed in the hope that it will be useful,
#    but WITHOUT ANY WARRANTY; without even the implied warranty of
#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#    GNU General Public License for more details.
#
#    You should have received a copy of the GNU General Public License
#    along with this program; if not, write to the Free Software
#    Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
#    Or download it from http://www.gnu.org/licenses/old-licenses/gpl-2.0.html


#### Modify the following variables according to your installation

# backup_dir - directory to backup to
zm_backup_path=/opt.bak/zimbra.01h00/

# zm_lv - the Logical Volume that contains /opt/zimbra - /opt mount point expected
zm_lv=LogVolOpt

# vol_group - the Volume Group that contains $zm_lv
zm_vg=VolGroupOpt

# zimbra_path - the path beneath the Logical Volume $zm_lv that needs to be synced
zm_path=zimbra

# zm_lv_fs - the file system type (ext3, xfs, ...) in /opt/zimbra
zm_lv_fs=ext3

# lvcreate lvremove - path and command for the lvm logical volume creation and deletion command
LVCREATE=/usr/sbin/lvcreate
LVREMOVE=/usr/sbin/lvremove

#### Modify the following variables according to your taste and needs

# zmsnapshot - the snapshot volume name for $zm_lv
zm_snapshot=LogVolOptSnapshot

# zmsnapshot_size - size available for growing the snapshot
zm_snapshot_size=20GB

# zm_snapshot_mnt - zimbra snapshot mount point
zm_snapshot_path=/tmp/opt.snapshot

# rsync verbose set to "v"
# V=v
V=

#  pause at each step if $debug is set to a non-zero string
debug=

#### Following parameters probably shouldn't need to be changed

log_facility=daemon
log_facility_mail=mail
log_level=notice
log_level_err=error
log_tag="$0"
Local script to sync backup server :
Code:
#!/bin/bash

/etc/init.d/zimbra stop
[ "$(ps -u zimbra -o "pid=")" ] && kill -9 $(ps -u zimbra -o "pid=")
rsync -avzHK --delete -e "ssh -i <my_private_key>" root@<production_server_ip_address>:/opt.bak/zimbra.01h00/ /opt/zimbra/
/etc/init.d/zimbra start
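One possible guard for the sync script above, sketched under assumptions. The production script ends by writing lastsync into the backup directory, and rsync -a preserves modification times, so after the sync the age of /opt/zimbra/lastsync indicates how old the copied backup is. is_fresh is a hypothetical helper name, the two-hour limit is arbitrary, and the check relies on GNU stat and on both servers' clocks being roughly in sync.

```shell
#!/bin/bash
# Hypothetical guard: succeed only if FILE was modified within MAX_AGE seconds.
# Usage: is_fresh FILE MAX_AGE
is_fresh() {
    local file=$1 max_age=$2 mtime now
    [ -f "$file" ] || return 1
    mtime=$(stat -c %Y "$file") || return 1   # GNU stat: mtime in epoch seconds
    now=$(date +%s)
    [ $((now - mtime)) -le "$max_age" ]
}

# Possible use on the backup server, after the rsync and before the start:
#   if is_fresh /opt/zimbra/lastsync 7200; then
#       /etc/init.d/zimbra start
#   else
#       echo "backup is stale or incomplete, not starting Zimbra" >&2
#   fi
```

A stale or missing lastsync would suggest that the 01:00 run on the production server did not finish before the 02:00 sync picked it up.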
Any help would be greatly appreciated !
Thanks a lot !
  #2 (permalink)  
Old 03-08-2010, 01:36 AM
Loyal Member
 
Posts: 82
Default

NB : I just synced the local backup server with the live "/opt/zimbra" folder, and zimbra started just fine.
Something must be wrong with the backup script I guess !
  #3 (permalink)  
Old 03-08-2010, 07:07 AM
Moderator
 
Posts: 7,928
Default

You may wish to add some debugging around
Code:
[ "$(ps -u zimbra -o "pid=")" ] && kill -9 $(ps -u zimbra -o "pid=")
It is possible that some processes were still active by the time the split occurred. Using -9 to kill the processes still running may have zapped an LDAP one. It would be safer to just use kill -SIGHUP. After the zmcontrol stop, run a ps that dumps any remaining zimbra processes to a file and see what is happening.
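A rough sketch of that debugging, under some assumptions. terminate_pids is a hypothetical helper, not part of the wiki script. It sends SIGTERM rather than SIGHUP, because SIGTERM is the conventional request to shut down while the reaction to SIGHUP varies from one daemon to another. It logs whatever is still running, waits for a grace period, and only then falls back to SIGKILL.

```shell
#!/bin/bash
# Hypothetical helper (not part of the wiki script).
# Usage: terminate_pids LOGFILE GRACE_SECONDS [PID...]
terminate_pids() {
    local logfile=$1 grace=$2 waited=0 alive pid
    shift 2
    if [ "$#" -eq 0 ]; then
        return 0                          # nothing was left running
    fi

    # Record what is still alive before touching it.
    echo "$(date +"%F %T") leftover processes: $*" >> "$logfile"
    for pid in "$@"; do
        ps -o pid= -o user= -o args= -p "$pid" >> "$logfile" || true
    done

    kill -TERM "$@" 2>/dev/null || true   # ask politely first

    while [ "$waited" -lt "$grace" ]; do
        alive=0
        for pid in "$@"; do
            if kill -0 "$pid" 2>/dev/null; then alive=1; fi
        done
        if [ "$alive" -eq 0 ]; then
            return 0                      # everything exited on its own
        fi
        sleep 1
        waited=$((waited + 1))
    done

    # Still alive after the grace period: log it, then fall back to SIGKILL.
    echo "$(date +"%F %T") escalating to SIGKILL: $*" >> "$logfile"
    kill -KILL "$@" 2>/dev/null || true
    return 1
}

# Possible use right after "/etc/init.d/zimbra stop" (log path is an assumption):
#   terminate_pids /tmp/zimbra_leftover.txt 30 $(ps -u zimbra -o pid=)
```

A return value of 1 means SIGKILL was needed, which is a hint that the snapshot may have been taken from a service that did not shut down cleanly.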
  #4 (permalink)  
Old 03-08-2010, 10:08 AM
Loyal Member
 
Posts: 82
Default

Quote:
Originally Posted by uxbod View Post
You may wish to add some debugging around
Code:
[ "$(ps -u zimbra -o "pid=")" ] && kill -9 $(ps -u zimbra -o "pid=")
It is possible that some processes were still active by the time the split occurred. Using -9 to kill the processes still running may have zapped an LDAP one. It would be safer to just use kill -SIGHUP. After the zmcontrol stop, run a ps that dumps any remaining zimbra processes to a file and see what is happening.
Am I right to assume that "kill -9" really kills the process, whereas kill -SIGHUP sends a stop signal to it ? (sorry, I am not very Linux-savvy)

Outputting ps to a file is a good idea.
I'll add "ps -aux > ps_zmcontrol.txt" after the zmcontrol stop AND "ps -aux > ps_kill.txt" after the kill, to see what the differences are and whether there are some suspicious processes still running.

Thanks for the advice !
  #5 (permalink)  
Old 03-08-2010, 11:54 PM
Loyal Member
 
Posts: 82
Default

Still the same error trying to start the backup server this morning !

Output of ps_zmcontrol.txt looks very clean to me ! I am not surprised though : since I upgraded to 6.0.5, I never had any more process running after a "zmcontrol stop".

ps_zmcontrol.txt :
Code:
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0   2072   628 ?        Ss   Feb26   0:02 init [3]
root         2  0.0  0.0      0     0 ?        S<   Feb26   0:00 [migration/0]
root         3  0.0  0.0      0     0 ?        SN   Feb26   0:00 [ksoftirqd/0]
root         4  0.0  0.0      0     0 ?        S<   Feb26   0:00 [watchdog/0]
root         5  0.0  0.0      0     0 ?        S<   Feb26   0:00 [events/0]
root         6  0.0  0.0      0     0 ?        S<   Feb26   0:00 [khelper]
root         7  0.0  0.0      0     0 ?        S<   Feb26   0:00 [kthread]
root        10  0.0  0.0      0     0 ?        S<   Feb26   0:04 [kblockd/0]
root        11  0.0  0.0      0     0 ?        S<   Feb26   0:00 [kacpid]
root       137  0.0  0.0      0     0 ?        S<   Feb26   0:00 [cqueue/0]
root       140  0.0  0.0      0     0 ?        S<   Feb26   0:00 [khubd]
root       142  0.0  0.0      0     0 ?        S<   Feb26   0:00 [kseriod]
root       205  0.0  0.0      0     0 ?        S    Feb26   0:23 [pdflush]
root       206  0.0  0.0      0     0 ?        S    Feb26   0:24 [pdflush]
root       207  0.0  0.0      0     0 ?        S<   Feb26   0:08 [kswapd0]
root       208  0.0  0.0      0     0 ?        S<   Feb26   0:00 [aio/0]
root       363  0.0  0.0      0     0 ?        S<   Feb26   0:00 [kpsmoused]
root       388  0.0  0.0      0     0 ?        S<   Feb26   0:00 [ata/0]
root       389  0.0  0.0      0     0 ?        S<   Feb26   0:00 [ata_aux]
root       392  0.0  0.0      0     0 ?        S<   Feb26   0:00 [scsi_eh_0]
root       393  0.0  0.0      0     0 ?        S<   Feb26   0:00 [scsi_eh_1]
root       394  0.0  0.0      0     0 ?        S<   Feb26   0:00 [scsi_eh_2]
root       395  0.0  0.0      0     0 ?        S<   Feb26   0:00 [scsi_eh_3]
root       398  0.0  0.0      0     0 ?        S<   Feb26   0:00 [kstriped]
root       407  0.0  0.0      0     0 ?        S<   Feb26   0:00 [ksnapd]
root       418  1.5  0.0      0     0 ?        S<   Feb26 251:48 [md2_raid1]
root       421  0.0  0.0      0     0 ?        S<   Feb26   0:28 [md1_raid1]
root       424  0.0  0.0      0     0 ?        S<   Feb26   3:52 [md0_raid1]
root       425  0.0  0.0      0     0 ?        S<   Feb26   1:00 [kjournald]
root       446  0.0  0.0      0     0 ?        S<   Feb26   0:01 [kauditd]
root       474  0.0  0.0   2276   672 ?        S<s  Feb26   0:00 /sbin/udevd -d
root       738  0.0  0.0      0     0 ?        S<   Feb26   0:00 [kedac]
root      1322  0.0  0.0      0     0 ?        S<   Feb26   0:00 [kmpathd/0]
root      1323  0.0  0.0      0     0 ?        S<   Feb26   0:00 [kmpath_handlerd]
root      1355  0.0  0.0      0     0 ?        S<   Feb26   1:07 [kjournald]
root      1495  0.0  0.0      0     0 ?        S<   Feb26   0:34 [kondemand/0]
root      1659  0.0  0.0  12548   812 ?        S<sl Feb26   0:39 auditd
root      1661  0.0  0.0  13108   776 ?        S<sl Feb26   0:08 /sbin/audispd
root      1681  0.0  0.0   1728   580 ?        Ss   Feb26   1:32 syslogd -m 0
root      1684  0.0  0.0   1680   392 ?        Ss   Feb26   0:00 klogd -x
named     1724  0.0  0.1  42828  6432 ?        Ssl  Feb26   5:30 /usr/sbin/named -u named -t /var/named/chroot
rpc       1772  0.0  0.0   1816   548 ?        Ss   Feb26   0:00 portmap
root      1797  0.0  0.0      0     0 ?        S<   Feb26   0:00 [rpciod/0]
root      1803  0.0  0.0   1868   740 ?        Ss   Feb26   0:00 rpc.statd
root      1840  0.0  0.0   1964   396 ?        Ss   Feb26   0:00 mdadm --monitor --scan -f --pid-file=/var/run/mdadm/mdadm.pid
root      1858  0.0  0.0   5520   580 ?        Ss   Feb26   0:00 rpc.idmapd
dbus      1871  0.0  0.0   2756   924 ?        Ss   Feb26   0:00 dbus-daemon --system
root      1904  0.0  0.0  12736  1268 ?        Ssl  Feb26   0:04 pcscd
root      1913  0.0  0.0   1676   532 ?        Ss   Feb26   0:00 /usr/sbin/acpid
68        1924  0.0  0.1   5664  3768 ?        Ss   Feb26   0:01 hald
root      1925  0.0  0.0   3164   988 ?        S    Feb26   0:00 hald-runner
68        1933  0.0  0.0   2020   812 ?        S    Feb26   0:00 hald-addon-acpi: listening on acpid socket /var/run/acpid.socket
root      1941  0.0  0.0   1976   636 ?        S    Feb26   1:36 hald-addon-storage: polling /dev/hda
root      1970  0.0  0.0  27248  1364 ?        Ssl  Feb26   0:00 automount
root      1989  0.0  0.0   7076  1068 ?        Ss   Feb26   0:12 /usr/sbin/sshd
root      2000  0.0  0.0   1908   368 ?        Ss   Feb26   0:00 gpm -m /dev/input/mice -t exps2
root      2008  0.0  0.0   3248  1104 ?        Ss   Feb26   0:02 crond
root      2023  0.0  0.0   2276   428 ?        Ss   Feb26   0:00 /usr/sbin/atd
avahi     2039  0.0  0.0   2600  1356 ?        Ss   Feb26   0:03 avahi-daemon: running [serveurmail01.local]
avahi     2040  0.0  0.0   2600   320 ?        Ss   Feb26   0:00 avahi-daemon: chroot helper
root      2052  0.0  0.0   3516   528 ?        S    Feb26   0:00 /usr/sbin/smartd -q never
root      2104  0.0  0.2  23760 10588 ?        SN   Feb26   0:01 /usr/bin/python -tt /usr/sbin/yum-updatesd
root      2118  0.0  0.0   2564  1124 ?        SN   Feb26   0:04 /usr/libexec/gam_server
root      9421  0.0  0.0   1664   424 tty1     Ss+  Feb26   0:00 /sbin/mingetty tty1
root      9422  0.0  0.0   1664   420 tty2     Ss+  Feb26   0:00 /sbin/mingetty tty2
root      9423  0.0  0.0   1664   424 tty3     Ss+  Feb26   0:00 /sbin/mingetty tty3
root      9427  0.0  0.0   1664   424 tty4     Ss+  Feb26   0:00 /sbin/mingetty tty4
root      9429  0.0  0.0   1664   420 tty5     Ss+  Feb26   0:00 /sbin/mingetty tty5
root      9430  0.0  0.0   1664   420 tty6     Ss+  Feb26   0:00 /sbin/mingetty tty6
ntp      10876  0.0  0.1   4400  4396 ?        SLs  Mar02   0:00 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
root     13036  0.0  0.0      0     0 ?        S<   Mar02   0:11 [kjournald]
root     15030  0.0  0.0   3800  1468 ?        S    01:00   0:00 crond
root     15042  0.0  0.0   2416   948 ?        Ss   01:00   0:00 /bin/bash /root/zmbackup/zmbackup_01h00.sh
smmsp    15074  0.0  0.0   8144  2888 ?        S    01:00   0:00 /usr/sbin/sendmail -FCronDaemon -i -odi -oem -oi -t
root     16131  0.0  0.0   2184   820 ?        R    01:00   0:00 ps -aux
root     30127  0.0  0.1   5356  5348 ?        S<s  Feb26   0:00 [dmeventd]
Could it be something with the options I use for rsync ? Like symlinks or hardlinks not going where they should ?

I am quite sure that restoring from the backup works. What seems to fail is restoring the backup on another server.
Weird thing is that restoring the backup from the live /opt/zimbra/ folder works perfectly.
Paradoxical to have a consistent backup from the live folder and an inconsistent one from the "snapshot" folder, right ?
  #6 (permalink)  
Old 03-09-2010, 12:37 AM
Loyal Member
 
Posts: 82
Default

My problem is very similar to http://www.zimbra.com/forums/adminis...re-backup.html

The suggested fix does not work for me though (db_recover on /opt/zimbra/data/ldap/hdb/db/).
db_recover finds and fixes errors, but trying to start zimbra still gives me the same error.
  #7 (permalink)  
Old 03-09-2010, 02:16 AM
Advanced Member
 
Posts: 192
Default

This may sound irrelevant, but what are the hostnames of both machines?
  #8 (permalink)  
Old 03-09-2010, 04:12 AM
Loyal Member
 
Posts: 82
Default

Same hostname on both machines.
They both have a Split DNS configuration that allows them to have the exact same Zimbra configuration. They just have different IP addresses because they are on different networks.

That's a very convenient way to have a backup server always ready.
If the production server fails for whatever reason, I just have to change the IP address of my backup server (in resolv.conf, /etc/hosts and in BIND), remove the failed server and plug the backup one in...
Well, at least that would work if the backup did not fail !
  #9 (permalink)  
Old 03-11-2010, 05:02 AM
Loyal Member
 
Posts: 82
Default

Well, things were not OK after all.

I decided to remove all the LVM snapshot stuff. As I said before, downtime is less than 5 minutes without the snapshot, so that is fine.

I changed kill -9 for kill -15.
Right after that, I added "sleep 30 ; sync"
Then I rsync.
Command is :
rsync -aHK --delete --exclude='*.pid' --link-dest=/opt.bak/zimbra.current/ /opt/zimbra/ /opt.bak/zimbra.$date/
Then I do :
rm -f /opt.bak/zimbra.current/
ln -s /opt.bak/zimbra.$date/ /opt.bak/zimbra.current
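A possible sketch of the rotation step above. If /opt.bak/zimbra.current is still a symlink to a directory when ln runs (for instance because the rm with a trailing slash did not remove the link itself, which may happen), a plain ln -s follows it and creates the new link inside the old snapshot, so zimbra.current can keep pointing at an earlier backup. GNU ln -sfn replaces the link itself. point_current is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: make LINK point at TARGET, replacing an existing symlink.
# Usage: point_current TARGET LINK
point_current() {
    local target=$1 link=$2
    # Refuse to clobber a real file or directory that is not a symlink.
    if [ -e "$link" ] && [ ! -L "$link" ]; then
        echo "refusing to replace non-symlink $link" >&2
        return 1
    fi
    # -n: do not dereference an existing symlink to a directory, so the new
    #     link replaces it instead of being created inside the old snapshot.
    # -f: remove the existing link first.
    ln -sfn "$target" "$link"
}

# Possible use after the rsync (no trailing slash on the link name):
#   point_current "/opt.bak/zimbra.$date" /opt.bak/zimbra.current
```

Running readlink /opt.bak/zimbra.current after a backup shows which snapshot the next sync to the backup server will actually copy.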

/opt.bak and /opt are on a different LV, I don't know if this matters.

Then I stop Zimbra on the local backup server and rsync the /opt.bak/zimbra.current/ directory to it :
rsync -aHK --delete root@myserver:/opt.bak/zimbra.current/ /opt/zimbra/

I restart zimbra and get the dreaded error :
Starting ldap...Done.
Failed.
Failed to start slapd. Attempting debug start to determine error.

hdb_db_open: database "": db_open(/opt/zimbra/data/ldap/hdb/db/id2entry.bdb) failed: Invalid argument (22).
backend_startup_one (type=hdb, suffix=""): bi_db_open failed! (22)
bdb_db_close: database "": alock_close failed

Now, what really puzzles me :
This file (/opt/zimbra/data/ldap/hdb/db/id2entry.bdb) has not changed for more than 1 hour !
It has the same modification time in my backup and in the /opt/zimbra folder.
The backup was done almost an hour after this modification time !
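
Modification times alone cannot show whether two copies are byte-identical. A minimal sketch for comparing file contents instead, assuming GNU coreutils and findutils; manifest and compare_trees are hypothetical helper names, and file names containing newlines are not handled.

```shell
#!/bin/bash
# Print "checksum  ./relative/path" for every regular file under DIR,
# sorted by path so that two manifests can be compared line by line.
manifest() {
    ( cd "$1" && find . -type f -exec md5sum {} + | sort -k 2 )
}

# Exit 0 if both trees hold the same files with the same contents,
# non-zero otherwise; differing or missing files are printed by diff.
compare_trees() {
    local a b status=0
    a=$(mktemp) || return 2
    b=$(mktemp) || return 2
    manifest "$1" > "$a"
    manifest "$2" > "$b"
    diff "$a" "$b" || status=$?
    rm -f "$a" "$b"
    return "$status"
}

# Possible use on the production server, with Zimbra stopped:
#   compare_trees /opt/zimbra/data/ldap/hdb/db \
#                 /opt.bak/zimbra.current/data/ldap/hdb/db
```

Running the same comparison between the production backup and the copy on the backup server would show whether the files change in transit or are already different on the production side.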

I really don't get it here...
Do these clarifications give anyone an idea ?

Thanks in advance !