03-09-2007, 12:05 PM
| | Project Contributor | |
Posts: 11
maximizing imapsync speed - parallel runs
I'm working on migrating 224 users with about 74G of mail.
I would really prefer to do a mass cutover instead of a drawn-out split-domain migration. I'll do a full sync once, then do a final sync with --delete2 after shutting off the email flow during the cutover.
So, I'm trying to find the quickest way to run imapsync. Here are some stats from an account sync with imapsync running directly on the zimbra server.
- Messages: 33015
- Bytes: 1292550810
- Time: 02:21:00
- Msg/Sec: 3.9
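To put those numbers in perspective, here is a rough back-of-the-envelope estimate of a fully serial migration. It is only a sketch: it assumes "74G" means 74 GiB and that every account syncs at the same bytes-per-second rate as this one sample, which will not hold exactly when message sizes differ.

```shell
# Figures from the single-account test run above.
sample_bytes=1292550810
sample_secs=$(( 2 * 3600 + 21 * 60 ))        # 02:21:00

# Assumption: "74G" means 74 GiB.
total_bytes=$(( 74 * 1024 * 1024 * 1024 ))

# Serial estimate: scale the sample time by the size ratio.
# (Needs 64-bit shell arithmetic, which bash provides.)
est_secs=$(( total_bytes * sample_secs / sample_bytes ))
est_hours=$(( est_secs / 3600 ))

echo "estimated serial sync time: ${est_hours} hours"
```

Under those assumptions the estimate comes out at roughly 144 hours for a single imapsync process working through every account in turn.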
That adds up to a lot of hours. I could turn off the content indexing in Zimbra, but it looks to me like imapsync is the biggest load on the system and zimbra could take much more. This is a pretty good-sized Sun v40z: quad 2.4 GHz Opteron, 16G of RAM.
Code:
top - 10:29:47 up 44 days, 18:21, 3 users, load average: 2.72, 2.70, 2.58
Tasks: 153 total, 2 running, 151 sleeping, 0 stopped, 0 zombie
Cpu(s): 33.3% us, 1.8% sy, 0.0% ni, 57.7% id, 7.0% wa, 0.1% hi, 0.1% si
Mem: 16359500k total, 16335916k used, 23584k free, 477672k buffers
Swap: 5261240k total, 274304k used, 4986936k free, 9218064k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
8486 root 25 0 244m 168m 3084 R 100 1.1 40:28.97 imapsync.1.213
3432 zimbra 17 0 5418m 3.4g 19m S 35 22.0 509:57.15 java
30026 zimbra 16 0 5478m 1.4g 4548 S 2 8.7 77:14.97 mysqld
9667 zimbra 19 0 472m 42m 12m S 1 0.3 3:04.42 java
406 root 16 0 6276 1080 772 R 1 0.0 0:00.28 top
467 root 15 0 0 0 0 S 0 0.0 57:22.75 md0_raid5
2270 root 15 0 0 0 0 S 0 0.0 32:26.25 kjournald
So, I'm going to try running multiple imapsync processes on tertiary servers and see if I can speed up the whole process by parallelizing it. I'm thinking 4 at a time. Does anyone have any experience with that?
P.S. Here is how I'm running the sync. Feel free to copy.
Code:
#!/bin/bash
################################################################################
# $Id: sync.sh,v 1.2 2007/03/07 23:15:27 bewley Exp $
#-------------------------------------------------------------------------------
# Description:
# Syncs IMAP data from old Dovecot IMAP server to new Zimbra server.
#
# Set DOVECOT_MASTER to use "<username>*<masteruser>" logins. Include the "*".
# See http://wiki.dovecot.org/MasterPassword
#
# Usage:
# ./sync.sh <username>
#
################################################################################
USER_DIR=users
USER_DEFAULT=myself
PWORD_DEFAULT="$USER_DIR/.pwd"
HOST_FROM=mail
HOST_TO=zimbra
# Appended to username @ HOST_FROM see http://wiki.dovecot.org/MasterPassword
DOVECOT_MASTER="*zimbra"
# user on command line
if [ -n "$1" ]; then
    USER="$1"
else
    USER="$USER_DEFAULT"
fi
if [ ! -d "$USER_DIR" ]; then
    echo "making dir $USER_DIR"
    mkdir "$USER_DIR"
    chmod 700 "$USER_DIR"
fi
# use a per-user password file if one exists, otherwise the default
if [ -r "$USER_DIR/$USER.pwd" ]; then
    PWORD="$USER_DIR/$USER.pwd"
else
    PWORD="$PWORD_DEFAULT"
fi
# setup log file and begin
LOG="$USER_DIR/$USER.log"
date >> "$LOG"
echo "Syncing ${USER}${DOVECOT_MASTER}@$HOST_FROM to $USER@$HOST_TO with $PWORD"
time ./imapsync.1.213 \
    --ssl1 --authmech1 PLAIN --host1 "$HOST_FROM" --user1 "${USER}${DOVECOT_MASTER}" \
    --passfile1 "$PWORD" \
    --exclude '^My-spam-\d\d' \
    --ssl2 --authmech2 PLAIN --host2 "$HOST_TO" --user2 "$USER" --passfile2 "$PWORD" \
    --regextrans2 's/My-spam/Junk/' \
    --regextrans2 's/:/./g' \
    --delete2 \
    --syncinternaldates \
    >> "$LOG"
date >> "$LOG"
03-09-2007, 11:45 PM
| | Project Contributor | |
Posts: 11
bake off
Well, this is kind of fun.
I have 5 imapsyncs running on each of 4 servers for a total of 20 sessions.
Here's the load on the zimbra server.
Code:
top - 22:37:24 up 45 days, 6:28, 4 users, load average: 9.69, 9.99, 8.52
Tasks: 132 total, 1 running, 131 sleeping, 0 stopped, 0 zombie
Cpu(s): 49.7% us, 11.8% sy, 0.0% ni, 24.0% id, 12.8% wa, 0.2% hi, 1.7% si
Mem: 16359500k total, 16346892k used, 12608k free, 652416k buffers
Swap: 5261240k total, 274304k used, 4986936k free, 7561768k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3432 zimbra 17 0 5420m 4.9g 19m S 179 31.4 662:00.36 java
9667 zimbra 19 0 476m 43m 12m S 37 0.3 20:53.49 java
30026 zimbra 16 0 5493m 1.7g 4576 S 9 10.8 101:01.30 mysqld
467 root 15 0 0 0 0 S 4 0.0 61:58.63 md0_raid5
2270 root 15 0 0 0 0 D 2 0.0 35:15.92 kjournald
10313 root 16 0 6276 1044 772 R 1 0.0 0:00.14 top
mpstat 5 sec average
Code:
Average:  CPU  %user  %nice  %system  %iowait  %irq  %soft  %idle   intr/s
Average:  all  39.31   0.00    12.89    11.84  0.15   1.90  33.92  6971.66
Average:    0  53.89   0.00     9.38     5.59  0.40   5.59  25.15  5168.66
Average:    1  33.93   0.00     5.59    12.77  0.00   0.20  46.91    10.78
Average:    2  34.73   0.00    18.76    20.56  0.20   1.60  23.95   792.81
Average:    3  34.53   0.00    17.56     8.18  0.00   0.20  39.52   999.00
We'll see how it finishes up this weekend...
03-13-2007, 05:28 PM
| | Project Contributor | |
Posts: 11
It is faster. But
It's much faster this way. Imapsync is definitely the biggest hog in the migration process. I got about 142 hours of syncing done in about 24 hours of clock time by running 20 copies at once across 4 servers. I've since gone to 5 servers with 4 procs each. However, after a while, all the imapsyncs eventually hang.
I checked netstat on zimbra and none of them show up any longer, but the imapsync processes are still running (idling) on the clients. Netstat on the clients shows CLOSE_WAIT to both the legacy imap server and the zimbra server.
I don't see any error in the mailbox.log; message adds just slow down and eventually stop. I'd really like to figure that out.
If anyone wants to try this, here are the scripts I used to do the parallel run. First you need a big list of your usernames and a list of clients to run on. Split the user list into one file per host and name the chunks users.host1, users.host2, etc. Copy each host's spool file and the scripts to that host. Then run syncwrapper there.
Code:
#!/bin/bash
################################################################################
# $Id: syncwrapper,v 1.1 2007/03/13 00:00:33 bewley Exp $
#-------------------------------------------------------------------------------
# Given a large spool file with userids on each line, split it into smaller
# spool chunks and call syncfile on each chunk.
# Syncfile runs in the background so all files will be processed concurrently.
################################################################################
# file containing all the users to be synced
SPOOL=users.$(hostname -s)
# upper bound on the number of chunks, and so on concurrent syncfile processes
MAX_PROCS=4
total_users=$(wc -l < "$SPOOL")
echo "There are $total_users users to be processed by this machine"
# figure out how big to make each chunk: round up, so that at most MAX_PROCS
# chunks are produced and the chunk size is never 0 for a short spool file
users_per_proc=$(( (total_users + MAX_PROCS - 1) / MAX_PROCS ))
# make the spool chunks
split -a 2 -l "$users_per_proc" -d "$SPOOL" "$SPOOL."
echo "in the following chunks:"
wc -l "$SPOOL".??
# now process each chunk
for spool_chunk in "$SPOOL".??; do
    ./syncfile "$spool_chunk" &
done
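Syncwrapper expects a users.<hostname> file to already exist on each client. A minimal sketch of producing those per-host files from one master list follows. The function name `split_users_by_host` and its two input files (one username per line, one short hostname per line) are hypothetical, and dealing users out round-robin is just one reasonable choice; contiguous chunks would work too. The host names must match what `hostname -s` prints on each client, and the hosts file must not be empty.

```shell
# split_users_by_host USERS_FILE HOSTS_FILE [OUT_DIR]
# Deals the users out round-robin into OUT_DIR/users.<host>, one file per host.
split_users_by_host() {
    users_file=$1
    hosts_file=$2
    out_dir=${3:-.}
    awk -v out="$out_dir" '
        # first file: remember the host names
        NR == FNR { if ($0 != "") host[n++] = $0; next }
        # second file: hand each user to the next host in turn
        $0 != ""  { print > (out "/users." host[i++ % n]) }
    ' "$hosts_file" "$users_file"
}
```

For example, `split_users_by_host allusers.txt clients.txt` (both file names made up here) writes users.host1, users.host2, and so on into the current directory, ready to be copied out to the clients.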
Syncwrapper calls this script, syncfile, so it can fire off multiple syncs. Syncfile in turn calls sync.sh, which was posted above.
Code:
#!/bin/bash
################################################################################
# $Id: syncfile,v 1.1 2007/03/13 00:00:33 bewley Exp $
#-------------------------------------------------------------------------------
# Small wrapper around sync.sh to make it easy to kick off multiple
# copies.
################################################################################
SPOOL=$1
while read -r user; do
    # stdin is redirected so sync.sh cannot swallow the rest of the user list
    ./sync.sh "$user" >> "$SPOOL.log" 2>&1 < /dev/null
done < "$SPOOL"
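The cause of the hangs described in this post is not identified, so the following is only a stopgap sketch. It keeps one stuck imapsync from blocking the rest of its chunk by putting a wall-clock limit on each sync. It assumes GNU coreutils `timeout` is installed on the clients. The helper name `run_with_timeout` and the 6h default are made up for illustration and would need tuning to the largest mailbox.

```shell
# Wall-clock limit per sync; override by exporting SYNC_TIMEOUT (e.g. 90m, 12h).
SYNC_TIMEOUT=${SYNC_TIMEOUT:-6h}

# run_with_timeout COMMAND [ARGS...]
# Runs COMMAND under timeout(1) and reports when the limit was hit.
run_with_timeout() {
    status=0
    timeout "$SYNC_TIMEOUT" "$@" || status=$?
    # GNU timeout exits with 124 when it had to kill the command
    if [ "$status" -eq 124 ]; then
        echo "timed out after $SYNC_TIMEOUT: $*" >&2
    fi
    return "$status"
}
```

In syncfile the call would then become `run_with_timeout ./sync.sh "$user"`. Users whose sync timed out show up in the log and can be rerun later.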
Last edited by bewley; 03-13-2007 at 06:12 PM..
Reason: fix log name
| 
03-19-2007, 09:23 PM
Here's a makefile I use to accomplish the same thing. Edit USERS to be a list of your users to migrate, and edit the imapsync command to suit. In our case we're migrating from Cyrus to Zimbra.
Run with make -j<numjobs> to run numjobs parallel imapsyncs.
Code:
USERS = a b c d e f g h
# user names are targets, not files; logs/ must exist before running
.PHONY: all $(USERS)
all: $(USERS)
$(USERS) :
	/usr/bin/time ./imapsync --syncinternaldates --ssl1 --user1 $@ --host1 host1 --authuser1 cyrus --host2 host2 --user2 $@ --passfile1 cyruspasswd --passfile2 defpasswd --authmech1 PLAIN --authmech2 LOGIN --exclude 'INBOX/Templates' --prefix2 'INBOX/' --regextrans2 's#INBOX/Drafts#Drafts#' --regextrans2 's#INBOX/Sent#Sent#' --regextrans2 's#INBOX/Trash#Trash#g' | tee logs/imapsync.$@.log
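For comparison, the same fan-out can be sketched in plain shell with `xargs -P`, which is not in POSIX but is available in both GNU and BSD xargs. The names here are assumptions for illustration: `users.txt` holds one username per line, `SYNC_CMD` is whatever per-user command you run (the sync.sh from earlier in the thread is used as the default), and `parallel_sync` is a made-up helper name.

```shell
USERS_FILE=${USERS_FILE:-users.txt}   # hypothetical: one username per line
JOBS=${JOBS:-4}                       # number of syncs to run at once
SYNC_CMD=${SYNC_CMD:-./sync.sh}       # per-user command, called with the username

parallel_sync() {
    mkdir -p logs
    # xargs starts up to $JOBS shells at a time; each runs SYNC_CMD for one
    # user and writes that user's output to its own log file
    xargs -n 1 -P "$JOBS" \
        sh -c '"$1" "$2" > "logs/sync.$2.log" 2>&1' sh "$SYNC_CMD" \
        < "$USERS_FILE"
}
```

Unlike the fixed chunks of a split spool file, both `make -j` and `xargs -P` start the next user as soon as any slot frees up, so one very large mailbox does not hold up the users queued behind it.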
09-30-2008, 04:14 PM
What about imapsync's --maxage flag?
If you're going to have repeated runs of imapsync, it seems that you can shrink the time for each sync by setting --maxage to the number of days between the start and end of the previous imapsync.
I did an early test. A full imapsync of 450M of mail took 50 minutes every time, even when rerun every hour. However, a second run with "--maxage 1", i.e. syncing only messages from the last day, took only 3 minutes. This is only for a single user, but I expect this trick to ease my migration for thousands of users.
I expect the first sync to take a long time: let's assume n days. After that I would sync again with --maxage set to n, and I expect n to decrease with each call.
imapsync
imapsync --maxage n
imapsync --maxage n=time(previous_call)
...
imapsync --maxage k
Finally, k would be either the outage window or the time users must wait after go-live for the last set of messages (those from the past k days) to be cut over. You would have a good idea of what k is once repeated runs no longer reduce n.
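The bookkeeping for that shrinking window could be sketched as below. This is an untested illustration, not something from a real migration: `maxage_days` is a made-up helper, the timestamps are epoch seconds that you would have to record yourself when each run starts, and the one-day safety margin is a guess to cover clock skew and messages whose dates lag their arrival.

```shell
# maxage_days PREV_START NOW [MARGIN_DAYS]
# Prints the number of days to pass to --maxage so that the next run covers
# everything since the previous run started. Times are epoch seconds.
maxage_days() {
    prev_start=$1
    now=$2
    margin_days=${3:-1}
    elapsed=$(( now - prev_start ))
    # round the elapsed time up to whole days, then add the margin
    echo $(( (elapsed + 86399) / 86400 + margin_days ))
}

# Usage sketch (the state file name is hypothetical):
#   prev=$(cat .last_sync_start)
#   date +%s > .last_sync_start
#   imapsync ... --maxage "$(maxage_days "$prev" "$(date +%s)")"
```

The window is measured from the start of the previous run, not its end, so that mail arriving while that run was in progress is picked up by the next one.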
Has anyone tried this?