Thread: maximizing imapsync speed - parallel runs

  1. #1
    bewley is offline Project Contributor
    Join Date
    Mar 2007
    Posts
    11
    Rep Power
    8

    maximizing imapsync speed - parallel runs

    I'm working on migrating 224 users with about 74G of mail.

    I would really prefer to do a mass cutover instead of a drawn-out split-domain migration. I'll do a full sync once, then a final sync with --delete2 after shutting off the email flow during the cutover.

    So, I'm trying to find the quickest way to run imapsync. Here are some stats from one account sync, with imapsync running directly on the Zimbra server.

    • Messages: 33015
    • Bytes: 1292550810
    • Time: 02:21:00
    • Msg/Sec: 3.9


    That adds up to a lot of hours. I could turn off content indexing in Zimbra, but it looks to me like imapsync is the biggest load on the system and Zimbra could take much more. This is a pretty good-sized Sun v40z: quad 2.4 GHz Opteron, 16 GB RAM.
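
To put a rough number on "a lot of hours", the sample stats above can be extrapolated to the whole migration. This is only a back-of-the-envelope sketch: it assumes that 74G means 74 GiB and that every mailbox syncs at the same byte rate as the sample account, neither of which is guaranteed.

```shell
#!/bin/bash
# Extrapolate the sample account's throughput to the full 74G of mail.
# Assumptions (not from the post): 74G = 74 GiB, and all mailboxes sync
# at the same bytes-per-second rate as the sample account.
bytes=1292550810                      # bytes copied for the sample account
secs=$(( 2 * 3600 + 21 * 60 ))        # 02:21:00 expressed in seconds
total_bytes=$(( 74 * 1024 * 1024 * 1024 ))

rate=$(( bytes / secs ))              # integer bytes per second
est_hours=$(( total_bytes / rate / 3600 ))

echo "sample rate: $rate bytes/s"
echo "estimated serial sync time: $est_hours hours"
```

Under those assumptions a single serial imapsync works out to roughly 144 hours, which is the motivation for running several at once.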

    Code:
    top - 10:29:47 up 44 days, 18:21,  3 users,  load average: 2.72, 2.70, 2.58
    Tasks: 153 total,   2 running, 151 sleeping,   0 stopped,   0 zombie
    Cpu(s): 33.3% us,  1.8% sy,  0.0% ni, 57.7% id,  7.0% wa,  0.1% hi,  0.1% si
    Mem:  16359500k total, 16335916k used,    23584k free,   477672k buffers
    Swap:  5261240k total,   274304k used,  4986936k free,  9218064k cached
    
      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
     8486 root      25   0  244m 168m 3084 R  100  1.1  40:28.97 imapsync.1.213
     3432 zimbra    17   0 5418m 3.4g  19m S   35 22.0 509:57.15 java
    30026 zimbra    16   0 5478m 1.4g 4548 S    2  8.7  77:14.97 mysqld
     9667 zimbra    19   0  472m  42m  12m S    1  0.3   3:04.42 java
      406 root      16   0  6276 1080  772 R    1  0.0   0:00.28 top
      467 root      15   0     0    0    0 S    0  0.0  57:22.75 md0_raid5
     2270 root      15   0     0    0    0 S    0  0.0  32:26.25 kjournald
    So, I'm going to try running multiple imapsync processes on tertiary servers and see if I can speed up the whole process by parallelizing it. I'm thinking 4 at a time. Does anyone have any experience with that?
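
One possible way to run "4 at a time" on a single client is to let xargs hand out usernames to a fixed number of worker processes. This is a minimal sketch, not what the thread's author ran: users.txt is a hypothetical file with one username per line, SYNC_CMD is assumed to be a script that takes a username as its only argument, and the -P option is an extension found in GNU and BSD xargs rather than in POSIX.

```shell
#!/bin/bash
# Run one sync per user, with at most MAX_PROCS running concurrently.
# Assumptions: the users file (hypothetical name users.txt) has one
# username per line without spaces, and SYNC_CMD takes a username as
# its only argument.
MAX_PROCS=${MAX_PROCS:-4}
SYNC_CMD=${SYNC_CMD:-./sync.sh}

run_parallel() {
    # $1 = file with one username per line
    # -n 1 passes one username per invocation; -P caps concurrent processes
    xargs -n 1 -P "$MAX_PROCS" "$SYNC_CMD" < "$1"
}

# Usage: run_parallel users.txt
```

Because xargs starts the next user as soon as any worker finishes, a few very large mailboxes do not hold up the rest of the list.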

    p.s. Here is how I'm running the sync. Feel free to copy.

    Code:
    #!/bin/bash
    ################################################################################
    # $Id: sync.sh,v 1.2 2007/03/07 23:15:27 bewley Exp $
    #-------------------------------------------------------------------------------
    # Description:
    #   Syncs IMAP data from old Dovecot IMAP server to new Zimbra server.
    #
    #   Set DOVECOT_MASTER to use "<username>*<masteruser>" logins. Include the "*".
    #   See http://wiki.dovecot.org/MasterPassword
    #
    # Usage:
    #   ./sync.sh <username>
    #
    ################################################################################
    
    USER_DIR=users
    USER_DEFAULT=myself
    PWORD_DEFAULT="$USER_DIR/.pwd"
    HOST_FROM=mail
    HOST_TO=zimbra
    # Appended to username @ HOST_FROM see http://wiki.dovecot.org/MasterPassword
    DOVECOT_MASTER="*zimbra"
    
    # user on command line
    if [ ! -z "$1" ]; then
            USER="$1"
    else
            USER="$USER_DEFAULT"
    fi
    
    if [ ! -d "$USER_DIR" ]; then
            echo "making dir $USER_DIR"
            mkdir $USER_DIR
            chmod 700 $USER_DIR
    fi
    
    if [ -r "$USER_DIR/$USER.pwd" ]; then
            PWORD="$USER_DIR/$USER.pwd"
    else
            PWORD=$PWORD_DEFAULT
    fi
    
    # setup log file and begin
    LOG="$USER_DIR/$USER.log"
    date >> $LOG
    echo "Syncing ${USER}${DOVECOT_MASTER}@$HOST_FROM to $USER@$HOST_TO with $PWORD"
    time ./imapsync.1.213 \
        --ssl1 --authmech1 PLAIN --host1 $HOST_FROM --user1 "${USER}${DOVECOT_MASTER}" \
        --passfile1 $PWORD \
        --exclude '^My-spam-\d\d' \
        --ssl2 --authmech2 PLAIN --host2 $HOST_TO   --user2 $USER --passfile2 $PWORD \
        --regextrans2 's/My-spam/Junk/' \
        --regextrans2 's/:/./g' \
        --delete2 \
        --syncinternaldates \
        >> $LOG
    date >> $LOG

  2. #2
    bewley is offline Project Contributor

    bake off

    Well, this is kind of fun.

    I have 5 imapsyncs running on each of 4 servers for a total of 20 sessions.
    Here's the load on the zimbra server.

    Code:
    top - 22:37:24 up 45 days,  6:28,  4 users,  load average: 9.69, 9.99, 8.52
    Tasks: 132 total,   1 running, 131 sleeping,   0 stopped,   0 zombie
    Cpu(s): 49.7% us, 11.8% sy,  0.0% ni, 24.0% id, 12.8% wa,  0.2% hi,  1.7% si
    Mem:  16359500k total, 16346892k used,    12608k free,   652416k buffers
    Swap:  5261240k total,   274304k used,  4986936k free,  7561768k cached
    
      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
     3432 zimbra    17   0 5420m 4.9g  19m S  179 31.4 662:00.36 java
     9667 zimbra    19   0  476m  43m  12m S   37  0.3  20:53.49 java
    30026 zimbra    16   0 5493m 1.7g 4576 S    9 10.8 101:01.30 mysqld
      467 root      15   0     0    0    0 S    4  0.0  61:58.63 md0_raid5
     2270 root      15   0     0    0    0 D    2  0.0  35:15.92 kjournald
    10313 root      16   0  6276 1044  772 R    1  0.0   0:00.14 top
    mpstat 5 sec average
    Code:
    Average:     CPU   %user   %nice %system %iowait    %irq   %soft   %idle    intr/s
    Average:     all   39.31    0.00   12.89   11.84    0.15    1.90   33.92   6971.66
    Average:       0   53.89    0.00    9.38    5.59    0.40    5.59   25.15   5168.66
    Average:       1   33.93    0.00    5.59   12.77    0.00    0.20   46.91     10.78
    Average:       2   34.73    0.00   18.76   20.56    0.20    1.60   23.95    792.81
    Average:       3   34.53    0.00   17.56    8.18    0.00    0.20   39.52    999.00
    We'll see how it finishes up this weekend...

  3. #3
    bewley is offline Project Contributor

    It is faster. But

    It's much faster this way. Imapsync is definitely the biggest hog in the migration process. I got about 142 hours of syncing done in about 24 hours of clock time by running 20 copies at once across 4 servers. I've since gone to 5 servers with 4 procs each. However, after a while, all the imapsyncs eventually hang.

    I checked netstat on zimbra and none of them show up any longer, but the imapsync processes are still running (idling) on the clients. Netstat on the clients shows CLOSE_WAIT to both the legacy imap server and the zimbra server.
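
A small helper can make that symptom easier to spot across many clients. The sketch below only counts connections in CLOSE_WAIT; it does not explain or fix the hang. It assumes the Linux net-tools layout of `netstat -tn`, where the connection state is the sixth field, and the function name is made up for this example.

```shell
#!/bin/bash
# Count TCP connections in CLOSE_WAIT from `netstat -tn` style output
# read on stdin.
# Assumption: Linux net-tools column layout, state in the sixth field.
count_close_wait() {
    awk '$6 == "CLOSE_WAIT" { n++ } END { print n + 0 }'
}

# Usage on a client: netstat -tn | count_close_wait
```

A client whose count stays above zero while its imapsync processes sit idle is a candidate for killing and restarting those syncs.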

    I don't see any errors in mailbox.log; message adds just slow down and eventually stop. I'd really like to figure that out.

    If anyone wants to try this:

    Here are some scripts I used to do this parallel run. First you need a big list of your usernames and a list of clients to run on. Split the user list into one file per host and name the chunks users.host1, users.host2, etc., where the suffix is each client's short hostname (the output of hostname -s). Copy each spool file and the scripts to its host. Then run syncwrapper there.
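
The per-host split can be done by hand, or with something like the sketch below, which deals usernames round-robin into one spool file per host. The file names hosts.txt and users.txt and the function name are hypothetical; each file is assumed to hold one entry per line, the hosts file is assumed to be non-empty, and any existing users.<host> files in the current directory are overwritten.

```shell
#!/bin/bash
# Deal usernames round-robin into one spool file per client host.
# Assumptions: both input files (hypothetical names hosts.txt and
# users.txt) have one entry per line; hosts.txt is not empty; output
# files are written to the current directory as users.<host>.
split_users() {
    # $1 = hosts file, $2 = users file
    awk '
        NR == FNR { host[n++] = $1; next }              # first file: remember hosts
        { print $1 > ("users." host[(FNR - 1) % n]) }   # second file: deal out users
    ' "$1" "$2"
}

# Usage: split_users hosts.txt users.txt
```

Round-robin balances the number of users per host, not the amount of mail, so hosts that draw the largest mailboxes will finish later.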

    Code:
    #!/bin/bash
    ################################################################################
    # $Id: syncwrapper,v 1.1 2007/03/13 00:00:33 bewley Exp $
    #-------------------------------------------------------------------------------
    # Given a large spool file with userids on each line, split it into smaller
    # spool chunks and call syncfile on each chunk.
    # Syncfile runs in the background so all files will be processed concurrently.
    ################################################################################
    
    # file containing all the users to be synced
    SPOOL=users.`hostname -s`
    # upper bound on the number of chunks, and so on concurrent syncs
    MAX_PROCS=4
    
    total_users=`wc -l < "$SPOOL"`
    echo "There are $total_users users to be processed by this machine"
    if [ "$total_users" -eq 0 ]; then
        echo "nothing to do: $SPOOL is empty"
        exit 1
    fi
    
    # figure out how big to make each chunk; round up so that split never
    # gets a chunk size of 0 and produces at most MAX_PROCS chunks
    users_per_proc=$(( (total_users + MAX_PROCS - 1) / MAX_PROCS ))
    
    # make the spool chunks
    split -a 2 -l $users_per_proc -d $SPOOL $SPOOL.
    echo "in the following chunks:"
    wc -l $SPOOL.??
    
    # now process each chunk
    for spool_chunk in $SPOOL.??; do
        ./syncfile $spool_chunk &
    done

    Syncwrapper calls the following script, syncfile, once per chunk so it can fire off multiple syncs. Syncfile in turn calls sync.sh, which was posted above, for each user.
    Code:
    #!/bin/bash
    ################################################################################
    # $Id: syncfile,v 1.1 2007/03/13 00:00:33 bewley Exp $
    #-------------------------------------------------------------------------------
    # Small wrapper around sync.sh to make it easy to kick off multiple
    # copies.
    ################################################################################
    SPOOL=$1
    # run the syncs for this chunk one after another, logging per chunk
    while read -r user; do
            ./sync.sh "$user" >> "$SPOOL.log" 2>&1
    done < "$SPOOL"
    Last edited by bewley; 03-13-2007 at 05:12 PM. Reason: fix log name

  4. #4
    jamesbraid is offline Starter Member
    Join Date
    Mar 2007
    Posts
    1
    Rep Power
    8

    Here's a makefile I use to accomplish the same thing. Edit USERS to be a list of your users to migrate, and edit the imapsync command to suit. In our case we're migrating from Cyrus to Zimbra.

    Run with make -j<numjobs> to run numjobs parallel imapsyncs.

    Code:
    USERS = a b c d e f g h
    
    # usernames are not files, so always run their recipes
    .PHONY: all $(USERS)
    
    all: $(USERS)
    
    $(USERS) :
            /usr/bin/time ./imapsync --syncinternaldates --ssl1 --user1 $@ --host1 host1 --authuser1 cyrus --host2 host2 --user2 $@ --passfile1 cyruspasswd --passfile2 defpasswd --authmech1 PLAIN --authmech2 LOGIN --exclude 'INBOX/Templates' --prefix2 'INBOX/' --regextrans2 's#INBOX/Drafts#Drafts#' --regextrans2 's#INBOX/Sent#Sent#' --regextrans2 's#INBOX/Trash#Trash#g' | tee logs/imapsync.$@.log

  5. #5
    fultonj is offline Senior Member
    Join Date
    Feb 2008
    Location
    Easton PA
    Posts
    63
    Rep Power
    7

    What about imapsync's --maxage flag?

    If you're going to have repeated runs of imapsync, it seems that you can shrink the time for each sync by setting --maxage to the number of days between the start and end of the previous imapsync.

    I did an early test. A full imapsync of 450M of mail took me 50 minutes, even when run repeatedly every hour. However, when I ran it a second time with "--maxage 1", i.e. syncing only messages from the last day, it took only 3 minutes. This is only for a single user, but I expect this trick to ease my migration for thousands of users.

    I expect the first sync to take a long time: let's assume n days. After that I would sync again with maxage set to n, and I expect n to decrease with each call.

    imapsync
    imapsync --maxage n
    imapsync --maxage n=time(previous_call)
    ...
    imapsync --maxage k

    Finally, k would be either the outage window or the time users must wait after go-live for the last set of messages (those from the past k days) to be cut over. You would have a good idea of what k would be once repeated runs show no further change in n.
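
The step from one run to the next --maxage value can be sketched as a small helper. This is an illustration only: the function name is made up, and rounding up to whole days with a minimum of 1 is a policy assumed here because --maxage is given in days.

```shell
#!/bin/bash
# Turn the time since the previous run started into a --maxage value.
# Assumption: round up to whole days, never below 1 (a choice made for
# this sketch, not something imapsync prescribes).
maxage_days() {
    # $1 = epoch seconds when the previous run started
    # $2 = epoch seconds now
    elapsed=$(( $2 - $1 ))
    days=$(( (elapsed + 86399) / 86400 ))   # round up to whole days
    if [ "$days" -lt 1 ]; then
        days=1
    fi
    echo "$days"
}

# Usage (date +%s is available in GNU and BSD date):
#   prev_start=$(date +%s)    # recorded before the previous run
#   ./imapsync ... --maxage "$(maxage_days "$prev_start" "$(date +%s)")"
```

Anything older than the window is skipped by definition, so a last pass without --maxage may still be worth the time if the schedule allows it.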

    Has anyone tried this?

