
Thread: [SOLVED] Backup Hanging (unable to stop)

  1. #1
    tomw is offline New Member
    Join Date
    Apr 2009
    Posts
    4
    Rep Power
    6

    [SOLVED] Backup Hanging (unable to stop)

    v5.0.18_GA_3011 Network Edition

    So I have a backup process that seems to be hung on a single user's mailbox. I've found evidence that indicates re-indexing the mailbox will help to prevent the hang from occurring. This is complicated by the fact that the mailbox is currently locked (maintenance mode).

    Running zmbackupabort does not return any errors, but it does not abort the backup either. A subsequent zmbackupquery shows the backup as "in progress". Running zmbackupabort with a debug flag shows nothing additional. I've poked around in the process lists for the system and cannot determine which processes belong to the currently running backup. I did, however, determine that java hates the living.

    I think I have the answer to my problem, but I can't seem to find out how to stop the backup process thread without taking down the whole server (which will result in the backup being restarted and hanging again). Any suggestions? A support ticket was opened, but since only one user is impacted it does not classify as a high-priority case for Zimbra. Because of who the impacted user is, it is a high-priority issue for my team.

  2. #2
    tomw is offline New Member
    Join Date
    Apr 2009
    Posts
    4
    Rep Power
    6

    So, to follow up, restarting mailboxd released the mailbox, and allowed for the backup to be aborted. I have re-indexed the mailbox, and now am waiting for our next full backup window. Thankfully the service outage was not all that long (less than 2 minutes).

    To restart mailboxd:
    zmmailboxdctl restart

  3. #3
    jasonwilson is offline Junior Member
    Join Date
    Aug 2009
    Posts
    7
    Rep Power
    6

    Hey Tom - thanks very much for your follow-up post. We had the exact same problem, and your suggestions solved it for us. Appreciated.
    Last edited by jasonwilson; 10-02-2009 at 03:17 PM.

  4. #4
    y@w is offline Moderator
    Join Date
    Jan 2008
    Posts
    658
    Rep Power
    8

    We just had the same thing happen on 5.0.18. We were forced to bounce mailboxd during production hours. The weird thing is that I see the account enter maintenance mode for the backup, leave maintenance mode, and then re-enter it.

    Code:
    2009-10-05 23:35:18,837 INFO  [FullBackupThread] [name=user@domain.com;mid=2774;] backup - redo log file sequence is 3035 at full backup for user@domain.com
    2009-10-05 23:35:18,942 INFO  [FullBackupThread] [name=user@domain.com;mid=2774;] backup - Full backup started for account user@domain.com (62794a62-338b-4120-9b3f-87ea317c7ae4) mailbox 2774
    2009-10-05 23:35:18,943 INFO  [FullBackupThread] [name=user@domain.com;mid=2774;] mailbox - Locking mailbox 2774 for maintenance.
    2009-10-05 23:35:19,807 INFO  [FullBackupThread] [name=user@domain.com;mid=2774;] backup - Number of blobs to backup for mailbox 2774: 29092
    2009-10-05 23:35:19,807 INFO  [FullBackupThread] [name=user@domain.com;mid=2774;] mailbox - Ending maintenance on mailbox 2774.
    2009-10-05 23:35:19,807 INFO  [FullBackupThread] [name=user@domain.com;mid=2774;] mbxmgr - Mailbox 2774 account 62794a62-338b-4120-9b3f-87ea317c7ae4 AVAILABLE
    So, the mailbox was put into maintenance mode and taken out less than a second later. Then, nearly 20 minutes later, I find this:

    Code:
    2009-10-05 23:52:23,760 INFO  [FullBackupThread] [name=user@domain.com;mid=2774;] backup - Account user@domain.com in backup set full-20091006.040017.519: All pending file IO completed (29092 out of 29092)
    2009-10-05 23:52:24,077 INFO  [FullBackupThread] [name=user@domain.com;mid=2774;] mailbox - Locking mailbox 2774 for maintenance.
    There's no mention of the machine taking his account out of maintenance mode any time after that. Then, same story as the other two posters here: we had to bounce mailboxd to get the backup to abort and to allow the user to log in. We could change the account's status all we wanted; it still threw the "mailbox in maintenance mode" error at login.
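
    The pattern described here (a "Locking mailbox" line with no later "Ending maintenance" line for the same mailbox) can be checked mechanically. The following is only a minimal Python sketch, assuming log lines in exactly the format quoted in this thread. The function name find_stuck_mailboxes and the sample list are illustrative and are not part of any Zimbra tool.

```python
import re

# Patterns match the message text shown in the log excerpts in this thread.
LOCK_RE = re.compile(r"Locking mailbox (\d+) for maintenance")
UNLOCK_RE = re.compile(r"Ending maintenance on mailbox (\d+)")


def find_stuck_mailboxes(lines):
    """Return sorted ids of mailboxes whose last lock has no later unlock."""
    locked = set()
    for line in lines:
        match = LOCK_RE.search(line)
        if match:
            locked.add(int(match.group(1)))
            continue
        match = UNLOCK_RE.search(line)
        if match:
            locked.discard(int(match.group(1)))
    return sorted(locked)


# Sample lines copied from the excerpts above (hypothetical input list).
sample = [
    "2009-10-05 23:35:18,943 INFO  [FullBackupThread] [name=user@domain.com;mid=2774;] mailbox - Locking mailbox 2774 for maintenance.",
    "2009-10-05 23:35:19,807 INFO  [FullBackupThread] [name=user@domain.com;mid=2774;] mailbox - Ending maintenance on mailbox 2774.",
    "2009-10-05 23:52:24,077 INFO  [FullBackupThread] [name=user@domain.com;mid=2774;] mailbox - Locking mailbox 2774 for maintenance.",
]

print(find_stuck_mailboxes(sample))  # mailbox 2774 is still locked
```

    In practice the lines would come from the mailbox log file rather than a hard-coded list. The sketch only reports mailboxes that look stuck; it does not release them.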

    Checking out the I/O and CPU usage graphs on the mailbox server in the monitoring system, it appears that the server was only actively copying data from the time that the backup was fired off until the time in the logs that it shows "All pending file IO completed".

    It also appears that the user was actively in their account moving messages around between the two times that it was in maintenance mode. I see a ton of MsgActionRequests where the user moved messages to his Inbox. The user also received and deleted several messages.

    jasonwilson and tomw, are you seeing something similar in your logs?

    For now, we've re-indexed the mailbox and will see if the backup succeeds tonight again.

  5. #5
    jasonwilson is offline Junior Member
    Join Date
    Aug 2009
    Posts
    7
    Rep Power
    6

    Yeah - very similar, except it wasn't 20 mins. Here's a scrubbed snip from our logs:

    Code:
    2009-10-02 09:55:00,331 INFO  [FullBackupThread] [name=--------@-----.com;mid=63;] backup - redo log file sequence is 912 at full backup for --------@-----.com
    2009-10-02 09:55:00,366 INFO  [FullBackupThread] [name=--------@-----.com;mid=63;] backup - Full backup started for account --------@-----.com (64eb000e-20d3-44af-9065-df4ae8971abd) mailbox 63
    2009-10-02 09:55:00,366 INFO  [FullBackupThread] [name=--------@-----.com;mid=63;] mailbox - Locking mailbox 63 for maintenance.
    2009-10-02 09:55:01,607 INFO  [FullBackupThread] [name=--------@-----.com;mid=63;] backup - Number of blobs to backup for mailbox 63: 24804
    2009-10-02 09:55:01,608 INFO  [FullBackupThread] [name=--------@-----.com;mid=63;] mailbox - Ending maintenance on mailbox 63.
    2009-10-02 09:55:01,608 INFO  [FullBackupThread] [name=--------@-----.com;mid=63;] mbxmgr - Mailbox 63 account 64eb000e-20d3-44af-9065-df4ae8971abd AVAILABLE
    
    2009-10-02 09:55:16,411 INFO  [FullBackupThread] [name=--------@-----.com;mid=63;] backup - Account --------@-----.com in backup set full-20091002.163124.235: All pending file IO completed (23905 out of 23905)
    2009-10-02 09:55:16,580 INFO  [FullBackupThread] [name=--------@-----.com;mid=63;] mailbox - Locking mailbox 63 for maintenance.
    This was the account we had to re-index...
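
    To compare how long each server spent between leaving maintenance mode and locking the mailbox again, the timestamps in these log lines can be subtracted directly. This is a rough Python sketch, assuming every line starts with a 23-character "YYYY-MM-DD HH:MM:SS,mmm" timestamp as in the excerpts in this thread. The helper names parse_ts and seconds_between are made up for illustration.

```python
from datetime import datetime

# Timestamp layout used at the start of each quoted log line.
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def parse_ts(line):
    """Parse the leading 23-character timestamp of a log line."""
    return datetime.strptime(line[:23], TS_FORMAT)


def seconds_between(first_line, second_line):
    """Seconds elapsed from first_line's timestamp to second_line's."""
    return (parse_ts(second_line) - parse_ts(first_line)).total_seconds()


# Lines copied from the two excerpts posted in this thread.
yaw_unlock = "2009-10-05 23:35:19,807 INFO  [FullBackupThread] [name=user@domain.com;mid=2774;] mailbox - Ending maintenance on mailbox 2774."
yaw_relock = "2009-10-05 23:52:24,077 INFO  [FullBackupThread] [name=user@domain.com;mid=2774;] mailbox - Locking mailbox 2774 for maintenance."
jw_unlock = "2009-10-02 09:55:01,608 INFO  [FullBackupThread] [name=--------@-----.com;mid=63;] mailbox - Ending maintenance on mailbox 63."
jw_relock = "2009-10-02 09:55:16,580 INFO  [FullBackupThread] [name=--------@-----.com;mid=63;] mailbox - Locking mailbox 63 for maintenance."

print(seconds_between(yaw_unlock, yaw_relock))  # roughly 17 minutes
print(seconds_between(jw_unlock, jw_relock))    # roughly 15 seconds
```

    The gap differs a lot between the two servers, but in both cases the second lock is the last maintenance line logged for the mailbox.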

    Some additional, possibly non-sequitur details:

    I recently got samba/posix ldap integration working on our NE 5.0.18 installation. Our full backups began failing after that; however, they weren't hanging like the one above. After I got the integration working, I added two test accounts, which were the only ones with the samba/posix ldap attributes populated. Looking through the logs, the full backups would hit only those two test accounts and then finish, exiting normally. Then a day or so later we had the "locked in maintenance mode" issue, which brought it to our attention.

    Again, unsure if this is related, but strangely enough the full backups began working normally again once we re-indexed that one account that was locked in maintenance mode. Let me know if you see anything else you'd like me to check.

  6. #6
    y@w is offline Moderator
    Join Date
    Jan 2008
    Posts
    658
    Rep Power
    8

    Thanks jasonwilson, I found a bit more.

    It looks like entering maintenance mode twice is by design (see: Bug 33583 – Reduce duration of maintenance mode during backup)

    Backup behavior was changed a bit in 5.0.18 (ironically) so that accounts didn't have to be in maintenance mode nearly as long (see: Zimbra Product Portal).

    It doesn't appear that is our issue. Does someone else know of something else that we can check? I'm not seeing anything out of the ordinary except that it's obviously not completing.

  7. #7
    jasonwilson is offline Junior Member
    Join Date
    Aug 2009
    Posts
    7
    Rep Power
    6

    y@w, you mentioned you haven't tried the full backup yet. We ran one manually right after re-indexing the offending account, and it completed successfully.

  8. #8
    y@w is offline Moderator
    Join Date
    Jan 2008
    Posts
    658
    Rep Power
    8

    Ours ran successfully overnight as well.

  9. #9
    y@w is offline Moderator
    Join Date
    Jan 2008
    Posts
    658
    Rep Power
    8

    Did it again to me...

    Tomw, did you ever find anything out with your support case? I'm considering opening one up myself, but if they had any solutions for you, I'd rather not have to.

  10. #10
    y@w is offline Moderator
    Join Date
    Jan 2008
    Posts
    658
    Rep Power
    8

    Fixed in 5.0.19

    Well, since I didn't hear anything, I opened my own support case.

    Support let me know that it was related to this bug and is fixed in 5.0.19:
    Bug 40354 – Deadlock during backup with async file copier when recording an error
