#1
09-28-2009, 02:08 PM
New Member
Posts: 4
[SOLVED] Backup Hanging (unable to stop)

v5.0.18_GA_3011 Network Edition

So I have a backup process that seems to be hung on a single user's mailbox. I've found evidence that indicates re-indexing the mailbox will help to prevent the hang from occurring. This is complicated by the fact that the mailbox is currently locked (maintenance mode).

Running zmbackupabort does not return any errors, nor does it abort the backup. A subsequent zmbackupquery shows the backup as "in progress". Running zmbackupabort with a debug flag shows nothing additional. I've poked around in the process lists for the system and cannot determine which processes belong to the currently running backup. I did, however, determine that java hates the living.

I think I have the answer to my problem, but I can't seem to find out how to stop the backup process thread without taking down the whole server (which will result in the backup being restarted and hanging again). Any suggestions? A support ticket was opened, but since only one user is impacted, it does not classify as a high-priority case for Zimbra. Because of who that user is, it is a high-priority issue for my team.
#2
09-30-2009, 05:55 AM
New Member
Posts: 4

So, to follow up, restarting mailboxd released the mailbox, and allowed for the backup to be aborted. I have re-indexed the mailbox, and now am waiting for our next full backup window. Thankfully the service outage was not all that long (less than 2 minutes).

To restart mailboxd:
zmmailboxdctl restart
#3
10-02-2009, 12:40 PM
Junior Member
Posts: 7

Hey Tom - thanks very much for your follow-up post. We had the exact same problem, and your suggestions solved it for us. Appreciated.

Last edited by jasonwilson; 10-02-2009 at 03:17 PM.
#4
10-06-2009, 12:39 PM
y@w
Moderator
Posts: 658

We just had the same thing happen on 5.0.18. We were forced to bounce mailboxd during production hours. The weird thing is that I see the account enter maintenance mode for the backup, leave maintenance mode, and then re-enter it:

Code:
2009-10-05 23:35:18,837 INFO  [FullBackupThread] [name=user@domain.com;mid=2774;] backup - redo log file sequence is 3035 at full backup for user@domain.com
2009-10-05 23:35:18,942 INFO  [FullBackupThread] [name=user@domain.com;mid=2774;] backup - Full backup started for account user@domain.com (62794a62-338b-4120-9b3f-87ea317c7ae4) mailbox 2774
2009-10-05 23:35:18,943 INFO  [FullBackupThread] [name=user@domain.com;mid=2774;] mailbox - Locking mailbox 2774 for maintenance.
2009-10-05 23:35:19,807 INFO  [FullBackupThread] [name=user@domain.com;mid=2774;] backup - Number of blobs to backup for mailbox 2774: 29092
2009-10-05 23:35:19,807 INFO  [FullBackupThread] [name=user@domain.com;mid=2774;] mailbox - Ending maintenance on mailbox 2774.
2009-10-05 23:35:19,807 INFO  [FullBackupThread] [name=user@domain.com;mid=2774;] mbxmgr - Mailbox 2774 account 62794a62-338b-4120-9b3f-87ea317c7ae4 AVAILABLE
So, the mailbox was put into maintenance mode and taken out less than a second later. Then, about 17 minutes later, I find this:

Code:
2009-10-05 23:52:23,760 INFO  [FullBackupThread] [name=user@domain.com;mid=2774;] backup - Account user@domain.com in backup set full-20091006.040017.519: All pending file IO completed (29092 out of 29092)
2009-10-05 23:52:24,077 INFO  [FullBackupThread] [name=user@domain.com;mid=2774;] mailbox - Locking mailbox 2774 for maintenance.
There's no mention of the machine taking his account out of maintenance mode any time after that. Then, same story as the other two posters here: we had to bounce mailboxd to get the backup to abort and to allow the user to log in. We could change the account's status all we wanted, but it still threw the "mailbox in maintenance mode" error at login.
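
For anyone who wants to check their own mailbox.log for the same pattern, here is a minimal Python sketch. It tracks the "Locking mailbox N for maintenance" and "Ending maintenance on mailbox N" messages shown in the excerpts above and reports any mailbox whose last lock was never released. The function name `find_stuck_mailboxes` is made up for illustration and this is not a Zimbra tool; it only assumes the message text and the leading timestamp look like the log lines above.

```python
import re

# Message fragments copied from the log excerpts above.
LOCK_RE = re.compile(r"Locking mailbox (\d+) for maintenance")
UNLOCK_RE = re.compile(r"Ending maintenance on mailbox (\d+)")


def find_stuck_mailboxes(lines):
    """Return {mailbox_id: lock_timestamp} for locks with no later unlock.

    Hypothetical helper: assumes every line starts with a 23-character
    'YYYY-MM-DD HH:MM:SS,mmm' timestamp, as in the excerpts above.
    """
    locked = {}
    for line in lines:
        lock = LOCK_RE.search(line)
        if lock:
            # Remember when this mailbox was most recently locked.
            locked[lock.group(1)] = line[:23]
            continue
        unlock = UNLOCK_RE.search(line)
        if unlock:
            # Lock released, so the mailbox is no longer a suspect.
            locked.pop(unlock.group(1), None)
    return locked


PREFIX = "INFO  [FullBackupThread] [name=user@domain.com;mid=2774;] mailbox - "
SAMPLE = [
    "2009-10-05 23:35:18,943 " + PREFIX + "Locking mailbox 2774 for maintenance.",
    "2009-10-05 23:35:19,807 " + PREFIX + "Ending maintenance on mailbox 2774.",
    "2009-10-05 23:52:24,077 " + PREFIX + "Locking mailbox 2774 for maintenance.",
]
print(find_stuck_mailboxes(SAMPLE))  # {'2774': '2009-10-05 23:52:24,077'}
```

Against a real log you would pass it the open file object instead of `SAMPLE`. A lock from a backup that is still running normally will also show up, so treat the result as a list of suspects, not proof of a hang.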

Checking out the I/O and CPU usage graphs on the mailbox server in the monitoring system, it appears that the server was only actively copying data from the time that the backup was fired off until the time in the logs that it shows "All pending file IO completed".
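
To put a number on that copy window from the log alone, a small Python sketch like the one below can subtract the timestamp of the "Full backup started" line from that of the "All pending file IO completed" line. The helper name `elapsed_seconds` is hypothetical, and the sketch assumes each line begins with the same 'YYYY-MM-DD HH:MM:SS,mmm' timestamp seen in the excerpts above.

```python
from datetime import datetime

# Timestamp layout at the start of each log line in the excerpts above.
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def elapsed_seconds(start_line, end_line):
    """Seconds between the leading timestamps of two log lines.

    Hypothetical helper: only the first 23 characters of each line
    (the timestamp) are read; the rest of the line is ignored.
    """
    start = datetime.strptime(start_line[:23], LOG_TIME_FORMAT)
    end = datetime.strptime(end_line[:23], LOG_TIME_FORMAT)
    return (end - start).total_seconds()


started = "2009-10-05 23:35:18,942 INFO  [FullBackupThread] backup - Full backup started"
io_done = "2009-10-05 23:52:23,760 INFO  [FullBackupThread] backup - All pending file IO completed"
print(round(elapsed_seconds(started, io_done) / 60, 1))  # 17.1 (minutes)
```

For the excerpt above this gives roughly 17 minutes of copying, which matches the window where the graphs show activity.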

It also appears that the user was actively in their account moving messages around between the two times that it was in maintenance mode. I see a ton of MsgActionRequests where the user moved messages to his Inbox. The user also received and deleted several messages.

jasonwilson and tomw, are you seeing something similar in your logs?

For now, we've re-indexed the mailbox and will see if the backup succeeds tonight again.
__________________
What a n00b!
#5
10-06-2009, 02:16 PM
Junior Member
Posts: 7

Yeah - very similar, except it wasn't 20 mins. Here's a scrubbed snip from our logs:

Code:
2009-10-02 09:55:00,331 INFO  [FullBackupThread] [name=--------@-----.com;mid=63;] backup - redo log file sequence is 912 at full backup for --------@-----.com
2009-10-02 09:55:00,366 INFO  [FullBackupThread] [name=--------@-----.com;mid=63;] backup - Full backup started for account --------@-----.com (64eb000e-20d3-44af-9065-df4ae8971abd) mailbox 63
2009-10-02 09:55:00,366 INFO  [FullBackupThread] [name=--------@-----.com;mid=63;] mailbox - Locking mailbox 63 for maintenance.
2009-10-02 09:55:01,607 INFO  [FullBackupThread] [name=--------@-----.com;mid=63;] backup - Number of blobs to backup for mailbox 63: 24804
2009-10-02 09:55:01,608 INFO  [FullBackupThread] [name=--------@-----.com;mid=63;] mailbox - Ending maintenance on mailbox 63.
2009-10-02 09:55:01,608 INFO  [FullBackupThread] [name=--------@-----.com;mid=63;] mbxmgr - Mailbox 63 account 64eb000e-20d3-44af-9065-df4ae8971abd AVAILABLE

2009-10-02 09:55:16,411 INFO  [FullBackupThread] [name=--------@-----.com;mid=63;] backup - Account --------@-----.com in backup set full-20091002.163124.235: All pending file IO completed (23905 out of 23905)
2009-10-02 09:55:16,580 INFO  [FullBackupThread] [name=--------@-----.com;mid=63;] mailbox - Locking mailbox 63 for maintenance.
This was the account we had to re-index.

Some additional, possibly non-sequitur details:

I recently got Samba/POSIX LDAP integration working on our NE 5.0.18 installation. Our full backups began failing after that; however, they weren't hanging like the one above. After I got the integration working, I added two test accounts, which were the only ones with the Samba/POSIX LDAP attributes populated. Looking through the logs, the full backups would hit only those two test accounts and then finish, exiting normally. Then a day or so later we had the "locked in maintenance mode" issue, which brought it to our attention.

Again, unsure if this is related, but strangely enough the full backups began working normally again once we re-indexed that one account that was locked in maintenance mode. Let me know if you see anything else you'd like me to check.
#6
10-06-2009, 02:39 PM
y@w
Moderator
Posts: 658

Thanks jasonwilson, I found a bit more.

It looks like entering maintenance mode twice is by design (see: Bug 33583 – Reduce duration of maintenance mode during backup).

Backup behavior was changed a bit in 5.0.18 (ironically) so that accounts don't have to be in maintenance mode nearly as long (see: Zimbra Product Portal).

That doesn't appear to be our issue. Does anyone know of something else that we can check? I'm not seeing anything out of the ordinary except that it's obviously not completing.
#7
10-06-2009, 02:45 PM
Junior Member
Posts: 7

y@w, you mentioned you haven't tried the full backup yet. We ran one manually right after re-indexing the offending account, and it completed successfully.
#8
10-07-2009, 06:11 AM
y@w
Moderator
Posts: 658

Ours ran successfully overnight as well.
#9
11-09-2009, 11:31 AM
y@w
Moderator
Posts: 658

Did it again to me...

Tomw, did you ever find anything out with your support case? I'm considering opening one up myself, but if they had any solutions for you, I'd rather not have to.
#10
11-20-2009, 12:55 PM
y@w
Moderator
Posts: 658
Fixed in 5.0.19

Well, since I didn't hear anything, I opened my own support case.

Support let me know that it was related to this bug and is fixed in 5.0.19:
Bug 40354 – Deadlock during backup with async file copier when recording an error