  #1 (permalink)  
Old 05-24-2010, 01:45 PM
Member
 
Posts: 12
[SOLVED] Zimbra failover due to zmmtaconfig not running?

Yesterday, my Zimbra 5.x NE installation (running on a pair of CentOS 4.x servers with RHCS) failed over. The error in /var/log/messages is the usual one, saying that zmcluctl returned 1:

Code:
May 23 04:02:16 wsl-mx1 clurgmgrd: [5374]: <err> script:zimbra: status of /opt/zimbra-cluster/bin/zmcluctl failed (returned 1) 
May 23 04:02:16 wsl-mx1 clurgmgrd[5374]: <notice> status on script "zimbra" returned 1 (generic error) 
May 23 04:02:16 wsl-mx1 clurgmgrd[5374]: <notice> Stopping service mx.mydomain.com 
May 23 04:03:02 wsl-mx1 clurgmgrd[5374]: <notice> Service mx.mydomain.com is recovering 
May 23 04:03:02 wsl-mx1 clurgmgrd[5374]: <notice> Recovering failed service mx.mydomain.com
When I check out zimbra.log, I see that this was possibly due to zmmtaconfig and zmmtaconfigctl not running:

Code:
May 23 04:02:14 wsl-mx1 zmmailboxdmgr[3765]: status requested
May 23 04:02:14 wsl-mx1 zmmailboxdmgr[3765]: status OK
May 23 04:02:14 wsl-mx1 zimbramon[3802]: 3802:info: zmmtaconfig: zmmtaconfig started on mx.mydomain.com with loglevel=3 pid=3802 
May 23 04:02:16 wsl-mx1 zimbra-cluster[3255]: status - rc=1 from zmcontrol: output=[Host mx.mydomain.com <EOL>, 	antispam                Running <EOL>, 	antivirus               Running <EOL>, 	imapproxy               Running <EOL>, 	ldap                    Running <EOL>, 	logger                  Running <EOL>, 	mailbox                 Stopped <EOL>, 		zmmtaconfig is not running. <EOL>, 	zmmtaconfigctl is not running <EOL>, 		mailboxd is running. <EOL>, 	mta                     Running <EOL>, 	snmp                    Running <EOL>, 	spell                   Running <EOL>, 	stats                   Running ] 
May 23 04:02:17 wsl-mx1 zimbra-cluster[3969]: stop -  Zimbra stop initiated via zmcluctl 
May 23 04:02:17 wsl-mx1 zimbramon[4003]: 4003:info: Stopping services initiated by zmcontrol
First off, I don't understand why zmmtaconfig and zmmtaconfigctl were not running, or what other logs I could check to figure it out.

Second, what's with the "zmmtaconfig: zmmtaconfig started" message that comes in 2 seconds before it shows zmmtaconfig as not running? What is starting this, and why? Maybe this is some kind of automatic restart, and the zmcluctl script ran at exactly the wrong time, so it thought zmmtaconfig was not running? If so, what would have triggered this restart? Could it be a log rotation or something?

Thanks a lot!
  #2 (permalink)  
Old 05-24-2010, 02:47 PM
Zimbra Employee
 
Posts: 604

Correct, this is log rotation. Which version of zcs are you running?
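
For anyone wanting to confirm the correlation on their own cluster, here is a rough sketch (not from the original posts; the function name failed_status_times is made up for illustration) that lists when the cluster's status check failed. If the times line up with the nightly cron.daily run that triggers logrotate (typically 04:02 in the stock CentOS /etc/crontab, which matches the log above, but check your own), log rotation is the likely trigger.

```bash
#!/bin/bash
# Hypothetical helper: print "Mon DD HH:MM" for every failed zmcluctl
# status check that clurgmgrd recorded in a syslog-style file.
failed_status_times() {
    local logfile="$1"
    # Keep only the month, day, and HH:MM part of the timestamp.
    grep 'zmcluctl failed' "$logfile" | awk '{ print $1, $2, substr($3, 1, 5) }'
}

# Example: failed_status_times /var/log/messages
```

Comparing that list against the cron.daily line in /etc/crontab should show whether every failover lands inside the rotation window.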
  #3 (permalink)  
Old 05-25-2010, 01:21 AM
Outstanding Member
 
Posts: 594

Seems you are hitting Bug 36042 (Log rotation causes cluster failover). You need to upgrade.
  #4 (permalink)  
Old 05-25-2010, 05:03 AM
Member
 
Posts: 12

Thanks guys! As per my signature, I'm running 5.0.20_GA_3128.RHEL4_20091102090733. From the sounds of that bug report, this issue has NOT yet been fixed in the latest version (according to Mike Cathey), so upgrading might not help.

Any idea how long zmmtaconfig is down during these log rotations? If it's not long, I think the easiest solution might be to just make a wrapper script, something like:

Code:
#!/bin/bash
/opt/zimbra-cluster/bin/zmcluctl
[ "$?" -eq "0" ] && exit 0
echo "Failed, trying again in 30 seconds..."
sleep 30
/opt/zimbra-cluster/bin/zmcluctl
exit $?
This would run zmcluctl, and if it exits with a 1, it will sleep 30 seconds and then try it a second time, only returning 1 if both attempts fail. I feel like this would cut down on a lot of false positives. The only downside would be waiting an extra 30 seconds before failing over, but I can deal with that.
  #5 (permalink)  
Old 05-27-2010, 02:45 AM
Outstanding Member
 
Posts: 594

This probably won't work, as /opt/zimbra-cluster/bin/zmcluctl requires an argument: start, stop, or status. You might want to tweak the status subroutine.
  #6 (permalink)  
Old 06-01-2010, 08:53 AM
Member
 
Posts: 12

Thanks, yeah, I was basically just coding out loud there. A functional version would be:

Code:
#!/bin/bash
# Run the status check; exit 0 right away if it passes.
/opt/zimbra-cluster/bin/zmcluctl status && exit 0
echo "Failed, trying again in 30 seconds..."
sleep 30
# Second attempt: its exit code is what the caller sees.
/opt/zimbra-cluster/bin/zmcluctl status
exit $?
I prefer making a wrapper script over modifying the Zimbra script, as the Zimbra script will get overwritten if I upgrade to a newer version. It also allows me to run the original script unedited if I so desire.
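
One caveat: the cluster log above shows rgmanager calling the script resource with a status action, and script resources are also invoked with start and stop, so a wrapper registered in place of zmcluctl would need to pass those through. A minimal sketch of that idea, assuming the wrapper is what the cluster's script resource points at; the ZMCLUCTL and RETRY_DELAY variables and the run_with_retry function are names invented here for illustration, not part of Zimbra:

```bash
#!/bin/bash
# Sketch only: retry "status" once, pass every other action straight through.
# ZMCLUCTL and RETRY_DELAY are hypothetical knobs, overridable from the environment.
ZMCLUCTL="${ZMCLUCTL:-/opt/zimbra-cluster/bin/zmcluctl}"
RETRY_DELAY="${RETRY_DELAY:-30}"

run_with_retry() {
    local action="$1"
    # start, stop, and anything else: run once, no retry.
    if [ "$action" != "status" ]; then
        "$ZMCLUCTL" "$@"
        return $?
    fi
    # status: succeed immediately if the first check passes.
    "$ZMCLUCTL" status && return 0
    echo "Failed, trying again in ${RETRY_DELAY} seconds..." >&2
    sleep "$RETRY_DELAY"
    # The second attempt decides the final exit code.
    "$ZMCLUCTL" status
}

# Only act when called with an action, as the cluster would call it.
if [ "$#" -gt 0 ]; then
    run_with_retry "$@"
    exit $?
fi
```

Retrying only the status action keeps start and stop as fast as the unwrapped script, so the extra delay applies only to a failed health check.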

Moral of the story: this appears to be a bug in 5.x that hasn't been fixed yet. Marking this thread as solved. Thanks everyone!