I'm currently playing with a cluster and I have a very strange (and PITA because Zimbra won't start) behaviour since yesterday.
I've stopped the cluster (to upgrade the ILO firmware so I could try ILO fencing instead of APC fencing).
Then launched it again.
rgmanager launches "zmcluctl start", the zimbra services start "nicely".
then it launches "zmcluctl status" but this one returns a "not running" value
so rgmanager stops the service and tries to put it on the other node
where the same thing happens
etc
I've stopped the second node and rgmanager and tried to launch zimbra by hand.
This way, I was able to do several "zmcluctl status" right after the start.
"zmcluctl status" says it's zmclamdctl that is not running.
But if I wait a bit more (30 seconds more than rgmanager waits before launching the status test), "zmcluctl status" says it's correctly running.
So, for the past couple of days (I noticed it yesterday after a reboot but it might be older), clamd has just been too slow to start (at least for rgmanager).
I had a quick look at the ClamAV 0.91rc2 release notes and it seems to be a known problem on their side: "improved handling of .mdb files (fixes long startup times)", as seen here:
WebSVN - clamav-devel - Rev 3110 - /trunk/README
Until ClamAV is updated in Zimbra, is there any way to "slow" rgmanager (or cheat on the first status test)?
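
To make the "cheat" idea concrete, here is a rough, untested sketch of what I have in mind: a small wrapper that retries the status check for a grace period instead of failing on the first "not running". The names wait_until_ok, zimbra_status_ok and main, and the 60 second / 5 second values, are made up for illustration. It also assumes that "zmcluctl status" exits with 0 when everything is running, which I have not verified.

```python
import subprocess
import time


def wait_until_ok(check, grace_seconds=60.0, interval=5.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Call check() repeatedly until it returns True or the grace period ends.

    Returns True as soon as check() succeeds, False if the grace period
    expires first. sleep and clock are parameters only so the loop can be
    exercised without real waiting.
    """
    deadline = clock() + grace_seconds
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def zimbra_status_ok():
    # Assumption: "zmcluctl status" returns exit code 0 when all services run.
    return subprocess.call(["zmcluctl", "status"]) == 0


def main():
    # Exit code 0 = running, 1 = still not running after the grace period.
    return 0 if wait_until_ok(zimbra_status_ok) else 1
```

The wrapper would have to be called in place of the plain status check, and it only helps if rgmanager itself tolerates a status check that takes this long, which I don't know.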