  #1 (permalink)  
Old 04-24-2009, 01:57 AM
New Member
 
Posts: 4
Zimbra Cluster stopping after 5sec - "Clearing stale status file"

Hi,

I'm trying to get Zimbra to work on a two-node cluster (I followed the instructions on zimbra.com). It starts without problems on a single node with "service zimbra start", but it doesn't work under Red Hat Cluster Manager.

I don't think this is a problem with my cluster.conf (it worked fine with Apache and other services).

When Zimbra is started by the cluster manager, everything seems to work fine. clustat -l shows Zimbra as "started" for about 5 seconds, and then zimbra.log says:

Code:
Apr 24 12:30:50 o1n1 postfix/master[21183]: daemon started -- version 2.4.7, configuration /opt/zimbra/postfix-2.4.7.5z/conf
Apr 24 12:30:51 o1n1 zimbramon[17285]: 17285:info: Starting stats via zmcontrol
Apr 24 12:30:51 o1n1 saslauthd[21198]: detach_tty      : master pid is: 21198
Apr 24 12:30:51 o1n1 saslauthd[21198]: ipc_init        : listening on socket: /opt/zimbra/cyrus-sasl-2.1.22.3z/state/mux
Apr 24 12:30:54 o1n1 zimbra-cluster[17283]: start - rc=0 from zmcontrol: output=[Host zimbra.mycompany.de ,      Starting ldap...Done. ,    Starting logger...Done. ,      Starting mailbox...Done. ,         Starting antispam...Done. ,        Starting antivirus...Done. ,       Starting mta...Done. ,         Starting stats...Done. ]
Apr 24 12:31:05 o1n1 zimbra-cluster[21765]: Clearing stale status file.  Old content = zimbra
Apr 24 12:31:05 o1n1 zimbra-cluster[21765]: status - No Zimbra service is running.
Why does it say "Clearing stale status file"?

I'm running zcs-NETWORK-5.0.15_GA_2851.RHEL5_64.20090310170650 on CentOS 5.3 (we have a valid license)

DNS Records are correct.
  #2 (permalink)  
Old 04-26-2009, 06:12 AM
Zimbra Consultant & Moderator
 
Posts: 20,316

Quote:
Originally Posted by Smiler View Post
I'm running zcs-NETWORK-5.0.15_GA_2851.RHEL5_64.20090310170650 on CentOS 5.3 (we have a valid license)
You may have a valid licence but I have to state the obvious: CentOS is not a supported platform and the only current certified Cluster support for Zimbra is on RHEL AS/ES 4, Update 5.

Quote:
Originally Posted by Smiler View Post
DNS Records are correct.
Some diagnostic information to show that's correct might help.

Is this an upgrade or a new installation? Have you read the ZCS Cluster Guides on this Documentation page?
__________________
Regards


Bill
  #3 (permalink)  
Old 04-27-2009, 02:48 AM
New Member
 
Posts: 4

Ok, I found out what the problem seems to be.

I tried the following:

- manually brought up my service ip
- manually mounted my SAN
- started zimbra manually by
/opt/zimbra-cluster/bin/zmcluctl start zimbra

Everything starts fine... zimbra worked.
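
For anyone repeating this manual test, the three steps can be sketched as a small script. This is only an illustration: the service IP, network interface, SAN device and mount point below are hypothetical placeholders (none of them appear in this thread) and must be replaced with the values from your own cluster.conf. By default the script only prints what it would do.

```shell
# Manual start sequence outside the cluster manager.
# SERVICE_IP, NIC, SAN_DEVICE and MOUNT_POINT are hypothetical placeholders;
# take the real values from your own cluster.conf.
SERVICE_IP="${SERVICE_IP:-192.0.2.10/24}"
NIC="${NIC:-eth0}"
SAN_DEVICE="${SAN_DEVICE:-/dev/mapper/zimbra-san}"
MOUNT_POINT="${MOUNT_POINT:-/mnt/zimbra-san}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Bring up the service IP, mount the SAN, then start Zimbra via zmcluctl.
# Each step only runs if the previous one succeeded.
manual_start() {
    run ip addr add "$SERVICE_IP" dev "$NIC" &&
    run mount "$SAN_DEVICE" "$MOUNT_POINT" &&
    run /opt/zimbra-cluster/bin/zmcluctl start zimbra
}
```

Calling manual_start with DRY_RUN=1 lists the three commands; setting DRY_RUN=0 (as root, on the node that should own the service) executes them.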

The problem is the status file /opt/zimbra-cluster/status/status.dat.
Zimbra writes it on start (its content is "zimbra") but does not seem to update it afterwards. If I run /opt/zimbra-cluster/bin/zmcluctl status zimbra, it says:

Code:
[root@o1n1 ~]# /opt/zimbra-cluster/bin/zmcluctl status zimbra
Clearing stale status file.  Old content = zimbra
status - No Zimbra service is running.
So this script returns an error, which normally causes Red Hat Cluster Manager to conclude that the Zimbra service has failed.
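
The exit code that the cluster manager sees can be inspected by hand. The helper below is a minimal sketch, not part of Zimbra: check_status is a hypothetical name, and it simply runs whatever command it is given and reports whether the exit code would count as healthy (zero) or failed (non-zero).

```shell
# Run a status command the way a cluster manager might poll it and
# report its exit code. check_status is an illustrative helper only.
check_status() {
    status_rc=0
    "$@" >/dev/null 2>&1 || status_rc=$?
    if [ "$status_rc" -eq 0 ]; then
        echo "healthy (rc=0)"
    else
        echo "failed (rc=$status_rc)"
    fi
    return "$status_rc"
}
```

For the case in this thread it would be called as: check_status /opt/zimbra-cluster/bin/zmcluctl status zimbra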

If I manually touch the file (updating its timestamps; the content must be "zimbra"),
/opt/zimbra-cluster/bin/zmcluctl status zimbra instead reports:

Code:
[root@o1n1 status]# /opt/zimbra-cluster/bin/zmcluctl status zimbra
Host zimbra.mycompany.de
        antispam                Running
        antivirus               Running
        imapproxy               Running
        ldap                    Running
        logger                  Running
        mailbox                 Running
        mta                     Running
        snmp                    Running
        spell                   Running
        stats                   Running
I found out that the zmcluctl script itself writes the status.dat file. Could increasing the age after which status.dat is considered "outdated" (stale) solve the problem?

I think the cause could also be a slow start: my machine takes a long time to start all services.
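
To make the hypothesis concrete, here is a sketch of what an age-based staleness test could look like. It is an assumption for illustration only: the thread does not show how zmcluctl actually decides that the file is stale, and both the is_stale helper and the idea of a fixed age threshold are hypothetical. It relies on GNU stat, as shipped with RHEL/CentOS.

```shell
# Hypothetical staleness test: a status file counts as "stale" when it is
# missing or its modification time is more than max_age seconds in the past.
# zmcluctl's real rule may differ.
is_stale() {
    file=$1
    max_age=$2
    [ -f "$file" ] || return 0                 # a missing file is stale
    now=$(date +%s)
    mtime=$(stat -c %Y "$file") || return 0    # GNU stat: mtime in epoch seconds
    [ $((now - mtime)) -gt "$max_age" ]
}
```

Under this assumption, a start that takes longer than the threshold would leave a file that already looks stale at the first status poll, and touching the file would make it look fresh again, which matches what is observed above.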

Quote:
Originally Posted by phoenix View Post
Is this an upgrade or a new installation? Have you read the ZCS Cluster Guides on this Documentation page?
It is a new installation, and I followed the install instructions ("Multi Node Cluster Installation Guide").

Last edited by Smiler; 04-27-2009 at 07:44 AM..
  #4 (permalink)  
Old 04-28-2009, 03:50 AM
New Member
 
Posts: 4

I found a solution!!

The idea is to write the status.dat file right after zmcluctl starts Zimbra, if "su - zimbra -c 'zmcontrol start'" returns zero. This solution works fine in my case: Red Hat Cluster Manager now checks the status of Zimbra correctly.


Code snippet (/opt/zimbra-cluster/bin/zmcluctl Line 326):

Code:
if ($action eq 'start') {
    if (!setCurrentService($svcname)) {
        # Another Zimbra service is running.
        logError("start - Another Zimbra service is running.");
        exit(1);
    }
    createServiceDataSymlinks($svcname);
    restoreCrontab();
    my @output = `su - zimbra -c 'zmcontrol start'`;
    my $rc = $?;
    $rc >>= 8;
    if ($rc != 0) {
        clearCurrentService($svcname);
    }
    else {
        setCurrentService($svcname);
    }
    logError("start - rc=$rc from zmcontrol: output=[" . join( ", ", @output ) . "]");
    exit($rc);
} elsif .......
Notice the else branch I added.

I've created a patch for it:

zmcluctl-patch-stale-status.diff

Code:
--- zmcluctl	2009-03-11 01:46:38.000000000 +0100
+++ zmcluctl.new	2009-04-27 16:39:02.000000000 +0200
@@ -337,6 +337,9 @@
     if ($rc != 0) {
         clearCurrentService($svcname);
     }
+    else {
+	setCurrentService($svcname);
+    }
     logError("start - rc=$rc from zmcontrol: output=[" . join( ", ", @output ) . "]");
     exit($rc);
 } elsif ($action eq 'stop') {
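
One cautious way to apply a diff like this is to back up the script, test the patch with a dry run, and only then apply it. The function below is a sketch under those assumptions: apply_patch is a hypothetical helper, the .bak suffix is an arbitrary choice, and --dry-run is an option of GNU patch.

```shell
# Back up a script, check that the patch applies cleanly, then apply it.
# apply_patch is an illustrative helper; paths are passed as arguments.
apply_patch() {
    target=$1      # e.g. /opt/zimbra-cluster/bin/zmcluctl
    patchfile=$2   # e.g. zmcluctl-patch-stale-status.diff
    cp -p "$target" "$target.bak" || return 1
    # GNU patch: --dry-run reports problems without modifying the file.
    patch --dry-run "$target" < "$patchfile" >/dev/null || return 1
    patch "$target" < "$patchfile"
}
```

Usage would be: apply_patch /opt/zimbra-cluster/bin/zmcluctl zmcluctl-patch-stale-status.diff. If the dry run fails, the script is left untouched and the backup copy can be removed.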
Hope this helps other users who have this problem :-)