#1
03-12-2009, 05:11 AM
Intermediate Member
Posts: 19
[SOLVED] kill HUP crashes logger

Hi,

We have ZCS 5.0.13 Network Edition running on a Mac mini with Mac OS X Tiger.

The daily Zimbra cron job /etc/periodic/daily/600.zimbra contains the following:
Code:
if [ -f /opt/zimbra/log/logswatch.pid ]; 
  then echo "Sending sighup to zmlogswatch"; 
  kill -HUP $(cat /opt/zimbra/log/logswatch.pid | head -1); 
fi
This crashes the logswatch daemon: instead of reloading its configuration, it stops. Any idea why this happens, or how to debug it?

We tried setting up another cron job that runs zmlogswatchctl start after the crash. Run manually, this script works fine, but it does not work as part of a cron job.
Running it from cron results in Perl library path issues:

Code:
Can't locate Swatch/Actions.pm in @INC (@INC contains: /System/Library/Perl/5.8.6/darwin-thread-multi-2level /System/Library/Perl/5.8.6 /Library/Perl/5.8.6/darwin-thread-multi-2level /Library/Perl/5.8.6 /Library/Perl /Network/Library/Perl/5.8.6/darwin-thread-multi-2level /Network/Library/Perl/5.8.6 /Network/Library/Perl /System/Library/Perl/Extras/5.8.6/darwin-thread-multi-2level /System/Library/Perl/Extras/5.8.6 /Library/Perl/5.8.1 .) at /tmp/.swatch_script.2875 line 29.
BEGIN failed--compilation aborted at /tmp/.swatch_script.2875 line 29.
Any help is appreciated.
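
Cron starts jobs with a minimal environment, so variables that a login shell would export (such as a Perl library path) are not set; the @INC list in the error above contains only system directories. A minimal sketch of one way around this is to set PERL5LIB explicitly for the command. The helper name with_perl5lib and the directory /opt/zimbra/zimbramon/lib are assumptions for illustration; check where Swatch/Actions.pm actually lives on your install.

```shell
#!/bin/sh
# Run a command with an extra directory prepended to PERL5LIB, so that Perl
# can find modules that are not in the system-wide @INC.
# with_perl5lib is a hypothetical helper, not part of ZCS.
with_perl5lib() {
  lib_dir=$1
  shift
  # Prepend lib_dir and keep any existing PERL5LIB value after it.
  env PERL5LIB="${lib_dir}${PERL5LIB:+:$PERL5LIB}" "$@"
}

# Possible use from a cron job (both paths are assumptions):
#   with_perl5lib /opt/zimbra/zimbramon/lib /opt/zimbra/bin/zmlogswatchctl start
```

Another option is to run the command through a login shell for the zimbra user, for example su - zimbra -c "zmlogswatchctl start", so that the user's own environment is loaded.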

#2
03-13-2009, 03:48 AM
Intermediate Member
Posts: 19

I'll just jot down the current status:

The temporary watcher script /tmp/.swatch_script.xxxxx contains:

Code:
$SIG{'TERM'} = $SIG{'HUP'} = 'goodbye';
where goodbye is a Perl function that kills the process.
So it seems the generated swatch script treats a SIGHUP like a SIGTERM and exits instead of reloading.
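
The effect of such a handler can be reproduced with a small stand-in, sketched here in shell rather than Perl. start_watcher and hup_and_wait are hypothetical names; the stand-in only mimics the one line quoted above by mapping both HUP and TERM to an exit.

```shell
#!/bin/sh
# Stand-in for /tmp/.swatch_script.NNNN: HUP and TERM share one handler
# that exits, like $SIG{'TERM'} = $SIG{'HUP'} = 'goodbye'.
# start_watcher and hup_and_wait are hypothetical names for illustration.
start_watcher() {
  ready_file=$1
  (
    trap 'exit 0' HUP TERM
    : > "$ready_file"          # tell the caller the handlers are installed
    while :; do sleep 1; done  # pretend to tail a log file
  ) &
  watcher_pid=$!
}

# Send HUP to the watcher and block until it has exited.
hup_and_wait() {
  kill -HUP "$1"
  wait "$1"
  watcher_status=$?
}
```

Calling hup_and_wait "$watcher_pid" once the ready file exists returns as soon as the watcher is gone, which shows that a HUP intended as "reload" ends this kind of process.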

My untested solution is:
- Comment out the HUP line in /etc/periodic/daily/600.zimbra
- Add a restart time to zmlogswatchctl:

Code:
${zimbra_home}/libexec/logswatch --config-file=${configfile} \
      --use-cpan-file-tail --pid-file=${pidfile}\
      --restart-time=03:20\
      --script-dir=/tmp -t /var/log/zimbra.log > $logfile 2>&1 &
I'll report back on whether that works.

#3
03-17-2009, 06:18 AM
Active Member
Posts: 45

I think that we have been experiencing this issue also.

The logger mysteriously dies every so often, with no usable output to trace the problem.

Have you found a resolution?

#4
03-18-2009, 02:43 AM
Intermediate Member
Posts: 19

Hi skenkin,

I successfully used the workaround that I explained above.

However, you have to make sure that your logger crashes for the same reason (i.e. always at the same time, because of the cron job mentioned above). I have read in this forum that there might be other reasons for the logger to crash.
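
One way to confirm the timing is to check periodically whether the process named in the pid file is still alive and to log when it is not. A minimal sketch, assuming the pid file path used by the cron job; pidfile_alive is a hypothetical helper:

```shell
#!/bin/sh
# Succeed if the first line of the given pid file names a running process.
# pidfile_alive is a hypothetical helper, not part of ZCS.
# Run it as the process owner or root: kill -0 also fails on a process
# that the caller is not allowed to signal.
pidfile_alive() {
  [ -f "$1" ] || return 1
  pid=$(head -n 1 "$1")
  case $pid in
    ''|*[!0-9]*) return 1 ;;   # empty or not a number
  esac
  # kill -0 sends no signal; it only tests whether the pid can be signalled.
  kill -0 "$pid" 2>/dev/null
}

# Possible use, e.g. every few minutes from cron:
#   pidfile_alive /opt/zimbra/log/logswatch.pid || echo "$(date): logswatch not running"
```

If the "not running" entries always appear right after the daily periodic run, the HUP from 600.zimbra is the likely cause.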

#5
03-24-2009, 06:10 PM
Intermediate Member
Posts: 17

This may be why it's failing; the following code from /etc/periodic/daily/600.zimbra is trying to kill the zmlogswatch and zmswatch processes:

Code:
if [ -f /opt/zimbra/log/logswatch.pid ]; 
  then echo "Sending sighup to zmlogswatch"; 
  kill -HUP $(cat /opt/zimbra/log/logswatch.pid | head -1); 
fi
if [ -f /opt/zimbra/log/swatch.pid ]; 
  then echo "Sending sighup to zmswatch"; 
  kill -HUP $(cat /opt/zimbra/log/swatch.pid | head -1); 
fi
But observe the following:

Code:
odmxserve:log zimbra$ zmswatchctl start
Starting swatch...done.
odmxserve:log zimbra$ ps ax | grep swatch
  694 s000  S      0:00.12 /usr/bin/perl /opt/zimbra/libexec/swatch --config-file=/opt/zimbra/conf/swatchrc --use-cpan-file-tail --script-dir=/tmp -t /var/log/zimbra.log
  698 s000  S      0:00.14 /usr/bin/perl /tmp/.swatch_script.694
  701 s000  R+     0:00.00 grep swatch
odmxserve:log zimbra$ cat /opt/zimbra/log/swatch.pid
694
odmxserve:log zimbra$ zmlogswatchctl start
Starting logswatch...done.
odmxserve:log zimbra$ ps ax | grep swatch
  694 s000  S      0:00.12 /usr/bin/perl /opt/zimbra/libexec/swatch --config-file=/opt/zimbra/conf/swatchrc --use-cpan-file-tail --script-dir=/tmp -t /var/log/zimbra.log
  698 s000  S      0:00.15 /usr/bin/perl /tmp/.swatch_script.694
  763 s000  S      0:00.10 /usr/bin/perl /opt/zimbra/libexec/logswatch --config-file=/opt/zimbra/conf/logswatchrc --use-cpan-file-tail --pid-file=/opt/zimbra/log/logswatch.pid --script-dir=/tmp -t /var/log/zimbra.log
  765 s000  S      0:00.13 /usr/bin/perl /tmp/.swatch_script.763
  825 s000  R+     0:00.00 grep swatch
odmxserve:log zimbra$ cat /opt/zimbra/log/logswatch.pid
765
Note that swatch.pid (694) is the pid of the parent process; the child process (698) contains the parent pid (694) as part of the script name.

However, logswatch.pid (765) is the pid of the CHILD process; the child process (765) contains the parent pid (763) as part of the script name.

Why does logswatch.pid have the pid of the child, not the parent?

/opt/zimbra/libexec/swatch and /opt/zimbra/libexec/logswatch are identical; the only difference is that "--pid-file=/opt/zimbra/log/logswatch.pid" is passed to logswatch.

Is /etc/periodic/daily/600.zimbra really meant to signal the child, or should it be sending the HUP to the parent? zmswatchctl and zmlogswatchctl are coded to behave this way, but swatch restarts successfully and logswatch doesn't.
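
The same comparison can be scripted. This sketch classifies a command line as the control (parent) process or the generated watcher (child), based only on the naming pattern visible in the ps output above; swatch_role and pidfile_role are hypothetical helpers.

```shell
#!/bin/sh
# Classify a swatch-related command line by the patterns seen in ps output:
#   .../libexec/swatch or .../libexec/logswatch  -> parent (control process)
#   .../.swatch_script.<ppid>                    -> child  (watcher)
# swatch_role and pidfile_role are hypothetical helpers, not part of ZCS.
swatch_role() {
  case $1 in
    */.swatch_script.[0-9]*)                    echo child ;;
    */libexec/swatch\ *|*/libexec/logswatch\ *) echo parent ;;
    *)                                          echo unknown ;;
  esac
}

# Report which kind of process a pid file points at.
pidfile_role() {
  pid=$(head -n 1 "$1")
  swatch_role "$(ps -o args= -p "$pid")"
}

# Possible use:
#   pidfile_role /opt/zimbra/log/swatch.pid
#   pidfile_role /opt/zimbra/log/logswatch.pid
```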

#6
03-25-2009, 06:38 AM
Member
Posts: 11

We are having this same issue, is there any update?

#7
03-25-2009, 06:42 AM
Senior Member
Posts: 59

Quote:
Originally Posted by gosborne
We are having this same issue, is there any update?

Having the same issue here.

#8
03-25-2009, 11:47 AM
Intermediate Member
Posts: 17

The parent/child pid issue for zmlogswatchctl is not the problem. I still don't see why zmswatch and zmlogswatch are coded slightly differently, but I'll keep digging.

#9
03-26-2009, 07:06 AM
Senior Member
Posts: 59

FYI, the technician assigned to my trouble ticket was able to reproduce it and filed bug 36545:

Bug 36545 – logswatch not running after nightly log rotation

#10
03-26-2009, 09:41 AM
Intermediate Member
Posts: 17

After much analysis, it IS the parent/child pid at the root of the problem. Here's some background.

There are two similar functions that process /var/log/zimbra.log:
- zmswatch sends SNMP traps based on certain lines in zimbra.log;
- zmlogswatch writes lines from zimbra.log to a pipe read by zmlogger, which keeps statistics.
Each function is made up of multiple processes.

zmswatch
  Process: user interface - /opt/zimbra/bin/zmswatchctl
    zmswatchctl controls and reports on the status of the control process.
    Writes pid of control process to /opt/zimbra/log/swatch.pid
    Commands:
      Start - start the control process
      Stop - stop the control process by sending it a TERM signal
      Restart - stop and start the control process
      Reload - send the control process a HUP signal to cause it to restart the child process
      Status - report whether the control process is running or stopped
  Process: control (parent) - /opt/zimbra/libexec/swatch
    swatch creates, controls and monitors the status of the following process.
    Writes minimal logging to /opt/zimbra/log/zmswatch.out
    Signals:
      INT, QUIT, TERM - send TERM to child to make it stop
      ALRM, HUP - send TERM to child to make it stop, then start child
  Process: watch (child) - /tmp/.swatch_script.${ppid}
    watch tails /var/log/zimbra.log and processes selected lines
    Signals:
      HUP, TERM - terminate

zmlogswatch
  Process: user interface - /opt/zimbra/bin/zmlogswatchctl
    zmlogswatchctl controls and reports on the status of the control process.
    Commands:
      Start - start the control process
      Stop - stop the control process by sending it a TERM signal
      Restart & Reload - stop and start the control process
      Status - report whether the control process is running or stopped
  Process: control (parent) - /opt/zimbra/libexec/logswatch
    logswatch creates, controls and monitors the status of the following process.
    Writes minimal logging to /opt/zimbra/log/zmlogswatch.out
    Writes pid of watch (child) process to /opt/zimbra/log/logswatch.pid
    Signals:
      INT, QUIT, TERM - send TERM to child to make it stop
      ALRM, HUP - send TERM to child to make it stop, then start child
  Process: watch (child) - /tmp/.swatch_script.${ppid}
    watch tails /var/log/zimbra.log and writes lines to a pipe read by zmlogger
    Signals:
      HUP, TERM - terminate

Note the differences: for zmswatch, the pid file contains the pid of the parent; for zmlogswatch, the pid file contains the pid of the child.
For zmswatch, Restart and Reload do different things; for zmlogswatch, they do the same thing.

When /etc/periodic/daily/600.zimbra is executed, it sends a HUP to the processes identified by swatch.pid and logswatch.pid. For swatch this does exactly what we want: the parent gets the HUP, stops the current child and starts a new one that tails the new zimbra.log. For logswatch this fails: the HUP goes to the child, which terminates, and then the parent terminates too. Bingo, zmlogswatch is no longer running.
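
The signal handling described above can be sketched in shell. This is only an illustration of the described behaviour, not the actual Perl code in libexec/swatch; supervise is a hypothetical function, and the file it appends child pids to exists only so the restarts can be observed.

```shell
#!/bin/sh
# Sketch of the control (parent) process described above.
# supervise is a hypothetical name; this is not the real swatch code.
# Usage: supervise <child-pid-log> <child command...>
supervise() {
  child_log=$1
  shift
  restart=0
  # HUP: stop the current child and ask the loop to start a new one.
  trap 'restart=1; kill -TERM "$child" 2>/dev/null || true' HUP
  # INT, QUIT, TERM: stop the child and exit.
  trap 'kill -TERM "$child" 2>/dev/null || true; exit 0' INT QUIT TERM
  while :; do
    "$@" &
    child=$!
    echo "$child" >> "$child_log"
    # Returns early when a trapped signal arrives; the status is not needed.
    wait "$child" || true
    # Reap the child in case the first wait was interrupted by a signal.
    wait "$child" 2>/dev/null || true
    # No restart requested: the child died on its own, so the parent ends too.
    [ "$restart" -eq 1 ] || break
    restart=0
  done
}
```

Sending HUP to the supervise process replaces the child, which is what happens with swatch.pid. Sending HUP straight to the child ends both processes, which is the failure seen when logswatch.pid holds the child's pid.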

Here is a patch that makes zmlogswatch behave like zmswatch. Save it as zmlogswatchctl.patch in /opt/zimbra/bin and run: patch -b < zmlogswatchctl.patch

Code:
--- zmlogswatchctl.orig 2009-03-10 16:58:38.000000000 -0700
+++ zmlogswatchctl      2009-03-26 09:28:51.000000000 -0700
@@ -69,8 +69,12 @@
     fi

     ${zimbra_home}/libexec/logswatch --config-file=${configfile} \
-      --use-cpan-file-tail --pid-file=${pidfile}\
+      --use-cpan-file-tail\
       --script-dir=/tmp -t /var/log/zimbra.log > $logfile 2>&1 &
+    pid=$!
+    if [ "x$pid" != "x" ]; then
+      echo $pid > $pidfile
+    fi
     for ((i=0; i < 30; i++)); do
       checkrunning
       if [ $running = 1 ]; then
@@ -115,7 +119,7 @@
             fi
           done
         else
-          kill -9 $pid
+          kill $pid
         fi
         sleep 1
       done
@@ -128,10 +132,18 @@
     fi
     exit 0
   ;;
-  restart|reload)
+  restart)
     $0 stop
     $0 start
   ;;
+  reload)
+    checkrunning
+    if [ $running = 1 -a "x$pid" != "x" ]; then
+      echo -n "Reloading logswatch..."
+      kill -HUP $pid
+      echo "done."
+    fi
+  ;;
   status)
     echo -n "zmlogswatch is "
     checkrunning