After much analysis, the parent/child pid IS at the root of the problem. Here's some background.
There are two similar functions that process /var/log/zimbra.log:
zmswatch sends SNMP traps based on certain lines in zimbra.log;
zmlogswatch writes lines from zimbra.log to a pipe read by zmlogger that keeps statistics.
Each function consists of multiple processes.
zmswatch
Process: user interface - /opt/zimbra/bin/zmswatchctl
zmswatchctl controls and reports on the status of the control process.
Writes pid of control process to /opt/zimbra/log/swatch.pid
Commands:
Start - start the control process
Stop - stop the control process by sending it a TERM signal
Restart - stop and start the control process
Reload - send the control process a HUP signal to make it restart the child process
Status - report whether the control process is running or stopped
Process: control(parent) - /opt/zimbra/libexec/swatch
swatch creates, controls and monitors the status of the following process.
Writes minimal logging to /opt/zimbra/log/zmswatch.out
Signals:
INT, QUIT, TERM - send TERM to child to make it stop
ALRM, HUP - send TERM to child to make it stop, then start child
Process: watch(child) - /tmp/.swatch_script.${ppid}
watch tails /var/log/zimbra.log and processes selected lines
Signals:
HUP, TERM - terminate
zmlogswatch
Process: user interface - /opt/zimbra/bin/zmlogswatchctl
zmlogswatchctl controls and reports on the status of the control process.
Commands:
Start - start the control process
Stop - stop the control process by sending it a TERM signal
Restart & Reload - stop and start the control process
Status - report whether the control process is running or stopped
Process: control(parent) - /opt/zimbra/libexec/logswatch
logswatch creates, controls and monitors the status of the following process.
Writes minimal logging to /opt/zimbra/log/zmlogswatch.out
Writes pid of watch(child) process to /opt/zimbra/log/logswatch.pid
Signals:
INT, QUIT, TERM - send TERM to child to make it stop
ALRM, HUP - send TERM to child to make it stop, then start child
Process: watch(child) - /tmp/.swatch_script.${ppid}
watch tails /var/log/zimbra.log and writes lines to a pipe read by zmlogger
Signals:
HUP, TERM - terminate
Note the differences: for zmswatch, the pid file contains the pid of the parent; for zmlogswatch, the pid file contains the pid of the child.
For zmswatch, Restart and Reload do different things; for zmlogswatch, they do the same thing.
When /etc/periodic/daily/600.zimbra is executed, it sends a HUP to the processes identified by swatch.pid and logswatch.pid. For swatch this does exactly
what we want: the parent gets the HUP, stops the current child and starts a new one that tails the new zimbra.log. For logswatch this fails: the HUP goes
to the child, which terminates, and then the parent terminates as well. Bingo, zmlogswatch is no longer running.
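The two outcomes can be reproduced with a toy model. This is only a sketch, not the real swatch/logswatch code: "sleep" stands in for the watch script, and control, start_child, stop_child and childfile are made-up names. It assumes the parent gives up when its child dies unexpectedly, as described above.

```shell
#!/bin/sh
# Toy model of the control(parent)/watch(child) pair. All names are hypothetical.

start_child() { sleep 300 & child=$!; echo "$child" > "$childfile"; }
stop_child()  { kill -TERM "$child" 2>/dev/null; wait "$child" 2>/dev/null; }

control() {
    childfile=$1                          # file where the toy parent records its child's pid
    trap 'stop_child; start_child' HUP    # HUP: stop the child, then start a new one
    trap 'stop_child; exit 0' TERM        # TERM: stop the child, then exit
    start_child
    while :; do
        wait "$child"                     # returns when the child exits or a trapped signal arrives
        # If the current child is gone and no handler replaced it, give up.
        kill -0 "$child" 2>/dev/null || exit 1
    done
}

childfile=$(mktemp)
control "$childfile" &
ctl=$!
sleep 1
first=$(cat "$childfile")

# Case 1: HUP goes to the parent (what swatch.pid leads to).
kill -HUP "$ctl"
sleep 1
second=$(cat "$childfile")
if kill -0 "$ctl" 2>/dev/null; then parent_after_hup=alive; else parent_after_hup=gone; fi
echo "HUP to parent: child $first replaced by $second, parent is $parent_after_hup"

# Case 2: HUP goes to the child (what logswatch.pid leads to).
kill -HUP "$second"
wait "$ctl"
ctl_status=$?
echo "HUP to child: parent exited with status $ctl_status"

rm -f "$childfile"
```

In the first case the parent survives and only the child is replaced; in the second the child dies, nothing restarts it, and the parent exits.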
Here is a patch file that will make zmlogswatch behave like zmswatch; copy it to /opt/zimbra/bin and execute: patch -b < zmlogswatchctl.patch.
Code:
--- zmlogswatchctl.orig 2009-03-10 16:58:38.000000000 -0700
+++ zmlogswatchctl 2009-03-26 09:28:51.000000000 -0700
@@ -69,8 +69,12 @@
fi
${zimbra_home}/libexec/logswatch --config-file=${configfile} \
- --use-cpan-file-tail --pid-file=${pidfile}\
+ --use-cpan-file-tail\
--script-dir=/tmp -t /var/log/zimbra.log > $logfile 2>&1 &
+ pid=$!
+ if [ "x$pid" != "x" ]; then
+ echo $pid > $pidfile
+ fi
for ((i=0; i < 30; i++)); do
checkrunning
if [ $running = 1 ]; then
@@ -115,7 +119,7 @@
fi
done
else
- kill -9 $pid
+ kill $pid
fi
sleep 1
done
@@ -128,10 +132,18 @@
fi
exit 0
;;
- restart|reload)
+ restart)
$0 stop
$0 start
;;
+ reload)
+ checkrunning
+ if [ $running = 1 -a "x$pid" != "x" ]; then
+ echo -n "Reloading logswatch..."
+ kill -HUP $pid
+ echo "done."
+ fi
+ ;;
status)
echo -n "zmlogswatch is "
checkrunning
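The first hunk stops passing --pid-file to logswatch and has the ctl script write the pid of the process it just launched instead. A minimal stand-alone sketch of that pattern follows; "sleep" stands in for libexec/logswatch and a temp file stands in for logswatch.pid, so none of the names below come from the real script.

```shell
#!/bin/sh
# Sketch of the pid-file handling the patch moves into the ctl script.
# "sleep" and the temp file are stand-ins (assumptions), not Zimbra paths.

pidfile=$(mktemp)

sleep 300 &                 # launch the stand-in control process in the background
pid=$!                      # $! is the pid of the most recent background command
if [ "x$pid" != "x" ]; then
    echo "$pid" > "$pidfile"
fi

# Later invocations (status, reload, stop) read the pid back from the file.
saved=$(cat "$pidfile")
if kill -0 "$saved" 2>/dev/null; then
    state_before=running
else
    state_before=stopped
fi

kill "$saved"               # plain kill sends TERM, as in the patched stop branch
wait "$saved" 2>/dev/null   # reap the stand-in so the next check is accurate
if kill -0 "$saved" 2>/dev/null; then
    state_after=running
else
    state_after=stopped
fi

echo "before stop: $state_before, after stop: $state_after"
rm -f "$pidfile"
```

Because the pid recorded this way belongs to the process the ctl script started (the parent), a later kill -HUP against the pid file reaches the process that knows how to restart the child.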