Zimbra :: Forums > Zimbra Collaboration Suite > Administrators
#11
06-05-2007, 09:09 AM
Active Member
Posts: 26

Quote:
Originally Posted by padraig
This looks good, but there is a danger this could mask an underlying problem if processes die regularly.

The posted config doesn't work, for a bunch of reasons.

I'm working on a much better version right now. It should be finished today; I'll test it for a week and then post back here if it works.
#12
02-28-2008, 07:41 PM
New Member
Posts: 3

Did anyone get this working successfully?
This thread is a little old, but I'm hoping somebody got it to work.

Quote:
Originally Posted by Leesbian
The posted config doesn't work, for a bunch of reasons.

I'm working on a much better version right now. It should be finished today; I'll test it for a week and then post back here if it works.
#13
02-28-2008, 08:11 PM
Former Zimbran
Posts: 5,606

There is a very fundamental issue with this work flow that needs to be considered:

If a service stops, it stops for a reason. This work flow does nothing to address that problem.

This means that if there is a larger issue, such as an unhandled exception, it's only a matter of time before the service goes down again. Since this approach would automatically restart the service, you may never know that you hit an unhandled exception. It might also make things worse.

Zimbra has great handlers. We have our own watchdog process for services like the MTA, ClamAV, and the Java mailbox server. If those die, it tries to restart them. If there is a condition preventing the restart, it won't restart them.

The moral of the story is that if the server goes down, you really should figure out why, as opposed to just restarting the service.

I do think this is a good idea; that's why I'm saying the problem is with the work flow itself.

There's a high-availability/failover script floating around. You might want to look at that.
#14
02-29-2008, 09:52 AM
New Member
Posts: 3

Does the watchdog process send an email to the admin if a process dies and it has to restart it, or can't restart it? Is there an option to set something like that up? I realize that if a service does die there could be a bigger underlying issue, but I would like an alert telling me it died and could/couldn't be restarted, rather than finding out when all my customers call and complain ;-)

I was just trying to be proactive about being alerted to the issue first if something were to happen.

Thanks for the input.
#15
02-29-2008, 10:05 AM
Former Zimbran
Posts: 5,606

Well, it wouldn't be able to send an e-mail: if the server is down, SMTP is down with it, and if e-mail is down you probably won't get the message anyway.

What I would do is have a script that monitors the services. If a condition is raised where the services go down, you could have it send an HTTP POST to your "support server" or something. If you're using Windows NT, you could whip up a script that, when that POST is received, uses the Windows Messenger service (not MSN Messenger, but the messaging service built into Windows NT machines) to send your machine an alert.

Just some thoughts.

Definitely possible.
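A minimal sketch of that idea using monit instead of a hand-written script, assuming monit runs on the Zimbra host and that http://support.example.com/alerts/zimbra is a hypothetical endpoint on your support server that accepts a form POST. The `exec` action runs curl directly, so the alert does not depend on Zimbra's own SMTP service being up.

```
# Hypothetical endpoint: http://support.example.com/alerts/zimbra
# Each exec action sends a small form POST with curl instead of an e-mail.
check process Zimbra.Postfix
  with pidfile /opt/zimbra/data/postfix/spool/pid/master.pid
  if does not exist then exec "/usr/bin/curl -s --max-time 10 -d service=postfix -d event=down http://support.example.com/alerts/zimbra"
  if failed port 25 protocol smtp then exec "/usr/bin/curl -s --max-time 10 -d service=postfix -d event=smtp_failed http://support.example.com/alerts/zimbra"
  group zimbra
```

What the support server does with the POST (for example, the Windows NT messaging script described above) is up to you; the curl path may differ on your distribution.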
#16
02-29-2008, 10:07 AM
Former Zimbran
Posts: 5,606

Correction:
SMTP may not be down; another service could be down instead. In any case, since this is a disaster-related script, you should plan for the case where SMTP is unavailable.
#17
03-03-2008, 03:39 AM
Elite Member
Posts: 286
Multistore is worth monitoring with monit

Everything you say, John, is right, but:
I have a multistore architecture with store servers connected over a WAN to a central hub.
One store dies whenever its WAN connection to the master goes away; at the moment I don't know of any way to restore it without using monit. If you can suggest something different, you're welcome to!
Any advice is appreciated.
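A hedged monit sketch for this multistore case. It assumes the process that dies on the store is mailboxd (the post doesn't say which one), and /usr/local/sbin/zimbra-store-start and /usr/local/sbin/zimbra-store-stop are hypothetical wrapper scripts that run `zmcontrol start` / `zmcontrol stop` as the zimbra user. The restart limit keeps monit from restarting forever, so a persistent WAN problem still surfaces as an alert instead of being masked.

```
# Store server: bring mailboxd back after a WAN outage, but stop retrying
# if it keeps dying. The wrapper script paths are hypothetical.
check process Zimbra.Mailbox_Java
  with pidfile /opt/zimbra/log/zmmailboxd_java.pid
  start program = "/usr/local/sbin/zimbra-store-start"
  stop program  = "/usr/local/sbin/zimbra-store-stop"
  if failed port 143 protocol imap then restart
  # Give up after 3 restarts in 5 cycles (some older monit releases use "timeout" here).
  if 3 restarts within 5 cycles then unmonitor
  group zimbra

# Alert when the central hub stops answering LDAP over the WAN.
# master.example.com is a placeholder for the hub's hostname.
check host Zimbra.Master with address master.example.com
  if failed port 389 protocol ldap3 then alert
```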
#18
10-10-2008, 08:23 AM
Starter Member
Posts: 2
A working monitor...

Hey there. No one's done anything with this in a while, but I figured I would post my working monitor script. The one thing to note is that the purpose of the script is NOT to restart a failed process, but simply to give the administrator a heads-up that something is about to go bad (e.g. a hung process, resources running out, a process that died, etc.).

Code:
check system myhost.local
  if loadavg (1min) > 4 then alert
  if loadavg (5min) > 2 then alert
  if memory usage > 85% then alert
  if cpu usage (user) > 70% then alert
  if cpu usage (system) > 50% then alert
  if cpu usage (wait) > 20% then alert

check process Zimbra.Apache
  with pidfile "/opt/zimbra/log/httpd.pid"
  if children > 255 for 5 cycles then alert
  if cpu usage > 95% for 3 cycles then alert
  if failed port 80 protocol http then alert
  group zimbra

check process Zimbra.Logwatch
  with pidfile "/opt/zimbra/log/logswatch.pid"
  if children > 255 for 5 cycles then alert
  if cpu usage > 95% for 3 cycles then alert
  group zimbra

check process Zimbra.MySQL
  with pidfile "/opt/zimbra/db/mysql.pid"
  if children > 255 for 5 cycles then alert
  if cpu usage > 95% for 3 cycles then alert
  if failed port 7306 protocol mysql then alert
  group zimbra

check process Zimbra.MySQL_Logger
  with pidfile "/opt/zimbra/logger/db/mysql.pid"
  if children > 255 for 5 cycles then alert
  if cpu usage > 95% for 3 cycles then alert
  depends on Zimbra.MySQL
  group zimbra

check process Zimbra.MTA_Config
  with pidfile "/opt/zimbra/log/zmmtaconfig.pid"
  if children > 255 for 5 cycles then alert
  if cpu usage > 95% for 3 cycles then alert
  group zimbra

check process Zimbra.Mailbox_Java
  with pidfile "/opt/zimbra/log/zmmailboxd_java.pid"
  if children > 255 for 5 cycles then alert
  if cpu usage > 95% for 3 cycles then alert
  if failed port 143 protocol imap then alert
  group zimbra

check process Zimbra.Mailbox_Control
  with pidfile "/opt/zimbra/log/zmmailboxd_manager.pid"
  if children > 255 for 5 cycles then alert
  if cpu usage > 95% for 3 cycles then alert
  group zimbra

check process Zimbra.ClamAV
  with pidfile /opt/zimbra/log/clamd.pid
  if children > 255 for 5 cycles then alert
  if cpu usage > 95% for 3 cycles then alert
  group zimbra

check process Zimbra.Cyrus_SASL
  with pidfile /opt/zimbra/cyrus-sasl/state/saslauthd.pid
  if children > 255 for 5 cycles then alert
  if cpu usage > 95% for 3 cycles then alert
  group zimbra

check process Zimbra.Postfix
  with pidfile /opt/zimbra/data/postfix/spool/pid/master.pid
  if children > 255 for 5 cycles then alert
  if cpu usage > 95% for 3 cycles then alert
  if failed port 25 protocol smtp then alert
  group zimbra

check process Zimbra.LDAP
  with pidfile /opt/zimbra/openldap/var/run/slapd.pid
  if children > 255 for 5 cycles then alert
  if cpu usage > 95% for 3 cycles then alert
  if failed host myhost.local port 389 protocol ldap3 then alert
  group zimbra

check process Zimbra.Amavis
  with pidfile /opt/zimbra/log/amavisd.pid
  if children > 255 for 5 cycles then alert
  if cpu usage > 95% for 3 cycles then alert
  group zimbra

So, think of this as an early warning system. Monit can easily be set to use a different SMTP server than your Zimbra server, so it gets around that problem as well.
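The checks above only define what to watch; alerts also need monit's global settings. A minimal sketch of those, assuming smtp.example.net and smtp2.example.net are placeholder relays that are not the Zimbra server being monitored, and admin@example.net is a placeholder recipient:

```
# Poll every 120 seconds; one poll is one "cycle" in the checks above.
set daemon 120
set logfile syslog facility log_daemon

# Relays that are NOT the monitored Zimbra box; monit tries them in order.
set mailserver smtp.example.net port 25,
               smtp2.example.net port 25

set mail-format { from: monit@myhost.local }
set alert admin@example.net
```

With a 120-second cycle, conditions such as `for 5 cycles` in the posted config correspond to about ten minutes.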
#19
10-10-2008, 09:00 AM
Senior Member
Posts: 52
To each his/her own...

Quote:
Originally Posted by jholder
There is a very fundamental issue with this work flow that needs to be considered:

If a service stops, it stops for a reason. This work flow does nothing to address that problem.

[...]

The moral of the story is that if the server goes down, you really should figure out why, as opposed to just restarting the service.

Everyone's requirements are different, so your mileage will vary. I've had processes die, and they could die for many reasons, sometimes even under load from a spam attack.

Depending on your environment, you may not want the service left down: say it dies at 4 AM and you get a wake-up call at 8 AM from irate users. Your investigation time would be limited, and you would have to restart the service anyway.

So the real moral of the story: know what you need before you implement. Just leaving a service down is great in theory, while we take our time exchanging pleasantries with Zimbra tech support to get the issue resolved. But that's not always a quick thing.

As someone mentioned above, monit can be configured to send alerts via another SMTP server, so, depending on your alert configuration, you will be notified when something goes down.

You can also comment out the start/stop lines and just have the alerts sent out; it's pretty flexible.
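A sketch of that alert-only variant, reusing the LDAP check from the posted config. The commented-out start/stop lines point at hypothetical wrapper scripts; because they are disabled, monit only reports the failure and leaves the restart decision to the admin.

```
check process Zimbra.LDAP
  with pidfile /opt/zimbra/openldap/var/run/slapd.pid
  # Uncomment to let monit restart LDAP itself (script paths are hypothetical):
  # start program = "/usr/local/sbin/zimbra-ldap-start"
  # stop program  = "/usr/local/sbin/zimbra-ldap-stop"
  if does not exist then alert
  if failed host myhost.local port 389 protocol ldap3 then alert
  group zimbra
```

To switch to automatic recovery later, uncomment the two program lines and change the relevant `alert` actions to `restart`.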
#20
10-10-2008, 09:46 AM
Starter Member
Posts: 2
Absolutely

Oh, I completely agree. That's the whole point of the monitrc I posted: all it does is let the admin know that either (a) a service has gone down, or (b) the server appears to be struggling with something. Either way, they should look into it. The monit script I posted doesn't even have start/stop lines, and that's completely intentional.

The idea behind having alerts for child processes, memory utilization, load, etc. is that the administrator can get in and, in the worst case, warn the users that the system is going down. In my experience, a client's anger level is generally inversely proportional to the amount of warning they had. E.g. "You're getting a lot of spam; it looks like it's about to hang the system" is often appreciated more than "The reason you haven't received email in the last 4 hours is that spam clogged the system."

... god I hate spam.