#19
10-10-2008, 08:00 AM
azilber
Senior Member
 
Posts: 52
To each his/her own..

Quote:
Originally Posted by jholder
There is a very fundamental issue with this workflow that needs to be considered:

If a service stops, it stops for a reason. This workflow does nothing to address that problem.

This means that if there is a larger issue, such as an unhandled exception...well, it's only a matter of time before it goes down again. Since this idea would automatically restart the service, you may never know that you hit an unhandled exception. It also might make it worse....

Zimbra has great handlers. We have our own watchdog proc for things like mta, clam, and java. If those die, it tries to restart them. If there is a condition preventing the restart, it won't restart them.

The moral of the story is that if the server goes down, you really should figure out why, as opposed to just restarting the service.

I do think this is a good idea, which is why I'm saying the problem is with the workflow itself.

There's a high availability/failover script floating around. You might want to look at that.

Everyone's requirements are different, so your mileage will vary. I've had processes die, and they can die for many reasons, sometimes even under load from a spam attack.

Depending on your environment, you may not want the service left down. If, say, it went down at 4am and you got a wake-up call at 8am from irate users, your investigation time would be limited and you would have to restart the service anyway.

So the real moral of the story is: know what you need before you implement. Just leaving a service down is great in theory, while we take our time exchanging pleasantries with Zimbra tech support to get the issue resolved. But that's not always a quick thing.

As someone mentioned later, monit can be configured to send alerts via another SMTP server, so depending on your alert config, you will still be notified when a service goes down.

You can also comment out the start/stop lines and just have the alerts sent out. It's pretty flexible.
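
For anyone who wants the alert-only variant, here is a rough sketch of what that could look like in the monit control file. The mail server, recipient address, check name, and paths are all placeholders I made up, not values from this thread, and the exact syntax may differ between monit versions, so check it against the docs for whatever you have installed.

```
# Send alerts through a separate SMTP server, so mail still goes out
# when the local MTA is the thing that died.
set mailserver smtp.example.com port 25    # placeholder relay host
set alert admin@example.com                # placeholder recipient

# Placeholder check name and pidfile path; substitute your own.
check process zimbra-mta with pidfile /path/to/master.pid
    # Start/stop lines commented out, so monit has nothing to restart with
    # and only reports the failure.
    # start program = "/path/to/start-command"
    # stop program  = "/path/to/stop-command"
    if does not exist then alert
```

Uncomment the start/stop lines (and point them at real commands) if you do want the automatic restart back.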