#1 (permalink)
07-09-2009, 01:34 PM
Active Member
Posts: 49
[SOLVED] Message Queue getting stuck

Over the past 2 days, I have been seeing a problem where the Deferred mail queue starts filling up and no new mail is delivered locally.

A restart of Zimbra (zmcontrol stop, zmcontrol start) solves the problem, and the queue empties out immediately upon the restart.

In the log files (zimbra.log) I see the following (with names changed to protect the innocent, of course):

>Jul 9 10:43:43 XXXXXX postfix/qmgr[10896]: 242AD90BE2D: to=<name@mail.host>, relay=none, delay=0.03, delays=0.01/0.02/0/0, dsn=4.4.2, status=deferred (delivery temporarily suspended: conversation with mail.xxx.xxx.xxx[nnn.nnn.nnn.nnn] timed out while receiving the initial server greeting

I'm not sure where to look for the problem. Can anyone point me to what I should be looking for? I don't think it's a network problem because, as I said, a Zimbra restart fixes it. And it only started happening over the past two days.

Has anyone run into this before?

Thanks for any help,
__________________
Jeffrey Turmelle
International Research Institute for Climate and Society
Earth Institute at Columbia University
#2 (permalink)
07-09-2009, 01:56 PM
Zimbra Consultant & Moderator
Posts: 11,508

Have you done any updates to the system in the past few days? Are there any other errors in the log files around the time you get the error above? Can you also look in the /etc/security/limits.conf file and see if you have the following entries:

Code:
zimbra soft nofile 524288
zimbra hard nofile 524288
If they're not set to those values, could you change them and restart your system?
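If it helps, here is a rough Python sketch of that check. The function name nofile_limits and the sample text are made up for illustration; it only looks at explicit soft/hard lines for the zimbra user and ignores other forms a limits.conf entry can take (such as "-" for both limits, or group and wildcard entries).

```python
def nofile_limits(text, user="zimbra"):
    """Return e.g. {'soft': 524288, 'hard': 524288} for the user's nofile lines."""
    limits = {}
    for line in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        fields = line.split("#", 1)[0].split()
        # Expected layout: <domain> <type> <item> <value>
        if len(fields) != 4:
            continue
        domain, kind, item, value = fields
        if domain == user and item == "nofile" and kind in ("soft", "hard"):
            if value.isdigit():
                limits[kind] = int(value)
    return limits


# Sample content only (assumption); on a real box you would read
# /etc/security/limits.conf instead.
sample = """\
# sample limits.conf content
zimbra soft nofile 524288
zimbra hard nofile 524288
"""

wanted = {"soft": 524288, "hard": 524288}
found = nofile_limits(sample)
print("limits OK" if found == wanted else "limits differ: %r" % found)
```

To run it against the real file, pass the contents of /etc/security/limits.conf to nofile_limits() in place of the sample string.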
__________________
Regards


Bill
#3 (permalink)
07-09-2009, 02:41 PM
Active Member
Posts: 49
No other errors that I can see

The only system change was the recent Zimbra security fix 30754, which I installed early last week.

I checked /var/log/messages, /var/log/maillog, and /opt/zimbra/log/mailbox.log in addition to /var/log/zimbra.log, and found nothing else out of the ordinary except for:
> Jul 9 10:18:12 XXXXXX postfix/qmgr[10896]: warning: connect to transport retry: No such file or directory
...
...
> Jul 9 10:34:55 XXXXXX postfix/qmgr[10896]: warning: connect to transport retry: No such file or directory

These warnings are logged continuously during the email outage (probably on every new incoming email until I restart Zimbra). I guess this means that qmgr is losing its connection to the rest of postfix and can't recover, but how would this happen?
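To pin down when the warnings start and stop, something like the following Python sketch could count them per minute. The function name and regular expression are my own, and the second sample line (10:18:40) is invented for illustration; the other two are the lines quoted above.

```python
import re
from collections import Counter

WARNING = "warning: connect to transport retry"

# Matches a syslog prefix like "Jul 9 10:18:12 host postfix/qmgr[10896]:"
# and captures the timestamp truncated to the minute.
PREFIX = re.compile(
    r"^(\w{3}\s+\d+\s+\d{2}:\d{2}):\d{2}\s+\S+\s+postfix/qmgr\[\d+\]:"
)


def warnings_per_minute(lines):
    """Count qmgr 'connect to transport retry' warnings per minute."""
    counts = Counter()
    for line in lines:
        match = PREFIX.match(line)
        if match and WARNING in line:
            counts[match.group(1)] += 1
    return counts


sample = [
    "Jul 9 10:18:12 XXXXXX postfix/qmgr[10896]: warning: connect to transport retry: No such file or directory",
    # Invented line, only here to show two hits in the same minute.
    "Jul 9 10:18:40 XXXXXX postfix/qmgr[10896]: warning: connect to transport retry: No such file or directory",
    "Jul 9 10:34:55 XXXXXX postfix/qmgr[10896]: warning: connect to transport retry: No such file or directory",
]

for minute, count in sorted(warnings_per_minute(sample).items()):
    print(minute, count)
```

Feeding it the lines of /var/log/zimbra.log instead of the sample list would show whether the warnings begin at the same moment the queue starts to back up.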

zmcontrol status returns all OK messages (as does the GUI).

The /etc/security/limits.conf file is correct.

Any other log files I might check for clues?

Thanks again
__________________
Jeffrey Turmelle
International Research Institute for Climate and Society
Earth Institute at Columbia University

Last edited by jefft@iri.columbia.edu : 07-09-2009 at 02:43 PM. Reason: forgot something
#4 (permalink)
07-09-2009, 11:19 PM
Zimbra Consultant & Moderator
Posts: 11,508

Is your DNS server on the Zimbra server or another machine? My guess is that it's not able to resolve the DNS records when this problem happens. Does the problem recur after a while? Is there any likelihood that the hard disk is getting full on this server?
__________________
Regards


Bill
#5 (permalink)
07-10-2009, 12:47 AM
Elite Member
Posts: 369

Do you have transports defined? Also, master.cf is needed for further analysis.
#6 (permalink)
07-10-2009, 10:03 AM
Active Member
Posts: 49
DNS is not on the same server

The DNS server is external to our mail server, but we constantly monitor it (our DNS server) for reply times, and haven't seen anything excessive recently.

Disk space is plentiful.

I've attached my master.cf, but I think it's the default Zimbra release.

We don't have any transports. This server handles all mail, although we do have a front-end spam/virus gateway that delivers mail to Zimbra. But again, we monitor that machine for reply times, and it has been fine.

Since this is intermittent, I don't think it's a 'Zimbra' problem per se. The problem probably lies in some Zimbra process not connecting to a service or losing a socket connection after a timeout, but I'm not very good at debugging Zimbra and wonder what flags I might turn on to increase log messaging to trace the actual problem.

Thanks again for your help.
Attached Files
File Type: txt master_cf.txt (5.0 KB, 3 views)
__________________
Jeffrey Turmelle
International Research Institute for Climate and Society
Earth Institute at Columbia University
#7 (permalink)
07-13-2009, 09:44 AM
Active Member
Posts: 49

If it can't resolve the DNS records (even if only momentarily), will that cause it to stay disconnected? I am starting to think that this may be the problem.
__________________
Jeffrey Turmelle
International Research Institute for Climate and Society
Earth Institute at Columbia University
#8 (permalink)
07-13-2009, 09:54 AM
Zimbra Consultant & Moderator
Posts: 11,508

Quote:
Originally Posted by jefft@iri.columbia.edu
If it can't resolve the DNS records (even if only momentarily), will that cause it to stay disconnected? I am starting to think that this may be the problem.
It's quite possible. Is this happening on a regular basis? When this problem occurs, are you able to ssh into the Zimbra server? If you can, check whether DNS resolution is still working.
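As a rough way to do that check, a Python sketch along these lines could time a lookup from the Zimbra box while the queue is stuck. The function name time_lookup is made up, the host name is a placeholder you would replace with the relay from the log, and port 25 is only passed because getaddrinfo expects a service. It exercises the system resolver, which may not be exactly the lookup path Postfix uses.

```python
import socket
import time


def time_lookup(host, resolver=socket.getaddrinfo):
    """Run one lookup; return (elapsed_seconds, error), error is None on success."""
    start = time.monotonic()
    try:
        resolver(host, 25)
        return time.monotonic() - start, None
    except socket.gaierror as exc:
        # Resolution failed; report how long it took to fail as well.
        return time.monotonic() - start, exc


# "localhost" is a placeholder (assumption); substitute the host that
# appears as the relay in the deferred log entries.
elapsed, error = time_lookup("localhost")
if error is None:
    print("resolved in %.3f s" % elapsed)
else:
    print("lookup failed after %.3f s: %s" % (elapsed, error))
```

Running it in a loop every few seconds and logging the results would show whether a brief resolver hiccup lines up with the moment the deferred queue starts filling.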
__________________
Regards


Bill
#9 (permalink)
07-13-2009, 11:00 AM
Active Member
Posts: 49

Yes, I usually log in immediately and DNS is fine. But I was wondering whether a small hiccup of the DNS server might cause this. I believe this is probably a postfix problem, so I am going to head over to those forums to try there too.
__________________
Jeffrey Turmelle
International Research Institute for Climate and Society
Earth Institute at Columbia University
#10 (permalink)
07-13-2009, 11:21 AM
Elite Member
Posts: 369

> Jul 9 10:18:12 XXXXXX postfix/qmgr[10896]: warning: connect to transport retry: No such file or directory
...
...
> Jul 9 10:34:55 XXXXXX postfix/qmgr[10896]: warning: connect to transport retry: No such file or directory

Can you paste the lines above these, please? This is only half of the error information.