Zimbra offers Open Source email server software and shared calendar for Linux and the Mac
  #1 (permalink)  
Old 05-12-2011, 06:37 PM
Loyal Member
 
Posts: 86
Incoming fine, outgoing time out

Incoming messages come in no problem, but with outgoing, I get a lot of messages like below.

Over 1000 messages in the deferred queue. I put them all on hold to see if that would help, but even with just a few in the queue I still get this.
Split DNS fine, DNS lookups on in global & server settings. Ran zmfixperms, upgraded to 7.1.0 from 7.0.0, etc.

Apart from these errors, I don't see anything else in logs.

ANY advice would be greatly appreciated. It's 3:30 am here, and as I'm an ISP, people need to get their messages out before business starts...

:
May 13 03:27:40 mail postfix/smtp[27106]: A69D8255809E: to=<lemontree@intekom.co.za>, relay=mail.intekom.com[196.25.211.70]:25, delay=184614, delays=184572/3/5.8/33, dsn=4.4.2, status=deferred (lost connection with mail.intekom.com[196.25.211.70] while sending MAIL FROM)
status=deferred (lost connection with mail.intekom.com[196.25.211.70] while sending MAIL FROM)
status=deferred (lost connection with mail.intekom.com[196.25.211.70] while performing the EHLO handshake)
May 13 03:28:08 mail postfix/smtp[26889]: 9909329580C1: lost connection with j.mx.mail.yahoo.com[66.94.237.64] while sending message body
May 13 03:28:20 mail postfix/smtp[27070]: 548215CC030C: lost connection with mx2.telkomsa.net[196.25.211.172] while sending message body
May 13 03:28:20 mail postfix/smtp[27049]: 20A3F21D8034: lost connection with mail.telkomsa.net[196.25.211.70] while sending DATA command
status=deferred (lost connection with g.mx.mail.yahoo.com[98.137.54.238] while sending MAIL FROM)
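To see at which SMTP stage the connections are being dropped, the "lost connection" lines can be tallied. This is only a rough sketch: the function name lost_by_stage is made up for illustration, and the log path in the usage comment is an assumption that should be replaced with wherever Postfix logs on the server in question.

```shell
#!/bin/sh
# Count "lost connection ... while <stage>" log lines per SMTP stage.
# Reads Postfix log lines on stdin; prints "<count> while <stage>", most frequent first.
lost_by_stage() {
    grep 'lost connection with' \
        | grep -o 'while [^)]*' \
        | sort \
        | uniq -c \
        | sort -rn
}

# Usage (the log path is an assumption, adjust as needed):
#   lost_by_stage < /var/log/maillog
```

If the failures are spread over every stage (EHLO, MAIL FROM, DATA, message body) rather than clustered at one, that points away from a single rejected command and more towards the connection itself being cut.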
  #2 (permalink)  
Old 05-12-2011, 10:59 PM
Zimbra Consultant & Moderator
 
Posts: 20,313
Default

Quote:
Originally Posted by ekkas View Post
Incoming messages come in no problem, but with outgoing, I get a lot of messages like below.

Over 1000 messages in the deferred queue. I put them all on hold to see if that would help, but even with just a few in the queue I still get this.
Split DNS fine, DNS lookups on in global & server settings. Ran zmfixperms, upgraded to 7.1.0 from 7.0.0, etc.

Apart from these errors, I don't see anything else in logs.

ANY advice would be greatly appreciated. It's 3:30 am here, and as I'm an ISP, people need to get their messages out before business starts...

:
May 13 03:27:40 mail postfix/smtp[27106]: A69D8255809E: to=<lemontree@intekom.co.za>, relay=mail.intekom.com[196.25.211.70]:25, delay=184614, delays=184572/3/5.8/33, dsn=4.4.2, status=deferred (lost connection with mail.intekom.com[196.25.211.70] while sending MAIL FROM)
status=deferred (lost connection with mail.intekom.com[196.25.211.70] while sending MAIL FROM)
status=deferred (lost connection with mail.intekom.com[196.25.211.70] while performing the EHLO handshake)
May 13 03:28:08 mail postfix/smtp[26889]: 9909329580C1: lost connection with j.mx.mail.yahoo.com[66.94.237.64] while sending message body
May 13 03:28:20 mail postfix/smtp[27070]: 548215CC030C: lost connection with mx2.telkomsa.net[196.25.211.172] while sending message body
May 13 03:28:20 mail postfix/smtp[27049]: 20A3F21D8034: lost connection with mail.telkomsa.net[196.25.211.70] while sending DATA command
status=deferred (lost connection with g.mx.mail.yahoo.com[98.137.54.238] while sending MAIL FROM)
You need to look at what's causing the highlighted problem (yes, I know it's stating the obvious). Is this a new problem? What's the output of the 'Verify...' commands in the Split DNS article? Is there a performance problem on this server (what's the specification)? Does a Zimbra restart or a reboot of the server improve anything? Is it a VM or on real hardware? How much RAM is in the server? Does 'top' show any performance problem or excessive I/O? Is it on a RAID system and, if so, what RAID level? Is there any firewall or SELinux enabled? When did the problem start, and have any updates been done to the server?
__________________
Regards


Bill
  #3 (permalink)  
Old 05-13-2011, 01:58 AM
Advanced Member
 
Posts: 222
Default

It might not be related, but I've had a similar problem. My issue was the datacenter provider's DNS, which I was using for my Zimbra box: some DNS queries simply never got an answer.
I changed to a publicly available (reliable) DNS server and the queue on the server emptied within the next hour.

Another idea: jumbo frames? You might have a networking issue, such as a NIC degrading over time. Maybe try setting it to 100 Mbps full duplex. Worth trying.
  #4 (permalink)  
Old 05-13-2011, 02:41 AM
Loyal Member
 
Posts: 86
Default

Triple checked my DNS settings.
It seems my local (split) DNS is 100%, my ISP DNS is 100%, but the other ISP, where the majority of mail is failing, does not see my DNS records.

dig mydomain.com mx - 100%
dig @myisp.dns mydomain.com mx - 100%
dig @failingdomain.dns mydomain.com mx - no records in answer.

But they are the largest ISP in South Africa, so that is strange.
To answer the other questions, yes it is a VM running on XenServer whose storage is on a SAN with RAID5. It has been running for more than a year with no problems in this environment. Upped the RAM from 2GB to 3GB and upped CPUs from 2 to 4, but same issues...

What was strange is that mail to Google (and we send a lot) goes without (much) trouble, but mail to the national ISP, which should be fine, is timing out.
I hope it is a DNS issue on their end and will see if it clears up.
  #5 (permalink)  
Old 05-13-2011, 03:18 AM
Zimbra Consultant & Moderator
 
Posts: 20,313
Default

Quote:
Originally Posted by ekkas View Post
Triple checked my DNS settings.
It seems my local (split) DNS is 100%, my ISP DNS is 100%, but the other ISP, where the majority of mail is failing, does not see my DNS records.

dig mydomain.com mx - 100%
dig @myisp.dns mydomain.com mx - 100%
dig @failingdomain.dns mydomain.com mx - no records in answer.
Without exact details of the sites, obviously, I couldn't comment, but the lack of response would indicate a DNS problem. I assume those commands were run on the Zimbra server?
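For repeating the three dig checks quoted above, a small loop along these lines may help. It is a sketch only: the function names are invented for illustration, and mydomain.com, myisp.dns and failingdomain.dns are the placeholder names from the quoted post, not real hosts.

```shell
#!/bin/sh
# Classify the record lines returned by `dig +short <domain> mx` for one resolver.
# An empty answer is reported the same way as in the quoted post.
classify_mx() {
    if [ -n "$1" ]; then
        echo "ok"
    else
        echo "no records in answer"
    fi
}

# Ask each resolver for the MX records of a domain and print one status line each.
# dig writes timeouts and other diagnostics as lines starting with ';',
# so those lines are dropped before classifying.
check_mx() {
    domain=$1
    shift
    for resolver in "$@"; do
        answer=$(dig +short +time=3 +tries=1 "@$resolver" "$domain" mx 2>/dev/null | grep -v '^;')
        printf '%s: %s\n' "$resolver" "$(classify_mx "$answer")"
    done
}

# Usage (placeholder names):
#   check_mx mydomain.com myisp.dns failingdomain.dns
```

Running it from the Zimbra server and again from a machine outside the network would show whether the empty answer depends on where the query comes from.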

Quote:
Originally Posted by ekkas View Post
To answer the other questions, yes it is a VM running on XenServer whose storage is on a SAN with RAID5. It has been running for more than a year with no problems in this environment. Upped the RAM from 2GB to 3GB and upped CPUs from 2 to 4, but same issues...
A RAID5 is not recommended for a production server with more than 100 users (and I'd prefer it wasn't used at all, but I understand its attraction). Two processors should be sufficient, and I'd also suggest more RAM for a reasonable size installation.

Quote:
Originally Posted by ekkas View Post
I hope it is a DNS issue on their end and will see if it clears up.
I hope it clears up; let us know the outcome.
__________________
Regards


Bill
  #6 (permalink)  
Old 05-13-2011, 03:57 AM
Loyal Member
 
Posts: 86
Default

Quote:
I assume those commands were run on the Zimbra server?
Yes, run on Zimbra server.

Quote:
I'd also suggest more RAM for a reasonable size installation.
I'll see if I can get it up to 4GB. (500 users, but not using ZWC; mostly POP, with a handful using IMAP.)

Quote:
A RAID5 is not recommended for a production server with more than 100 users
I can't see why not. The RAID is running on a SAN with large amounts of cache. Increased storage requirements nowadays make mirroring (RAID1) impractical. Besides, the SAN performs at well over 100MBps (bytes) sustained write speed, saturating a 1Gbps ethernet link. But I do not want to start (another!) RAID x vs RAID y, NFS vs FC vs iSCSI vs FoE debate.
Our next SAN project is going to have 40Gbps Infiniband and SSDs acting as large non-volatile cache, apart from 32GB volatile cache (using Nexenta SAN software), making any RAID5/RAID6 performance almost a non-issue.

My word, I suspect I've dwelt slightly off-topic.
  #7 (permalink)  
Old 05-13-2011, 09:35 AM
Loyal Member
 
Posts: 86
Default

Nope, same problem, it seemed that the other ISP just rejected my DNS request, but after logging a call, all seems to be correct.

So I'm back to square 1.

Strange that only some domains give problems.
Any ideas where I should look?
Even if I use telnet, I get a relatively quick timeout. I have to type really fast, otherwise I can't send:

[root@mail ~]# telnet 196.25.211.70 25
Trying 196.25.211.70...
Connected to mail.telkomsa.net (196.25.211.70).
Escape character is '^]'.
220 as5.telkomsa.net ESMTP
helo mail.mydomain.co.za
250 as5.telkomsa.net
mail from:support@mydomain.co.za
250 sender <support@mydomain.co.za> ok
Connection closed by foreign host.

Sometimes I get as far as just after the "mail to:" command, but it kicks me off quite quickly.
They say they do not know of any issues, and I say I can send to most other domains, so I'm stuck with no idea where to look.

If it was Postfix issues, then telnet should at least be working?
Now it seems even Telnet times out after a few seconds.
Maybe some other CentOS setting? This started happening out of the blue, I did yum & Zimbra updates since, but it didn't cure the problem.

Thanks for the replies so far.

Ekkas
  #8 (permalink)  
Old 05-13-2011, 09:55 AM
Loyal Member
 
Posts: 86
Default

Also tried changing the MTU to a lower setting and checked the timeout settings; don't know if it will help...

[zimbra@mail root]$ postconf | grep timeout
connection_cache_protocol_timeout = 5s
daemon_timeout = 18000s
ipc_timeout = 3600s
lmtp_connect_timeout = 0s
lmtp_data_done_timeout = 600s
lmtp_data_init_timeout = 120s
lmtp_data_xfer_timeout = 180s
lmtp_lhlo_timeout = 300s
lmtp_mail_timeout = 300s
lmtp_quit_timeout = 300s
lmtp_rcpt_timeout = 300s
lmtp_rset_timeout = 20s
lmtp_starttls_timeout = 300s
lmtp_tls_session_cache_timeout = 3600s
lmtp_xforward_timeout = 300s
milter_command_timeout = 30s
milter_connect_timeout = 30s
milter_content_timeout = 300s
qmqpd_timeout = 300s
smtp_connect_timeout = 300s
smtp_data_done_timeout = 600s
smtp_data_init_timeout = 120s
smtp_data_xfer_timeout = 180s
smtp_helo_timeout = 300s
smtp_mail_timeout = 300s
smtp_quit_timeout = 300s
smtp_rcpt_timeout = 300s
smtp_rset_timeout = 20s
smtp_starttls_timeout = 300s
smtp_tls_session_cache_timeout = 3600s
smtp_xforward_timeout = 300s
smtpd_policy_service_timeout = 100s
smtpd_proxy_timeout = 100s
smtpd_starttls_timeout = 300s
smtpd_timeout = ${stress?10}${stress:300}s
smtpd_tls_session_cache_timeout = 3600s
trigger_timeout = 10s
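Since the trouble is with outgoing mail, only the smtp_* parameters in the list above apply: smtpd_* covers the inbound server side and lmtp_* covers local delivery. A small filter like the following (the function name is made up for illustration) narrows the output to the outbound client timeouts:

```shell
#!/bin/sh
# Keep only the Postfix SMTP *client* timeout parameters (smtp_*),
# dropping smtpd_* (inbound server) and lmtp_* (local delivery) entries.
# Reads `postconf` output on stdin.
smtp_client_timeouts() {
    grep -E '^smtp_[a-z_]*timeout'
}

# Usage:
#   postconf | smtp_client_timeouts
```

Every smtp_* value listed is 20 seconds or longer, so a connection that dies within a few seconds is being closed before any of these timers could expire.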
  #9 (permalink)  
Old 05-13-2011, 10:52 AM
Zimbra Consultant & Moderator
 
Posts: 20,313
Default

Quote:
Originally Posted by ekkas View Post
Nope, same problem, it seemed that the other ISP just rejected my DNS request, but after logging a call, all seems to be correct.

So I'm back to square 1.

Strange that only some domains give problems.
Any ideas where I should look?
Even if I use telnet, I get a relatively quick timeout. I have to type really fast, otherwise I can't send:

[root@mail ~]# telnet 196.25.211.70 25
Trying 196.25.211.70...
Connected to mail.telkomsa.net (196.25.211.70).
Escape character is '^]'.
220 as5.telkomsa.net ESMTP
helo mail.mydomain.co.za
250 as5.telkomsa.net
mail from:support@mydomain.co.za
250 sender <support@mydomain.co.za> ok
Connection closed by foreign host.

Sometimes I get as far as just after the "mail to:" command, but it kicks me off quite quickly.
They say they do not know of any issues, and I say I can send to most other domains, so I'm stuck with no idea where to look.

If it was Postfix issues, then telnet should at least be working?
Now it seems even Telnet times out after a few seconds.
Maybe some other CentOS setting? This started happening out of the blue, I did yum & Zimbra updates since, but it didn't cure the problem.

Thanks for the replies so far.

Ekkas
I have no problem connecting to their mail servers; it doesn't kick me off if I try to send an email. The MTU should be set at 1500 for your network. I assume you are also on a fairly recent version of Xen? Do you have any firewall hardware (or any Cisco PIX devices) between you and the outside world?
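One way to check whether full-size 1500-byte packets actually get through to the remote host is to send pings with fragmentation prohibited. This is a rough sketch that assumes the Linux iputils ping (the -M do option); the function names and the list of sizes are illustrative only. Many mail hosts do not answer ICMP echo at all, so a failure at every size proves nothing by itself.

```shell
#!/bin/sh
# ICMP payload size that produces an IPv4 packet of exactly the given MTU:
# MTU minus 20 bytes of IPv4 header minus 8 bytes of ICMP header.
ping_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Try unfragmentable pings at decreasing packet sizes and report the first
# size that gets a reply. Returns non-zero if none of them is answered.
probe_path_mtu() {
    host=$1
    for mtu in 1500 1492 1400 1300; do
        size=$(ping_payload_for_mtu "$mtu")
        if ping -c 1 -W 2 -M do -s "$size" "$host" >/dev/null 2>&1; then
            echo "MTU $mtu passes to $host"
            return 0
        fi
        echo "MTU $mtu blocked or unanswered"
    done
    return 1
}

# Usage (address taken from the telnet session quoted above):
#   probe_path_mtu 196.25.211.70
```

If smaller sizes pass while 1500 does not, something on the path is dropping large packets, which would fit short commands sometimes getting through while message bodies fail.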

I do remember there was a problem with Xen a while back with the 'checksum offload' function (I haven't used it for years, so it may have been fixed). Feel free to ignore the following if it doesn't apply anymore:

This is probably the NIC causing the problem. You can check by running 'tcpdump -nvvi eth0' in your Dom0 and then initiating some traffic (for example, run 'traceroute microsoft.com') and seeing what output tcpdump gives. If there's any error about 'bad chksum', then you need to modify your NIC driver. The problem is caused by checksum offloading in the NIC driver, and you can check it with the following commands:

$ ethtool -k eth0
This displays the settings for your driver; you should see something like this:

tx-checksumming: on

If that's the case, disable it with:

$ ethtool -K eth0 tx off
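To list every offload feature that is currently switched on, rather than reading through the whole 'ethtool -k' output by eye, something like this should work. It is a sketch: the function name is made up, and it assumes the "name: on" / "name: off" line format that ethtool prints.

```shell
#!/bin/sh
# Print the names of the offload features that `ethtool -k` reports as "on".
# Reads the ethtool output on stdin; error lines such as
# "Cannot get device flags: Operation not supported" are ignored.
offloads_on() {
    awk -F': ' '$2 ~ /^on/ { sub(/^[ \t]+/, "", $1); print $1 }'
}

# Usage:
#   ethtool -k eth0 | offloads_on
```

Note that a change made with 'ethtool -K' lasts only until the next reboot or driver reload, so it would need to be added to the network startup configuration if it turns out to help.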

I assume that SELinux is disabled on the server? Apart from either a DNS issue or a router/firewall issue between you and the receiving site, I can't really imagine what else it could be.
__________________
Regards


Bill
  #10 (permalink)  
Old 05-13-2011, 12:26 PM
Loyal Member
 
Posts: 86
Default

Thanks a lot, I'll try it and see what happens.
I have rx on and tried to turn it off, or should I not do that?
Otherwise I can swap Interfaces later and see if it's maybe the physical NIC that's giving problems.

[root@Xen2 ~]# ethtool -k eth1
Offload parameters for eth1:
Cannot get device flags: Operation not supported
Cannot get device GRO settings: Operation not supported
rx-checksumming: on
tx-checksumming: off
scatter-gather: off
tcp-segmentation-offload: off
udp-fragmentation-offload: off
generic-segmentation-offload: off
generic-receive-offload: off
large-receive-offload: off