I seem to have a conundrum on my hands, and I am hoping the Zimbra community can help me get to the bottom of it. I am currently managing a Zimbra install that is split across two servers: the LDAP & Store run on one server, and the MTA runs on the other. The server specs are as follows:
LDAP & Store server:
- Quad Xeon 3.0 GHz
- 6 GB RAM
- 500 GB DAS SCSI storage (RAID 5)

MTA server:
- Xeon 3.0 GHz
- 4 GB RAM
- 40 GB internal storage
I'm sure there are other relevant stats, so please let me know what else you need.
The issue I am facing is this: these specs seem mostly adequate for my 100-mailbox user base, yet at times the load average on the mail store server gets so high (this morning I saw the 1-minute load average cross 10) that the server becomes unresponsive. I have left the server alone for upwards of an hour, hoping the load would stabilize, but it never does. I end up doing a hard shutdown and bringing the server back up.
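Because the box stops responding once the load climbs, it may help to log the load average while it is still rising rather than after the fact. A minimal sketch, assuming a Linux `/proc/loadavg` and Python 3 (Ubuntu 8.04's stock Python is older, so it may need small changes there). `LOAD_THRESHOLD` and the function names are hypothetical; set the threshold to suit the store server's vCPU count.

```python
import time
from datetime import datetime

# Hypothetical alert level; adjust to the number of vCPUs on the store server.
LOAD_THRESHOLD = 8.0


def parse_loadavg(text):
    """Return the (1-min, 5-min, 15-min) load averages from /proc/loadavg text."""
    fields = text.split()
    return float(fields[0]), float(fields[1]), float(fields[2])


def over_threshold(loads, threshold=LOAD_THRESHOLD):
    """True when the 1-minute load average has reached the threshold."""
    return loads[0] >= threshold


def watch(interval=30, path="/proc/loadavg"):
    """Poll forever, printing a timestamped line whenever the load is high."""
    while True:
        with open(path) as f:
            loads = parse_loadavg(f.read())
        if over_threshold(loads):
            # flush=True hands each line to the OS immediately instead of
            # leaving it in Python's output buffer.
            print(f"{datetime.now():%Y-%m-%d %H:%M:%S} load "
                  f"{loads[0]:.2f} {loads[1]:.2f} {loads[2]:.2f}", flush=True)
        time.sleep(interval)
```

Calling `watch()` from a small background script with its output redirected to a file would leave a record of when the load started climbing, which could then be lined up against the Zimbra logs for the same time window.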
Both servers run Ubuntu 8.04 (32-bit) as guests on VMware ESXi 3.5.
What I find most vexing is that I manage another ZCS install for a different company, using Community Edition on less powerful machines. That company has a slightly larger user base and significantly more mail traffic (upwards of three times as much), yet its mail store server never exhibits this behavior.
While I know that RAID 5 is not ideal, I do not believe I have an I/O issue, either at the ESXi level or at the vServer level. My servers never swap (according to top). Whenever this happens, there always seems to be a java process tying everything up. I do not currently have an example of the exact command line, and it does seem to vary slightly.
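One caveat on the I/O reasoning: on Linux the load average counts tasks that are runnable (state R) and also tasks in uninterruptible sleep (state D, usually waiting on disk), so a high load with no swapping can still be I/O-bound. Below is a minimal sketch for checking that and for capturing the exact java command line the next time it happens. It assumes a Linux `/proc` layout and Python 3; the function names are illustrative. Because Java is heavily multithreaded and each thread counts toward the load separately, it walks `/proc/<pid>/task/<tid>/stat` rather than only the per-process entries.

```python
import os
from collections import Counter


def parse_stat_state(stat_text):
    """Return (comm, state) from the contents of a /proc/.../stat file.

    The command name sits inside parentheses and may itself contain spaces
    or ')', so split on the last ')' instead of on whitespace.
    """
    lparen = stat_text.index("(")
    rparen = stat_text.rindex(")")
    return stat_text[lparen + 1:rparen], stat_text[rparen + 2]


def read_cmdline(pid, proc_root="/proc"):
    """Full command line of a process, with NUL separators turned into spaces."""
    try:
        with open(os.path.join(proc_root, str(pid), "cmdline"), "rb") as f:
            raw = f.read()
    except OSError:
        return ""
    return raw.replace(b"\0", b" ").decode("utf-8", "replace").strip()


def busy_threads(proc_root="/proc"):
    """Map pid -> Counter of its threads in state R (runnable) or D (blocked on I/O)."""
    result = {}
    for pid in os.listdir(proc_root):
        if not pid.isdigit():
            continue  # skip entries like /proc/self, /proc/meminfo
        task_dir = os.path.join(proc_root, pid, "task")
        try:
            tids = os.listdir(task_dir)
        except OSError:
            continue  # process exited during the scan
        states = Counter()
        for tid in tids:
            try:
                with open(os.path.join(task_dir, tid, "stat")) as f:
                    _, state = parse_stat_state(f.read())
            except (OSError, ValueError, IndexError):
                continue  # thread exited or stat was unreadable
            if state in ("R", "D"):
                states[state] += 1
        if states:
            result[int(pid)] = states
    return result


def snapshot(proc_root="/proc"):
    """Print the busiest processes first: pid, R/D thread counts, command line."""
    busy = busy_threads(proc_root)
    for pid in sorted(busy, key=lambda p: sum(busy[p].values()), reverse=True):
        states = busy[pid]
        cmd = read_cmdline(pid, proc_root) or "?"  # kernel threads have no cmdline
        print(f"{pid:>7}  R={states['R']:<4} D={states['D']:<4} {cmd}")
```

Running `snapshot()` a few times while the load is high would suggest whether the java process's threads are mostly R (CPU-bound) or mostly D (waiting on storage), and it records the full java command line at the same time.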
Does anyone out there have any thoughts?