Hello folks, I am in need of some help with our Zimbra solution.
We run the Open Source Edition of Zimbra, version 6.0.6.1.
At first we had it on a very decent HP Blade, but due to hardware problems we had to switch over to our cold-standby VMware instance.
The VMware instance is as follows:
Debian Lenny 64-bit
5.8 GB RAM
4 processor cores (2.6 GHz)
iSCSI connection to a SATA NetApp filer (I know, SATA, but it was the failover machine and I am fairly sure the I/O is not the problem!)
currently 400 configured accounts
I attached some charts from this Sunday, Monday and Tuesday. As you can see, Sunday is no problem, but on the other days, every time the heap gets full and garbage collection kicks in, we see loads of up to 20-25 for a prolonged period of up to 15 minutes. The normal load for this server is roughly 2 to 5.
The log zmmailbox.out at times shows GC pauses ranging from seconds up to minutes (yesterday I once had 477 seconds of GC time!). The biggest problem is that after 477 seconds of unavailability of the Java process, a lot of mail accumulates, which prolongs the time the server needs to calm down, and users experience latency.
I read the wikis, performance guides, etc., and I ruled out I/O, as it does not correlate with the GC spikes and is pretty low overall. What I find curious is that the SoftIRQ spikes correlate with the GC. Does the GC use SoftIRQs in some form or other? Or could the high SoftIRQs be responsible for the long GC times? The graphs show CPU system time, SoftIRQs and IRQs (I do not have load graphs), and all show a similar pattern, but the SoftIRQs stand out because their spikes are higher than I have ever seen them.
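For anyone who wants to compare pause times from their own log, here is a minimal Python sketch of how the long stops can be pulled out of zmmailbox.out. It assumes the lines written by -XX:+PrintGCApplicationStoppedTime look like "Total time for which application threads were stopped: 1.2345678 seconds"; the function names and the 1-second threshold are my own choices, and the comma handling is only a guess in case the JVM writes locale-formatted decimals.

```python
import re

# Line format assumed for -XX:+PrintGCApplicationStoppedTime output.
# A decimal comma is accepted too, in case the log uses a non-English locale.
STOPPED_RE = re.compile(
    r"Total time for which application threads were stopped: "
    r"([0-9]+(?:[.,][0-9]+)?) seconds"
)


def long_pauses(lines, threshold_seconds=1.0):
    """Return every reported stop (in seconds) at or above the threshold."""
    pauses = []
    for line in lines:
        match = STOPPED_RE.search(line)
        if match is None:
            continue
        seconds = float(match.group(1).replace(",", "."))
        if seconds >= threshold_seconds:
            pauses.append(seconds)
    return pauses


def long_pauses_in_file(path, threshold_seconds=1.0):
    """Convenience wrapper: scan a log file such as zmmailbox.out."""
    with open(path, errors="replace") as log:
        return long_pauses(log, threshold_seconds)
```

Calling long_pauses_in_file() with the path to zmmailbox.out and a threshold of, say, 5.0 lists only the stops that users are likely to notice.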
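To put a number on the SoftIRQ spikes without relying on the graphs, something like the following Python sketch could be used. It assumes the aggregate "cpu" line of /proc/stat has the usual field order (user, nice, system, idle, iowait, irq, softirq, steal); the function names and the 5-second interval are hypothetical, and it only measures the share of CPU time spent in SoftIRQs, it does not explain where they come from.

```python
import time


def cpu_fields(line):
    """Parse the aggregate 'cpu' line of /proc/stat into integer counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # Keep user..steal only; later fields (guest) are already counted in user.
    return [int(value) for value in parts[1:9]]


def softirq_share(before_line, after_line):
    """Fraction of CPU time spent in softirq between two samples."""
    before, after = cpu_fields(before_line), cpu_fields(after_line)
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas)
    # Index 6 is the softirq counter in the assumed field order.
    return deltas[6] / total if total else 0.0


def read_cpu_line(path="/proc/stat"):
    """Return the first line of /proc/stat, which is the aggregate 'cpu' line."""
    with open(path) as stat:
        return stat.readline()


def sample_softirq_share(interval_seconds=5):
    """Take two samples interval_seconds apart and return the softirq share."""
    before = read_cpu_line()
    time.sleep(interval_seconds)
    return softirq_share(before, read_cpu_line())
```

Running sample_softirq_share() in a loop during a GC spike and during a quiet period would show whether the SoftIRQ share really rises together with the pauses.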
Right now I have implemented these startup parameters for mailboxd:
mailboxd_java_heap_memory_percent = 25
mailboxd_java_heap_new_size_percent = 25
mailboxd_java_options = -server -Djava.awt.headless=true -XX:+UseConcMarkSweepGC -XX:NewRatio=2 -XX:PermSize=128m -XX:MaxPermSize=128m -XX:SoftRefLRUPolicyMSPerMB=1 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCApplicationStoppedTime -XX:+UseParNewGC
mailboxd_thread_stack_size = 256k
Today I'll try these:
mailboxd_java_heap_memory_percent = 20
mailboxd_java_options = -server -Djava.awt.headless=true -XX:+UseConcMarkSweepGC -XX:NewRatio=2 -XX:PermSize=128m -XX:MaxPermSize=128m -XX:SoftRefLRUPolicyMSPerMB=1 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCApplicationStoppedTime -XX:+UseParNewGC
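To make the difference between the two settings concrete, here is a rough Python sketch of the arithmetic. It assumes (I have not verified this in the Zimbra start scripts) that mailboxd_java_heap_memory_percent is taken as a percentage of system RAM for the heap, and that mailboxd_java_heap_new_size_percent is taken as a percentage of that heap for the young generation; heap_sizes_mb is just a name made up for this example.

```python
def heap_sizes_mb(system_ram_mb, heap_percent, new_size_percent):
    """Return (heap, young generation, old generation) sizes in MB.

    Assumption: heap = RAM * heap_percent / 100 and
    young generation = heap * new_size_percent / 100.
    """
    heap_mb = system_ram_mb * heap_percent / 100
    new_mb = heap_mb * new_size_percent / 100
    return heap_mb, new_mb, heap_mb - new_mb


RAM_MB = 5.8 * 1024  # the 5.8 GB of the VMware instance

for heap_percent in (25, 20):
    heap, young, old = heap_sizes_mb(RAM_MB, heap_percent, 25)
    print(f"heap_percent={heap_percent}: heap {heap:.0f} MB, "
          f"young {young:.0f} MB, old {old:.0f} MB")
```

Under these assumptions, lowering the heap percentage shrinks both generations, so each collection has less memory to walk through but runs more often.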
But I really want to know why we have these problems. Are they related to mailboxd_java_heap_new_size_percent? Should I lower it? What does it do?
Should I use the parallel garbage collector instead of the Concurrent Mark Sweep collector? What would be the drawback?
I have read that a few other people are experiencing garbage-collection-related problems as well. How did you solve yours?



