So I changed settings on my ZCS 5.0.18 server with 32GB RAM to shift memory from mailboxd to MySQL and to switch to UseConcMarkSweepGC.
Code:
<key name="mailboxd_java_heap_memory_percent">
- <value>35</value>
+ <value>22</value>
</key>
<key name="mailboxd_java_options">
- <value>-client -XX:NewRatio=2 -XX:MaxPermSize=128m -Djava.awt.headless=true -XX:SoftRefLRUPolicyMSPerMB=1 -XX:+UseParallelGC -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps</value>
+ <value>-server -Djava.awt.headless=true -Xmn6400m -XX:+UseConcMarkSweepGC -XX:NewRatio=2 -XX:PermSize=128m -XX:MaxPermSize=128m -XX:SoftRefLRUPolicyMSPerMB=1 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCApplicationStoppedTime -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/opt/zimbra/log</value>
</key>
Result: on average, SOAP operations took 50-100% longer.
The GC log lines below about application threads being stopped for up to two seconds at a time, if meant to be taken literally, match what users experienced.
Code:
1097066.442: [GC [1 CMS-initial-mark: 914157K(1681408K)] 5887730K(8228672K), 1.4985010 secs]
Total time for which application threads were stopped: 1.5006810 seconds
1097067.941: [CMS-concurrent-mark-start]
1097069.273: [CMS-concurrent-mark: 1.332/1.332 secs]
1097069.273: [CMS-concurrent-preclean-start]
1097069.286: [CMS-concurrent-preclean: 0.011/0.012 secs]
1097069.286: [CMS-concurrent-abortable-preclean-start]
CMS: abort preclean due to time 1097070.318: [CMS-concurrent-abortable-preclean: 0.148/1.032 secs]
1097070.319: [GC[YG occupancy: 5152805 K (6547264 K)]1097070.319: [Rescan (parallel) , 1.7253600 secs]1097072.045: [weak refs processing, 0.0016260 secs] [1 CMS-remark: 914157K(1681408K)] 6066963K(8228672K), 1.7271760 secs]
Total time for which application threads were stopped: 1.7290230 seconds
1097072.047: [CMS-concurrent-sweep-start]
1097072.480: [CMS-concurrent-sweep: 0.433/0.433 secs]
1097072.480: [CMS-concurrent-reset-start]
1097072.492: [CMS-concurrent-reset: 0.012/0.012 secs]
1097074.495: [GC [1 CMS-initial-mark: 913547K(1681408K)] 6444062K(8228672K), 1.3286600 secs]
Total time for which application threads were stopped: 1.3306650 seconds
1097075.824: [CMS-concurrent-mark-start]
1097077.220: [CMS-concurrent-mark: 1.396/1.396 secs]
1097077.252: [CMS-concurrent-preclean-start]
1097077.266: [CMS-concurrent-preclean: 0.013/0.014 secs]
1097077.266: [CMS-concurrent-abortable-preclean-start]
CMS: abort preclean due to time 1097078.289: [CMS-concurrent-abortable-preclean: 0.139/1.022 secs]
1097078.290: [GC[YG occupancy: 5854796 K (6547264 K)]1097078.290: [Rescan (parallel) , 2.2002250 secs]1097080.491: [weak refs processing, 0.0018230 secs] [1 CMS-remark: 913547K(1681408K)] 6768344K(8228672K), 2.2022440 secs]
Total time for which application threads were stopped: 2.2043020 seconds
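For anyone who wants to quantify this from their own log, here is a rough Python sketch that adds up the "Total time for which application threads were stopped" lines written by -XX:+PrintGCApplicationStoppedTime. The function name summarize_pauses and the SAMPLE string are my own for illustration; SAMPLE just repeats the four pause lines from the excerpt above.

```python
import re

# Matches the lines emitted by -XX:+PrintGCApplicationStoppedTime.
STOPPED_RE = re.compile(
    r"Total time for which application threads were stopped: ([0-9.]+) seconds"
)


def summarize_pauses(log_text):
    """Return (count, total_seconds, longest_seconds) for the stopped-time lines."""
    pauses = [float(m.group(1)) for m in STOPPED_RE.finditer(log_text)]
    if not pauses:
        return 0, 0.0, 0.0
    return len(pauses), sum(pauses), max(pauses)


# The four pause lines from the excerpt above (illustrative input only).
SAMPLE = """\
Total time for which application threads were stopped: 1.5006810 seconds
Total time for which application threads were stopped: 1.7290230 seconds
Total time for which application threads were stopped: 1.3306650 seconds
Total time for which application threads were stopped: 2.2043020 seconds
"""

if __name__ == "__main__":
    count, total, longest = summarize_pauses(SAMPLE)
    print(f"{count} pauses, {total:.2f} s stopped in total, longest {longest:.2f} s")
```

On the excerpt this comes to roughly 6.8 seconds stopped, out of the approximately 14 seconds of wall-clock time the timestamps cover.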
On the micro level, the only server statistic that changed substantially was context switches per second. The average jumped from less than 5K to 100K.
Reverting to UseParallelGC and the other original options, then restarting mailboxd, immediately returned context switches to the previous level and appears to have resolved the performance problems at the application layer.
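In case it helps anyone reproduce the context-switch comparison, here is a minimal Python sketch of one way to measure the rate on Linux. It assumes /proc/stat is available and reads its cumulative ctxt counter twice; the helper names (read_ctxt, ctxt_rate, sample_rate) and the 5-second interval are my own choices, not anything from Zimbra. The cs column of vmstat should give a comparable figure.

```python
import time


def read_ctxt(stat_text):
    """Extract the cumulative context-switch count from /proc/stat contents."""
    for line in stat_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "ctxt":
            return int(fields[1])
    raise ValueError("no ctxt line found")


def ctxt_rate(before, after, elapsed_seconds):
    """Context switches per second between two cumulative readings."""
    return (after - before) / elapsed_seconds


def sample_rate(interval=5.0, path="/proc/stat"):
    """Read the counter twice, `interval` seconds apart (Linux only)."""
    with open(path) as f:
        first = read_ctxt(f.read())
    start = time.monotonic()
    time.sleep(interval)
    with open(path) as f:
        second = read_ctxt(f.read())
    return ctxt_rate(first, second, time.monotonic() - start)


if __name__ == "__main__":
    print(f"{sample_rate():.0f} context switches/sec")
```

Running it once under each GC configuration, at a similar load, would give numbers to compare against the 5K and 100K averages mentioned above.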
Did I misunderstand the performance wiki article, or are some of these Java options Just Wrong for pre-ZCS 6.0 systems?