Zimbra offers Open Source email server software and shared calendar for Linux and the Mac
 
Go Back   Zimbra - Forums > Zimbra Collaboration Suite > Administrators

Welcome to the Zimbra - Forums!
Welcome, if you would like to post a comment please register. We also encourage you to explore all things Zimbra with our team and members of the community.

Reply
 
LinkBack Thread Tools Display Modes
  #1 (permalink)  
Old 11-05-2009, 03:20 PM
Junior Member
 
Posts: 6
Default zmmailboxd: garbage collector stops application for several minutes

Hi All

I am currently using zcs-6.0.2_GA_1912.UBUNTU8_64 (Open Source Edition) and sometimes the zmmailbox process stops for several minutes. The entry in zmmailboxd.out is the following:

49663.762: [GC 49663.762: [ParNew (promotion failed): 93675K->90505K(94016K), 2.6561510 secs]49666.418: [CMS[Unloading class sun.reflect.GeneratedMethodAccessor217]
[Unloading class sun.reflect.GeneratedMethodAccessor1]
[Unloading class sun.reflect.GeneratedSerializationConstructorAcces sor2]
[Unloading class javax.servlet.Servlet]
[Unloading class sun.reflect.GeneratedConstructorAccessor9]
[Unloading class sun.reflect.GeneratedConstructorAccessor3]
[Unloading class sun.reflect.GeneratedConstructorAccessor87]
[Unloading class sun.reflect.GeneratedConstructorAccessor86]
[Unloading class org.apache.jasper.servlet.JspServlet]
[Unloading class sun.reflect.GeneratedMethodAccessor232]
[Unloading class sun.reflect.GeneratedMethodAccessor228]
[Unloading class sun.reflect.GeneratedSerializationConstructorAcces sor1]
[Unloading class sun.reflect.GeneratedConstructorAccessor2]
[Unloading class sun.reflect.GeneratedSerializationConstructorAcces sor3]
[Unloading class javax.servlet.GenericServlet]
[Unloading class sun.reflect.GeneratedMethodAccessor2]
[Unloading class sun.reflect.GeneratedConstructorAccessor6]
[Unloading class sun.reflect.GeneratedConstructorAccessor5]
[Unloading class sun.reflect.GeneratedMethodAccessor216]
[Unloading class sun.reflect.GeneratedConstructorAccessor10]
[Unloading class javax.servlet.http.HttpServlet]
[Unloading class javax.servlet.ServletConfig]
[Unloading class sun.reflect.GeneratedMethodAccessor233]
[Unloading class sun.reflect.GeneratedMethodAccessor227]
[Unloading class sun.reflect.GeneratedMethodAccessor234]
[Unloading class sun.reflect.GeneratedConstructorAccessor4]
: 287868K->105350K(315392K), 193.3510190 secs] 378582K->105350K(409408K), [CMS Perm : 71863K->71612K(131072K)], 196.0071700 secs] [Times: user=0.50 sys=0.12, real=196.01 secs]
Total time for which application threads were stopped: 196.0071700 seconds


My GC configuration is the following:

$ zmlocalconfig mailboxd_java_options | fmt
mailboxd_java_options = -server -XX:NewRatio=2 -Djava.awt.headless=true
-XX:MaxPermSize=128m -XX:SoftRefLRUPolicyMSPerMB=1
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
-XX:+PrintGCApplicationStoppedTime -XX:+UseConcMarkSweepGC

$ zmlocalconfig mailboxd_java_heap_new_size_percent
mailboxd_java_heap_new_size_percent = 25

Can anyone help?

Thanks
Reply With Quote
  #2 (permalink)  
Old 11-13-2009, 09:54 PM
Intermediate Member
 
Posts: 16
Default

I have something similar. I see
ParNew (promotion failed)
as you do.

I see the java process go cpu intensive after some
time of mailboxd running (days sometimes) of
Release 6.0.2_GA_1912.UBUNTU8 UBUNTU8 NETWORK edition.

I get mailboxd service delays with
CMS: abort preclean due to time

I can do a /opt/zimbra/bin/zmmailboxdctl restart
to restart the java process and things run better (CPU
load drops, no CMS aborts for a while).


Total time for which application threads were stopped: 0.0006790 seconds
Total time for which application threads were stopped: 0.0002510 seconds
en_US
Total time for which application threads were stopped: 0.0019170 seconds
Total time for which application threads were stopped: 0.0002690 seconds
91944.382: [GC 91944.382: [ParNew (promotion failed): 554816K->554816K(554816K), 10.0587830 secs]91954.441: [CMS[Unloading class sun.reflect.GeneratedMetho\
dAccessor1]
[Unloading class sun.reflect.GeneratedConstructorAccessor2]
[Unloading class sun.reflect.GeneratedConstructorAccessor93]
[Unloading class sun.reflect.GeneratedConstructorAccessor99]
[Unloading class sun.reflect.GeneratedConstructorAccessor92]
[Unloading class org.apache.jasper.servlet.JspServlet]
[Unloading class javax.servlet.http.HttpServlet]
[Unloading class sun.reflect.GeneratedConstructorAccessor5]
[Unloading class sun.reflect.GeneratedMethodAccessor118]
[Unloading class sun.reflect.GeneratedMethodAccessor22]
[Unloading class sun.reflect.GeneratedConstructorAccessor30]
[Unloading class sun.reflect.GeneratedMethodAccessor122]
[Unloading class sun.reflect.GeneratedConstructorAccessor4]
[Unloading class sun.reflect.GeneratedConstructorAccessor6]
[Unloading class javax.servlet.GenericServlet]
[Unloading class sun.reflect.GeneratedConstructorAccessor100]
[Unloading class sun.reflect.GeneratedConstructorAccessor10]
[Unloading class sun.reflect.GeneratedSerializationConstructorAcces sor1]
[Unloading class sun.reflect.GeneratedMethodAccessor119]
[Unloading class sun.reflect.GeneratedMethodAccessor21]
[Unloading class sun.reflect.GeneratedMethodAccessor117]
[Unloading class sun.reflect.GeneratedSerializationConstructorAcces sor2]
[Unloading class sun.reflect.GeneratedMethodAccessor115]
[Unloading class sun.reflect.GeneratedConstructorAccessor11]
[Unloading class sun.reflect.GeneratedConstructorAccessor25]
[Unloading class sun.reflect.GeneratedConstructorAccessor8]
[Unloading class sun.reflect.GeneratedConstructorAccessor3]
[Unloading class sun.reflect.GeneratedConstructorAccessor26]
[Unloading class sun.reflect.GeneratedConstructorAccessor82]
[Unloading class javax.servlet.Servlet]
[Unloading class sun.reflect.GeneratedConstructorAccessor95]
[Unloading class sun.reflect.GeneratedConstructorAccessor98]
[Unloading class sun.reflect.GeneratedMethodAccessor120]
[Unloading class javax.servlet.ServletConfig]
[Unloading class sun.reflect.GeneratedMethodAccessor2]
[Unloading class sun.reflect.GeneratedMethodAccessor121]
: 1630777K->727223K(1851392K), 6.2803740 secs] 2118124K->727223K(2406208K), [CMS Perm : 47726K->47524K(131072K)], 16.3396850 secs] [Times: user=10.26 sys=8\
.25, real=16.34 secs]
Total time for which application threads were stopped: 16.3410590 seconds
91960.740: [GC [1 CMS-initial-mark: 727223K(1851392K)] 728034K(2406208K), 0.0036430 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
Total time for which application threads were stopped: 0.0059940 seconds
91960.744: [CMS-concurrent-mark-start]
91963.556: [CMS-concurrent-mark: 2.812/2.812 secs] [Times: user=2.86 sys=0.21, real=2.81 secs]
91963.557: [CMS-concurrent-preclean-start]
91963.577: [CMS-concurrent-preclean: 0.018/0.021 secs] [Times: user=0.01 sys=0.01, real=0.02 secs]
91963.577: [CMS-concurrent-abortable-preclean-start]
en_US
en_US
Total time for which application threads were stopped: 0.0035950 seconds
CMS: abort preclean due to time 91968.620: [CMS-concurrent-abortable-preclean: 0.325/5.042 secs] [Times: user=0.47 sys=0.23, real=5.04 secs]
91968.658: [GC[YG occupancy: 25104 K (554816 K)]91968.658: [Rescan (parallel) , 0.0236440 secs]91968.682: [weak refs processing, 0.0000620 secs] [1 CMS-rem\
ark: 727223K(1851392K)] 752328K(2406208K), 0.0241580 secs] [Times: user=0.05 sys=0.00, real=0.03 secs]
Total time for which application threads were stopped: 0.0616090 seconds
91968.683: [CMS-concurrent-sweep-start]
Total time for which application threads were stopped: 0.0014030 seconds
Total time for which application threads were stopped: 0.0030990 seconds
91969.900: [CMS-concurrent-sweep: 1.217/1.217 secs] [Times: user=1.17 sys=0.05, real=1.21 secs]
91969.900: [CMS-concurrent-reset-start]
91969.940: [CMS-concurrent-reset: 0.040/0.040 secs] [Times: user=0.04 sys=0.00, real=0.04 secs]
Total time for which application threads were stopped: 0.0381810 seconds
Total time for which application threads were stopped: 0.0603880 seconds
Total time for which application threads were stopped: 0.0018140 seconds
Total time for which application threads were stopped: 0.0031500 seconds
Total time for which application threads were stopped: 0.0282040 seconds
Total time for which application threads were stopped: 0.0447860 seconds
en_US
en_US
Total time for which application threads were stopped: 0.0015210 seconds
Total time for which application threads were stopped: 0.0029940 seconds
Total time for which application threads were stopped: 0.0289840 seconds
en_US
Total time for which application threads were stopped: 0.0366490 seconds
Total time for which application threads were stopped: 0.0009820 seconds
Total time for which application threads were stopped: 0.0003620 seconds
Total time for which application threads were stopped: 0.0006070 seconds
Total time for which application threads were stopped: 0.0002480 seconds

I don't know why en_US shows up in the log. That's unrelated I think.


Info from zmlocalconfig
-----------------------
mailboxd_java_options = -server -Djava.awt.headless=true -XX:+UseConcMarkSweepGC -XX:NewRatio=2 -XX:PermSize=128m -XX:MaxPermSize=128m -XX:SoftRefLRUPolicy\
MSPerMB=1 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCApplicationStoppedTime
zimbra_zmjava_options = -Xmx256m
zimbra_mysql_connector_maxActive = 100
zimbra_index_max_uncommitted_operations = 200
mailboxd_java_heap_memory_percent = 50
mailboxd_java_heap_new_size_percent = 25
search_dbfirst_term_percentage_cutoff = 0.8

Mysql Info
----------
max_connections = 110
innodb_buffer_pool_size = 1220404377
innodb_max_dirty_pages_pct = 10
innodb_flush_method = O_DIRECT

Java Process
------------
zimbra 5631 5630 28 23:46 ? 00:00:02 /opt/zimbra/java/bin/java -XX:ErrorFile=/opt/zimbra/log -client -Xmx256m -Dzimbra.home=/opt/zimbra -Djava.l\
ibrary.path=/opt/zimbra/lib -Djava.ext.dirs=/opt/zimbra/java/jre/lib/ext:/opt/zimbra/lib/jars:/opt/zimbra/lib/ext-common:/opt/zimbra/lib/ext/backup:/opt/zi\
mbra/lib/ext/clamscanner:/opt/zimbra/lib/ext/network:/opt/zimbra/lib/ext/zimbra-license:/opt/zimbra/lib/ext/zimbrahsm:/opt/zimbra/lib/ext/zimbrasync com.zi\
mbra.cs.account.ProvUtil -l gacf
zimbra 5923 4828 0 23:46 pts/1 00:00:00 grep java
root 21729 1 0 Nov12 ? 00:00:00 /opt/zimbra/libexec/zmmailboxdmgr start -server -XX:NewRatio=2 -Djava.awt.headless=true -XX:MaxPermSize=128\
m -XX:SoftRefLRUPolicyMSPerMB=1 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCApplicationStoppedTime -XX:+UseConcMarkSweepGC -XX:PermS\
ize=128m -Xss256k -Xms2409m -Xmx2409m -Xmn602m
zimbra 21730 21729 1 Nov12 ? 00:23:54 /opt/zimbra/java/bin/java -server -XX:NewRatio=2 -Djava.awt.headless=true -XX:MaxPermSize=128m -XX:SoftRefL\
RUPolicyMSPerMB=1 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCApplicationStoppedTime -XX:+UseConcMarkSweepGC -XX:PermSize=128m -Xss2\
56k -Xms2409m -Xmx2409m -Xmn602m -Djava.io.tmpdir=/opt/zimbra/mailboxd/work -Djava.library.path=/opt/zimbra/lib -Djava.endorsed.dirs=/opt/zimbra/mailboxd/c\
ommon/endorsed -Dzimbra.config=/opt/zimbra/conf/localconfig.xml -Djetty.home=/opt/zimbra/mailboxd -DSTART=/opt/zimbra/mailboxd/etc/start.config -jar /opt/z\
imbra/mailboxd/start.jar /opt/zimbra/mailboxd/etc/jetty.properties /opt/zimbra/mailboxd/etc/jetty-setuid.xml /opt/zimbra/mailboxd/etc/jetty.xml

I posted earlier about this with
CMS: abort preclean due to time/concurrent mode failure

Let me know if you learn anything about your issue. I think I have
more to learn about java memory / garbage collection. Something
is not right.

thanks
John
Reply With Quote
  #3 (permalink)  
Old 11-14-2009, 11:18 AM
Junior Member
 
Posts: 6
Default

No solution yet. Currently I am restarting zmmailboxd every night at 5.00 AM. Even with the restart, this problem has happened one time in the last 10 days.

Cheers
Reply With Quote
  #4 (permalink)  
Old 11-15-2009, 05:27 AM
Elite Member
 
Posts: 369
Default

Can you show the charts for the day ?

zmstat-chart -s /opt/zimbra/zmstat/<date stat> -d /tmp/charts/
Reply With Quote
  #5 (permalink)  
Old 11-16-2009, 03:36 PM
Intermediate Member
 
Posts: 16
Default

Here's some zmstat charts for a day where java was not being nice,
users were complaining and so
I did a zmmailboxdctl restart at around 1:12pm. After that the rest of
the day it was okay. It's after a few days it seems to get stupid, though
doing an early morning preemptive zmmailboxdctl restart seems to have
good effect.

Summary.txt shows no values, so I don't know what's up there.
(Maybe ubuntu thing)?

===========================================
# Perf Stats #
===========================================
Mail Server CPU Sys :
Mail Server CPU User :
Mail Server CPU Idle :
Mail Server CPU Iowait :
Full GC% :
Young GC% :
Mailbox add rate :
Mailbox add latency :
Mailbox get latency :
Soap response time:
Pop response time:
Imap response time:


but the graphs are nice enough. (I'm modified for hostname = zimbra)
See attached.

Here's a bit of zmmailboxd.out at what I think is the time of the restart.

659891.299: [Full GC 659891.299: [CMS: 1851391K->1851391K(1851392K), 10.6088180 secs] 2406207K->2405946K(2406208K), [CMS Perm : 47486K->47482K(131072K)], 10.6090670 secs] [Times: user=10.32 sys=0.21, real=10.61 secs]
Total time for which application threads were stopped: 10.6108320 seconds
659901.910: [GC [1 CMS-initial-mark: 1851391K(1851392K)] 2405985K(2406208K), 1.0150460 secs] [Times: user=0.93 sys=0.01, real=1.02 secs]
Total time for which application threads were stopped: 1.0162430 seconds
659902.925: [CMS-concurrent-mark-start]
659902.987: [Full GC 659902.987: [CMS659914.927: [CMS-concurrent-mark: 11.999/12.001 secs] [Times: user=8.06 sys=0.03, real=12.00 secs]
(concurrent mode failure): 1851391K->1851391K(1851392K), 24.2305050 secs] 2406207K->2405670K(2406208K), [CMS Perm : 47482K->47480K(131072K)], 24.2308000 secs] [Times: user=18.57 sys=0.30, real=24.23 secs]
Total time for which application threads were stopped: 24.2324320 seconds
Heap
par new generation total 554816K, used 554279K [0x15660000, 0x3b060000, 0x3b060000)
eden space 493184K, 99% used [0x15660000, 0x337ffff0, 0x33800000)
from space 61632K, 99% used [0x33800000, 0x373a9c50, 0x37430000)
to space 61632K, 0% used [0x37430000, 0x37430000, 0x3b060000)
concurrent mark-sweep generation total 1851392K, used 1851391K [0x3b060000, 0xac060000, 0xac060000)
concurrent-mark-sweep perm gen total 131072K, used 47480K [0xac060000, 0xb4060000, 0xb4060000)
659927.238: [Full GC 659927.238: [CMS: 1851391K->1851391K(1851392K), 11.5150760 secs] 2406202K->2405705K(2406208K), [CMS Perm : 47483K->47483K(13107\
2K)], 11.5157270 secs] [Times: user=10.32 sys=0.30, real=11.52 secs]
CompilerOracle: exclude com/zimbra/cs/session/SessionMap.putAndPrune
CompilerOracle: exclude com/zimbra/cs/mailbox/MailItem.delete
CompilerOracle: exclude org/apache/xerces/impl/XMLDocumentFragmentScannerImpl$FragmentContentDisp atcher.dispatch
Total time for which application threads were stopped: 0.0004250 seconds
0 INFO [main] log - Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
Total time for which application threads were stopped: 0.0003980 seconds
Total time for which application threads were stopped: 0.0004740 seconds
Zimbra server reserving server socket port=110 bindaddr=null ssl=false
Zimbra server reserving server socket port=995 bindaddr=null ssl=true
Zimbra server reserving server socket port=143 bindaddr=null ssl=false
Zimbra server reserving server socket port=993 bindaddr=null ssl=true
Zimbra server reserving server socket port=7025 bindaddr=null ssl=false
Total time for which application threads were stopped: 0.0009940 seconds
5242 INFO [main] log - Setting umask=027
5244 INFO [main] log - Current rlimit_nofiles (soft=524288, hard=524288)
5244 INFO [main] log - Set rlimit_nofiles (soft=65536, hard=65536)
5244 INFO [main] log - Setting GID=1005

before that I was getting things like


CMS: abort preclean due to time 659306.572: [CMS-concurrent-abortable-preclean: 2.999/5.137 secs] [Times: user=2.26 sys=0.08, real=5.14 secs]
659306.573: [GC[YG occupancy: 541888 K (554816 K)]659306.573: [Rescan (parallel) , 2.0809960 secs]659308.655: [weak refs processing, 0.0000260 secs] [1 CMS-remark: 1851379K(1851392K)] 2393268K(2406208K), 2.0814040 secs] [Times: user=1.20 sys=0.72, real=2.08 secs]
Total time for which application threads were stopped: 2.0826130 seconds

and

659136.042: [GC 659136.042: [ParNew: 554815K->554815K(554816K), 0.0000730 secs]659136.042: [CMS659141.193: [CMS-concurrent-mark: 7.484/7.487 secs] [Times: user=7.78 sys=0.19, real=7.48 secs]
(concurrent mode failure): 1851378K->1851391K(1851392K), 16.7058680 secs] 2406193K->2375479K(2406208K), [CMS Perm : 47477K->47474K(131072K)], 16.7062600 secs] [Times: user=16.38 sys=0.24, real=16.71 secs]
Total time for which application threads were stopped: 16.7076830 seconds


thanks,
John
Attached Images
File Type: pdf index.pdf (1.17 MB, 4 views)
File Type: pdf soap.pdf (998.2 KB, 1 views)
Reply With Quote

Why Join?

Registering let's you ask questions, makes it easier to search, displays any files attached to posts, and notifies you about replies.

Zimbrablog.com




 

Search Engine Optimization by vBSEO 3.1.0