02-23-2009, 07:46 AM
Trained Alumni
Posts: 343

Java eating up the CPU

We have a mailbox server (1 of the 5 we have) that we upgraded to 5.0.13 within the last 2 weeks. Twice since upgrading (it was never a problem before) it has gone to very high CPU utilization, and 'top' shows that it is Java.

Code:
top - 09:41:56 up 9 days, 9:48, 7 users, load average: 32.93, 31.28, 29.58
Tasks: 189 total, 4 running, 185 sleeping, 0 stopped, 0 zombie
Cpu(s): 99.2% us, 0.6% sy, 0.0% ni, 0.2% id, 0.0% wa, 0.0% hi, 0.1% si
Mem: 8167756k total, 8042964k used, 124792k free, 157148k buffers
Swap: 16386292k total, 287228k used, 16099064k free, 1316356k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
13283 zimbra 15 0 4289m 3.9g 54m S 376.6 50.0 12895:24 java
7302 zimbra 16 0 2222m 2.0g 4352 S 17.9 25.7 579:14.72 mysqld
11411 zimbra 25 0 1248m 13m 7708 R 2.0 0.2 0:00.06 java
6025 root 15 0 0 0 0 S 0.3 0.0 0:44.51 kjournald
9240 root 15 0 15852 1640 864 S 0.3 0.0 5:24.57 hald
1 root 16 0 4772 516 428 S 0.0 0.0 0:00.92 init
2 root RT 0 0 0 0 S 0.0 0.0 0:02.31 migration/0
3 root 34 19 0 0 0 S 0.0 0.0 0:35.03 ksoftirqd/0
4 root RT 0 0 0 0 S 0.0 0.0 0:01.05 migration/1
5 root 34 19 0 0 0 S 0.0 0.0 0:55.66 ksoftirqd/1
6 root RT 0 0 0 0 S 0.0 0.0 0:01.86 migration/2
7 root 34 19 0 0 0 S 0.0 0.0 1:16.34 ksoftirqd/2
8 root RT 0 0 0 0 S 0.0 0.0 0:01.66 migration/3
9 root 34 19 0 0 0 S 0.0 0.0 0:29.73 ksoftirqd/3

We've been through all the logs and such and haven't found any good reason why this is happening. Is there any way to tell what Java is doing that is eating up resources?
This is a Sun X4200 (at least 2 dual core CPUs and 8GB of memory) so the hardware is not the problem...this never happened on this server until we installed 5.0.13. All our other mailbox servers with similar numbers of users have not had this same problem.
Restarting mailboxd corrects the issue...at least seems to as the load eventually drops....but we'd really like to find out why this is happening.
Thanks,
Matt | 
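One common way to approach the question above is to find the busiest threads inside the Java process and match them against a thread dump. The following is only a minimal sketch, assuming the JDK's `jstack` tool is available to the user that owns the JVM and that the installed `top` supports the per-thread `-H` view. PID 13283 is taken from the output above; the helper name `tid_to_nid` is hypothetical.

```shell
#!/bin/bash
# Convert a decimal Linux thread ID (as shown by `top -H`) into the
# hexadecimal "nid=0x..." form that appears in a Java thread dump.
# tid_to_nid is a hypothetical helper name, not a Zimbra or JDK tool.
tid_to_nid() {
    printf '0x%x\n' "$1"
}

# Example workflow (run as the user that owns the JVM; not executed here):
#   top -H -b -n 1 -p 13283 | head -n 20     # busiest threads of PID 13283
#   jstack 13283 > /tmp/threads.txt          # thread dump of the same JVM
#   grep -A 15 "nid=$(tid_to_nid <TID>) " /tmp/threads.txt
```

Replace `<TID>` with a thread ID from the `top -H` listing; the matching stack trace should show what that thread was doing when the dump was taken.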
02-23-2009, 08:05 AM
Trained Alumni
Posts: 343
We just found the process... it's not a zimbra process, but a root java process called 'start.jar'.
We're using a tool called VisualGC. It looks like it is doing GC every few seconds. I don't really understand yet how to read the information from VisualGC, but I'll post a screenshot....
The "Eden" bar grows and shrinks very quickly and I'm thinking every time it gets to the top a GC happens and the column shrinks back down.
The "Old" bar grows to the very top over a period of time. As it grows higher the load on the server climbs further. Eventually the "Old" bar will drop down to about 25% of the graph and the load on the server will drop by 40 or 50 percent (but still very high). You can see that evidenced in the 'xload' window.
Matt | 
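The pattern described above (Eden filling and emptying quickly, Old creeping up until it drops) can also be watched as numbers rather than bars. A minimal sketch, assuming the JDK's `jstat` tool is installed and run as the owner of the JVM; `old_gen_alert` is a hypothetical helper, and the 90% threshold is an arbitrary illustration, not a recommended value.

```shell
#!/bin/bash
# Read `jstat -gcutil` output on stdin and print the samples whose
# old-generation occupancy (the column headed "O") is at or above a threshold.
# old_gen_alert is a hypothetical helper; the default of 90 is arbitrary.
old_gen_alert() {
    local threshold="${1:-90}"
    awk -v limit="$threshold" '
        # Header line: remember which column is labelled "O" (old generation).
        col == 0 { for (i = 1; i <= NF; i++) if ($i == "O") col = i; next }
        # Data lines: report samples at or above the limit.
        col > 0 && ($col + 0) >= limit { print "old gen at " $col "%: " $0 }
    '
}

# Example (not executed here): sample the JVM with PID 13283 once per second.
#   jstat -gcutil 13283 1000 | old_gen_alert 90
```

Looking the column up by its header rather than by position is a deliberate choice, since the set of columns `jstat -gcutil` prints differs between JDK versions.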
02-23-2009, 08:30 AM
mysqld usage is pretty high too; that can often indicate a logger issue. Do you have logger running on that server, and if so, have you tried disabling it?
02-23-2009, 08:47 AM
Trained Alumni
Posts: 343
No logger on this server... logger is running on a different mailbox host.
04-08-2009, 04:57 AM
Intermediate Member
Posts: 22
Hello,
I'm running into a similar issue. I have a new Zimbra installation for evaluation and the performance is really bad.
It's running on a dedicated server, so all processes belong to Zimbra, and this is what I'm seeing:

Code:
Tasks: 117 total, 3 running, 114 sleeping, 0 stopped, 0 zombie
Cpu(s): 77.1%us, 21.8%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.2%hi, 1.0%si, 0.0%st
Mem: 2075548k total, 1602244k used, 473304k free, 11452k buffers
Swap: 401400k total, 0k used, 401400k free, 240988k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
11520 zimbra 20 0 484m 26m 12m R 96 1.3 0:04.41 java
11508 zimbra 20 0 484m 26m 12m R 88 1.3 0:04.23 java
6726 zimbra 20 0 1118m 880m 40m S 4 43.4 5:51.01 java
5171 zimbra 20 0 103m 14m 3708 S 1 0.7 0:06.10 mysqld
5620 zimbra 20 0 487m 21m 10m S 1 1.0 0:16.25 java
The server takes about 10 minutes to boot and restarting Zimbra (zmcontrol startup) takes about 9 minutes.
I have another Zimbra Network Edition installed at an associated university and I don't see the same issue there, but they are running Release 5.0.11_GA_2695.RHEL5_64_20081117021643 RHEL5_64 NETWORK edition while I'm running Release 5.0.15_GA_2851.UBUNTU8 UBUNTU8 NETWORK edition.
Any thoughts?
[]'s
Eri | 
04-08-2009, 05:15 AM
Zimbra Consultant & Moderator
Posts: 20,313
Quote:
Originally Posted by Eri
The server takes about 10 minutes to boot and restarting Zimbra (zmcontrol startup) takes about 9 minutes.

Let me start with an obvious question: how long does the server take to boot without Zimbra? Disable it from starting on boot, then reboot the server and see what happens. When it's booted, start Zimbra manually with:

Code:
su - zimbra
zmcontrol start

Does this server authenticate against an external server at boot? If so, disable that feature, reboot, and see how long it takes. Have you disabled the firewall and AppArmor on this server? What are the specifications of the server?
Did you set up a split DNS for this server, and what's the output of the following commands (run on the Zimbra server)?

Code:
cat /etc/hosts
cat /etc/resolv.conf
dig yourdomain.com mx
dig yourdomain.com
host `hostname`

For the last one, use that exact command, with backticks and not single quotes.
__________________
Regards
Bill
| 
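These checks appear aimed at confirming that the server's own name maps to the same address in `/etc/hosts` and in DNS. A minimal sketch of that comparison, assuming `dig` is installed; the helper name `hosts_ip` is hypothetical.

```shell
#!/bin/bash
# Print the first IPv4 address that a hosts-format file maps to a given name.
# hosts_ip is a hypothetical helper; usage: hosts_ip NAME [FILE]
hosts_ip() {
    local name="$1" file="${2:-/etc/hosts}"
    awk -v n="$name" '
        { sub(/#.*/, "") }                      # drop comments
        $1 ~ /^[0-9]+\./ {                      # IPv4 entries only
            for (i = 2; i <= NF; i++) if ($i == n) { print $1; exit }
        }
    ' "$file"
}

# Example (not executed here): compare the hosts file with DNS.
#   fqdn="$(hostname -f)"
#   echo "hosts file: $(hosts_ip "$fqdn")"
#   echo "DNS A:      $(dig +short "$fqdn" A)"
#   echo "DNS MX:     $(dig +short "$fqdn" MX)"
```

If the first two lines print different addresses, or the hosts file line is empty, that mismatch is worth resolving before looking elsewhere.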
04-08-2009, 06:22 AM
Intermediate Member
Posts: 22
Hello, Bill.
Thanks for your reply.
1-) Boot time*:
Without Zimbra (I removed the rc.* links): 01:05
With Zimbra: 12:20
* Boot time = from POST to Login prompt
2-) Starting Zimbra manually:
Code:
zimbra@pilot:~$ time zmcontrol start
Host pilot.smu.ca
Starting ldap...Done.
Starting logger...Done.
Starting mailbox...Done.
Starting imapproxy...Done.
Starting antispam...Done.
Starting antivirus...Done.
Starting snmp...Done.
Starting spell...Done.
Starting mta...Done.
Starting stats...Done.
real 8m51.180s
user 3m25.950s
sys 2m47.300s

3-) External authentication
No, that's not the case. Local authentication only
4-) AppArmor / Firewall
Code:
root@pilot:~# iptables -L -n
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain FORWARD (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
root@pilot:~# apparmor_status
apparmor module is loaded.
0 profiles are loaded.
0 profiles are in enforce mode.
0 profiles are in complain mode.
0 processes have profiles defined.
0 processes are in enforce mode :
0 processes are in complain mode.
0 processes are unconfined but have a profile defined.

5-) Server specification:
It's a virtual machine on an ESXi server. The ESXi host's load (until last Monday) was about 15%. When I started the Zimbra pilot it jumped to 95%.
Code:
root@pilot:~# cat /proc/cpuinfo |grep Inte
vendor_id : GenuineIntel
model name : Intel(R) Xeon(TM) CPU 2.40GHz
vendor_id : GenuineIntel
model name : Intel(R) Xeon(TM) CPU 2.40GHz
root@pilot:~# free -m
total used free shared buffers cached
Mem: 2026 1420 606 0 10 248
-/+ buffers/cache: 1161 865
Swap: 391 0 391

6-) DNS
Code:
# cat /etc/hosts
127.0.0.1 localhost
140.184.200.240 pilot.smu.ca pilot
# cat /etc/resolv.conf
search smu.ca
nameserver 140.184.1.21
nameserver 140.184.1.22
# dig pilot.smu.ca mx
; <<>> DiG 9.4.2 <<>> pilot.smu.ca mx
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 27863
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; QUESTION SECTION:
;pilot.smu.ca. IN MX
;; ANSWER SECTION:
pilot.smu.ca. 10832 IN MX 10 pilot.smu.ca.
;; ADDITIONAL SECTION:
pilot.smu.ca. 10832 IN A 140.184.200.240
;; Query time: 2 msec
;; SERVER: 140.184.1.21#53(140.184.1.21)
;; WHEN: Wed Apr 8 07:21:05 2009
;; MSG SIZE rcvd: 62
# dig pilot.smu.ca
; <<>> DiG 9.4.2 <<>> pilot.smu.ca
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 23395
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;pilot.smu.ca. IN A
;; ANSWER SECTION:
pilot.smu.ca. 9350 IN A 140.184.200.240
;; Query time: 1 msec
;; SERVER: 140.184.1.21#53(140.184.1.21)
;; WHEN: Wed Apr 8 07:45:48 2009
;; MSG SIZE rcvd: 46
# host `hostname`
pilot.smu.ca has address 140.184.200.240
pilot.smu.ca mail is handled by 10 pilot.smu.ca.

I don't see anything out of the ordinary in my configuration. Any ideas?
[]'s
Eri
Last edited by Eri; 04-08-2009 at 06:26 AM..
| 
04-08-2009, 06:50 AM
Zimbra Consultant & Moderator
Posts: 20,313
Your hosts file should contain the following:
Code:
127.0.0.1 localhost.localdomain localhost
140.184.200.240 pilot.smu.ca pilot

You're actually using a sub-domain for this server, and your FQDN should be something like mail.pilot.smu.ca. As this is a test server, that doesn't really matter for the moment.
According to your output the AppArmor module is still loaded, which can continue to cause problems; could you please disable it completely? So this VM has 2GB of RAM; how much does the server have in total? Have you made any modifications to the VM configuration? Does this VM have multiple vCPUs assigned to it? If so, could you change it to a single vCPU, reboot, and see if that changes anything? Was this a newly created VM or a migrated one?
__________________
Regards
Bill
| 
04-08-2009, 09:17 AM
Intermediate Member
Posts: 22
Hi, Bill. Thanks again for the reply.
I disabled/removed AppArmor (including the packages) and changed /etc/hosts, without any improvement.
Regarding the VM, I started with a fresh install exclusively for this pilot. Initially it was a default VM with 512MB RAM and a single CPU. Due to the performance issues I increased the RAM to 1GB.
Since the problem could (hypothetically speaking) be caused by other VMs on the same ESXi box, I modified the VM to have 500 MHz reserved.
No luck there, so I moved the VM to another (more powerful) ESXi host, where it's currently running. It now has two vCPUs and 2GB of RAM (out of a total of 4.5GB in the physical server).
A good reason to believe this is NOT a VM issue is the fact that the CPU load INSIDE the virtualized host is high.
However, the load is not high all the time. It peaks and then drops again within a matter of minutes:
Code:
$ uptime
10:27:58 up 1:25, 1 user, load average: 0.83, 1.59, 1.75
$ uptime
10:30:36 up 1:28, 1 user, load average: 3.37, 2.45, 2.05
$ uptime
10:32:30 up 1:30, 1 user, load average: 1.19, 2.02, 1.94

Whenever a peak happens, it always seems to look something like this:
Code:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
5518 zimbra 20 0 485m 30m 12m S 97 1.5 0:04.22 java
30596 zimbra 20 0 1141m 691m 39m S 25 34.1 3:17.04 java

Sometimes mysql also shows up as the second-highest CPU user, but in most cases it's only java.
I've been watching top for a few minutes and noticed a java <defunct> process showing up, disappearing, and then a new java process (new PID) appearing and spiking the CPU all over again.
Is there any debugging setting I can enable?
[]'s
Eri
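To see what the short-lived java processes described above actually are, one option is to record their full command lines while they are alive. A minimal sketch, assuming a procps-style `ps`; `java_procs` is a hypothetical helper, and the one-second polling interval and the log path are arbitrary.

```shell
#!/bin/bash
# Filter `ps -eo pid,ppid,etime,args` output (read from stdin) down to
# processes whose command is java, including defunct ones shown as "[java]".
# java_procs is a hypothetical helper name.
java_procs() {
    awk 'NR > 1 && ($4 ~ /(^|\/)java$/ || $4 == "[java]") { print }'
}

# Example (not executed here): poll once per second and log each sighting,
# so a process that only lives a few seconds still leaves its command line.
#   while sleep 1; do
#       ps -eo pid,ppid,etime,args | java_procs | sed "s/^/$(date +%T) /"
#   done >> /tmp/java-procs.log
```

The PPID column in the log shows which parent keeps launching the new java processes, which is usually the quickest hint about where they come from.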