[SOLVED] Zimbra nginx process defuncting 3000 times per second
Hi everybody,
I'm new to the forum and to Zimbra, so please be patient with my mistakes.
I've installed the 32-bit Debian open-source edition of Zimbra 5.0.6 on an up-to-date 32-bit Debian etch, on a 64-bit server.
With only 5 mail users and only about 20 mails a day, my load average is strangely high: 2.6.
So I looked at top: it seems nginx is spawned and goes defunct 3000 times a second.
I've got a 9.4 GB /opt/zimbra/log/nginx.log file, with something like 10000 lines added each second.
Searching Google, the Zimbra forum and Bugzilla for the errors from nginx.log, I couldn't find a similar problem (but, as I said, I'm a newbie with Zimbra).
So I wonder if you could point me to my (I hope) obvious mistake :-)
Here are the version, load, top output, nginx.log and Zimbra config:
$ zmcontrol -v
Release 5.0.6_GA_2313.DEBIAN4.0 DEBIAN4.0 FOSS edition
$ cat /proc/version
Linux version 2.6.18-6-686 (Debian 2.6.18.dfsg.1-18etch5) (dannf@debian.org) (gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #1 SMP Sat May 24 10:24:42 UTC 2008
$ uptime
17:34:28 up 2 days, 5:29, 1 user, load average: 2.60, 2.77, 2.72
$ top -u zimbra
top - 17:36:35 up 2 days, 5:32, 1 user, load average: 3.06, 2.86, 2.75
Tasks: 110 total, 5 running, 103 sleeping, 0 stopped, 2 zombie
Cpu(s): 5.3%us, 43.9%sy, 0.0%ni, 49.0%id, 1.8%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 1027012k total, 1007568k used, 19444k free, 2464k buffers
Swap: 2650684k total, 64k used, 2650620k free, 321680k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
15074 zimbra 25 0 4424 836 408 R 27 0.1 21:29.70 nginx
26566 zimbra 15 0 2232 1164 856 R 0 0.1 0:00.02 top
21324 zimbra 19 0 194m 31m 5148 S 0 3.1 0:43.63 slapd
23894 zimbra 25 0 3072 1540 960 S 0 0.1 0:00.00 mysqld_safe
23940 zimbra 15 0 338m 51m 6740 S 0 5.2 0:25.40 mysqld
24191 zimbra 25 0 3072 1536 960 S 0 0.1 0:00.00 mysqld_safe
24230 zimbra 18 0 99548 16m 6028 S 0 1.6 0:23.55 mysqld
30202 zimbra 18 0 56756 47m 2176 S 0 4.7 0:00.91 amavisd
32659 zimbra 16 0 59328 50m 3384 S 0 5.0 0:01.26 amavisd
32663 zimbra 16 0 58488 49m 3380 S 0 4.9 0:01.37 amavisd
32666 zimbra 17 0 58432 49m 3344 S 0 4.9 0:00.74 amavisd
32673 zimbra 16 0 59356 50m 3384 S 0 5.0 0:01.19 amavisd
32678 zimbra 15 0 58856 50m 3352 S 0 5.0 0:01.78 amavisd
32681 zimbra 16 0 60056 51m 3340 S 0 5.1 0:00.69 amavisd
32685 zimbra 16 0 58444 49m 3376 S 0 4.9 0:01.69 amavisd
32688 zimbra 16 0 60652 51m 3356 S 0 5.2 0:01.55 amavisd
32691 zimbra 15 0 61248 52m 3340 S 0 5.2 0:01.01 amavisd
32694 zimbra 16 0 58764 49m 3396 S 0 5.0 0:01.38 amavisd
954 zimbra 17 0 89412 77m 964 S 0 7.7 0:04.78 clamd
1826 zimbra 18 0 11992 6708 2316 R 0 0.7 0:00.03 httpd
1830 zimbra 25 0 11992 6076 1676 S 0 0.6 0:00.00 httpd
1832 zimbra 25 0 11992 6076 1676 S 0 0.6 0:00.00 httpd
1833 zimbra 25 0 11992 6076 1676 S 0 0.6 0:00.00 httpd
1834 zimbra 25 0 11992 6076 1676 S 0 0.6 0:00.00 httpd
1835 zimbra 25 0 11992 6076 1676 S 0 0.6 0:00.00 httpd
10871 zimbra 15 0 6788 2888 2212 S 0 0.3 0:00.40 saslauthd
12509 zimbra 15 0 6788 2872 2196 S 0 0.3 0:00.29 saslauthd
12510 zimbra 15 0 6788 2872 2196 S 0 0.3 0:00.40 saslauthd
12511 zimbra 19 0 6788 2872 2196 S 0 0.3 0:00.42 saslauthd
12514 zimbra 15 0 6788 2872 2196 S 0 0.3 0:00.31 saslauthd
13541 zimbra 18 0 4900 3464 1520 S 0 0.3 0:42.30 zmstat-proc
13545 zimbra 18 0 4772 3292 1520 S 0 0.3 0:00.05 zmstat-cpu
13549 zimbra 18 0 4772 3288 1516 S 0 0.3 0:00.05 zmstat-vm
13551 zimbra 18 0 4776 3300 1524 S 0 0.3 0:00.06 zmstat-io
13557 zimbra 18 0 4768 3288 1516 S 0 0.3 0:00.07 zmstat-io
13605 zimbra 18 0 4636 3232 1516 S 0 0.3 0:00.05 zmstat-fd
13617 zimbra 18 0 4772 3352 1516 S 0 0.3 0:07.86 zmstat-mysql
13619 zimbra 18 0 4636 3240 1516 S 0 0.3 0:00.66 zmstat-mtaqueue
21913 zimbra 18 0 1588 604 508 S 0 0.1 0:00.01 iostat
21927 zimbra 18 0 1712 568 468 S 0 0.1 0:00.00 vmstat
21930 zimbra 18 0 1584 612 512 S 0 0.1 0:00.00 iostat
26255 zimbra 24 0 5532 3940 1552 S 0 0.4 0:00.06 logswatch
26298 zimbra 24 0 5532 3948 1552 S 0 0.4 0:00.06 swatch
26443 zimbra 15 0 8232 6536 1600 S 0 0.6 0:00.14 perl
26464 zimbra 15 0 8212 6508 1600 S 0 0.6 0:00.14 perl
26954 zimbra 19 0 6800 4856 1144 S 0 0.5 0:37.86 zmmtaconfig
27734 zimbra 16 0 8344 4580 2284 S 0 0.4 0:01.04 zmlogger
21021 zimbra 25 0 709m 219m 38m S 0 21.9 0:17.42 java
14664 zimbra 15 0 12480 804 392 S 0 0.1 0:00.00 memcached
23918 zimbra 17 0 3748 1140 908 S 0 0.1 0:00.00 su
24090 zimbra 15 0 4496 1976 1340 S 0 0.2 0:00.01 bash
6275 zimbra 25 0 0 0 0 Z 0 0.0 0:00.00 nginx <defunct>
6276 zimbra 25 0 0 0 0 Z 0 0.0 0:00.00 nginx <defunct>
$ tail -f /opt/zimbra/log/nginx.log > /tmp/ng.log
2008/06/13 17:40:34 [alert] 11422#0: socket() failed (97: Address family not supported by protocol)
2008/06/13 17:40:34 [error] 11422#0: cannot connect to memcached server 127.0.1.1:11211 (rc:-1)
2008/06/13 17:40:34 [notice] 15074#0: start worker process 11422
2008/06/13 17:40:34 [alert] 11423#0: socket() failed (97: Address family not supported by protocol)
2008/06/13 17:40:34 [error] 11423#0: cannot connect to memcached server 127.0.1.1:11211 (rc:-1)
2008/06/13 17:40:34 [notice] 15074#0: start worker process 11423
2008/06/13 17:40:34 [alert] 11424#0: socket() failed (97: Address family not supported by protocol)
2008/06/13 17:40:34 [error] 11424#0: cannot connect to memcached server 127.0.1.1:11211 (rc:-1)
2008/06/13 17:40:34 [notice] 15074#0: start worker process 11424
2008/06/13 17:40:34 [alert] 11425#0: socket() failed (97: Address family not supported by protocol)
2008/06/13 17:40:34 [error] 11425#0: cannot connect to memcached server 127.0.1.1:11211 (rc:-1)
2008/06/13 17:40:34 [notice] 15074#0: start worker process 11425
2008/06/13 17:40:34 [notice] 15074#0: signal 29 (SIGIO) received
2008/06/13 17:40:34 [notice] 15074#0: signal 17 (SIGCHLD) received
2008/06/13 17:40:34 [alert] 15074#0: worker process 11422 exited on signal 11
2008/06/13 17:40:34 [alert] 15074#0: worker process 11423 exited on signal 11
2008/06/13 17:40:34 [alert] 15074#0: worker process 11424 exited on signal 11
2008/06/13 17:40:34 [alert] 15074#0: worker process 11425 exited on signal 11
2008/06/13 17:40:34 [alert] 11426#0: socket() failed (97: Address family not supported by protocol)
2008/06/13 17:40:34 [error] 11426#0: cannot connect to memcached server 127.0.1.1:11211 (rc:-1)
2008/06/13 17:40:34 [notice] 15074#0: start worker process 11426
2008/06/13 17:40:34 [alert] 11427#0: socket() failed (97: Address family not supported by protocol)
2008/06/13 17:40:34 [error] 11427#0: cannot connect to memcached server 127.0.1.1:11211 (rc:-1)
2008/06/13 17:40:34 [notice] 15074#0: start worker process 11427
2008/06/13 17:40:34 [alert] 11428#0: socket() failed (97: Address family not supported by protocol)
2008/06/13 17:40:34 [error] 11428#0: cannot connect to memcached server 127.0.1.1:11211 (rc:-1)
2008/06/13 17:40:34 [notice] 15074#0: start worker process 11428
2008/06/13 17:40:34 [alert] 11429#0: socket() failed (97: Address family not supported by protocol)
2008/06/13 17:40:34 [error] 11429#0: cannot connect to memcached server 127.0.1.1:11211 (rc:-1)
2008/06/13 17:40:34 [notice] 15074#0: start worker process 11429
2008/06/13 17:40:34 [notice] 15074#0: signal 29 (SIGIO) received
2008/06/13 17:40:34 [notice] 15074#0: signal 17 (SIGCHLD) received
2008/06/13 17:40:34 [alert] 15074#0: worker process 11426 exited on signal 11
2008/06/13 17:40:34 [alert] 15074#0: worker process 11427 exited on signal 11
2008/06/13 17:40:34 [alert] 15074#0: worker process 11428 exited on signal 11
2008/06/13 17:40:34 [alert] 15074#0: worker process 11429 exited on signal 11
David Pradier
This is a bug in official nginx.
See the bug fix in official nginx:
The potential bug incurred by "one_addr".
See the bug fix in nginx-zimbra:
https://bugzilla.zimbra.com/show_bug.cgi?id=54439
If you are running a version from before this bug fix, make sure that the memcache hostname in the "servers" directive of nginx.conf.memcache maps to only ONE IP address. For example, fix it in your /etc/hosts.
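To check the "only ONE IP" condition, here is a minimal sketch in Python that counts how many addresses hosts-format text assigns to a name. The function name addresses_for, the hostname mail.example.com and the sample entries are made up for illustration; on a real box you would pass in the contents of /etc/hosts and the hostname from your own "servers" directive. It only looks at hosts-file text, not at DNS.

```python
def addresses_for(hostname, hosts_text):
    """Return every IP that hosts-format text maps to hostname (first-seen order)."""
    wanted = hostname.lower()
    found = []
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], [n.lower() for n in fields[1:]]
        if wanted in names and ip not in found:
            found.append(ip)
    return found


# Hypothetical hosts content; the name and addresses are illustrative only.
SAMPLE = """\
127.0.0.1   localhost
127.0.1.1   mail.example.com mail   # Debian-style hostname entry
192.0.2.10  mail.example.com mail
"""

if __name__ == "__main__":
    ips = addresses_for("mail.example.com", SAMPLE)
    print(ips)
    if len(ips) > 1:
        print("hostname maps to more than one IP; keep only one entry")
```

If the list has more than one entry, remove or rename the extra lines so the memcache hostname is left with a single address.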