Results 1 to 8 of 8

Thread: mysqld hogging all the cpu

  1. #1
    jeffw is offline Loyal Member
    Join Date
    Oct 2007
    Posts
    88
    Rep Power
    7

    Default mysqld hogging all the cpu

    As a quick bit of background, we're evaluating Zimbra at work (Tufts University), but I'm also in the midst of moving to Zimbra for my personal mail. At home, I'm running:

    Release 5.0.1_GA_1902.UBUNTU6 UBUNTU6 FOSS edition

    Over the past few days, I've been working on moving my mail archives (around 570,000 messages going more than 10 years, almost 3GB) to my zimbra server.

    Prior to the import, I was testing with a few months of mail and my daily incoming mail (about 1-2k messages total) and performance was great. Now that the import is done, I'm seeing terrible performance.

    My server is on the small side, but I'm the only user on it and it's a dedicated zimbra server. It's a 2.4Ghz P4 with 1GB ram and a 6 disk hardware raid-5 (Yes, I know RAID5 sucks for mail and wouldn't ever do it in 'production', but I'm seeing reasonable wait/service times). A quick grep of the logs shows me getting 200-600 incoming messages per day over the past week.

    When idle, mysql is always consuming at least 25% of the CPU, and I've seen it spike to 99% for long periods of time. Here's a snip of top:

    Code:
    top - 17:47:01 up 14 days,  7:16,  5 users,  load average: 2.18, 2.22, 2.22
    Tasks: 112 total,   3 running, 109 sleeping,   0 stopped,   0 zombie
    Cpu(s):  4.3% us,  0.7% sy,  0.0% ni,  0.0% id, 94.7% wa,  0.0% hi,  0.3% si
    Mem:   1035672k total,  1024132k used,    11540k free,      472k buffers
    Swap:   497972k total,   314308k used,   183664k free,   222680k cached
    
      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                            
    11882 zimbra    16   0  343m 247m 2992 S  4.3 24.5  81:28.82 mysqld
    
    top - 18:05:26 up 14 days,  7:34,  5 users,  load average: 1.74, 2.06, 2.18
    Tasks: 111 total,   2 running, 109 sleeping,   0 stopped,   0 zombie
    Cpu(s): 68.6% us,  4.3% sy,  0.0% ni,  0.0% id, 27.1% wa,  0.0% hi,  0.0% si
    Mem:   1035672k total,  1025068k used,    10604k free,      152k buffers
    Swap:   497972k total,   305196k used,   192776k free,   215136k cached
    
      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND           
    11882 zimbra    16   0  343m 248m 3008 S 72.4 24.6  87:11.77 mysqld
    My mail client of choice is mutt, which doesn't seem to deal well with the slowness. It's nearly unusable, taking seconds to just move up and down the message selector. A quick look at strace shows that sometimes it's seeing 10-15seconds delay on the recv() syscall. The Zimbra webmail is slightly usable, and Thunderbird is about the same. Apple Mail is really unhappy and just recently started showing me doubles of most of my recent mail in my Inbox.

    Looking thru the myslow.log, I see a lot of SELECT COUNT(*) transactions showing up and sometimes taking more than 5-10+ seconds:

    Code:
    # Time: 080115 17:51:51
    # User@Host: zimbra[zimbra] @ localhost [127.0.0.1]
    # Query_time: 14  Lock_time: 0  Rows_sent: 1  Rows_examined: 556853
    SELECT COUNT(*) FROM mboxgroup3.mail_item WHERE mailbox_id = 3 AND folder_id = 28488 AND type IN (5,16,6) AND (imap_id IS NULL OR imap_id = 0 OR imap_id > 0);
    # Time: 080115 17:52:05
    # User@Host: zimbra[zimbra] @ localhost [127.0.0.1]
    # Query_time: 12  Lock_time: 0  Rows_sent: 1  Rows_examined: 556853
    SELECT COUNT(*) FROM mboxgroup3.mail_item WHERE mailbox_id = 3 AND folder_id = 28488 AND type IN (5,16,6) AND (imap_id IS NULL OR imap_id = 0 OR imap_id > 0);
    # Time: 080115 17:52:20
    # User@Host: zimbra[zimbra] @ localhost [127.0.0.1]
    # Query_time: 10  Lock_time: 0  Rows_sent: 1  Rows_examined: 556853
    SELECT COUNT(*) FROM mboxgroup3.mail_item WHERE mailbox_id = 3 AND folder_id = 28488 AND type IN (5,16,6) AND (imap_id IS NULL OR imap_id = 0 OR imap_id > 0);
    # Time: 080115 17:52:35
    # User@Host: zimbra[zimbra] @ localhost [127.0.0.1]
    # Query_time: 10  Lock_time: 0  Rows_sent: 1  Rows_examined: 556853
    SELECT COUNT(*) FROM mboxgroup3.mail_item WHERE mailbox_id = 3 AND folder_id = 28488 AND type IN (5,16,6) AND (imap_id IS NULL OR imap_id = 0 OR imap_id > 0);
    # Time: 080115 17:52:49
    # User@Host: zimbra[zimbra] @ localhost [127.0.0.1]
    # Query_time: 9  Lock_time: 0  Rows_sent: 1  Rows_examined: 556853
    SELECT COUNT(*) FROM mboxgroup3.mail_item WHERE mailbox_id = 3 AND folder_id = 28488 AND type IN (5,16,6) AND (imap_id IS NULL OR imap_id = 0 OR imap_id > 0);
    # Time: 080115 17:52:52
    # User@Host: zimbra[zimbra] @ localhost [127.0.0.1]
    # Query_time: 2  Lock_time: 0  Rows_sent: 1  Rows_examined: 29118
    SELECT COUNT(*) FROM mboxgroup3.mail_item WHERE mailbox_id = 3 AND folder_id = 663 AND type IN (5,16,6) AND (imap_id IS NULL OR imap_id = 0 OR imap_id > 581022);
    # Time: 080115 17:53:08
    # User@Host: zimbra[zimbra] @ localhost [127.0.0.1]
    # Query_time: 14  Lock_time: 0  Rows_sent: 1  Rows_examined: 556853
    SELECT COUNT(*) FROM mboxgroup3.mail_item WHERE mailbox_id = 3 AND folder_id = 28488 AND type IN (5,16,6) AND (imap_id IS NULL OR imap_id = 0 OR imap_id > 0);
    # Time: 080115 17:53:21
    # User@Host: zimbra[zimbra] @ localhost [127.0.0.1]
    # Query_time: 10  Lock_time: 0  Rows_sent: 1  Rows_examined: 556853
    SELECT COUNT(*) FROM mboxgroup3.mail_item WHERE mailbox_id = 3 AND folder_id = 28488 AND type IN (5,16,6) AND (imap_id IS NULL OR imap_id = 0 OR imap_id > 0);
    # Time: 080115 17:53:36
    # User@Host: zimbra[zimbra] @ localhost [127.0.0.1]
    # Query_time: 9  Lock_time: 0  Rows_sent: 1  Rows_examined: 556853
    SELECT COUNT(*) FROM mboxgroup3.mail_item WHERE mailbox_id = 3 AND folder_id = 28488 AND type IN (5,16,6) AND (imap_id IS NULL OR imap_id = 0 OR imap_id > 0);
    Lastly, looking at iostat shows a large amount of read activity (almost 10MB/sec) that I can't really pin down. Is there any way to better trace what Zimbra is doing?

    Code:
    Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
    sda          3.99  49.10 154.69 17.56 18703.39  517.56     9.13     0.25   111.58     1.33    7.73   3.88  66.87
    sdb          0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    sdc          0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    
    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
              54.38    0.00    4.58   41.04    0.00    0.00
    
    Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
    sda          2.39   5.18 108.76  6.97 12698.01  135.46     6.20     0.07   110.88     0.87    7.52   4.73  54.78
    sdb          0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    sdc          0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    
    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
              35.73    0.00    3.99   60.28    0.00    0.00
    
    Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
    sda          4.79   0.00 170.66  5.59 29954.49   64.07    14.63     0.03   170.32     1.00    5.69   3.77  66.47
    sdb          0.00   0.00  0.20  0.20    3.19    3.19     0.00     0.00    16.00     0.00   10.00  10.00   0.40
    sdc          0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    
    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
              25.95    0.00    3.99   53.89    0.00   16.17
    
    Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
    sda          6.19   2.99 157.49 10.58 19811.58  119.76     9.67     0.06   118.59     2.13   12.67   3.52  59.08
    sdb          0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    sdc          0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    
    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
              58.08    0.00    5.99   35.93    0.00    0.00
    
    Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
    sda          4.59   0.00 116.97  3.19 11356.49   36.33     5.55     0.02    94.81     0.70    5.85   3.85  46.31
    sdb          0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    sdc          0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    
    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
              37.72    0.00    5.39   56.89    0.00    0.00
    
    Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
    sda          2.00   0.20 157.09  4.39 27934.53   57.49    13.64     0.03   173.35     0.84    5.19   3.57  57.68
    sdb          0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    sdc          0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    
    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
              25.35    0.00    5.19   66.07    0.00    3.39
    
    Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
    sda        159.08   0.00 226.55  8.78 21189.62  103.59    10.35     0.05    90.48     1.24    5.26   3.26  76.65
    sdb          0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    sdc          0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    I realize the box I'm on is a bit small, but I would have hoped for a single user performance would have been ok. Is it possible I've tripped on a bug?

    Any suggestions for how to proceed would be most welcome.

    -j

  2. #2
    jeffw is offline Loyal Member
    Join Date
    Oct 2007
    Posts
    88
    Rep Power
    7

    Default

    any ideas? mysql is still eating up all the CPU:

    Code:
    Tasks: 107 total,   3 running, 104 sleeping,   0 stopped,   0 zombie
    Cpu(s): 90.1% us,  6.6% sy,  0.0% ni,  0.0% id,  3.0% wa,  0.0% hi,  0.3% si
    Mem:   1035672k total,  1000272k used,    35400k free,      352k buffers
    Swap:   497972k total,   313872k used,   184100k free,   199056k cached
    
      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND           
    11882 zimbra    16   0  343m 248m 3060 S 54.6 24.6  94:35.97 mysqld
    11750 zimbra    18   0  8220 3100 1012 R  0.3  0.3   0:11.05 zmmtaconfig
        1 root      16   0  1568  368  348 S  0.0  0.0   0:01.40 init
        2 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 migration/0
        3 root      34  19     0    0    0 S  0.0  0.0   3:16.14 ksoftirqd/0
        4 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 watchdog/0
        5 root      10  -5     0    0    0 S  0.0  0.0   0:00.52 events/0
        6 root      10  -5     0    0    0 S  0.0  0.0   0:00.01 khelper
        7 root      10  -5     0    0    0 S  0.0  0.0   0:00.01 kthread
        9 root      10  -5     0    0    0 S  0.0  0.0   0:00.43 kblockd/0
       10 root      20  -5     0    0    0 S  0.0  0.0   0:00.00 kacpid
      131 root      18  -5     0    0    0 S  0.0  0.0   0:00.00 aio/0
      130 root      15   0     0    0    0 S  0.0  0.0  18:20.78 kswapd0

  3. #3
    Rich Graves is offline Outstanding Member
    Join Date
    Jan 2007
    Location
    Minnesota
    Posts
    717
    Rep Power
    9

    Default

    You're swapping: 300MB swap used, 190MB cache.

    1GB RAM is low for the default ZCS settings. Another gig is cheap and would help immensely. Failing that, look at the tunables in the "Performance for Large Deployments" wiki page and turn them down. Especially see /opt/zimbra/conf/my.cnf

    Does most of your 3GB of mail live in a single "saved mail" folder that's hit frequently? In general, I'm telling people that it is a *good* practice to search rather than sort, but in low-memory conditions this migth be a bad thing.

    If you have the means, moving db/data to a RAID1 or RAID 10 partition would help substantially. store and index can stay on RAID5. On a multiuser system, openldap-data also wants RAID10, but that doesn't apply to you. You could also create a RAID1 store2 partition and make it the primary. All your new incoming mail would be faster. Consider the strategy of initial imapsync to RAID5 with production switch to a new RAID10 LUN if/when you migrate Tufts.

  4. #4
    jeffw is offline Loyal Member
    Join Date
    Oct 2007
    Posts
    88
    Rep Power
    7

    Default

    Quote Originally Posted by Rich Graves View Post
    You're swapping: 300MB swap used, 190MB cache.

    1GB RAM is low for the default ZCS settings. Another gig is cheap and would help immensely. Failing that, look at the tunables in the "Performance for Large Deployments" wiki page and turn them down. Especially see /opt/zimbra/conf/my.cnf
    I don't think I was really 'swapping', based on watching vmstat. My resident set was just a bit under 1GB, so I think it reached equilibrium and wasn't actively paging. That said, I noticed that the 10 amavisd's were using more than 250MB (RSS), so I killed off anti-spam/anti-virus for now. Now I'm only using 14MB of swap:

    Code:
    top - 10:58:22 up 15 days, 27 min,  6 users,  load average: 1.12, 1.11, 1.30
    Tasks:  96 total,   2 running,  94 sleeping,   0 stopped,   0 zombie
    Cpu(s): 92.4% us,  7.6% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi,  0.0% si
    Mem:   1035672k total,   994592k used,    41080k free,     2932k buffers
    Swap:   497972k total,    14360k used,   483612k free,   445816k cached
    
      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND           
     6851 zimbra    16   0  340m 263m 4412 S 99.9 26.1   8:39.70 mysqld

    I've ordered another gig of memory (sadly, this box can only go to 2GB) and it should be here Thursday or Friday.

    Quote Originally Posted by Rich Graves View Post
    Does most of your 3GB of mail live in a single "saved mail" folder that's hit frequently? In general, I'm telling people that it is a *good* practice to search rather than sort, but in low-memory conditions this migth be a bad thing.
    No, it's spread out over a bunch of folders. The biggest 'active' one is Sent, which is around 30,000 messages. The biggest single folder is 120,000.

  5. #5
    Rich Graves is offline Outstanding Member
    Join Date
    Jan 2007
    Location
    Minnesota
    Posts
    717
    Rep Power
    9

    Default

    Zimbra's IMAP server is definitely slower than Cyrus (and probably iPlanet), but shoudn't be nearly that bad.

    Code:
    # Time: 080115 17:52:49
    # Query_time: 9  Lock_time: 0  Rows_sent: 1  Rows_examined: 556853
    SELECT COUNT(*) FROM mboxgroup3.mail_item WHERE mailbox_id = 3 AND folder_id = 28488 AND type IN (5,16,6) AND (imap_id IS NULL OR imap_id = 0 OR imap_id > 0);
    If your biggest folder is 120,000 messages, then you shouldn't be scanning 556,853 rows. Given that you're doing that, 9 seconds is not terribly surprising. In user zimbra's environment, "mysql" is a script that sets the password. Manually run "select count(*) FROM mboxgroup3.mail_item WHERE mailbox_id = 3 AND folder_id = 28488". How many rows do you get? If you run the entire slow queries manually, are they always slow, or do repeated invocations benefit from caching effects?

    You could have a bad index, or something. Sheeri could troubleshoot this in 5 minutes. Failing that, /opt/zimbra/libexec/zmdbintegrityreport runs daily and sends email to the "admin" account. View that mailbox. Zimbra really ought to admonish people to read or forward it; there's a lot of noise from the AS/AV system, but a few useful things, too.

    If there's nothing wrong with your indices, it might be helpful to know the IMAP commands that mutt is issuing. Turn on protocol telemetry in your client if possible, else temporarily enable cleartext connections and sniff.

    In a full day, my production 4.5.10 system with 2600 users saw only 3 mysql queries taking longer than 10 seconds, and no slow queries scanning more than 100,000 rows. I have 5 users with more than 3GB mail, and at least a couple dozen folders with > 100K folders. Obviously, my hardware is a bit more robust (dual xeon 5160, 16GB RAM, FC SAN).

  6. #6
    jeffw is offline Loyal Member
    Join Date
    Oct 2007
    Posts
    88
    Rep Power
    7

    Default

    Quote Originally Posted by Rich Graves View Post
    Zimbra's IMAP server is definitely slower than Cyrus (and probably iPlanet), but shoudn't be nearly that bad.
    Yeah, all of my big mailstore experience is with the iPlanet server, which has always impressed me with it's speed (even in the face of huge huge huge mailboxes)

    Quote Originally Posted by Rich Graves View Post
    If your biggest folder is 120,000 messages, then you shouldn't be scanning 556,853 rows. Given that you're doing that, 9 seconds is not terribly surprising. In user zimbra's environment, "mysql" is a script that sets the password. Manually run "select count(*) FROM mboxgroup3.mail_item WHERE mailbox_id = 3 AND folder_id = 28488". How many rows do you get? If you run the entire slow queries manually, are they always slow, or do repeated invocations benefit from caching effects?

    Code:
    mysql> SELECT COUNT(*) FROM mboxgroup3.mail_item WHERE mailbox_id = 3 AND folder_id = 28488 AND type IN (5,16,6) AND (imap_id IS NULL OR imap_id = 0 OR imap_id > 0);
    +----------+
    | COUNT(*) |
    +----------+
    |    31623 | 
    +----------+
    1 row in set (11.89 sec)
    I think you're on to something. It's always the same folder id in the slow queries log (28488). If I run the query multiple times, I can get the query down to 6-7 seconds.

    The mailbox was listed as one of mutt's mailboxes to check for new mail. I pulled it out of that list and now mutt is much more responsive (still not excellent, but at least usable).

    Quote Originally Posted by Rich Graves View Post
    You could have a bad index, or something. Sheeri could troubleshoot this in 5 minutes. Failing that, /opt/zimbra/libexec/zmdbintegrityreport runs daily and sends email to the "admin" account. View that mailbox. Zimbra really ought to admonish people to read or forward it; there's a lot of noise from the AS/AV system, but a few useful things, too.
    Thanks for the tip about forwarding admin's mail. I didn't know stuff was going there!

    The last db report in admin's account was from the 13th and it was totally empty.

    I just tried running /opt/zimbra/libexec/zmdbintegrityreport -v and it reported no errors found.

    So, what's the best way to get the database fixed for that one mailbox?

    Thanks!

    -j

  7. #7
    Rich Graves is offline Outstanding Member
    Join Date
    Jan 2007
    Location
    Minnesota
    Posts
    717
    Rep Power
    9

    Default

    It's not broken; an empty report is a good report. (There's an open RFE to send nothing, or at least to make it clear that empty news is good news.)

    You've reached the limit of what I know about MySQL, but at least you have a place to keep looking.

    I know that there are several ways to "check for new mail" in IMAP. Zimbra supports IDLE, which is generaly considered the "right" way, if available. IDLE only seems to be supported in the 1.5 development branch of mutt.

  8. #8
    mmorse's Avatar
    mmorse is offline Moderator
    Join Date
    May 2006
    Location
    USA
    Posts
    6,242
    Rep Power
    20

    Default

    Let's intensify the logs a bit & see what IMAP commands Mutt is issuing so often and why - to trace:
    zmprov addAccountLogger user@domain.com zimbra.imap debug

    To undo run: zmprov removeAccountLogger user@domain.com zimbra.imap

    (Let us know how often Mutt is set to poll as well.)

    count(*) is a table scan in mysql, and that kinda seems a bit excessive...

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Similar Threads

  1. High CPU usage on server and client
    By jack-Z in forum Administrators
    Replies: 10
    Last Post: 01-07-2009, 10:12 PM
  2. Upgrade 4.5.1 -> 4.5.2 Network Edition on RH Failed
    By CJ.deb in forum Installation
    Replies: 7
    Last Post: 03-01-2007, 07:05 AM
  3. Replies: 2
    Last Post: 10-02-2006, 08:44 AM
  4. 3.1 on FC4 problems
    By cohnhead in forum Installation
    Replies: 8
    Last Post: 05-26-2006, 11:16 AM
  5. System-Load with M4
    By billybofh in forum Administrators
    Replies: 7
    Last Post: 02-15-2006, 02:11 AM

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •