Page 2 of 3 FirstFirst 123 LastLast
Results 11 to 20 of 21

Thread: [SOLVED] Missing Mail and NO_SUCH_BLOB

  1. #11
    gracedman is offline Special Member
    Join Date
    May 2009
    Posts
    134
    Rep Power
    6

    Default Appears to be a database issue

    Hello, all. We are making some progress toward understanding what has happened but still have not fixed the problem. It does not appear that this problem is related to our manually running the cron jobs but is rather a database issue. We have been able to reproduce the issue.

    The first time was after we noticed we did not have a clean shutdown of the system. The second, we crashed the system (we are still pre-production and doing some experimentation). After the crash, we saw the same symptoms: It was as if mail had rolled back to a previous point, newly created domains and users were missing, existing users had deleted mail in their inboxes and missing new mails.

    When we check the message store, it appears the messages are there but they are not showing up in the database. We also are seeing error messages in the database, e.g., this is when we had our first problems:

    Code:
    090507 22:27:02  mysqld started
    InnoDB: The log sequence number in ibdata files does not match
    InnoDB: the log sequence number in the ib_logfiles!
    090507 22:27:03  InnoDB: Database was not shut down normally!
    InnoDB: Starting crash recovery.
    InnoDB: Reading tablespace information from the .ibd files...
    InnoDB: Restoring possible half-written data pages from the doublewrite
    InnoDB: buffer...
    090507 22:27:05  InnoDB: Started; log sequence number 0 3354455
    090507 22:27:05 [Note] /opt/zimbra/mysql/libexec/mysqld: ready for connections.
    Followed shortly thereafter with:
    Code:
    090507 22:52:06 [Note] /opt/zimbra/mysql/libexec/mysqld: ready for connections.
    Version: '5.0.67-log'  socket: '/opt/zimbra/db/mysql.sock'  port: 7306  Source distribution
    090507 22:59:50090507 22:59:50 [ERROR] Cannot find table mboxgroup16/appointment from the internal data dictionary
    of InnoDB though the .frm file for the table exists. Maybe you
    have deleted and recreated InnoDB data files but have forgotten
    to delete the corresponding .frm files of InnoDB tables, or you
    have moved .frm files to another database?
    See http://dev.mysql.com/doc/refman/5.0/en/innodb-troubleshooting.html
    how you can resolve the problem.
    
    090507 22:59:50090507 22:59:50 [ERROR] Cannot find table mboxgroup16/appointment from the internal data dictionary
    of InnoDB though the .frm file for the table exists. Maybe you
    have deleted and recreated InnoDB data files but have forgotten
    to delete the corresponding .frm files of InnoDB tables, or you
    have moved .frm files to another database?
    See http://dev.mysql.com/doc/refman/5.0/en/innodb-troubleshooting.html
    how you can resolve the problem.
    We see similar messages tonight. Running zmdbintegrityreport -v gave similar messages and -r did not repair them.

    We next followed the database repair procedures described in Mysql Crash Recovery - Zimbra :: Wiki. Now, zmdbintegrityreport -v comes up clean but we still have the same problem.

    How does one get the messages from the message stored back into the database and recovery the other information? I'm suspecting but hoping that I will not hear the only way is restoration from backup and losing anything since the last backup. Thanks - John
    www.spiritualoutreach.com
    Making Christianity intelligible to secular society

  2. #12
    phoenix is online now Zimbra Consultant & Moderator
    Join Date
    Sep 2005
    Location
    Vannes, France
    Posts
    23,473
    Rep Power
    56

    Default

    Quote Originally Posted by gracedman View Post
    How does one get the messages from the message stored back into the database and recovery the other information? I'm suspecting but hoping that I will not hear the only way is restoration from backup and losing anything since the last backup. Thanks - John
    You can use zmlmtpinject. How much RAM is allocated this Zimbra server?

    Please update your forum profile with the output of the following:

    Code:
    zmcontrol -v
    Regards


    Bill


    Acompli: A new adventure for Co-Founder KevinH.

  3. #13
    gracedman is offline Special Member
    Join Date
    May 2009
    Posts
    134
    Rep Power
    6

    Default

    The server has up to 16GB available to it. It is using a little less than 3 right now.

    zmcontrol -v yields:

    Release 5.0.16_GA_2921.RHEL5_64_20090429021719 CentOS5_64 NETWORK edition
    I'll study zmlmtpinject a little more to make sure I don't misuse it.

    Thanks - John
    www.spiritualoutreach.com
    Making Christianity intelligible to secular society

  4. #14
    gracedman is offline Special Member
    Join Date
    May 2009
    Posts
    134
    Rep Power
    6

    Default

    Any thoughts about why a crash or unclean shutdown would cause this "data reversion"? I can't imagine it is the case but it's almost as if the transaction for the last several days changes was not committed to the database. Thanks - John
    www.spiritualoutreach.com
    Making Christianity intelligible to secular society

  5. #15
    gracedman is offline Special Member
    Join Date
    May 2009
    Posts
    134
    Rep Power
    6

    Default

    The reinjection seemed to work well. We put together a little script to facilitate it in case it helps anyone else:

    Code:
    #!/bin/bash
    if [ $# -ne 1 ];then
            echo "Usage: $(basename $0) <Message store ID number>"
            exit 10
    fi
    if [ ! -d /opt/zimbra/store/0/${1}/msg/0 ];then
            echo "Could not find message store for ${1}"
            exit 11
    fi
    
    ZIMDIR="/opt/zimbra"
    cd ${ZIMDIR}/store/0/${1}/msg/0/
    for MSG in $(ls)
    do
            TO=$(sed -n -e "1,/Original Message/ s/^To: \(.*\)/\1/p" ${MSG})
            FROM=$(sed -n -e "1,/Original Message/ s/^From: \(.*\)/\1/p" ${MSG})
            if [ -n "$(echo ${TO} | grep '<')" ];then
                    TO=${TO##*<}
                    TO=${TO%%>*}
            fi
            if [ -n "$(echo ${FROM} | grep '<')" ];then
                    FROM=${FROM##*<}
                    FROM=${FROM%%>*}
            fi
            ${ZIMDIR}/bin/zmlmtpinject -r ${TO} -s ${FROM} ${MSG}
    done
    We're still perplexed about why we had the problem in the first place. Any suggestions? Thanks - John
    www.spiritualoutreach.com
    Making Christianity intelligible to secular society

  6. #16
    gracedman is offline Special Member
    Join Date
    May 2009
    Posts
    134
    Rep Power
    6

    Default

    We're slowly working our way through this issue. It occurs when we shut down the underlying vserver host. Despite our creating a /etc/vservers/zimbra1/apps/vshelper/sync-timeout file and setting it to 900 (seconds) to give the system plenty of time to shutdown gracefully, it is not. If we watch the shutdown sequence, it is considering the zimbra server stopped long before it possibly could be. Thus, in effect, we are crashing the zimbra server.

    We have escalated to the vserver list to determine how to solve the problem from the vserver side. However, that doesn't mean there will be other reasons why the zimbra server might crash in the future. I'm a bit surprised by how much damage is done and how consistently. Are there settings which can be tweaked to make zimbra a little more resistant to damage during a crash? Thanks - John
    www.spiritualoutreach.com
    Making Christianity intelligible to secular society

  7. #17
    gracedman is offline Special Member
    Join Date
    May 2009
    Posts
    134
    Rep Power
    6

    Default

    I wonder if the severity of our crashes are compounded by the fact we have oodles of memory right now and very little activity. Perhaps we are being bitten by either this (from Performance Tuning Guidelines for Large Deployments - Zimbra :: Wiki):

    "Innodb writes out pages in its cache after a certain percent of pages are dirty. The default is 90%. This default setting will minimize the total number of writes, but it would cause a major bottleneck in system performance when 90% is reached and database becomes unresponsive because the disk system is writing out all those changes in one shot. We recommend you set the dirty flush ratio to 10%, which does cause a lot more net total IO, but will avoids spiky write load. "

    or this:

    "MySQL is configured to store its data in files, and the Linux kernel buffers file IO. The buffering provided by the kernel is not useful to innodb at all because innodb is making its own paging decisions - the kernel gets in the way. Bypass the kernel with"

    As I am not eager to keep mangling my system, I am hesitant to test these hypotheses! Can anyone with more experience furnish some guidance?

    As someone new to this forum, I do hope it is not bad etiquette to keep such a running post. I am hoping it might help some other poor slob who faces the same issues in the future. Thanks - John
    www.spiritualoutreach.com
    Making Christianity intelligible to secular society

  8. #18
    gracedman is offline Special Member
    Join Date
    May 2009
    Posts
    134
    Rep Power
    6

    Default

    We are still struggling with this issue but have narrowed it down quite a bit. We noticed that our data loss is always the data between 4:02AM the morning after zimbra was last restarted and the time of the next zimbra service restart. This happens whether we restart the server, restart the zimbra service, or simply do a zmstorectl restart.

    We also noticed that all logging stops at this same 4:02AM point in time. No entries in zimbra.log, mailbox.log, logger_myslow.log, etc.

    This is also the time the logrotate daily cron job kicks in. We did notice a problem in /etc/logrotate.d/zimbra. The postrotate routine for zimbra.log tries to restart syslogd but CentOS 5.3 (our platform) uses rsyslogd. Once we edited that call, zimbra.log was able to continually log past 4:02AM but it still did not fix the problem.

    We removed the zimbra logrotate file from /etc/logrotate.d/ last night but were able to reproduce the data loss this morning so it does not appear to be something within the zimbra file. We are also told by zimbra tech support that data missing from the logs would not provoke a data rollback / loss / corruption upon service restart so we are assuming this log failure is a parallel symptom rather than a cause.

    We will begin removing files from logrotate.d in an attempt to find what is the mystic coincidence between this failure and the cron.daily jobs. Any suggestions about what might be happening? Thanks - John
    www.spiritualoutreach.com
    Making Christianity intelligible to secular society

  9. #19
    gracedman is offline Special Member
    Join Date
    May 2009
    Posts
    134
    Rep Power
    6

    Default data loss and log stop correlation

    I believe we have found the correlation between the data loss and the log problems but have not yet found the root cause. I believe data is being lost from the database because the logs are not being written after 4:02AM, specifically the innodb log file.

    I am guessing that when mysql restarts and finds the data in the database does not match the data in the innodb log file (because we have ceased writing to it), it rolls back the database. I am deducing this from this mysql_error.log message:

    090605 06:32:19 mysqld started
    InnoDB: The log sequence number in ibdata files does not match
    InnoDB: the log sequence number in the ib_logfiles!
    090605 6:32:20 InnoDB: Database was not shut down normally!
    InnoDB: Starting crash recovery.
    InnoDB: Reading tablespace information from the .ibd files...
    InnoDB: Restoring possible half-written data pages from the doublewrite
    InnoDB: buffer...
    090605 6:32:23 InnoDB: Started; log sequence number 0 19382319
    090605 6:32:23 [Note] /opt/zimbra/mysql/libexec/mysqld: ready for connections.
    Version: '5.0.67-log' socket: '/opt/zimbra/db/mysql.sock' port: 7306 Source distribution
    090605 6:53:05 [Note] /opt/zimbra/mysql/libexec/mysqld: Normal shutdown

    This is after recovering the database and waiting for another 4:02 to pass. Our challenge is still to find out why logging stops and stops only for zimbra. All other logging is working fine past 4:02 AM.

    Would it be the case that mysql would roll back to 4:02AM if the innodb log contained no data later than 4:02?

    We have not verified that innodb is not updated after 4:02 as we just put these issues together. We'll verify that in the morning. 4:02 is the exact time the cron.daily jobs start. Any clues as to why our logging stops at 4:02AM and stays disabled until Zimbra is restarted? Thanks - John
    www.spiritualoutreach.com
    Making Christianity intelligible to secular society

  10. #20
    gracedman is offline Special Member
    Join Date
    May 2009
    Posts
    134
    Rep Power
    6

    Default Offending process isolated

    We have finally narrowed down the problem to the offending process. There is some kind of conflict between mysql/innodb and the vserver hashify function. The problem described previously occurs every time hashify is run either from a cron job (4:00AM every day) or manually.

    Since this thread contains a lot of accumulated extraneous information, I'll open a new thread to discuss this specific problem.
    www.spiritualoutreach.com
    Making Christianity intelligible to secular society

Page 2 of 3 FirstFirst 123 LastLast

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •