
Thread: [SOLVED] Missing Mail and NO_SUCH_BLOB

  1. #1
    gracedman is offline Special Member
    Join Date
    May 2009
    Posts
    134
    Rep Power
    5

    [SOLVED] Missing Mail and NO_SUCH_BLOB

    Hello, all. We're having a bit of a crisis here just as we are moving from test to production. I am not an experienced mail administrator and am brand new to Zimbra. It seems we have suddenly lost two days' worth of data: mail, calendar appointments, even a complete new domain we created. The emails I see in my mailbox are emails I had deleted. I get NO_SUCH_BLOB errors when I click on them.

    We thought all was running smoothly until we looked at our zimbra.log file and realized it was 17GB in size! It was filled with errors such as:
    Code:
    May  7 22:25:56 zimbra01 zimbramon[2695]: 2695:info: zmmtaconfig: Skipping Global system configuration No data returned.
    May  7 22:25:56 zimbra01 zimbramon[2695]: 2695:info: zmmtaconfig: Skipping Configuration for server zimbra01.ssiservices.biz No data returned.
    May  7 22:25:56 zimbra01 zimbramon[2695]: 2695:info: zmmtaconfig: Sleeping...Key lookup failed.
    May  7 22:25:59 smtp01 zimbramon[2393]: 2393:info: zmmtaconfig: Skipping Global system configuration update.
    May  7 22:25:59 smtp01 zimbramon[2393]: 2393:info: zmmtaconfig: gacf ERROR: service.FAILURE (system failure: ZimbraLdapContext) (cause: javax.naming.Commu
    May  7 22:26:00 smtp01 zimbramon[2393]: 2393:info: zmmtaconfig: Skipping All Reverse Proxy URLs update.
    May  7 22:26:00 smtp01 zimbramon[2393]: 2393:info: zmmtaconfig: Skipping getAllReverseProxyURLs ERROR: service.FAILURE (system failure: ZimbraLdapContext)
    May  7 22:26:00 smtp01 zimbramon[2393]: 2393:info: zmmtaconfig: Skipping All Reverse Proxy Backends update.
    May  7 22:26:00 smtp01 zimbramon[2393]: 2393:info: zmmtaconfig: Skipping getAllReverseProxyBackends ERROR: service.FAILURE (system failure: ZimbraLdapCont
    May  7 22:26:01 smtp01 zimbramon[2393]: 2393:info: zmmtaconfig: Skipping All Memcached Servers update.
    May  7 22:26:01 smtp01 zimbramon[2393]: 2393:info: zmmtaconfig: Skipping getAllMemcachedServers ERROR: service.FAILURE (system failure: ZimbraLdapContext)
    May  7 22:26:02 zimbra01 zimbramon[2695]: 2695:info: zmmtaconfig: Skipping Local server configuration No data returned.
    May  7 22:26:02 zimbra01 zimbramon[2695]: 2695:info: zmmtaconfig: Skipping Global system configuration No data returned.
    May  7 22:26:02 zimbra01 zimbramon[2695]: 2695:info: zmmtaconfig: Skipping Configuration for server zimbra01.ssiservices.biz No data returned.
    May  7 22:26:02 zimbra01 zimbramon[2695]: 2695:info: zmmtaconfig: Sleeping...Key lookup failed.
    May  7 22:26:02 smtp01 zimbramon[2393]: 2393:info: zmmtaconfig: Skipping All MTA Authentication Target URLs update.
    May  7 22:26:02 smtp01 zimbramon[2393]: 2393:info: zmmtaconfig: Skipping getAllMtaAuthURLs ERROR: service.FAILURE (system failure: ZimbraLdapContext) (cau
    May  7 22:26:03 smtp01 zimbramon[2393]: 2393:info: zmmtaconfig: Skipping Configuration for server smtp01.ssiservices.biz update.
    May  7 22:26:03 smtp01 zimbramon[2393]: 2393:info: zmmtaconfig: gs:smtp01.ssiservices.biz ERROR: service.FAILURE (system failure: ZimbraLdapContext) (caus
    May  7 22:26:03 smtp01 zimbramon[2393]: 2393:info: zmmtaconfig: Sleeping...Key lookup failed.
    May  7 22:26:07 zimbra01 zimbramon[2695]: 2695:info: zmmtaconfig: Skipping Local server configuration No data returned.
    May  7 22:26:07 zimbra01 zimbramon[2695]: 2695:info: zmmtaconfig: Skipping Global system configuration No data returned.
    May  7 22:26:07 zimbra01 zimbramon[2695]: 2695:info: zmmtaconfig: Skipping Configuration for server zimbra01.ssiservices.biz No data returned.
    May  7 22:26:07 zimbra01 zimbramon[2695]: 2695:info: zmmtaconfig: Sleeping...Key lookup failed.
    Some searching suggested running zmfixperms. We did that but the errors seemed to persist.
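
    With a log this large, it can help to count how often each distinct message appears per host before reading it line by line. The following is a minimal Python sketch, not a Zimbra tool: `summarize` and `top_messages` are hypothetical helper names, the regular expression is written against the syslog line shape in the excerpt above, and the default path `/var/log/zimbra.log` is an assumption about where the file lives.

```python
import re
from collections import Counter

# Matches syslog lines shaped like the excerpt above, e.g.
# "May  7 22:25:56 zimbra01 zimbramon[2695]: 2695:info: zmmtaconfig: ..."
LINE_RE = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]{8}\s+(?P<host>\S+)\s+(?P<proc>[\w.-]+)\[\d+\]:\s+(?P<msg>.*)$"
)


def summarize(lines):
    """Count occurrences of each (host, process, message) triple."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a syslog line in the expected shape
        # Drop a leading "2695:info: " style prefix so identical
        # messages from different PIDs are grouped together.
        msg = re.sub(r"^\d+:\w+:\s*", "", m.group("msg"))
        counts[(m.group("host"), m.group("proc"), msg)] += 1
    return counts


def top_messages(path="/var/log/zimbra.log", limit=20):
    """Return the most frequent messages; the default path is an assumption."""
    # Iterating over the file object reads it line by line, so a
    # multi-gigabyte log is never loaded into memory at once.
    with open(path, errors="replace") as fh:
        return summarize(fh).most_common(limit)
```

    Printing the result of `top_messages()` should show at a glance whether one host or one message accounts for most of the volume.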

    We then noticed the cron log was also very large. It was filled with errors such as:
    Code:
    May  5 14:12:01 zimbra01 crond[9123]: Cannot make/remove an entry for the specified session
    May  5 14:12:01 zimbra01 crond[9123]: CRON (zimbra) ERROR: failed to open PAM security session: Success
    May  5 14:12:01 zimbra01 crond[9123]: CRON (zimbra) ERROR: cannot set security context
    May  5 14:14:01 zimbra01 crond[12627]: Cannot make/remove an entry for the specified session
    May  5 14:14:01 zimbra01 crond[12627]: CRON (zimbra) ERROR: failed to open PAM security session: Success
    May  5 14:14:01 zimbra01 crond[12627]: CRON (zimbra) ERROR: cannot set security context
    May  5 14:15:01 zimbra01 crond[14336]: Cannot make/remove an entry for the specified session
    May  5 14:15:01 zimbra01 crond[14336]: CRON (zimbra) ERROR: failed to open PAM security session: Success
    May  5 14:15:01 zimbra01 crond[14336]: CRON (zimbra) ERROR: cannot set security context
    May  5 14:16:01 zimbra01 crond[16038]: Cannot make/remove an entry for the specified session
    May  5 14:16:01 zimbra01 crond[16038]: CRON (zimbra) ERROR: failed to open PAM security session: Success
    May  5 14:16:01 zimbra01 crond[16038]: CRON (zimbra) ERROR: cannot set security context
    May  5 14:18:01 zimbra01 crond[18985]: Cannot make/remove an entry for the specified session
    May  5 14:18:01 zimbra01 crond[18985]: CRON (zimbra) ERROR: failed to open PAM security session: Success
    May  5 14:18:01 zimbra01 crond[18985]: CRON (zimbra) ERROR: cannot set security context
    We then listed the zimbra user crontab and ran the commands manually as the zimbra user to see which one was failing. None failed.

    We then restarted the server and it did not seem to shut down cleanly. The zimbra service said it failed on shutdown. The server did appear to come up clean. However, when I went into my email, I was shocked to see two days missing! I see such phenomena in the logs but related to migration.

    I really do not know where to begin, but we need to get these two days' worth of data back. We have not yet implemented the archiving feature. We are running GA16 on CentOS 5.3 within a VServer. What do we do? Thanks - John
    Last edited by gracedman; 05-07-2009 at 08:43 PM. Reason: Add post icon
    www.spiritualoutreach.com
    Making Christianity intelligible to secular society

  2. #2
    gracedman is offline Special Member
    Join Date
    May 2009
    Posts
    134
    Rep Power
    5


    As I examine this further, it is as if the whole world has gone back two days: account login times, missing created items, the message store. When I go to /opt/zimbra/store and look at the file dates, I see the newest messages and messages from two days and earlier - nothing in-between. This is most disconcerting in that I'd like to both recover the old data but, most importantly, ensure this never happens when we move to full production. I'm sure it must be something we've done but I do not know what.
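
    One way to confirm a gap like this is to count the files under the store by the day of their modification time and list the days that have none. This is only a sketch under stated assumptions: `files_per_day` and `missing_days` are hypothetical names, and modification time is assumed to be a fair proxy for when a message was delivered.

```python
import os
from collections import Counter
from datetime import date, timedelta


def files_per_day(root):
    """Count files under root by the calendar day of their mtime."""
    counts = Counter()
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            try:
                mtime = os.stat(os.path.join(dirpath, name)).st_mtime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            counts[date.fromtimestamp(mtime)] += 1
    return counts


def missing_days(counts):
    """List days between the oldest and newest file that have no files."""
    if not counts:
        return []
    first, last = min(counts), max(counts)
    gaps = []
    for offset in range(1, (last - first).days):
        day = first + timedelta(days=offset)
        if day not in counts:
            gaps.append(day)
    return gaps
```

    Calling `missing_days(files_per_day("/opt/zimbra/store"))` as a user that can read the store would list the empty days. An empty day is only suspicious on a system that normally receives mail every day.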

    There appear to be no backups, nor are there server statistics. I assume this must be related to the failed cron jobs. These are the commands I believe I ran manually to troubleshoot the cron errors, assuming they'd be safe if cron was running them. I did them all as the zimbra user:

    /opt/zimbra/libexec/zmqueuelog

    /opt/zimbra/bin/zmtrainsa >> /opt/zimbra/log/spamtrain.log

    /opt/zimbra/bin/zmtrainsa --cleanup >> /opt/zimbra/log/spamtrain.log

    ls /opt/zimbra/data/dspam/data/z/i/zimbra/zimbra.sig (which returned file not found)

    ls /opt/zimbra/data/dspam/system.log (which returned file not found)

    ls /opt/zimbra/data/dspam/data/z/i/zimbra/zimbra.log (which returned file not found)

    /opt/zimbra/libexec/sa-learn -p /opt/zimbra/conf/salocal.cf --dbpath /opt/zimbra/data/amavisd/.spamassassin --siteconfigpath /opt/zimbra/conf/spamassassin --force-expire --sync

    find /opt/zimbra/data/amavisd/tmp -maxdepth 1 -type d -name 'amavis-*' -mtime +1 -exec rm -rf {} \;

    find /opt/zimbra/data/amavisd/quarantine -type f -mtime +7 -exec rm -f {} \;

    Any help would be greatly appreciated - John

  3. #3
    gracedman is offline Special Member
    Join Date
    May 2009
    Posts
    134
    Rep Power
    5


    We still have not solved this but did identify and resolve a related issue. Along with the failed cron jobs, we kept seeing this in /var/log/secure:

    Code:
    May  8 04:55:01 zimbra01 crond[14829]: pam_loginuid(crond:session): set_loginuid failed
    Apparently, pam_loginuid needs to write to /proc, which is not available for write inside a vserver. We knew we had to disable this for sshd, but it must be called elsewhere as well, including cron. We thus changed them all with:

    /bin/sed -i -e "s/^session.*required.*pam_loginuid.so/# session\trequired\tpam_loginuid.so/g" /etc/pam.d/*

    Now we have statistics and, I assume, backups, but we still have our huge problem of missing data.
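
    Before an in-place edit like the sed command in this post, it may be worth listing which files under /etc/pam.d actually contain an active pam_loginuid line. The following is a minimal read-only sketch in Python: `find_loginuid_lines` is a hypothetical helper, and the pattern mirrors the sed expression except that the dot in `pam_loginuid.so` is escaped.

```python
import re
from pathlib import Path

# Mirrors the sed pattern: an uncommented "session ... required ...
# pam_loginuid.so" line. Lines already commented out do not match.
ACTIVE = re.compile(r"^session.*required.*pam_loginuid\.so")


def find_loginuid_lines(pam_dir="/etc/pam.d"):
    """Return (file name, line number, line) for each active match."""
    hits = []
    for path in sorted(Path(pam_dir).iterdir()):
        if not path.is_file():
            continue
        try:
            text = path.read_text(errors="replace")
        except OSError:
            continue  # unreadable file; nothing to report
        for lineno, line in enumerate(text.splitlines(), start=1):
            if ACTIVE.match(line):
                hits.append((path.name, lineno, line))
    return hits
```

    Running it once before the change and once after gives a simple check that every file was covered; an empty list afterwards means no active line remains.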

  4. #4
    bdial is offline Moderator
    Join Date
    Jul 2007
    Location
    Baltimore
    Posts
    1,649
    Rep Power
    10


    we had an issue recently where a bunch of accounts lost some .msg files around the same time. i can tell you how i fixed it which was a manual process that took a little while and would not be practical if you have a ton of data.

    first run zmblobchk -u <user>

    it may tell you that there are some messages with metadata but no file on disk. for me this was less than 10 for each user that had the problem.

    i then restored the account to the time frame right before i know the problem happened.

    then i copied the msg files across, for example from

    /opt/zimbra/store/0/388/msg/9/ to /opt/zimbra/store/0/70/msg/9/

    this assumes that mailboxid 388 is the restored account and mailboxid 70 is the original account and all the missing messages were under the 9/ dir

    this was really helpful

    Ajcody-Notes-No-Such-Blob - Zimbra :: Wiki
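
    The manual copy described in this post could be sketched roughly as follows. `copy_missing_blobs` is a hypothetical helper, not part of Zimbra. It assumes the restored and original directories hold `.msg` files whose names are expected to match, as the manual copy implies. It defaults to a dry run, never overwrites an existing file, and does not adjust file ownership.

```python
import shutil
from pathlib import Path


def copy_missing_blobs(restored_dir, original_dir, dry_run=True):
    """Copy .msg files present in restored_dir but absent from original_dir.

    Returns the names of the files that were (or, in a dry run,
    would be) copied.
    """
    restored, original = Path(restored_dir), Path(original_dir)
    copied = []
    for src in sorted(restored.glob("*.msg")):
        dst = original / src.name
        if dst.exists():
            continue  # never overwrite a blob the live mailbox already has
        if not dry_run:
            shutil.copy2(src, dst)  # copy2 also preserves timestamps
        copied.append(src.name)
    return copied
```

    With the paths from the post, a first pass would be `copy_missing_blobs("/opt/zimbra/store/0/388/msg/9", "/opt/zimbra/store/0/70/msg/9")`, reviewing the returned names before repeating the call with `dry_run=False`.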

  5. #5
    gracedman is offline Special Member
    Join Date
    May 2009
    Posts
    134
    Rep Power
    5

    Missing the bigger problem

    Thanks. This is helpful, although we merely deleted the mails giving the missing blob errors from the user interface (thankfully the scope was limited, as we had not yet released to full production).

    However the missing mail is much more of a problem. For one, we'd like to recover it as there were some important messages (and backup was not running as explained in the reply about cron issues, CentOS, and vserver). Most importantly, we'd like to understand what happened so it does not happen in production. Although it is probably our error, it is frightful that the product is not resilient enough to protect us against our own stupidity. I know that sounds a bit unfair but email is so critical we need to be protected from ourselves!

    Any idea from anyone on why this happened? Thanks - John

  6. #6
    phoenix is online now Zimbra Consultant & Moderator
    Join Date
    Sep 2005
    Location
    Vannes, France
    Posts
    23,201
    Rep Power
    56


    If your email is critical then you should run it on dedicated hardware. Using a vserver is unsupported and likely to cause problems; if you really must run it in a VM, then use ESX, ESXi or Xenserver.
    Regards


    Bill


    Acompli: A new adventure for Co-Founder KevinH.

  7. #7
    gracedman is offline Special Member
    Join Date
    May 2009
    Posts
    134
    Rep Power
    5


    Thanks, but I would disagree. We started with a VM while the environment is small. We assumed we would grow into a dedicated system and then a cluster. We soon realized we would be better off growing onto a clustered virtualization host for greater stability, rather than a stand-alone system, until we are ready for a dedicated cluster. The VMs also give us other advantages such as portability.

    VServer itself is an interesting technology. Having been a heavy, early Xen user and now a regular KVM user, both of which are fine products, VServer has advantages over both in certain situations of which this is one. It does require understanding how the applications work in order to tune the environment but understanding what one is using is never a bad idea. At some point, we hope to share our internal documentation on how to do this, e.g., we found we had to enable the loopback masking nflag and disable the Single IP Special Casing. We also hit the loginuid issue which is specific to the RedHat family of products.

    At this point, we seem to be working fine other than an LDAP warning we are troubleshooting right now (amavis - NOTICE: do_search: trying again: LDAP_OPERATIONS_ERROR) and with a much better utilization of resources than we would have had under Xen or KVM.

    This mail loss error smells like something unrelated to VServer. I'm sure it is something we did in our experimenting and troubleshooting. I just want to understand what we did so we don't do it again and can report to the Zimbra team that there is a way users can cause serious damage to their systems without warning from Zimbra. Thanks again - John

  8. #8
    Rich Graves is offline Outstanding Member
    Join Date
    Jan 2007
    Location
    Minnesota
    Posts
    717
    Rep Power
    9


    What are you using for storage? You could have lost two days quite literally, like reverting /opt/zimbra/store to a two-day-old snapshot.

  9. #9
    phoenix is online now Zimbra Consultant & Moderator
    Join Date
    Sep 2005
    Location
    Vannes, France
    Posts
    23,201
    Rep Power
    56


    Originally posted by gracedman:
    VServer itself is an interesting technology. Having been a heavy, early Xen user and now a regular KVM user, both of which are fine products, VServer has advantages over both in certain situations of which this is one. It does require understanding how the applications work in order to tune the environment but understanding what one is using is never a bad idea. At some point, we hope to share our internal documentation on how to do this, e.g., we found we had to enable the loopback masking nflag and disable the Single IP Special Casing. We also hit the loginuid issue which is specific to the RedHat family of products.
    I fully understand what vserver and Xen are (I wasn't actually talking about this 'xen', and Xenserver is superior to that version), neither of which is supported, and you are actually using an unsupported kernel (it's not even supported by CentOS). I'm not saying it won't work but, as I said earlier, it's an unsupported platform: use it at your own risk. I'll say it again: the comment 'email is critical' and the use of an unsupported vserver and kernel don't seem to go together in my mind.
    Regards


    Bill



  10. #10
    gracedman is offline Special Member
    Join Date
    May 2009
    Posts
    134
    Rep Power
    5


    Thanks. We are using a Z200 from Pogo Linux, which is running Nexenta (in effect, a front end to ZFS on OpenSolaris). The drives are addressed as RAID0 across an array of iSCSI drives backed by ZFS zvols.

    We didn't do anything to the disk subsystem but we did manually run the cron jobs as described - John
