#1
05-07-2009, 09:42 PM
Special Member
Posts: 130
[SOLVED] Missing Mail and NO_SUCH_BLOB

Hello, all. We're having a bit of a crisis here just as we are moving from test to production. I am not an experienced mail administrator and am brand new to Zimbra. It seems we have suddenly lost two days' worth of data: mail, calendar appointments, even a complete new domain we created. The emails I see in my mailbox are emails I deleted. I get NO_SUCH_BLOB errors when I click on them.

We thought all was running smoothly until we looked at our zimbra.log file and realized it was 17GB in size! It was filled with errors such as:
Code:
May  7 22:25:56 zimbra01 zimbramon[2695]: 2695:info: zmmtaconfig: Skipping Global system configuration No data returned.
May  7 22:25:56 zimbra01 zimbramon[2695]: 2695:info: zmmtaconfig: Skipping Configuration for server zimbra01.ssiservices.biz No data returned.
May  7 22:25:56 zimbra01 zimbramon[2695]: 2695:info: zmmtaconfig: Sleeping...Key lookup failed.
May  7 22:25:59 smtp01 zimbramon[2393]: 2393:info: zmmtaconfig: Skipping Global system configuration update.
May  7 22:25:59 smtp01 zimbramon[2393]: 2393:info: zmmtaconfig: gacf ERROR: service.FAILURE (system failure: ZimbraLdapContext) (cause: javax.naming.Commu
May  7 22:26:00 smtp01 zimbramon[2393]: 2393:info: zmmtaconfig: Skipping All Reverse Proxy URLs update.
May  7 22:26:00 smtp01 zimbramon[2393]: 2393:info: zmmtaconfig: Skipping getAllReverseProxyURLs ERROR: service.FAILURE (system failure: ZimbraLdapContext)
May  7 22:26:00 smtp01 zimbramon[2393]: 2393:info: zmmtaconfig: Skipping All Reverse Proxy Backends update.
May  7 22:26:00 smtp01 zimbramon[2393]: 2393:info: zmmtaconfig: Skipping getAllReverseProxyBackends ERROR: service.FAILURE (system failure: ZimbraLdapCont
May  7 22:26:01 smtp01 zimbramon[2393]: 2393:info: zmmtaconfig: Skipping All Memcached Servers update.
May  7 22:26:01 smtp01 zimbramon[2393]: 2393:info: zmmtaconfig: Skipping getAllMemcachedServers ERROR: service.FAILURE (system failure: ZimbraLdapContext)
May  7 22:26:02 zimbra01 zimbramon[2695]: 2695:info: zmmtaconfig: Skipping Local server configuration No data returned.
May  7 22:26:02 zimbra01 zimbramon[2695]: 2695:info: zmmtaconfig: Skipping Global system configuration No data returned.
May  7 22:26:02 zimbra01 zimbramon[2695]: 2695:info: zmmtaconfig: Skipping Configuration for server zimbra01.ssiservices.biz No data returned.
May  7 22:26:02 zimbra01 zimbramon[2695]: 2695:info: zmmtaconfig: Sleeping...Key lookup failed.
May  7 22:26:02 smtp01 zimbramon[2393]: 2393:info: zmmtaconfig: Skipping All MTA Authentication Target URLs update.
May  7 22:26:02 smtp01 zimbramon[2393]: 2393:info: zmmtaconfig: Skipping getAllMtaAuthURLs ERROR: service.FAILURE (system failure: ZimbraLdapContext) (cau
May  7 22:26:03 smtp01 zimbramon[2393]: 2393:info: zmmtaconfig: Skipping Configuration for server smtp01.ssiservices.biz update.
May  7 22:26:03 smtp01 zimbramon[2393]: 2393:info: zmmtaconfig: gs:smtp01.ssiservices.biz ERROR: service.FAILURE (system failure: ZimbraLdapContext) (caus
May  7 22:26:03 smtp01 zimbramon[2393]: 2393:info: zmmtaconfig: Sleeping...Key lookup failed.
May  7 22:26:07 zimbra01 zimbramon[2695]: 2695:info: zmmtaconfig: Skipping Local server configuration No data returned.
May  7 22:26:07 zimbra01 zimbramon[2695]: 2695:info: zmmtaconfig: Skipping Global system configuration No data returned.
May  7 22:26:07 zimbra01 zimbramon[2695]: 2695:info: zmmtaconfig: Skipping Configuration for server zimbra01.ssiservices.biz No data returned.
May  7 22:26:07 zimbra01 zimbramon[2695]: 2695:info: zmmtaconfig: Sleeping...Key lookup failed.
Some searching suggested running zmfixperms. We did that but the errors seemed to persist.

We then noticed the cron log was also very large. It was filled with errors such as:
Code:
May  5 14:12:01 zimbra01 crond[9123]: Cannot make/remove an entry for the specified session
May  5 14:12:01 zimbra01 crond[9123]: CRON (zimbra) ERROR: failed to open PAM security session: Success
May  5 14:12:01 zimbra01 crond[9123]: CRON (zimbra) ERROR: cannot set security context
May  5 14:14:01 zimbra01 crond[12627]: Cannot make/remove an entry for the specified session
May  5 14:14:01 zimbra01 crond[12627]: CRON (zimbra) ERROR: failed to open PAM security session: Success
May  5 14:14:01 zimbra01 crond[12627]: CRON (zimbra) ERROR: cannot set security context
May  5 14:15:01 zimbra01 crond[14336]: Cannot make/remove an entry for the specified session
May  5 14:15:01 zimbra01 crond[14336]: CRON (zimbra) ERROR: failed to open PAM security session: Success
May  5 14:15:01 zimbra01 crond[14336]: CRON (zimbra) ERROR: cannot set security context
May  5 14:16:01 zimbra01 crond[16038]: Cannot make/remove an entry for the specified session
May  5 14:16:01 zimbra01 crond[16038]: CRON (zimbra) ERROR: failed to open PAM security session: Success
May  5 14:16:01 zimbra01 crond[16038]: CRON (zimbra) ERROR: cannot set security context
May  5 14:18:01 zimbra01 crond[18985]: Cannot make/remove an entry for the specified session
May  5 14:18:01 zimbra01 crond[18985]: CRON (zimbra) ERROR: failed to open PAM security session: Success
May  5 14:18:01 zimbra01 crond[18985]: CRON (zimbra) ERROR: cannot set security context
We then listed the zimbra user crontab and ran the commands manually as the zimbra user to see which one was failing. None failed.

We then restarted the server and it did not seem to shut down cleanly. The zimbra service said it failed on shutdown. The server did appear to come up clean. However, when I went into my email, I was shocked to see two days missing! I see such phenomena in the logs but related to migration.

I really do not know where to begin, but we need to get these two days' worth of data back. We have not yet implemented the archiving feature. We are running GA16 on CentOS 5.3 within a VServer. What do we do? Thanks - John
__________________
www.spiritualoutreach.com
Making Christianity intelligible to secular society

Last edited by gracedman; 05-07-2009 at 09:43 PM.. Reason: Add post icon
#2
05-08-2009, 02:49 AM
Special Member
Posts: 130

As I examine this further, it is as if the whole world has gone back two days: account login times, missing created items, the message store. When I go to /opt/zimbra/store and look at the file dates, I see the newest messages and messages from two days ago and earlier, with nothing in between. This is most disconcerting: I'd like to recover the old data but, most importantly, to ensure this never happens when we move to full production. I'm sure it must be something we've done, but I do not know what.
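One way to make a gap like this visible is to count message files per modification day. The following is only a sketch: the function name `blobs_per_day` is hypothetical, it assumes GNU find (for `-printf`), and it assumes the blobs carry the `.msg` extension mentioned later in this thread.

```shell
#!/bin/sh
# Count message blobs per modification day under a store directory.
# Assumption: GNU find is available (-printf is not POSIX).
blobs_per_day() {
    store_dir=$1
    find "$store_dir" -type f -name '*.msg' -printf '%TY-%Tm-%Td\n' \
        | sort \
        | uniq -c \
        | awk '{ print $2, $1 }'    # print "date count" instead of "count date"
}

# Example (path taken from the post above):
#   blobs_per_day /opt/zimbra/store
```

A day that is missing from the output, while the days on either side are present, would correspond to the "nothing in between" described above.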

There appear to be no backups, nor are there server statistics. I assume this must be related to the failed cron jobs. These are the commands I believe I ran manually to troubleshoot the cron errors, assuming they'd be safe if cron was running them. I ran them all as the zimbra user:

/opt/zimbra/libexec/zmqueuelog

/opt/zimbra/bin/zmtrainsa >> /opt/zimbra/log/spamtrain.log

/opt/zimbra/bin/zmtrainsa --cleanup >> /opt/zimbra/log/spamtrain.log

ls /opt/zimbra/data/dspam/data/z/i/zimbra/zimbra.sig (which returned file not found)

ls /opt/zimbra/data/dspam/system.log (which returned file not found)

ls /opt/zimbra/data/dspam/data/z/i/zimbra/zimbra.log (which returned file not found)

/opt/zimbra/libexec/sa-learn -p /opt/zimbra/conf/salocal.cf --dbpath /opt/zimbra/data/amavisd/.spamassassin --siteconfigpath /opt/zimbra/conf/spamassassin --force-expire --sync

find /opt/zimbra/data/amavisd/tmp -maxdepth 1 -type d -name 'amavis-*' -mtime +1 -exec rm -rf {} \;

find /opt/zimbra/data/amavisd/quarantine -type f -mtime +7 -exec rm -f {} \;
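The last two commands delete files, so when running them by hand it may be safer to preview their matches first. A minimal sketch, assuming GNU find; the function names are hypothetical, and the tests are the same as in the commands above with `-print` in place of `-exec rm`:

```shell
#!/bin/sh
# Dry-run versions of the two cleanup commands: same match conditions,
# but -print instead of -exec rm, so nothing is deleted.

# Lists amavis-* work directories older than one day (top level only).
preview_amavis_tmp_cleanup() {
    find "$1" -maxdepth 1 -type d -name 'amavis-*' -mtime +1 -print
}

# Lists quarantined files older than seven days.
preview_quarantine_cleanup() {
    find "$1" -type f -mtime +7 -print
}

# Example (paths taken from the commands above):
#   preview_amavis_tmp_cleanup /opt/zimbra/data/amavisd/tmp
#   preview_quarantine_cleanup /opt/zimbra/data/amavisd/quarantine
```

Neither pattern reaches into /opt/zimbra/store, which is consistent with these commands not being an obvious cause of the missing mail.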

Any help would be greatly appreciated - John
#3
05-08-2009, 03:25 AM
Special Member
Posts: 130

We still have not solved this but did identify and resolve a related issue. Along with the failed cron jobs, we kept seeing this in /var/log/secure:

Code:
May  8 04:55:01 zimbra01 crond[14829]: pam_loginuid(crond:session): set_loginuid failed
Apparently, pam_loginuid needs to write to /proc, which is not writable inside a vserver. We knew we had to disable this for sshd, but it must be called elsewhere as well, including cron. We thus changed them all with:

/bin/sed -i -e "s/^session.*required.*pam_loginuid.so/# session\trequired\tpam_loginuid.so/g" /etc/pam.d/*

Now we have statistics and, I assume, backups, but we still have our huge problem of missing data.
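Because the sed command above edits every file under /etc/pam.d in place, it can be worth previewing its effect first. A minimal sketch, assuming GNU sed (which interprets `\t` in the replacement as a tab); the function names are hypothetical, and the expression is the one from the command above without `-i`:

```shell
#!/bin/sh
# Print what one PAM file would look like after the edit, without modifying it.
preview_loginuid_edit() {
    sed -e "s/^session.*required.*pam_loginuid.so/# session\trequired\tpam_loginuid.so/g" "$1"
}

# List the files in a PAM directory that the expression would change.
files_using_loginuid() {
    grep -l '^session.*required.*pam_loginuid.so' "$1"/* 2>/dev/null
}

# Example:
#   files_using_loginuid /etc/pam.d
#   preview_loginuid_edit /etc/pam.d/crond
```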
#4
05-08-2009, 03:16 PM
Moderator
Posts: 1,531

we had an issue recently where a bunch of accounts lost some .msg files around the same time. i can tell you how i fixed it which was a manual process that took a little while and would not be practical if you have a ton of data.

first run zmblobchk -u <user>

it will maybe tell you that there are some messages with metadata but no file on disk. for me this was less than 10 for each user that had the problem.

i then restored the account to the time frame right before i know the problem happened.

then i copied the msg files, for example from

/opt/zimbra/store/0/388/msg/9/ to /opt/zimbra/store/0/70/msg/9/

this assumes that mailboxid 388 is the restored account and mailboxid 70 is the original account and all the missing messages were under the 9/ dir
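The copy step above could be sketched as follows. This is not the moderator's actual procedure, only an illustration: the function name `copy_missing_blobs` is hypothetical, and it assumes that blob file names are the same in the restored and the original directory and that only files absent from the original should be copied.

```shell
#!/bin/sh
# Copy .msg files that exist in the restored mailbox directory but are
# missing from the original one. Files already present are never overwritten.
copy_missing_blobs() {
    src_dir=$1
    dst_dir=$2
    for src in "$src_dir"/*.msg; do
        [ -e "$src" ] || continue             # glob matched nothing
        name=$(basename "$src")
        if [ ! -e "$dst_dir/$name" ]; then
            # -p keeps the original mode and timestamps on the copy
            cp -p "$src" "$dst_dir/$name" && echo "copied $name"
        fi
    done
}

# Example, using the directories from the post above:
#   copy_missing_blobs /opt/zimbra/store/0/388/msg/9 /opt/zimbra/store/0/70/msg/9
```

Skipping files that already exist keeps the function safe to rerun and avoids clobbering blobs that were never lost.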

this wiki page was really helpful:

Ajcody-Notes-No-Such-Blob - Zimbra :: Wiki
#5
05-09-2009, 05:57 AM
Special Member
Posts: 130
Missing the bigger problem

Thanks. This is helpful, although we merely deleted the mails giving the missing blob errors from the user interface (thankfully the scope was limited, as we had not yet released to full production).

However the missing mail is much more of a problem. For one, we'd like to recover it as there were some important messages (and backup was not running as explained in the reply about cron issues, CentOS, and vserver). Most importantly, we'd like to understand what happened so it does not happen in production. Although it is probably our error, it is frightful that the product is not resilient enough to protect us against our own stupidity. I know that sounds a bit unfair but email is so critical we need to be protected from ourselves!

Any idea from anyone on why this happened? Thanks - John
#6
05-09-2009, 06:06 AM
Zimbra Consultant & Moderator
Posts: 19,639

If your email is critical then you should run it on dedicated hardware. Using a vserver is unsupported and likely to cause problems. If you really must run it in a VM, then use ESX, ESXi or Xenserver.
__________________
Regards


Bill
#7
05-09-2009, 07:25 AM
Special Member
Posts: 130

Thanks, but I would disagree. We started with a VM while the environment is small. We assumed we would grow into a dedicated system and then a cluster. We soon realized we would be better off growing onto a clustered virtualization host for greater stability, rather than a stand-alone system, until we are ready for a dedicated cluster. The VMs also give us other advantages such as portability.

VServer itself is an interesting technology. Having been a heavy, early Xen user and now a regular KVM user, both of which are fine products, VServer has advantages over both in certain situations of which this is one. It does require understanding how the applications work in order to tune the environment but understanding what one is using is never a bad idea. At some point, we hope to share our internal documentation on how to do this, e.g., we found we had to enable the loopback masking nflag and disable the Single IP Special Casing. We also hit the loginuid issue which is specific to the RedHat family of products.

At this point, we seem to be working fine other than an LDAP warning we are troubleshooting right now (amavis - NOTICE: do_search: trying again: LDAP_OPERATIONS_ERROR) and with a much better utilization of resources than we would have had under Xen or KVM.

This mail loss error smells like something unrelated to VServer. I'm sure it is something we did in our experimenting and troubleshooting. I just want to understand what we did so we don't do it again and can report to the Zimbra team that there is a way users can cause serious damage to their systems without warning from Zimbra. Thanks again - John
#8
05-09-2009, 08:29 AM
Outstanding Member
Posts: 708

What are you using for storage? You could have lost two days quite literally, like reverting /opt/zimbra/store to a two-day-old snapshot.
#9
05-09-2009, 08:45 AM
Zimbra Consultant & Moderator
Posts: 19,639

Quote:
Originally Posted by gracedman View Post
VServer itself is an interesting technology. Having been a heavy, early Xen user and now a regular KVM user, both of which are fine products, VServer has advantages over both in certain situations of which this is one. It does require understanding how the applications work in order to tune the environment but understanding what one is using is never a bad idea. At some point, we hope to share our internal documentation on how to do this, e.g., we found we had to enable the loopback masking nflag and disable the Single IP Special Casing. We also hit the loginuid issue which is specific to the RedHat family of products.
I fully understand what vserver and Xen are (I wasn't actually talking about this 'xen', and Xenserver is superior to that version). Neither is supported, and you are actually using an unsupported kernel (it's not even supported by CentOS). I'm not saying it won't work but, as I said earlier, it's an unsupported platform - use it at your own risk. I'll say it again: the comment 'email is critical' and using unsupported vserver and kernels don't seem to go together in my mind.
__________________
Regards


Bill
#10
05-09-2009, 08:47 AM
Special Member
Posts: 130

Thanks. We are using a Z200 from Pogo Linux which is running Nexenta (in effect, a front end to ZFS on OpenSolaris). The drives are addressed as RAID0 across an array of iSCSI drives backed by ZFS zvols.

We didn't do anything to the disk subsystem but we did manually run the cron jobs as described - John