So, here's the basic scenario:

I've got a ZCS cluster with the zimbra-mta on a separate machine from the mailstore. In fact, in a separate location altogether.

Sadly, the mailstore's location suffered a catastrophic loss of power. Various problems prevented a clean shutdown, and power was lost to the mailstore.

When I brought the mailstore back up, ZCS's mysql instance refused to start - several of the innodb files were broken.

After jumping through various hoops, I was able to get mysql started in a read-only state, dump the databases, remove them, and recreate them from the dumps.

Now the mailstore starts up like a champ.

Sadly, mail to one particular mailbox refuses to deliver. The mta says:
postfix/lmtp[12065]: 86BB4126362: to=<BROKEN_EMAIL_ADDY@MYDOMAIN.COM>, relay=MY.MAILSTORE.MACHINES.FQDN[IP.OF.MY.MAILSTORE]:7025, delay=90, delays=87/0.02/0.01/2.6, dsn=5.0.0, status=bounced (host MY.MAILSTORE.MACHINES.FQDN[IP.OF.MY.MAILSTORE] said: 554 5.0.0 Permanent message delivery failure (in reply to end of DATA command))


Here's what the mailstore itself says (pulled from the mailbox.log):
2008-10-10 02:32:34,127 INFO [LmtpServer-115] [name=BROKEN_EMAIL_ADDY@MYDOMAIN.COM;mid=2;] mailop - Adding Message: id=14829060, Message-ID=<20081010073031.AF5D0126356@MY.ZIMBRA-MTA.FQDN>, parentId=-1, folderId=2, folderName=Inbox.
2008-10-10 02:32:34,301 ERROR [LmtpServer-115] [name=BROKEN_EMAIL_ADDY@MYDOMAIN.COM;mid=2;] Sieve - Evaluation failed. Reason: null
2008-10-10 02:32:34,301 INFO [LmtpServer-115] [name=BROKEN_EMAIL_ADDY@MYDOMAIN.COM;mid=2;] lmtp - rejecting message BROKEN_EMAIL_ADDY@MYDOMAIN.COM: exception occurred
com.zimbra.cs.mailbox.MailServiceException: object with that id already exists: 14829060
Code:mail.ALREADY_EXISTS Arg:(itemId, IID, "14829060")
at com.zimbra.cs.mailbox.MailServiceException.ALREADY_EXISTS(
at com.zimbra.cs.db.DbMailItem.create( :164)
at com.zimbra.cs.mailbox.Message.createInternal(Messa
at com.zimbra.cs.mailbox.Message.create( 310)
at com.zimbra.cs.mailbox.Mailbox.addMessageInternal(M
at com.zimbra.cs.mailbox.Mailbox.addMessage(Mailbox.java:4493)
at com.zimbra.cs.mailbox.Mailbox.addMessage(Mailbox.java:4449)
at com.zimbra.cs.filter.ZimbraMailAdapter.addMessage(
at com.zimbra.cs.filter.ZimbraMailAdapter.doDefaultFiling(
at com.zimbra.cs.filter.ZimbraMailAdapter.executeActions(
at org.apache.jsieve.SieveFactory.evaluate(SieveFacto
at com.zimbra.cs.filter.RuleManager.applyRules(RuleMa
at com.zimbra.cs.lmtpserver.ZimbraLmtpBackend.deliverMessageToLocalMailboxes( )
at com.zimbra.cs.lmtpserver.ZimbraLmtpBackend.deliver (
at com.zimbra.cs.lmtpserver.LmtpHandler.processMessageData(
at com.zimbra.cs.lmtpserver.TcpLmtpHandler.continueDATA(
at com.zimbra.cs.lmtpserver.LmtpHandler.doDATA(LmtpHa
at com.zimbra.cs.lmtpserver.LmtpHandler.processCommand(
at com.zimbra.cs.lmtpserver.TcpLmtpHandler.processCommand(
at com.zimbra.cs.tcpserver.ProtocolHandler.processConnection(
at EDU.oswego.cs.dl.util.concurrent.PooledExecutor$Wo Source)
Caused by: com.mysql.jdbc.exceptions.MySQLIntegrityConstraintViolationException: Duplicate entry '2-14829060' for key 1

Query being executed when exception was thrown:

com.mysql.jdbc.ServerPreparedStatement[55] - INSERT INTO mboxgroup2.mail_item(mailbox_id, id, type, parent_id, folder_id, index_id, imap_id, date, size, volume_id, blob_digest, unread, flags, tags, sender, subject, name, metadata, mod_metadata, change_date, mod_content) VALUES (2, 14829060, 5, null, 2, 14829060, 14829060, 1223623954, 1342, 1, 'iIyr,eP8MhyyQSo23htGfhqjXN4=', 1, 0, 0, 'SENDERS.EMAIL.ADDY@SENDERS.DOMAIN.COM', 'SUBJECT OF INCOMING MESSAGE', null, 'd1:f18:Still looking hard1:s16:SENDERS.EMAILADDY@SENDERS.DOMAIN.COMO1:v i10ee', 14428800, 1223623954, 14428800)
at com.mysql.jdbc.SQLError.createSQLException(SQLErro
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2870)
at com.mysql.jdbc.MysqlIO.sendCommand( 73)
at com.mysql.jdbc.ServerPreparedStatement.serverExecute(
at com.mysql.jdbc.ServerPreparedStatement.executeInternal(
at com.mysql.jdbc.PreparedStatement.executeUpdate(Pre
at com.mysql.jdbc.PreparedStatement.executeUpdate(Pre
at com.mysql.jdbc.PreparedStatement.executeUpdate(Pre
at org.apache.commons.dbcp.DelegatingPreparedStatement.executeUpdate( 33)
at com.zimbra.cs.db.DbMailItem.create( :148)
... 21 more
2008-10-10 02:32:34,301 INFO [LmtpServer-115] [] lmtp - 554 5.0.0 Permanent message delivery failure (DATA)
2008-10-10 02:32:34,302 INFO [LmtpServer-115] [] ProtocolHandler - Handler exiting normally

To validate, I logged into the mailstore's mysql instance and checked the mboxgroup2.mail_item table. Sure enough, a different message already exists with an id of 14829060. In fact, messages with ids all the way up to 14829128 exist:

mysql> select * from mail_item where mailbox_id="2" and id="14829060"\G
*************************** 1. row ***************************
mailbox_id: 2
id: 14829060
type: 5
parent_id: NULL
folder_id: 4
index_id: 14829060
imap_id: 14829060
date: 1222487040
size: 2395
volume_id: 1
unread: 1
flags: 0
tags: 0
subject: Most reliable replica from Patek Philippe watch here
name: NULL
metadata: d1:f98:SPAM SPAM SPAM SPAM SPAM SPAM1:r55:SPAM SPAM SPAM SPAM 1:s41:"Anonymous Spammer"
mod_metadata: 14428140
change_date: 1222487040
mod_content: 14428140
1 row in set (0.00 sec)

Obviously, some data wasn't committed to the database during the power outage.

With a little digging, I believe I found where the 14829060 id is coming from:

mysql> select * from mailbox where account_id=THE_ACCOUNT_ID_IN_QUESTION\G
*************************** 1. row ***************************
id: 2
group_id: 2
index_volume_id: 2
item_id_checkpoint: 14829059
contact_count: 0
size_checkpoint: 250813336735
change_checkpoint: 14429529
tracking_sync: 0
tracking_imap: 0
last_backup_at: NULL
last_soap_access: 1223610933
new_messages: 0
idx_deferred_count: 0
1 row in set (0.00 sec)
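
For anyone who wants to check every mailbox for this kind of mismatch (an item_id_checkpoint lower than an id that already exists in mail_item), here is a rough sketch in Python. It is only a model: it uses an in-memory sqlite3 database as a stand-in for mysql, the two tables are cut down to the columns that matter, and both live in one database, whereas on a real ZCS install mailbox is in the zimbra database and mail_item is in mboxgroupN. The function name find_stale_checkpoints is made up for the example. The SELECT itself should carry over to mysql with the table names qualified, but I have not run it against a live mailstore.

```python
import sqlite3

def find_stale_checkpoints(conn):
    """Return (mailbox_id, checkpoint, max_item_id) for every mailbox whose
    item_id_checkpoint is lower than an id already present in mail_item."""
    query = """
        SELECT m.id, m.item_id_checkpoint, MAX(i.id)
        FROM mailbox AS m
        JOIN mail_item AS i ON i.mailbox_id = m.id
        GROUP BY m.id, m.item_id_checkpoint
        HAVING MAX(i.id) > m.item_id_checkpoint
    """
    return conn.execute(query).fetchall()

# Simplified stand-in schema (assumption: only the relevant columns are kept).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mailbox (id INTEGER PRIMARY KEY, item_id_checkpoint INTEGER);
    CREATE TABLE mail_item (mailbox_id INTEGER, id INTEGER,
                            PRIMARY KEY (mailbox_id, id));
""")

# Values taken from the output above: checkpoint 14829059, items up to 14829128.
conn.execute("INSERT INTO mailbox VALUES (2, 14829059)")
conn.executemany("INSERT INTO mail_item VALUES (2, ?)",
                 [(14829060,), (14829128,)])

stale = find_stale_checkpoints(conn)
print(stale)
```

Any row returned is a mailbox where the next delivery would likely hit the same "Duplicate entry" error.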

With a little checking, I found that merely updating that item_id_checkpoint value to the last used value wasn't enough. I had to first stop ZCS on the mailstore, launch the mysql server manually (`/opt/zimbra/bin/mysql.server start` as the zimbra user), update the zimbra.mailbox.item_id_checkpoint value, stop mysql, and start ZCS back up.
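
The update itself can be sketched like this. Same caveats as any model: sqlite3 in memory stands in for mysql, the tables are reduced to the relevant columns and kept in one database, and raise_checkpoint is a hypothetical helper name, not anything that ships with ZCS. The one design point worth copying is that the statement only ever moves the checkpoint up to the highest existing id and never lowers it, so running it twice is harmless. On the real mailstore, ZCS has to be stopped first, as described above.

```python
import sqlite3

def raise_checkpoint(conn, mailbox_id):
    """Set item_id_checkpoint to the highest id in mail_item for this mailbox,
    but only if the current value is lower. Returns True if a row changed."""
    cur = conn.execute(
        """
        UPDATE mailbox
        SET item_id_checkpoint =
            (SELECT MAX(id) FROM mail_item WHERE mailbox_id = :mid)
        WHERE id = :mid
          AND item_id_checkpoint <
            (SELECT MAX(id) FROM mail_item WHERE mailbox_id = :mid)
        """,
        {"mid": mailbox_id},
    )
    conn.commit()
    return cur.rowcount == 1

# Simplified stand-in schema (assumption: only the relevant columns are kept).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mailbox (id INTEGER PRIMARY KEY, item_id_checkpoint INTEGER);
    CREATE TABLE mail_item (mailbox_id INTEGER, id INTEGER,
                            PRIMARY KEY (mailbox_id, id));
""")
conn.execute("INSERT INTO mailbox VALUES (2, 14829059)")
conn.executemany("INSERT INTO mail_item VALUES (2, ?)",
                 [(14829060,), (14829128,)])

changed = raise_checkpoint(conn, 2)
new_value = conn.execute(
    "SELECT item_id_checkpoint FROM mailbox WHERE id = 2").fetchone()[0]
print(changed, new_value)
```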

When I did that, the item_id_checkpoint value jumped up to 14829139 (I would have expected it to jump to 14829129, since that's the next unused id). Still, I was then able to telnet to 7025 and manually lmtp a message:
#telnet localhost 7025
Connected to localhost.
Escape character is '^]'.
MAIL FROM: <> size=200
250 2.0.0 Sender OK
250 2.1.5 Recipient OK
354 End data with <CR><LF>.<CR><LF>
Subject: Test

250 2.1.5 OK

No errors, and the message showed up in the user's account.
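
The same delivery test can be scripted instead of typed into telnet. This is a minimal sketch using Python's standard smtplib.LMTP class; the host and port (localhost, 7025) come from the session above, while the function names, the message body, and the recipient address in the usage comment are placeholders for illustration. It sends with an empty envelope sender, like the MAIL FROM: <> in the telnet session.

```python
import smtplib
from email.message import EmailMessage

def build_test_message(recipient, subject="Test"):
    """Build a small plain-text message addressed to the mailbox under test."""
    msg = EmailMessage()
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content("LMTP delivery test after item_id_checkpoint repair.\n")
    return msg

def deliver_via_lmtp(recipient, host="localhost", port=7025):
    """Hand one test message to the mailstore's LMTP listener.

    Returns the dict of refused recipients from sendmail() (empty on success).
    smtplib raises SMTPRecipientsRefused or SMTPDataError if the server
    rejects the recipient or the message data."""
    msg = build_test_message(recipient)
    with smtplib.LMTP(host, port) as lmtp:
        # Empty envelope sender, matching "MAIL FROM: <>" in the telnet test.
        return lmtp.sendmail("", [recipient], msg.as_string())

# Usage on the mailstore itself (placeholder address):
# deliver_via_lmtp("user@example.com")
```

If the duplicate-id problem were still there, I'd expect the 554 from the log above to surface as an SMTPDataError rather than a clean return.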

I'm not going to remove the interim solution for this user until after business hours so I can do so in a controlled manner.

I have every reason to believe that this fixed the problem, but I was hoping that someone out there who's got a more complete understanding of the code that depends on this database could give me some feedback on this to let me know if there're considerations I'm missing. Should I do anything else before returning this user's mail delivery to normal?

I can't afford for this user to lose any mails and I don't want to get this mailstore into any less consistent of a state than it's already in. I've a full filesystem level backup of the mailstore that I took when the machine first came up from the outage, so I've a fallback position; but I'd rather not have to use it.

Any thoughts on what I've done and the potential repercussions would be much appreciated!