So, here's the basic scenario:

I've got a ZCS cluster with the zimbra-mta on a separate machine from the mailstore. In fact, in a separate location altogether.

Sadly, the mailstore's location suffered a catastrophic loss of power. Various problems prevented a clean shutdown before power to the mailstore was lost.

When I brought the mailstore back up, ZCS's mysql instance refused to start - several of the innodb files were broken.

After jumping through various hoops, I was able to get mysql started in a read-only state, dump the databases, remove them, and recreate them from the dumps.

Now the mailstore starts up like a champ.

Sadly, mail to one particular mailbox can't be delivered. The mta says:
postfix/lmtp[12065]: 86BB4126362: to=<BROKEN_EMAIL_ADDY@MYDOMAIN.COM>, relay=MY.MAILSTORE.MACHINES.FQDN[IP.OF.MY.MAILSTORE]:7025, delay=90, delays=87/0.02/0.01/2.6, dsn=5.0.0, status=bounced (host MY.MAILSTORE.MACHINES.FQDN[IP.OF.MY.MAILSTORE] said: 554 5.0.0 Permanent message delivery failure (in reply to end of DATA command))

Ouch.

Here's what the mailstore itself says (pulled from the mailbox.log):
2008-10-10 02:32:34,127 INFO [LmtpServer-115] [name=BROKEN_EMAIL_ADDY@MYDOMAIN.COM;mid=2;] mailop - Adding Message: id=14829060, Message-ID=<20081010073031.AF5D0126356@MY.ZIMBRA-MTA.FQDN>, parentId=-1, folderId=2, folderName=Inbox.
2008-10-10 02:32:34,301 ERROR [LmtpServer-115] [name=BROKEN_EMAIL_ADDY@MYDOMAIN.COM;mid=2;] Sieve - Evaluation failed. Reason: null
2008-10-10 02:32:34,301 INFO [LmtpServer-115] [name=BROKEN_EMAIL_ADDY@MYDOMAIN.COM;mid=2;] lmtp - rejecting message BROKEN_EMAIL_ADDY@MYDOMAIN.COM: exception occurred
com.zimbra.cs.mailbox.MailServiceException: object with that id already exists: 14829060
ExceptionId:LmtpServer-115:1223623954152:04e1c21ba23722a9
Code:mail.ALREADY_EXISTS Arg:(itemId, IID, "14829060")
at com.zimbra.cs.mailbox.MailServiceException.ALREADY_EXISTS(MailServiceException.java:372)
at com.zimbra.cs.db.DbMailItem.create(DbMailItem.java:164)
at com.zimbra.cs.mailbox.Message.createInternal(Message.java:386)
at com.zimbra.cs.mailbox.Message.create(Message.java:310)
at com.zimbra.cs.mailbox.Mailbox.addMessageInternal(Mailbox.java:4629)
at com.zimbra.cs.mailbox.Mailbox.addMessage(Mailbox.java:4493)
at com.zimbra.cs.mailbox.Mailbox.addMessage(Mailbox.java:4449)
at com.zimbra.cs.filter.ZimbraMailAdapter.addMessage(ZimbraMailAdapter.java:352)
at com.zimbra.cs.filter.ZimbraMailAdapter.doDefaultFiling(ZimbraMailAdapter.java:346)
at com.zimbra.cs.filter.ZimbraMailAdapter.executeActions(ZimbraMailAdapter.java:253)
at org.apache.jsieve.SieveFactory.evaluate(SieveFactory.java:159)
at com.zimbra.cs.filter.RuleManager.applyRules(RuleManager.java:196)
at com.zimbra.cs.lmtpserver.ZimbraLmtpBackend.deliverMessageToLocalMailboxes(ZimbraLmtpBackend.java:379)
at com.zimbra.cs.lmtpserver.ZimbraLmtpBackend.deliver(ZimbraLmtpBackend.java:136)
at com.zimbra.cs.lmtpserver.LmtpHandler.processMessageData(LmtpHandler.java:375)
at com.zimbra.cs.lmtpserver.TcpLmtpHandler.continueDATA(TcpLmtpHandler.java:67)
at com.zimbra.cs.lmtpserver.LmtpHandler.doDATA(LmtpHandler.java:364)
at com.zimbra.cs.lmtpserver.LmtpHandler.processCommand(LmtpHandler.java:174)
at com.zimbra.cs.lmtpserver.TcpLmtpHandler.processCommand(TcpLmtpHandler.java:61)
at com.zimbra.cs.tcpserver.ProtocolHandler.processConnection(ProtocolHandler.java:160)
at com.zimbra.cs.tcpserver.ProtocolHandler.run(ProtocolHandler.java:128)
at EDU.oswego.cs.dl.util.concurrent.PooledExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Thread.java:619)
Caused by: com.mysql.jdbc.exceptions.MySQLIntegrityConstraintViolationException: Duplicate entry '2-14829060' for key 1

Query being executed when exception was thrown:

com.mysql.jdbc.ServerPreparedStatement[55] - INSERT INTO mboxgroup2.mail_item(mailbox_id, id, type, parent_id, folder_id, index_id, imap_id, date, size, volume_id, blob_digest, unread, flags, tags, sender, subject, name, metadata, mod_metadata, change_date, mod_content) VALUES (2, 14829060, 5, null, 2, 14829060, 14829060, 1223623954, 1342, 1, 'iIyr,eP8MhyyQSo23htGfhqjXN4=', 1, 0, 0, 'SENDERS.EMAIL.ADDY@SENDERS.DOMAIN.COM', 'SUBJECT OF INCOMING MESSAGE', null, 'd1:f18:Still looking hard1:s16:SENDERS.EMAILADDY@SENDERS.DOMAIN.COMO1:v i10ee', 14428800, 1223623954, 14428800)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:931)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2870)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1573)
at com.mysql.jdbc.ServerPreparedStatement.serverExecute(ServerPreparedStatement.java:1160)
at com.mysql.jdbc.ServerPreparedStatement.executeInternal(ServerPreparedStatement.java:685)
at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:1400)
at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:1314)
at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:1299)
at org.apache.commons.dbcp.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:233)
at com.zimbra.cs.db.DbMailItem.create(DbMailItem.java:148)
... 21 more
2008-10-10 02:32:34,301 INFO [LmtpServer-115] [] lmtp - 554 5.0.0 Permanent message delivery failure (DATA)
2008-10-10 02:32:34,302 INFO [LmtpServer-115] [] ProtocolHandler - Handler exiting normally

To validate, I logged into the mailstore's mysql instance and checked the mboxgroup2.mail_item table. Sure enough, a different message already exists with an id of 14829060. In fact, messages with ids all the way up to 14829128 exist:

mysql> select * from mail_item where mailbox_id="2" and id="14829060"\G
*************************** 1. row ***************************
mailbox_id: 2
id: 14829060
type: 5
parent_id: NULL
folder_id: 4
index_id: 14829060
imap_id: 14829060
date: 1222487040
size: 2395
volume_id: 1
blob_digest: SANITIZED_OUT_OF_EXCESSIVE_PARANOIA
unread: 1
flags: 0
tags: 0
sender: SOME_SPAMMER_WHO_SHALL_REMAIN_ANONYMOUS
subject: Most reliable replica from Patek Philippe watch here
name: NULL
metadata: d1:f98:SPAM SPAM SPAM SPAM SPAM SPAM1:r55:SPAM SPAM SPAM SPAM 1:s41:"Anonymous Spammer"
<SPAMMERS_SPOOFED_ADDY@WELL_KNOWN_DOMAIN.COM>1:vi1 0ee
mod_metadata: 14428140
change_date: 1222487040
mod_content: 14428140
1 row in set (0.00 sec)


Obviously, some data wasn't committed to the database during the power outage.

With a little digging, I believe I found where the 14829060 id is coming from:

mysql> select * from mailbox where account_id=THE_ACCOUNT_ID_IN_QUESTION\G
*************************** 1. row ***************************
id: 2
group_id: 2
account_id: THE_ACCOUNT_ID_IN_QUESTION
index_volume_id: 2
item_id_checkpoint: 14829059
contact_count: 0
size_checkpoint: 250813336735
change_checkpoint: 14429529
tracking_sync: 0
tracking_imap: 0
last_backup_at: NULL
comment: BROKEN-EMAIL-ADDY@FQDN.COM
last_soap_access: 1223610933
new_messages: 0
idx_deferred_count: 0
1 row in set (0.00 sec)
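
For anyone who wants to check their own mailboxes for the same mismatch, here's a rough sketch of the comparison I did by hand: flag any mailbox whose item_id_checkpoint is lower than the highest id already in mail_item. It runs against an in-memory sqlite3 database as a stand-in for mysql, and the two tables are cut down to just the columns that matter. In a real ZCS install the mailbox table and the mail_item tables live in different databases, so treat the query as an illustration, not something to paste in. The function names and the second, healthy mailbox are made up for the demo, and I'm assuming that a healthy checkpoint is never lower than the highest id in use.

```python
import sqlite3


def find_stale_checkpoints(conn):
    """Return (mailbox_id, item_id_checkpoint, max_item_id) for every mailbox
    whose checkpoint is lower than an id that already exists in mail_item."""
    query = """
        SELECT m.id, m.item_id_checkpoint, MAX(i.id) AS max_item_id
        FROM mailbox AS m
        JOIN mail_item AS i ON i.mailbox_id = m.id
        GROUP BY m.id, m.item_id_checkpoint
        HAVING MAX(i.id) > m.item_id_checkpoint
        ORDER BY m.id
    """
    return conn.execute(query).fetchall()


def build_demo_db():
    """Build a tiny stand-in database holding the values from this post."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mailbox (id INTEGER PRIMARY KEY, item_id_checkpoint INTEGER)"
    )
    conn.execute(
        "CREATE TABLE mail_item (mailbox_id INTEGER, id INTEGER, "
        "PRIMARY KEY (mailbox_id, id))"
    )
    # Mailbox 2 is the broken one; mailbox 3 is a hypothetical healthy mailbox.
    conn.execute("INSERT INTO mailbox VALUES (2, 14829059)")
    conn.execute("INSERT INTO mailbox VALUES (3, 500)")
    conn.executemany(
        "INSERT INTO mail_item VALUES (?, ?)",
        [(2, 14829060), (2, 14829128), (3, 499)],
    )
    return conn


demo = build_demo_db()
stale = find_stale_checkpoints(demo)
print(stale)
```

Only the broken mailbox should be reported; the healthy one has a checkpoint at or above every id it has handed out.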


With a little checking, I found that merely updating that item_id_checkpoint value to the last used mboxgroup2.mail_item.id value while ZCS was running wasn't enough. I had to first stop ZCS on the mailstore, launch the mysql server manually (`/opt/zimbra/bin/mysql.server start` as the zimbra user), update the zimbra.mailbox.item_id_checkpoint value, stop mysql, and start ZCS back up.
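
The update step itself can be sketched like this, again with an in-memory sqlite3 database standing in for mysql and with both tables reduced to the relevant columns. The raise_checkpoint helper is hypothetical (it isn't anything that ships with ZCS); it only ever raises the checkpoint to the highest id in mail_item and never lowers it, which is the guard I'd want on anything touching this column. Stopping ZCS first and restarting it afterwards still has to happen around it.

```python
import sqlite3


def raise_checkpoint(conn, mailbox_id):
    """Set item_id_checkpoint to the highest mail_item id for this mailbox,
    but only when the stored checkpoint is lower. Returns True if it changed."""
    cur = conn.execute(
        """
        UPDATE mailbox
        SET item_id_checkpoint =
            (SELECT MAX(id) FROM mail_item WHERE mailbox_id = :mbox)
        WHERE id = :mbox
          AND item_id_checkpoint <
            (SELECT MAX(id) FROM mail_item WHERE mailbox_id = :mbox)
        """,
        {"mbox": mailbox_id},
    )
    conn.commit()
    return cur.rowcount == 1


# Stand-in data using the values from this post.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mailbox (id INTEGER PRIMARY KEY, item_id_checkpoint INTEGER)"
)
conn.execute("CREATE TABLE mail_item (mailbox_id INTEGER, id INTEGER)")
conn.execute("INSERT INTO mailbox VALUES (2, 14829059)")
conn.executemany(
    "INSERT INTO mail_item VALUES (?, ?)", [(2, 14829060), (2, 14829128)]
)

changed = raise_checkpoint(conn, 2)
new_value = conn.execute(
    "SELECT item_id_checkpoint FROM mailbox WHERE id = 2"
).fetchone()[0]
print(changed, new_value)
```

Running it a second time is a no-op, since the checkpoint is no longer below the highest id.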

When I did that, the item_id_checkpoint value jumped up to 14829139 (I would have expected it to jump to 14829129, since that's the next unused id). Still, I was then able to telnet to port 7025 and manually deliver a message over lmtp:
# telnet localhost 7025
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
220 MAILSTORE.FQDN.COM Zimbra LMTP ready
LHLO MTA.FQDN.COM
250-MAILSTORE.FQDN.COM
250-8BITMIME
250-ENHANCEDSTATUSCODES
250-SIZE
250 PIPELINING
MAIL FROM: <test@test.com> size=200
250 2.0.0 Sender OK
RCPT TO: <BROKEN.EMAIL.ADDY@MYDOMAIN.COM>
250 2.1.5 Recipient OK
DATA
354 End data with <CR><LF>.<CR><LF>
From: test@test.com
To: BROKEN.EMAIL.ADDY@MYDOMAIN.COM
Subject: Test

Test
.
250 2.1.5 OK


No errors, and the message showed up in the user's account.
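
If you'd rather script that test than type it into telnet, something along these lines should do the same thing with Python's standard smtplib.LMTP class. This is a sketch I put together for illustration: the two helper names are mine, port 7025 is taken from the session above, and the addresses passed in are whatever test sender and recipient you choose. Nothing connects until deliver_via_lmtp is actually called.

```python
import smtplib
from email.message import EmailMessage


def build_test_message(sender, recipient):
    """Build the same minimal test message used in the telnet session."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Test"
    msg.set_content("Test")
    return msg


def deliver_via_lmtp(msg, host="localhost", port=7025):
    """Hand the message to the mailstore's LMTP listener.

    Returns the dict of refused recipients from smtplib (empty when the
    recipient was accepted); smtplib raises an exception on outright failure.
    """
    with smtplib.LMTP(host, port) as lmtp:
        return lmtp.send_message(msg)


message = build_test_message("test@test.com", "BROKEN.EMAIL.ADDY@MYDOMAIN.COM")
print(message["Subject"])
```

On the mailstore itself you would then call deliver_via_lmtp(message) and watch mailbox.log for the "Adding Message" line.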

I'm not going to remove the interim solution for this user until after business hours so I can do so in a controlled manner.

I have every reason to believe that this fixed the problem, but I was hoping that someone with a more complete understanding of the code that depends on this database could give me some feedback and let me know if there are considerations I'm missing. Should I do anything else before returning this user's mail delivery to normal?

I can't afford for this user to lose any mail, and I don't want to get this mailstore into a less consistent state than it's already in. I have a full filesystem-level backup of the mailstore that I took when the machine first came up after the outage, so I have a fallback position; but I'd rather not have to use it.

Any thoughts on what I've done and the potential repercussions would be much appreciated!