Welcome to the forums,
While you might optimize tables in MySQL yearly, I haven't heard of any similar 'leftover mailbox entries for deleted items' issue in Zimbra sprawling to 400GB within a week for 1.5 million sends. (Or did you mean each, so 9 mil? Either way, 4-40GB is a long way from almost half a terabyte.) It sounds like your Exchange box is actually storing more than just an index/list of deleted rows.
---Extra info for the future should you switch to ZCS---
First, I'll assume you have an external program connecting and authenticating via SMTP that takes care of the sending/rate limiting. (Or you plan to use other methods in the future to get them to Zimbra, such as SOAP etc.)
If these are 'mostly' identical messages that you're archiving, would you be interested in just the logs of who you sent to and whether it was accepted instead?
Let's say you've exported the copies you want via whatever method such as tar/zip (.eml by default, which can be read by common mail clients or in a standard text editor) - Zimbra will still have the SMTP transaction logs and they're also inserted into a separate MySQL logger database for faster searching.
/var/log/zimbra.log will contain the SMTP transactions
- Rotation is configured via /opt/zimbra/conf/zmlogrotate or /etc/logrotate.d/zimbra
/opt/zimbra/log & /opt/zimbra/mailboxd/logs/ (mailbox services such as http/imap/pop)
- These are configured via crontab -e
zimbraLogRawLifetime (default 31d) - lifetime (nnnnn[hmsd]) of raw log rows in consolidated logger tables
zimbraLogSummaryLifetime (default 730d) - lifetime (nnnnn[hmsd]) of summarized log rows in consolidated logger tables
(Note: in 6.0 we're moving what we can into the stats service & possibly using RRDtool or SQLite instead of MySQL. See Bug 35214 - Discontinue use of mysql for logger service > Bug 24199 - kill logger service)
zmprov mcf zimbraLogRawLifetime #d
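If you want to sanity-check a value before handing it to zmprov, the nnnnn[hmsd] format above is easy to parse. This is only a rough sketch: lifetime_to_seconds is a made-up helper name, not anything shipped with ZCS, and it assumes a unit suffix is always present (whether Zimbra accepts a bare number is not covered here).

```python
import re

# Seconds per unit suffix in the nnnnn[hmsd] format quoted above.
_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1, "d": 86400}


def lifetime_to_seconds(value: str) -> int:
    """Convert a lifetime string such as '31d' into seconds.

    Hypothetical helper for illustration; assumes a suffix is required.
    """
    match = re.fullmatch(r"(\d+)([hmsd])", value.strip())
    if match is None:
        raise ValueError(f"not in nnnnn[hmsd] form: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNIT_SECONDS[unit]


print(lifetime_to_seconds("31d"))   # raw log default -> 2678400
print(lifetime_to_seconds("730d"))  # summary log default -> 63072000
```

Anything that raises ValueError here is worth double-checking before you run zmprov mcf with it.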
Default settings of note for Postfix (the MTA in ZCS):
smtpd_recipient_limit (default 1000) parameter controls how many recipients the SMTP server will take per message delivery request.
-You can't restrict this to a to/cc/bcc field - it's all recipients. For that you'd have to use a regular expression in header_checks to arbitrarily limit the length of each header to something reasonable. (We could do this in the web-client though if someone wants to open an RFE in bugzilla.)
smtpd_recipient_overshoot_limit (default 1000) - The number of recipients that a remote SMTP client can send in excess of the hard limit specified with smtpd_recipient_limit, before the Postfix SMTP server increments the per-session error count for each excess recipient. "Postfix will 4xx the 'overshoot' addresses so a sending MTA can try them again later."
Then see the smtpd_hard_error_limit (default 20) parameter to know at what number of errors it will disconnect.
smtpd_client_recipient_rate_limit (default: 0, no limit) - The maximum number of recipient addresses that an SMTP client may specify in the time interval set via anvil_rate_time_unit (default 60s - careful, adjusting this affects other things!!!). There's also anvil_status_update_time = 600s for logging peak usage. Note that this applies "regardless of whether or not Postfix actually accepts those recipients". Clients over the limit will receive a 450 4.7.1 Error: too many recipients from [the.client.ip.address]. It's up to the client to deliver those recipients at some later time.
smtpd_recipient_limit is not to be confused with the default_destination_recipient_limit (default 50) parameter, which controls how many recipients a Postfix delivery agent will send with each copy of an email message. If an email message exceeds that value, the Postfix queue manager breaks up the list of recipients into smaller lists, and Postfix will attempt to send multiple copies of the message in parallel. So that really isn't limiting the number of addresses; it just breaks the list into chunks that other servers can accept more easily.
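To picture the chunking that default_destination_recipient_limit causes, here's a minimal sketch. It is not Postfix's queue manager code: split_recipients is a hypothetical name, the addresses are made up, and it ignores that Postfix groups recipients per destination before splitting.

```python
from typing import List


def split_recipients(recipients: List[str], limit: int = 50) -> List[List[str]]:
    """Break one recipient list into batches of at most `limit` addresses.

    Hypothetical illustration of the splitting described above; every
    address still gets delivered, just in a separate copy of the message.
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return [recipients[i:i + limit] for i in range(0, len(recipients), limit)]


# 120 made-up recipients with the default limit of 50.
rcpts = [f"user{i}@example.com" for i in range(120)]
batches = split_recipients(rcpts)
print([len(b) for b in batches])  # [50, 50, 20]
```

Nothing is dropped; a 120-recipient message simply goes out as three copies, which matters later for single instance storage on the receiving side.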
Redologs, which contain a copy of mailbox transactions, get rolled over/deleted at 100MB-1GB on FOSS. On NE they get temporarily stored so they can be moved into incremental backups. If you only use full backups, you can set zimbraRedoLogDeleteOnRollover to TRUE.
su - zimbra
postconf -e smtpd_recipient_limit=#
If you are ever absolutely out of room on the disks housing your DB, in a pinch the optimizeMboxgroups.pl script (in 5.0.10+) can help you recover wasted space in your mail_item, appointment, imap_folder, imap_message, open_conversation, pop3_message, revision, and tombstone tables on each mboxgroup. Just note that it temporarily locks each table, and could use considerable IO while they’re being rebuilt. You can certainly use it proactively during a maintenance window to reclaim space as well.
Should you be keeping all this mail around, there is some single instance storage for the messages themselves. Zimbra doesn't use a traditional mailstore like mbox or maildir. It's a proprietary file-per-message layout in a hashed directory hierarchy, linked to a MySQL database for metadata, so it is much more efficient than either. Each message/attachment/etc is represented by a file blob. Check out Account mailbox database structure - Zimbra :: Wiki for more info.
As discussed above, we also make use of hard links for identical messages that come in via LMTP at the same time/ID to multiple recipients if the accounts are on the same mailstore (aka single instance storage). I seem to remember that SMTP clients who hit default_destination_recipient_limit of 50 (ie: split up delivery of larger messages into new sessions when sending), or who get a 450 'try again later' error upon hitting the receiving server's smtpd_recipient_limit/smtpd_recipient_overshoot_limit (1000), end up creating a new blob (I'd have to check if it's also a new message ID).
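If hard links are unfamiliar, this small sketch shows the general idea behind single instance storage: one copy on disk, several directory entries pointing at it. It is not Zimbra's store code; deliver_single_instance, the mailbox directory names, and msg.eml are all made up for illustration, and it assumes a filesystem that supports hard links.

```python
import os
import tempfile


def deliver_single_instance(message: bytes, mailbox_dirs, name="msg.eml"):
    """Write the message once, then hard-link it into the other mailboxes.

    Hypothetical illustration only; returns the path created in each mailbox.
    """
    first = os.path.join(mailbox_dirs[0], name)
    with open(first, "wb") as fh:
        fh.write(message)
    paths = [first]
    for directory in mailbox_dirs[1:]:
        linked = os.path.join(directory, name)
        os.link(first, linked)  # new name, same data blocks on disk
        paths.append(linked)
    return paths


with tempfile.TemporaryDirectory() as root:
    dirs = []
    for user in ("alice", "bob", "carol"):
        d = os.path.join(root, user)
        os.mkdir(d)
        dirs.append(d)
    paths = deliver_single_instance(b"Subject: hi\r\n\r\nhello\r\n", dirs)
    inodes = {os.stat(p).st_ino for p in paths}
    # Distinct inode count, then link count of the shared file.
    print(len(inodes), os.stat(paths[0]).st_nlink)
```

All three paths should report the same inode, so the message takes up disk space once. A message that arrives in a second, separate delivery gets written as its own file, which is why the split or retried deliveries mentioned above would end up as new blobs.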