Import wizards won't change the dedupe story on the Zimbra server end.
If you've really got 25% shared mail to a large number of recipients, and the distribution list messages contain attachments as big as or bigger than those in personal messages, then indeed you do have a (very unusual) problem. I'm skeptical that it's as big a problem as you think, but I can think of a few workarounds.
If the RFC822 message content that you can get out of Exchange is really *identical*, then you can probably de-duplicate retroactively with a program like hardlink or freedups. Migrate (portions of) two accounts and compare the "View Source" of each. I'd worry that Exchange or the import process would alter something in the headers. Try both imapsync and the import wizard to see if their behavior is different. If you can manage to end up with perfect duplicate blobs in different accounts, then pointing freedups at /opt/zimbra/store will recover disk space.
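Before pointing a real dedupe tool at the store, it may be worth measuring how many byte-identical blobs you actually ended up with. Here is a rough Python sketch of that check, not how hardlink or freedups work internally. It only reports and never links or deletes anything. The function names are my own, and it assumes each message blob is an ordinary file under the directory you give it.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Group regular files under root by (size, content hash).

    Returns only the groups that contain two or more paths.
    """
    # Cheap first pass: files of different sizes cannot be identical.
    by_size = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            by_size[os.path.getsize(path)].append(path)

    # Second pass: hash only the files that share a size with another file.
    groups = defaultdict(list)
    for size, paths in by_size.items():
        if size == 0 or len(paths) < 2:
            continue
        for path in paths:
            groups[(size, file_digest(path))].append(path)
    return {key: sorted(v) for key, v in groups.items() if len(v) > 1}


def reclaimable_bytes(groups):
    """Bytes saved if every group were collapsed to one copy on disk."""
    total = 0
    for (size, _digest), paths in groups.items():
        # Paths that are already hard links to one inode save nothing more.
        inodes = set()
        for p in paths:
            st = os.stat(p)
            inodes.add((st.st_dev, st.st_ino))
        total += size * (len(inodes) - 1)
    return total
```

Run find_duplicates("/opt/zimbra/store") after a trial migration of two accounts and pass the result to reclaimable_bytes. If the number is close to zero, Exchange or the import path is altering the messages, and hard-linking won't buy you anything.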
If freedups won't work for you and you have a particular sender or subject that has filled everyone's mailbox with crap, you can tell imapsync to ignore it, make a shared mailbox including all of the distribution list's mail, and subscribe everyone to that shared mailbox.
Storage technologies exist that virtualize and de-duplicate based on file-level or block-level content. They are mostly targeted at archive and backup, but NetApp advertises this for primary storage. Buzzphrases include "content-based addressing" and "global compression." These solutions tend to be far more expensive than would be justified for only 60GB, but if you're in the market for storage for other purposes, take a look.
Last and least attractively, you could ask people to move bulk list mail to local PST folders, and only migrate what remains on the server.
Last edited by Rich Graves; 08-02-2008 at 03:13 PM..