#1
12-02-2009, 12:19 PM
Intermediate Member
Posts: 21
De-duplicate mailstore

Hey all,

One of the useful features of Zimbra is that duplicate emails only get stored in the mailstore once. Tastes great, less filling.

Now, suppose a bunch of accounts are moved to a new server/mailstore via zmmailboxmove and those accounts received much the same emails. Oops, mailstore size skyrockets!

Is there a utility or process to de-dup a given mailstore?
#2
12-02-2009, 07:41 PM
Outstanding Member
Posts: 594

De-dupe is per mailstore: users moved to a separate mailstore get a new SQL database there, so their messages are not de-duplicated against copies that already exist on the original server.
#3
12-03-2009, 06:30 AM
Moderator
Posts: 1,209

Bill,

Are you saying that if users A and B (whose mailboxes are initially on server C) have been emailing each other big PowerPoint files and then have their mailboxes moved to server D, the hard links for those PowerPoint files no longer exist on server D (and so the store size goes up)?

We've never tested that on our end.

All the best,
Mark
__________________
___________________________________
L. Mark Stone, CIO


"Uptime. All the time."

477 Congress Street | Portland, ME 04101-3431 | (207) 772-5678

proactive maintenance and monitoring | technology consulting
Zimbra groupware | EMR implementations | private cloud hosting
#4
12-03-2009, 06:40 AM
Intermediate Member
Posts: 21

My scenario is the following:

Users A, B, C and D on server S1 are on a mailing list and keep all their messages, so the messages they receive are the same for each account.

Say User D is moved to server S2 via zmmailboxmove, where S2 uses a different mailstore. Then User C is moved to S2. The mailstore on S2 will be twice the size of the one on S1.

I'm seeing this in practice: I've moved roughly half of my users (in terms of both number AND size) to a new server, yet the mailstore on the new server is about twice as large as the one on the old server. I have users who get CC'd on a large number of emails.
#5
12-03-2009, 07:05 AM
Moderator
Posts: 1,209

I guess I would say I am not surprised at that.

Preserving the single instance store during a mailbox move would require the move script to compare the blobs in the mailbox being moved to every blob in the store on the target server in order to decide whether to create a new hard link or a new blob.

That sounds non-trivial in terms of programming complexity and very, very demanding of compute resources.

Veronica has already pointed out that the single-instance store is a creature of each mailbox server, not of a Zimbra multi-server farm, so this again seems "WAD" to me ("Working As Designed" in old IBM mainframe-speak).

Wouldn't hurt to fill out an RFE though; I'd vote for it.

But the takeaway for me here is to be careful about correctly sizing a Zimbra mailbox server up front for the expected life of the server, so as to avoid the need to move mailboxes unless absolutely necessary. Or alternatively, to use 64-bit Xen deployments to move the Zimbra virtual server to new hardware when needed to avoid having to move mailboxes.

Hope that helps,
Mark
#6
12-03-2009, 07:13 AM
Intermediate Member
Posts: 21

Quote:
Originally Posted by LMStone
Preserving the single instance store during a mailbox move would require the move script to compare the blobs in the mailbox being moved to every blob in the store on the target server in order to decide whether to create a new hard link or a new blob.
I agree, that's overly complex. It makes more sense to write a batch utility that combs through the mailstore and de-dupes it, which is what I was hoping had already been written.
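A generic sketch of such a batch pass, in Python, assuming the mailstore is simply a directory tree of blob files in which byte-identical files can safely share one inode. This is not a Zimbra tool and has not been tested against a Zimbra store: it assumes the mailbox server is stopped, that blobs are never modified in place after being written, and that you have a backup. The function names (`dedupe_tree`, `file_digest`) are hypothetical.

```python
import filecmp
import hashlib
import os
import stat
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def dedupe_tree(root, dry_run=True):
    """Hard-link byte-identical regular files under `root` to one copy.

    Hypothetical helper, not part of Zimbra. Returns how many files were
    relinked (or would be, when dry_run=True). The surviving inode keeps the
    metadata (owner, mode, mtime) of the first copy found.
    """
    # Pass 1: bucket files by (device, size). Hard links cannot cross
    # filesystems, and files of different sizes cannot be identical,
    # so only files sharing a bucket ever need to be hashed.
    buckets = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: never follow symlinks
            if stat.S_ISREG(st.st_mode) and st.st_size > 0:
                buckets[(st.st_dev, st.st_size)].append(path)

    # Pass 2: within each bucket, hash once per file instead of comparing
    # every file against every other file.
    relinked = 0
    for paths in buckets.values():
        if len(paths) < 2:
            continue
        canonical = {}  # content digest -> first path seen with that content
        for path in paths:
            keep = canonical.setdefault(file_digest(path), path)
            if keep == path or os.path.samefile(keep, path):
                continue  # canonical copy itself, or already linked to it
            if not filecmp.cmp(keep, path, shallow=False):
                continue  # paranoia: digests matched but bytes differ
            relinked += 1
            if not dry_run:
                # Link under a temporary name, then atomically rename it over
                # the duplicate so the path is never missing. Raises if a file
                # with the temporary name already exists.
                tmp = path + ".dedupe-tmp"
                os.link(keep, tmp)
                os.replace(tmp, path)
    return relinked
```

For example, `dedupe_tree("/opt/zimbra/store", dry_run=True)` only counts what it would relink; passing `dry_run=False` actually replaces duplicates with hard links. The store path is an assumed default; use whatever volumes your server is configured with. Bucketing by size first means most blobs are never hashed, and the digest lookup avoids the blob-against-every-blob comparison discussed above.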
#7
12-03-2009, 09:41 AM
Loyal Member
Posts: 96

In that same vein of thought: do emails get duped when doing an IMAP migration to Zimbra? A batch script that de-duped would be very helpful for reducing the message store size in that scenario too.
#8
12-03-2009, 11:59 PM
Moderator
Posts: 2,207

Quote:
Originally Posted by cayaraa
In that same vein of thought: Do emails get duped when doing an IMAP migration to Zimbra?
Yes, and in PST migrations too.

Quote:
Originally Posted by cayaraa
A batch script that de-duped would be very helpful for reducing the message store size in that scenario too.
Single-Copy Message Store and imapsync

However, if you use "manual hardlinks" instead of integrated SIS, what happens if one user deletes the mail the hardlink points to?
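At the filesystem level, a hard link is just another directory entry for the same inode, so deleting one user's copy only removes that entry; the data stays on disk until the last link is gone. The flip side is that anything that rewrites a linked file in place changes every linked copy at once. This Python sketch demonstrates the deletion case on an ordinary POSIX filesystem; the file names are hypothetical, and it says nothing about how Zimbra's own SIS code handles deletes.

```python
import os
import tempfile


def hard_link_demo():
    """Delete one of two hard links and report what survives."""
    with tempfile.TemporaryDirectory() as d:
        # Hypothetical blob names standing in for two users' copies.
        user_a = os.path.join(d, "user_a.msg")
        user_b = os.path.join(d, "user_b.msg")
        with open(user_a, "w") as f:
            f.write("Subject: big attachment\n")
        os.link(user_a, user_b)  # second name for the same inode
        links_before = os.stat(user_a).st_nlink
        os.remove(user_a)  # "user A deletes the message"
        links_after = os.stat(user_b).st_nlink
        with open(user_b) as f:
            surviving = f.read()
        return links_before, links_after, surviving


print(hard_link_demo())  # (2, 1, 'Subject: big attachment\n')
```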
#9
12-22-2009, 07:01 AM
y@w
Moderator
Posts: 658

Did anyone create an RFE for this? I couldn't find one and would gladly fill it out as this is a feature that would be incredibly useful for us.
__________________
What a n00b!
#10
12-22-2009, 11:39 AM
Zimbra Employee
Posts: 604

One already exists:
Bug 17057 – tool to consolidate duplicate e-mails
__________________
Bugzilla - Wiki - Downloads - Before posting... Search!