View Single Post
  #5 (permalink)  
Old 09-18-2008, 09:19 AM
bjared bjared is offline
Senior Member
 
Posts: 51
Default Some contribution de moi.

I ran into a problem during this recent case, where I finally got an calendar (ICS format)
to export, and imported it successfully. Then, for some reason, I rsync'd the blobs from
that restored account which had the in-tact calendar, and I suddenly realized I had
2x the mail items for the mailbox I was trying to restore.

Inspired by the mailboxfixdb.pl, I give to you, mailboxfixblobs.pl!

(NOTE: This is only for when you KNOW the blobs not indexed are not supposed to
be in the mailbox. If you are not sure, then you might simply want to just reindex the
mailbox, rather than using this script.)

Code:
#!/usr/bin/perl

# ========================================================================
# This script was insipred by the mailboxfixdb.pl script.
#
# This script deletes blobs that aren't in the mysql table "mail_item"
# for the account specified.  (The script this is based on deleted records
# from the mysql database if no corresponding blob existed, which is
# the opposite of my script.)
#
# You'd only want to use this if you somehow restored/copied a bunch
# of .msg files into an account that you are certain aren't supposed
# to be there.  (i.e. I imported calendar items, and then rsynced
# the blobs, so I now had twice the .msg files.  Oops!)
#
# Usage (As the 'zimbra' user):
#
# ./mailboxfixblobs.pl user@their.domain.here
# ========================================================================

$DEBUG=1; # <-- Set to zero if you actually want to delete stuff!

if ($DEBUG) {
    print "=============================================================\n";
    print "NOTICE: You are in DEBUG MODE! You must set \$DEBUG=0 for\n";
    print "        this script to actually delete .msg files form the\n";
    print "        file system for the user's account.\n";
    print "\n";
    print "        If you plan on disabling debug mode, you should be\n";
    print "        certain you have sufficient backups of the user's\n";
    print "        blob files beforehand.\n";
    print "=============================================================\n";
} else {
    print "=============================================================\n";
    print "NOTICE: This script is planning on deleting .msg files that\n";
    print "        don't appear in the mail_item table for this user.\n";
    print "        You have a few seconds to hit cntrl-C if this is not\n";
    print "        what you want to do...\n";
    print "=============================================================\n";
}

# Make the user think for a few seconds. It's a good thing.
sleep 5;

my ($fbits, $fgbits, $mbits, $mgbits, $basepath) = split (' ',`echo "select file_bits,file_group_bits,mailbox_bits,mailbox_group_bits,path from volume where type='1'" | mysql -N zimbra`);
my $ARGV = shift @ARGV;

chomp $basepath;

my %mail_item;
my $mbmask = sprintf "1" x $mgbits;
my $fmask = sprintf "1" x $fgbits;

# This bugged me with the other script, so instead of relying on the
# (outdated, and inaccurate in my case) comment field of the "mailbox"
# table, I use zmprov to get the mailbox ID:
$line = `zmprov getMailboxInfo $ARGV | grep mailboxId`;
chomp $line;
($id) = $line =~ m/mailboxId:\s+(\d+)$/;
print "id=($id)\n";
if ($id eq "") {
    print "That account doesn't appear to exist.\n";
    exit 1;
}

# Figure out the full path to this user's mailbox data on the file system.
$line = `echo "select group_id, account_id, comment from mailbox where id=${id}" | mysql -N zimbra`;
chomp $line;
my $path = "$basepath/";
my ($grid, $aid, $nm) = (split(/\s+/, $line));
my $mbhash = $id >> $mbits;
$mbhash &= $mbmask;
$path .= $mbhash."/".$id."/msg/";

# Fill up a hash of items from the database, since reconnecting for each query is SLOW.
foreach my $mitem (`echo "select id,mod_content from mail_item where mailbox_id=${id}" | mysql -N mboxgroup${grid}`) {
    chomp $mitem;
    ($tmp_id, $tmp_modContent) = split(/\s+/, $mitem);
    $mail_item{"$tmp_id-$tmp_modContent"} = 1;
}

# For every blob on the file system, see if we have an entry in the database.
# If we don't have an entry for it, nuke the file. (SET $DEBUG=0 to actually delete)
foreach my $blob (`find ${path} -type f -print`) {
    chomp $blob;
    ($msgid, $modContent) = $blob =~ m/(\d+)-(\d+)\.msg$/;
    if (! defined $mail_item{"$msgid-$modContent"}) {
        if ($DEBUG) {
            print "DEBUG: $blob should be deleted.\n";
        } else {
            if (unlink $blob) {
                print "INFO: Deleted blob: $blob\n";
            } else {
                print "ERROR: Could not delete blob: $blob -- $!\n";
            }
        }
    } else {
        print "OK: Message $msgid with mod_content $modContent is in the database.\n";
    }
}
The output (without setting $DEBUG=0) looks something like this:
OK: Message 8872 with mod_content 40382 is in the database.
OK: Message 8316 with mod_content 37049 is in the database.
OK: Message 8257 with mod_content 36801 is in the database.
OK: Message 8255 with mod_content 36797 is in the database.
DEBUG: /opt/zimbra/store/0/33/msg/2/8573-38396.msg should be deleted.
OK: Message 8793 with mod_content 39588 is in the database.
OK: Message 8216 with mod_content 36620 is in the database.
OK: Message 8301 with mod_content 36969 is in the database.
OK: Message 8646 with mod_content 38932 is in the database.
OK: Message 8850 with mod_content 40262 is in the database.
OK: Message 8354 with mod_content 37242 is in the database.
DEBUG: /opt/zimbra/store/0/33/msg/2/8552-38286.msg should be deleted.
OK: Message 8259 with mod_content 36807 is in the database.
OK: Message 8869 with mod_content 40374 is in the database.


I hope this script is useful to some people still running ZCS 4.5.x. It saved my butt
last night when I went from 5,000+ mail items to 10,000+ because I wasn't paying
attention...

--Brian
Reply With Quote