  #1 (permalink)  
Old 07-09-2009, 03:11 AM
New Member
 
Posts: 3
Search Lucene index from a script

The title says it all really. I'd prefer to write it in Perl, as there are Lucene bindings in the Ubuntu repositories, but if there is a simple shell command to do this then that would be perfect.

Basically, I need to find every email sent to or from a particular address, regardless of which user's mailbox it sits in.

Thanks in advance.

Last edited by dutchie; 07-09-2009 at 03:20 AM..
  #2 (permalink)  
Old 07-09-2009, 04:01 AM
New Member
 
Posts: 3
A solution?

OK, I found a solution. I can run "zmaccts", extract the account names with a regex, and then run "zmmailbox -z -m <each one in turn> search ..." for each account, but this is very slow. A lot of this could obviously be cached, but as I'm going to be serving this online I can't be hanging around for a few seconds (edit: MINUTES) while all this goes on.
Code:
#!/usr/bin/perl

use strict;
use warnings;

my $target = $ARGV[0];

die "Usage: $0 <target to search for>\n" unless $target;

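# Compare by user name: the zimbra user's numeric UID differs between installs
my $user = getpwuid($>);
die "Must run as user zimbra\n" unless defined $user && $user eq 'zimbra';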

my @emails;

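open my $accts, '-|', 'zmaccts' or die "Cannot run zmaccts: $!\n";
while (<$accts>) {
    last if /domain summary/;
    # skip the dashed rule, the column header and blank lines
    # (anchored, so addresses that merely contain "account" are kept)
    next if /^-|^\s*account\s|^$/;
    my @words = split /\s+/;
    push @emails, $words[0];
}
close $accts;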

my @lines;
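for my $addr (@emails) {
    # List-form open runs zmmailbox directly, so $target is never parsed by a shell
    open my $search, '-|', 'zmmailbox', '-z', '-m', $addr, 's',
        "from:$target OR to:$target"
        or die "Cannot run zmmailbox: $!\n";
    while (my $line = <$search>) {
        chomp $line;
        next unless $line =~ /^\d+\./;
        # strip the result number, conversation ID and type from the start of the line
        $line =~ s/\d+\.\s+(-?\d+)\s+conv\s+//;
        push @lines, $line;
    }
    close $search;
}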

# remove dupes from @lines
my %temp = map { $_, 1 } @lines;
@lines = sort keys %temp;

open OUTCACHE, '>', "$target.cache" or die "Cannot write $target.cache: $!\n";
print OUTCACHE <<EOF;
Involving             Subject                                             Time
----------------------------------------------------------------------------------------
EOF
print OUTCACHE "$_\n" foreach @lines;
# close before reading the file back, so buffered output is flushed to disk
close OUTCACHE;

system 'cat', "$target.cache";
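
The post above mentions that much of this work could be cached, and the script already writes its results to "$target.cache". A minimal sketch of one way to reuse that file instead of searching every mailbox again, assuming a simple age limit is acceptable. The helper name cache_is_fresh, the one-hour limit and the example address are illustrative assumptions, not part of the original script:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true if $path exists and was last modified
# no more than $max_age seconds ago.
sub cache_is_fresh {
    my ($path, $max_age) = @_;
    my @st = stat $path or return 0;    # stat returns an empty list on failure
    return (time() - $st[9]) <= $max_age ? 1 : 0;    # $st[9] is the mtime
}

my $target = 'someone@example.com';     # placeholder address
my $cache  = "$target.cache";

if (cache_is_fresh($cache, 3600)) {
    print "reusing $cache\n";
} else {
    print "cache missing or stale; run the full search\n";
}
```

A check like this only avoids repeated searches for the same address; the first search for any address still pays the full cost of one zmmailbox run per account.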

Last edited by dutchie; 07-10-2009 at 02:06 AM.. Reason: added script
  #3 (permalink)  
Old 07-15-2009, 11:16 AM
Senior Member
 
Posts: 60

That script, while painfully slow, works well - thanks for putting it out here. Have you had any success finding methods to increase performance?
  #4 (permalink)  
Old 07-22-2009, 12:14 PM
New Member
 
Posts: 3

This script was written during a week of work experience, and this is as far as it got. It works, but is horrifically slow.

Code:
#!/usr/bin/perl

use strict;
use warnings;
use Text::Wrap;

my $target = $ARGV[0];

die "Usage: $0 <target to search for>\n" unless $target;

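# Compare by user name: the zimbra user's numeric UID differs between installs
my $user = getpwuid($>);
die "Must run as user zimbra\n" unless defined $user && $user eq 'zimbra';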

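# directory for the individual message files; reused if it already exists
unless (-d $target) {
    mkdir $target or die "Cannot create directory $target: $!\n";
}
chmod 0755, $target;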

my @emails;
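open my $accts, '-|', 'zmaccts' or die "Cannot run zmaccts: $!\n";
while (<$accts>) {
    last if /domain summary/;
    # skip the dashed rule, the column header and blank lines
    # (anchored, so addresses that merely contain "account" are kept)
    next if /^-|^\s*account\s|^$/;
    my @words = split /\s+/;
    push @emails, $words[0];
}
close $accts;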

my @msgs;
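for my $addr (@emails) {
    # List-form open runs zmmailbox directly, so $target is never parsed by a shell
    open my $search, '-|', 'zmmailbox', '-z', '-m', $addr, 's',
        "from:$target OR to:$target"
        or die "Cannot run zmmailbox search: $!\n";
    while (my $line = <$search>) {
        # result lines start with "<n>. <conversation id> conv"
        next unless $line =~ /^\d+\.\s+(-?\d+)\s+conv\s+/;
        my $conv_id = $1;
        open my $conv, '-|', 'zmmailbox', '-z', '-m', $addr, 'gc', $conv_id
            or die "Cannot run zmmailbox gc: $!\n";
        while (my $msgline = <$conv>) {
            # each message line starts with "<n>. <message id>"
            next unless $msgline =~ /^\d+\.\s+(\d+)/;
            my $mailmsg = `zmmailbox -z -m $addr gm $1`;
            push @msgs, $mailmsg;
        }
        close $conv;
    }
    close $search;
}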

# remove dupes from @msgs
my %temp = map { $_, 1 } @msgs;
@msgs = sort keys %temp;

open OUTCACHE, '>', "$target.cache" or die "Cannot write $target.cache: $!\n";
print OUTCACHE <<EOF;
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
    <title>Emails involving $target</title>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
</head>

<body>
    <h1>Emails sent/received by $target</h1>
    <table border="1">
        <tr>
            <th>From</th>
            <th>To</th>
            <th>Subject</th>
            <th>Time</th>
        </tr>
EOF

for (@msgs) {
    # @msgs is already de-duplicated above. Take the first line carrying
    # each field; with /m, ^ matches at the start of every line.
    my ($subject) = /^Subject: (.*)/m;
    my ($date)    = /^Sent: (.*)/m;
    my ($from)    = /^From: (.*)/m;
    my ($to)      = /^To: (.*)/m;
    my ($id)      = /^Id: (.*)/m;
    next unless defined $id;    # the Id names the per-message file below
    for my $field ($subject, $date, $from, $to) {
        $field = '' unless defined $field;
        # escape HTML so values such as "Name <addr>" display as text
        $field =~ s/&/&amp;/g;
        $field =~ s/</&lt;/g;
        $field =~ s/>/&gt;/g;
    }
    print OUTCACHE <<EOF;
        <tr>
            <td>$from</td>
            <td>$to</td>
            <td><a href="$target/$id">$subject</a></td>
            <td>$date</td>
        </tr>
EOF
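    # drop zmmailbox's own metadata lines from the saved copy (assumed intent;
    # without /m and /g the pattern could only match once, at the very start)
    s/^(?:Id|Conversation-Id|Flags|Folder|Size): .*\n//mg;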
    open MSG, '>', "$target/$id" or die "Cannot write $target/$id: $!\n";
    print MSG wrap("", "", $_);
    close MSG;
}

print OUTCACHE <<EOF;
    </table>
</body>

</html>
EOF

sub lines {
    local $_ = shift;
    return split /\n/;
}
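
The script above pulls Subject, Sent, From, To and Id out of each message and writes them into an HTML table. A minimal sketch of doing that extraction in a single pass over the lines and escaping the values before they reach the HTML. The helper names parse_fields and escape_html and the sample message are made up for illustration; the field labels are the ones the script itself matches:

```perl
use strict;
use warnings;

# Escape the characters that are special in HTML text and attribute values.
sub escape_html {
    my ($text) = @_;
    $text = '' unless defined $text;
    my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');
    $text =~ s/([&<>"])/$entity{$1}/g;
    return $text;
}

# Walk the lines once and keep the first occurrence of each wanted field.
sub parse_fields {
    my ($msg) = @_;
    my %field;
    for my $line (split /\n/, $msg) {
        if ($line =~ /^(Id|Subject|Sent|From|To): (.*)/) {
            $field{$1} = $2 unless exists $field{$1};
        }
    }
    return %field;
}

# Made-up sample in the shape the script expects from "zmmailbox gm".
my $sample = join "\n",
    'Id: 257',
    'Subject: Q3 <draft> & notes',
    'From: alice@example.com',
    'To: bob@example.com',
    'Sent: Thu, 09 Jul 2009 03:11:00',
    '',
    'Body text.';

my %f = parse_fields($sample);
printf "<td>%s</td>\n", escape_html($f{Subject});
# prints: <td>Q3 &lt;draft&gt; &amp; notes</td>
```

Escaping all four characters in one substitution avoids double-escaping the ampersands that the entities themselves contain.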
I also wrote a bit of PHP to display the results:

Code:
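<?php
    // isset() avoids an undefined-index notice when no address is given
    $addr = isset($_REQUEST['email']) ? $_REQUEST['email'] : '';
    if ($addr === '') {
        exit(1);
    }
    // basename() drops any directory components, so the request cannot
    // reach files outside the current directory
    $filename = basename($addr) . ".cache";
    if (is_file($filename)) {
        // readfile() sends the cached page straight to the client
        readfile($filename);
    }
?>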
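
Both the PHP page and the Perl script use the requested address as a file name, and the script also passes it to zmmailbox, so it may be worth rejecting anything that does not look like a plain address before going further. A rough sketch in Perl, to match the scripts above. The function name is_safe_address and the character whitelist are assumptions, and the whitelist is deliberately stricter than what real addresses may legally contain:

```perl
use strict;
use warnings;

# Hypothetical check: accept only a conservative subset of address
# characters. The value must start with a letter or digit, so it cannot
# look like a command-line option, and it cannot contain "/" or spaces.
sub is_safe_address {
    my ($addr) = @_;
    return 0 unless defined $addr;
    return $addr =~ /\A[A-Za-z0-9][A-Za-z0-9._+-]*\@[A-Za-z0-9][A-Za-z0-9.-]*\z/
        ? 1 : 0;
}

for my $candidate ('user@example.com', '../../etc/passwd', 'a@b.com; id') {
    printf "%-22s %s\n", $candidate,
        is_safe_address($candidate) ? 'ok' : 'rejected';
}
```

In the search script this could run right after the usage check, for example: die "Bad address\n" unless is_safe_address($target);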