
Thread: Search Lucene index from a script

  1. #1
    dutchie (New Member, joined Jul 2009, 3 posts)

    Search Lucene index from a script

    The title says it all really. I'd prefer to write it in Perl, as there are Lucene bindings in the Ubuntu repositories, but if there is a simple shell command to do this then that would be perfect.

    Basically, I need to get all emails that are to/from a particular address, regardless of who sent/received them.

    Thanks in advance.
    Last edited by dutchie; 07-09-2009 at 02:20 AM.

  2. #2
    dutchie

    A solution?

    OK, I found a solution. I can run "zmaccts", extract the account names using a regex, and run "zmmailbox -z -m <each one in turn> search ...", but this is very slow. Obviously a lot of this can be cached, but as I'm going to be serving this online I can't be hanging around for a few seconds (edit: make that MINUTES) while all this goes on.
    Code:
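    #!/usr/bin/perl
    
    use strict;
    use warnings;
    
    my $target = $ARGV[0];
    
    die "Usage: $0 <target to search for>\n" unless $target;
    
    # check the account name rather than hard-coding the zimbra UID
    my $user = getpwuid($>) // '';
    die "Must run as user zimbra\n" unless $user eq 'zimbra';
    
    # collect the account names from the zmaccts listing
    # (this filter also skips any address that contains "account")
    my @emails;
    open my $accts, '-|', 'zmaccts' or die "Cannot run zmaccts: $!\n";
    while (<$accts>) {
        last if /domain summary/;
        next if /^-|account|^$/;
        my @words = split ' ';
        push @emails, $words[0] if @words;
    }
    close $accts;
    
    my @lines;
    for my $acct (@emails) {
        # the list form of open runs zmmailbox without a shell, so the
        # search string and account name are passed through unmangled
        open my $search, '-|', 'zmmailbox', '-z', '-m', $acct,
            's', "from:$target OR to:$target"
            or die "Cannot run zmmailbox: $!\n";
        while (my $line = <$search>) {
            next unless $line =~ /^\d+\./;
            chomp $line;
            # get rid of meaningless rubbish at start of line
            $line =~ s/^\d+\.\s+-?\d+\s+conv\s+//;
            push @lines, $line;
        }
        close $search;
    }
    
    # remove dupes from @lines
    my %temp = map { $_, 1 } @lines;
    @lines = sort keys %temp;
    
    open my $out, '>', "$target.cache" or die "Cannot write $target.cache: $!\n";
    print $out <<EOF;
    Involving             Subject                                             Time
    ----------------------------------------------------------------------------------------
    EOF
    print $out "$_\n" foreach @lines;
    close $out;
    
    system 'cat', "$target.cache";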
    Last edited by dutchie; 07-10-2009 at 01:06 AM. Reason: added script
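
    The post above notes that a lot of this can be cached. One way to use the $target.cache file the script already writes, sketched here as an assumption rather than something from the thread, is to skip the whole search when that file is recent enough. The helper name cache_is_fresh and the 600-second default are hypothetical choices:

```perl
use strict;
use warnings;

# Hypothetical helper: true if $file exists and was last modified less than
# $max_age seconds ago (600 seconds is an arbitrary example default).
sub cache_is_fresh {
    my ($file, $max_age) = @_;
    $max_age = 600 unless defined $max_age;
    # stat returns an empty list when the file does not exist
    my @st = stat $file or return 0;
    # element 9 of stat's result is the modification time
    return (time() - $st[9]) < $max_age;
}
```

    Calling cache_is_fresh("$target.cache") right after the usage and user checks in the first script, and printing the cached file and exiting when it returns true, would avoid running zmmailbox again for repeat requests. The maximum age is a trade-off between speed and how up to date the listing is.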

  3. #3
    wdimmit (Senior Member, joined Nov 2005, 62 posts)


    That script, while painfully slow, works well - thanks for putting it out here. Have you had any success finding methods to increase performance?
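
    One possible speed-up, not tried anywhere in this thread, is to run the per-account zmmailbox searches several at a time instead of one after another. This sketch assumes the server copes with a few concurrent zmmailbox processes; the helper name parallel_capture and the batch size are hypothetical. Each command is an array reference run without a shell, and the outputs come back in the same order as the commands:

```perl
use strict;
use warnings;

# Hypothetical helper: run each command (an array ref, executed without a
# shell) and return the captured STDOUT of each, in the same order.
# At most $max commands run at the same time.
sub parallel_capture {
    my ($max, @cmds) = @_;
    my @out;
    while (my @batch = splice @cmds, 0, $max) {
        # start the whole batch before reading from any of them
        my @pipes;
        for my $cmd (@batch) {
            open my $fh, '-|', @$cmd or die "Cannot run $cmd->[0]: $!\n";
            push @pipes, $fh;
        }
        # collect output in order; a child that fills its pipe simply waits
        for my $fh (@pipes) {
            my $text = do { local $/; <$fh> };
            push @out, defined $text ? $text : '';
            close $fh;
        }
    }
    return @out;
}
```

    In the first script, parallel_capture(4, map { ['zmmailbox', '-z', '-m', $_, 's', "from:$target OR to:$target"] } @emails) would return one block of search output per account, to be split into lines as before. Fixed batches are simpler than a proper worker pool, at the cost of waiting for the slowest command in each batch.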

  4. #4
    dutchie


    This script was written in a week of work experience, and this is as far as it got. It works, but is horrifically slow.

    Code:
    #!/usr/bin/perl
    
    use strict;
    use warnings;
    use Text::Wrap;
    
    my $target = $ARGV[0];
    
    die "Usage: $0 <target to search for>\n" unless $target;
    
    my $user = getpwuid($>) // '';
    die "Must run as user zimbra\n" unless $user eq 'zimbra';
    
    # each message is also saved as plain text in a directory named after the target
    unless (-d $target) {
        mkdir $target or die "Cannot create $target: $!\n";
    }
    chmod 0755, $target;
    
    my @emails;
    open my $accts, '-|', 'zmaccts' or die "Cannot run zmaccts: $!\n";
    while (<$accts>) {
        last if /domain summary/;
        next if /^-|account|^$/;
        my @words = split ' ';
        push @emails, $words[0] if @words;
    }
    close $accts;
    
    # zmmailbox is started once per account, once per matching conversation and
    # once per message, which is most likely why this is so slow
    my @msgs;
    for my $addr (@emails) {
        open my $search, '-|', 'zmmailbox', '-z', '-m', $addr,
            's', "from:$target OR to:$target"
            or die "Cannot run zmmailbox: $!\n";
        while (my $line = <$search>) {
            # result lines start with a number, the conversation ID and "conv"
            next unless $line =~ /^\d+\.\s+(-?\d+)\s+conv\s+/;
            my $conv_id = $1;
            open my $conv, '-|', 'zmmailbox', '-z', '-m', $addr, 'gc', $conv_id
                or die "Cannot run zmmailbox: $!\n";
            while (my $msgline = <$conv>) {
                next unless $msgline =~ /^\d+\.\s+(\d+)/;
                my $msg_id = $1;
                open my $gm, '-|', 'zmmailbox', '-z', '-m', $addr, 'gm', $msg_id
                    or die "Cannot run zmmailbox: $!\n";
                my $mailmsg = do { local $/; <$gm> };
                close $gm;
                push @msgs, $mailmsg if defined $mailmsg;
            }
            close $conv;
        }
        close $search;
    }
    
    # remove dupes from @msgs
    my %temp = map { $_, 1 } @msgs;
    @msgs = sort keys %temp;
    
    my $target_html = html_escape($target);
    
    open my $out, '>', "$target.cache" or die "Cannot write $target.cache: $!\n";
    print $out <<EOF;
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
    <head>
        <title>Emails involving $target_html</title>
        <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    </head>
    
    <body>
        <h1>Emails sent/received by $target_html</h1>
        <table border="1">
            <tr>
                <th>From</th>
                <th>To</th>
                <th>Subject</th>
                <th>Time</th>
            </tr>
    EOF
    
    for my $msg (@msgs) {
        # one pass over the message, keeping the first value of each header
        my %hdr;
        for my $l (lines($msg)) {
            $hdr{$1} //= $2 if $l =~ /^(Subject|Sent|From|To|Id): (.*)/;
        }
        next unless defined $hdr{Id};
        my $id = $hdr{Id};
        # From/To can contain <address> parts, which must be escaped in HTML
        my ($from, $to, $subject, $date) =
            map { html_escape($hdr{$_}) } qw(From To Subject Sent);
        print $out <<EOF;
            <tr>
                <td>$from</td>
                <td>$to</td>
                <td><a href="$target_html/$id">$subject</a></td>
                <td>$date</td>
            </tr>
    EOF
        # drop zmmailbox's metadata lines before saving the message text
        (my $text = $msg) =~ s/^(?:Id|Conversation-Id|Flags|Folder|Size):.*\n//mg;
        open my $msgfh, '>', "$target/$id" or die "Cannot write $target/$id: $!\n";
        print $msgfh wrap('', '', $text);
        close $msgfh;
    }
    
    print $out <<EOF;
        </table>
    </body>
    
    </html>
    EOF
    close $out;
    
    sub lines {
        local $_ = shift;
        return split /\n/;
    }
    
    sub html_escape {
        my $s = shift // '';
        $s =~ s/&/&amp;/g;
        $s =~ s/</&lt;/g;
        $s =~ s/>/&gt;/g;
        $s =~ s/"/&quot;/g;
        return $s;
    }
    I also wrote a bit of PHP to display the results:

    Code:
    <?php
        // the cache files are named after the address, e.g. user@example.com.cache;
        // basename() stops a request like ?email=../../somewhere leaving this directory
        $addr = isset($_REQUEST['email']) ? basename($_REQUEST['email']) : '';
        if ($addr === '') {
            exit(1);
        }
        $filename = $addr . ".cache";
        if (file_exists($filename)) {
            readfile($filename);
        }
    ?>
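
    Because the script can take minutes to run, the PHP page may read a cache file that is only half written and show a truncated table. A common remedy, not used in the thread, is to write the new file under a temporary name in the same directory and rename() it into place; on a POSIX filesystem the rename replaces the old file in one step, so a reader sees either the old or the new version. The helper name write_atomically is hypothetical:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use File::Basename qw(dirname);

# Hypothetical helper: write $content to $path so that readers see either the
# old file or the complete new one, never a half-written file.
sub write_atomically {
    my ($path, $content) = @_;
    # rename() is only atomic within one filesystem, so create the
    # temporary file in the destination directory
    my ($fh, $tmp) = tempfile('.cache-XXXXXX', DIR => dirname($path));
    print $fh $content or die "Cannot write $tmp: $!\n";
    close $fh or die "Cannot close $tmp: $!\n";
    # tempfile() creates the file readable by its owner only
    chmod 0644, $tmp;
    rename $tmp, $path or die "Cannot rename $tmp to $path: $!\n";
}
```

    In the second script, the HTML could be collected in a string and passed to write_atomically("$target.cache", $html) at the end, instead of printing to the cache file as it goes. The chmod matters because the web server usually runs as a different user from zimbra.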
