Thread: Training: importing mbox files of spam?

  1. #1
    Baylink is offline Elite Member
    Join Date
    Aug 2008
    Location
    St Pete FL USA
    Posts
    392
    Rep Power
    6

    I have, on my old mail server from which I'm transitioning, half a dozen *really* big (ca 25k messages each or more) mbox format files full of spam.

    I know they're spam because, while the usernames are valid on that box, the *mailbox names* were not; there are only 3 valid addresses on the domain and one of them is 'postmaster'. :-)

    So, is there any easy way I can feed those raw mbox files into something that will allow Zimbra to train its underlying spam detection mechanisms on them, secure in the knowledge that there is no ham in the bunch?

    (Just so I'm clear, the only practical way I see to do this is from the Linux command line on the Z server itself; anything requiring individual handling of the messages is a non-starter due to volume.)
    Jay R. Ashworth - ZCS 6.0.9CE/CentOS5 - St Pete FL US - Music - Blog - Photography - IANAL - IAAMA
    Try to Ask Questions The Smart Way -- you'll get better answers.

    Put your product and version in your profile/signature - All opinions strictly my own, even though I have an employer these days.
    If you [SOLVE] something, please tell everyone how for the archives
    And, please... read what people write, and answer the questions they asked, not the ones they didn't.

  2. #2
    uxbod is offline Moderator
    Join Date
    Nov 2006
    Location
    UK
    Posts
    8,017
    Rep Power
    24

    Well you could try something like
    Code:
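    su - zimbra
    # -p points at the prefs file; the mbox file is a plain argument
    # (-f expects a file that lists folders, not the mbox itself)
    sa-learn --spam --mbox -p <path_to_zimbra_sa_prefs> <path_to_your_mbox_files>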
    Never tried this so thinking out loud. Definitely take a backup first!
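
With half a dozen mbox files, it may help to print the exact sa-learn command for each one before running anything. The sketch below only builds and prints the commands. It assumes the mbox path is passed as a plain argument and the prefs file via `-p`, which is how I read the sa-learn man page; the function names and any paths you pass in are hypothetical, and a Zimbra install may need extra options not shown here.

```python
import shlex


def sa_learn_command(mbox_path, prefs_path=None, kind="spam"):
    """Build an sa-learn argument list for one mbox file (nothing is executed)."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    cmd = ["sa-learn", "--" + kind, "--mbox"]
    if prefs_path:
        # Assumption: -p selects the preferences file to use.
        cmd += ["-p", prefs_path]
    cmd.append(mbox_path)
    return cmd


def dry_run(mbox_paths, prefs_path=None, kind="spam"):
    """Return shell-quoted command lines, one per mbox file, for review."""
    return [shlex.join(sa_learn_command(p, prefs_path, kind)) for p in mbox_paths]
```

Once the printed lines look right, each list from `sa_learn_command` could be handed to `subprocess.run` as the zimbra user.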

    Hmmm, another alternative would be to follow the "Migrating from mbox" wiki article to import the files into a dummy account, where ZCS will process them via its current Bayes database. Any that get missed will land in the Inbox of that dummy account, so you can then just highlight them and mark them as Junk.
    Last edited by uxbod; 09-15-2008 at 03:32 AM.

  3. #3
    Baylink is offline Elite Member

    Ok, I'll take a look at that.

    In related news, I do have 2 mailboxes from that domain, also as mbox, that are live mail, left over from the old MX feed. Can I use the same commandline to pull them in and get them delivered, in light of the fact that they've already *been* through a destination MTA once?

    Or will it get confused?

  4. #4
    uxbod is offline Moderator

    Well, it seems to strip the headers, so it should be okay. What I would suggest is creating a replica mbox directory, moving a couple of messages in, and then trying the script.
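
Cutting a small test sample out of a large mbox file could be sketched as follows, using Python's standard `mailbox` module. This is a minimal sketch, not a tested migration tool: the function name `copy_sample` and the default of six messages are my own choices, and the source file is only read, never modified.

```python
import mailbox


def copy_sample(src_path, dst_path, limit=6):
    """Copy the first `limit` messages of an mbox file into another mbox file.

    Returns the number of messages copied.
    """
    # create=False: fail loudly if the source file does not exist.
    src = mailbox.mbox(src_path, create=False)
    # The destination is created if missing; messages are appended to it.
    dst = mailbox.mbox(dst_path)
    copied = 0
    try:
        for message in src:
            if copied >= limit:
                break
            dst.add(message)
            copied += 1
        dst.flush()  # write the added messages to disk
    finally:
        dst.close()
        src.close()
    return copied
```

The resulting sample file can then be fed to the import or training step, and the full files only once the sample behaves as expected.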

  5. #5
    Baylink is offline Elite Member

    So, you're saying, cut out half a dozen messages, and feed them to it, first?

    Yeah, that was my plan.

  6. #6
    uxbod is offline Moderator

    Indeed, far better than processing thousands and finding it does not work.

  7. #7
    Baylink is offline Elite Member

    Oh, sure; I was just looking for an opinion that you thought it was a suitable approach.

    On looking back, though, I suspect "sa-learn" is not actually the right approach for injecting real ham mail into mailboxes; is it?

  8. #8
    uxbod is offline Moderator

    sa-learn cannot inject email into a mailbox; it is purely for learning ham or spam from the command line.

  9. #9
    Baylink is offline Elite Member

    So the proper approach there is just to find a SMTP sender that will read an mbox file and spray it in?
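
For illustration, such a sender could be sketched with Python's standard `mailbox` and `smtplib` modules. This is a rough sketch under assumptions, not a vetted tool: the function name `inject_mbox`, the envelope sender, the recipient, and the host and port are all placeholders to be supplied by whoever runs it, and nothing here has been tried against a Zimbra server. Messages re-sent this way pass through the MTA again, so they will pick up new Received headers and be scanned again.

```python
import mailbox
import smtplib


def inject_mbox(mbox_path, recipient, sender, connection=None,
                host="localhost", port=25):
    """Re-send every message in an mbox file to `recipient` over SMTP.

    Returns the number of messages handed to the server.
    """
    box = mailbox.mbox(mbox_path, create=False)
    # Reuse a caller-supplied connection (handy for testing), else open one.
    conn = connection if connection is not None else smtplib.SMTP(host, port)
    sent = 0
    try:
        for key in box.iterkeys():
            # Raw message bytes, without the mbox "From " separator line.
            raw = box.get_bytes(key)
            # SMTP expects CRLF line endings; mbox files normally use LF.
            raw = raw.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")
            conn.sendmail(sender, [recipient], raw)
            sent += 1
    finally:
        box.close()
        if connection is None:
            conn.quit()
    return sent
```

Running it first against a sample file of a few messages, addressed to a test account, would show whether the server accepts and files them as hoped.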
