#56
03-02-2007, 08:08 AM
azilber
Senior Member
Posts: 52
Does this work with 4.5.1?

Quote:
Originally Posted by unilogic
It's pretty easy to train dspam with the corpus files from spamassassin.
Download http://dspam.nuclearelephant.com/sou...trainer.tar.gz

Edit publiccorpus.pl as follows on line 50:
Code:
$cmd = "/opt/dspam/bin/dspam --user $user --class=$class --source=corpus --mode=teft --feature=chained,noise --stdout < $corpus";
Go to http://spamassassin.apache.org/publiccorpus/ and download all the spam and ham files except 20030228_easy_ham_2.tar.bz2, which is left out to keep the number of ham and spam files even. Extract these files in the same directory as publiccorpus.pl.
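
For anyone who would rather script the extract step, here is a rough sketch rather than anything from the trainer package. It assumes the archives are already downloaded into the same directory as publiccorpus.pl and that your tar understands -j for bzip2. The function names list_corpora and extract_corpora are made up for this example.

```shell
#!/bin/sh
# Sketch only: unpack every corpus archive in the current directory except
# the one the post says to leave out. Helper names are hypothetical.
SKIP="20030228_easy_ham_2.tar.bz2"

# Print the .tar.bz2 archives in the current directory that should be unpacked.
list_corpora() {
    for archive in *.tar.bz2; do
        # If nothing matches, the glob stays literal; skip that case.
        [ -e "$archive" ] || continue
        # Leave out the extra ham set so ham and spam stay roughly even.
        [ "$archive" = "$SKIP" ] && continue
        printf '%s\n' "$archive"
    done
}

# Unpack each listed archive in place; stop at the first tar failure.
extract_corpora() {
    list_corpora | while IFS= read -r archive; do
        echo "extracting $archive"
        tar -xjf "$archive" || exit 1
    done
}

extract_corpora
```

Running list_corpora on its own first shows which archives would be unpacked without touching anything.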

Run: perl publiccorpus.pl zimbra
This will take a good half hour, depending on your CPU power, since you're entering thousands of emails into its database. After it finishes,

Run: dspam_clean -p0 zimbra
It will clean up any unneeded or neutral entries in dspam's database. This will also take a good deal of time.

-Ben
I'd like to try this, but I'm wondering if it will work with 4.5.1. I'm also worried about this message in the readme:

Quote:
------------------------------------------------------------------------
***** IMPORTANT: Do Not Use These Mails For Testing a Live System ******

Please note: do NOT send these emails into a live email system. I've
received several complaints from my correspondents that they've received
bounce messages in response to mails in this corpus, due to misconfigured
*LIVE* email systems being tested against this public corpus!

I'm offering this as a service to spam filter developers, and causing
trouble for my acquaintances and various mailing list administrators
does NOT incline me to continue offering this data publically.

------------------------------------------------------------------------
Has anyone tried this on 4.5.1 or higher?