Posted 12-02-2005, 12:06 AM by unilogic
Training DSPAM

It's pretty easy to train DSPAM with the corpus files from SpamAssassin.
Download http://dspam.nuclearelephant.com/sou...trainer.tar.gz

Edit line 50 of publiccorpus.pl as follows:
Code:
$cmd = "/opt/dspam/bin/dspam --user $user --class=$class --source=corpus --mode=teft --feature=chained,noise --stdout < $corpus
Go to http://spamassassin.apache.org/publiccorpus/ and download all the spam and ham files except 20030228_easy_ham_2.tar.bz2, which is left out to keep the number of ham and spam messages roughly even. Extract the archives in the same directory as publiccorpus.pl.
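
If you want to double-check that the ham and spam counts really come out roughly even after extracting, a rough sketch like this might help. It assumes the archives unpack into directories with "ham" or "spam" in their names (e.g. easy_ham, spam_2); count_messages and corpus_balance are just names made up for this example, not part of DSPAM or the trainer.

```shell
#!/bin/sh
# Sketch only: assumes extracted corpus directories contain "ham" or
# "spam" in their names. Function names are hypothetical.

# Count regular files in every directory under $1 whose name matches glob $2.
count_messages() {
    dir=$1
    pattern=$2
    total=0
    # $pattern is left unquoted on purpose so the shell expands the glob.
    for d in "$dir"/$pattern; do
        # Skip non-directories (e.g. the .tar.bz2 archives themselves).
        [ -d "$d" ] || continue
        n=$(find "$d" -type f | wc -l | tr -d ' ')
        total=$((total + n))
    done
    echo "$total"
}

# Print ham and spam totals for the corpus directory (default: current dir).
corpus_balance() {
    dir=${1:-.}
    ham=$(count_messages "$dir" '*ham*')
    spam=$(count_messages "$dir" '*spam*')
    echo "ham=$ham spam=$spam"
}
```

Running corpus_balance . in the directory that holds publiccorpus.pl prints both totals on one line. The count is approximate, since it includes every file in those directories.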

Run: perl publiccorpus.pl zimbra
This will take a good half hour or more, depending on your CPU power, since you're entering thousands of emails into DSPAM's database.

After it finishes, run: dspam_clean -p0 zimbra
This cleans up any unneeded or neutral entries in DSPAM's database. It will also take a good deal of time.
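
Since both steps run for a long time, you may prefer to chain them so the cleanup starts only once training has finished successfully. Here is a minimal sketch, assuming publiccorpus.pl is in the current directory and dspam_clean is on your PATH; train_and_clean is a hypothetical helper name, and the two commands are the same ones given in this post.

```shell
#!/bin/sh
# Sketch only: runs the training step, then the cleanup step, for one
# DSPAM user. Assumes publiccorpus.pl is in the current directory and
# dspam_clean is on PATH. The function name is hypothetical.
train_and_clean() {
    user=${1:-}
    if [ -z "$user" ]; then
        echo "usage: train_and_clean <dspam-user>" >&2
        return 2
    fi
    # Train first; skip the cleanup if training exits with an error.
    perl publiccorpus.pl "$user" || {
        echo "training failed; skipping dspam_clean" >&2
        return 1
    }
    # Same cleanup command as above.
    dspam_clean -p0 "$user"
}
```

For the setup in this post you would call it as train_and_clean zimbra and leave it running.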

-Ben

Last edited by unilogic; 01-21-2006 at 03:34 AM.