Quote:
Originally Posted by unilogic
It's pretty easy to train dspam with the corpus files from spamassassin.
Download http://dspam.nuclearelephant.com/sou...trainer.tar.gz
Edit publiccorpus.pl as follows on line 50:
Code:
$cmd = "/opt/dspam/bin/dspam --user $user --class=$class --source=corpus --mode=teft --feature=chained,noise --stdout < $corpus";
Go to http://spamassassin.apache.org/publiccorpus/ and download all the spam and ham files except 20030228_easy_ham_2.tar.bz2, to keep the number of ham and spam files even.
Extract these files in the same directory as publiccorpus.pl.
Run: perl publiccorpus.pl zimbra
After it finishes (it will take a good half an hour, depending on your CPU power, as you're entering thousands of emails into its database),
Run: dspam_clean -p0 zimbra
It will clean up any unneeded or neutral entries in dspam's database. This will also take a good deal of time.
-Ben
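Since the steps above depend on keeping the ham and spam counts roughly even, it may be worth counting the extracted messages before running the trainer. This is only a minimal sketch: count_class is a hypothetical helper, and it assumes each archive extracts to a directory whose name contains "ham" or "spam", with one message per file plus possibly an index file named cmds that should not be counted.

```shell
#!/bin/sh
# Hypothetical helper, not part of dspam or the trainer tarball.
# Assumptions: extracted corpus directories have "ham" or "spam" in their
# names, each message is one regular file, and a file named "cmds" (if
# present) is an index rather than a message.

count_class() {
    # $1 = directory containing the extracted corpus directories
    # $2 = glob selecting one class, e.g. '*ham*' or '*spam*'
    total=0
    for dir in "$1"/$2; do
        # Skip anything that is not a directory (including an unmatched glob)
        [ -d "$dir" ] || continue
        n=$(find "$dir" -type f ! -name cmds | wc -l)
        total=$((total + n))
    done
    echo "$total"
}
```

Run from the directory holding publiccorpus.pl, `count_class . '*ham*'` and `count_class . '*spam*'` print the two totals to compare.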
I'd like to try this, but I'm wondering if it will work with 4.5.1. I'm also worried about this message in the readme:
Quote:
------------------------------------------------------------------------
***** IMPORTANT: Do Not Use These Mails For Testing a Live System ******
Please note: do NOT send these emails into a live email system. I've
received several complaints from my correspondents that they've received
bounce messages in response to mails in this corpus, due to misconfigured
*LIVE* email systems being tested against this public corpus!
I'm offering this as a service to spam filter developers, and causing
trouble for my acquaintances and various mailing list administrators
does NOT incline me to continue offering this data publically.
------------------------------------------------------------------------
Anyone try this on 4.5.1 or higher?