12-14-2006, 12:53 PM
dlochart
Klug

Below is my post and the response from the Postfix group:

Doug Lochart wrote:
> This is probably very simple, but I am unable to get good results
> searching, and I want this to be automated as much as possible.
>
> I would like to test and then eventually train my postfix/amavisd/sa
> setup. I found a site, http://www.untroubled.org/spam/, that has archives
> of spam messages, so I downloaded an archive. The spam mails contain
> the full message including headers. I know the basic telnet/nc way of
> sending mail, but these already have the headers and such, so I would
> just like to inject them into my postfix and watch what happens. I know
> I will need to change the RCPT TO address so that I accept the mail.
>
> Is there a simple way to do this?

Yes. Don't. The nice thing about SpamAssassin compared to pure Bayesian
filters is that it already comes with spam identification rules, based on
spam in the wild. It will use these patterns to reliably identify spam and
add tokens to the Bayesian database. Once you have tokenized a certain
number of ham & spam messages (200 of each, by default), SpamAssassin
starts using the Bayesian database to score messages.
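
To see how close a database is to that threshold, here is a minimal Python sketch. It assumes you have captured the text printed by `sa-learn --dump magic`, and it assumes the column layout shown in SAMPLE (the counter value in the third column, followed by a "non-token data:" label), so check that against your own output. The function names are made up for illustration, and the defaults of 200 correspond to the bayes_min_spam_num and bayes_min_ham_num settings.

```python
# SAMPLE imitates the assumed layout of `sa-learn --dump magic` output.
# The numbers are invented for illustration only.
SAMPLE = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0        412          0  non-token data: nspam
0.000          0        150          0  non-token data: nham
"""


def parse_magic_counts(dump_text):
    """Return {label: value} for every 'non-token data' line in the dump."""
    counts = {}
    for line in dump_text.splitlines():
        if "non-token data:" not in line:
            continue
        fields, label = line.split("non-token data:", 1)
        parts = fields.split()
        # Assumption: the third numeric column holds the counter value.
        if len(parts) >= 3 and parts[2].isdigit():
            counts[label.strip()] = int(parts[2])
    return counts


def bayes_is_scoring(nspam, nham, min_spam=200, min_ham=200):
    """True once both learned counts have reached their minimums."""
    return nspam >= min_spam and nham >= min_ham


if __name__ == "__main__":
    counts = parse_magic_counts(SAMPLE)
    print(counts["nspam"], counts["nham"],
          bayes_is_scoring(counts["nspam"], counts["nham"]))
```

With the invented sample numbers, the spam count is past 200 but the ham count is not, so the Bayes scores would not be applied yet.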

If you train your database with messages that aren't aimed at your
site/users, you are likely to negatively affect performance & accuracy.
Since tokens will be expired eventually, there is no advantage gained by
filling the database with tokens that will never be used or may be
classified incorrectly. For example, I have clients that are in the
healthcare industry that *cannot* block a message just because it
contains brand names of "life-enhancing" drugs (that sentence alone
might score some spam points on some systems).

It's actually more useful to run sa-update daily, so that you get the
latest patterns. And you can also train the db on spam that still makes
it to your account (or into the junk folders of users you can trust). I
do this once a week for the few that get by, and I've never needed to
train any messages as ham.
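
One way to automate that routine is from cron. The Python sketch below only builds the crontab lines and runs nothing. The times of day, the junk-folder path, and the function names are all assumptions for illustration; the only commands used are a plain `sa-update` and `sa-learn --spam <path>`.

```python
import shlex


def daily_update_command():
    """Command that fetches the latest rule updates."""
    return ["sa-update"]


def weekly_spam_training_commands(junk_dirs):
    """One 'sa-learn --spam' command per trusted junk folder."""
    return [["sa-learn", "--spam", d] for d in junk_dirs]


def crontab_lines(junk_dirs):
    """Build crontab entries: update daily, train on junk once a week.

    The schedule (03:30 daily, 03:45 on Sundays) is an arbitrary choice.
    """
    lines = ["30 3 * * * " + shlex.join(daily_update_command())]
    for cmd in weekly_spam_training_commands(junk_dirs):
        lines.append("45 3 * * 0 " + shlex.join(cmd))
    return lines


if __name__ == "__main__":
    # Hypothetical Maildir junk folder of a user you trust.
    for line in crontab_lines(["/home/alice/Maildir/.Junk/cur"]):
        print(line)
```

Review the printed lines before installing them, and run them as whichever user owns the Bayes database on your setup.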