So I enabled DSPAM a little over a week ago, initially using a score_factor of .1. I can confirm that this means DSPAM will add -.1 or +.1 to the score, depending on the determined spam status, so I've added the info to the wiki along with drozzini's method (which I haven't tested).
Initially, as far as I could tell, nothing was being recognized as spam; later, DSPAM started getting more aggressive. However, it was also generating a lot of false positives. These generally weren't enough to push mail across the amavis/SA thresholds, but that in itself could be a problem. The same goes for DSPAM false negatives whose final score still results in them being marked SPAM or SPAMMY by amavis. The problem: when the final verdict still comes out right, it's not entirely clear whether DSPAM is being retrained on its mistakes.
I suppose that I could create some Zimbra filters based on a combination of header fields to catch those cases and forward them to the ham & spam accounts, but I haven't done that so far.
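Before writing any Zimbra filter, the header combination could be prototyped from the shell against saved messages. This is only a rough sketch: it assumes each message carries an X-DSPAM-Result header (Spam or Innocent) and amavis's X-Spam-Flag header (YES or NO), which I haven't verified for every setup, and dspam_disagreement is a made-up helper name.

```shell
#!/bin/sh
# Sketch: report whether DSPAM's verdict disagrees with the final amavis flag.
# Assumed headers (check your own mail): X-DSPAM-Result and X-Spam-Flag.
dspam_disagreement() {
    msg=$1
    # Only inspect the header block, i.e. everything up to the first blank line.
    headers=$(sed '/^$/q' "$msg")
    dspam=$(printf '%s\n' "$headers" | sed -n 's/^X-DSPAM-Result:[[:space:]]*//p' | head -n 1)
    flag=$(printf '%s\n' "$headers" | sed -n 's/^X-Spam-Flag:[[:space:]]*//p' | head -n 1)
    case "$dspam:$flag" in
        Spam:NO)      echo "dspam-spam-amavis-ham" ;;   # candidate for the ham account
        Innocent:YES) echo "dspam-ham-amavis-spam" ;;   # candidate for the spam account
        *)            echo "no-disagreement" ;;         # verdicts agree or a header is missing
    esac
}
```

Running it over a folder of exported messages would at least show how often the two verdicts diverge, which is the case where retraining is in doubt.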
Instead, I increased DSPAM's score_factor to .2 and, more importantly, trained it using a couple of spam corpora. I based my steps on
HOWTO train or retrain your DSPAM - DirectAdmin Forums
1. Download ham and spam corpora from
Index of /publiccorpus
2. Extract them using bzip2 and tar. Note that some of the corpus files at that link extract into the same name, so you'll want to rename any directories that get created, if you download more than one each of ham & spam. Also, I noticed that each directory includes a file called cmd, which you should probably delete.
3. Pick one spam directory and one ham directory.
4. As zimbra, do
/opt/zimbra/dspam/bin/dspam_train zimbra /path/to/spam_directory /path/to/ham_directory
5. If desired, repeat with another pair of corpora.
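The extraction and cleanup in step 2 can be wrapped in a small helper. This is a minimal sketch, not a tested recipe: prepare_corpus is a hypothetical name, the tarball and directory paths are placeholders, and it assumes a tar that detects bzip2 compression on its own when reading (GNU tar and bsdtar do) and that each tarball has a single top-level directory.

```shell
#!/bin/sh
# Sketch: unpack one corpus tarball into a directory name of your choosing,
# so tarballs that extract to the same top-level name do not collide.
prepare_corpus() {
    tarball=$1   # a downloaded ham or spam tarball
    dest=$2      # directory to create for this corpus
    mkdir -p "$dest" || return 1
    # --strip-components=1 drops the tarball's own top-level directory,
    # so the messages land directly in $dest.
    tar -xf "$tarball" -C "$dest" --strip-components=1 || return 1
    # Remove the non-message helper file mentioned in step 2, if present.
    # ("cmds" is only a guess at an alternate spelling of that file name.)
    rm -f "$dest/cmd" "$dest/cmds"
}

# Example, run as the zimbra user (all paths are placeholders):
#   prepare_corpus /path/to/spam_tarball /path/to/spam_directory
#   prepare_corpus /path/to/ham_tarball  /path/to/ham_directory
#   /opt/zimbra/dspam/bin/dspam_train zimbra /path/to/spam_directory /path/to/ham_directory
```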
I don't think there'd be anything wrong with consolidating all the ham corpora into one directory, and the same for all the spam corpora, and then running dspam_train once. Based on the manpage for dspam_train, it doesn't matter if you have different amounts of ham & spam, although documentation for dspam does say that you want to have a fair amount of each.
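If you do want to consolidate, something like the following would do it. Again just a sketch: merge_corpora is a hypothetical helper, and the file-name prefixing is a precaution in case message file names repeat across corpora, which I haven't checked.

```shell
#!/bin/sh
# Sketch: copy every message from several corpus directories into one
# directory, prefixing each file with its source directory's name so that
# identically named files from different corpora cannot overwrite each other.
merge_corpora() {
    dest=$1
    shift
    mkdir -p "$dest" || return 1
    for src in "$@"; do
        prefix=$(basename "$src")
        for f in "$src"/*; do
            [ -f "$f" ] || continue   # skip subdirectories and empty globs
            cp "$f" "$dest/${prefix}_$(basename "$f")" || return 1
        done
    done
}

# Example (paths are placeholders):
#   merge_corpora /path/to/all_spam /path/to/spam_a /path/to/spam_b
#   merge_corpora /path/to/all_ham  /path/to/ham_a  /path/to/ham_b
#   /opt/zimbra/dspam/bin/dspam_train zimbra /path/to/all_spam /path/to/all_ham
```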
You can also get current dspam stats using
/opt/zimbra/dspam/bin/dspam_stats -H
If you use more options with dspam_stats, give each option its own hyphen rather than combining several after one hyphen, or it may misinterpret them and create a new user folder. (No harm; just delete that folder, which is buried in the dspam hierarchy, if I recall correctly.)
Anyway, I've been able to compare some mail from a mailing list thread which DSPAM was miscategorizing, and after the training it's now marking that mail "Innocent", so I'm hopeful that it will now be more accurate. If and as my confidence in DSPAM increases, I can also increase the score_factor. I might even consider turning off SA within amavisd.conf.in and just using DSPAM's bayesian filter. In theory, this should be equivalent to using something like ASSP, which only uses statistical scoring. There's an argument to be made (e.g. by the author of DSPAM) that this is more accurate and lower-maintenance than manually-tuned rules as found in SA. (Although it should also be noted that DSPAM was intended to do per-user training and analysis instead of sitewide.)