I would like to improve Zimbra's ability to accurately distinguish spam from ham. My understanding is that there are two recommended ways to accomplish this:
1. Use the Zimbra webmail client to mark messages as spam or ham via the Junk and Not Junk buttons, respectively.
2. From any mail client, use the "Forward as Attachment" function to send single or multiple messages to the special spam/ham training user accounts.
The problem with option #1 is that it does not supply enough ham to the ham training account. Why? Because (thankfully) there are not enough false positives to mark as Not Junk. But the ability of the spam detection heuristics to distinguish spam from ham depends on analyzing roughly as many ham messages as spam messages. As far as I can tell, the Zimbra webmail client does not allow the user to select a legitimate message in the inbox and click a Not Junk button to send it along to the ham training account -- only the Junk button is available.
Once we figured out that messages had to be forwarded as attachments, option #2 above appeared to be a viable method.
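For anyone who wants to script option #2 rather than forward by hand, here is a minimal Python sketch of what "Forward as Attachment" amounts to: the original message is wrapped, untouched, as a message/rfc822 attachment inside a new message. The function name wrap_as_attachment and the addresses are hypothetical placeholders, not the real training account names, and I have not verified that Zimbra's training job treats a scripted message exactly like one forwarded from a mail client.

```python
from email import message_from_string, policy
from email.message import EmailMessage


def wrap_as_attachment(raw_message: str, sender: str, training_address: str) -> EmailMessage:
    """Wrap an existing message as a message/rfc822 attachment.

    training_address is a placeholder; substitute the actual spam or ham
    training account for the installation.
    """
    # Parse the original so it can be attached with its headers intact.
    original = message_from_string(raw_message, policy=policy.default)

    outer = EmailMessage()
    outer["From"] = sender
    outer["To"] = training_address
    outer["Subject"] = "Training sample"
    outer.set_content("Original message attached for training.")

    # Attaching a message object yields a message/rfc822 part.
    outer.add_attachment(original)
    return outer


if __name__ == "__main__":
    sample = "From: friend@example.com\nSubject: hello\n\nLunch tomorrow?\n"
    wrapped = wrap_as_attachment(sample, "me@example.com", "ham-training@example.com")
    print(wrapped.get_content_type())
```

The resulting object could then be handed to smtplib's SMTP.send_message to deliver it to the training account.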
It occurred to me that there might be a third option, which would be to use a desktop IMAP client to copy messages from a real mail account into the mounted IMAP inbox of the respective spam/ham training user accounts. But I'm not sure how the mounting would be accomplished; while I seem to recall earlier Zimbra installs asked for passwords for these special accounts, I don't seem to remember setting passwords for these two special accounts in the most recent go-round (version 4.0.2 of open source edition, Mac OS X 10.3.8). Is this potential third option even possible?
We have been using both of the recommended techniques above for several weeks, but unfortunately the accuracy does not appear to be improving. The most significant reason for this, I believe, is that DSpam does not appear to be learning. In fact, DSpam's penalty is being added to every single message: I've been examining the X-Spam-Status header of all incoming messages, and every one has DSPAM_SPAM=2.5 in the "tests" array. Shortly after installing Zimbra, I modified the salocal.cf.in file to increase DSpam's weighting as follows:
header DSPAM_SPAM X-DSPAM-Result =~ /^Spam$/
describe DSPAM_SPAM DSPAM claims it is spam
score DSPAM_SPAM 2.5
header DSPAM_HAM X-DSPAM-Result =~ /^Innocent$/
describe DSPAM_HAM DSPAM claims it is ham
score DSPAM_HAM -2.0
From the rules above, it would appear that "X-DSPAM-Result: Spam" must be present in the headers of every single message. That is almost true, but not quite: the line appears in every message that DSpam has processed (i.e., DSpam thinks everything is spam -- not good), but some incoming messages carry no DSpam headers whatsoever. Why some messages arrive without any DSpam headers is a worrisome conundrum in and of itself, but even more perplexing is that the "DSPAM_SPAM=2.5" score still shows up in the "tests" array for those messages, despite the complete absence of any DSpam headers.
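To check this across a batch of messages rather than by eye, something like the following Python sketch could flag the inconsistent ones. It assumes the X-Spam-Status value contains a list of the form tests=[NAME=score, ...], which matches what I see in our headers; the function name dspam_check and the sample message are made up for illustration.

```python
import re
from email import message_from_string, policy

# Matches the bracketed rule list inside an X-Spam-Status value,
# e.g. tests=[BAYES_00=-2.599, DSPAM_SPAM=2.5]
TESTS_RE = re.compile(r"tests=\[([^\]]*)\]")


def dspam_check(raw_message: str) -> dict:
    """Report whether DSPAM_SPAM fired and whether X-DSPAM-Result is present."""
    # policy.default unfolds headers that are wrapped across several lines.
    msg = message_from_string(raw_message, policy=policy.default)

    status = str(msg.get("X-Spam-Status", ""))
    tests = {}
    match = TESTS_RE.search(status)
    if match:
        for item in match.group(1).split(","):
            name, _, score = item.strip().partition("=")
            if name:
                tests[name] = score

    result = msg.get("X-DSPAM-Result")
    return {
        "dspam_header": None if result is None else str(result).strip(),
        "rule_fired": "DSPAM_SPAM" in tests,
        # The puzzling case: the rule scored but the header it matches is absent.
        "inconsistent": result is None and "DSPAM_SPAM" in tests,
    }


if __name__ == "__main__":
    sample = (
        "From: someone@example.com\n"
        "X-Spam-Status: No, score=-0.1 required=6.6\n"
        " tests=[BAYES_00=-2.599, DSPAM_SPAM=2.5]\n"
        "Subject: test\n"
        "\n"
        "body\n"
    )
    print(dspam_check(sample))
```

Running this over saved copies of delivered messages would at least show how many fall into the "rule fired, no header" category. It only inspects the headers as delivered, so it cannot tell whether a header was present earlier in processing and removed before delivery.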
The spamtrain.log is unfortunately of little assistance in diagnosing the above problems. While it appears that some messages are being used for learning...
command: '/opt/zimbra/dspam-3.6.2/bin/dspam' --class=innocent --source=corpus --user 'zimbra' --mode=teft --feature=chained,noise
/opt/zimbra/dspam/bin/dspam_corpus: 1 messages, 00:00:01 elapsed, 1.00 msgs./sec.
...the absence of a datestamp on each line (or at least at the beginning of the daily batch output) makes it very difficult to tell when the logged output occurred. There are also errors in the log, and it's not clear how serious they are or how they should be fixed:
config: could not find site rules directory
bayes: cannot open bayes databases /opt/zimbra/amavisd/.spamassassin/bayes_* R/O: tie failed: Inappropriate file type or format
bayes: cannot open bayes databases /opt/zimbra/amavisd/.spamassassin/bayes_* R/W: tie failed: Inappropriate file type or format
ERROR: the Bayes learn function returned an error, please re-run with -D for more information
My experience setting up DSpam manually in previous versions of Zimbra showed that DSpam is extraordinarily accurate when it is being fed a large enough corpus of spam and ham. Now that we've replaced that previous set-up with the integrated DSpam configuration included in Zimbra 4.0.x, it's not entirely clear why DSpam isn't marking any messages (as in zero) as Innocent.
Thank you for taking the time to read this very long-winded message. Any and all suggestions -- and especially point-by-point responses -- would be most sincerely appreciated!