OK, starting from 10,000 feet:
For more theory on the tools involved, I suggest:
sa-learn - train SpamAssassin's Bayesian classifier
spamassassin - simple front-end filtering script for SpamAssassin
SpamAssassin (henceforth SA) is a set of perl libraries. Zimbra puts *all* perl libraries in ~zimbra/zimbramon/lib, which in the case of SA doesn't make a lot of sense, but there it is. My guess of the history is that zimbramon started out as a home for the Swatch perl modules, and someone figured that it made sense to put other utility perl modules there, and it just sorta grew.
The canonical SpamAssassin distribution supports two ways to use the SA libraries. "spamassassin" (all lower case) is a small (< 1000 line) perl script that uses the SA libraries to process one message at a time. Historically, it was called from .procmailrc or somesuch. But forking a new process and loading all those libraries afresh for every new message is expensive, so nowadays, pretty much everyone uses the more efficient preforking spamd/spamc client-server architecture. spamassassin remains useful for one-off debugging, however.
ZCS uses neither spamassassin nor spamd/spamc. ZCS calls the SA libraries from the amavisd-new perl script, which, until postfix started supporting sendmail's milter interface (still beta), was the best and easiest way to plug SA functionality into the delivery pipeline.
It is possible and (in my belief and personal experience) perfectly safe to use the spamassassin script with the ZCS-provided configuration. It's just another program using the same libraries and the same locking semantics.
The spamassassin script is nice for troubleshooting because it parses the config on the fly, without requiring you to restart amavisd or spamd, and you can interact directly with stdin/stderr/stdout. ZCS doesn't include the spamassassin script, but you can fetch and use *just that one file* from the upstream distribution.
"curl -O http://apache.seekmeup.com/spamassas...n-3.2.3.tar.gz" -- download the spamassassin distribution. For a list of mirrors, see the "SpamAssassin: Downloads" page.
"tar zxf Mail-SpamAssassin-3.2.3.tar.gz Mail-SpamAssassin-3.2.3/spamassassin.raw" -- the second argument tells tar to unpack just that one file, not the whole tarball.
"perl -i -pe GOBBLEDYGOOK" -- Um, that's an in-place edit to change a couple of config parameters. Within the file spamassassin.raw, change all occurrences of @@INSTALLSITELIB@@ to /opt/zimbra/zimbramon/lib, and @@DEF_RULES_DIR@@ to /opt/zimbra/conf/spamassassin.
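For anyone who would rather not decode the one-liner, here is a rough sketch of what that in-place edit could look like. The helper name patch_sa_paths is made up for illustration, and the two substitutions are just the ones described above. The real spamassassin.raw may contain other @@...@@ placeholders that this does not touch.

```shell
# Sketch only: patch_sa_paths is a hypothetical helper name.
patch_sa_paths() {
    # $1: path to the unpacked spamassassin.raw
    # The @ signs are backslash-escaped so Perl does not try to
    # interpolate them as array names inside the pattern.
    perl -i -pe '
        s|\@\@INSTALLSITELIB\@\@|/opt/zimbra/zimbramon/lib|g;
        s|\@\@DEF_RULES_DIR\@\@|/opt/zimbra/conf/spamassassin|g;
    ' "$1"
}
```

You would call it as "patch_sa_paths Mail-SpamAssassin-3.2.3/spamassassin.raw" after unpacking the file with the tar command above.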
Decomposing the spamassassin command line:
"HOME=/opt/zimbra/amavisd" -- Set an environment variable telling SA where the Bayes databases and stuff are.
"sudo -u zimbra" -- needs to run as the Zimbra user.
"Mail-SpamAssassin-3.2.3/spamassassin.raw" -- where the script is, if you just unpacked it from the tarball as above.
"-D -t" -- Options to debug and append headers. Use --help for a lot more options.
" < original-rfc822-message-source.eml" -- The spamassassin script expects to get a message via standard input. This would be how to read an RFC822 message from a file. You can also simply copy-paste into a terminal from a "Show Original" window.
...and then you'll get a lot of gobbledygook.
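Putting the pieces above back together, the whole invocation might be wrapped up something like this. The function name sa_debug and the SA_SCRIPT variable are my own inventions for illustration; the paths and options are the ones listed above. Depending on your sudoers policy, sudo may reset HOME, in which case the variable would need to be set on the far side of sudo instead.

```shell
# Sketch only: sa_debug and SA_SCRIPT are hypothetical names.
# SA_SCRIPT points at the script unpacked from the tarball.
SA_SCRIPT=${SA_SCRIPT:-Mail-SpamAssassin-3.2.3/spamassassin.raw}

sa_debug() {
    # $1: file holding the raw RFC822 source of one message.
    # HOME tells SA where the Bayes databases live; the script
    # runs as the zimbra user and reads the message on stdin.
    HOME=/opt/zimbra/amavisd sudo -u zimbra "$SA_SCRIPT" -D -t < "$1"
}
```

Usage would be "sa_debug original-rfc822-message-source.eml".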
Back to your questions:
1) No, you're just grabbing the spamassassin client script, nothing else. The ZCS installation of SA is used unmodified.
2) Yeah, amavisd does a clear_headers and won't pass the X-Spam-Hammy/X-Spam-Spammy stuff. But if you're running spamassassin interactively, here's an example of what you might see:
X-Spam-Bayes: Tokens: new, 4; hammy, 3; neutral, 6; spammy, 2.
X-Spam-Hammy: 0.000-1--83h-0s--0d--H*F:U*rgraves,
0.112-1451--919736h-3278s--0d--H*F:D*carleton.edu,
0.135-745--1008852h-4442s--0d--H*F:D*edu
X-Spam-Spammy: 0.998-5293--51h-829s--0d--******,
0.985-84--7h-14s--0d--enlargement
The Spammy: bits are fairly obvious. This tells you that "******" and "enlargement" are high-confidence (> 98%) indicators of spamminess. But SpamAssassin also weighs header tokens, stored with prefixes like H*F:U* that mean nothing to the uninitiated. What the above means is that messages with "rgraves" in the username part of the From: address are very likely ham. Then you see that our site has 1,008,852 ham and 4,442 spam messages from .edu addresses, so a *.edu address is a very good indicator of non-spamminess. So when I send myself a message with a spammy body, in the end it rates BAYES_50. You probably have something similar going on.
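If you want to pull those numbers out of a token entry without squinting, a small sed expression will do it. This is only a sketch: parse_bayes_token is a made-up name, and the field meanings (probability first, then the "h" and "s" counts read as ham and spam) are my reading of the output above, not an official format description. The second field and the "d" field are skipped because I'm not interpreting them here.

```shell
# Sketch only: parse_bayes_token is a hypothetical helper name.
parse_bayes_token() {
    # $1: one entry, e.g. 0.985-84--7h-14s--0d--enlargement
    # Prints probability, ham count, spam count, and the token.
    printf '%s\n' "$1" | sed -E \
        's/^([0-9.]+)-[0-9]+--([0-9]+)h-([0-9]+)s--[0-9]+d--(.*)$/prob=\1 ham=\2 spam=\3 token=\4/'
}
```

For example, "parse_bayes_token '0.985-84--7h-14s--0d--enlargement'" prints "prob=0.985 ham=7 spam=14 token=enlargement".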
*DO NOT* worry if Bayes is going the "wrong way." Yes, it is "bad" if spammers can get bayes points simply by spoofing your From: line. But there are lots of other SA rules that discourage that. Bayes is intended to throw some experiential learning into the mix, not to be "the answer."
3) Output goes to stdout/stderr. Especially with -D, you probably want to tee to a file.
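A minimal way to do the tee, assuming you want stdout and stderr interleaved in one file. The name run_logged is hypothetical; it just runs whatever command you hand it, merges stderr into stdout, and copies everything to a log while still showing it on the terminal.

```shell
# Sketch only: run_logged is a hypothetical helper name.
run_logged() {
    # $1: log file to write; remaining arguments: the command to run.
    log=$1
    shift
    # Merge stderr into stdout so debug output lands in the same file.
    "$@" 2>&1 | tee "$log"
}
```

For instance: "HOME=/opt/zimbra/amavisd run_logged sa-debug.log sudo -u zimbra Mail-SpamAssassin-3.2.3/spamassassin.raw -D -t < original-rfc822-message-source.eml". The log file name sa-debug.log is arbitrary.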
4) "Show Original" will suffice; you don't need to trace to the filesystem (though I too would be curious how to do that; I've sometimes done a grep -r of the user's blobs in /opt/zimbra/backup). Log on to https://server:7071/; View Mail for your "spam-XXXX" and "ham-YYYY" users; and "Show Original" for a few messages to get a better idea of how SA is rating things.
5) Forget my previous suggestion of restarting amavisd. Leave amavisd completely alone; it will continue running with the SA config as of boot time. Just remove the debug.cf or whatever when done fiddling with the spamassassin command line.
6) I'd expect that your Bayes database has seen more ham than spam from klanknan@charter.net to your local user. So it's natural for the Bayes db to give mail from klanknan@charter.net some positive boost, just in case they ever start talking about the economic situation in Nigeria. Bayes *should* mellow false positives like that.
Your user's fundamental problems are the odd forwarding From: line and the fetchmail hop. If you can get a normal flat forward from charter that leaves the original From: line intact, then the Bayes engine will behave more as you think it "should."
Btw, another good thing to know: when you hit the ZWC "Junk" or "Not Junk" buttons, this has precisely ***ZERO*** immediate effect. The mail is merely forwarded to the spam-XXXX or ham-YYYY account on your Zimbra server. The zmtrainsa program referenced from zimbra's crontab uses the contents of these accounts to train the Bayes database in nightly batches.