  #1 (permalink)  
Old 08-29-2007, 11:57 AM
Moderator
 
Posts: 1,027
Default [SOLVED] I don't think RBLs or Bayes are working for me

Following the instructions in the user manual, I enabled several of the RBLs to increase the reliability of my spam filtering. Running zmprov to check my setup gives me the following:

Code:
zimbra@mail:~$ zmprov gacf | grep zimbraMtaRestriction
zimbraMtaRestriction: reject_invalid_hostname
zimbraMtaRestriction: reject_non-fqdn_hostname
zimbraMtaRestriction: reject_non_fqdn_sender
zimbraMtaRestriction: reject_rbl_client bl.spamcop.net
zimbraMtaRestriction: reject_rbl_client sbl.spamhaus.org
zimbraMtaRestriction: reject_rbl_client relays.mail-abuse.org
However, here are the X-Spam headers from two messages I just sent myself from offsite:

Code:
X-Spam-Score: 3.159
X-Spam-Level: ***
X-Spam-Status: No, score=3.159 tagged_above=-10 required=6.6 tests=[AWL=1.124,
	BAYES_00=-2.599, DNS_FROM_RFC_POST=1.708, HTML_10_20=1.351,
	HTML_MESSAGE=0.001, HTML_SHORT_LENGTH=1.574]



X-Spam-Score: 2.784
X-Spam-Level: **
X-Spam-Status: No, score=2.784 tagged_above=-10 required=6.6 tests=[AWL=0.749,
	BAYES_00=-2.599, DNS_FROM_RFC_POST=1.708, HTML_10_20=1.351,
	HTML_MESSAGE=0.001, HTML_SHORT_LENGTH=1.574]
If I'm reading this right, I'm getting spam checks from the Bayesian filter, DNS, HTML messages, and whatever AWL is (I don't know), but there is no evidence of RBL checking. Given some of the junk that's getting through my system, I doubt RBL checking is happening. Witness this example of spam that also came through:

Code:
X-Virus-Scanned: amavisd-new at 
X-Spam-Score: 6.513
X-Spam-Level: ******
X-Spam-Status: No, score=6.513 tagged_above=-10 required=6.6
	tests=[BAYES_99=3.5, DNS_FROM_RFC_ABUSE=0.2, DNS_FROM_RFC_WHOIS=1.447,
	HTML_MESSAGE=0.001, SUBJ_ALL_CAPS=0.997, UPPERCASE_50_75=0.368]
This last message was an ATTENTION BENEFICIARY notice with about 3/4 caps and obviously one of those Nigerian 419-style scams.
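
To make headers like the ones above easier to read, here is a minimal Python sketch that splits the tests=[...] part of an X-Spam-Status value into individual scores and sums them. The function name parse_spam_status is made up for illustration. The check for names starting with RCVD_IN_ relies on the naming convention SpamAssassin commonly uses for its own DNS blacklist tests, and is an assumption rather than a complete detector.

```python
import re

def parse_spam_status(header: str) -> dict:
    """Return {test_name: score} from the tests=[...] part of an X-Spam-Status value."""
    match = re.search(r"tests=\[(.*?)\]", header, flags=re.DOTALL)
    if match is None:
        return {}
    tests = {}
    for item in match.group(1).split(","):
        name, _, value = item.strip().partition("=")
        if name:  # skips the empty entry produced by tests=[]
            tests[name] = float(value)
    return tests

# Header value copied from the first sample message in this post.
header = """No, score=3.159 tagged_above=-10 required=6.6 tests=[AWL=1.124,
    BAYES_00=-2.599, DNS_FROM_RFC_POST=1.708, HTML_10_20=1.351,
    HTML_MESSAGE=0.001, HTML_SHORT_LENGTH=1.574]"""

tests = parse_spam_status(header)
total = round(sum(tests.values()), 3)

# Assumption: SpamAssassin's DNS blacklist tests usually have names starting with RCVD_IN_.
rbl_like = [name for name in tests if name.startswith("RCVD_IN_")]

print(total, rbl_like)
```

The individual scores add up to the reported total, and no blacklist-style test names appear in this header.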

Anyway, the question is this: Should RBL scores show in my headers whether or not the source IP has a hit in one of the databases, or will it only show in the case of a hit?

As a second point I think maybe I should weight my Bayesian filter higher than 3.5, but I don't see in the documentation how one goes about changing the absolute score a feature can give--in my case Bayes seems to top out at 3.5 which seems to let an awful lot through, because with a required score of 6.6 for spam, a 100% hit on Bayes is only 53% of the way to scoring as spam. So far I'd say that's insufficient. How do I adjust that range?

On that note, there are references all over the forum to changing the kill and tag percentages on the admin UI, but nowhere have I been able to find documentation of just how those percentage numbers relate mathematically to the various scores I see in the headers of emails. Could someone clarify this for me please?

Finally, I don't see DSPAM referenced at all in my headers, though there is a DSPAM directory in /opt/zimbra. Am I only getting amavisd and not dspam? If so, what do I do about it?

Last edited by dwmtractor; 08-29-2007 at 12:01 PM..
  #2 (permalink)  
Old 08-29-2007, 02:46 PM
Intermediate Member
 
Posts: 16
Default

Messages that hit an RBL should be stopped altogether; you will never see them. An RBL is a blacklist, so if a message comes in from a source that is listed on it, the server will reject it, provided the RBLs are configured correctly.

You should be able to take some of the messages that you have received and check the RBLs to see whether the sending IPs are listed there.
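
A rough Python sketch of such a manual check: DNS blacklists are queried by reversing the octets of the sending IPv4 address and appending the list's zone. The function names are hypothetical, and treating any answer as "listed" is a simplification, since some lists use special return codes for errors or unlicensed use.

```python
import ipaddress
import socket

def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up for an IPv4 address in a blacklist zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone

def is_listed(ip: str, zone: str) -> bool:
    """Return True if the lookup returns any A record (simplified: ignores return codes)."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False
    return True

# 192.0.2.10 is a documentation address used purely as an example.
print(dnsbl_query_name("192.0.2.10", "bl.spamcop.net"))
```

Calling is_listed needs working DNS, so results depend on the resolver you run it from.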

As for the spam, according to the manual, you need to drop at least 200 messages in the spam email account and 200 in the non spam email account for it to start scoring spam properly.

Quote from admin manual:

"In order to get accurate scores to determine whether to mark
messages as spam at least 200 known spams and 200 known hams must be
identified."

-Jim
  #3 (permalink)  
Old 08-30-2007, 08:14 AM
Moderator
 
Posts: 1,027
Default

I have trained on at least 200 spam and over 900 ham messages.

I still would appreciate more detail on the way the settings relate to spam scoring, and on how I might increase relative scoring of the Bayesian filter. Otherwise known junk will never get blown out if it doesn't meet the other criteria since, as I said, the Bayesian score is topping out at 3.5 on even the worst spam.
  #4 (permalink)  
Old 08-30-2007, 08:18 AM
Intermediate Member
 
Posts: 16
Default

I understand where you are coming from. We are in much the same situation here, so hopefully someone might have some decent input.
  #5 (permalink)  
Old 08-30-2007, 08:49 AM
Moderator
 
Posts: 1,027
Default Spam Scoring

I have one particular spammer who sends me three to six messages a day that seem to be caught by none of the "biggies" because he's specifically spamming heavy equipment dealers like ourselves. My Bayes gives him the full 3.5, the "Dear something" gives him 2.1, and the other scores are so low that they're essentially meaningless, so his messages always get through.

If ANY of the following criteria were also included it'd probably push these messages over the threshold, but I don't see that they're options:

1) BCC - if the recipient (me) is only bcc'ed and not in the "To:" or "cc" lines, it ought to get a small score, maybe around 1 or 1.2. This wouldn't be enough to exclude legitimate mailing lists, but it would add to the aggregate problems for spam messages

2) I know some spam filters add a small score for any out-of-country messages; again not enough to kill them outright but enough to add to the score. The messages in question are coming from Singapore

3) If I could just increase the Bayesian weighting by about .75 to 1 point, it'd push him over the edge. This is more desirable than lowering the point threshold on ALL messages since it would just give more weight to my Bayesian scores for stuff I have classified as junk, rather than allowing the random combination of all the other scores to increase false positives.

Help anyone???
  #6 (permalink)  
Old 08-30-2007, 09:12 AM
Moderator
 
Posts: 6,237
Default

Quote:
Originally Posted by dwmtractor
zimbraMtaRestriction: reject_non-fqdn_hostname
This is incorrect: the - (dash) should be an _ (underscore).

Quote:
Originally Posted by dwmtractor
zimbraMtaRestriction: reject_rbl_client relays.mail-abuse.org
Turn this off. It's now part of a paid Trend Micro service, so otherwise you are just wasting bandwidth on 'no licence' return values.

DNS checks:
reject_unknown_client - I leave this off because every client needs a valid A record or its mail won't be delivered
reject_unknown_hostname - I leave this off because every server that sends you mail needs an A & MX record (and I definitely want alerts from some of my servers that don't have MX records)
reject_unknown_sender_domain - I leave this on; the @domain.com part of the sender's email address must resolve to a proper A/MX record

I also use:
host checks- to conform to the industry standards:
reject_invalid_hostname
reject_non_fqdn_hostname
reject_non_fqdn_sender
RBL's - Real Time Black Lists:
Code:
reject_rbl_client dnsbl.njabl.org
reject_rbl_client cbl.abuseat.org
reject_rbl_client bl.spamcop.net
reject_rbl_client dnsbl.sorbs.net
reject_rbl_client zen.spamhaus.org
zen combines Spamhaus' SBL, XBL and PBL (while I don't always agree with the PBL, zen resolves much faster and has more copies out there, so I've given in to their policies).

I usually get the most 'spam' returns from spamcop & spamhaus. Keep in mind that they all tend to share info with each other and score it differently; check their websites for details.

Enter them all in one command (or use +), otherwise you'll erase the currently set values:
Quote:
zmprov mcf zimbraMtaRestriction reject_invalid_hostname zimbraMtaRestriction reject_non_fqdn_hostname zimbraMtaRestriction reject_non_fqdn_sender zimbraMtaRestriction reject_unknown_sender_domain zimbraMtaRestriction "reject_rbl_client dnsbl.njabl.org" zimbraMtaRestriction "reject_rbl_client cbl.abuseat.org" zimbraMtaRestriction "reject_rbl_client bl.spamcop.net" zimbraMtaRestriction "reject_rbl_client dnsbl.sorbs.net" zimbraMtaRestriction "reject_rbl_client zen.spamhaus.org"
to check:
Code:
 zmprov gacf | grep zimbraMtaRestriction
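
Because every value has to go into one zmprov mcf call, it can help to generate the command from a list instead of typing it by hand. A minimal Python sketch, with build_mcf_command as a made-up helper name; it only builds and prints the command line and does not run zmprov. The append option reflects the "+" hint above and is an assumption about how you would want to use it.

```python
import shlex

# Restriction values taken from the zmprov command quoted in this post.
RESTRICTIONS = [
    "reject_invalid_hostname",
    "reject_non_fqdn_hostname",
    "reject_non_fqdn_sender",
    "reject_unknown_sender_domain",
    "reject_rbl_client dnsbl.njabl.org",
    "reject_rbl_client cbl.abuseat.org",
    "reject_rbl_client bl.spamcop.net",
    "reject_rbl_client dnsbl.sorbs.net",
    "reject_rbl_client zen.spamhaus.org",
]

def build_mcf_command(values, attr="zimbraMtaRestriction", append=False):
    """Return an argv list that sets (or, with append=True, adds) every value in one call."""
    name = "+" + attr if append else attr
    argv = ["zmprov", "mcf"]
    for value in values:
        argv += [name, value]
    return argv

# shlex.join quotes values containing spaces with plain ASCII quotes.
print(shlex.join(build_mcf_command(RESTRICTIONS)))
```

Generating the line this way also avoids typographic quotes sneaking in from a word processor or web page, which the shell does not treat as quoting characters.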
To reduce email to accounts that you don't even have:
Change the entry in zmmta.cf for smtpd_reject_unlisted_recipient to 'yes', save the file and restart postfix (postfix reload).

(Add your IPs to the trusted networks in local.cf. You don't want a user marking an email from a coworker at your own organization as junk and having that affect the Bayes score. This is not to be confused with mtamynetworks, which is for submitting mail from remote networks.)

Quote:
Anyway, the question is this: Should RBL scores show in my headers whether or not the source IP has a hit in one of the databases, or will it only show in the case of a hit?
Only added when you get a 'hit'.
-check the /opt/zimbra/conf/spamassassin folder for the points added

Quote:
As a second point I think maybe I should weight my Bayesian filter higher than 3.5, but I don't see in the documentation how one goes about changing the absolute score a feature can give--in my case Bayes seems to top out at 3.5 which seems to let an awful lot through, because with a required score of 6.6 for spam, a 100% hit on Bayes is only 53% of the way to scoring as spam. So far I'd say that's insufficient. How do I adjust that range?
Quote:
3) If I could just increase the Bayesian weighting by about .75 to 1 point, it'd push him over the edge. This is more desirable than lowering the point threshold on ALL messages since it would just give more weight to my Bayesian scores for stuff I have classified as junk, rather than allowing the random combination of all the other scores to increase false positives.
You can manually edit values in your /opt/zimbra/conf/spamassassin folder (there will be a bunch of files in there defining rules)
-Also see the all important /opt/zimbra/conf/amavisd.conf.in (only edit the .in not the live copy - it then gets copied live on restart)
While you're browsing through that file you can adjust any 'weight listing', which applies a positive or negative score to mail from a certain address/domain (a few defaults are provided). Negative scores mean the mail is legit. The higher the positive score, the worse.
Also, while there, change 'sa_dsn_cutoff_level' (near the top of the file) to something more realistic. You don't want to send delivery status notifications ("I got your mail") to the spammers.

(I suggest you change kill/tag levels through zmprov/admin console though, not amavisd.conf.in, so they'll persist through upgrades)

Quote:
On that note, there are references all over the forum to changing the kill and tag percentages on the admin UI, but nowhere have I been able to find documentation of just how those percentage numbers relate mathematically to the various scores I see in the headers of emails. Could someone clarify this for me please?
No sweat, it's the standard 20-point system:
20 in spamassassin/amavisd.conf.in = 100% in the admin console
10 = 50%
5 = 25%
etc.

zmprov mcf zimbraSpamKillPercent 50
(It's given in percentages-so that would kill anything with 10pts on the 20pt scale)

100% = 20pts
33% = 6.6pts
75% = 15pts
etc
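
The conversion above is simple enough to write down as code. A minimal Python sketch, assuming the 20-point scale described in this post; the function names are made up for illustration.

```python
SCALE_POINTS = 20.0  # 100% in the admin console, per the 20-point scale described above

def percent_to_points(percent: float) -> float:
    """Convert an admin-console percentage to a SpamAssassin-style point score."""
    return percent / 100.0 * SCALE_POINTS

def points_to_percent(points: float) -> float:
    """Convert a point score back to the admin-console percentage."""
    return points / SCALE_POINTS * 100.0

for percent in (25, 33, 50, 75, 100):
    print(percent, "% =", round(percent_to_points(percent), 2), "pts")
```

Going the other way, a required score of 6.6 points corresponds to 33%, and a top Bayes score of 3.5 points is 17.5% on the admin-console scale.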

You can change the action (discard vs bounce etc) in amavisd.conf.in (don't edit amavisd.conf directly, edit the .in and restart)
$final_spam_destiny=D_DISCARD;

You can also play with the dsn (delivery status notification) setting; so over a certain level you won't be responding 'I got your mail' to the spammers.
$sa_dsn_cutoff_level = 50;

To delete/not bother quarantining high scoring spam (therefore reducing the number of items in the quarantine) this setting allows you to discard quarantined spam above this level:
$sa_quarantine_cutoff_level = 90;
It is cleaned up every day though:
0 1 * * * find /opt/zimbra/amavisd/quarantine -type f -mtime +7 -exec rm -f {} \; > /dev/null 2>&1

Note: In that amavisd.conf.in file, wherever possible it's better to set the values with zmprov/admin console (ie: the tag & kill levels) so that it stays consistent across upgrades.

zmprov mcf zimbraSpamTagPercent 30
-would put everything above 6pts in the junk folder (and label with a custom **SPAM** in the subject line if you have that enabled) - I personally don't because it's already in the junk folder.

tag level - Mail goes to the junk folder (Unless the user has their own filter that moves it elsewhere; then I suggest they/you make an accompanying x-spam-level header filter: say contains at least **** then move back to junk etc)

kill level - The mail does not get delivered to the users (unless you set final_spam_destiny to D_PASS; the possible values are D_PASS, D_BOUNCE, D_REJECT and D_DISCARD; see the amavisd-new documentation for descriptions)
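
To see how the tag and kill levels interact, here is a small Python sketch that sorts a score into inbox, junk or kill. It assumes the 20-point scale and that a score equal to a threshold already counts as reaching it; the function name classify and the kill value of 75 in the first example are made up for illustration.

```python
def classify(score: float, tag_percent: float, kill_percent: float,
             scale_points: float = 20.0) -> str:
    """Return 'kill', 'junk' or 'inbox' for a message score.

    Assumption: a score equal to a threshold counts as reaching it.
    """
    tag_points = tag_percent / 100.0 * scale_points
    kill_points = kill_percent / 100.0 * scale_points
    if score >= kill_points:
        return "kill"
    if score >= tag_points:
        return "junk"
    return "inbox"

# The 6.513 message from the first post, with tag at 33% (6.6 points).
print(classify(6.513, tag_percent=33, kill_percent=75))
# The same message with the tag level lowered to 30% (6 points).
print(classify(6.513, tag_percent=30, kill_percent=50))
```

With the tag level at 33% that message stays just under the line, while 30% would have sent it to the junk folder.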

For more ideas see Improving Anti-spam system - ZimbraWiki
I especially like greylisting: you hold the mail and send back a temporary error so that the sender tries delivery again. When a legit server retries, the mail goes through. Spammers just tend to move on and not bother. The preferred method: if no retry is made within, say, 1 hour, you add x points to its score and still deliver it.
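
The basic greylisting idea fits in a few lines. This Python sketch is an illustration only, not how any particular greylisting package works: the class name, the (client IP, sender, recipient) key and the 300-second minimum delay are all assumptions, and it covers only the plain defer-then-accept variant.

```python
class Greylist:
    """Defer the first delivery attempt; accept a retry after a minimum delay."""

    def __init__(self, min_delay_seconds: float = 300):
        self.min_delay = min_delay_seconds
        self.first_seen = {}  # (client_ip, sender, recipient) -> time of first attempt

    def check(self, client_ip: str, sender: str, recipient: str, now: float) -> str:
        key = (client_ip, sender, recipient)
        # Remember the first attempt; later attempts reuse the stored time.
        first = self.first_seen.setdefault(key, now)
        if now - first >= self.min_delay:
            return "accept"
        return "defer"  # the MTA would answer with a temporary (4xx) error here

greylist = Greylist()
triplet = ("192.0.2.10", "sender@example.com", "user@example.org")
print(greylist.check(*triplet, now=0))    # first attempt
print(greylist.check(*triplet, now=600))  # retry ten minutes later
```

A real deployment would also expire old entries and whitelist known-good senders.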
Razor is also very good.

Last edited by mmorse; 10-10-2007 at 05:08 AM.. Reason: turning this into a general article...
  #7 (permalink)  
Old 08-30-2007, 01:18 PM
Moderator
 
Posts: 1,027
Default Bayes Criteria adjustment

Thanks for all the recommendations mmorse. I will be implementing a number of them right away.

Some may say RTFM on this, but I'm not sure what FM to R, so please forgive me... I looked at a whole bunch of the config files in /opt/zimbra/conf/spamassassin and I can't figure out which of those files influences the total points assigned by the Bayesian filter to each message.

When I look at my junk messages, I can see that the maximum score is 3.5 points (for what I'm guessing is a 100% or 95%+ hit or something like that). If my theory is correct (and I'm far from certain it is), some configuration file somewhere says that the Bayesian filter has a range to play with, from -2.599 for known ham to +3.5 for known spam, and then it assigns a number based on a calculated probability that a message is ham or spam. This would mean, for example, that if a message is a 60% match to the Bayesian spam database and a 10% match to the ham database, it would get a spam score of 3.5 * 0.6 = 2.1 and a ham score of -2.6 * 0.1 = -0.26, which we would add together as 2.1 - 0.26 = 1.84 for the aggregated Bayes score. Am I anywhere close to correct here?

If I am, then what I want to do is change none of the calculations, but only to increase the number 3.5 to 4.5 or 5.0 to affect the total points awarded for a hit on the Bayes spam database. But I don't see those point ranges specified in the files, so either I'm reading the wrong files or I can't read the syntax.

I hope this makes my question more clear. . .
  #8 (permalink)  
Old 08-30-2007, 01:26 PM
Moderator
 
Posts: 6,237
Default

Not in front of a machine right now, but I believe the file name starts with 50_ and the scores are at the very end of that file.
  #9 (permalink)  
Old 08-30-2007, 03:19 PM
Moderator
 
Posts: 1,027
Default OK, I see where you mean

That file is 50_scores.cf. I did find that very score list way down the file. I had not read that far down because the stuff at the top was all the indicators of hot stocks, hot chicks, etc. The relevant section:
Code:
# make the Bayes scores unmutable (as discussed in bug 4505)
score BAYES_00 0.0001 0.0001 -2.312 -2.599
score BAYES_05 0.0001 0.0001 -1.110 -1.110
score BAYES_20 0.0001 0.0001 -0.740 -0.740
score BAYES_40 0.0001 0.0001 -0.185 -0.185
score BAYES_50 0.0001 0.0001 0.001 0.001
score BAYES_60 0.0001 0.0001 1.0 1.0
score BAYES_80 0.0001 0.0001 2.0 2.0
score BAYES_95 0.0001 0.0001 3.0 3.0
score BAYES_99 0.0001 0.0001 3.5 3.5
However, the top of the file ALSO says
Quote:
# Please don't modify this file as your changes will be overwritten with
# the next update. Use @@LOCAL_RULES_DIR@@/local.cf instead.
Does this mean that if I put the exact same syntax as above in local.cf (which is mostly commented out on my default install) it'll override the settings in 50_scores.cf?
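
For reference, the table above is a step function, not a sliding scale: each BAYES_NN rule fires for one band of the Bayes probability and contributes a fixed score. The Python sketch below illustrates that, using the last score column from the listing and the band edges of the stock SpamAssassin rules as I understand them; the handling of exact boundaries and the helper names are assumptions. It also formats a one-value score line of the kind that could go in local.cf, where a single value applies to all four score columns.

```python
# (upper bound of probability band, rule name, score from the last column above)
BAYES_BANDS = [
    (0.01, "BAYES_00", -2.599),
    (0.05, "BAYES_05", -1.110),
    (0.20, "BAYES_20", -0.740),
    (0.40, "BAYES_40", -0.185),
    (0.60, "BAYES_50", 0.001),
    (0.80, "BAYES_60", 1.0),
    (0.95, "BAYES_80", 2.0),
    (0.99, "BAYES_95", 3.0),
    (1.00, "BAYES_99", 3.5),
]

def bayes_rule(probability: float):
    """Return (rule_name, score) for a Bayes spam probability between 0 and 1."""
    for upper, name, score in BAYES_BANDS:
        if probability < upper:
            return name, score
    return BAYES_BANDS[-1][1], BAYES_BANDS[-1][2]  # probability of exactly 1.0

def override_line(rule: str, score: float) -> str:
    """Format a single-value score line for a local override file."""
    return f"score {rule} {score}"

print(bayes_rule(0.60))                   # a 60% probability lands in the BAYES_60 band
print(override_line("BAYES_99", 4.5))     # raise the top band from 3.5 to 4.5
```

So a 60% probability scores a flat 1.0, and raising only the BAYES_99 line changes the ceiling without touching the other bands.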

And secondly, you mentioned turning on Razor as a good idea, but I don't see where I can turn it on. . .

Thanks again for all your help on this and many other threads!
  #10 (permalink)  
Old 08-30-2007, 08:29 PM
Moderator
 
Posts: 6,237
Default

It always amazes me how I remember where stuff like that is.

That's the idea, but there's always plenty of stuff to recheck after upgrades...so I don't bother personally. (Plus I like knowing what the defaults are every time, that way I can make suggestions to help others.)

For others reading this article: remember that you need 200 spam & 200 not-spam messages to even start Bayes filtering. See: CLI zmtrainsa - ZimbraWiki
(Add your own networks to the trusted_networks section in /opt/zimbra/conf/spamassassin/local.cf)

Note-the below isn't supported by zimbra:
I put greylisting on my current build-but I'm happy enough with my spam levels that I didn't do razor/pyzor this time around.
Improving Anti-spam system - Razor2 - ZimbraWiki

I would tweak all the other settings as best you can first and give yourself a month to assess and refine. Then see if you get any user complaints, and if you're still getting too much spam, go through that wiki for ideas. ('Too much spam' means different things to different people. To me, as long as it goes to junk I leave things alone, but if stuff goes to people's inboxes instead, then I tweak.)

Last edited by mmorse; 09-05-2007 at 03:23 PM..