RBL for training instead of rejecting?

#1 | 04-01-2009, 04:08 PM | Loyal Member | Posts: 83

I found an interesting comment about SA and RBLs in this thread: [SOLVED] RBL -- updates

I was thinking about letting mail flagged by the RBLs in, mainly to train the spam filters better, but also for some other reasons I've been thinking about.

Anyway, does anybody here have more info on how this works? How to specify which RBLs to use and how to customize the weight given to the spam score with an RBL positive?

Also, what are your ideas of the pros and cons of doing this?

Thanks,

kazoo

#2 | 04-01-2009, 11:36 PM | Moderator | Posts: 7,928

Well, if you use RBLs at the MTA level via zimbraMtaRestriction, then emails will be rejected before they ever hit SA. SA performs a number of RBL lookups by default, and you can add your own; you would then be able to tune the scores to your liking. Bear in mind, though, that if you remove perimeter testing, the load on your server will go up because SA will be doing more work. What are you really trying to achieve?
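
To make the mechanics concrete, here is a minimal Python sketch of the DNS query that sits behind an RBL check, whether the answer is then used to reject at the MTA or to add points in SA. It is an illustration only: the function names are made up for this example, and a real client should also inspect the returned 127.0.0.x code, which this sketch ignores.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name looked up for an IPv4 address in a DNSBL zone.

    The octets are reversed and the zone is appended, so 192.0.2.99 in
    zen.spamhaus.org becomes 99.2.0.192.zen.spamhaus.org.
    """
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the zone answers with an A record for this IP.

    Needs working DNS. Some lists refuse queries that arrive through
    large public resolvers, so a negative answer is not always reliable.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # No A record (or lookup failure): treat as not listed.
        return False
```

For example, `dnsbl_query_name("192.0.2.99", "zen.spamhaus.org")` gives `99.2.0.192.zen.spamhaus.org`. Rejecting at the MTA and scoring in SA both start from this same lookup; they differ only in what is done with the answer.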

#3 | 04-02-2009, 06:51 AM | Zimbra Consultant & Moderator | Posts: 20,312

Quote:
Originally Posted by kazooless
> Anyway, does anybody here have more info on how this works? How to specify which RBLs to use and how to customize the weight given to the spam score with an RBL positive?
>
> Also, what are your ideas of the pros and cons of doing this?

By definition you would only need to train the anti-spam system with messages that get through the RBLs and other spam traps. You're wasting your CPU cycles on stuff that's already been classified as spam.

Regards,
Bill

#4 | 04-02-2009, 11:27 AM | Loyal Member | Posts: 83

Thanks for the replies, guys. I realize that most recommendations are to just reject an RBL hit at the MTA, and that this goes against the grain to some extent. I'd still like to know the specifics of how to do it, though.

uxbod says:
Quote:
> What are you really trying to achieve?

Here are my thoughts. Maybe I'm incorrect on some of them:

1. I run a home server with 6 adult e-mail accounts, of which mine is the biggest load, plus 2 limited accounts for my oldest children. The machine (dual-core Atom CPU) is asleep for the most part; besides the Zimbra-related functions, it only runs Samba and Netatalk. So I have plenty of CPU cycles at my disposal, and the larger load is NOT a concern to me.

2. The RBLs are rejecting a great deal of incoming mail. However, every once in a while a legitimate message gets bounced, and the senders usually have no clue why, since they're not technical. For example, just the other day my uncle, who has a cox.net account and sends through their mail server, was bounced because one of the Cox servers was listed in SORBS for some reason. With the RBLs in front, troubleshooting means checking the MTA logs first, and if the message isn't there, checking the SA logs (right?). It seems to me that letting everything go through to SA would leave fewer places to look when troubleshooting and when running statistical reports (amavis-logwatch alone instead of both amavis-logwatch and postfix-logwatch, assuming I can get amavis-logwatch working: [SOLVED] Detailed Spam Reports?).

3. I would assume that there are a lot of drone machines out there sending the same spam, and not all of them are going to be on the RBLs. If this is true, then it seems to me that SA can learn from the messages that ARE from RBL-listed IPs and so more accurately recognize the same spam when it is sent from an unlisted IP.

4. Correct me if I'm wrong, but isn't there a way to have the items SA rejects (those above the higher percentage) go to a spam account instead of just being dropped? That way the messages would be kept out of the user's inbox and even the junk folder, but if one turned out to be a false positive, I could at least go to that spam account, retrieve it, and have SA learn from it.

5. What's the best way to use a blacklist/whitelist? It seems to me that you only want one, but if you do it at the MTA instead of in SA, then spam could get through more easily by spoofing an address that you whitelist. I haven't implemented SPF yet (I didn't realize until recently that it wasn't included by default), but I would think that using SPF along with all the other tools in the SA arsenal would help with whitelist/blacklist management.

In summary, it just seems to me that having this as an 'option' would be a door to greater manageability for anti-spam management and log analysis.

Thanks,

kazoo

#5 | 04-02-2009, 11:26 PM | Moderator | Posts: 7,928

If that is your concern, then why not just remove the RBL checking at the MTA level? As I said earlier, you can add your own RBLs at the SA level instead and then weight the scores appropriately. If you really want to slam the spam, I would recommend reading through [SOLVED] SaneSecurity ClamAV or FuzzyOCR SpamAssassin Plugins, as IMHO they really do help, and in my experience produce pretty much zero FPs.

I now only run two RBLs on the front-end
Code:
b.barracudacentral.org       127
zen.spamhaus.org              15
=================================
Total DNSBL rejections:       142
and a mixture of SS sigs and CRM114
Code:
=========================================================
SpamAssassin Rule Hits: Spam
------------------------------------------------------------------------------
Rank     Hits    % Msgs   % Spam    % Ham      Score Rule
----     ----    ------   ------    -----      ----- ----
   1       14    10.37%   93.33%    0.83%          1 CRM114_SPAM
   2       13     9.63%   86.67%    0.00%          8 L_AV_SS_Spam
   3       11     8.15%   73.33%   17.50%      0.001 HTML_MESSAGE
   4       10     7.41%   66.67%    0.00%      1.955 URIBL_BLACK
   5        9     6.67%   60.00%    0.00%        3.5 BAYES_99
   6        9     6.67%   60.00%    0.00%       1.96 RCVD_IN_BL_SPAMCOP_NET
   7        8     5.93%   53.33%    0.00%      1.501 URIBL_JP_SURBL
   8        8     5.93%   53.33%    0.00%        0.5 RAZOR2_CHECK
   9        7     5.19%   46.67%    0.00%        1.5 URIBL_WS_SURBL
  10        7     5.19%   46.67%    0.00%        1.5 RAZOR2_CF_RANGE_E8_51_100
  11        7     5.19%   46.67%    0.00%        0.5 RAZOR2_CF_RANGE_51_100
  12        7     5.19%   46.67%    0.00%       1.86 URIBL_AB_SURBL
  13        6     4.44%   40.00%    0.83%      1.499 URIBL_SBL
  14        6     4.44%   40.00%    0.00%      0.474 URIBL_SC_SURBL
  15        5     3.70%   33.33%    5.83%      1.457 MIME_HTML_ONLY
  16        5     3.70%   33.33%    0.00%      1.083 URIBL_RHS_DOB
  17        5     3.70%   33.33%    0.00%      0.001 HTML_SHORT_LINK_IMG_3
  18        5     3.70%   33.33%    0.00%        1.5 URIBL_OB_SURBL
  19        5     3.70%   33.33%    0.00%      1.546 HTML_IMAGE_ONLY_20
  20        5     3.70%   33.33%    0.00%        3.7 PYZOR_CHECK
...
=============================================================
and the level of SPAM I see in my account is pretty much .0001%
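
For anyone reading the rule-hits report above, the three percentage columns appear to be simple ratios of the hit counts. The sketch below recomputes them in Python. The totals of 15 spam and 120 ham messages are not stated in the report; they are inferred from the percentages, and the ham hit counts are back-calculated the same way, so treat both as assumptions.

```python
def rule_percentages(spam_hits, ham_hits, total_spam, total_ham):
    """Return (% Msgs, % Spam, % Ham) for one rule, rounded to 2 places."""
    total_msgs = total_spam + total_ham
    return (
        round(100 * spam_hits / total_msgs, 2),  # spam hits over all messages
        round(100 * spam_hits / total_spam, 2),  # spam hits over spam only
        round(100 * ham_hits / total_ham, 2),    # ham hits over ham only
    )


# Inferred totals (assumption): 15 spam + 120 ham = 135 messages.
TOTAL_SPAM, TOTAL_HAM = 15, 120

# CRM114_SPAM: 14 spam hits, 1 ham hit (inferred from 0.83%).
print(rule_percentages(14, 1, TOTAL_SPAM, TOTAL_HAM))

# HTML_MESSAGE: 11 spam hits, 21 ham hits (inferred from 17.50%).
print(rule_percentages(11, 21, TOTAL_SPAM, TOTAL_HAM))
```

This prints `(10.37, 93.33, 0.83)` and `(8.15, 73.33, 17.5)`, matching the CRM114_SPAM and HTML_MESSAGE rows. A rule with a high % Spam and a near-zero % Ham is the kind that can safely carry a heavier score.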

Last edited by uxbod; 04-02-2009 at 11:29 PM.

#6 | 04-03-2009, 10:14 AM | Moderator | Posts: 1,432

But I think he has a good point that hasn't been addressed, namely #3. I'm currently running a test installation where I'm deliberately doing nothing at the MTA level, just to see how SA works (and how well). There are a good number of instances where the only difference in whether a message gets through is whether or not it comes from an RBL-listed IP. That is, I get two copies of the same spam, but one gets through and the other doesn't, with the difference in score made up entirely of RBL values. If I were using RBLs at the MTA level, the copies that got through would still be delivered.

So at least according to one line of thinking, it would be valuable to take everything that comes from known "bad" IPs (such as those listed by highly reputable RBLs), and use that to train. (I'm pretty sure that ASSP and XWALL have modes which work this way.)
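
The "same spam, different outcome" situation can be sketched with simple additive scoring. This is a minimal Python illustration, not SA's actual code: the rule scores are borrowed from the report in post #5, the particular combination of hits is invented for the example, and the 5.0 threshold is an assumption that a given install may well override.

```python
# Scores copied from the rule-hits report earlier in the thread.
RULE_SCORES = {
    "HTML_MESSAGE": 0.001,
    "MIME_HTML_ONLY": 1.457,
    "HTML_IMAGE_ONLY_20": 1.546,
    "RAZOR2_CHECK": 0.5,
    "RCVD_IN_BL_SPAMCOP_NET": 1.96,
}

REQUIRED_SCORE = 5.0  # assumed threshold; check your own configuration


def total_score(hits):
    """Add up the score of every rule that fired on a message."""
    return sum(RULE_SCORES[name] for name in hits)


def is_spam(hits, required=REQUIRED_SCORE):
    """A message is tagged as spam once its total reaches the threshold."""
    return total_score(hits) >= required


# Two copies of the same message body (hypothetical set of content hits).
content_hits = ["HTML_MESSAGE", "MIME_HTML_ONLY", "HTML_IMAGE_ONLY_20", "RAZOR2_CHECK"]

# Copy A arrives from an unlisted IP, copy B from an RBL-listed one.
copy_a = content_hits
copy_b = content_hits + ["RCVD_IN_BL_SPAMCOP_NET"]
```

Here copy A totals about 3.5 and is delivered, while copy B totals about 5.5 and is tagged, purely because of the RBL hit. Training on copy B is what would let the content-based rules catch copy A next time.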

There are two mitigating issues here. First, if you use some of the tools suggested by Improving Anti-spam system - Zimbra :: Wiki, such as DCC, Pyzor, and Razor, you'll benefit from sharing live data on current spam runs, which is kind of similar to what you'd get by treating RBL-identification of spam as "authoritative".

Second, I haven't studied exactly how SA works, but if you first filter out the obvious spam using RBLs at the MTA level, and then train only on the false-positives/false-negatives contained in the residue, I think this could ultimately train SA to a higher degree of precision--it really depends on how it generates matches and scores.

With regard to #2, Cox is one of the biggest sources of false positives, probably because they have so many home customers with hijacked PCs, and they don't do a good enough job of filtering outgoing spam on their smarthosts. As a result they end up on RBLs like PSBL and SORBS. Basically, what I'd recommend is not to use useful but less-discriminating RBLs for blocking, only for scoring via SA. It's up to you to decide what counts as "useful but less discriminating", but SORBS seems to qualify in your case.

With regard to #4, you're talking about what's referred to as a "spam quarantine" in some systems (e.g. Exchange's native content filtering has this concept). I'm not aware of a way to do this with Zimbra. It may not be a bad idea, although in my view anything scored high enough to be dropped really is bad. I suppose if you adjust your scoring weights you might need this, but the principle is that each weight should be an independent indicator of spamminess. A high enough score therefore means that the message has multiple independent spam characteristics.
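
As a rough picture of what a quarantine would change, here is a small Python sketch of a two-threshold disposition. Everything in it is hypothetical: the function name, the quarantine destination, the default percentages, and the idea that the percentages are taken against a 20-point scale are assumptions for illustration, not a description of how Zimbra or SA is configured.

```python
def disposition(score, tag_percent=33, kill_percent=75, scale=20.0):
    """Decide where a scored message goes.

    Thresholds are given as percentages of an assumed 20-point scale,
    so 33% is 6.6 points and 75% is 15 points. Messages at or above the
    upper threshold are quarantined here rather than dropped.
    """
    tag_at = scale * tag_percent / 100
    kill_at = scale * kill_percent / 100
    if score >= kill_at:
        return "quarantine"  # would otherwise be silently dropped
    if score >= tag_at:
        return "junk"        # delivered to the user's Junk folder
    return "inbox"
```

With these assumed defaults, a score of 3.5 goes to the inbox, 8.0 goes to Junk, and 21.0 goes to quarantine. The only difference from a drop policy is the top branch, which keeps a copy that can be reviewed and used for training if it turns out to be a false positive.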

Elliot Wilen
Berkeley, CA

#7 | 04-03-2009, 12:55 PM | Loyal Member | Posts: 83

These are both very helpful replies. Elliot, thank you for your thoughtfulness. I will bounce this around for a while and see what I come up with.

Another note: as soon as I implemented greylisting, the amount of tagged spam landing in my junk box dropped to about 10% of what it had been. Score!

And I finally got amavis-logwatch to work (see the link to the thread above).

kazoo

#8 | 04-03-2009, 02:20 PM | Moderator | Posts: 7,928

Search the forums for Barracuda RBL. I have been working with different AS techniques for about 10 years now, and my current top 3 are:

1) Barracuda RBL @ perimeter MTA
2) SaneSecurity Signatures
3) CRM114 to complement Bayes DB

Just my 2c worth.

#9 | 12-01-2009, 07:25 PM | Senior Member | Posts: 63
Barracuda = too many FP for us

We were using the recommended list of RBLs for several months without issue, seeing about 15-20k messages/day stopped at the MTA, until we had reports of several false positives. It turned out that barracudacentral had listed one of the provincial government mail servers, which was a BIG problem, as mail from most other schools was blocked. I watched this happen over several days, and each time I checked the server at Barracuda it was no longer listed, yet a log entry from hours before showed that it had been.
I finally ended up removing Barracuda from the RBL list. I have wondered about just scoring RBL hits instead, and may try that next.