Thread: RBL for training instead of rejecting?

  1. #1
    kazooless is offline Loyal Member
    Join Date
    Mar 2009
    Posts
    91
    Rep Power
    6

    RBL for training instead of rejecting?

    I found in this link an interesting comment about SA and RBL's: [SOLVED] RBL -- updates

    I was thinking about letting stuff marked by the RBL's in, but only to train the spam filters better, as well as some other reasons I've been thinking about.

    Anyway, does anybody here have more info on how this works? How to specify which RBL's to use and how to customize the weight given to the spam score with an RBL positive?

    Also, what are your ideas of the pros and cons of doing this?

    Thanks,

    kazoo

  2. #2
    uxbod is offline Moderator
    Join Date
    Nov 2006
    Location
    UK
    Posts
    8,017
    Rep Power
    24

    Well, if you use RBLs at the MTA level via zimbraMtaRestriction, then emails will be rejected before they even hit SA. SA by default performs a number of RBL lookups, and you can add your own; you would then be able to tune the scores to your liking. Bear in mind, though, that if you remove perimeter testing the load on your server will go up, as SA will be doing more work. What are you really trying to achieve?
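
    For anyone wondering what an RBL lookup involves under the hood: a DNSBL check is an ordinary DNS query for the client's IPv4 address with its octets reversed, prepended to the list's zone. A listed address resolves to something in 127.0.0.0/8, while an unlisted one returns NXDOMAIN. Below is a minimal Python sketch of building that query name. The function name is made up for illustration and nothing here touches the network:

```python
import ipaddress


def dnsbl_query_name(client_ip: str, zone: str) -> str:
    """Build the DNS name a DNSBL lookup would query for an IPv4 client.

    Hypothetical helper for illustration; it only builds the name and
    performs no DNS resolution.
    """
    addr = ipaddress.IPv4Address(client_ip)  # raises ValueError on bad input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone.rstrip('.')}"


# 192.0.2.1 is a documentation-only address, used here as a placeholder.
print(dnsbl_query_name("192.0.2.1", "zen.spamhaus.org"))
```

    Whether the MTA rejects on that answer or SA merely adds points for it is the policy choice being discussed in this thread; the lookup itself is the same either way.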

  3. #3
    phoenix is online now Zimbra Consultant & Moderator
    Join Date
    Sep 2005
    Location
    Vannes, France
    Posts
    23,586
    Rep Power
    57

    Originally posted by kazooless:
    Anyway, does anybody here have more info on how this works? How to specify which RBL's to use and how to customize the weight given to the spam score with an RBL positive?

    Also, what are your ideas of the pros and cons of doing this?
    By definition you would only need to train the anti-spam system with messages that get through the RBLs and other spam traps. You're wasting your cpu cycles for stuff that's already been classified as spam.
    Regards


    Bill



  4. #4
    kazooless is offline Loyal Member
    Join Date
    Mar 2009
    Posts
    91
    Rep Power
    6

    Thanks for the replies guys. I realize that most recommendations are to just reject an RBL hit at the MTA and this is going against the grain to some point. I'd still like to know the specifics on 'how' to do it though.

    uxbod says:
    What are you really trying to achieve ?
    Here are my thoughts. Maybe I'm incorrect on some of them:

    1. I run a home server. I have 6 adult e-mail addresses, of which I am the biggest load. Plus I have 2 limited accounts for my oldest children. CPU cycles available are plenty. The machine (dual-core Atom cpu) is sleeping for the most part. I only run Samba and Netatalk on it besides the Zimbra related functions. So, I have plenty of cpu cycles at my disposal, which makes the larger load NOT a concern to me.

    2. The RBL's are rejecting a great deal of incoming messages. However, every once in a while I've got a problem with some legitimate messages being bounced, and usually the legitimate senders have no clue as to why since they're not technical. An example: just the other day my uncle, who has a cox.net account and was using their mail server to send, was bounced because one of the Cox servers was listed in SORBS for some reason. With the RBL's in front, checking the logs means you have to look in the MTA logs first, and if it isn't there, then you've got to look at the SA logs (right?). It seems to me that letting it go through to SA would mean fewer places to look for troubleshooting and for running statistical reports (amavis-logwatch instead of amavis- and postfix-logwatch, assuming I can get amavis-logwatch working: [SOLVED] Detailed Spam Reports?)

    3. I would assume that there are a lot of drone spamming machines out there sending the same spam. Not all of them are going to be caught by the RBL's. If this is true, then it seems to me that SA can learn from the messages that come through from IPs that ARE on the RBL's, and so more accurately catch the same spam when it is sent from an IP that is not listed.

    4. Correct me if I'm wrong, but isn't there a way to have the SA rejected items (the higher percentage) go to a spam account instead of just being dropped? So, if I prevent those messages via SA from going to a user's inbox or even junk folder, but it ends up being a 'rejected' false positive, then at least I could go to that spam account and retrieve the false positive and have it learn.

    5. What's the best way to use a blacklist/whitelist? Seems to me that you only want one, but if you do it at the mta instead of SA, then spam could get through more easily by spoofing an address that you whitelist. I haven't implemented SPF yet (didn't realize 'til recently that it wasn't included by default), but I would think that using SPF along with all the other tools in the SA arsenal would help with whitelist/blacklist management.

    In summary, it just seems to me that having this as an 'option' would be a door to greater manageability for anti-spam management and log analysis.
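
    To make the idea in point 4 concrete, here is a rough Python sketch of the routing logic I have in mind: two score thresholds, with the higher one sending mail to a quarantine mailbox rather than discarding it. The threshold values and the function name are hypothetical, and this is only the decision logic, not actual Zimbra or amavisd configuration:

```python
# Hypothetical thresholds for illustration only; not Zimbra or amavisd defaults.
TAG_SCORE = 6.6    # at or above: deliver to the user's Junk folder
KILL_SCORE = 15.0  # at or above: divert to a quarantine mailbox instead of dropping


def route(score: float) -> str:
    """Return the destination for a message with the given spam score."""
    if score >= KILL_SCORE:
        return "quarantine"
    if score >= TAG_SCORE:
        return "junk"
    return "inbox"


for s in (1.2, 8.0, 22.5):
    print(s, "->", route(s))
```

    The point of the third branch is that a false positive above the higher threshold is still retrievable (and can be fed back for training) instead of being lost.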

    Thanks,

    kazoo

  5. #5
    uxbod is offline Moderator
    Join Date
    Nov 2006
    Location
    UK
    Posts
    8,017
    Rep Power
    24

    If that is your concern then why not just remove the RBL checking at MTA level? As I said earlier you can add your own RBLs in at the SA level instead, and then weight the scores appropriately. If you really want to slam the SPAM then I would recommend having a read through [SOLVED] SaneSecurity ClamAV or FuzzyOCR SpamAssassin Plugins, as IMHO they really do help, and in my experience they produce pretty much zero FPs.

    I now only run two RBLs on the front-end
    Code:
    b.barracudacentral.org       127
    zen.spamhaus.org              15
    =================================
    Total DNSBL rejections:       142
    and a mixture of SS sigs and CRM114
    Code:
    =========================================================
    SpamAssassin Rule Hits: Spam
    ------------------------------------------------------------------------------
    Rank     Hits    % Msgs   % Spam    % Ham      Score Rule
    ----     ----    ------   ------    -----      ----- ----
       1       14    10.37%   93.33%    0.83%          1 CRM114_SPAM
       2       13     9.63%   86.67%    0.00%          8 L_AV_SS_Spam
       3       11     8.15%   73.33%   17.50%      0.001 HTML_MESSAGE
       4       10     7.41%   66.67%    0.00%      1.955 URIBL_BLACK
       5        9     6.67%   60.00%    0.00%        3.5 BAYES_99
       6        9     6.67%   60.00%    0.00%       1.96 RCVD_IN_BL_SPAMCOP_NET
       7        8     5.93%   53.33%    0.00%      1.501 URIBL_JP_SURBL
       8        8     5.93%   53.33%    0.00%        0.5 RAZOR2_CHECK
       9        7     5.19%   46.67%    0.00%        1.5 URIBL_WS_SURBL
      10        7     5.19%   46.67%    0.00%        1.5 RAZOR2_CF_RANGE_E8_51_100
      11        7     5.19%   46.67%    0.00%        0.5 RAZOR2_CF_RANGE_51_100
      12        7     5.19%   46.67%    0.00%       1.86 URIBL_AB_SURBL
      13        6     4.44%   40.00%    0.83%      1.499 URIBL_SBL
      14        6     4.44%   40.00%    0.00%      0.474 URIBL_SC_SURBL
      15        5     3.70%   33.33%    5.83%      1.457 MIME_HTML_ONLY
      16        5     3.70%   33.33%    0.00%      1.083 URIBL_RHS_DOB
      17        5     3.70%   33.33%    0.00%      0.001 HTML_SHORT_LINK_IMG_3
      18        5     3.70%   33.33%    0.00%        1.5 URIBL_OB_SURBL
      19        5     3.70%   33.33%    0.00%      1.546 HTML_IMAGE_ONLY_20
      20        5     3.70%   33.33%    0.00%        3.7 PYZOR_CHECK
    ...
    =============================================================
    and the level of SPAM I see in my account is pretty much .0001%
    Last edited by uxbod; 04-02-2009 at 11:29 PM.

  6. #6
    ewilen is offline Moderator
    Join Date
    Jun 2008
    Location
    Berkeley, CA
    Posts
    1,474
    Rep Power
    9

    But I think he has a good point that hasn't been addressed, namely #3. I'm currently running a test installation where I'm deliberately doing nothing at the MTA level, just to see how SA works (and how well). There are a good number of instances where the only difference in whether a message gets through is whether it's from an RBL'ed IP or not. That is, I get two of the same spam, but one gets through and the other doesn't, with the difference in score made up of RBL values. If I was using RBLs at the MTA level, the ones that got through would still be delivered.
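
    As a worked example of that arithmetic, here is a small Python sketch. The rule scores are taken from uxbod's table in post #5; the 5.0 threshold is SpamAssassin's stock required_score, which I'm assuming here and which may not match what a given Zimbra install effectively uses. The two hit lists are invented to show the effect:

```python
# Assumed threshold: SpamAssassin's stock required_score. A given Zimbra
# install may use a different effective value.
REQUIRED_SCORE = 5.0

# Per-rule scores as they appear in uxbod's report in post #5.
RULE_SCORES = {
    "BAYES_99": 3.5,
    "RAZOR2_CHECK": 0.5,
    "RCVD_IN_BL_SPAMCOP_NET": 1.96,
}


def total_score(hits):
    """Sum the scores of the rules that fired on a message."""
    return sum(RULE_SCORES[rule] for rule in hits)


def is_spam(hits):
    """A message is tagged as spam when its total reaches the threshold."""
    return total_score(hits) >= REQUIRED_SCORE


# Two copies of the same spam (invented hit lists): identical content rules,
# but only one copy was relayed through an RBL-listed IP.
from_listed_ip = ["BAYES_99", "RAZOR2_CHECK", "RCVD_IN_BL_SPAMCOP_NET"]
from_unlisted_ip = ["BAYES_99", "RAZOR2_CHECK"]

for name, hits in (("listed", from_listed_ip), ("unlisted", from_unlisted_ip)):
    print(name, round(total_score(hits), 2), is_spam(hits))
```

    The copy from the unlisted IP stays under the threshold on content rules alone, which is exactly the message that would still reach the inbox with MTA-level blocking in place.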

    So at least according to one line of thinking, it would be valuable to take everything that comes from known "bad" IPs (such as those listed by highly reputable RBLs), and use that to train. (I'm pretty sure that ASSP and XWALL have modes which work this way.)

    There are two mitigating issues here. First, if you use some of the tools suggested by Improving Anti-spam system - Zimbra :: Wiki, such as DCC, Pyzor, and Razor, you'll benefit from sharing live data on current spam runs, which is kind of similar to what you'd get by treating RBL-identification of spam as "authoritative".

    Second, I haven't studied exactly how SA works, but if you first filter out the obvious spam using RBLs at the MTA level, and then train only on the false-positives/false-negatives contained in the residue, I think this could ultimately train SA to a higher degree of precision--it really depends on how it generates matches and scores.

    With regard to #2, Cox is one of the biggest sources of false positives, probably because they have so many home customers with hijacked PCs, and they don't do a good enough job of filtering outgoing spam on their smarthosts. As a result they end up on RBLs like PSBL and SORBS. Basically what I'd recommend is to not employ useful but less-discriminating RBLs for blocking, only for scoring via SA. It's up to you to decide what counts as "useful but less discriminating" but SORBS seems to qualify in your case.

    With regard to #4, you're talking about what's referred to as a "spam quarantine" in some systems (e.g. Exchange's native content filtering has this concept). I'm not aware of a way to do this with Zimbra. It may not be a bad idea, although personally I think anything scored high enough to be dropped really is bad. I suppose if you adjust your scoring weights you might need this, but the principle is that each weight should be an independent indicator of spamminess. A high enough score therefore means that the message has multiple independent spam characteristics.

  7. #7
    kazooless is offline Loyal Member
    Join Date
    Mar 2009
    Posts
    91
    Rep Power
    6

    These are both very helpful replies. Elliot, thank you for your thoughtfulness. I will bounce this around for a while and see what I come up with.

    Another note, as soon as I implemented greylisting, the percentage of tagged spam that made it into my junk box automatically dropped to about 10% of previous numbers. Score!

    And, I finally got the Amavis-logwatch to work (see link to thread above)

    kazoo

  8. #8
    uxbod is offline Moderator
    Join Date
    Nov 2006
    Location
    UK
    Posts
    8,017
    Rep Power
    24

    Search the forums for Barracuda RBL. I have been working with different AS techniques for about 10 years now and my current top 3 are:

    1) Barracuda RBL @ perimeter MTA
    2) SaneSecurity Signatures
    3) CRM114 to complement Bayes DB

    Just my 2c worth.

  9. #9
    swrightsls is offline Senior Member
    Join Date
    Feb 2009
    Location
    Shawnigan Lake, BC, Canada
    Posts
    66
    Rep Power
    6

    Barracuda = too many FP for us

    We were using the recommended list of RBLs for several months without issue, seeing about 15-20k/day stopped at the MTA, until we had reports of several false positives. Turns out barracudacentral had listed one of the provincial government mail servers, resulting in a BIG problem, as mail from most other schools was blocked. I watched this happen over several days, and each time, when I checked the server at barracuda, it was no longer listed, yet a log entry from hours before showed it was.
    I finally ended up removing barracuda from the RBL list. I have wondered about just scoring RBL hits, and may try that next.
