04-01-2009, 04:08 PM
RBL for training instead of rejecting?
I found an interesting comment about SpamAssassin (SA) and RBLs in this thread: [SOLVED] RBL -- updates
I was thinking about letting mail flagged by the RBLs through, mainly to train the spam filters better, along with a few other reasons I've been considering.
Does anybody here have more info on how this works? How do you specify which RBLs to use, and how do you customize the weight an RBL hit adds to the spam score?
Also, what do you see as the pros and cons of doing this?
Thanks,
kazoo
04-01-2009, 11:36 PM
Well, if you use RBLs at the MTA level via zimbraMtaRestriction, messages are rejected before they ever reach SA. SA performs a number of RBL lookups by default, and you can add your own; you can then tune the scores to your liking. Bear in mind, though, that if you remove perimeter testing, the load on your server will go up, because SA will be doing more work. What are you really trying to achieve?
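For anyone wondering what an RBL lookup actually involves, here is a minimal Python sketch of the usual DNSBL convention: reverse the IPv4 octets, append the list's zone, and treat an answer in 127.0.0.0/8 as a listing (no answer means not listed). The function names and the injectable `resolve` callable are hypothetical, chosen so the logic can be tried without network access; this is a simplification, not how Postfix or SA implement it internally.

```python
import ipaddress
import socket
from typing import Callable, Optional

def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNSBL query name: reversed IPv4 octets followed by the zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone.rstrip('.')}"

def system_resolve(name: str) -> Optional[str]:
    """Look up an A record with the OS resolver; None means no answer."""
    try:
        return socket.gethostbyname(name)
    except OSError:  # socket.gaierror (NXDOMAIN etc.) is a subclass of OSError
        return None

def is_listed(ip: str, zone: str,
              resolve: Callable[[str], Optional[str]] = system_resolve) -> bool:
    """Simplified check: any 127.0.0.0/8 answer counts as a listing."""
    answer = resolve(dnsbl_query_name(ip, zone))
    if answer is None:
        return False
    return ipaddress.IPv4Address(answer) in ipaddress.IPv4Network("127.0.0.0/8")
```

For example, checking 192.0.2.10 against zen.spamhaus.org queries `10.2.0.192.zen.spamhaus.org`. Passing a dictionary's `get` method as `resolve` lets you test the logic offline.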
04-02-2009, 06:51 AM
Zimbra Consultant & Moderator
Quote:
Originally Posted by kazooless
Anyway, does anybody here have more info on how this works? How to specify which RBL's to use and how to customize the weight given to the spam score with an RBL positive?
Also, what are your ideas of the pros and cons of doing this?

By definition, you would only need to train the anti-spam system with messages that get through the RBLs and other spam traps. Otherwise you're wasting CPU cycles on mail that has already been classified as spam.
Regards
Bill
04-02-2009, 11:27 AM
Thanks for the replies, guys. I realize most recommendations are to simply reject RBL hits at the MTA, and this goes against the grain to some extent. I'd still like to know the specifics of how to do it, though.
uxbod says:
Quote:
What are you really trying to achieve?

Here are my thoughts. I may be wrong about some of them:
1. I run a home server with six adult e-mail accounts (I generate most of the traffic myself) plus two limited accounts for my oldest children. The machine (dual-core Atom CPU) is idle most of the time; besides the Zimbra-related services, it only runs Samba and Netatalk. So I have plenty of spare CPU cycles, and the extra load is NOT a concern for me.
2. The RBLs are rejecting a great deal of incoming mail. Every once in a while, however, a legitimate message gets bounced, and the senders usually have no idea why, since they're not technical. For example, just the other day my uncle, who has a cox.net account and sends through Cox's mail server, was bounced because one of the Cox servers was listed in SORBS for some reason. With the RBLs at the front, troubleshooting means checking the MTA logs first and, if the message isn't there, then the SA logs (right?). Letting everything through to SA would mean fewer places to look, both for troubleshooting and for statistical reports (amavis-logwatch alone instead of both amavis-logwatch and postfix-logwatch, assuming I can get amavis-logwatch working: [SOLVED] Detailed Spam Reports?).
3. I assume there are a lot of drone spamming machines out there sending the same spam, and not all of them will be caught by the RBLs. If so, SA could learn from the messages that come from RBL-listed IPs and then catch the same spam more accurately when it is sent from an IP that isn't listed.
4. Correct me if I'm wrong, but isn't there a way to send the messages SA rejects (those above the higher percentage threshold) to a spam account instead of just dropping them? That way, if SA keeps a message out of a user's inbox and even out of the junk folder, and it turns out to be a false positive, I could at least retrieve it from that spam account and train SA on it.
5. What's the best way to use a blacklist/whitelist? It seems to me you'd only want one, but if you do it at the MTA instead of in SA, spam could get through more easily by spoofing a whitelisted address. I haven't implemented SPF yet (I didn't realize until recently that it isn't included by default), but I would think that using SPF along with all the other tools in the SA arsenal would help with whitelist/blacklist management.
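Points 4 and 5 can be expressed as one small routing decision. The sketch below is hypothetical: the two thresholds, the `spf_result` strings and the `Route` names are assumptions for illustration, not Zimbra or SA settings (Zimbra expresses its own tag and reject levels as percentages). It honors a whitelist entry only when SPF passes, and sends anything above the higher threshold to a quarantine account instead of discarding it.

```python
from enum import Enum

class Route(Enum):
    INBOX = "inbox"
    JUNK = "junk"
    QUARANTINE = "quarantine"  # hypothetical spam-holding account

# Hypothetical thresholds in SA-style points (not Zimbra percentages).
TAG_THRESHOLD = 5.0
KILL_THRESHOLD = 15.0

def route_message(sender: str, score: float, spf_result: str,
                  whitelist: set[str]) -> Route:
    """Decide where a scored message goes.

    Whitelist entries are assumed to be stored in lowercase. A whitelisted
    sender is trusted only if SPF passed, so a spoofed whitelisted
    address falls through to normal scoring.
    """
    if sender.lower() in whitelist and spf_result == "pass":
        return Route.INBOX
    if score >= KILL_THRESHOLD:
        # Keep it where an admin can retrieve a false positive and
        # feed it back for training, rather than dropping it.
        return Route.QUARANTINE
    if score >= TAG_THRESHOLD:
        return Route.JUNK
    return Route.INBOX
```

With this ordering, a forged message claiming to come from a whitelisted address but failing SPF is scored like any other mail, which is the spoofing risk raised in point 5.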
In summary, it seems to me that having this as an option would make both anti-spam management and log analysis easier.
Thanks,
kazoo
04-02-2009, 11:26 PM
If that is your concern, then just remove the RBL checks at the MTA level. As I said earlier, you can add your own RBLs at the SA level instead and then weight the scores appropriately. If you really want to slam the spam, I'd recommend reading through [SOLVED] SaneSecurity ClamAV or FuzzyOCR SpamAssassin Plugins; IMHO they really do help, and in my experience they produce pretty much zero false positives (FPs).
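To illustrate what weighting the scores means once RBLs move from the MTA into SA: each listing contributes a configurable number of points, and the total is compared with a threshold instead of causing an outright rejection. In SA itself this is done with rules and `score` lines in its configuration; the Python below only models the arithmetic. The zone weights are hypothetical, and 5.0 is used because it is SA's default `required_score`.

```python
# Hypothetical per-zone weights; a more trusted list gets more points.
RBL_WEIGHTS = {
    "zen.spamhaus.org": 3.0,
    "b.barracudacentral.org": 2.0,
    "bl.spamcop.net": 1.5,
}

REQUIRED_SCORE = 5.0  # SpamAssassin's default required_score

def rbl_points(listed_zones: set[str],
               weights: dict[str, float] = RBL_WEIGHTS) -> float:
    """Sum the weights of every configured zone that listed the client IP."""
    return sum(weights.get(zone, 0.0) for zone in listed_zones)

def is_spam(content_score: float, listed_zones: set[str]) -> bool:
    """Content rules and RBL hits add up; no single RBL hit rejects outright."""
    return content_score + rbl_points(listed_zones) >= REQUIRED_SCORE
```

A message that is otherwise clean (say 0.5 points) and listed only on one lower-weighted list stays below the threshold, whereas an MTA-level check against that same list would have bounced it.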
I now only run two RBLs on the front end:
Code:
b.barracudacentral.org    127
zen.spamhaus.org           15
=================================
Total DNSBL rejections:   142

and a mixture of SaneSecurity (SS) signatures and CRM114:
Code:
=========================================================
SpamAssassin Rule Hits: Spam
------------------------------------------------------------------------------
Rank  Hits  % Msgs  % Spam  % Ham   Score  Rule
----  ----  ------  ------  ------  -----  ----
   1    14  10.37%  93.33%   0.83%  1      CRM114_SPAM
   2    13   9.63%  86.67%   0.00%  8      L_AV_SS_Spam
   3    11   8.15%  73.33%  17.50%  0.001  HTML_MESSAGE
   4    10   7.41%  66.67%   0.00%  1.955  URIBL_BLACK
   5     9   6.67%  60.00%   0.00%  3.5    BAYES_99
   6     9   6.67%  60.00%   0.00%  1.96   RCVD_IN_BL_SPAMCOP_NET
   7     8   5.93%  53.33%   0.00%  1.501  URIBL_JP_SURBL
   8     8   5.93%  53.33%   0.00%  0.5    RAZOR2_CHECK
   9     7   5.19%  46.67%   0.00%  1.5    URIBL_WS_SURBL
  10     7   5.19%  46.67%   0.00%  1.5    RAZOR2_CF_RANGE_E8_51_100
  11     7   5.19%  46.67%   0.00%  0.5    RAZOR2_CF_RANGE_51_100
  12     7   5.19%  46.67%   0.00%  1.86   URIBL_AB_SURBL
  13     6   4.44%  40.00%   0.83%  1.499  URIBL_SBL
  14     6   4.44%  40.00%   0.00%  0.474  URIBL_SC_SURBL
  15     5   3.70%  33.33%   5.83%  1.457  MIME_HTML_ONLY
  16     5   3.70%  33.33%   0.00%  1.083  URIBL_RHS_DOB
  17     5   3.70%  33.33%   0.00%  0.001  HTML_SHORT_LINK_IMG_3
  18     5   3.70%  33.33%   0.00%  1.5    URIBL_OB_SURBL
  19     5   3.70%  33.33%   0.00%  1.546  HTML_IMAGE_ONLY_20
  20     5   3.70%  33.33%   0.00%  3.7    PYZOR_CHECK
...
=============================================================

and the level of spam I see in my account is pretty much .0001%.
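The percentage columns in this rule-hit report can be reproduced from raw counts. Working backwards from the figures (for example, 14 hits being 10.37% of messages and 93.33% of spam), the sample appears to be 135 messages, 15 spam and 120 ham, with the Hits column counting spam hits only. Those totals are inferred from the percentages, not stated in the report, so treat this Python check as a consistency sketch under that assumption.

```python
def pct(part: int, whole: int) -> str:
    """Format part/whole as a percentage with two decimals, like the report."""
    return f"{100 * part / whole:.2f}%"

# Totals inferred from the report's percentages, not stated in it.
TOTAL_MSGS, TOTAL_SPAM, TOTAL_HAM = 135, 15, 120

def row(spam_hits: int, ham_hits: int) -> tuple[str, str, str]:
    """Return (% Msgs, % Spam, % Ham) for one rule."""
    return (pct(spam_hits, TOTAL_MSGS),
            pct(spam_hits, TOTAL_SPAM),
            pct(ham_hits, TOTAL_HAM))

# CRM114_SPAM: 14 spam hits; 0.83% of ham implies 1 ham hit.
print(row(14, 1))   # ('10.37%', '93.33%', '0.83%')
# HTML_MESSAGE: 11 spam hits; 17.50% of ham implies 21 ham hits.
print(row(11, 21))  # ('8.15%', '73.33%', '17.50%')
```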
Last edited by uxbod; 04-02-2009 at 11:29 PM.
04-03-2009, 10:14 AM
But I think he has a good point that hasn't been addressed, namely #3. I'm currently running a test installation where I deliberately do nothing at the MTA level, just to see how (and how well) SA works. In a good number of cases, the only thing that decides whether a message gets through is whether it comes from an RBL-listed IP. That is, I receive two copies of the same spam; one gets through and the other doesn't, and the difference in score is made up of RBL rule values. If I were using RBLs at the MTA level, the copies that got through would still be delivered.
So, by at least one line of thinking, it would be valuable to take everything that comes from known "bad" IPs (such as those listed by highly reputable RBLs) and use it for training. (I'm pretty sure ASSP and XWALL have modes that work this way.)
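To make the arithmetic concrete, suppose both copies trip the same content rules and only one comes from a listed IP. The Python below reuses two scores from uxbod's report (BAYES_99 at 3.5 and RCVD_IN_BL_SPAMCOP_NET at 1.96) and SA's default threshold of 5.0; these are illustrative numbers, not a measurement from this test installation.

```python
REQUIRED_SCORE = 5.0  # SpamAssassin's default required_score

# Illustrative rule scores taken from the report earlier in the thread.
CONTENT_RULES = {"BAYES_99": 3.5}
RBL_RULES = {"RCVD_IN_BL_SPAMCOP_NET": 1.96}

def total_score(rules: dict[str, float]) -> float:
    """SA-style scoring: the message score is the sum of the rules it hits."""
    return sum(rules.values())

copy_from_listed_ip = total_score({**CONTENT_RULES, **RBL_RULES})  # 3.5 + 1.96
copy_from_clean_ip = total_score(CONTENT_RULES)                     # 3.5 only

for label, score in [("listed IP", copy_from_listed_ip),
                     ("clean IP", copy_from_clean_ip)]:
    verdict = "spam" if score >= REQUIRED_SCORE else "delivered"
    print(f"{label}: {score:.2f} -> {verdict}")
# listed IP: 5.46 -> spam
# clean IP: 3.50 -> delivered
```

The clean-IP copy is exactly the message that training on the RBL-listed copy is meant to catch: if Bayes learns the content, the content score alone can cross the threshold.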
There are two mitigating factors here. First, if you use some of the tools suggested in the Zimbra wiki article "Improving Anti-spam system", such as DCC, Pyzor, and Razor, you benefit from shared live data on current spam runs, which is somewhat similar to what you'd get by treating an RBL's identification of spam as "authoritative".
Second, I haven't studied exactly how SA works, but if you first filter out the obvious spam with RBLs at the MTA level and then train only on the false positives and false negatives in what remains, I think that could ultimately train SA to a higher degree of precision. It really depends on how SA generates matches and scores.
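The two training strategies discussed in this thread can be compared side by side. The sketch below is hypothetical (the `Message` fields and the set of trusted zones are assumptions): policy A treats a listing on a highly reputable RBL as an authoritative spam label, as in kazoo's #3; policy B skips RBL-listed mail, as if it had been rejected at the MTA, and learns only from the mistakes reported in the residue.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Message:
    msg_id: str
    listed_zones: set[str] = field(default_factory=set)  # RBLs listing the sender IP
    predicted_spam: bool = False                          # the filter's verdict
    user_says_spam: Optional[bool] = None                 # user feedback, if any

# Hypothetical "highly reputable" lists whose hits we'd trust as labels.
TRUSTED_ZONES = {"zen.spamhaus.org"}

def policy_rbl_labels(messages: list[Message]) -> list[tuple[str, bool]]:
    """Policy A: anything from a trusted-RBL-listed IP becomes a spam example."""
    return [(m.msg_id, True) for m in messages if m.listed_zones & TRUSTED_ZONES]

def policy_residue_errors(messages: list[Message]) -> list[tuple[str, bool]]:
    """Policy B: learn only from false positives/negatives in the residue."""
    examples = []
    for m in messages:
        if m.listed_zones & TRUSTED_ZONES:
            continue  # would have been rejected at the MTA
        if m.user_says_spam is not None and m.user_says_spam != m.predicted_spam:
            examples.append((m.msg_id, m.user_says_spam))
    return examples
```

Policy A produces many cheap, automatically labeled examples; policy B produces few, but each one targets a case the filter currently gets wrong.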
With regard to #2, Cox is one of the biggest sources of false positives, probably because they have so many home customers with hijacked PCs and don't do a good enough job of filtering outgoing spam on their smarthosts. As a result, their servers end up on RBLs like PSBL and SORBS. What I'd recommend is not to use RBLs that are useful but less discriminating for blocking, only for scoring via SA. It's up to you to decide what counts as "useful but less discriminating", but SORBS seems to qualify in your case.
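This recommendation amounts to a two-tier RBL policy: a short list of highly discriminating RBLs that may reject at the MTA, and a second tier that only adds points in SA. The Python sketch below models that decision; which zones go in which tier, and the point values, are hypothetical examples. In a real Zimbra setup the first tier would be configured in the MTA restrictions (zimbraMtaRestriction) and the second as SA rules.

```python
# Tier 1: trusted enough to reject at the MTA (hypothetical choice).
BLOCK_ZONES = {"zen.spamhaus.org"}

# Tier 2: useful but less discriminating; score only (hypothetical points).
SCORE_ZONES = {"dnsbl.sorbs.net": 1.5, "psbl.surriel.com": 1.0}

def mta_decision(listed_zones: set[str]) -> str:
    """Reject only on a tier-1 hit; everything else is passed on to SA."""
    return "reject" if listed_zones & BLOCK_ZONES else "accept"

def sa_rbl_points(listed_zones: set[str]) -> float:
    """Tier-2 hits contribute points instead of bouncing the sender."""
    return sum(points for zone, points in SCORE_ZONES.items()
               if zone in listed_zones)
```

With this split, a Cox smarthost that lands on SORBS costs the message 1.5 points in SA rather than bouncing the sender at the MTA.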
With regard to #4, you're talking about what some systems call a "spam quarantine" (e.g. Exchange's native content filtering has this concept). I'm not aware of a way to do this with Zimbra. It may not be a bad idea, although in my experience anything scored high enough to be dropped is really bad. I suppose you might need this if you adjust your scoring weights, but the principle is that each weight should be an independent indicator of spamminess. A high enough score therefore means that the message has multiple independent spam characteristics.
04-03-2009, 12:55 PM
These are both very helpful replies. Elliot, thank you for your thoughtful answer. I'll mull this over for a while and see what I come up with.
Another note: as soon as I implemented greylisting, the amount of tagged spam reaching my junk folder dropped to about 10% of its previous level. Score!
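For anyone unfamiliar with why greylisting cuts spam so sharply: the server temporarily refuses the first delivery attempt from an unknown (client IP, sender, recipient) triplet and accepts a retry after a short delay. Legitimate MTAs retry; many spam bots don't. The in-memory Python sketch below illustrates the idea; the 300-second delay and the class and method names are assumptions, not the configuration of any particular greylisting daemon.

```python
import time
from typing import Callable

class Greylist:
    """Minimal in-memory greylisting model keyed on (ip, sender, recipient)."""

    def __init__(self, delay_seconds: float = 300.0,
                 clock: Callable[[], float] = time.time) -> None:
        self.delay = delay_seconds
        self.clock = clock  # injectable so the logic can be tested
        self.first_seen: dict[tuple[str, str, str], float] = {}

    def check(self, client_ip: str, sender: str, recipient: str) -> str:
        """Return "tempfail" (an SMTP 4xx, try again later) or "accept"."""
        key = (client_ip, sender.lower(), recipient.lower())
        now = self.clock()
        # Remember when this triplet was first seen; later calls keep that time.
        first = self.first_seen.setdefault(key, now)
        if now - first >= self.delay:
            return "accept"
        return "tempfail"
```

A bot that fires once and never retries is turned away by the first `tempfail`, while a real mail server that retries after the delay gets through; that difference is what shrank the junk folder here.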
And I finally got amavis-logwatch to work (see the thread linked above).
kazoo
04-03-2009, 02:20 PM
Search the forums for "Barracuda RBL". I have been working with different anti-spam (AS) techniques for about 10 years now, and my current top three are:
1) Barracuda RBL @ perimeter MTA
2) SaneSecurity Signatures
3) CRM114 to complement the Bayes DB
Just my 2c worth.
12-01-2009, 07:25 PM
Barracuda = too many FPs for us
We were using the recommended list of RBLs for several months without issue, with about 15-20k messages a day stopped at the MTA, until we had reports of several false positives. It turned out barracudacentral had listed one of the provincial government mail servers, which caused a BIG problem, as mail from most other schools was blocked. I watched this happen over several days; each time I checked the server at Barracuda it was no longer listed, yet a log entry from hours before showed that it had been.
I finally ended up removing Barracuda from the RBL list. I have wondered about just scoring RBL hits, and may try that next.