
Thread: Correcting poisoned Auto-Whitelist (AWL)

  1. #1
    dwmtractor is offline Moderator
    Join Date
    Jul 2007
    Location
    San Jose, CA
    Posts
    1,027
    Rep Power
    9

    Default Correcting poisoned Auto-Whitelist (AWL)

    I have a number of messages that are being tagged as "not spam" despite a high Bayes-99 score. I looked into the headers on them, and I see that the main score that's cutting them down is a strong negative AWL score. I'm guessing, though I can't tell for certain, that this may be the result of my having run the zmtrainsa ham on an inbox that had not been fully purged, so good chance it's my own dumb fault.

    That said, I need to clean some sender addresses/domains out of the AWL database, and I can't find it. Studying different parts of the wiki, I have looked at /opt/zimbra/conf/salocal.cf and salocal.cf.in, and amavisd.conf and amavisd.conf.in. I've also gone to /opt/zimbra/conf/spamassassin and looked through all the 60_whitelist* files. None of these contain the domains that are sourcing this spam.

    Can anyone point me to the proper location so I can not only remove the specific offending domains I see now, but evaluate the whole list for others I should remove?

    Thanks

    Dan

  2. #2
    brained is offline Loyal Member
    Join Date
    Dec 2005
    Posts
    94
    Rep Power
    9

    Default

    I'm not sure if you can peruse the list and alter it.
    But on occasion I've been in the same boat. I just blow away the list and it'll automatically rebuild itself.

    Stop amavis

    rm -f /opt/zimbra/amavisd/.spamassassin/auto-whitelist*

    Start amavis

  3. #3
    mmorse is offline Moderator
    Join Date
    May 2006
    Location
    USA
    Posts
    6,242
    Rep Power
    20

    Default

    Code:
    zmamavisdctl stop
    rm -f /opt/zimbra/amavisd/.spamassassin/auto-whitelist*
    zmamavisdctl start
    In later versions the files are at /opt/zimbra/data/amavisd/.spamassassin/auto-whitelist* instead. You can also use zmantispamctl if desired.
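If you'd rather be able to undo the reset, here is a minimal sketch that moves the files aside instead of deleting them. The names AWL_DIR, DRY_RUN, run and awl_reset are made up for illustration and are not Zimbra settings; the path is the later-version one mentioned above, so adjust it for your install. By default it only prints what it would do.

```shell
#!/bin/sh
# Hypothetical helper: set the AWL files aside instead of deleting them.
# AWL_DIR and DRY_RUN are illustrative names, not Zimbra settings.
AWL_DIR="${AWL_DIR:-/opt/zimbra/data/amavisd/.spamassassin}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

awl_reset() {
    backup_dir="$AWL_DIR/awl-backup-$(date +%Y%m%d%H%M%S)"
    run zmamavisdctl stop
    run mkdir -p "$backup_dir"
    for f in "$AWL_DIR"/auto-whitelist*; do
        # Skip the literal pattern when nothing matches.
        [ -e "$f" ] || continue
        run mv "$f" "$backup_dir/"
    done
    run zmamavisdctl start
}

awl_reset
```

Run it as the zimbra user with DRY_RUN=0 once the printed commands look right.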

    While you did say you had a bunch, if you ever wanted to do it for one address it's like:
    spamassassin --remove-addr-from-whitelist user@domain.com
    (Remove the named email address from the automatic whitelist)
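For more than a handful of addresses, a small loop saves typing. This is only a sketch: awl_remove_list and DRY_RUN are hypothetical names, and it assumes spamassassin is on the PATH of the user that owns the AWL files. With DRY_RUN left at 1 it just prints the commands.

```shell
#!/bin/sh
# Sketch: remove several senders from the AWL, one address per line on stdin.
# DRY_RUN is a hypothetical switch; by default the commands are only printed.
DRY_RUN="${DRY_RUN:-1}"

awl_remove_list() {
    while IFS= read -r addr; do
        # Ignore blank lines and comment lines.
        case "$addr" in
            ''|'#'*) continue ;;
        esac
        if [ "$DRY_RUN" = "1" ]; then
            echo "spamassassin --remove-addr-from-whitelist $addr"
        else
            # Redirect stdin so the command cannot consume the address list.
            spamassassin --remove-addr-from-whitelist "$addr" </dev/null
        fi
    done
}

printf '%s\n' 'user@domain.com' '# a comment' 'other@domain.com' | awl_remove_list
```

In practice you would feed it a file: awl_remove_list < addresses.txt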

    It really ought to be called auto-averaging, because it can go either way: the whole point is to average out the spikes for regular correspondents.
    By the nature of averages (see "Law of averages" on Wikipedia), it pushes the high points down and the low points up, pulling each score toward the average for that sender. As you discovered, that works well only as long as the sender's average is on the right side of the spam/ham fence.

    Anyways, alternatively you could correct the low AWL score by --add-addr-to-blacklist
    (Add the named email address to the automatic whitelist with a high score, ensuring they will be 'blacklisted')
    Last edited by mmorse; 06-04-2009 at 07:11 PM. Reason: new /data/ directory

  4. #4
    dwmtractor is offline Moderator
    Join Date
    Jul 2007
    Location
    San Jose, CA
    Posts
    1,027
    Rep Power
    9

    Default

    So, Mike, are you saying that brained is right and perusing the list to decide what needs purging is not an option? And how much averaging does it take? The message is already getting a maximum Bayes score but is getting about half of that score robbed back by the AWL. Any clue how many times I have to re-define a message as junk before AWL is neutralized?

  5. #5
    mmorse is offline Moderator
    Join Date
    May 2006
    Location
    USA
    Posts
    6,242
    Rep Power
    20

    Default

    Quote Originally Posted by dwmtractor View Post
    So, Mike, are you saying that brained is right and perusing the list to decide what needs purging is not an option?
    If you know you have a lot wrong just dump it.
    Or look up check_whitelist. Usage is something like ./check_whitelist | grep user@domain.com
    Quote Originally Posted by dwmtractor View Post
    And how much averaging does it take? The message is already getting a maximum Bayes score but is getting about half of that score robbed back by the AWL.
    It's the average between the currentMessageScore & the currentHistoryAverage.
    (For the stats buffs that's x-bar of the x-bar-bar right? average of a sample of a grand sample? Sometimes it makes me wish I paid more attention back in the day...)

    For example: if someone with a currentHistoryAverage of 1 sent me an email that had some bad items in it, and it got a currentMessageScore of 9 before AWL is applied:
    The AWL system would compute the resulting average as 5 ((1 + 9) / 2 = 5), so it would knock 4 points off the score in this case, and the final score is 5.

    (I believe there's also a setting for how much force/weighting to give it, ie: You could choose half of that 4 so it would only bring it down 2 points for a final score of 7.)

    A quicker description of what you're seeing: some spam that you get every day, with a past average of 10, scores 20 today before AWL is applied. The AWL can knock 5 points off, resulting in a final score of 15.

    The algorithm works using a database of entries. Each entry has a key formed by the From: address of the mail, and the IP address it originated at, and contains a TOTAL score and a COUNT number. The MEAN score is TOTAL/COUNT.
    The current algorithm works as follows:
    1. Compute the SCORE of the message without AWL
    2. Compute AWL DELTA as (MEAN-SCORE)*auto_whitelist_factor
    3. Increment TOTAL by SCORE
    4. Increment COUNT by one
    5. Set the final score of the message to SCORE+DELTA
    That finalScore isn't put back into the database of prior SA scores, so you're not slowly skewing it toward one value and fixing it there!
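The five steps above can be sketched for a single sender entry with a bit of awk. This is a toy model, not SpamAssassin's code: awl_simulate is a made-up name, and I'm assuming a brand-new entry (COUNT of 0) gets no adjustment. Each output line shows the pre-AWL score, the mean before this message, the delta, and the final score.

```shell
#!/bin/sh
# Sketch of the AWL bookkeeping for a single sender entry.
# awl_simulate FACTOR SCORE... prints one line per message:
#   pre-AWL score, mean before this message, delta, final score
awl_simulate() {
    factor="$1"; shift
    printf '%s\n' "$@" | awk -v f="$factor" '
        {
            score = $1
            # Assumption: an entry with no history gets no adjustment.
            delta = (count > 0) ? (total / count - score) * f : 0
            mean  = (count > 0) ? total / count : score
            printf "%.2f %.2f %.2f %.2f\n", score, mean, delta, score + delta
            # Only the pre-AWL score is stored, never the adjusted one.
            total += score
            count += 1
        }'
}

awl_simulate 0.5 1 1 1 9
```

The last line for that input is "9.00 1.00 -4.00 5.00", which matches the example earlier in this post: a history of 1 and a new score of 9 end up at 5.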
    Quote Originally Posted by dwmtractor View Post
    Any clue how many times I have to re-define a message as junk before AWL is neutralized?
    Sounds like a fun math problem to me...
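Taking a stab at that math problem with the algorithm quoted above: given an entry's TOTAL and COUNT, count how many further messages from that sender, each with the same pre-AWL score, it takes before the final score reaches a threshold. Everything here is a hypothetical sketch: awl_messages_needed is a made-up name, and the threshold of 5 in the example is just an assumed tag level. Note it counts scanned messages, since per the steps above the entry only changes when a message is scored.

```shell
#!/bin/sh
# awl_messages_needed TOTAL COUNT SCORE FACTOR THRESHOLD
# Prints how many more messages at pre-AWL SCORE are needed before
# SCORE + (MEAN - SCORE) * FACTOR reaches THRESHOLD, or "never".
awl_messages_needed() {
    awk -v total="$1" -v count="$2" -v s="$3" -v f="$4" -v t="$5" 'BEGIN {
        n = 0
        while (count > 0 && s + (total / count - s) * f < t) {
            # The final score only drifts toward s, so s must exceed t.
            if (s <= t) { print "never"; exit }
            total += s; count += 1; n += 1
        }
        print n
    }'
}

# Entry with mean -2 over 10 messages; spam now scores 8 before AWL.
awl_messages_needed -20 10 8 0.5 5
```

For that example it prints 7: the mean has to climb from -2 to at least 2 before a pre-AWL score of 8 stays above 5 at the default factor.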
    Last edited by mmorse; 10-14-2007 at 08:08 AM.

  6. #6
    mmorse is offline Moderator
    Join Date
    May 2006
    Location
    USA
    Posts
    6,242
    Rep Power
    20

    Default

    Quote Originally Posted by mmorse View Post
    (I believe there's also a setting for how much force/weighting to give it, ie: you could choose half of that 4 so it would only bring it down 2 points for a final score of 7.)
    Found it & duh! It was already just farther down in that algorithm description quote - but upon realizing that I figured it was best to just make a new post for clarity.

    auto_whitelist_factor n (default of 0.5, possible range of 0 to 1)
    "How much towards the long-term mean for the sender to regress a message."

    Basically, you're tracking the long-term average spam score of messages from the sender (the mean).
    Then, once the other checks have otherwise fully calculated the score for this message (aka preScore), AWL calculates the finalScore for the message as:
    Code:
    finalScore = preScore + (mean - preScore) * factor
    So if factor = 0.5, then it moves to half way between the calculated preScore and the mean. (toward the historyAverageScore)
    If factor = 0.3, then we'll move about 1/3 of the way from the score toward the mean.

    The higher the value the closer toward using the stored historyAverageScore you are brought.
    Thus a factor = 1 would be just to use the long-term average of all the SA scores.

    And with a factor = 0 you're just using the calculated preScore....so it's a good alternative to turning use_auto_whitelist to 0 because you would still be creating the awl database but not acting on it yet.
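To see the effect of the factor, here is a quick sketch that plugs a preScore of 9 and a mean of 1 into the formula above for a few factor values. awl_final is just an illustrative name.

```shell
#!/bin/sh
# awl_final PRESCORE MEAN FACTOR
awl_final() {
    # finalScore = preScore + (mean - preScore) * factor
    awk -v pre="$1" -v mean="$2" -v f="$3" \
        'BEGIN { printf "%.2f\n", pre + (mean - pre) * f }'
}

for factor in 0 0.3 0.5 1; do
    printf 'factor=%s final=%s\n' "$factor" "$(awl_final 9 1 "$factor")"
done
```

That gives 9.00, 6.60, 5.00 and 1.00: factor 0 leaves the preScore alone, and factor 1 replaces it with the sender's mean.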

    ----

    So that would make my recommendation for first installs:
    - use_auto_whitelist 0 for one week so that you work out SA rules as desired (also called sa_auto_whitelist=0)
    - followed by a week of use_auto_whitelist 1 & auto_whitelist_factor 0 (to train the database)
    - then increase the factor to 0.3 for another week (to start using the database)
    - and a factor of 0.5 from then on
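Written out as local.cf lines, that schedule might look like the fragment below. Only one stage is active at a time; the rest are shown for reference. On Zimbra I'd assume these go in the salocal.cf.in mentioned at the top of the thread, but check where your version keeps its SpamAssassin settings.

```
# Week 1: AWL disabled while the other rules are tuned
use_auto_whitelist 0

# Week 2: record history, apply no adjustment
# use_auto_whitelist 1
# auto_whitelist_factor 0

# Week 3: start applying a mild adjustment
# use_auto_whitelist 1
# auto_whitelist_factor 0.3

# From then on: the default weighting
# use_auto_whitelist 1
# auto_whitelist_factor 0.5
```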

    UPDATE found the difference:
    The $sa_auto_whitelist setting had to be specified in amavisd.conf for SA versions older than 3.0, as there was no equivalent option in local.cf. Starting with SA 3.0, the option use_auto_whitelist is specified in local.cf instead, and $sa_auto_whitelist is ignored.
    ----

    It would be even cooler to have something also consider the moving range, variance, & standard deviation; but you'd be adding precious CPU cycles. Of course, I'm sure there's stuff out there that does and boy I bet it's hard to develop the desired logic.
    Last edited by mmorse; 10-31-2007 at 01:12 PM.

  7. #7
    brained is offline Loyal Member
    Join Date
    Dec 2005
    Posts
    94
    Rep Power
    9

    Default

    It's kind of covered with the above posts but is worth mentioning outright. You really need to know why it was scored so high or so low the first time(s). Clearing the database and starting over is of no value if it is just going to happen again.

    In my case, I've had users reply to spam. Only after appropriate re-education, removing the spammer's address from the address book, and so on, am I ready to clear the AWL.

    Fix the SA scoring first, then clear the AWL. Not using it at all during the initial training is also a great idea.

  8. #8
    dwmtractor is offline Moderator
    Join Date
    Jul 2007
    Location
    San Jose, CA
    Posts
    1,027
    Rep Power
    9

    Default

    Thanks for all this detail, Mike. Understanding the theory is very helpful here.

    Quote Originally Posted by mmorse View Post
    If you know you have a lot wrong just dump it.
    Or look up check_whitelist. Usage is something like ./check_whitelist | grep user@domain.com
    What's the path to check_whitelist? I'm not finding the command/script.

  9. #9
    dwmtractor is offline Moderator
    Join Date
    Jul 2007
    Location
    San Jose, CA
    Posts
    1,027
    Rep Power
    9

    Default

    Just bumping this. Mike, is check_whitelist a script that is supposed to be part of zimbra somewhere? This command isn't working on my system.

    Quote Originally Posted by dwmtractor View Post
    Thanks for all this detail, Mike. Understanding the theory is very helpful here.



    What's the path to check_whitelist? I'm not finding the command/script.

  10. #10
    mmorse is offline Moderator
    Join Date
    May 2006
    Location
    USA
    Posts
    6,242
    Rep Power
    20

    Default

    Never done it myself, but here's a copy of it: http://spamassassin.apache.org/full/...heck_whitelist
    Last edited by mmorse; 11-01-2007 at 12:51 PM.
