I've noticed that some messages are being left in the spam account for extended periods. For example, I just ran a test by running:
/opt/zimbra/bin/zmtrainsa >> /opt/zimbra/log/spamtrain.log 2>&1
/opt/zimbra/bin/zmtrainsa --cleanup >> /opt/zimbra/log/spamtrain.log 2>&1
Then I looked in spamtrain.log and saw:
Quote:
20090730122011 Starting spam/ham extraction from system accounts.
[] INFO: Total messages processed: 42
[] INFO: Total messages processed: 4
20090730122017 Finished extracting spam/ham from system accounts.
20090730122017 Starting spamassassin training.
netset: cannot include 127.0.0.0/8 as it has already been included
Learned tokens from 22 message(s) (42 message(s) examined)
netset: cannot include 127.0.0.0/8 as it has already been included
Learned tokens from 4 message(s) (4 message(s) examined)
netset: cannot include 127.0.0.0/8 as it has already been included
bayes: synced databases from journal in 1 seconds: 2982 unique entries (2982 total entries)
20090730122025 Finished spamassassin training.
20090730122036 Starting spam/ham cleanup
[] INFO: Total messages processed: 30
[] INFO: Total messages processed: 4
20090730122042 Finished spam/ham cleanup
As you can see, 12 fewer messages were processed by cleanup than by extraction. I confirmed this by looking in the spam account's inbox. Then I ran the two commands again and looked at the log:
Quote:
20090730122241 Starting spam/ham extraction from system accounts.
[] INFO: Total messages processed: 12
[] INFO: Total messages processed: 0
20090730122245 Finished extracting spam/ham from system accounts.
20090730122245 Starting spamassassin training.
netset: cannot include 127.0.0.0/8 as it has already been included
Learned tokens from 0 message(s) (12 message(s) examined)
netset: cannot include 127.0.0.0/8 as it has already been included
Learned tokens from 0 message(s) (0 message(s) examined)
netset: cannot include 127.0.0.0/8 as it has already been included
bayes: synced databases from journal in 0 seconds: 340 unique entries (539 total entries)
20090730122249 Finished spamassassin training.
20090730122254 Starting spam/ham cleanup
[] INFO: Total messages processed: 12
[] INFO: Total messages processed: 0
20090730122259 Finished spam/ham cleanup
So you can see the remaining 12 messages were extracted and examined AGAIN (potentially overweighting their contents? Although the log does say "Learned tokens from 0 message(s)" for them) and only then cleaned up.
Any idea why this is happening? The only clue I can offer is that the cleanup might be skipping messages marked as "read" (because I browsed the spam account), but I don't think it actually follows that pattern.
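In case anyone wants to check their own log for the same mismatch, here is a rough sketch of how the extraction and cleanup counts could be compared per run. The function name compare_counts is my own invention and not part of Zimbra. It assumes the log has exactly the format quoted above, and that the "Total messages processed" lines appear in the same account order in the extraction and cleanup phases (which is what my log suggests, but I have not verified it).

```shell
#!/bin/sh
# Hypothetical helper, not a Zimbra tool: for every run found in the log,
# pair the "Total messages processed" counts from the extraction phase with
# those from the cleanup phase and print how many messages were left behind.
# Assumption: counts appear in the same account order in both phases.
compare_counts() {
  awk '
    /Starting spam\/ham extraction/ { phase = "extract"; n = 0; next }
    /Starting spam\/ham cleanup/    { phase = "cleanup"; n = 0; next }
    /Total messages processed:/ && phase != "" {
      n++
      seen[phase, n] = $NF + 0        # last field is the message count
      if (n > max) max = n
      next
    }
    /Finished spam\/ham cleanup/ {
      run++
      for (i = 1; i <= max; i++) {
        e = seen["extract", i]
        c = seen["cleanup", i]
        printf "run %d account %d: extracted=%d cleaned=%d left=%d\n", run, i, e, c, e - c
      }
      max = 0
      split("", seen)                 # portable way to empty the array
    }
    /Finished/ { phase = "" }         # ignore counts outside the two phases
  ' "$1"
}

# Example call (path taken from the commands above):
#   compare_counts /opt/zimbra/log/spamtrain.log
```

A non-zero "left" value marks a run where cleanup handled fewer messages than extraction did, which is the situation in my first test.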