Category: LINUX
2006-02-10 10:36:08
I've used SpamAssassin for quite a while, but over time it seemed to become less and less effective at filtering spam. I wondered whether the spammers were simply outsmarting it (I saw a lot of scores just under 5); much of it was foreign-language spam in different character sets. Then I discovered a feature that was new to me: sa-learn. It is a way of telling spamassassin which of your email you consider spam and which you consider ham. To be effective, you must have built up quite a collection of each type.
These commands can be run on your existing mailbox(es) to teach spamassassin how to separate the ham from the spam.
These assume an mbox format.
sa-learn --spam --no-sync --showdots --local --mbox ~mark/imap/SpamTrap
sa-learn --spam --no-sync --showdots --local --mbox ~mark/imap/SpamActual
sa-learn --ham --no-sync --showdots --local --mbox ~mark/imap/2005
sa-learn --ham --no-sync --showdots --local --mbox ~mark/imap/2004
sa-learn --ham --no-sync --showdots --local --mbox ~mark/imap/2003
sa-learn --ham --no-sync --showdots --local --mbox ~mark/imap/2002
sa-learn --sync
Use man sa-learn to find out more.
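The same sequence can be written as a loop: one sa-learn call per mbox with --no-sync, followed by a single --sync at the end. This is only a sketch. The names DRY_RUN, run and learn_mboxes are my own and not part of SpamAssassin, and by default the script just prints the commands it would run.

```shell
#!/bin/sh
# Sketch: train from several mboxes, then sync once.
# DRY_RUN, run and learn_mboxes are hypothetical helper names.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"        # only show what would be executed
    else
        "$@"             # actually execute the command
    fi
}

learn_mboxes() {
    kind="$1"; shift     # "spam" or "ham"
    for mbox in "$@"; do
        run sa-learn "--$kind" --no-sync --showdots --local --mbox "$mbox"
    done
}

learn_mboxes spam ~mark/imap/SpamTrap ~mark/imap/SpamActual
learn_mboxes ham  ~mark/imap/2005 ~mark/imap/2004 ~mark/imap/2003 ~mark/imap/2002
run sa-learn --sync
```

Run it with DRY_RUN=0 in the environment once the printed commands look right.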
At first, doing this didn't help my spam problem because, as it turned out, spamd was running as a different user than me (mark). Amavisd calls spamc, which talks to spamd, so I used pstree -aup to find out which user was running spamd. It was root, so I ran the same sa-learn commands again, this time as root.
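As an alternative to reading the pstree output by eye, a small filter over ps can print just the owning user. This is a sketch under two assumptions: owners_of is a made-up helper name, and the daemon shows up with the command name spamd in the ps listing on your system.

```shell
#!/bin/sh
# owners_of NAME: read "user command" pairs on stdin and print the distinct
# users owning a process whose command name is NAME.
# (owners_of is a hypothetical helper, not a standard tool.)
owners_of() {
    awk -v name="$1" '$2 == name { print $1 }' | sort -u
}

# Live usage (needs ps):
#   ps -eo user=,comm= | owners_of spamd
```

Whatever user this prints is the one whose Bayes database spamd will consult, so that is the user to run sa-learn as.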
Here are the message statistics from each mailbox, which I believe put me in the sweet spot for Bayesian effectiveness (based on the sa-learn man page).
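One rough way to get per-mailbox message counts is to count the "From " separator lines in each mbox file. This is a sketch, not necessarily how the figures for this post were produced. count_messages is a hypothetical name, and the count will be too high if a mailbox contains unescaped body lines that start with "From ".

```shell
#!/bin/sh
# count_messages FILE...: print "path<TAB>count" for each mbox file,
# where count is the number of lines beginning with "From ".
# (count_messages is a hypothetical helper name.)
count_messages() {
    for mbox in "$@"; do
        printf '%s\t%s\n' "$mbox" "$(grep -c '^From ' "$mbox")"
    done
}
```

For example: count_messages ~mark/imap/SpamTrap ~mark/imap/2005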
Running sa-learn --sync produced this output...
expired old Bayes database entries in 82 seconds
126481 entries kept, 81987 deleted
token frequency: 1-occurence tokens: 55.12%
token frequency: less than 8 occurrences: 31.14%
Well, based on what I have seen in /etc/cron.daily/amavisd-new, it appears the Bayesian database may need to be built and owned by the amavis user. So the commands I should use when running spamc/spamd in conjunction with amavisd would seem to be...
su - amavis -- /usr/bin/sa-learn --spam --no-sync --showdots --local --mbox /tmp/Spam*
su - amavis -- /usr/bin/sa-learn --ham --no-sync --showdots --local --mbox /tmp/200[2345]
su - amavis -- /usr/bin/sa-learn --sync
Note that I had to copy my personal mboxes into /tmp and widen the perms for amavis to read them. *Sigh*
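The copy-and-widen step could be wrapped up roughly like this. It is a sketch: stage_mboxes and the destination directory are names I made up, and it uses a dedicated directory rather than /tmp itself so the copies are easy to delete afterwards.

```shell
#!/bin/sh
# stage_mboxes DEST FILE...: copy each mbox into DEST and make the copies
# world-readable so another user (here, amavis) can read them.
# (stage_mboxes is a hypothetical helper name.)
stage_mboxes() {
    dest="$1"; shift
    mkdir -p "$dest" && chmod 755 "$dest" || return 1
    for src in "$@"; do
        name="${src##*/}"                      # file name without directories
        cp "$src" "$dest/$name" && chmod 644 "$dest/$name" || return 1
    done
}
```

For example: stage_mboxes /tmp/sa-train ~mark/imap/SpamTrap ~mark/imap/2005. Remember that anyone on the machine can read the copies until you remove them.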
Now I am getting a strange error.
su - amavis -- /usr/bin/sa-learn --ham --no-sync --showdots --local --mbox /tmp/2006
bayes: bayes db version 0 is not able to be used, aborting! at /usr/share/perl5/Mail/SpamAssassin/BayesStore/DBM.pm line 160.
This has happened twice now. The problem seems to go away after I keep retrying the command, alternating it with:
sa-learn -D --sync
This is still a mystery though.
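One thing that may help narrow it down is checking which database version sa-learn itself reports, using sa-learn --dump magic as the user that owns the database. The filter below is a sketch that assumes the usual column layout of that output, with the value in the third column of the "bayes db version" line. bayes_db_version is a hypothetical helper name.

```shell
#!/bin/sh
# bayes_db_version: read `sa-learn --dump magic` output on stdin and print
# the value from the "bayes db version" line.
# Assumption: the value sits in the third whitespace-separated column.
# (bayes_db_version is a hypothetical helper name.)
bayes_db_version() {
    awk '/bayes db version/ { print $3; exit }'
}

# Live usage, run as the database owner:
#   sa-learn --dump magic | bayes_db_version
```

If this prints 0 for the amavis user but not for root or mark, the error is probably about which database file is being opened rather than about the mail being learned.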