Category: LINUX

2006-02-10 10:36:08

I've used SpamAssassin for quite a while, but over time it seemed to become less and less effective at filtering spam. I wondered whether the spammers were simply outsmarting it: I saw a lot of scores just under 5, and many of the messages were foreign-language spam in different character sets. Then I found a feature that was new to me: sa-learn. It lets you tell spamassassin which email you consider spam and which you consider ham (legitimate mail). To be effective, you need to have built up quite a collection of each type.

These commands can be run on your existing mailbox(es) to teach spamassassin how to separate the ham from the spam.


Examples of Bayesian training

These assume the mailboxes are in mbox format.

 sa-learn --spam --no-sync --showdots --local --mbox ~mark/imap/SpamTrap
 sa-learn --spam --no-sync --showdots --local --mbox ~mark/imap/SpamActual
 sa-learn --ham --no-sync --showdots --local --mbox ~mark/imap/2005
 sa-learn --ham --no-sync --showdots --local --mbox ~mark/imap/2004
 sa-learn --ham --no-sync --showdots --local --mbox ~mark/imap/2003
 sa-learn --ham --no-sync --showdots --local --mbox ~mark/imap/2002
 sa-learn --sync

Use man sa-learn to find out more.
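
The repeated commands above can be wrapped in a small loop. This is only a sketch under my own assumptions: the function name learn_mboxes and the SA_LEARN variable are made up for illustration, while the sa-learn flags are the ones already used above. Setting SA_LEARN="echo sa-learn" gives a dry run that prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: feed several mbox files to sa-learn with the flags used in this post.
# SA_LEARN and learn_mboxes are hypothetical names, not part of SpamAssassin.
SA_LEARN="${SA_LEARN:-sa-learn}"

# learn_mboxes KIND FILE...   (KIND is "spam" or "ham")
learn_mboxes() {
    kind="$1"
    shift
    case "$kind" in
        spam|ham) ;;
        *) echo "usage: learn_mboxes spam|ham FILE..." >&2; return 2 ;;
    esac
    for mbox in "$@"; do
        # --no-sync defers the database sync; run "sa-learn --sync" once at the end
        $SA_LEARN "--$kind" --no-sync --showdots --local --mbox "$mbox" || return 1
    done
}
```

With that defined, the session above would shrink to learn_mboxes spam ~mark/imap/SpamTrap ~mark/imap/SpamActual, then learn_mboxes ham with the yearly folders, then a single sa-learn --sync.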

At first, doing this didn't help my spam problem because, as it turned out, spamd was running as a different user from me (mark). Amavisd calls spamc, which talks to spamd, so I used pstree -aup to find out which user ID was running spamd. It was root, so I ran the same sa-learn commands again, but this time as root.
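
If pstree output is hard to scan, the same question can be answered with ps. A minimal sketch, assuming the process shows up under the command name spamd (it may appear differently on some systems); spamd_users is a name I made up, and it reads "USER COMMAND" lines on standard input:

```shell
# spamd_users: print the distinct users that own processes named "spamd".
# Expects lines of the form "USER COMMAND" on stdin, e.g. from ps.
# Hypothetical helper, not part of SpamAssassin.
spamd_users() {
    awk '$2 == "spamd" { print $1 }' | sort -u
}
```

It would be fed like this: ps -e -o user= -o comm= | spamd_users. Whatever user it prints is the one whose Bayes database spamd actually consults, so that is the account to run sa-learn as.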

Here are the message counts for each mailbox, which I believe put me in the sweet spot for Bayesian effectiveness (based on the sa-learn man page).

  • SpamTrap - 2273 message(s)
  • SpamActual - 15 message(s)
  • 2005 - 436 message(s)
  • 2004 - 1368 message(s)
  • 2003 - 2286 message(s)
  • 2002 - 711 message(s)
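
Adding up the counts above gives 2273 + 15 = 2288 spam messages and 436 + 1368 + 2286 + 711 = 4801 ham messages. For anyone wanting to count their own mailboxes, here is a rough sketch that counts the "From " separator lines of an mbox file. The function names are my own, and the count is only accurate if "From " lines inside message bodies are escaped, as they normally are in mbox files.

```shell
# count_mbox_messages FILE: count messages by their "From " separator lines.
# Hypothetical helper; assumes body lines starting with "From " are escaped.
count_mbox_messages() {
    grep -c '^From ' "$1"
}

# total_mbox_messages FILE...: sum the counts over several mbox files.
total_mbox_messages() {
    total=0
    for f in "$@"; do
        n=$(count_mbox_messages "$f") || true
        total=$((total + ${n:-0}))
    done
    echo "$total"
}
```

After training, sa-learn --dump magic should also report how many spam and ham messages the database has learned (the nspam and nham lines), which is a useful cross-check against these counts.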

Running sa-learn --sync produced this output...

 expired old Bayes database entries in 82 seconds
 126481 entries kept, 81987 deleted
 token frequency: 1-occurence tokens: 55.12%
 token frequency: less than 8 occurrences: 31.14%

Update 2005-Dec-30

Well, based on what I have seen in /etc/cron.daily/amavisd-new, it appears the Bayesian database may need to be built and owned by the amavis user. So the commands to use when running spamc/spamd in conjunction with amavisd would seem to be...

 su - amavis -- /usr/bin/sa-learn --spam --no-sync --showdots --local --mbox /tmp/Spam*
 su - amavis -- /usr/bin/sa-learn --ham --no-sync --showdots --local --mbox /tmp/200[2345]
 su - amavis -- /usr/bin/sa-learn --sync

Note that I had to copy my personal mboxes into /tmp and widen the perms for amavis to read them. *Sigh*
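
The copy-and-widen step can be scripted so the copies at least live in one directory that is easy to delete afterwards. This is a sketch only: stage_mboxes and the destination directory are names I invented, and it makes the copies world-readable, just as I did by hand.

```shell
# stage_mboxes DEST FILE...: copy mbox files into DEST and make the copies
# readable by other accounts (such as amavis).
# Hypothetical helper, not part of SpamAssassin or amavisd.
stage_mboxes() {
    dest="$1"
    shift
    mkdir -p "$dest" || return 1
    chmod 755 "$dest" || return 1
    for src in "$@"; do
        cp "$src" "$dest/" || return 1
        chmod 644 "$dest/$(basename "$src")" || return 1
    done
}
```

For example, stage_mboxes /tmp/sa-train ~mark/imap/2005 would leave a readable copy in /tmp/sa-train/2005. Since these are copies of personal mail that every local user can read, the staging directory should be removed as soon as training is done.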


Update 2006-Jan-19

Now I am getting a strange error.

 su - amavis -- /usr/bin/sa-learn --ham --no-sync --showdots --local --mbox /tmp/2006
 bayes: bayes db version 0 is not able to be used, aborting! 
  at /usr/share/perl5/Mail/SpamAssassin/BayesStore/DBM.pm line 160.

This has happened twice now. The problem seems to go away after I keep retrying the command, alternating it with:

 sa-learn -D --sync

This is still a mystery though.
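
Until the cause is understood, the retry-and-sync routine can at least be automated. The sketch below is a workaround, not a fix: retry_with_sync and SYNC_CMD are hypothetical names, and it simply repeats a command, running a sync after each failure, up to a maximum number of attempts.

```shell
# retry_with_sync MAX CMD...: run CMD; if it fails, run the sync command and
# try again, giving up after MAX attempts.
# Hypothetical helper; SYNC_CMD defaults to the sync command used in this post.
SYNC_CMD="${SYNC_CMD:-sa-learn -D --sync}"

retry_with_sync() {
    max="$1"
    shift
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        if "$@"; then
            return 0
        fi
        # Ignore the sync's own exit status; the retry decides success.
        $SYNC_CMD || true
        attempt=$((attempt + 1))
    done
    return 1
}
```

It would be called as, say, retry_with_sync 5 sa-learn --ham --no-sync --showdots --local --mbox /tmp/2006, run as whichever user owns the Bayes database.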
