Thursday, October 07, 2004

Trying Bogofilter...

A few days ago I did some maintenance on the software installed on my small server: among other things, its packages were outdated and I wanted to pick up the Libtool changes that recently went into pkgsrc. So I seized this opportunity to give Bogofilter a try, because SpamAssassin was bringing the machine to its knees.

I configured Bogofilter to parse all my incoming mail, fed to it by Procmail, following the examples given in the manual page; a painless process. The filter adds the X-Bogosity header to every message, indicating whether it is spam or not (non-spam is called ham, for those who don't know), so that you can later classify mail with a simple Procmail rule.
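To make the idea of sorting on that header concrete, here is a minimal sketch in Python rather than Procmail. It is not the rule from the manual page: the folder names "Spam" and "Inbox" and the function name are made up for illustration, and I am assuming the header value starts with a verdict word followed by a comma (I accept both "Spam" and "Yes", since the exact wording may depend on the Bogofilter version).

```python
from email import message_from_string


def classify(raw_message: str) -> str:
    """Return a (hypothetical) folder name based on the X-Bogosity header."""
    msg = message_from_string(raw_message)
    # Assumption: the header looks like "Spam, tests=bogofilter, ...".
    header = msg.get("X-Bogosity", "")
    verdict = header.split(",", 1)[0].strip().lower()
    if verdict in ("spam", "yes"):
        return "Spam"
    # Anything else (ham, unsure, or no header at all) stays in the inbox.
    return "Inbox"


if __name__ == "__main__":
    sample = (
        "From: someone@example.com\n"
        "X-Bogosity: Spam, tests=bogofilter, spamicity=0.99\n"
        "\n"
        "Buy now!\n"
    )
    print(classify(sample))
```

A message without the header falls through to "Inbox", which seems the safer default: a missed spam is annoying, a misfiled ham is worse.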

After this little setup, the first results were frustrating: it caught no spam... obviously, because the word database was empty. So I started classifying all new mail by hand into an "Archive" folder (i.e., "Trash") and a "Spam" folder, and set up a cron job to scan all messages in those folders periodically so that Bogofilter learns about my spam.
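The training job could look roughly like the sketch below. This is not my actual cron script: it assumes each folder is a directory with one message per file (an mbox file would need different handling), the folder names and function names are only illustrative, and it relies on Bogofilter's -s and -n options, which register a message read from standard input as spam or as non-spam respectively.

```python
import subprocess
from pathlib import Path

# Assumption: folder name -> bogofilter registration flag.
FLAGS = {"Spam": "-s", "Archive": "-n"}


def plan_training(mail_root):
    """List (command, message path) pairs without running anything."""
    plan = []
    for folder, flag in FLAGS.items():
        directory = Path(mail_root) / folder
        if not directory.is_dir():
            continue
        for path in sorted(directory.iterdir()):
            if path.is_file():
                plan.append((["bogofilter", flag], path))
    return plan


def run_training(mail_root):
    """Feed every planned message to bogofilter on standard input."""
    for argv, path in plan_training(mail_root):
        with open(path, "rb") as message:
            subprocess.run(argv, stdin=message, check=True)
```

One thing this sketch ignores: running it repeatedly over the same folders registers the same messages again, so a real job would have to remember what it has already processed or move trained messages elsewhere.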

So far I've fed it around 150 spams and more than 1600 hams... which is starting to have some effect: it is able to detect some spam, although there are still a lot of false negatives. I'll keep classifying mail manually for some days, hoping that the situation improves (I have almost no doubt that it will).

That said, SpamAssassin caught spam out of the box, without having learned anything. And after learning from more than 15,000 messages, it produced very, very, very few false negatives. I know, this program does a lot more checks than Bogofilter (which is just a Bayesian filter), so it can detect spam without training. But... as my server does not swap any more, I'll try to get the best out of Bogofilter. Do you use either of these two? If so, which one, and what has your experience been?

3 comments:

Anonymous said...

[Originally posted at 2004-10-23 18:07 pm UTC]

I use SpamAssassin and I get no false negatives at all, just 3 or 4 false positives per week, and I receive about 25 to 30 spams a day. Results were visible right out of the box, not much training needed.

regards,

pof

Julio M. Merino Vidal said...

[Originally posted at 2004-10-24 11:02 am UTC]

Yesterday I checked the amount of ham and spam received this month, and I was surprised to see that around 40% of my mail is spam. That is more or less 65 spams per day.

After 20 days of training and ~1300 spams processed, Bogofilter catches a good amount of it, maybe 70%. Still, there are more false negatives than I would like (but no false positives). I hope it will keep improving.

The thing is that... my server does not swap at all since I removed SpamAssassin, while it easily used ~20 MB of swap before. So it would be nice to keep Bogofilter :)

Anonymous said...

[Originally posted at 2004-11-16 19:10 pm UTC]

I use dspam (pkgsrc/mail/dspam) with MySQL as the backend, and I have to say it works really great... and it doesn't swap the way you said SpamAssassin did. Try it!