TalkBack 5 of 10:
Aren't Bayesian filters fundamentally flawed?
Here's the question I pose to people about Bayesian filters -- haven't gotten a real answer yet.

Bayesian filters work by analyzing patterns in the incoming mail. They learn from the messages you have already classified, so it's commonly said (though this may not be accurate -- stat folks, step forward) that they become "more accurate" over time, as you mark more examples as spam. But I see a couple of problems with that:

First, as you select more examples for analysis, wouldn't the filter pass a point where it actually starts to become *less* accurate? I.e., it will less accurately identify messages as spam?

Second, won't spam creators develop techniques for spoofing Bayesian filters, by creating text patterns that look "normal" -- or, at least, look normal to an algorithm that doesn't understand natural language? (Actually, they already have, as we can all see from the nonsense subject lines in our inboxes.)

And in turn, doesn't this lead to a war of escalation with the spam purveyors? In such a war, the dark side of open source (i.e., spammers motivated by greed and fascination with their own "cleverness") will have terrific advantages.
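To make the spoofing worry concrete, here is a minimal sketch of a word-count (naive Bayes style) scorer and what padding does to it. Everything in it is an assumption for illustration: the function names `train` and `spam_log_odds`, the four toy training messages, and the zero threshold are made up, and this is not how Mozilla Mail or any particular filter is implemented. It also skips the class prior, since the toy data has equal numbers of spam and good messages.

```python
import math
from collections import Counter

def train(messages):
    """Count word occurrences per label from (text, label) pairs."""
    counts = {"spam": Counter(), "ham": Counter()}
    for text, label in messages:
        counts[label].update(text.lower().split())
    return counts

def spam_log_odds(counts, text):
    """Sum of per-word log likelihood ratios; above 0 leans spam."""
    vocab = set(counts["spam"]) | set(counts["ham"])
    spam_total = sum(counts["spam"].values())
    ham_total = sum(counts["ham"].values())
    score = 0.0
    for word in text.lower().split():
        if word not in vocab:
            continue  # words never seen in training carry no evidence
        # Add-one smoothing so a word unseen in one class never gives zero.
        p_spam = (counts["spam"][word] + 1) / (spam_total + len(vocab))
        p_ham = (counts["ham"][word] + 1) / (ham_total + len(vocab))
        score += math.log(p_spam / p_ham)
    return score

# Hypothetical training history: two spams, two good messages.
counts = train([
    ("cheap pills buy now", "spam"),
    ("buy cheap watches now", "spam"),
    ("meeting notes attached", "ham"),
    ("lunch at noon tomorrow", "ham"),
])

plain = "buy cheap pills"
padded = plain + " meeting notes lunch tomorrow"  # same pitch plus "normal" words

print("plain :", spam_log_odds(counts, plain))
print("padded:", spam_log_odds(counts, padded))
```

With this toy data the plain message scores above zero and the padded one falls below it, purely because words seen only in good mail pull the total down. A real filter trained on thousands of messages would be harder to fool with four words, but the mechanism is the one described in the second point above.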

Now, all that having been said, my Bayesian filter (in Mozilla Mail) is the only thing between me and massive green rage on some days (I get around 2000 spams a week); but lately it has been filtering a lot of email that I don't want it to. I have to check the spam box every day for improperly filtered messages.

(Yes, it's true that Mozilla is supposed to exempt from filter any message from someone in your address book. Alas, that feature doesn't actually work. From the Bugzilla thread it was hard to tell whether it was a bug or whether there was a disagreement about filter precedence that caused it to just not be implemented.)
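For what it's worth, the precedence I'd expect looks something like the sketch below: check the address book first, and only fall back to the Bayesian score for unknown senders. This is a guess at the intended behavior, not Mozilla's code; the function name `route_message`, the 0.9 threshold, and the idea that the score is a probability between 0 and 1 are all assumptions.

```python
def route_message(sender, spam_probability, address_book, threshold=0.9):
    """Return "inbox" or "junk", letting the address book override the score."""
    # Known senders are exempt no matter what the Bayesian score says.
    known = {address.lower() for address in address_book}
    if sender.lower() in known:
        return "inbox"
    # Everyone else is judged on the spam score alone.
    return "junk" if spam_probability >= threshold else "inbox"

# Hypothetical address book and messages.
address_book = ["friend@example.com"]
print(route_message("Friend@Example.com", 0.99, address_book))
print(route_message("stranger@example.net", 0.99, address_book))
```

If the two checks ran in the other order, a known sender with a spammy-looking message would land in the junk folder, which matches what I'm seeing.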
Posted by: escoles@...   Posted on: 04/13/04


Now we need an OCR spam filter.  jskondel | 04/12/04
Re: Now we need an OCR spam filter.  Franklin_z | 04/12/04
The best I have found  bhanes@... | 04/13/04
It's about time...  BitTwiddler | 04/13/04
Aren't Bayesian filters fundamentally flawed?  escoles@... | 04/13/04
And they forget the 48MM+ kids....  Paul C. | 04/13/04
Spamifesto, Short Version: Attack the Advertisers  escoles@... | 04/13/04
Fundamental flaw in your solution  ShadeTree | 04/13/04
Fair enough, but it's a flaw with my language...  escoles@... | 04/13/04
Idiots and their open relay mail servers...  bjbrock | 04/14/04
