I gave this some thought too. My idea was to let the full-text index do the hard work. Basically, you would have your mail file and a spam bucket, which could be shared among many users. Incoming mail would be scanned and turned into some kind of full-text query; if it is very similar to lots of other messages in the spam bucket, then that is where it should go.
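Here is a rough sketch of how that could look, not a real implementation. It stands in for the full-text index with a small in-memory inverted index and uses Jaccard word overlap as the similarity measure, which is my assumption; a real full-text engine would score matches its own way. The names `SpamBucket`, `similar_count` and `is_spam`, and the thresholds `min_similarity` and `min_matches`, are all made up for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and return its set of distinct words."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


class SpamBucket:
    """Toy shared spam bucket backed by an in-memory inverted index (hypothetical)."""

    def __init__(self):
        self.docs = []                  # word set of each stored spam message
        self.index = defaultdict(set)   # word -> ids of messages containing it

    def add(self, text):
        """Store a known spam message and index its words."""
        doc_id = len(self.docs)
        terms = tokenize(text)
        self.docs.append(terms)
        for term in terms:
            self.index[term].add(doc_id)

    def similar_count(self, text, min_similarity=0.5):
        """Count stored messages whose Jaccard similarity to text meets the threshold."""
        query = tokenize(text)
        if not query:
            return 0
        # Use the index as the "query": only messages sharing a word are candidates.
        candidates = set()
        for term in query:
            candidates |= self.index.get(term, set())
        count = 0
        for doc_id in candidates:
            doc = self.docs[doc_id]
            similarity = len(query & doc) / len(query | doc)
            if similarity >= min_similarity:
                count += 1
        return count


def is_spam(bucket, text, min_matches=2, min_similarity=0.5):
    """Assumed rule: spam if enough messages in the bucket look very similar."""
    return bucket.similar_count(text, min_similarity) >= min_matches
```

To use it, you would call `bucket.add(...)` for each message already known to be spam, then call `is_spam(bucket, incoming_text)` on new mail. Both thresholds are guesses and would need tuning against real mail.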