Spam Solution - Duplicate Removal

One problem with spam is that spammers send a great many copies of a message to the same site. One way of countering this is to detect and remove multiple copies of the same message.

Spammers often aim a message at many users of the same site, intending for the site itself to distribute the copies. The usual way they do this is with mailing list techniques: they send one copy of the message but name several people (sometimes hundreds) on the recipient list. Note that this is usually the recipient list in the SMTP envelope, not the recipient list visible to the final reader.
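To illustrate the difference between the two recipient lists, here is a minimal Python sketch. It assumes the mail server hands us the parsed message together with the list of envelope recipients (the addresses given in the SMTP RCPT TO commands). The function names header_recipients and hidden_recipients are made up for this example.

```python
from email.message import EmailMessage
from email.utils import getaddresses


def header_recipients(msg):
    """Return the lowercased addresses visible in the To and Cc headers."""
    fields = msg.get_all("To", []) + msg.get_all("Cc", [])
    return {addr.lower() for _name, addr in getaddresses(fields) if addr}


def hidden_recipients(msg, envelope_recipients):
    """Return envelope recipients that the final reader cannot see in the headers."""
    visible = header_recipients(msg)
    return [rcpt for rcpt in envelope_recipients if rcpt.lower() not in visible]


# A message whose visible To header names only a list address,
# while the envelope names the real recipients.
example = EmailMessage()
example["To"] = "Friends <list@example.org>"
example["Subject"] = "Hello"
example.set_content("Body text")

example_envelope = ["alice@example.com", "bob@example.com", "list@example.org"]
example_hidden = hidden_recipients(example, example_envelope)
```

Here example_hidden contains the two example.com addresses, which appear only in the envelope. A long hidden list is a hint, not proof, that the message is bulk mail.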

The mail server could limit the number of copies of an identical message it will accept. For example, if the community at a site agrees, the mail server could simply drop any message sent to more than twenty people at the same site.
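The twenty-recipient rule could be sketched roughly as follows in Python. This assumes the server can see the full list of envelope recipients before accepting the message and that the site has a single local domain. The function names and the default limit of 20 are illustrative only.

```python
def count_local_recipients(envelope_recipients, local_domain):
    """Count envelope addresses whose domain part is exactly local_domain."""
    suffix = "@" + local_domain.lower()
    return sum(1 for rcpt in envelope_recipients if rcpt.lower().endswith(suffix))


def should_drop(envelope_recipients, local_domain, limit=20):
    """Return True when the message names more than `limit` local recipients."""
    return count_local_recipients(envelope_recipients, local_domain) > limit
```

Only local recipients are counted, so a message that happens to name many people at other sites is not affected by this site's policy.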

To avoid silently losing legitimate bulk mail, the mail server could forward a single copy of the message to a local bulk mail depository. From there, someone could check back on the sending mailing list to determine whether it should be added to the list of known senders.
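One possible shape for the depository rule is sketched below in Python. It assumes that the list of known senders is a set of lowercased envelope sender addresses and that the depository is an ordinary local mailbox. The address bulk-depository@example.com and the function name route_bulk are invented for this example.

```python
def route_bulk(sender, local_recipients, known_senders, limit=20,
               depository="bulk-depository@example.com"):
    """Return the list of addresses the message should actually be delivered to.

    Mail from a known sender, or mail to `limit` or fewer local recipients,
    is delivered normally. Anything else goes to the depository as one copy.
    """
    if len(local_recipients) <= limit or sender.lower() in known_senders:
        return list(local_recipients)
    return [depository]
```

Once a mailing list found in the depository has been checked and its sender added to known_senders, later messages from that list are delivered to every recipient as usual.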

One of the dangers with this approach is that there are legitimate reasons for sending "recipient suppressed" mail to a large number of users. For example, a legitimate mailing list may use this same approach to reduce the bandwidth it uses on the Internet without revealing to its users who else is receiving the message.

Another way of handling this sort of bulk mail would be to add a header, such as "X-SpamWarning", whenever this sort of duplicate message is detected. This leaves the actual filtering to the final recipients and lets each user decide whether or not to filter duplicates. Since some people manage entire sites over relatively slow connections, it would probably be best for the ISP to mark all duplicates and allow individual users to request that duplicates not be forwarded, or that only a single copy be forwarded. The default should be that the message gets marked and forwarded.
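The marking scheme might look something like the following Python sketch. Everything beyond the "X-SpamWarning" header name is an assumption made for illustration: the preference values "mark", "single" and "none", the owner_of table that maps several addresses to one account (for someone who manages an entire site), and the wording of the header value.

```python
import copy
from email.message import EmailMessage

WARNING_HEADER = "X-SpamWarning"


def plan_delivery(msg, recipients, owner_of, preferences):
    """Return (recipient, marked message) pairs for a detected duplicate.

    preferences maps an account to "mark" (the default: mark and forward),
    "single" (forward only one copy to that account) or "none" (forward nothing).
    """
    marked = copy.deepcopy(msg)  # leave the caller's message untouched
    marked[WARNING_HEADER] = "duplicate message, %d local recipients" % len(recipients)
    served_single = set()
    plan = []
    for rcpt in recipients:
        owner = owner_of.get(rcpt, rcpt)  # an address is its own account by default
        choice = preferences.get(owner, "mark")
        if choice == "none":
            continue
        if choice == "single":
            if owner in served_single:
                continue
            served_single.add(owner)
        plan.append((rcpt, marked))
    return plan
```

A user who has expressed no preference falls through to "mark", which matches the default described above: the message is marked and forwarded.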


This page maintained by Rob (at ewan dot com, of course).