I'm an ISP with an outbound spam problem from my
shared-hosting web servers.
That is the problem definition: web servers sending
legitimate email mixed with bursts of spam.
To simplify the problem, I do not allow my web servers to
be used as end-user email servers, so the only legitimate
email from these servers is form mail. That makes what I
ask for next much easier.
The solution I would like is this:
My routers detect outbound port 25 traffic from a web
server, and redirect it to a local email server operating
as a "moderated proxy".
All mail passes through a pattern filter. The pattern
filter contains both allow filters and deny filters. A
message that matches an allow filter is delivered. A
message that matches a deny filter is put into a denied
folder.
Any email that passes through the pattern filter without
either an allow or a deny match is then put in a quarantine
queue.
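As a rough sketch of the three-way decision described above (Python is used only for illustration, since any language is acceptable). The function name classify, the sample patterns, and the rule that a deny match wins when both kinds match are my assumptions, not requirements:

```python
import re
from typing import Iterable, Pattern


def classify(message: str,
             allow: Iterable[Pattern[str]],
             deny: Iterable[Pattern[str]]) -> str:
    """Return 'deny', 'allow', or 'quarantine' for one raw message."""
    # Assumption: deny filters are checked first, so a deny match wins
    # if a message happens to match both kinds of filter.
    if any(p.search(message) for p in deny):
        return "deny"
    if any(p.search(message) for p in allow):
        return "allow"
    # No filter matched either way: hold the message for manual review.
    return "quarantine"


# Hypothetical example filters.
allow_filters = [re.compile(r"^Subject: Contact form submission", re.MULTILINE)]
deny_filters = [re.compile(r"cheap pills", re.IGNORECASE)]
```

A message that matches nothing falls through to "quarantine", which is the behavior I care about most.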
The mail in the quarantine queue can be re-processed upon
command, where it is then fed through the pattern filter
again.
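A minimal sketch of such a re-process command, assuming (my assumption, not a requirement) that each queued message is one file in a quarantine directory, that filters are regular expressions stored one per line in allow.patterns and deny.patterns, and that allowed and denied messages are simply moved into sibling directories. All file and directory names here are hypothetical, and handing allowed mail to the real mail server is left out:

```python
import re
import shutil
from pathlib import Path
from typing import Dict, List, Pattern


def load_patterns(path: Path) -> List[Pattern[str]]:
    """Read one regex per line, skipping blank lines and '#' comments."""
    patterns = []
    for line in path.read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(re.compile(line, re.MULTILINE))
    return patterns


def reprocess(spool: Path) -> Dict[str, int]:
    """Run every quarantined message through the current filters again."""
    allow = load_patterns(spool / "allow.patterns")
    deny = load_patterns(spool / "deny.patterns")
    for name in ("allowed", "denied"):
        (spool / name).mkdir(exist_ok=True)

    counts = {"allowed": 0, "denied": 0, "quarantine": 0}
    # sorted() materializes the listing, so moving files mid-loop is safe.
    for msg in sorted((spool / "quarantine").iterdir()):
        text = msg.read_text(errors="replace")
        if any(p.search(text) for p in deny):    # assumption: deny wins
            dest = "denied"
        elif any(p.search(text) for p in allow):
            dest = "allowed"
        else:
            counts["quarantine"] += 1            # still unmatched, stays put
            continue
        shutil.move(str(msg), str(spool / dest / msg.name))
        counts[dest] += 1
    return counts
```

The returned counts show how many messages are still left in quarantine after a pass.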
The way I want to use it is simple. I ssh into the server,
and examine the messages in the queue. If I find spam, I
compose a deny filter to match it. If I find legitimate
email I compose an allow filter to match it. I then
execute the command to re-process the queue, and examine
the remaining messages, repeating the process until the
queue is empty.
Even when the queue has 1.2 million messages, the above
process should only take 5 or 6 passes, because spammers
tend to repeat predictable patterns, and web forms
also generally follow predictable patterns. If I find a
web form that defies my attempts to generate a pattern,
then I demand the author modify the form to produce a
pattern for me to match.
This can be written in any language. I would prefer it
operate from a unix command line. I am a unix shell user
and vi is my editor, so user interface development is not
required for this project. If you think a web interface is
easier for you, then do it that way. I would rather
development not be slowed by any irrelevant specification
on my part.
The only thing I do require is that the filters be
reasonably fast so that the queue can be scanned quickly.
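One possible way to keep scanning quick, offered only as a sketch: compile all filters of one kind into a single regular expression once per pass, so each message is searched once per kind instead of once per filter. The helper name combine is hypothetical, and this assumes individual filters do not use inline global flags such as (?i), which cannot appear in the middle of a combined pattern:

```python
import re
from typing import Iterable, Pattern


def combine(patterns: Iterable[str], flags: int = re.MULTILINE) -> Pattern[str]:
    """Join many filter patterns into one compiled alternation."""
    # Wrap each pattern in a non-capturing group so '|' inside one
    # filter cannot bleed into its neighbors.
    parts = [f"(?:{p})" for p in patterns]
    if not parts:
        # An empty negative lookahead never matches anything.
        return re.compile(r"(?!)")
    return re.compile("|".join(parts), flags)


# Hypothetical deny filters, compiled once and reused for every message.
deny = combine([r"cheap pills", r"^Subject: .*lottery"])
```

Whether this is actually faster than looping over separate filters depends on the patterns and the regex engine, so it would need measuring on a real queue.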