|dc.description.abstract||Email has become a crucial part of life as the Internet has developed. However, a massive
influx of spam emails has threatened the usefulness of email communication. Many techniques
have been developed, including machine learning, authentication, and collaboration. However,
little has been done from a systems perspective to provide an effective, robust and efficient
anti-spam solution. The arms race between spammers and anti-spam researchers has brought
new challenges to the design of modern anti-spam systems.
This dissertation focuses on the systems aspects of the challenges that anti-spam
researchers face in designing various anti-spam approaches. In particular, we attempt to
provide solutions to challenges in the collaborative, stand-alone, and sender-based
approaches. These challenges are 1) preserving the privacy of email content during
collaboration, 2) achieving both high accuracy and high processing speed, and 3)
selectively punishing email senders without knowing exactly whether a given sender is
a spammer or a normal user.
We design a novel message transformation technique that preserves the privacy of
email content while still deriving resemblance information for collaborative email
classification. We also carefully design a communication protocol to ensure email privacy
during information exchange among the collaborating entities. Experimental results show
accuracy comparable to, and robustness greater than, the Bayesian and Distributed Checksum
Clearinghouse approaches. This dissertation also proposes a new metric for privacy evaluation
and demonstrates a system with excellent privacy preservation.
This dissertation next explores the tradeoff between spam filtering accuracy and
speed by using approximate classification. The resulting filter is roughly one order of
magnitude faster than two well-known spam filters, while achieving identical false positive
rates and similar false negative rates.
For cost-based approaches, we propose moving the spam filter to the early stage of the
SMTP conversation and determining the cost based on email quality and spam behavior.
Experimental results show that, on state-of-the-art hardware, the proposed technique
effectively and significantly limits the spammer's ability to send spam, even if the spammer
possesses more CPU resources than a normal sender.||