Applications of machine learning techniques to resale-based e-commerce websites
Spam remains a prevalent problem on the Internet. In this project, we applied supervised learning techniques to the spam problem on resale-based e-commerce (classified ad) websites such as Craigslist, eBay and Amazon. By leveraging the structured information available on these websites, we showed that we can build a spam detection technique that outperforms traditional anti-spam techniques designed for e-mail and social network ecosystems. After scraping more than 20,000 posts from Craigslist, we evaluated various supervised learning algorithms and showed that it is possible to build classifiers that achieve up to a 98% true positive rate with less than 2% false positives. Moreover, the proposed technique and the features we developed lay the foundation for further work beyond spam detection, such as automated tagging for these e-commerce websites.
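As an illustration of how the two reported figures are commonly defined, the following minimal Python sketch computes a true positive rate and a false positive rate from binary labels. It is not the project's evaluation code: the function name `rates`, the label convention (1 = spam, 0 = legitimate), and the toy labels are all assumptions made for this example, and reading "less than 2% false positives" as a false positive rate is likewise an assumption.

```python
def rates(y_true, y_pred):
    """Return (true positive rate, false positive rate).

    Assumed convention for this sketch: 1 = spam, 0 = legitimate post.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # spam caught
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # spam missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # legitimate post flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # legitimate post passed

    # Guard against empty classes to avoid division by zero.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Made-up toy labels, not data from the project.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

tpr, fpr = rates(y_true, y_pred)
print(f"TPR = {tpr:.2%}, FPR = {fpr:.2%}")
```

Under this reading, a classifier matching the reported results would yield a `tpr` of up to 0.98 and an `fpr` below 0.02 on held-out posts.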