Show simple item record

dc.contributor.author: Brewer, Douglas
dc.description.abstract: The use of web robots has exploded on today's World Wide Web (WWW). Web robots are used for various nefarious activities, including click fraud, spamming, email scraping, and gaining an unfair advantage. Click fraud and unfair advantage give bot writers a way to gain a monetary advantage. Click fraud web bots allow their authors to trick advertisement (ad) networks into paying for ad clicks that do not come from a human being. This costs ad networks and advertisers money they intended to spend on advertisements for human consumption. It also affects the reputation of ad networks in the eyes of advertisers. These problems make combating web robots an important and necessary step on the WWW. Web robots are combated with methods that distinguish bot visits from human visits. These methods involve either analyzing web server logs or modifying the pages on a website. The enhanced log method for identifying web robots expands on previous work that uses web server access logs to find attributes and applies AI techniques such as decision trees and Bayesian networks for detection. Enhanced log analysis first divides attributes into passive and active methods; it then goes further by examining deeper levels of logging, including traffic traces. This exposes previously unexplored attributes such as web user agent sniffing, conditional comments, OS sniffing, JavaScript requests, and specific HTTP responses. Expanding the attribute footprint in log analysis allows individual web bots to be uniquely identified. Enhanced log analysis is used to generate unique signatures for different web bots. Comparing the signatures allows specific web bots to be identified even when they lie about who they are (e.g., a spam bot pretending to be Googlebot). When log analysis cannot detect sophisticated web robots, web page modification methods such as Decoy Link Design Adaptation (DLDA) and Transient Links come into play.
These methods dynamically modify web pages so that web robots cannot navigate them, while still providing the same user experience to humans in a web browser. The two methods differ in the type of web robot they defend against. DLDA works against crawling web bots, which look at the web page source, and Transient Links works against replay web bots, which make requests with links and data given to them. DLDA detects crawling bots by hiding real links among decoy (fake) links. The web bot is forced to choose a link at random and has some probability of being detected by picking a decoy link. Transient Links detects replay bots by creating single-use URLs. When a replay bot uses these single-use links, Transient Links can tell that the links have been used before and are being replayed.
dc.subject: Web Robots
dc.subject: Web Server
dc.subject: Web Security
dc.subject: Web Bot Detection
dc.subject: Click Fraud
dc.title: Detecting Web robots with passive behavioral analysis and forced behavior
dc.description.department: Computer Science
dc.description.major: Computer Science
dc.description.advisor: Kang Li
dc.description.committee: Kang Li
dc.description.committee: Lakshmish Ramaswamy
dc.description.committee: John A. Miller
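
The two page modification ideas summarized in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation. The function `dlda_detection_probability`, the class `SingleUseLinks`, the `/t/` URL prefix, and the in-memory storage are all hypothetical names and choices made for illustration. The probability model additionally assumes that a bot picks uniformly at random among one real link and a fixed number of decoys, with one independent choice per page.

```python
import secrets
from typing import Dict, Optional, Set, Tuple


def dlda_detection_probability(decoys_per_link: int, pages_visited: int) -> float:
    """Chance that a randomly clicking bot hits at least one decoy link.

    Assumption (not from the source): each real link is hidden among
    `decoys_per_link` decoys, and the bot makes one independent,
    uniformly random choice on each of `pages_visited` pages.
    """
    p_real = 1.0 / (decoys_per_link + 1)   # chance of picking the real link once
    return 1.0 - p_real ** pages_visited   # chance of at least one decoy pick


class SingleUseLinks:
    """Hypothetical in-memory registry of single-use URL tokens."""

    def __init__(self) -> None:
        self._unused: Dict[str, str] = {}  # token -> real path, not yet visited
        self._used: Set[str] = set()       # tokens that were already consumed

    def issue(self, path: str) -> str:
        """Return a fresh single-use URL standing in for `path`."""
        token = secrets.token_urlsafe(16)
        self._unused[token] = path
        return f"/t/{token}"

    def resolve(self, url: str) -> Tuple[str, Optional[str]]:
        """Classify a request as a first use, a replay, or an unknown link."""
        token = url.rsplit("/", 1)[-1]
        if token in self._unused:
            path = self._unused.pop(token)
            self._used.add(token)
            return ("ok", path)
        if token in self._used:
            return ("replay", None)        # link was seen before: likely a replay bot
        return ("unknown", None)
```

Under these assumptions, a first request for an issued URL resolves to the real path, and a second request for the same URL is reported as a replay. A real deployment would also need token expiry and shared storage across servers, neither of which is covered here.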
