dc.contributor.author: Han, Jiayun
dc.date.accessioned: 2014-03-04T16:24:05Z
dc.date.available: 2014-03-04T16:24:05Z
dc.date.issued: 2009-05
dc.identifier.other: han_jiayun_200905_ms
dc.identifier.uri: http://purl.galileo.usg.edu/uga_etd/han_jiayun_200905_ms
dc.identifier.uri: http://hdl.handle.net/10724/25474
dc.description.abstract: This project aims to build an efficient, scalable, portable, and trainable part-of-speech tagger. Using 98% of Penn Treebank-3 as training data, it first builds a raw tagger based on Bayes’ theorem, a hidden Markov model, and the Viterbi algorithm. A reinforcement machine learning algorithm and contextual transformation rules are then applied to increase the tagger’s accuracy. The tagger’s final accuracy on the testing data is 96.51%, and its speed is about 251,000 words per second on a computer with two gigabytes of random access memory and two 3.00 GHz Pentium duo processors. The tagger’s portability and trainability are demonstrated by the tagger-maker’s success in building a new tagger from a corpus annotated with a tagset different from that of the Penn Treebank.
dc.language: eng
dc.publisher: uga
dc.rights: public
dc.subject: Part-of-Speech
dc.subject: Tagging
dc.subject: Markov Model
dc.subject: The Viterbi Algorithm
dc.subject: The Bayes' Theorem
dc.subject: Machine Learning
dc.subject: Contextual Rules
dc.subject: Natural Language Processing
dc.title: Building an efficient, scalable, and trainable probability-and-rule-based part-of-speech tagger of high accuracy
dc.type: Thesis
dc.description.degree: MS
dc.description.department: Artificial Intelligence Center
dc.description.major: Artificial Intelligence
dc.description.advisor: Michael Covington
dc.description.committee: Michael Covington
dc.description.committee: Alexander Williams
dc.description.committee: Paula Schwanenflugel


Files in this item

There are no files associated with this item.