dc.contributor.author: Al Jadda, Khalifeh
dc.date.accessioned: 2015-06-02T04:30:17Z
dc.date.available: 2015-06-02T04:30:17Z
dc.date.issued: 2014-12
dc.identifier.other: al-jadda_khalifeh_201412_phd
dc.identifier.uri: http://purl.galileo.usg.edu/uga_etd/al-jadda_khalifeh_201412_phd
dc.identifier.uri: http://hdl.handle.net/10724/31377
dc.description.abstract: Machine learning algorithms are useful in many disciplines, such as speech recognition, bioinformatics, recommendation, and decision making. These algorithms have gained importance in the big data era owing to the power of data-driven solutions, and they are considered the core of data-driven models. However, scalability is a crucial requirement for all machine learning algorithms, as it is for any computational model. To scale up machine learning algorithms to handle big data, two basic techniques can be followed: (1) parallelizing existing sequential algorithms, which is the approach Apache Mahout and Apache Spark follow; and (2) redesigning the structure of existing models to overcome their scalability limitations. The second technique, which is more challenging, results in new models that extend existing ones, such as the Continuous Bag-of-Words model. In this thesis, we apply the second technique to extend a well-known machine learning technique, Bayesian networks, to handle big data efficiently in both time and space. The proposed model leads to an easily scalable, more readable, and expressive implementation for problems that require probabilistic solutions for massive amounts of hierarchical data. We successfully applied this model to three challenging probabilistic problems on massive data sets: multi-label classification, latent semantic discovery, and semantically ambiguous keyword discovery. The model was successfully tested on a single machine as well as on a Hadoop cluster of 69 data nodes.
dc.language: eng
dc.publisher: uga
dc.rights: public
dc.subject: Big Data, PGMHD, GELATO, SAGE, Semantic Search
dc.title: Scaling up machine learning algorithms to handle big data
dc.type: Dissertation
dc.description.degree: PhD
dc.description.department: Computer Science
dc.description.major: Computer Science
dc.description.advisor: William York
dc.description.advisor: John A. Miller
dc.description.committee: William York
dc.description.committee: John A. Miller
dc.description.committee: Khaled Rasheed
dc.description.committee: Krzysztof J. Kochut

