dc.contributor.author: Xu, Kemin
dc.description.abstract: With the continuing growth of the Internet, a large amount of information and data is now available to us. Even with so much useful and valuable data, much remains to be done in working out how to make use of it. The first step is to identify the data that are useful for the research at hand, because different data suit different research questions. What matters next is how to put the data to reasonable use in constructing a model for prediction. Machine learning, a subfield of computer science that includes many useful methods, plays a very important role in working with big data. I will use logistic regression, decision tree, and random forest methods in my analysis. All three methods will be applied to two datasets from a data science competition website: one on survival in the sinking of the Titanic and one on crime categories in San Francisco. The goal for the Titanic dataset is to predict which passengers survived the tragedy, and the goal for the San Francisco dataset is to predict the category of crimes that occurred in the city. I will combine the results of all three models to see whether doing so improves prediction accuracy.
dc.rights: On Campus Only Until 2018-05-01
dc.subject: San Francisco
dc.subject: Decision Tree
dc.subject: Random Forest
dc.subject: Logistic Regression
dc.title: Prediction of crime categories in San Francisco area
dc.description.advisor: Jeongyoun Ahn
dc.description.committee: Jeongyoun Ahn
dc.description.committee: Jaxk Reeves
dc.description.committee: Liang Liu
