The rise of the Big Data
MetadataShow full item record
Big Data has been the new trend in businesses. As technology advances, the ability to collect large amount of data and learn insights from them became more important. We will analyze a dataset provided by the Heritage Provider Network. The goal is to predict the number of days a patient stays in the hospital based on previous year’s insurance claim records. The tools used include logistic regression, Classification and Regression Trees, and Gradient Boosting Machine (GBM). We will compare two techniques of analyzing the data. The first method will use GBM to predict the outcome variable. While in the second method, we will use a sequential modeling method, where we partition the dataset into risky and non-risky groups, and then use GBM to predict the outcome for the risky patients and assign no-stay for the non-risky group. We also discuss Big Data topics such as More Data Beats Better Algorithm.