Regression: Prediction of Heart Disease Mortality Rate in the US

Analysis of Heart Disease Mortality in the United States

This project was my entry in the Microsoft Professional Program Capstone challenge, hosted by Microsoft and DrivenData.

The goal of the project was to build a regression model that predicts the heart disease mortality rate from a set of given features, such as county areas, demographics, and socioeconomic information from thousands of individuals. The data was made publicly available by the United States Department of Agriculture Economic Research Service (USDA ERS).

I evaluated five different models and compared them by their root mean squared error (RMSE). The five models were:

  • Ridge (L2 Regularization)
  • Lasso (L1 Regularization)
  • Gradient Boosting
  • XGBoost
  • LightGBM (Light Gradient Boosting Machine)

I chose the model with the lowest RMSE as my submission. My initial submission ranked 9th out of 376 participants.
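To make the selection rule concrete, here is a minimal sketch of how RMSE can be computed and used to pick a model. It is not the code from the project: the helper names (`rmse`, `best_model_by_rmse`), the placeholder model names, and the toy numbers are all assumptions made up for illustration. RMSE is the square root of the mean of the squared differences between the true and predicted values, so a lower value means predictions are closer to the targets on average.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def best_model_by_rmse(y_true, predictions):
    """Score each model's predictions and return (best_name, all_scores).

    `predictions` maps a model name to its predictions for the same
    rows as `y_true`. This helper is hypothetical, not the project's code.
    """
    scores = {name: rmse(y_true, y_pred) for name, y_pred in predictions.items()}
    best_name = min(scores, key=scores.get)  # lowest RMSE wins
    return best_name, scores


# Toy validation targets and predictions (invented numbers, not project results).
y_valid = [250.0, 300.0, 275.0, 320.0]
toy_predictions = {
    "model_a": [260.0, 290.0, 280.0, 310.0],
    "model_b": [255.0, 298.0, 276.0, 318.0],
}

best_name, rmse_scores = best_model_by_rmse(y_valid, toy_predictions)
print(best_name, rmse_scores)
```

In practice the predictions would come from each fitted model on a held-out validation set (or from cross-validation), so that the comparison reflects performance on data the models have not seen.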

In addition to the model submission, the Capstone competition required a written analysis report of the project.

My final result in the competition earned me the Microsoft Professional Program certificate in Data Science:

Certificate

You can access the full project here:

Full Project
