In most Kaggle problems, I used to blindly apply predictive models and test my results. This problem, however, forced me to learn to solve problems systematically.
Importance of Background Research and Parameter Understanding
Again, in most Kaggle problems I used to treat the variables as anonymous numbers and apply models to them. I never studied the individual variables. This dataset and the challenge forced me to devote a large amount of time to background research.
Graphs are sexy! (thanks to ggplot2)
I hated graphs! This dataset has made me fall in love with the simplicity of graphs and how seamlessly they convey information.
Sometimes Simple Math does the trick
As a novice data analyst, I always believed that the more complex the model, the better the results. I have not used anything more than averages and correlations in this whole project, and yet I got some amazing results. Damn! Complexity is not always the best solution. Sometimes, trust the basics too!
Data is everything
While doing this project, I came across many variables that could have proved critical to understanding the health trends. However, the data for those variables was not collected diligently. The first step of any data science project, data collection, is THE most important step. You cannot get anywhere if you don't have the data in the first place.
Domain Knowledge is a Joker in the pack
One can always analyse a dataset without proper domain knowledge. However, if you understand the domain of the dataset well, a quarter of your work is already done!
R is just beautiful
Before starting this project, I used to believe I knew enough R. I don't know even one percent of this amazing language. R, we have a long way to go, buddy!
This project has been a significant step in my study of data analysis. I will add more lessons here as and when I trace what I have learned back to this project.