Consolidation of all Excel Files
As mentioned, over 150 individual Excel files were provided. The first task was therefore to develop a script that reads all the Excel files and merges them into one data frame in R.
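The original script was written in R and is not reproduced here. Purely to illustrate the read-everything-and-stack pattern, here is a minimal Python sketch. The function name `consolidate`, the `read_rows` callback and the `source_file` key are all my own assumptions for this example, not part of the original script.

```python
from pathlib import Path


def consolidate(folder, read_rows, pattern="*.xlsx"):
    """Stack the rows of every matching workbook into one list of dicts.

    `read_rows` is a hypothetical callback: it takes a file path and
    returns that workbook's rows as a list of dicts (one dict per row).
    """
    combined = []
    # Sort the paths so the merged result does not depend on directory order.
    for path in sorted(Path(folder).glob(pattern)):
        for row in read_rows(path):
            # Tag each row with its file of origin so it can be traced back.
            combined.append({"source_file": path.name, **row})
    return combined
```

With pandas and an Excel engine installed, `read_rows` could be something like `lambda p: pandas.read_excel(p).to_dict("records")`. Keeping the reader as a parameter means the merging logic can be checked without any real Excel files.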
Understanding all the variables
There were over 100 variables, ranging from Total Pregnant Women to Number of Child Deaths because of Respiratory Disease. Because I lacked domain knowledge, each variable had to be studied and its significance understood. For instance, why are IFA tablets important for pregnant women, and how do they affect a child's health after birth? This step proved very important in the later stages, because by then every variable and its effects were well understood.
Background research on the subject and identifying focus areas
Apart from understanding the variables, researching the NRHM (National Rural Health Mission) became crucial, since the given dataset was collected as part of that program. After this research, I was able to divide the whole study into 4 focus groups: 1) Maternal Health 2) Child Health - Immunisation & Disease Control 3) Family Planning - Population Growth Stability 4) Adult & Adolescent Health. This division made the study more systematic and modular.
Making relations between the data and the research
Once I had the focus groups and the required understanding of the variables, I had to go back to the dataset, work out how to correlate all this knowledge with the data, and start analysing it.
Taking up each focus area and analysing for trends
After much research and studying, I finally got down to analysing the data and understanding the trends. I took up the focus groups one at a time, since they were more or less independent of each other, and started exploring the respective subset of the data.
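Working on one focus group at a time amounts to keeping a mapping from each group to its variables and slicing the merged data by it. The Python sketch below is only an illustration of that idea: the two variable names are the ones mentioned earlier, and which group I place them in here is my assumption, the empty lists are placeholders, and `subset_for` and `id_columns` are hypothetical names.

```python
# Focus groups mapped to their variables. Only two variables are filled in,
# and their placement is assumed; the real lists are not reproduced here.
FOCUS_GROUPS = {
    "Maternal Health": ["Total Pregnant Women"],
    "Child Health": ["Number of Child Deaths because of Respiratory Disease"],
    "Family Planning": [],
    "Adult & Adolescent Health": [],
}


def subset_for(rows, group, id_columns=()):
    """Keep only the identifier columns plus the variables of one focus group."""
    wanted = list(id_columns) + FOCUS_GROUPS[group]
    # Columns missing from a row are skipped rather than raising an error.
    return [{key: row[key] for key in wanted if key in row} for row in rows]
```

Each subset can then be explored independently of the other three groups.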
Formulating Indicators for the focus areas
Another important deliverable was to formulate key health indicators for the dataset. These indicators were either intuitive, like the average mortality rate, or theoretical, like the maternal mortality rate (which is defined by the WHO). These indicators further helped to discover trends in the data.
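As an example of a theoretical indicator, the WHO defines the maternal mortality ratio as the number of maternal deaths per 100,000 live births in a given period. The Python sketch below assumes that this ratio is the indicator meant above. The function name and the input numbers are made up for illustration and are not taken from the dataset.

```python
def maternal_mortality_ratio(maternal_deaths, live_births):
    """Maternal deaths per 100,000 live births (WHO-style ratio)."""
    if live_births <= 0:
        raise ValueError("live_births must be positive")
    # Multiply before dividing to keep the arithmetic exact for whole counts.
    return maternal_deaths * 100_000 / live_births


# Made-up counts, for illustration only.
print(maternal_mortality_ratio(3, 2000))  # 150.0
```

Expressing the indicator per 100,000 live births makes areas with very different numbers of births directly comparable.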