Thursday 27 March 2014

The methodology behind our Survey

Looking at the results of our survey, people have raised some interesting comments and questions:
  • "You talked to two-and-a-half lakh people? Are you serious? Why?"
  • "You did this offline? Using paper and pen? OMG! Why not online?
  • "You used semi-professional investigators? Why? You could've just used students from universities, for free!"
  • "Only 500 people? That's WAY too small!"
  • "Yeah right, you've been to vague-o places in India, sure."
And so on. Well, let me put the confusion to rest, as best I can, in this longish post. I'll describe the process we used in the survey and, in some cases, the reasons for the steps in the process.
To begin with, we decided that this would be a completely offline survey. Why? First of all, a very, very small percentage of India is online (meaning, online enough to answer a survey, not just online enough for Facebook or WhatsApp). Second, there's a strong bias towards certain socioeconomic segments among Indian Internet users anyway, so an online-only survey would not give a sample that is representative of the aam aadmi. Finally, if we did the offline survey as planned, an online version would be unnecessary anyway.
Having laid that beast to rest, we moved on to design the survey itself. We've done this for many years now and the key attribute of the model is that it is self-correcting: each respondent tells us what issues are important for him/her AND how his/her MLA/MP has performed re: that issue. So we as survey designers don't need to decide on what's important and what's not. People will do that for us. And this works very well - you see all kinds of variations, in terms of people's interests, across the country. For example, halfway through the survey, one of the surveyors called us, complaining that none of the respondents in his location in Goa seemed to be worried about bijli sadak paani - was this not a problem for the survey? Not at all - that only speaks to the quality of design, we said, and patted ourselves on the back.
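For the technically inclined, one way to picture what a single filled-in form boils down to is below. This is purely an illustrative sketch, not our actual form; the field names are made up for the example.

from dataclasses import dataclass, field
from typing import Dict

@dataclass
class Response:
    """One respondent's answers: the issues THEY named as important,
    each paired with THEIR rating of the MLA/MP on that issue.
    (Field names are illustrative, not the actual survey form.)"""
    location: str                                           # Village or Ward surveyed
    ratings: Dict[str, int] = field(default_factory=dict)   # issue -> performance rating

# The survey designers never choose the issues; the respondent does.
r = Response(location="a Ward in Goa", ratings={"tourism": 4, "jobs": 2})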
Once the survey form itself had been designed, we set some targets for ourselves. Survey theory told us that 384 randomized respondents per constituency would be enough to show constituency-level trends, at a 95% confidence level with a 5% margin of error (you can do your own calculations at this website). We decided we would do 500 samples in each MP constituency, to ensure data quality and "believability". That meant 500 * 543 = 2,71,500 responses! We then figured that we did not want more than 40 responses in each "location" (meaning Village or Ward - more about that below). Why 40? Well, why not 40? Honestly: 50 seemed too large for each location, 25 seemed too small. Yes, great decisions are made on trivial factors. Anyway, going ahead with the Math, that means 2,71,500 / 40 = 6,787.5 locations. Given that we would do this manually, we'd actually need a larger set of locations, since some places may be too small (!) for 40 random responses and some may be too difficult to access. We finally decided on generating just under 10,000 locations.
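If you want to reproduce the arithmetic, the 384 comes out of the standard (Cochran) sample-size formula; the little sketch below is just that formula plus the multiplications above, nothing more.

def cochran_sample_size(z=1.96, e=0.05, p=0.5):
    """n = z^2 * p * (1 - p) / e^2 for a large population.
    z = 1.96 is the 95% confidence level, e = 0.05 is a 5% margin
    of error, p = 0.5 is the most conservative guess; this works
    out to 384.16, the "384 respondents" figure."""
    return z * z * p * (1 - p) / (e * e)

per_constituency = 500          # rounded up from 384 for data quality and believability
constituencies = 543            # Lok Sabha constituencies
per_location = 40               # cap on responses per Village/Ward

total_responses = per_constituency * constituencies    # 2,71,500
locations_needed = total_responses / per_location      # 6,787.5

print(round(cochran_sample_size()), total_responses, locations_needed)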
Now for the tough part: we next had to make sure that the respondents were random enough to be representative of the constituency. A standard mechanism to prepare for randomization is (strangely enough) to stratify respondents in some manner. To do that, we first acquired a list of all Census locations in India (8.2 lakh locations: 7.28 lakh Villages and the rest city Wards). Then we stratified the list by two factors: the ratio of Rural to Urban locations within the relevant State, and the General populace (as against SC/ST/BC/BT) as a percentage of the overall population, again at the State level. This means (see the sketch after the two lists below):
  • We grouped all the locations within a State into two groups - Rural and Urban.
  • Within each such group, we calculated the General populace as a percentage of the total population of the location
  • We then sub-grouped locations by whether they had a High (> 66%), Medium (33% to 66%) or Low (<= 33%) General populace
  • We ran a randomization algorithm and generated random numbers for each location
  • We figured out the ratio of Rural vs Urban and of High/Medium/Low General at the State level
  • We extracted locations from each sub-group in the same ratios as those of the State as a whole.
We used State-level ratios of these factors because:
  • we noticed that the ratio of Rural to Urban locations differs significantly among the States; some States (Tamil Nadu, for example) are far more urbanized than others (like Chhattisgarh)
  • the ratio of General vs. SC/ST/BC/BT also varies quite a bit across the country
  • like it or not, caste is an important and acceptable factor to use in survey stratification in India.
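Here is the promised sketch of the selection logic. It is deliberately simplified, and the field names ('state', 'is_urban', 'general_pct') are placeholders for whatever the Census list actually carries; the point is only that the draw within each bucket is random, while the bucket quotas mirror the State's own Rural/Urban and General-populace mix.

import random

def general_band(pct):
    """High / Medium / Low General-populace bands."""
    if pct > 66:
        return "High"
    if pct > 33:
        return "Medium"
    return "Low"

def pick_locations(census_locations, total_to_pick, seed=2014):
    """census_locations: list of dicts with (placeholder) keys 'state',
    'is_urban' and 'general_pct'. Within each State, group locations by
    (Rural/Urban, General band) and draw randomly from each group in
    proportion to that group's share of the State's locations."""
    random.seed(seed)
    by_state = {}
    for loc in census_locations:
        by_state.setdefault(loc["state"], []).append(loc)

    picked = []
    for state, locs in by_state.items():
        state_quota = round(total_to_pick * len(locs) / len(census_locations))
        groups = {}
        for loc in locs:
            key = ("Urban" if loc["is_urban"] else "Rural",
                   general_band(loc["general_pct"]))
            groups.setdefault(key, []).append(loc)
        for members in groups.values():
            quota = round(state_quota * len(members) / len(locs))
            picked.extend(random.sample(members, min(quota, len(members))))
    return picked

We did not, of course, run anything this naive over the real list (we over-sampled to allow for too-small and inaccessible places, for one), but the shape of the selection was exactly this: stratify first, then randomize within the strata.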
Once we had the locations all set, we laid down some rules about how the survey would be applied (a rough sketch in code follows this list):
  • No surveying in public places like chai shops, etc.
  • All surveys to be conducted inside or just outside a house, surveying only one respondent in each house
  • Once a house is done, the next survey not in the next house but in the house next to that (so alternate houses)
  • Every third respondent to be a female respondent; if not found in the specific house chosen, repeat alternate selections till a female respondent is found.
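If it helps, the door-to-door rules read almost like pseudo-code, so here they are as a sketch. The 'houses' list and the 'woman_available' flag are purely illustrative stand-ins for what the investigator actually sees on the ground.

def apply_survey_in_location(houses, target=40):
    """houses: households in the order the investigator walks past them.
    One respondent per house; skip the immediate neighbour after each
    survey (alternate houses); every third respondent must be a woman,
    so keep skipping ahead until one is available."""
    responses, count, i = [], 0, 0
    while i < len(houses) and count < target:
        need_female = (count + 1) % 3 == 0
        if need_female and not houses[i].get("woman_available"):
            i += 2                      # repeat the alternate-house selection
            continue
        responses.append({"house": houses[i]["id"], "female_respondent": need_female})
        count += 1
        i += 2                          # alternate houses, not adjacent ones
    return responses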
Divya then went about writing a handbook for the investigators who would actually go to these locations and apply the survey. The handbook had the above methodology, as well as other guidelines about how exactly to ask the questions, what to do about "prompting" by others (specifically men in the family when a female respondent is being asked the questions), etc. We had decided along the way that we would use as professional a group of investigators as possible. By "professional", we don't mean people who do surveys for a living - we actually wanted to stay away from that. We mean organizations and people who were not students. We have nothing against students, but the fact that the survey was to be conducted in a specific time-frame and that people would need to travel to some real vague-o places meant that the investigators had to be reasonably familiar with the area they were working in and that they had the time to do this during the work-week. Surveys would need to be conducted late in the evenings, since working people would only be available during those hours, and the investigators would possibly need to travel back home late at night. All this precluded anyone below 18, and, honestly, anyone who did not do this for a fee of some kind - a small fee, but a fee nonetheless.
Armed with this model, we only needed to find the money and the people to make all this happen. That was the easy part! :)

UPDATE: ADR has a more accessible description of the survey methodology here.
