the data dilemma
Chris Blattman has a good take on Pinkovskiy and Sala-i-Martin's paper that suggests that African poverty might be falling at a faster rate than we previously believed. As he notes:
Pinkovskiy and Sala-i-Martin are doing the best that can be done with bad data: they use the scant surveys to get the shape of the income distribution, but discard what the surveys tell us about income levels. They calculate levels and poverty rates by tying the distribution to national income data.
...never, ever take data from low income countries too seriously. Doesn’t it strike you as odd that the World Development Indicators have annual infant mortality data for most countries in Africa for most years? It should. Most of that data is interpolated, and the rest is (as often as not) close to made up.
This is an issue that those of us who study Africa have to wrestle with all the time: how do we analyze phenomena and trends when the data is so incomplete or unreliable?
The way data is created in the D.R. Congo is particularly problematic, and you should trust almost none of it. As I understand it, the last time anyone took a census was 1982. (The voter registration drive for the 2006 elections helped some, but that only got information on adults of voting age who registered to vote.)
Almost every piece of data that comes out of the DRC - including the most basic population estimates and demographic indicators - is based on the 1982 census. This makes almost all of the data highly suspect, to put it mildly. For example, say you wanted to know how many women in Ituri test seropositive for HIV/AIDS. The number you'd be given would be based on the national HIV/AIDS seropositive rate (which is, if not quite made up, not at all reliable) and the overall female percentage of the population in what's now Ituri as of 1982.
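To make the compounding concrete, here is a minimal sketch of how an estimate like this gets built. Every number in it is invented for illustration (the rate, the female share, and the population figure are all hypothetical); the point is only that when each input is shaky, their product is shakier still, because relative errors multiply through:

```python
# Hypothetical illustration: building a derived estimate by stacking
# one shaky figure on top of another. All numbers are invented.

national_hiv_rate = 0.042        # national seroprevalence; not reliable
female_share_1982 = 0.51         # female share of the population, per the 1982 census
ituri_population = 1_500_000     # itself projected forward from 1982

# The "official" figure is just the product of the three inputs:
estimated_seropositive_women = (
    ituri_population * female_share_1982 * national_hiv_rate
)
print(round(estimated_seropositive_women))  # looks precise; isn't

# Relative errors multiply through a product. If each input is off by
# as much as 30% in the same direction, the combined estimate can be
# off by roughly a factor of two in either direction:
low = estimated_seropositive_women * 0.7 ** 3
high = estimated_seropositive_women * 1.3 ** 3
print(round(low), round(high))
```

The single printed number at the top is what ends up in a report; the interval at the bottom is closer to what we actually know.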
In other words, nobody really has any idea how many women in Ituri are HIV/AIDS seropositive. We might know how many are being treated at clinic X or how many have tested positive at health center Y, but we really cannot answer the question with any kind of certainty.
The recent fuss over a Simon Fraser University Human Security Report study is a good example of how, at the end of the day, we really have no idea what we're dealing with in terms of data in the DRC. The Simon Fraser study concluded that the previous International Rescue Committee-commissioned, Lancet-published studies, the latest of which put the excess death toll of the Congo wars at around 5.4 million since 1998, substantially overstated that toll. We could argue all night about who's right on this and never come to a valid conclusion. (The Lancet surveys were peer reviewed, and the methodology was solid, but much of the dispute comes down to what you define as an "excess death" due to war versus a death that occurs from living in abject poverty.)
The only semi-reliable data is that collected by researchers on a very small scale, and even then, it's hard to know what to believe. For example, for my dissertation research, I collected a lot of data on school enrollment figures in the Kivus. The vast majority of this data was stored in files at the offices of church bureaucracies (religious organizations run most of the Congo's public schools). So I'd sit patiently while a kind secretary hand-copied the data charts onto sheets of A1 paper, or I'd copy down the charts myself, depending on each office's preferences, for school system after school system after school system.
Is this data reliable? Who knows? How can I be sure that someone didn't mistakenly copy a "5" when it should have been an "8," or that an extra "0" wasn't added in? What does it mean for a child to be "enrolled"? Does the data I collected represent children who actually finished the school year and took their exams? Does it represent every child who walked through the door at some point in the academic year? Even the guys giving me the data might not be able to answer those questions, and I have to remember that everyone has an incentive to present themselves in the best possible light so as to get access to more funding.
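As a hypothetical illustration of the copying problem: the enrollment figures below are invented, and the error model is deliberately crude (a single mis-copied digit per slip, ignoring other errors like an extra "0"), but it shows how quietly one slip of the pen can move a hand-transcribed total:

```python
import random

# Hypothetical per-school enrollment figures; all invented.
enrollments = [312, 487, 95, 1_204, 658, 221]
true_total = sum(enrollments)

random.seed(1)  # fixed seed so the sketch is repeatable

def miscopy_one_digit(value):
    """Replace one random digit of `value` with a random digit:
    the kind of slip a tired copyist might make ("5" for "8")."""
    digits = list(str(value))
    i = random.randrange(len(digits))
    digits[i] = str(random.randrange(10))
    return int("".join(digits))

# Re-copy the table with, say, a 1-in-6 chance of a slip per entry:
copied = [
    miscopy_one_digit(v) if random.random() < 1 / 6 else v
    for v in enrollments
]
print(true_total, sum(copied))
```

A slip in the hundreds or thousands place shifts the total by far more than a slip in the ones place, and nothing in the copied table flags that anything went wrong.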
As one panelist at APSA a couple of years back whose name I cannot remember put it, "There is a high correlation between state failure and missing data." This makes much of what we supposedly figure out a little questionable, even after instituting solid methodological controls to ensure that we've gotten it right.
Not very satisfying, I know. But it's good to keep these things in mind as you read reports on the Congo, or on other developing states. We're doing the best we can with what we have, but these explanations are far from perfect.