"Africa is, indeed, coming into fashion." - Horace Walpole (1774)


the data dilemma

Chris Blattman has a good take on Pinkovskiy and Sala-i-Martin's paper that suggests that African poverty might be falling at a faster rate than we previously believed. As he notes:
"Pinkovskiy and Sala-i-Martin are doing the best that can be done with bad data: they use the scant surveys to get the shape of the income distribution, but discard what the surveys tell us about income levels. They calculate levels and poverty rates by tying the distribution to national income data.

"...never, ever take data from low income countries too seriously. Doesn’t it strike you as odd that the World Development Indicators have annual infant mortality data for most countries in Africa for most years? It should. Most of that data is interpolated, and the rest is (as often as not) close to made up."
This is an issue that we who study Africa have to wrestle with all the time: how do we analyze phenomena and trends when the data is so incomplete or unreliable?

The way data is created in the D.R. Congo is particularly problematic, and you should trust almost none of it. As I understand it, the last time anyone took a census was 1982. (The voter registration drive for the 2006 elections helped some, but that only got information on adults of voting age who registered to vote.)

Almost every piece of data that comes out of the DRC - including the most basic population estimates and demographic indicators - is based on the 1982 census. This makes almost all of the data highly suspect, to put it mildly. For example, say you wanted to know how many women in Ituri test seropositive for HIV/AIDS. The number you'd be given would be based on the national HIV/AIDS seropositive rate (which is, if not quite made up, not at all reliable) and the overall female percentage of the population in what's now Ituri as of 1982.

In other words, nobody really has any idea how many women in Ituri are HIV/AIDS seropositive. We might know how many are being treated at clinic X or how many have tested positive at health center Y, but we really cannot answer the question with any kind of certainty.
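To make that arithmetic concrete, here is a minimal sketch of how such an estimate gets assembled. Every input is a hypothetical placeholder rather than a real figure for Ituri, and the forward projection by an assumed annual growth rate is my own illustration of how a census-era base might be updated, not a documented procedure. The point is only that modest disagreement about one assumed input moves the final number a lot.

```python
def estimate_seropositive_women(base_population, annual_growth, years,
                                female_share, national_rate):
    """Back-of-envelope estimate: project an old census count forward,
    then scale it by the female share and a national seropositive rate."""
    projected_population = base_population * (1 + annual_growth) ** years
    return projected_population * female_share * national_rate


# All inputs below are hypothetical placeholders, not real data.
BASE_POPULATION = 1_000_000  # assumed population at the time of the census
FEMALE_SHARE = 0.51          # assumed share of women at the time of the census
NATIONAL_RATE = 0.04         # assumed national seropositive rate
YEARS = 28                   # 1982 to 2010

# Two equally plausible-sounding guesses for annual population growth.
low = estimate_seropositive_women(
    BASE_POPULATION, 0.025, YEARS, FEMALE_SHARE, NATIONAL_RATE)
high = estimate_seropositive_women(
    BASE_POPULATION, 0.035, YEARS, FEMALE_SHARE, NATIONAL_RATE)

print(f"growth 2.5%/yr: about {low:,.0f}")
print(f"growth 3.5%/yr: about {high:,.0f}")
print(f"ratio high/low: {high / low:.2f}")
```

Nothing in the calculation ever touches an actual count of women in Ituri, which is why the result can look precise and still tell you very little.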

The recent fuss over a Simon Fraser University Human Security Report study is a good example of how, at the end of the day, we really have no idea what we're dealing with in terms of data in the DRC. The Simon Fraser study concluded that previous International Rescue Committee-commissioned/Lancet-published studies, the latest of which estimated the excess death toll of the Congo wars to be around 5.4 million since 1998, overstated that toll. We could argue all night about who's right on this and never come to a valid conclusion. (The Lancet surveys were peer reviewed, and the methodology was solid, but much of the dispute there comes down to what you define as an "excess death" due to war versus a death that occurs from living in abject poverty.)

The only semi-reliable data is that collected by researchers on a very small scale, and even then, it's hard to know what to believe. For example, for my dissertation research, I collected a lot of data on school enrollment figures in the Kivus. The vast majority of this data was stored in files at the offices of church bureaucracies. (Houses of faith run most of the Congo's public schools.) So I'd go and sit patiently while a kind secretary would hand-copy the data charts onto sheets of A1 paper, or I'd copy down my own charts according to their preferences, for school system after school system after school system.

Is this data reliable? Who knows? How can I be sure that someone didn't mistakenly copy a "5" when it should have been an "8," or that a "0" wasn't added in? What does it mean for a child to be "enrolled"? Does the data I collected represent children who actually finished the school year and took their exams? Does it represent every child who walked through the door at some point in the academic year? Even the guys giving me the data might not be able to answer those questions, and I have to remember that everyone has an incentive to present themselves in the best possible light so as to get access to more funding.

As one panelist at APSA a couple of years back whose name I cannot remember put it, "There is a high correlation between state failure and missing data." This makes much of what we supposedly figure out a little questionable, even after instituting solid methodological controls to ensure that we've gotten it right.

Not very satisfying, I know. But it's good to keep these things in mind as you read reports on the Congo, or on other developing states. We're doing the best we can with what we have, but these explanations are far from perfect.


Anonymous Mike said...

Thanks Texas! Note that this lack of population data is only true at national level -- engaged local and provincial governments and sometimes agencies in the humanitarian zones have better data.

The basis for all this though is usually collected at Aire de Sante/Zone de Sante level. I think I read in a journal somewhere that WHO and the Health Ministry revised national population estimates in 2007. OCHA seems to have posted it to their website at: http://www.rdc-humanitaire.net/?-Donnees-sur-la-RDC- except the link is broken on the population section, and emailing them I haven't gotten a live response.

Anyone from OCHA or WHO read this blog? Anyone know what this population survey is about?

Tuesday, March 09, 2010 7:35:00 AM

Blogger Rachel said...

It's difficult when you are asked to write a brief paper for a donor about the general mortality rates etc in a certain area because it is next to impossible to find information... for me at least.

The population figures gotten from the BCZS can be a bit mystifying because in a zone with many IDPs/returnees, the figure they give you will be something like 198,671. What? Was a baby just born as they wrote down the number and so that made the "1"? (Okay, I'm sure the numbers come from complex algorithms that I do not understand, but still... it's funny to type in those EXACT numbers knowing that they can't possibly truly BE exact, right?)

WHO's link to their DRC reports has been broken for a while (sends you to their Tanzania page) & they haven't responded to my inquiring e-mails...

Tuesday, March 09, 2010 9:25:00 AM

Blogger texasinafrica said...

Thanks, Mike. You're right, the BCZS's often do have better data than anyone else. But in my experience it's usually kind of spotty in that some zones have very solid data while others do not. It would be very interesting to compare the quality of data from zones that have external partners (eg, through Project Axxess, etc.) with those that don't. But if you need systematic data across the country, it's very difficult to find.

Interestingly enough, I'm pretty sure it was a provincial health official who explained all the problems with DRC data to me in the first place. Go fig. :)

Rachel, I can put up a post begging for data. :) I know there are readers of this blog from the World Bank; not sure about OCHA or the WHO.

Tuesday, March 09, 2010 11:47:00 AM

Blogger Rachel said...

Today I've been asked to discover the Adult Literacy Rate in North Kivu.

This is an IMPOSSIBLE task.

Oh, man.

Wednesday, March 10, 2010 10:02:00 AM

Blogger texasinafrica said...

Maybe you could compare the number of high school graduates from the last 20 years with the overall guesstimated population? The EPSP provincial bureau (close to the stadium, near the Red Cross office) should have some data on grad rates or national exam pass rates. An illiterate person couldn't pass those exams.

Wednesday, March 10, 2010 12:18:00 PM

Anonymous Mike said...

N Kivu has a 47.8% over-15 illiteracy rate, according to the 2010 Humanitarian Action Plan.

So, I guess 52.2% adult literacy rate? I am not sure where those statistics come from. Probably a sausage factory, but... Coordination! Probably as definitive as can be.

The 2010 HAP also says that Equateur, Kasai Orientale and Katanga each has more educational emergencies than North Kivu (Page 52), yet North Kivu will get 50% more education money than all those provinces combined (Page 6). Sigh.

Who knows how good the data is, but what does it matter if we don't use it?

Thursday, March 11, 2010 2:51:00 AM

