"Africa is, indeed, coming into fashion." - Horace Walpole (1774)


how social scientists think: correlation is not causation

Whenever I teach students about the difference between causation and correlation, I try to have them do something ridiculous. I might have one student repeatedly flip the light switch while having another jump up and down while another sings "I'm a Little Teapot." Then I ask, what caused the light to go on and off?

Students generally roll their eyes as they answer, "flipping the switch," but the point is clear: just because events happen at the same time doesn't mean that one caused the other. They learn one of the key principles of all sciences: correlation does not imply causation.

Social scientists spend our time trying to figure out whether phenomena are causally related - that is, whether one event or occurrence or circumstance causes another event to happen. We want to explain how and why specific events are related to one another in hopes of being able to explain more generally why similar events are related to one another.

Problem is, it's a lot harder to determine causality in the real world than it is with a ridiculous example in the classroom. It's especially difficult when there are multiple causes for an event, as is the case with violence in the eastern Congo. It's impossible to ever be 100% sure we have correctly determined the cause of an event, but we can reach a reasonable degree of certainty and have developed a number of means by which we can (hopefully) avoid confusing correlation and causation.

We do this through a couple of mechanisms. One is to isolate variables. Variables are just another way of talking about causes (which we call "independent variables") and effects (which we call "dependent variables"). Of course, most human behaviors and situations involve far more than just two variables, so we try to control for the effects of the others. This is pretty easy using statistical analysis; a social scientist using that method will use math (and, these days, sophisticated software) to control for the effects of the other variables so that she can look only at the one she thinks matters. She can then run statistical tests to determine whether she can establish a reasonable degree of certainty that the cause she has identified is indeed producing the observed effect.
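To make that concrete with a toy example (my own invented numbers, not from the post or any real study): suppose a hidden third variable z drives both x and y, so x and y are correlated even though neither causes the other. "Controlling for" z, here by the crude method of stratifying on it, makes the association largely disappear.

```python
import random
import statistics

def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

random.seed(42)
n = 2000
# z is a hidden confounder that drives both x and y.
z = [random.random() < 0.5 for _ in range(n)]
x = [int(zi) + random.gauss(0, 0.5) for zi in z]
y = [int(zi) + random.gauss(0, 0.5) for zi in z]

# The raw correlation between x and y looks substantial...
overall = pearson(x, y)

# ...but within each stratum of z, the association vanishes,
# because x and y share no cause once z is held fixed.
x0 = [xi for xi, zi in zip(x, z) if not zi]
y0 = [yi for yi, zi in zip(y, z) if not zi]
x1 = [xi for xi, zi in zip(x, z) if zi]
y1 = [yi for yi, zi in zip(y, z) if zi]

print(round(overall, 2))           # sizable correlation
print(round(pearson(x0, y0), 2))   # near zero within stratum
print(round(pearson(x1, y1), 2))   # near zero within stratum
```

Stratification is the simplest way to show the idea; regression with z as a covariate is the workhorse version of the same move.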

With qualitative methods, it's a lot harder to establish causality, because the goal of causal inference is to determine what the effect would have looked like had an event or circumstance not happened. We call this idea of what could have been the "counterfactual." But real life doesn't usually allow us to establish counterfactuals (although the world of randomized controlled trials is now opening up all kinds of possibilities in this regard). We can, however, look for real-life counterfactuals, or places in which natural controls are in place. For example, I have an observed effect in my research which suggests that ethnicity may be an important causal variable, but I'm not certain enough about that to publish it yet. However, I have some new data from a town which is ethnically homogeneous. It's my hope that the data from this town will function as a kind of natural control, which will help me figure this out with a higher degree of certainty.
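Here's a minimal sketch of why randomization helps with the counterfactual problem (again, my own invented numbers, not a real study): each unit has both a treated and an untreated potential outcome, but in real life we only ever observe one of them. A coin-flip assignment makes the two groups comparable on average, so the simple difference in group means recovers the true effect.

```python
import random
import statistics

random.seed(7)
n = 10_000
true_effect = 2.0

# Each unit has two potential outcomes; in real life we only
# ever observe one of them -- the other is the counterfactual.
baseline = [random.gauss(10, 2) for _ in range(n)]
y_untreated = baseline
y_treated = [b + true_effect for b in baseline]

# Randomized assignment: a coin flip decides who is treated,
# so treated and untreated groups are alike on average.
treated = [random.random() < 0.5 for _ in range(n)]
obs_t = [y_treated[i] for i in range(n) if treated[i]]
obs_c = [y_untreated[i] for i in range(n) if not treated[i]]

# Difference in observed group means estimates the true effect.
estimate = statistics.mean(obs_t) - statistics.mean(obs_c)
print(round(estimate, 1))  # close to the true effect of 2.0
```

A "natural control" like the ethnically homogeneous town works on the same logic: it stands in for the coin flip that real life didn't provide.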

The distinction between causation and correlation - and the obsession with making sure the two are not confused - sets quality research apart from shoddy or sloppy research. It's incredibly frustrating to me to read a hastily put-together advocacy report or journalist's account that assumes correlation means causation, despite the lack of evidence for such a claim. I understand why it happens; advocates and journalists have to work quickly, and if they talk to people who don't understand the difference, how would they know otherwise? But it's incredibly frustrating to see these errors made, especially when they lead to bad policy decisions.

Advocates, what do you think? Do most researchers in your field do a good job of distinguishing between correlation and causation? How could we better work together to make sure that the causes we're identifying are actually the causes of various events?



Anonymous Ash said...


This is by far the error that is most manipulated, by advocates, journalists and politicians. Sometimes it's ignorance, but more often it's deliberate and planned. If only everyone in the wider world could understand this difference and take it to heart, I think the world would be a much better place!

Check this out, this is the best summary of correlation vs causation that I've seen:


Keep up the great blogging!

Thursday, October 21, 2010 1:23:00 AM

Blogger placenta sandwich said...

Ha! So I work in various aspects of abortion care and related research, and I daresay we've got one of the more exciting fields for those looking to start a career in conflating correlation with causation. Anti-abortion advocates are pretty stellar at doing this, and have no intellectual scruples or even moral scruples when it comes to getting others to believe what they want about supposed ill effects of abortion.

You may be familiar with the long back-and-forth about "the abortion-breast cancer link" (which they're reviving this month for breast cancer awareness month, of course!) but that was more an issue of glossing over systematic biases: recall bias in a case-control study, for example. At the moment I'm drafting a post about anti-abortion people playing "telephone" with research showing a correlation between intimate-partner violence and abortion, until they are outright stating that abortion causes violence against women, when in fact that was an odds-ratio and the violence was in women's past. (Of course, from MY advocacy point of view, I wouldn't be surprised if a lot of those women were not fully in control of sexual decisions affecting them -- i.e. assault by partners. So abortion might be a logical recourse.)

Thanks for writing this. I love public discussions about this because it affects so many issues and professions, and I wish there were a way to get more of THIS kind of information into the hands of regular people who read a news-piece and assume the journalists are well-versed in research methods.

Thursday, October 21, 2010 8:08:00 AM

Blogger texasinafrica said...

Ash, I keep that XKCD cartoon on my office door!

To be fair, I think plenty of social scientists manipulate and/or poorly understand the distinction between correlation and causation, too. The difference in our field is that those mistakes are usually caught through the peer review process.

Thursday, October 21, 2010 9:06:00 AM

Anonymous Bradford said...

Thanks again for this very interesting series of posts. (And I hope I'm not abusing the comment field by posting again!)

I would be very glad if, within this series of posts, you could explain your thoughts on "absence of evidence v. evidence of absence." If an academic were able to disprove theory A, that would be nice; and it would be really nice if an academic were able to also prove alternative theory B. But it seems that some of the dissonance between academics and other actors results when researchers point out the absence of evidence for theory A but don't actually disprove it. If an academic then proposes alternate theory B but doesn't provide proof for this either, then it seems like a wash. Academics occasionally give the impression that observing the lack of evidence for theory A somehow makes their assertion of theory B, also unproven, more valid. But in these situations, nobody really has "proof" for or against either theory; each has a working theory based on either impressions they formed in the field or stuff they picked up from a colleague in the hall.

How do you think we should deal with situations like this?

Thursday, October 21, 2010 9:30:00 AM

Anonymous Dave Algoso said...

My 2 cents: I'm with you on the dangers of misunderstanding causation. Advocates are bad, journalists are worse. But for the advocates, I don't think that time pressure is the major factor leading to bad analysis. I think they jump to conclusions because they want so badly to be able to do something with the analysis. They want policy solutions and programmatic options to address what they see as problems in the world.

And this is my key frustration with the academic approach: too much academic analysis tells us nothing about how a given actor can actually make change. Knowing that X is the root cause of Y doesn't tell us anything about how to change X, or even if we (broadly defined) can change X. For example, if the analysis tells us that ethnicity is a key causal variable - then what? Should we look for ways to influence ethnicity? In reality, every Y can be described in terms of a hundred causes. Analysis that tells us which one is objectively most important is not as useful as analysis that tells us which ones are important and provide opportunities for action. So I want analysis that gives us (again, broadly defined to include actors within the communities in question) something to do about a problem.

To me, that's the benefit of RCTs. Not specifically that they relate cause and effect, though they do, but that they are focused on finding levers for action rather than abstract root causes.

Thursday, October 21, 2010 10:02:00 AM

Anonymous Daniel said...

Placenta Sandwich puts me in mind of an exacerbating error that I often see in my students (would-be advocates and policy analysts, hereabouts), which is something like a "fallacy of moral linkage." They often seem to start with a view that X is bad, and then try to find evidence that X also leads to some less-controversially-bad thing - which often is just correlation. Or, they will see two bad things happening more or less together, one of which is more easily manipulated as a policy lever and jump right to the idea that you can get a handle on the harder one by dealing with the easier one. The conflation of correlation and causation is, of course, most tempting when you really *want* there to be a link.

In terms of how to deal with it...? I'm not sure there's a better answer than teaching them methods - my students who are the best about avoiding these fallacies are the ones best equipped to investigate them and test their hypotheses, or at least consume and digest the research of others who have.

Though, some of these posts are making me question whether I've sufficiently caveated some of my claims in my *own* work...

Thursday, October 21, 2010 12:02:00 PM

Blogger katelmax said...

First, thank you, thank you, thank you for taking the time to lay out how social scientists think.

Actually, when you started this series I had a set of ideas I had been mulling over about why many advocates tend to come to different conclusions than people who study situations, i.e. academics or people at think tanks etc. I think Dave is correct. Advocates come from a framework with a question that begins with "what can we do about x?" as opposed to "what do we know about x?", which is the first question for most analysis from academics. Sometimes, if the academic finds that he or she does indeed know something, anything about x, then he or she might go on to the second question.

This to me also goes a long way in explaining all the phenomena you've covered in this series and why academics get frustrated with advocates and vice versa. They are not seeking the same ends, and thus do not use the same means.

Thursday, October 21, 2010 12:57:00 PM

Blogger texasinafrica said...

Bradford, great question - it's one I'll be touching on in tomorrow's post, which is about uncertainty and complexity.

Dave, you hit the nail on the head. Thanks everyone for the great contributions!

Thursday, October 21, 2010 1:23:00 PM

Blogger Ryan said...

A tiny correction, and I may just be misreading what you wrote: adding additional independent variables doesn't actually do anything to help solve the correlation/causation problem. All that it does is let the researcher say that X and Y might correlate even when some other variable Z is controlled for.

In many of the questions that I ponder (and then don't try to answer) it seems like qualitative methods have a leg up in demonstrating causality within narrow circumstances. Statistics can show generalizability, but without instrumental variables or real experiments, demonstrating causation is hell.

Thursday, October 21, 2010 4:52:00 PM

Anonymous Don Stoll said...

Dave Algoso's suggestion above, seconded by Katelmax, that advocates start with "what can we do about X?" whereas academics start with "what do we know about X?" looks on the money, though it doesn't address why somebody might prefer one starting point to the other. In fairness to advocates, their free and easy attitude to knowledge does not seem extraordinary viewed against the backdrop of a generally casual stance toward knowledge.

I think, for example, of the prosperous Tanzanian businessman who a few weeks ago explained to me why his country's "lazy" villagers were doomed to remain poor indefinitely. Mild weather, good for growing crops and for living simply, was bound to produce a laziness which, determined to protect itself, therefore cultivated other vices. One of these was lying, which his poor countrymen practiced on naive white people like myself in order to take advantage of us. Ultimately their trick, the businessman assured me, consists in staying poor enough to go on collecting handouts, without becoming so poor that one also becomes miserable.

What prospect does reason have of reversing this gentleman's opinions, so clearly driven by his passions? Small wonder that many among us simply advocate that "We have to do something even if"--aren't these Bob Geldof's words?--"it doesn't work"!

Friday, October 22, 2010 12:56:00 AM

Blogger Matt Davies said...

Great article - the light's going on and off and "I'm a little teapot" being sung is an ingenious bit of teaching.

A piece of news this week from the UK rebuffed one old causation chestnut. It was reported that non-violent crime has gone down in the UK, despite the recession. So the next time we hear rising crime being correlated with rising poverty, let's think about light switches and teapots!

Saturday, October 23, 2010 12:57:00 PM

