Friday, August 28, 2015

On inferring causation from correlation

In reading research, a question of enormous practical (and theoretical) importance is: Why do some children of adequate language skills, intelligence and educational opportunities lag behind their peers in their reading ability, or more simply put: What causes reading problems? This question has been tackled for decades, and the answer has proved to be incredibly complex. (There is still no agreed-upon answer to date.)

Experimentally, establishing a causal influence on a behavioural outcome is somewhat tricky. Paraphrasing my undergraduate statistics textbook: there are three points that you need to show before claiming causality. (1) There is a correlation between the outcome measure (e.g., reading ability) and performance on the task which is proposed to cause the variability therein (say, phonological awareness). (2) The proposed cause precedes the outcome that it’s supposed to influence (e.g., phonological awareness at an earlier time point is associated with reading at a later time point). (3) Experimentally manipulating the proposed cause affects performance on the outcome measure (e.g., children become better readers if you train them on a phonological awareness task).

Two procedures are commonly used in reading research to argue for a causal relationship, even though they completely ignore Point (3). An experimental manipulation is essential for making a causal claim: both Points (1) and (2) are susceptible to the alternative explanation that a third factor influences both measures. For example, phonological awareness, even if measured before the onset of reading instruction, may be linked to reading ability in Grade 3, but both of them may be caused by vocabulary knowledge, parental involvement in the child’s education, the child’s intelligence, or statistical learning ability, just to name a few possibilities.
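To make the third-factor problem concrete, here is a minimal simulation sketch. Everything in it is made up for illustration: the variable names, the effect sizes, and the assumption that phonological awareness has no causal effect on reading at all. It is not a model of any real data set. In this invented world, a single third factor drives both measures, so they correlate (Point 1), yet a randomised training that raises phonological awareness leaves reading untouched (Point 3).

```python
import math
import random


def mean(xs):
    """Arithmetic mean of a list of numbers."""
    return sum(xs) / len(xs)


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(2015)  # fixed seed so the run is repeatable
n = 20000                  # number of simulated children (arbitrary)

# Assumption: a third factor (say, vocabulary) drives BOTH measures,
# and phonological awareness (PA) has no causal effect on reading.
third_factor = [rng.gauss(0, 1) for _ in range(n)]
pa = [z + rng.gauss(0, 1) for z in third_factor]
reading = [z + rng.gauss(0, 1) for z in third_factor]

# Point (1): the two measures are correlated anyway.
r_observed = pearson(pa, reading)

# Point (3): randomly assign about half the children to a training
# that raises PA by one unit. Reading is left unchanged, because in
# this invented world PA does not feed into reading.
trained = [rng.random() < 0.5 for _ in range(n)]
pa_after = [p + 1.0 if t else p for p, t in zip(pa, trained)]

pa_gain = (mean([p for p, t in zip(pa_after, trained) if t])
           - mean([p for p, t in zip(pa_after, trained) if not t]))
reading_gain = (mean([r for r, t in zip(reading, trained) if t])
                - mean([r for r, t in zip(reading, trained) if not t]))

print("correlation PA vs reading:", round(r_observed, 3))
print("training effect on PA:", round(pa_gain, 3))
print("training effect on reading:", round(reading_gain, 3))
```

The correlation is substantial even though nothing causal links the two measures, and only the randomised comparison reveals that training has no effect on reading. If phonological awareness did cause reading, the last number would be clearly above zero.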

Most researchers know that correlation ≠ causation, but many seem to succumb to the temptation of inferring causation from structural equation models (SEMs). Paraphrasing my undergraduate statistics lecturer: SEM is a way of rearranging correlations so that it looks like you can infer causality. In a typical SEM diagram, the outcome measure and predictors are represented as boxes, and the unique contribution of each link, obtained by a regression analysis, is written next to the arrow going from that predictor to the outcome measure. Even if a predictor is measured at an earlier time than the outcome measure (thus showing precedence, as per Point 2), this fails to show a causal relationship, as a third, unmeasured factor could be causing both.
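The same point can be sketched for a time-lagged regression, which is a much-simplified stand-in for a single arrow in an SEM (a real SEM has more structure than this). The data, names and effect sizes below are again invented, and the simulated world assumes no causal effect of phonological awareness on reading. The earlier measure still "predicts" the later one, and the link only disappears once the third factor is adjusted for, which is possible here only because the simulation happens to have measured it.

```python
import random


def mean(xs):
    """Arithmetic mean of a list of numbers."""
    return sum(xs) / len(xs)


def slope(xs, ys):
    """Ordinary least squares slope of ys regressed on xs."""
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def residuals(xs, zs):
    """The part of xs that is not linearly predictable from zs."""
    b = slope(zs, xs)
    mx, mz = mean(xs), mean(zs)
    return [(x - mx) - b * (z - mz) for x, z in zip(xs, zs)]


rng = random.Random(28)  # fixed seed so the run is repeatable
n = 20000                # number of simulated children (arbitrary)

# Assumption: the third factor is stable over time and drives both
# PA at time 1 and reading at time 2; PA itself causes nothing.
third_factor = [rng.gauss(0, 1) for _ in range(n)]
pa_time1 = [z + rng.gauss(0, 1) for z in third_factor]
reading_time2 = [z + rng.gauss(0, 1) for z in third_factor]

# The "arrow" from earlier PA to later reading, ignoring the third factor.
naive_slope = slope(pa_time1, reading_time2)

# The same arrow after removing the third factor from both variables.
# This equals the PA coefficient in a regression that also includes
# the third factor as a predictor.
adjusted_slope = slope(residuals(pa_time1, third_factor),
                       residuals(reading_time2, third_factor))

print("slope ignoring third factor:", round(naive_slope, 3))
print("slope adjusting for third factor:", round(adjusted_slope, 3))
```

In real studies the relevant third factor is usually unknown or unmeasured, so the adjusted analysis cannot be run, and the first number is all one gets to see.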

Having just returned from a selective summer school on literacy, I counted a total of four statements inferring a causal relationship from SEMs during this meeting, one of them by a prominent professor. They are in good company. Just to pick one example, a recent paper used SEMs to infer causation (Hulme, Bowyer-Crane, Carroll, Duff, & Snowling, 2012) [1].

While I’m at it, there is another method that has been used to infer causation even though it can’t support such an inference, namely the reading-age-matched design. The logic is as follows: if you compare poor readers to good readers of the same age on any task (say, phonological awareness), you can expect poor readers to perform worse than good readers. This could be because being skilled at this task facilitates learning to read, or the better performance could be a result of more reading exposure among the good readers (because good readers tend to read in their free time, while poor readers don’t). In a reading-age-matched design, one compares a group of readers who are poor for their age to a group of younger readers who are average or good for their age, but whose absolute reading ability is equivalent to that of the poor readers. If poor readers perform worse on phonological awareness tasks than their younger controls, this suggests that the deficit in phonological awareness is not a result of a lack of reading exposure.

There are theoretical problems in matching children on their absolute reading ability, because older poor readers and younger average-to-good readers are unlikely to have identical performance on different aspects of reading (see Jackson & Coltheart, 2001): the control group could vary widely in age and cognitive skill profile, depending on whether the task used to match them measures nonword reading accuracy, word reading fluency, or text comprehension. Even if it were theoretically possible to match poor readers to younger controls in terms of their reading ability, the caveats from SEMs still apply: it is possible that poor phonological awareness and poor reading skills are both caused by a third underlying factor. Although I know of no peer-reviewed paper that explicitly makes a causal claim based on the reading-age-matched design, I have heard such claims in conference talks, and causality is often implied in published papers, without the alternative explanation being stated explicitly.

The TL;DR summary of this post is very simple: It is never OK to infer causality from correlations.

Hulme, C., Bowyer-Crane, C., Carroll, J. M., Duff, F. J., & Snowling, M. J. (2012). The Causal Role of Phoneme Awareness and Letter-Sound Knowledge in Learning to Read: Combining Intervention Studies With Mediation Analyses. Psychological Science, 23(6), 572-577. doi:10.1177/0956797611435921
Jackson, N., & Coltheart, M. (2001). Routes to reading success and failure: Toward an integrated cognitive psychology of atypical reading. New York, NY: Psychology Press.


[1] To be fair, this study also has a training component. Whether the paper makes a convincing claim for a causal relationship is a different question, but either way someone who only has a quick read of the title and abstract may get the impression that SEMs are a tool for assessing causality.