Sunday, May 12, 2013

The Eurasiatic sharpshooter fallacy?

A paper by Mark Pagel, Quentin Atkinson, Andreea Calude and Andrew Meade has the linguistic blogosphere buzzing, so I thought I'd contribute my own entry. Their paper, "Ultraconserved words point to deep language ancestry across Eurasia," was published in Proceedings of the National Academy of Sciences ahead of print, and has already been the subject of fierce criticism from the ultra-doctrinaire community of comparative historical linguistics. The paper applies an intriguing statistical procedure to the LWED database of reconstructed proto-words in seven established language families, and purports to uncover 23 lexical items that unite the families into a Eurasiatic superfamily, much as was proposed many years ago by Joe Greenberg (among others).

I'm not going to contribute a detailed critique of the paper here; I will note that the critique posted on Language Log by Sally Thomason includes the caveat that she is not qualified to judge the statistical procedures. I think that if one is going to critique a scholarly paper, it should really be critiqued in its entirety and not just in bits and pieces, but some may differ on that score.

I think two major criticisms have emerged from the various comments: "garbage in, garbage out" and the "Texas sharpshooter fallacy." The second (raised by Andrew McKenzie of the University of Kansas) is more interesting to me, since it actually concerns the statistical interpretation. This fallacy involves "discovering" hidden structure or clusters in data where there is really no evidence for anything. It takes its name from the tale of a Texas gun for hire who was not a very good shot. Being clever, he took out his two revolvers, fired 12 shots as best he could at the side of a barn, and then painted a target centered on the tightest cluster of bullet holes. He then showed the target to potential clients, claiming to be a sharpshooter.
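The mechanism behind the fallacy is easy to demonstrate with a toy Monte Carlo sketch. The numbers below (a 10x10 wall, 12 shots, a "cluster" meaning two holes within 1 unit of each other) are invented purely for illustration, not taken from any real analysis; the point is just that some tight-looking cluster nearly always appears in random data, so painting a target around one after the fact proves nothing.

```python
import random

def min_pairwise_distance(points):
    """Smallest distance between any two of the points (the 'tightest cluster')."""
    return min(
        ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5
        for i, (x1, y1) in enumerate(points)
        for (x2, y2) in points[i + 1:]
    )

random.seed(0)
trials = 10_000
# Each trial: 12 shots land uniformly at random on a 10x10 "barn wall".
# Count how often at least two shots land within 1 unit of each other,
# i.e. how often a tight-looking "cluster" appears by pure chance.
tight = sum(
    min_pairwise_distance(
        [(random.uniform(0, 10), random.uniform(0, 10)) for _ in range(12)]
    ) < 1.0
    for _ in range(trials)
)
print(f"Fraction of trials with a 'tight cluster': {tight / trials:.0%}")
```

With these made-up parameters, well over half of the purely random trials contain a cluster tight enough for our shooter to paint a target around.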

In the Eurasiatic data, I suppose the worry would be that the 23 "ultraconserved" lexical items found to unite the families could just be randomly similar to each other, but it is hard for me to draw this analogy with the Texas sharpshooter, because the statistical results in the paper are significant enough to seem to rule out problems of this kind. For one thing, there are 7 language families involved, not just two. For another, the 23 lexical items emerge from the typical 200-word Swadesh list comparison. Without any rigorous argument, it seems to me that there is a very low chance of 23 items out of 200 (that's 11.5%) randomly being similar across 7 language families. A commonly cited real instance of a scientific study waylaid by the Texas sharpshooter was a Swedish epidemiological study of 800 medical conditions. The researchers found a significant difference in the incidence of one ailment out of 800 among people who lived near electric transmission lines (this example is cited on the Wikipedia page about the Texas sharpshooter). That result is now regarded as not reproducible, an instance of the Texas sharpshooter at work. But let's take note that 1 ailment out of 800 is quite different from 23 words out of 200.
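This contrast can be made a bit more concrete with a back-of-the-envelope calculation. Note that this is my own sketch, not anything from the paper: the 5% per-item chance of coincidental similarity in the second scenario is a number I am inventing purely for illustration, and the first scenario just treats the 800 medical conditions as 800 independent tests at the conventional 0.05 significance level.

```python
from math import comb

def binom_tail(n, k, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

# The epidemiology case: 800 independent tests at alpha = 0.05.
# With that many comparisons, at least one "significant" result is
# all but guaranteed under the null, so 1 hit out of 800 is unremarkable.
p_at_least_one = 1 - (1 - 0.05) ** 800
print(f"P(at least 1 false positive in 800 tests): {p_at_least_one:.6f}")  # ~1.0

# The Eurasiatic case, under an invented null: suppose each of 200
# Swadesh items independently had a 5% chance of looking coincidentally
# similar across the families. 23 or more hits would then be quite unlikely.
p_23_of_200 = binom_tail(200, 23, 0.05)
print(f"P(23 or more of 200 items by chance): {p_23_of_200:.2e}")
```

Under these admittedly crude assumptions, one false positive out of 800 tests is practically certain, while 23 chance hits out of 200 items is a small-probability event — which is roughly the intuition behind my reluctance to apply the sharpshooter analogy here.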

Quentin Atkinson assured me that he stands behind this paper, and he may yet have to defend it in the pages of PNAS or some similar platform. These authors are not going to make a clean getaway with such a provocative proposal, now that the anti-mass-comparison folks in comparative linguistics have gotten wind of it. My own view in general is that we should embrace new sources of evidence in linguistics, rather than closing ranks and insisting that methods developed over a century ago are the only way. Let's not forget that the standard comparative method is so strict that it can be carried out "by a trained eye," without any statistical processing. Surely there must be some kind of computational analysis that can go beyond this.