Sunday, September 21, 2014

Cherry-picking data, patterns, hypotheses, and scientism.

     “Lies, damned lies, and statistics.” That’s the dismissive phrase trotted out when someone disagrees with a claim that doesn’t fit their biases. “Cherry-picking data” follows close on its heels. These aren’t exactly ad hominem attacks, but they come close. Only a scoundrel would select only those numbers that support their claim. Only a scoundrel would ignore the data that might refute it.
     And so it goes.
     But science depends on cherry-picked data. A new insight usually starts when someone sees as weird something that everyone else sees as ho-hum, run-of-the-mill background to what really matters. Or that everyone else has dismissed as already explained by some existing paradigm. Or as just some accidental oddity that doesn’t mean anything. But all of that is cherry-picking data. It’s to see a signal in the noise that no one else has seen. Consider how the Higgs boson was discovered: by picking out a mere handful of events and calculating the odds that these few events are not mere random glitches in the data. The Higgs boson was discovered by planning to cherry-pick the data that implied its existence.
     That humans see patterns all around them is a cliché. That most of them are constructs of our propensity to see patterns is another one. But every now and then one of these patterns turns out to be significant. It’s really there. And its existence and shape raise questions. Possible answers to those questions amount to a hypothesis. Framing it so that it can be tested against new or different data takes imagination.
     Inference: doing science starts when someone picks out the relatively rare significant differences between what we expect to see and what we actually see, or notices the oddness of a familiar patch of reality. It’s when these skills are used to support an a priori hypothesis that we get not science but scientism. Then the accusations at the head of this essay are relevant.

