Simpson’s Paradox In The Slush Pile

In 1973, a time of pitchforks, flaming bras and napalm, the University of California, Berkeley received a total of 12,763 applications to its graduate programs: 8,442 from men and the remaining 4,321 from women. (Let me forestall impertinent questions by quickly adding that before Al Gore invented the net, gender used to be a simple binary affair: you were either male or female; there were no in-betweens, no undecideds, no none-of-the-aboves and no you-tell-mes.) Of this hopeful lot, Berkeley admitted 3,738 men and 1,494 women. In other words, about 44% of the men were admitted, compared with 35% of the women: a gap of about nine percentage points. A woman applying to Berkeley’s graduate programs was roughly nine percentage points less likely to be admitted than a man. That, as statisticians like to joke, smelled of Fisher.
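The arithmetic is easy to check. Here is a minimal Python sketch that uses only the counts quoted above; the variable names are my own:

```python
# Berkeley graduate admissions, 1973 (counts as quoted in the text).
applied = {"men": 8442, "women": 4321}
admitted = {"men": 3738, "women": 1494}

# Admission rate = admitted / applied, for each group.
rates = {g: admitted[g] / applied[g] for g in applied}

total_applied = sum(applied.values())                # all applications
gap_points = 100 * (rates["men"] - rates["women"])   # absolute gap, percentage points
relative_drop = 1 - rates["women"] / rates["men"]    # gap relative to the men's rate

print(f"total applications: {total_applied}")
for g, r in rates.items():
    print(f"{g}: {100 * r:.1f}% admitted")
print(f"gap: {gap_points:.1f} percentage points")
print(f"relative to men, women were {100 * relative_drop:.0f}% less likely to be admitted")
```

Unrounded, the rates come to about 44.3% and 34.6%, so the gap is a little closer to ten points than nine. It also helps to keep the absolute gap (in percentage points) apart from the relative one: measured against the men's rate, women were about 22% less likely to be admitted.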

Assuming that women candidates were as qualified as the male candidates, what could explain why the discrepancy in admission rates was so large? Sexism, of course. There was no use pointing out that Berkeley was in effing California, not Louisiana. No use pointing out that 1,494 women had been admitted. Token women didn’t count. Women twice as good as men didn’t count twofold. Men as gentle as temple cows didn’t count. What counted was the 9%.

Eugene Hammel, the male Associate Dean of Graduate Studies, had the bright idea of asking Peter Bickel, a male statistics prof who was on the board of the Grad Council at Berkeley, to analyze the admissions data. The resulting analysis by Bickel, Hammel and O’Connell, published in Science in 1975, is now a statistical classic. They showed that on a department-by-department basis, if there was a bias, it was a slight one in favor of admitting women over men.

How was this possible? How could it be that at a departmental level, women were as likely, if not slightly more likely, to be admitted, yet the overall admission rate for women was nine percentage points lower than that for men? Was it… Could it be… Could it really be just… ARITHMETIC!!!

Yes. While the odds of admission did favor women on a department-by-department basis, the departments were not all equally selective. Some were notoriously hard to get into, admitting only a small fraction of their applicants; others admitted a large share of theirs. What Bickel and gang showed was that women were applying in greater proportion to the more competitive programs rather than the easy ones, and so were getting rejected at higher rates. Men, strategically unambitious as always, were spread much more widely across the departments, and so their slight disadvantage in odds was more than offset by the fact that more of them had sent their sweet nothings to the floozy departments. It was a tale with a statistical villain.
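Here is a toy version of that arithmetic, sketched in Python. The two departments and all the numbers are invented for illustration and are not the Berkeley data; they share only the shape of the problem: most women apply to the hard department, most men to the easy one.

```python
# Toy data, invented for illustration (NOT the 1973 Berkeley figures).
# department -> (applicants, admitted)
women = {"hard": (100, 20), "easy": (20, 16)}
men = {"hard": (20, 3), "easy": (100, 75)}


def rate(applicants, admitted):
    """Admission rate within a single department."""
    return admitted / applicants


def overall_rate(group):
    """Aggregate rate: total admitted divided by total applicants."""
    total_applicants = sum(a for a, _ in group.values())
    total_admitted = sum(k for _, k in group.values())
    return total_admitted / total_applicants


def weighted_rate(group):
    """The same number, written as a weighted average of department rates.

    Each department's weight is the share of the group's applications
    that went to it; this is where the two groups differ.
    """
    total_applicants = sum(a for a, _ in group.values())
    return sum((a / total_applicants) * rate(a, k) for a, k in group.values())


for dept in women:
    print(f"{dept}: women {rate(*women[dept]):.0%}, men {rate(*men[dept]):.0%}")
print(f"overall: women {overall_rate(women):.0%}, men {overall_rate(men):.0%}")
```

Women come out ahead in both departments (20% vs 15%, 80% vs 75%) yet far behind overall (30% vs 65%). The `weighted_rate` function makes the reason explicit: both groups' overall rates are averages of department rates, but with very different weights.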

The villain’s name is Simpson’s Paradox. It is a statistical paradox that can arise whenever we compare rates or averages computed over aggregated groups. It sometimes happens that a statement is true of every subgroup (“Compared with men, women have slightly higher odds of getting admitted to engineering/humanities/sciences/architecture/…”), but when you aggregate over all the groups, the statement turns false (“women have significantly lower odds of getting admitted”). Simpson’s Paradox (or rather, the potential for it) plagues mixtures, heterogeneous data and population studies of all kinds. It is perhaps the closest thing there is to the problem of evil in statistics.
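For readers who like their villains in symbols, here is one way to write the paradox down; the notation is introduced here for illustration. Let $p_{g,d}$ be the admission rate of group $g$ (W for women, M for men) in department $d$, and let $w_{g,d}$ be the share of group $g$’s applications that went to department $d$. Each group’s aggregate rate $\bar{p}_g$ is a weighted average of its department rates, and the paradox is that an inequality can hold in every department while reversing in the aggregate:

```latex
\begin{aligned}
\bar{p}_g &= \sum_{d} w_{g,d}\, p_{g,d}, \qquad \sum_{d} w_{g,d} = 1,\\
p_{W,d} &> p_{M,d} \ \text{ for every } d, \quad \text{and yet} \quad \bar{p}_W < \bar{p}_M .
\end{aligned}
```

The reversal needs the weights to differ between the groups: if $w_{W,d} = w_{M,d}$ for every $d$, the per-department inequality carries straight through to the aggregate.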

So how is all this relevant to science fiction? Well, say there’s this fantasy world with two groups (genders) of writers (candidates): West and Other. Both groups have more or less the same distribution of talent. There are fewer Other writers than Western ones, and some chaps belong to both groups, but never mind that. Writers send in their stories to SF&F outlets (departments). Each outlet’s acceptance (admission) procedure is decided by an Editor. Not all the outlets are equally easy to get into. Even though most outlets are in the West, the Others have a slightly better chance on a per-outlet basis (because editors in this world act to encourage new voices). However, it turns out that the Others mostly submit to the harder-to-get-into outlets. Why? Well, these are the well-known ones, and if you’re in Pune, India, why send a story to the Vampire Gnome Anthology when, for the same time and postal expense and a much greater potential payoff, you could send it to the New Yorker? And so it happens that the overall acceptance rates of Westerners and Others differ greatly. In this speculative, hypothetically liberal world, Simpson’s Paradox, not racism, is the villain.

That world may not be our world. Our world has editors like William Sanders. But it also has editors who were willing to take chances with my writing, some of it truly godawful. So it’s hard to be sure. I’m going to give it the benefit of the doubt. Besides, I’d take doubt any day over the certainties of pitchforks, flaming bras and napalm.

***

Acknowledgements:

Featured image is from James Dean’s wonderful site Mighty Optical Illusions.

There are 14 comments

  1. abha

    I like this take, Anil.
    This piece opened my eyes to the problems of arithmetic, and that certain certainties are not certain. Apart from Simpson's Paradox being in existence, I also like the fact that you have sent it to the slush pile, banking more on the uncertainties of the real world. Kudos to you for taking your chances and getting published in an uncertain and wobbly world.

    Abha

    • anilMenon

      Thanks Abha. It's a ubiquitous paradox. Most recently, it reared its head in data on SAT score gains across ethnic groups: American Indians, Hispanics and Whites gained an average of 8 points; Puerto Ricans, 18 points; African Americans, 19 points; Asians, 27 points. Overall, though, the national SAT averages have dropped a few points. The paradox is very familiar to statisticians, but it deserves to be more widely known.

  2. KVM

    Very nice post, glad to see you writing expository articles again. You're an absolute pro at that 🙂

    Simpson's Paradox knocked me off when I read it a few months ago, too. That, combined with reading Taleb's “Fooled by Randomness” (which very nicely chronicles different kinds of fallacies associated with conditional probability, with high stakes at play to boot), totally changed my view of 'hard numbers'.

    As a slightly academic extension to this post, you'll certainly like this: Anscombe's quartet 🙂

    • anilMenon

      Hi Mohan: Thanks for the Taleb and Anscombe's quartet links. Probability is endlessly fascinating. It's a puzzle to me why there's only one probability theory when there are multiple geometries and multiple algebras. For example, there's no obvious reason why conditional probability *has* to be defined the way it is.

      And glad u like the expository pieces. Hope u know u are only encouraging the illness. 🙂
