The Nature of Things
There is something ethereal about human intelligence, something hard to pin down. It’s hard even to define. Is intelligence the ability to reason? Does it have to do with memory? Is it aptitude with language? With mathematics? All of the above? Plenty of folks would go so far as to say that you just can’t measure intelligence. Take the man credited with creating modern intelligence testing, French psychologist Alfred Binet, who wrote: “Intellectual qualities are not superposable and therefore cannot be measured as linear surfaces are measured.” This business is complicated, warned Binet; unlike the hundred-yard dash, it has no objective outcome.

According to others, however, our picture of intelligence is perfectly lucid. Many scientists believe that we long ago solved the problem of intelligence testing, thanks in part to Binet himself and to a pair of early-twentieth-century scientists, Karl Pearson and Charles Spearman, whose work created a means of quantification.

Modern intelligence testing is coming up on its one-hundredth birthday. Unlike many of the landmark scientific ideas of a century ago, however, the idea of testing intelligence has failed to gain a consensus of believers in the sciences, though it certainly enjoyed moments of prosperity during the twentieth century. In fact, those scientists who most focus their attention on intelligence are more fractured now than ever about our ability to measure it and about our methods of doing so. Where we are, finally, is really where we’ve been from the outset: confronting the dubious nature of testing, its misuse and sometimes sordid history, and its uncertain future.

The first real scientific attempt to study human intelligence began in the early nineteenth century, with the strange (by today’s standards) idea that the measurement of skulls revealed something of intellect.

The thinking went that the larger the skull, the larger the brain, and the larger the brain, the higher the intelligence. This idea, called craniometry, was born of an earlier science called phrenology, in which folds of the brain were associated with intellectual properties. The theory amounted to little more than a sneaking suspicion that the brain had something to do with intelligence and psychological functions.

Until this point, the concept of intelligence had been the sole province of philosophers like Descartes and Locke, whose speculation raised many interesting questions about man’s consciousness and ability to reason. But their era lacked the tools of investigation necessary to explore those ideas empirically.

It wasn’t until the nineteenth century that the development of scientific tools began to bring forth an ever-expanding arsenal of gadgets and ideas with which to combat the centuries of ignorance. With these new tools came a firm belief that everything could be explained — the formation of continents, the stars — even human intelligence.

Scientific Racism
Craniometry owes its importance in history to a nineteenth-century American scientist named Samuel George Morton, who popularized the idea and practice. He devised a system of filling empty skulls with small seeds and then removing the seeds to measure the volume. Naturally, this required that the subjects be dead, and it meant that the only “results” were comparative skull sizes of various groups, which led to hypotheses about those groups. Later, Paul Broca, the Frenchman famous for discovering an area of the brain associated with motor speech, replaced the seeds with lead shot, but craniometry remained otherwise static for nearly a century.

Like much of the intelligence testing to come, craniometry evolved from a desire to scientifically explain the success and perceived superiority of people of northern European descent, and the relative failure of other groups. This “scientific racism” was pervasive in nearly all early tests of intelligence.

Morton used his vast personal skull collection, which included hundreds of Native American skulls as well as a handful from ancient Egypt, to rank all the known races in order of skull size, and, consequently, intelligence. His results were predictable: Germans, English and Anglo-Americans on top; Native Americans and Blacks on the bottom. It wasn’t uncommon for Morton or any other craniometrist to employ extraordinary mathematical acrobatics to verify these shaky hypotheses. In fact, craniometry proved no more a science than astrology, and in truth needed some help.

Binet Discovers Discrepancies
Enter Alfred Binet. Binet was commissioned in 1904 by the minister of public education in France to develop a method for identifying children who might benefit from special education curricula.

Binet initially trusted the legacy of Broca and the wisdom of measuring head size to ascertain intelligence. Eventually, however, he began finding disconcerting data in his pools of subjects. For instance, he discovered that head size did not consistently predict intelligence as evinced by school performance. And he also began to find discrepancies between the measurements his assistant had taken and the ones he himself had taken, a fact that finally dismantled the practice of craniometry.

“The idea of measuring intelligence by measuring heads seemed ridiculous,” Binet wrote in 1900. “I was on the point of abandoning this work, and I didn’t want to publish a single line of it.”

I.Q. and the Birth of the Test
Four years later, Binet did abandon those methods. He came up with a series of tests related to common tasks involving reasoning, comprehension, invention and censure. In 1905, Binet published these tasks as the first Binet scale, and modern intelligence testing was born.

The Binet scale graded subjects relative to others the same age. The resulting “mental age” was then subtracted from the subject’s chronological age, and this figure represented an approximation of scholastic aptitude.

The term intelligence quotient, or I.Q., arrived seven years later when German psychologist W. Stern argued that Binet’s score should be revised so that mental age be divided by chronological age, not subtracted.
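Why the change from subtraction to division matters is easiest to see with numbers. The short sketch below is only an illustration: the function names and the two sample children are hypothetical, and it computes nothing beyond the two operations described above (it does not include any later scaling of the quotient).

```python
def binet_difference(mental_age, chronological_age):
    """Binet's original score: mental age subtracted from chronological age."""
    return chronological_age - mental_age


def stern_quotient(mental_age, chronological_age):
    """Stern's revision: mental age divided by chronological age."""
    return mental_age / chronological_age


# Two hypothetical children, each scoring two years below their age group.
young = {"mental_age": 2, "chronological_age": 4}
older = {"mental_age": 10, "chronological_age": 12}

young_difference = binet_difference(**young)
older_difference = binet_difference(**older)
young_quotient = stern_quotient(**young)
older_quotient = stern_quotient(**older)

print(young_difference, older_difference)  # subtraction treats the two cases alike
print(young_quotient, older_quotient)      # division separates them
```

Under subtraction, a two-year lag looks the same at age four as at age twelve. Under division, the four-year-old’s lag registers as far more severe, since the result is relative to the child’s age rather than absolute.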

Binet felt the scale was useful for identification of children with special needs only, and was dangerous if employed otherwise. He believed that an emphasis should be placed on training those with low scores, but that they should not be labeled. He refused to use the term “intelligence test,” as intelligence was not a quality he believed his scale measured. Additionally, he did not believe that the tests should be used to rank children or that they should be used to evaluate “normal” children.

Binet proved to be a solid and cautious scientist. It was not long, however, before each of his prescient warnings was cast aside by a new generation of testers, and the floodgates of I.Q. testing opened in full.

“The Menace of the Feeble-Minded”
The term “moron” (Greek for “foolish”) survives in our lexicon today as a sitcom-friendly, though impolite, way of saying someone is dull-witted. The term was coined less than a hundred years ago by the American scientist H.H. Goddard, who sought a more precise term to describe people formerly known as “feeble-minded.”

Goddard was the director of research at New Jersey’s Vineland Training School for Feeble-Minded Girls and Boys in the years before the First World War. His work led him to extreme ideas about heredity, intelligence testing, and social policy. After identifying “morons” by name, he set about entreating public policy-makers to do something about what he came to describe as “the menace of the feeble-minded.”

In particular, Goddard wanted to segregate and institutionalize people based on intelligence and prevent them from breeding. In his vision resided the worst of Binet’s fears about intelligence testing. Just three years after Binet developed his scale, the test crossed the Atlantic and gave rise to the American eugenics movement.

Goddard began testing immigrants at New York’s Ellis Island using his translation of the Binet scale. He found that forty percent of the immigrants fell into the newly formed “moron” class, which he and his colleagues believed was a group doomed to crime and poverty. An especially troubling reality was that prospective test-takers were identified based on their appearance; that is, Goddard’s staff was literally hunting for people they thought “looked” like “morons.” Naturally, the deportation rates for mental deficiency skyrocketed under his leadership. Incidentally, some twenty years later, Goddard recanted nearly all of his views on heredity and intelligence testing.

The Rise of Mass Testing
If H.H. Goddard was responsible for bringing Binet’s scale to America, his contemporaries, Stanford professor Lewis M. Terman and Harvard psychologist R. M. Yerkes, were responsible for popularizing it. In 1916, Terman expanded the scale dramatically and gave it a new name: the Stanford-Binet. It was to become the standard for mental testing in the twentieth century, and all tests that followed were really just variations.

As with so much science in U.S. history, it was war that gave intelligence testing the boost it needed to gain prominence. As the country geared up to enter the Great War in 1917, Yerkes convinced the U.S. government that intelligence tests would benefit the army’s placement of recruits. A pencil-and-paper version of the Stanford-Binet test was quickly created, and almost overnight, the intelligence test had a mass market. Over the next two years, nearly 1.5 million I.Q. tests were given to army recruits. And in the first three years following the war, two million more children were tested for the purpose of tracking.

The Advent of Factor Analysis
Back in 1904, the same year Binet was assigned to his task in Paris, British psychologist Charles Spearman applied to a problem of psychology a new statistical technique called factor analysis, a method invented by Karl Pearson in 1901 to reduce a series of relationships between two or more variables to one quantifiable score, or correlation coefficient. The event would have a wholesale impact on the future of psychology, not to mention science at large. It meant that even psychology, then struggling for a solid reputation among the sciences, could now be measured by statistics.

When Spearman applied factor analysis to mental testing he discovered something curious: that people who did well on one kind of mental test (a verbal skills test, say, or a memory test), tended to do well on other kinds, and those who did poorly on one tended to do poorly on other kinds. He named this global capacity phenomenon the “general intelligence factor,” or g, for short, and nobody could have guessed at the turmoil it would cause over the next hundred years.
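The pattern Spearman noticed can be sketched with made-up numbers. The example below is a rough illustration under stated assumptions, not his procedure: the data are simulated (a shared ability is deliberately built into every score), all function names are hypothetical, and the single summary factor is approximated with the dominant principal component of the correlation matrix rather than with Spearman’s own method.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def simulate_scores(n_people=2000, n_tests=4, loading=0.7, seed=0):
    """Simulated data: each score = loading * shared ability + independent noise."""
    rng = random.Random(seed)
    noise_scale = math.sqrt(1 - loading ** 2)
    tests = [[] for _ in range(n_tests)]
    for _ in range(n_people):
        shared = rng.gauss(0, 1)  # the built-in common factor (an assumption)
        for scores in tests:
            scores.append(loading * shared + noise_scale * rng.gauss(0, 1))
    return tests


def correlation_matrix(tests):
    """Pairwise correlations between every pair of tests."""
    return [[pearson(a, b) for b in tests] for a in tests]


def first_component_share(matrix, iterations=200):
    """Fraction of total variance carried by the dominant eigenvector,
    found by power iteration on the correlation matrix."""
    k = len(matrix)
    v = [1.0] * k
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(
        v[i] * sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k)
    )
    return eigenvalue / k  # the trace of a k-by-k correlation matrix is k


tests = simulate_scores()
corr = correlation_matrix(tests)
share = first_component_share(corr)

for row in corr:
    print(" ".join(f"{value:5.2f}" for value in row))
print(f"share of variance on the first component: {share:.2f}")
```

Every pair of simulated tests correlates positively, and one component accounts for well over half of the total variance. Because the common factor was put into the data by construction, the sketch shows only what such a pattern looks like; it says nothing about whether real test scores arise this way.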

The general intelligence factor was meant to do exactly what Alfred Binet said couldn’t be done: measure intelligence “as linear surfaces are measured.” If this could be done, then an obvious inference followed: if we know who is smart, then we can figure out what made them that way by exploring their similarities and finding connections among them. Conversely, the same is true of those whose test scores show them to have below-average intelligence. These ideas (one does not exist without the other) would eventually lead researchers into volatile debate over, among other things, race.

Nature vs. Nurture and The History of g
The idea of g and intelligence testing as a whole thrived in the United States until the 1930s. The rise of behaviorism in the social sciences, coupled with liberalism in schools and relativism in anthropology, then decreased the importance of a general intelligence, at which time g took up its place at the center of the intelligence testing controversy, which was really the center of an even longer-lived dispute: the nature/nurture debate.

The 1960s tipped the scales again, though, and brought mental testing under the microscope once more, this time questioning the biased nature of test questions. At the end of that decade, however, Berkeley psychologist Arthur Jensen revived g in a high-profile genetics/I.Q. study, which ignited arguments both about g and, more generally, about nature vs. nurture. Jensen’s claim, which has been echoed by others since, was that genetics plays a major role in human intelligence. The proving ground for this theory was racial differences in I.Q., an idea revisited by Charles Murray and Richard J. Herrnstein twenty-five years later in their book, The Bell Curve (1994).