
Proceedings of The National Conference
On Undergraduate Research (NCUR) 2012
Weber State University, Ogden, Utah
March 29-31, 2012

Does the Normal Curve Accurately Model the Distribution of Intelligence?


Lindsey R. Godwin and Kyle V. Smith
Department of Behavioral Science
Utah Valley University
800 West University Parkway
Orem, UT 84058
Faculty Advisor: Dr. Russell T. Warne
Abstract
Like many human characteristics, intelligence is theorized to be normally distributed. However, a vocal minority of
researchers and practitioners who study individuals with high intelligence have claimed that there are more people in
the upper echelons of intelligence than would be expected if the normal curve accurately modeled the distribution of
intelligence scores.1,2,3,4 To test this claim, we searched articles from the journal Intelligence dated 1979
to 2012, completed an academic journal search, and reviewed national data sets for samples that permit the claim to
be tested. To be included, a sample had to (a) be representative of the population on which the intelligence test
was normed, (b) not be the test's norm sample, (c) contain at least 1,000 subjects, and (d) have been assessed
with an intelligence test whose norms were no more than 15 years old. This search yielded one
such sample, used in a study by Guldemond et al.5 Two national data sets were also identified for use in this review: the
National Longitudinal Survey of Youth (NLSY) and the Early Childhood Longitudinal Study (ECLS). We reviewed the
information provided by these sources and concluded that intelligence is indeed normally distributed.
Keywords: intelligence, normal distribution, human populations

1. Introduction
The scientific study of intelligence dates back to the nineteenth century. Although it is one of the oldest areas of
study in the psychological field, there are still many debates over the nature of intelligence. In an effort to quell
some of the disagreements in the field and correct popular misconceptions, a group of scientists signed a statement
affirming some of the mainstream findings within the field.6 The statement addressed many issues and
offered a definition of intelligence:
Intelligence is a very general mental capability that, among other things, involves the ability to
reason, plan, solve problems, think abstractly, comprehend complex ideas, learn quickly and learn
from experience. It is not merely book learning, a narrow academic skill, or test-taking smarts.
Rather it reflects a broader and deeper capability for comprehending our surroundings: catching
on, making sense of things, or figuring out what to do.6, p. 13
This statement on mainstream understandings of intelligence provided a general consensus on many topics. For
example, there are several different tests available for measuring intelligence. All are positively correlated due to the
existence of Spearman's g, the general factor of intelligence.7
Gottfredson et al.6 also presented a consensus on one area that has often been debated, the distribution of
intelligence. The 52 signers of the mainstream document stated that intelligence is normally distributed.6, p. 13
This idea concurred with Terman's findings from nearly a century ago. When adapting Binet's test for an American
audience, Terman theorized that intelligence was normally distributed, and his observations of 905 children in his
norm sample generally supported this assertion.8 Yet a small group of theorists today insist that intelligence is not
normally distributed but rather is skewed, with a higher number of individuals in the upper echelons than
would be present in the normal curve.2,3,4,9,10,11,12
This idea of a non-normal distribution of intelligence is nearly as old as the claim that intelligence follows a normal
distribution. Terman, in the first volume of his longitudinal study of gifted children, was the first to voice dissent
against the theory of a normal distribution of intelligence. He stated, "The number of very high cases is larger than
the standard deviation of the I.Q. distribution for unselected children would lead one to expect. It is doubtful
therefore whether the incidence of superior intelligence follows the normal probability curve."12 Since that time,
other researchers have noted the same unexpected outcome, leading them to believe that the number of individuals
with greater than average intelligence is higher than indicated by a normal curve. These findings coincide with those
of another critic of the normality hypothesis, Sir Cyril Burt, who found that there are more than ten times as many
individuals with IQs over 160 as would occur under the assumption of normality. Burt determined that
the true distribution was much more highly peaked, with elongated tails on both ends.10 Several other researchers
have found a similar pattern in their own intelligence research.1,2,3,10
Although the idea that the normal curve does not fit has been supported by some of the most prominent figures in
intelligence research over the past century, there has been no change in the general assertion that intelligence is
normally distributed, as demonstrated by the widespread support for the mainstream statement on intelligence.6 Is this
because the idea is misguided, or because the field has not yet recognized its accuracy? When differing
results exist, how can one determine the accurate distribution? The best answer would be to conduct an
extensive study with a representative sample and examine the resulting distribution. However, such studies can
be very costly because of the extensive time and resources required to acquire such a sample, test subjects, and
tabulate results. A reasonable alternative is to conduct an extensive, systematic review of existing literature and data
sets to identify samples that will allow this theory to be tested. Our goal in this paper is to examine appropriate data
sets and reports from the literature in order to test the theory that intelligence is normally distributed.

2. Methods
2.1 Literature Review
An in-depth literature review was conducted, using a variety of search criteria, to identify articles that reported a
distribution of intelligence scores in the upper levels of cognitive ability. The literature review included a thorough
search of all the articles published in the journal Intelligence from 1977 to early 2012. A search of
academic search engines such as PsycInfo and ScienceDirect was also conducted.
For an article to be included in this review, the sample must have been representative of the population on which
the intelligence test was normed. However, the sample could not be the same one on which the test was normed,
because scores from norm samples are often forced to fit a normal distribution. We decided that a minimum of 1,000
subjects must have been included in the sample because smaller samples were unlikely to have a sufficient number
of people in the top percentiles to detect violations of normality. Finally, to avoid the influence of the Flynn
effect13 (i.e., the tendency for mean intelligence scores to gradually increase in a population), the sample must have been
examined with an intelligence test whose norms were no more than 15 years old.
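The rationale for the 1,000-subject minimum can be illustrated with a short calculation. The sketch below is not part of the original analysis: it assumes a perfectly normal population and treats membership in a top percentile as a binomial event. The function name and the sample sizes other than 1,000 are illustrative assumptions.

```python
import math

def expected_tail_count(n, top_fraction):
    """Expected count and binomial SD of people in the top `top_fraction` of n subjects."""
    expected = n * top_fraction
    sd = math.sqrt(n * top_fraction * (1 - top_fraction))
    return expected, sd

# Top 0.5%, 1%, 2%, and 5% are the tail sizes compared in this paper.
for n in (100, 1_000, 10_000):
    for top_fraction in (0.005, 0.01, 0.02, 0.05):
        expected, sd = expected_tail_count(n, top_fraction)
        print(f"n={n:>6}  top {top_fraction:.1%}: expected {expected:7.1f} (SD {sd:.1f})")
```

With 1,000 subjects, only about five people are expected in the top 0.5 percent, so a much smaller sample would leave the highest groups nearly empty.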
Samples were excluded if the test used had not been established as a producer of reliable (i.e., consistent) data,
if the author noted difficulties during test administration or did not administer the full
test, or if the study used a non-representative sample of subjects, such as volunteers. The exclusion of
volunteers is necessary because volunteers for intelligence testing often have above-average
intelligence.14 Inclusion of such subjects could create selection bias.

2.2 Datasets
The search for applicable samples identified two publicly available data sets, the
National Longitudinal Survey of Youth (NLSY) and the Early Childhood Longitudinal Study (ECLS-K:98, hereafter
referred to as ECLS). Data sets had to meet the same criteria as the samples in the literature review. In addition,
the tests had to be a reasonably good measure of Spearman's g. Finally, variables in data sets that
demonstrated an obvious ceiling effect were excluded. Both the NLSY and ECLS studies met the criteria for
inclusion.

As a part of the NLSY study, the Armed Services Vocational Aptitude Battery (ASVAB) was administered to
participants in 1980. Although the ASVAB was not originally designed to measure intelligence alone,
but rather to assess military preparedness and guide job placement, its test items include arithmetic reasoning,
word knowledge, paragraph comprehension, general science, and mathematical knowledge, all of which are attributed to
intelligence and are generally recognized to be at least moderately related to general intelligence.15 It should be noted
that the NLSY sample was used to norm the ASVAB subtests; however, in the norming process, the subtests were not forced
to conform to a normal distribution.16 Therefore, the data set was appropriate for use in this study.
ECLS is a longitudinal study, conducted at seven points in time, that measured participants' academic
achievement, among other variables, from kindergarten (in 1998) to the eighth grade (in 2007). Testing covered
the following: general knowledge at the start of kindergarten (C1), the end of kindergarten (C2), and
the start of 1st grade (C3); reading and math at the end of 1st grade (C4); reading, math, and science at the
end of 3rd grade (C5); and science at the end of 5th grade (C6). Although ECLS subjects were examined on
their academic achievement, we believe that these tests were reasonably strong correlates of g, as most academic
tests are.17,18,19,20
A Kolmogorov-Smirnov test was used to compare the data from both the NLSY and ECLS studies with
a normal distribution. However, because of the large sample sizes of both data sets (the ECLS sample
size was 21,409, which was weighted to have an effective sample size of 3.56 million to 3.88 million, depending on
the measurement occasion), even trivial deviations from normality would be statistically significant. Therefore, we
also prepared Q-Q plots that compared the observed distribution of each variable in the two data sets with a normal
distribution. To avoid the problems associated with statistical significance tests in large samples, we also compared
the observed number of subjects in the top percentiles with the number of subjects expected in the top percentiles
under the normality assumption.
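The sensitivity of the Kolmogorov-Smirnov test to sample size can be sketched as follows. This is an illustration under stated assumptions, not the analysis code used for this paper: the statistic is computed by hand with the Python standard library, the reference normal distribution is taken as known rather than estimated from the sample, and 1.358 is the standard large-sample coefficient for the two-sided critical value at alpha = .05. The function names and the idealized 200-point sample are hypothetical.

```python
import math
from statistics import NormalDist

def ks_statistic(scores, mean=0.0, sd=1.0):
    """One-sample Kolmogorov-Smirnov statistic D against Normal(mean, sd)."""
    dist = NormalDist(mean, sd)
    xs = sorted(scores)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        cdf = dist.cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x, so check both sides.
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d

def ks_critical_value(n, coefficient=1.358):
    """Large-sample approximation to the critical D (two-sided, alpha = .05)."""
    return coefficient / math.sqrt(n)

# Idealized illustrative "sample": 200 evenly spaced quantiles of a standard normal.
ideal = [NormalDist().inv_cdf((i - 0.5) / 200) for i in range(1, 201)]
print("D for idealized sample:", round(ks_statistic(ideal), 4))

# The critical value shrinks as the (effective) sample size grows.
for n in (1_000, 21_409, 3_560_000):
    print(n, round(ks_critical_value(n), 5))
```

At an effective sample size in the millions, the critical value falls below 0.001, so a maximum gap of a tenth of a percentage point between the observed and theoretical cumulative distributions would already be flagged as significant.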
To make these comparisons, every person's score was converted to a z-score, which was then used to place
participants in four groups: participants who scored in the top 0.5 percent (z-score of 2.5758 or higher) were
assigned to Group A, those in the top one percent (z-score of 2.3263 or higher) to Group B, those in the top two
percent (z-score of 2.0537 or higher) to Group C, and those in the top five percent (z-score of 1.6449 or higher)
to Group D. The proportion of participants who fell into each group was compared
with the proportion that would be expected if intelligence were normally distributed.
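The grouping procedure can be sketched in a few lines. This is a minimal illustration, not the original analysis: it assumes the groups are cumulative (a participant in the top 1 percent also counts toward the top 2 and 5 percent, as the tables suggest), standardizes with the population standard deviation, and uses an idealized hypothetical score list. The cut-offs are recomputed from the standard normal quantile function and agree with the z-scores in the table notes (2.5758, 2.3263, 2.0537, 1.6449).

```python
from statistics import NormalDist, mean, pstdev

# Share of a normal distribution expected at or above each group's cut-off.
TOP_FRACTIONS = {"A": 0.005, "B": 0.01, "C": 0.02, "D": 0.05}
CUTOFFS = {group: NormalDist().inv_cdf(1 - p) for group, p in TOP_FRACTIONS.items()}

def to_z_scores(scores):
    """Standardize raw scores (population SD is an assumption of this sketch)."""
    m, s = mean(scores), pstdev(scores)
    return [(x - m) / s for x in scores]

def observed_percentages(z_scores):
    """Percentage of z-scores at or above each group's cut-off."""
    n = len(z_scores)
    return {group: 100 * sum(z >= cut for z in z_scores) / n
            for group, cut in CUTOFFS.items()}

# Hypothetical, idealized IQ-metric scores used only to exercise the functions.
scores = [100 + 15 * NormalDist().inv_cdf((i - 0.5) / 2000) for i in range(1, 2001)]
for group, pct in observed_percentages(to_z_scores(scores)).items():
    print(group, round(CUTOFFS[group], 4), f"observed {pct:.2f}%",
          f"expected {100 * TOP_FRACTIONS[group]:.1f}%")
```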

3. Results
3.1 Literature Review Results
The search for applicable samples to be included in this research identified an article from the Voortgezet Onderwijs
Cohort Leerlingen (VOCL) longitudinal cohort study. For the VOCL study, a representative sample of 19,391
students from 126 secondary schools in the Netherlands was collected. The participants started Grade 1 of secondary
school in 1999, with participants being an average of 12 years old, and were monitored throughout their secondary
education.5
Guldemond and his coauthors used the data collected in the first three years of the study for their analysis.
The sample included about 13,000 students who completed the Groningen Intelligence Test for Secondary Education
(GIVO-test), a Dutch intelligence test, and was a representative sample of Dutch schoolchildren. Based on their
scores on the intelligence test, the students were categorized into four levels of intelligence: students with an
IQ of 144 or better, students with an IQ of 130-143, students with an IQ of 120-129, and students with an IQ
between 110 and 119.5
Under a normal distribution, about 22 students (21.80) were projected to have an IQ of 144 or higher, and
20 were found. About 318 students (317.55) were projected to have an IQ higher than
130, and 342 were found. About 1,503 students (1,503.29) were projected to have an IQ
over 120, and 1,286 were found. Lastly, about 4,786 students (4,785.70) were projected to have an IQ
higher than 110, and 3,462 were found.
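The logic of such projections can be sketched as the sample size multiplied by the upper-tail area of the normal curve. The example below is an illustration only: it assumes an IQ metric with a mean of 100 and a standard deviation of 15 and uses a hypothetical sample of 10,000, so it is not expected to reproduce the projections reported above, whose exact base sample size and scaling are not stated here. The second half simply divides the reported observed counts by the reported projections.

```python
from statistics import NormalDist

def expected_count_above(iq_cutoff, n, mean=100.0, sd=15.0):
    """Expected number of people at or above `iq_cutoff` among n normal scores."""
    return n * (1 - NormalDist(mean, sd).cdf(iq_cutoff))

# Hypothetical sample size, chosen only for illustration.
n = 10_000
for cutoff in (110, 120, 130, 144):
    print(cutoff, round(expected_count_above(cutoff, n), 1))

# Observed-to-projected ratios, using the counts reported in the text.
reported = {144: (20, 21.80), 130: (342, 317.55),
            120: (1286, 1503.29), 110: (3462, 4785.70)}
ratios = {cutoff: observed / projected
          for cutoff, (observed, projected) in reported.items()}
for cutoff, ratio in ratios.items():
    print(cutoff, round(ratio, 2))
```

A ratio above 1 means more students were observed than projected; only the IQ 130 cut-off exceeds 1 in the reported figures.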
The projected and actual numbers of students with an IQ over 144 were fairly close, and there were more
students than projected with an IQ over 130. However, there were far fewer students than projected with an IQ
over 120 and fewer still with an IQ over 110. These results may not show a clear pattern, but
they do show that, with the exception of students with an IQ over 120, intelligence seems to conform generally
to the normal curve.

3.2 Data Set Results


The results from the data sets are listed in Table 1 (for ECLS) and Table 2 (for NLSY). Table 1 shows that the
distribution of the sample's general knowledge at the beginning of kindergarten (C1) was close to the projections of
a normal distribution. However, when general knowledge was tested at time points C2 and C3, there were far fewer
subjects in the upper echelons than would have been expected from the normal distribution.
Results for C4 reading and math achievement showed more participants in the top groups than projected under a
normal distribution, but for C5 reading, math, and science achievement, the sample had fewer participants than
anticipated in nearly every group examined. C6 science achievement was also far lower than projected under a normal
distribution; only 0.4% of students obtained z-scores that would place them in the top 2% of the normal
distribution.
Table 1. Percentage of participants who scored in Groups A, B, C, and D

Group  C1 Gen  C2 Gen  C3 Gen  C4       C4     C5       C5     C5       C6
       Know    Know    Know    Reading  Math   Reading  Math   Science  Science
A      0.4     0.1     -       1.5      1.3    < 0.1    0.1    0.3      -
B      1.2     0.4     < 0.1   2.0      2.2    0.2      0.3    0.9      < 0.1
C      2.4     1.2     0.4     4.2      3.6    0.8      1.2    2.0      0.4
D      5.9     4.5     3.3     7.1      6.9    3.9      5.0    5.6      3.3

Note. Number of participants that are the projection of a normal distribution are in boldface.
Note. A = z-score of 2.5758 (top 0.5%); B = z-score of 2.3263 (top 1%); C = z-score of 2.0537 (top 2%); D = z-score of 1.6449 (top 5%).
Note. Cells marked with a dash (-) indicate that the test did not have a high enough ceiling to make the comparison
between observed scores and the normal distribution.
Table 2. Percentage of participants who scored in Groups A, B, C, and D

Group  ASVAB        ASVAB         ASVAB         ASVAB      ASVAB
       Gen Science  Arith Reason  Coding Speed  Math Know  Mech Comp
A      -            -             -             -          -
B      -            -             0.5           -          -
C      -            -             1.5           -          2.2
D      7.2          8.0           3.4           7.5        6.8

Note. Number of participants that are the projection of a normal distribution are in boldface.
Note. A = z-score of 2.5758 (top 0.5%); B = z-score of 2.3263 (top 1%); C = z-score of 2.0537 (top 2%); D = z-score of 1.6449 (top 5%).
Note. Cells marked with a dash (-) indicate that the test did not have a high enough ceiling to make the comparison
between observed scores and the normal distribution.
In Table 2, results for the ASVAB subtests of general science, arithmetic reasoning, mathematical knowledge, and
mechanical comprehension all showed more participants in the top groups than would be projected under a
normal distribution. However, for the coding speed subtest, the number of participants was
lower in every group than projected under a normal distribution. This is particularly interesting because coding speed is
generally recognized as being related to intelligence and is even included as a subtest on some intelligence tests.21

4. Discussion
Based on the results of Guldemond et al.'s5 study and the data sets we examined, the claim that more people are
located in the top percentiles of intelligence than would be expected in a normal distribution may not be supported.
Of the 42 tests of the theory of normally distributed intelligence shown in Tables 1 and 2, 23 (54.8%) showed that
there were fewer people than expected in the top ranges of intelligence. Because about half of our results support the
belief in a skewed distribution of intelligence and about half do not, we believe that support for the assertion
that there are more people than expected in the upper echelons of intelligence is inconsistent and irregular. In
general, we believe that these findings support the mainstream claim that the distribution of IQ scores can be
represented as a normal curve, as stated in the mainstream statement on intelligence.6 It is true that there are some
departures from normality that can be seen in Tables 1 and 2. However, there is no consistent pattern to these
departures from the normal distribution. We believe that these variations are merely the result of the expected
differences between an idealized, theoretical distribution and real-life data.
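The tally of 23 out of 42 can be checked directly from the values in Tables 1 and 2. The sketch below is a reader's verification, not the authors' code, and rests on two assumptions: cells reported as "< 0.1" are represented by 0.05 (any value below 0.1 is under every expected percentage, so the count is unaffected), and the cells lost to ceiling effects are the highest groups in each affected column.

```python
EXPECTED = {"A": 0.5, "B": 1.0, "C": 2.0, "D": 5.0}  # percent expected under normality
LT = 0.05  # stand-in for cells reported as "< 0.1" (assumption)

# Observed percentages in group order A, B, C, D; None marks a dash (ceiling effect).
OBSERVED = {
    "C1 Gen Know": [0.4, 1.2, 2.4, 5.9],
    "C2 Gen Know": [0.1, 0.4, 1.2, 4.5],
    "C3 Gen Know": [None, LT, 0.4, 3.3],
    "C4 Reading": [1.5, 2.0, 4.2, 7.1],
    "C4 Math": [1.3, 2.2, 3.6, 6.9],
    "C5 Reading": [LT, 0.2, 0.8, 3.9],
    "C5 Math": [0.1, 0.3, 1.2, 5.0],
    "C5 Science": [0.3, 0.9, 2.0, 5.6],
    "C6 Science": [None, LT, 0.4, 3.3],
    "ASVAB Gen Science": [None, None, None, 7.2],
    "ASVAB Arith Reason": [None, None, None, 8.0],
    "ASVAB Coding Speed": [None, 0.5, 1.5, 3.4],
    "ASVAB Math Know": [None, None, None, 7.5],
    "ASVAB Mech Comp": [None, None, 2.2, 6.8],
}

def tally(observed, expected):
    """Count cells that fall below, at, or above the normal-curve expectation."""
    counts = {"fewer": 0, "equal": 0, "more": 0}
    for values in observed.values():
        for group, value in zip("ABCD", values):
            if value is None:
                continue  # comparison not possible for this cell
            if value < expected[group]:
                counts["fewer"] += 1
            elif value > expected[group]:
                counts["more"] += 1
            else:
                counts["equal"] += 1
    return counts

counts = tally(OBSERVED, EXPECTED)
total = sum(counts.values())
print(counts, total, round(100 * counts["fewer"] / total, 1))
# {'fewer': 23, 'equal': 2, 'more': 17} 42 54.8
```

Under these assumptions, 17 comparisons show more people than expected, 2 match the expectation exactly, and 23 show fewer, which reproduces the 54.8% figure.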
Strength for our conclusion comes from a variety of sources. Multiple age groups were sampled,
ranging from children to adults. The samples also came from multiple countries (the United States and the
Netherlands). Moreover, all our results come from nationally representative samples that can be generalized to much
larger populations. Finally, the methods of measuring intelligence range from traditional intelligence tests to
academic achievement and aptitude testing. We believe that the weaknesses of each of these samples and tests are
balanced out by the inclusion of the other results.
So why do many researchers1,2,3,4,9,11 believe that there are more individuals at the extremely high levels of
intelligence? We believe there are three main reasons. First, many of these authors have used tests that produced an
IQ score based on a ratio IQ (i.e., mental age divided by chronological age). The ratio IQ, however, is problematic
because it does not always produce a normal distribution. Also, the standard deviations of ratio IQs are larger than
the expected 15 or 16 points. For these reasons, this method of producing an intelligence test score was
abandoned decades ago.21 For example, Silverman4,22 has advocated the use of the Stanford-Binet L-M, an old,
outdated test (from 1960) that produces an IQ score based on a ratio, and she still uses that test in her clinic for
high-ability children.23,24 Given the flaws in ratio IQs, it is no surprise that Silverman, and others who still use
tests that produce these ratio IQs, believe that there are more high-IQ people than the normal curve would predict.
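For readers unfamiliar with the ratio IQ, a minimal sketch of the calculation follows. The multiplication by 100 is the conventional scaling for ratio IQs; the function name and the two example children are hypothetical.

```python
def ratio_iq(mental_age, chronological_age):
    """Ratio IQ: mental age divided by chronological age, conventionally times 100."""
    if chronological_age <= 0:
        raise ValueError("chronological_age must be positive")
    return 100 * mental_age / chronological_age

# Hypothetical children, for illustration only.
print(ratio_iq(8, 8))             # 100.0
print(round(ratio_iq(10, 6), 1))  # 166.7
```

Because the score is a simple quotient, nothing in the calculation constrains the resulting scores to a normal distribution with a standard deviation of 15 or 16 points.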
Second, some theorists may believe that there are more people with high intelligence than would be predicted by
the normal distribution because of the Flynn effect. The Stanford-Binet L-M was last normed in 1976; in the three
and a half decades since, it would be expected (based on data from Flynn13) that the average IQ would rise from 100
points to roughly 110. With such severe IQ score inflation, it is no wonder that people who advocate (and use) the
Stanford-Binet L-M in the 21st century find so many more subjects obtaining high IQs than the normal curve would
lead them to expect. It is true that the Flynn effect seems more severe for lower-intelligence populations than for
higher-intelligence populations,25 but it would be naive to state that the upper echelons are completely immune to the
Flynn effect.
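The size of this inflation can be illustrated with a short calculation. The sketch below is not part of the original analysis. It assumes that scores on outdated norms are normally distributed with a mean of 110 (the shift described above), that the standard deviation remains 15, and that the shift is uniform across the distribution, which the preceding paragraph notes may overstate the effect at the upper end. The function name is hypothetical.

```python
from statistics import NormalDist

def share_at_or_above(cutoff, true_mean, sd=15.0):
    """Proportion of a normal population scoring at or above `cutoff`."""
    return 1 - NormalDist(true_mean, sd).cdf(cutoff)

for cutoff in (130, 145):
    on_current_norms = share_at_or_above(cutoff, 100)   # test normed on today's population
    on_outdated_norms = share_at_or_above(cutoff, 110)  # norms 10 points out of date
    print(cutoff, f"{on_current_norms:.2%}", f"{on_outdated_norms:.2%}",
          round(on_outdated_norms / on_current_norms, 1))
```

Under these assumptions, roughly four times as many examinees would score 130 or higher on the outdated norms as the normal curve predicts, and the multiple grows at higher cut-offs.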
Third, many of the authors who suggest that there are more intelligent people than there should be if intelligence
were normally distributed have come to this conclusion based on data from non-representative samples.3,4,12 In all of
these studies, the authors claim that they found more gifted children than they expected; however, none of these
authors made any attempt to take a nationally representative sample of children. Instead, the children in all of these
studies were referred to the researcher and in many cases large portions of the population (especially rural groups)
were undersampled or ignored completely. It is quite possible that non-representative samples can lead to incorrect
conclusions about the generalizability of a study.
There could be a variety of other reasons why the idea that intelligence is non-normally distributed is so stubbornly
persistent. Some examiners could be ignoring stop rules recommended by test creators in the administration
guidelines (as one author heard Silverman advocate in 2009). It is also possible that these theorists, who often
specialize in the study of gifted children, spend a great deal of time around many high-intelligence people, leading
them to believe that there are more high-intelligence people than there really are. This frequent exposure to
high-intelligence people may lead some researchers to believe that their experience is more common than it actually
is, a false consensus effect.26

5. Conclusion
Our research question (and the title of this paper) was, "Does the normal curve accurately model the distribution
of intelligence?" From the results that we presented, we believe that the answer is mostly yes. The data in Tables 1
and 2 do not perfectly conform to the normal distribution. However, the normal distribution is highly idealized and
minor deviations are to be expected.
However, in response to the question, "Are there more people than expected in the upper echelons of
intelligence?" the answer is a resounding no. We could find no pattern in our data that would indicate that the
normal distribution is not a reasonable model for the number of people who achieve high scores on g-loaded tests.
We believe that it is not possible to reject the hypothesis of a normal distribution of intellectual ability and that
there is strong evidence that intelligence scores do fall in a normal distribution. The results come from representative
samples that included multiple age groups, multiple countries, and many different ways of measuring Spearman's g.
No claim to the contrary has been sustained, and we encourage intelligence researchers and
theorists to rally behind the mainstream statement on intelligence, which states, "The spread of people along the IQ
continuum, from low to high, can be represented well by the bell curve (in statistical jargon the 'normal curve')."6, p. 13

6. References
___________________________________________
1. Gallagher, J.J. (2008). According to Jim: The flawed normal curve of intelligence. Roeper Review, 30, 211-212.
2. McGuffog, C., Feiring, C. & Lewis, M. (1987). The diverse profile of the extremely gifted child. Roeper Review,
10, 82-89.
3. Robinson, H. B. (1981). The uncommonly bright child. In M. Lewis & L. A. Rosenblum (Eds.), The uncommon
child (pp. 57-81). New York: Plenum Press.
4. Silverman, L. K. (2009). The measurement of giftedness. International Handbook on Giftedness, 947-970. doi:
10.1007/978-1-4020-6162-2_48
5. Guldemond, H., Bosker, R., Kuyper, H., & van der Werf, G. (2007). Do highly gifted students really have
problems? Educational Research and Evaluation, 13, 555-568. doi:10.1080/13803610701786038
6. Gottfredson, L. S. (1997a). Mainstream science on intelligence: An editorial with 52 signatories, history, and
bibliography. Intelligence, 24, 13-23. doi:10.1016/S0160-2896(97)90011-8
7. Neisser, U., Boodoo, G., Bouchard, T. R., Boykin, A., Brody, N., Ceci, S. J., & . . . Urbina, S. (1996).
Intelligence: Knowns and unknowns. American Psychologist, 51, 77-101. doi:10.1037/0003-066X.51.2.77
8. Terman, L. M. (1916). The measurement of intelligence. Boston, MA: Houghton Mifflin.
9. Burt, C. (1963). Is intelligence distributed normally? The British Journal of Statistical Psychology, 16, 175-194.
10. Parkyn, G. W. (1945). The clinical significance of IQ's on the Revised Stanford-Binet Scale. Journal of
Educational Psychology, 36, 114-118. doi:10.1037/h0055705
11. Terman, L. M. (1922). A new approach to the study of genius. Psychological Review, 29, 310-318.
doi:10.1037/h0071072
12. Terman, L. M. (1926). Genetic studies of genius: Vol. I. Mental and physical traits of a thousand gifted children.
(2nd ed.). Stanford, CA: Stanford University Press.
13. Flynn, J. R. (1987). Massive IQ gains in 14 nations: What IQ tests really measure. Psychological Bulletin,
101(2), 171-191. doi: 10.1037/0033-2909.101.2.171
14. Rosenthal, R., & Rosnow, R. L. (1975). The volunteer subject. New York, NY: Wiley.
15. Gottfredson, L. S. (1997b). Why g matters: The complexity of everyday life. Intelligence, 24, 79-132.
doi:10.1016/S0160-2896(97)90014-3
16. Bock, R. D., & Mislevy, R. J. (1981). Data quality analysis of the Armed Services Vocational Aptitude Battery.
Chicago, IL: National Opinion Research Center.
17. Lohman, D. F. (2006). Beliefs about differences between ability and accomplishment: From folk theories to
cognitive science. Roeper Review, 29, 32-40.
18. Merwin, J. C., & Gardner, E. F. (1962). Development and application of tests of educational achievement.
Review of Educational Research, 32, 40-50. doi:10.2307/1169202
19. Schmeiser, C. B., & Welch, C. J. (2006). Test development. In R. L. Brennan (Ed.), Educational measurement
(4th ed., pp. 307-353). Westport, CT: Praeger Publishers.
20. Zwick, R. (2006). Higher education admissions testing. In R. L. Brennan (Ed.), Educational measurement (4th
ed., pp. 647-679). Westport, CT: Praeger Publishers.
21. Kaplan, R. M., & Saccuzzo, D. P. (2009). Psychological testing: Principles, applications, and issues (7th ed.).
Belmont, CA: Wadsworth.
22. Silverman, L. K., & Kearney, K. (1992). The case for the Stanford-Binet L-M as a supplemental test. Roeper
Review, 15, 34-37.
23. Konigsberg, E. (2006). Prairie fire. The New Yorker, 81(44), 44-57.
24. Silverman, L. K. (2002). Why we use the Stanford-Binet (Form L-M). The Examiner: The Journal of the Kansas
Association of School Psychologists, 28(3), 20-21.
25. Zhou, X., Zhu, J., & Weiss, L. G. (2010). Peeking inside the black box of the Flynn effect: Evidence from
three Wechsler instruments. Journal of Psychoeducational Assessment, 28, 399-411.
26. Marks, G., & Miller, N. (1987). Ten years of research on the false-consensus effect: An empirical and theoretical
review. Psychological Bulletin, 102, 72-90. doi:10.1037/0033-2909.102.1.72
