David A. Fillman
Galena Park (High School) School District, Galena Park, Texas 77545
Godrej H. Sethna
Houston Museum of Natural Science, Houston, Texas 77030
Abstract
Science textbooks are frequently used to convey a great deal of the information that students
receive in science courses. They influence how science teachers organize the curriculum and
how students perceive the scientific enterprise. An overreliance on these teaching aids often
results in an overemphasis on terminology and vocabulary, and presents a false impression of
the nature of science. As a result of their importance, a method was developed to assess the
curricular emphasis in science textbooks. The procedure is explained in a 25-page manual to
train researchers to determine the relative emphasis that has been given to (a) science as a body
of knowledge, (b) science as a way of investigating, (c) science as a way of thinking, and (d)
the interaction among science, technology, and society. Textbooks in the areas of life science,
earth science, physical science, biology, and chemistry were used in the analyses. Interrater
agreements of at least 80% and kappas of at least 0.73 were achieved in the content analyses
among two experienced researchers and one science teacher who were given the training manual
to learn the assessment procedure.
Science textbooks have long been an object of interest and concern among science
educators. These teaching aids are widely used in science courses (Exline, 1984; Harms
& Yager, 1981); thus they convey a great deal of the scientific information that students
receive. Most importantly, these instructional materials influence how students and
their teachers perceive the scientific enterprise. Unfortunately, many science teachers
rely heavily on the assigned text, which probably gives students a false impression of
the nature of science (Yager, 1984). Many of the commercially available texts stress
facts and present science as a complete body of information that was derived in an
errorless manner. Science textbooks place too much emphasis on terminology and
© 1991 by the National Association for Research in Science Teaching
CCC 0022-4308/91/080713-13$04.00
Published by John Wiley & Sons, Inc.
Purpose
The purpose of this study was to develop a valid and reliable method to quantitatively
analyze the content of science textbooks, especially those used in middle and senior
high school science courses. The approach employed four aspects of scientific literacy
to determine curriculum balance in textbooks. The specific research question was: Can
a quantitative content analysis procedure be developed that will result in interrater
agreement of at least 80% and a kappa of at least 0.70, to determine the emphases in
written materials for science courses?
Review of Literature
A limited number of content analysis studies have been conducted in the field of
science education, whereas in the field of communication this procedure is a commonly
used research method. The studies that have been conducted to analyze the content of
science textbooks have reported high measures of reliability in their procedures. However,
many of these investigations used statistical tests that do not take into account agreement
by chance among raters. The authors often report percent agreement among the raters,
in spite of the warning against percent agreement as a reliability yardstick (Krippendorff,
1980; p. 135), while other authors do not report interrater agreement. In addition, the
authors of some of these studies are not clear on how the validity of their procedures
was established.
Levin and Lindbeck (1979) analyzed five secondary school biology textbooks for
coverage of 11 controversial issues and biosocial problems. Two science educators
rated these textbooks for quantitative and qualitative coverage of the 11 issues. The
Pearson product moment correlations of the ratings for the quantitative coverage ranged
from 0.71 to 1.0 and for the qualitative coverage ranged from 0.87 to 1.0.
Prosser (1983) analyzed the conceptual difficulty (either concrete or formal) of
two chapters taken from a college physics textbook. He concluded that much of the
subject matter required formal-operational thinking. Prosser reported that there was
an intraclass correlational agreement among three raters of 0.91.
Procedure
The first problem to resolve in the present study was to ensure that a valid method
be used to analyze science textbooks written for life science, earth science, physical
science, chemistry, and biology. The three authors found that Garcia's descriptors,
which were used to analyze earth science textbooks, needed to be modified so that
the written material that appears in a variety of science textbooks could be properly
categorized. This phase involved the identification of all the important ideas that appear
in a variety of science textbooks in order to ensure the content validity of the procedure.
The authors had to find descriptors which had a high rate of recognition for the four
major themes. This required many iterations of analyzing a large variety of science
textbooks, resulting in the construction of a 25-page training manual (Chiappetta,
Fillman, & Sethna, 1991). The four major themes (categories) of scientific literacy
and their descriptors, as they appear in the procedures manual, are as follows:
Categories for Analyzing Science Textbooks
1. The knowledge of science. Check this category if the intent of the text is to
present, discuss, or ask the student to recall information, facts, concepts, principles,
laws, theories, etc. It reflects the transmission of scientific knowledge where the student
receives information. This category typifies most textbooks and presents information
to be learned by the reader. Textbook material in this category:
(a) Presents facts, concepts, principles and laws.
(b) Presents hypotheses, theories, and models.
(c) Asks students to recall knowledge or information.
2. The investigative nature of science. Check this category if the intent of the
text is to stimulate thinking and doing by asking the student to find out. It reflects
the active aspect of inquiry and learning, which involves the student in the methods
and processes of science such as observing, measuring, classifying, inferring, recording
data, making calculations, experimenting, etc. This type of instruction can include
paper and pencil as well as hands-on activities. Textbook material in this category:
(a)
(b)
(c)
(d)
(a)
(b)
(c)
(d)
(e)
(f)
(g)
(h)
are easy to recognize. Similarly, it is easy to categorize units of analysis that involve
the reader in carrying out a manipulative or a mental task (the investigative nature of
science, Category 2). A more difficult categorization requires distinguishing between
science as a way of thinking (Category 3) and the knowledge of science (Category
1). For example:
Roentgen and Thompson found, independently, that the ionization of air produced
by x-ray discharges electrified bodies. The rate of discharge was shown to depend
on the intensity of the x-rays. This property was therefore used as a quantitative
means of measuring the intensity of an x-ray beam. As a result, careful quantitative
measurement of the properties and effects of x-rays could be made. [Harvard Project
Physics. (1968). An introduction to physics: Models of the atom (Vol. 5, p. 56).
New York: Holt, Rinehart and Winston.]
The paragraph above indicates how the work of two scientists was used to further
scientific knowledge. The paragraph also provides information about the properties of
x-rays. In addition, the paragraph indicates how empirical data were used to study a
phenomenon. These three ideas taken together place the unit of analysis into Category
3, because it illustrates how scientists use empirical data to advance science and how
scientists go about their work. This unit of analysis should indicate the difficulty
encountered by raters, because the paragraph not only contains information about the
work of scientists, but also presents information about x-rays. When one presents the
work of a scientist, it invariably is accompanied by a discussion of scientific facts,
concepts, and principles. Units of analysis that contain more than one theme are difficult
to rate accurately and consistently, which is the reason 25 different units of analysis
were selected from a variety of science textbooks and placed in the procedures manual.
In the development of a reliable procedure, one must also consider sample size.
How many textbook pages should be selected from a given text to ensure that
the sample represents all the major categories of scientific literacy and that
obscure categories are included at the frequency with which they occur in the
text? One must select the smallest sample size that does not omit
these important aspects of science education. For example, in some science textbooks
the authors write one page at the end of each chapter that describes career opportunities
as they relate to the topic under study. As career opportunities relate to an important
aspect of developing scientific literacy (the interaction of science, technology, and
society), these occurrences must not be overlooked in the sampling.
Garcia (1985) took several 5% random samples from one earth science textbook
and found that this relatively small proportion of total textbook pages produced the
same frequency distribution of the four aspects of scientific literacy. Similarly, one
of the authors of the present study took two random, 5% samples from a high school
biology textbook and found that these samples had roughly the same proportion of the
four aspects of scientific literacy in them: 78.0% versus 82.0% (Category 1), 11.3%
versus 11.2% (Category 2), 2.6% versus 2.9% (Category 3), and 8.1% versus 5.0%
(Category 4).
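The sampling step described above can be sketched in Python. This is only an illustration under stated assumptions, not the procedure from the training manual: the function names `sample_pages` and `category_percentages`, the rounding rule for the sample size, the fixed seed, and the coded units in the example are all hypothetical.

```python
import random
from collections import Counter

# The four categories of scientific literacy used in the procedure.
CATEGORIES = (1, 2, 3, 4)

def sample_pages(total_pages, fraction=0.05, seed=None):
    """Draw a simple random sample of page numbers without replacement."""
    rng = random.Random(seed)
    # Assumption: the sample size is the rounded fraction, with at least one page.
    sample_size = max(1, round(total_pages * fraction))
    return sorted(rng.sample(range(1, total_pages + 1), sample_size))

def category_percentages(codes):
    """Return the percentage of coded units of analysis in each category."""
    if not codes:
        raise ValueError("no units of analysis were coded")
    counts = Counter(codes)
    return {cat: 100.0 * counts[cat] / len(codes) for cat in CATEGORIES}

# Hypothetical example: a 700-page text and 100 coded units of analysis.
pages = sample_pages(700, seed=1)
codes = [1] * 78 + [2] * 11 + [3] * 3 + [4] * 8
print(len(pages), category_percentages(codes))
```

Running the same functions on two independent samples (different seeds) and comparing the resulting percentages mirrors the check reported for the biology textbook.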
Most science textbooks are quite lengthy. Therefore, when one analyzes a 5%
sample of the total pages of a textbook, the procedure results in many categorizations.
For example, there was an average number of 731.4 pages in the five biology textbooks
adopted by the State of Texas for 1987-88. The average number of pages in a 5%
sample of these textbooks is 36.6. The average number of units of analysis is 298.0,
and the average number of units of analysis per page is 8.1.
In the early phase of this work, an analysis was done on five physical science
textbooks which were recommended for adoption in senior high schools by the Texas
Education Agency. Interrater agreements of 78%, 78%, 79%, 82%, 84%, and their
respective kappas of 0.71, 0.71, 0.72, 0.76, 0.79 (Table 1) were obtained for the five
textbooks (Chiappetta, Sethna, & Fillman, 1987). These results show that the percent
agreements had almost reached the 80% level, and the kappas had reached 0.70. The
kappa statistic (Cohen, 1960; Fleiss, Cohen, & Everitt, 1969; Fleiss, 1971; Tinsley
& Weiss, 1975) is an appropriate statistic to compute interrater agreement when: (a)
two judges are working independently; (b) the units of analysis are independent; and
(c) the categories are independent, mutually exclusive, and contain nominal data.
Cohen's kappa corrects for agreement expected by chance. The kappa statistic ranges from
-1.00 to 1.00, with 0 representing chance agreement among raters. Rubinstein and
Brown (1984) state that kappas greater than 0.75 indicate excellent agreement among
coders and that kappas between 0.40 and 0.75 indicate fair to good agreement.
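As a minimal sketch of how percent agreement and Cohen's kappa are computed for two raters, consider the Python functions below. Kappa is (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e is the proportion expected by chance from each rater's category frequencies. The function names and the ten coded units are hypothetical and are not taken from the study.

```python
from collections import Counter

def percent_agreement(rater_a, rater_b):
    """Proportion of units of analysis that both raters placed in the same category."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of units")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    p_o = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: product of the two raters' marginal proportions, summed over categories.
    p_e = sum(counts_a[cat] * counts_b[cat] for cat in counts_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when both raters use one identical category")
    return (p_o - p_e) / (1.0 - p_e)

# Hypothetical codes (categories 1-4) assigned by two raters to ten units of analysis.
codes_a = [1, 1, 1, 1, 2, 2, 3, 4, 4, 1]
codes_b = [1, 1, 1, 2, 2, 2, 3, 4, 1, 1]
print(percent_agreement(codes_a, codes_b), cohen_kappa(codes_a, codes_b))
```

For these hypothetical codes the raters agree on 8 of 10 units (0.80), chance agreement is 0.34, and kappa is about 0.70, which shows how kappa falls below raw percent agreement once chance is removed.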
Following the analysis of the physical science textbooks, the authors modified the
procedure and selected five different types of science textbooks to examine: life science,
earth science, physical science, biology, and chemistry. Each textbook was randomly
selected from the five science textbooks which were in use during the 1980s and which
had been recommended for that science discipline by the Texas Education Agency.
The textbooks selected are listed below:
Barr, B.B., & Leyden, M.B. (1986). Life science. Menlo Park, CA: Addison-Wesley.
Brown, E.M., & Kemper, G.H. (1979). Earth science. Morristown, NJ: Silver
Burdett.
Heimler, C.H., & Price, J. (1981). Focus on physical science. Columbus, OH:
Charles E. Merrill.
Otto, J.H., & Towle, A. (1985). Modern biology. New York: Holt, Rinehart
and Winston.
Wilbraham, A.C., Staley, D.D., Simpson, C.J., & Matta, M.S. (1987). Chemistry. Menlo Park, CA: Addison-Wesley.
Table 1
Intercoder Agreement for the Analysis of Five Physical Science
Textbooks between Two Raters

Textbook                                               Percent agreement   Kappa
Energy: A Physical Science (Harcourt Brace)                    78          0.71
Holt Physical Science (Holt, Rinehart and Winston)             79          0.72
Spaceship Earth-Physical Science (Houghton Mifflin)            84          0.79
Focus on Physical Science (Charles Merrill)                    78          0.74
Physical Science (Prentice-Hall)                               82          0.76
Table 2

                                          A/B              A/C              B/C              Mean
Textbook                             % agree  Kappa   % agree  Kappa   % agree  Kappa   % agree  Kappa
Life Science (Addison-Wesley)          93.9   0.92      88.9   0.85      89.5   0.86      90.8   0.88
Earth Science (Silver Burdett)         90.1   0.87      92.2   0.90      92.3   0.90      91.5   0.89
Focus on Physical Science (Merrill)    89.8   0.86      92.9   0.91      89.7   0.86      90.8   0.88
Modern Biology (Holt)                  94.3   0.92      92.7   0.90      96.9   0.96      94.6   0.93
Chemistry (Addison-Wesley)             82.8   0.77      82.8   0.77      80.0   0.73      81.9   0.76
Mean                                   90.2   0.87      89.9   0.87      89.7   0.86      89.9   0.87
category when examining any science textbook. Assigning units of analysis to Categories
1 and 4 did not cause difficulty. A considerable amount of written material in science
textbooks emphasizes basic knowledge of science, which is Category 1. This category
was coded with relative ease when the reader was presented with information or asked
to recall it. For example, facts, concepts, principles, laws, and theories, which are
placed in Category 1, the knowledge of science, are encountered with high frequency
in science textbooks. Category 4, the interaction of science, technology, and society,
is also relatively easy to code consistently, partly because this category occurs with
little frequency. In addition, it is relatively easy to identify units of analysis that stress
the positive or negative effects of science and technology, discuss a social issue, or
describe careers related to science and technology.
The refinement of descriptors for Categories 2 and 3 required considerable work.
Category 2 was defined so that instructions appearing on textbook pages, which engage
the reader in mental or manipulative activities, were coded as Category 2, the investigative
nature of science. If the reader was asked to use a chart or a table to answer a question,
this unit of analysis was placed in Category 2. Similarly, if the reader was asked to
make a calculation, refer to a table to produce an answer, or even participate in a
thought experiment, the unit of analysis was placed in Category 2.
Category 3 was defined so that it would be coded when a unit of analysis illustrates
how a person in general, or a scientist in particular, makes discoveries. A general
definition and specific descriptors were constructed for this category that stress
how scientists engage in experimentation, gather empirical data, use assumptions,
show cause and effect, are disposed toward self-examination, etc. This helped to reduce
the problem of distinguishing between Categories 1 and 3.
In addition to modifying the descriptors that Garcia (1985) recommended for this
procedure, the selection and definition of units of analysis were modified. For example,
some of the textbooks that were analyzed contained goal and objective statements.
These elements were found to be confusing and reduced the consistency of the coding.
Consequently, these elements were identified as units of analysis that were not to be
coded.
The percentages of agreement found among the researchers and the science teacher
were above the levels set at the beginning of this inquiry. The authors hoped to obtain
interrater agreements of at least 80% and kappas of at least 0.70 between pairs of raters.
The overall range of percent agreements was from 80% to 97%, while the kappas
ranged from 0.73 to 0.96. The fact that the science teacher was able to categorize the
units of analysis in a manner that resulted in high agreement with the researchers, who
had much more experience with this procedure, suggests that the procedure has reached
a high level of reliability.
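The comparison against the preset criteria can be checked directly from the pairwise values reported in Table 2. The Python sketch below is illustrative only; the dictionary layout and the function names `summarize` and `textbook_means` are assumptions, while the numbers are the percent agreements and kappas for rater pairs A/B, A/C, and B/C.

```python
# (percent agreement, kappa) for rater pairs A/B, A/C, B/C, as reported in Table 2.
TABLE_2 = {
    "Life Science": [(93.9, 0.92), (88.9, 0.85), (89.5, 0.86)],
    "Earth Science": [(90.1, 0.87), (92.2, 0.90), (92.3, 0.90)],
    "Focus on Physical Science": [(89.8, 0.86), (92.9, 0.91), (89.7, 0.86)],
    "Modern Biology": [(94.3, 0.92), (92.7, 0.90), (96.9, 0.96)],
    "Chemistry": [(82.8, 0.77), (82.8, 0.77), (80.0, 0.73)],
}

# Criteria set at the beginning of the inquiry.
MIN_AGREEMENT = 80.0
MIN_KAPPA = 0.70

def summarize(results):
    """Return the overall ranges and whether every rater pair meets both criteria."""
    agreements = [agree for pairs in results.values() for agree, _ in pairs]
    kappas = [kappa for pairs in results.values() for _, kappa in pairs]
    return {
        "agreement_range": (min(agreements), max(agreements)),
        "kappa_range": (min(kappas), max(kappas)),
        "meets_criteria": min(agreements) >= MIN_AGREEMENT and min(kappas) >= MIN_KAPPA,
    }

def textbook_means(results):
    """Average the three pairwise values for each textbook."""
    means = {}
    for title, pairs in results.items():
        mean_agree = sum(agree for agree, _ in pairs) / len(pairs)
        mean_kappa = sum(kappa for _, kappa in pairs) / len(pairs)
        means[title] = (round(mean_agree, 1), round(mean_kappa, 2))
    return means

print(summarize(TABLE_2))
print(textbook_means(TABLE_2))
```

The lowest values come from the chemistry text (80.0% and 0.73 for raters B and C), which is the case the authors single out when discussing paragraphs that carry several themes.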
This procedure should be repeated by other researchers to verify the reliability of
the method, even though the results suggest that the procedure may be reliable and
can be used to determine the content messages in science textbooks, especially those
messages that pertain to the broad curriculum goals of scientific literacy. The importance
of replicating investigations cannot be overstated, since different results are often obtained (Turner, 1988).
The researchers in this study noted that when science textbook authors attempt to
weave two or more themes into a textbook paragraph, this may or may not enhance
the quality of the presentation. In any event, this style of presentation lowered interrater
agreement regarding the meaning of the message about science being conveyed to the
reader. The authors found that interrater agreements were lower in a few of the most
recently published science textbooks, because authors included several themes in a
given paragraph, making it difficult to code consistently. Note that one of the researchers
coded 8% of the chemistry text in Category 3, science as a way of thinking (Table
3), while another researcher coded 0% in this category. When interrater agreement
drops to the 80% level (Table 2), or lower, the percentage of coverage reported can
be misleading. The emphasis on the interaction of science, technology, and society
averages approximately 9%, which suggests that some publishers are attempting to
make science textbooks more relevant for students. If one were to analyze some of
the most recent editions of high school chemistry textbooks, one might find
that a significant percentage of a few of these texts are devoted to science, technology,
and society (STS), a theme that is attracting more attention in science education
(Chiappetta, Sethna, & Fillman, 1989).
Table 3
Percentage of Themes of Scientific Literacy Found among Five Science
Textbooks
                                                      Categories
Textbook                               Rater      I       II      III     IV
Life Science (Addison-Wesley)          A         46.4    42.0     0.0    11.6
                                       B         49.7    34.3     0.0    16.0
                                       C         49.9    41.9     0.0    11.2
                                       Mean      47.7    39.4     0.0    12.9
Earth Science (Silver Burdett)         A         49.4    35.4     1.3    13.9
                                       B         53.9    34.2     0.0    11.8
                                       C         53.1    37.0     0.0     9.9
                                       Mean      52.1    35.5     0.4    11.9
Focus on Physical Science (Merrill)    A         60.0    28.3     0.0    11.0
                                       B         61.4    29.9     0.0     8.7
                                       C         62.1    32.5     0.0     4.8
                                       Mean      61.6    30.2     0.0     8.2
Modern Biology (Holt)                  A         92.8     1.5     2.6     3.1
                                       B         93.8     0.5     2.6     3.1
                                       C         95.4     3.6     0.0     1.0
                                       Mean      94.0     1.9     1.7     2.4
Chemistry (Addison-Wesley)             A         66.9    14.0     8.1    11.0
                                       B         71.3    14.0     1.5    13.2
                                       C         81.3    14.2     0.0     4.5
                                       Mean      73.2    14.1     3.2     9.6
Overall mean                                     65.7    24.2     1.1     9.0
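The mean row for a textbook in Table 3 is the average of the three raters' percentages in each category. The Python sketch below reproduces the chemistry means and also reports the spread between raters, which is what makes a mean potentially misleading when agreement is low. The function names are hypothetical; the per-rater percentages are those reported for the chemistry text.

```python
# Percentages coded in Categories I-IV by each rater for the chemistry text (Table 3).
CHEMISTRY = {
    "A": [66.9, 14.0, 8.1, 11.0],
    "B": [71.3, 14.0, 1.5, 13.2],
    "C": [81.3, 14.2, 0.0, 4.5],
}

def category_means(by_rater):
    """Average each category's percentage across raters, rounded to one decimal."""
    columns = zip(*by_rater.values())
    return [round(sum(column) / len(by_rater), 1) for column in columns]

def category_spreads(by_rater):
    """Difference between the highest and lowest rater in each category."""
    columns = zip(*by_rater.values())
    return [round(max(column) - min(column), 1) for column in columns]

print(category_means(CHEMISTRY))    # means for Categories I, II, III, IV
print(category_spreads(CHEMISTRY))  # rater disagreement per category
```

For Category III the mean is 3.2%, yet the raters range from 0.0% to 8.1%, so the mean alone hides most of the disagreement.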
When researchers analyze phenomena in the behavioral and social sciences, they
will experience difficulty developing methods of acceptable validity and reliability.
Human activity is complicated, and when researchers improve on the reliability of a
procedure, they often compromise on its validity. In the present study, the researchers
realized the importance of refining a procedure to place units of analysis into only one
category, because without this type of agreement, the method would be confusing
(Holsti, 1969; Krippendorff, 1980). In the analysis of most typical science textbooks,
this was not a significant problem. With some textbooks, however, where authors
place several themes in one paragraph and the raters must place units of analysis in
one category, the task of quantifying aspects of scientific literacy becomes quite
difficult. Nevertheless, this procedure has been shown to be reliable with the science textbooks
currently on the market. However, the authors of this research are looking for textual
materials which utilize novel approaches to convey science to secondary school students.
This type of material could be characterized in order to determine its impact on student
interest and achievement.
There is a need for science education researchers to thoroughly study the contents
of science textbooks, given the central role they play in the curriculum. Many different
paradigms should be used to analyze these materials. The four goal clusters of Project
Synthesis with its emphasis on student needs might provide one good model, as would
the literacy goals of science, mathematics, and technology for Project 2061. The
outcomes of these analyses can be used to determine the relationships between textbook
characteristics and student interest, and to gain greater insight into why science teachers adopt
certain textbooks. This line of research might be more meaningful than the readability
and comprehension studies that have been conducted in the past on science textbooks.
References
Chiappetta, E.L., Fillman, D.A., & Sethna, G.H. (1991). Procedures for conducting
content analysis of science textbooks. Houston, TX: University of Houston, Department
of Curriculum & Instruction.
Chiappetta, E.L., Sethna, G.H., & Fillman, D.A. (1987). Curriculum balance in
science textbooks. The Texas Science Teacher, 16(2), 9-12.
Chiappetta, E.L., Sethna, G.H., & Fillman, D.A. (1989, March). Examination
of high school chemistry textbooks. Paper presented at the annual meeting of the National
Association for Research in Science Teaching, San Francisco.
Cohen, J.A. (1960). A coefficient of agreement for nominal scales. Educational
and Psychological Measurement, 20, 37-46.
Collette, A.T. & Chiappetta, E.L. (1986). Science instruction in the middle and
secondary schools. Columbus, OH: Charles Merrill.
Exline, J.D. (1984). National survey: Science textbook adoption process. The
Science Teacher, 51(1), 92-93.
Fensham, P.J. (1983). A research base for new objectives of science teaching.
Science Education, 67, 3-12.
Fleiss, J.L. (1971). Measuring nominal scale agreement among many raters.
Psychological Bulletin, 76, 378-382.
Fleiss, J.L., Cohen, J., & Everitt, B.S. (1969). Large sample standard errors of
kappa and weighted kappa. Psychological Bulletin, 72, 323-327.
Gannaway, S.P. (1980). Development of a high school chemistry textbook evaluation