To cite this article: Derya Kaltakci-Gurel, Ali Eryilmaz & Lillian Christie
McDermott (2017) Development and application of a four-tier test to assess pre-
service physics teachers’ misconceptions about geometrical optics, Research in
Science & Technological Education, 35:2, 238-260, DOI:
10.1080/02635143.2017.1310094
To link to this article: http://dx.doi.org/10.1080/02635143.2017.1310094
ABSTRACT
KEYWORDS: Misconceptions; diagnostic tests; four-tier tests; geometrical optics
Background: Correct identification of misconceptions is an important first step in order to gain an understanding of student learning. More recently, four-tier multiple-choice tests have been found to be effective in assessing misconceptions.
Purpose: The purposes of this study are (1) to develop and validate a
four-tier misconception test to assess the misconceptions of pre-
service physics teachers (PSPTs) about geometrical optics, and (2) to
assess and identify PSPTs’ misconceptions about geometrical optics.
Sample: The Four-Tier Geometrical Optics Test (FTGOT) was developed based on findings from interviews (n = 16), open-ended testing (n = 52) and pilot testing (n = 53), and was administered to 243 PSPTs from 12 state universities in Turkey.
Design and Methods: The first phase of the study was the
development of a four-tier test. In the second phase of this
study, a cross-sectional survey design was used.
Results: Validity of the FTGOT scores was established by several qualitative and quantitative methods: (1) content and face validation by six experts; (2) positive correlations between the PSPTs' correct scores on only the first tiers of the FTGOT and their confidence scores for that tier (r = .194), and between correct scores on the first and third tiers and confidence scores for both of those tiers (r = .218), as evidence of construct validity; (3) false positive (3.5%), false negative (3.3%) and lack of knowledge (5.1%) percentages all below 10%, as evidence of content validity of the test scores; (4) exploratory factor analysis conducted over correct and misconception scores, which yielded meaningful factors for the correct scores, as further evidence of construct validity. Additionally, Cronbach alpha coefficients were calculated for correct scores (r = 0.59) and misconception scores (r = 0.42) to establish the reliability of the test scores. Six misconceptions about geometrical optics, each held by more than 10% of the PSPTs, were identified and considered significant.
Conclusions: The results from the present investigation demonstrate that the FTGOT is a valid and reliable instrument for assessing misconceptions in geometrical optics.
Introduction
Researchers from a variety of theoretical perspectives have argued that the most important attribute students bring to their classes is their conceptions (Ausubel 1968; Wandersee, Mintzes, and Novak 1994), most of which differ from those of scientists (Griffiths and Preston 1992; Hammer 1996). Student conceptions that contradict the scientific view are often labeled 'misconceptions' (Al-Rubayea 1996; Barke, Hazari, and Yitbarek 2009; Schmidt 1997; Smith, diSessa, and Roschelle 1993). Broadly, misconceptions share the common features of being strongly held, coherent conceptual structures that are resistant to change through traditional instruction and need special attention for students to develop a scientific understanding (Gil-Perez and Carrascosa 1990; Hammer 1996; Hasan, Bagayoko, and Kelley 1999). Therefore, correct identification of misconceptions has become an important first step in gaining an understanding of student learning.
A correct answer on a traditional multiple-choice test (MCT) might not reflect scientific understanding but might be a correct answer with wrong reasoning (i.e. a 'false positive'); likewise, a wrong answer might not be due to the misconception held, but might be a wrong answer with correct reasoning (i.e. a 'false negative'). For this reason, researchers extended MCTs into multiple-tier tests, with two, three, or more recently four tiers, in order to overcome some of the limitations of using interviews, open-ended tests and traditional MCTs to diagnose students' misconceptions.
Generally, two-tier tests are diagnostic instruments whose first tier contains a multiple-choice content question and whose second tier contains a multiple-choice set of reasons for the answer to the first tier (Adadan and Savasci 2012; Chen, Lin, and Lin 2002; Griffard and Wandersee 2001; Treagust 1986). Students' answers to each item are considered correct when both the correct choice and the correct reason are selected. Hence, two-tier tests provide the opportunity to detect and calculate the proportion of wrong answers with correct reasoning (i.e. false negatives) and of correct answers with wrong reasoning (i.e. false positives). Identification of false positives and false negatives is important since they can then be separated from student scores. However, two-tier tests cannot discriminate lack of knowledge from misconceptions, scientific conceptions, false positives or false negatives (Peşman and Eryılmaz 2010; Sreenivasulu and Subramaniam 2013). Thus, two-tier tests might overestimate or underestimate students' scientific conceptions (Chang et al. 2007), or overestimate the proportions of misconceptions, since lack of knowledge cannot be determined by two-tier tests (Caleon and Subramaniam 2010a, 2010b; Peşman and Eryılmaz 2010). These limitations of two-tier tests were intended to be compensated for by incorporating a third tier into each item of the test, asking for the confidence in the answers given in the first two tiers (Caleon and Subramaniam 2010a; Eryılmaz 2010; Peşman and Eryılmaz 2010).
With three-tier tests, misconceptions can be distinguished from lack of knowledge and from simple errors. Misconceptions are not simply errors in the answer to a question and its reason; rather, they are errors that are strongly held and advocated with high confidence, in which the wrong answer and its wrong reasoning are related. Therefore, all misconceptions are errors, but not all errors are misconceptions. In three-tier tests, students' answers to each item are considered correct when both the correct choice and the correct reason are selected and the subject expresses confidence in the answers to the first two tiers. Students' answers to each item are considered a misconception when both the wrong choice and a related specific wrong reason are selected with high confidence. Although three-tier tests seem to eliminate many of the disadvantages of two-tier tests, they still cannot fully discriminate confidence in the main answer (first tier) from confidence in the reasoning (second tier) and therefore may overestimate students' scores and underestimate their lack of knowledge (Kaltakci-Gurel, Eryılmaz, and McDermott 2015).
To date, the only four-tier diagnostic test in physics has been the modification of the three-tier instrument of Caleon and Subramaniam (2010a) into a four-tier format (2010b) for the topic of mechanical waves. A four-tier test is a multiple-tier diagnostic test. Its first tier is an ordinary multiple-choice question whose distractors address specific misconceptions. The second tier asks for the confidence in the answer to the first tier. The third tier asks for the reasoning behind the answer to the first tier. The fourth tier asks for the confidence in the answer to the third (reasoning) tier. The present study provides effective ways to establish the validity and reliability of the four-tier MCT development process and to determine significant misconceptions by means of a different analysis. The blank alternatives provided in the main and reasoning tiers, and the ways they are analyzed, are also new to the present study.
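To make the classification concrete, the sketch below (in Python, with hypothetical item data; it is not the authors' scoring code, and the rule ordering is one plausible reading of the definitions above) shows how a single four-tier response could be sorted into the categories discussed so far: correct, misconception, false positive, false negative, plain error, or lack of knowledge.

```python
# A sketch (not the authors' scoring code) of classifying one four-tier
# response. Rule ordering is one plausible reading of the text: low
# confidence on either confidence tier counts as lack of knowledge.
from dataclasses import dataclass

@dataclass
class FourTierResponse:
    answer: str        # tier 1: choice on the content question
    conf_answer: bool  # tier 2: confident about tier 1?
    reason: str        # tier 3: choice on the reasoning question
    conf_reason: bool  # tier 4: confident about tier 3?

def classify(resp, correct_answer, correct_reason, misconception_pairs):
    """Sort a response into one of six categories."""
    if not (resp.conf_answer and resp.conf_reason):
        return "lack_of_knowledge"           # unconfident on any tier
    answer_ok = resp.answer == correct_answer
    reason_ok = resp.reason == correct_reason
    if answer_ok and reason_ok:
        return "correct"                     # scientific conception
    if answer_ok:
        return "false_positive"              # right answer, wrong reasoning
    if reason_ok:
        return "false_negative"              # wrong answer, right reasoning
    if (resp.answer, resp.reason) in misconception_pairs:
        return "misconception"               # related wrong answer/reason pair
    return "error"                           # unrelated wrong answer and reason

# Hypothetical item: correct answer 'a' with reason 'c'; the pair (d, a)
# is one of the item's misconception alternative sets.
resp = FourTierResponse("d", True, "a", True)
print(classify(resp, "a", "c", {("d", "a")}))  # -> misconception
```

In this reading, the confidence tiers gate the whole classification: an unconfident response counts as lack of knowledge regardless of correctness, which is what allows the four-tier format to separate misconceptions from guessing.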
Research purpose
The purposes of this study are: (1) to develop and validate a four-tier misconception test to
assess the misconceptions of pre-service physics teachers (PSPTs) about geometrical optics,
and (2) to assess and identify PSPTs’ misconceptions about geometrical optics.
Method
Development of the test
The development of the four-tier multiple-choice misconception test on geometrical optics followed a modified version of the procedure outlined by Peşman and Eryılmaz (2010) and Eryılmaz (2010) for three-tier tests. The test development and application procedure consists of four main phases: (1) conducting interviews, (2) constructing and administering an open-ended test, (3) constructing and administering a pilot test, and (4) developing and administering the four-tier test (FTGOT).
Firstly, based on the related literature and the researchers' experience with PSPTs, the most useful contexts were determined, in terms of both the prevalence of misconceptions among upper grades and contexts that have not received much investigation. These were found to be related to optical imaging for upper-grade students. Some contexts in optical imaging, such as hinged plane mirrors, diverging lenses, and convex mirrors, were found to have been studied by few researchers, and only a few theoretical studies about these contexts were located. As a result, a 35-item semi-structured interview guide was prepared. A two-dimensional table was constructed, with the contexts in optical imaging (plane mirrors, spherical mirrors, lenses) in one dimension and the cases in optical imaging ('seeing/forming an image of oneself' and 'seeing/forming an image of others') in the other dimension, and equal importance was given to each context-case combination. In each phase (interviewing, open-ended testing, pilot testing, and four-tier testing), the items generated to assess misconceptions in geometrical optics were judged by at least three experts from the field for face and content validity. The interviews were conducted individually during one semester within five sessions, and each session lasted approximately one hour. Each interview session was video-recorded as a primary data source, and each participant's drawings on working sheets were collected as a secondary data source. The administration and analysis of the interviews took a total of eight weeks. The results of the analysis were primarily used to develop the open-ended test for the study and to construct the alternatives of the multiple-tier multiple-choice test. The detailed description and analysis of the interview and open-ended test administration are within the scope of another study (Kaltakci-Gurel, Eryilmaz, and McDermott 2016).
Secondly, the researchers developed the Open-Ended Geometrical Optics Test (OEGOT) by considering the results of the previously conducted interviews, which identified the most useful contexts for detecting PSPTs' misconceptions in geometrical optics, and by reviewing the literature. The main aims of administering the open-ended test were to obtain greater generalizability and to construct the alternatives of the multiple-tier MCT in geometrical optics. Almost all of the questions in the OEGOT were selected from the interview guide with some small modifications, and a total of 24 items were included in the test. Each item has a main question about geometrical optics and a reasoning question that asks for an explanation of the answer. The OEGOT administration lasted approximately 90 min, and the analysis of the test results took a total of five weeks. During the analysis, the misconceptions about geometrical optics were identified for each item and sorted according to their percentages for later use in the development of the multiple-tier multiple-choice instrument.
Thirdly, in light of the previous literature and the analysis of the interview and OEGOT results, a preliminary multiple-tier multiple-choice test was developed with 22 items. This form of the test was used as a pilot test in order to determine the effectiveness of the alternatives through item analysis. The test was administered to 53 PSPTs and the data were analyzed. In the analysis, all correct answers to the first and second tiers of an item given with confidence were coded as 1, and 0 otherwise. The test was found to be rather difficult, with a mean difficulty index of 0.12, yet quite discriminating, with a mean discrimination index of 0.38. The Cronbach's alpha coefficient for the pilot test scores was found to be 0.72 over the correct scores for all three tiers. The item difficulty and item discrimination (point-biserial correlation) indices of each item were used to determine item revisions on the test.
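A minimal sketch of this kind of item analysis, assuming a subjects x items matrix of 0/1 scores (1 = correct-with-confidence); the data below are simulated, not the pilot data:

```python
# A sketch of the pilot item analysis over a subjects x items 0/1 matrix.
import numpy as np

rng = np.random.default_rng(0)
scores = (rng.random((53, 22)) < 0.12).astype(float)  # hypothetical pilot data

difficulty = scores.mean(axis=0)          # proportion correct per item
total = scores.sum(axis=1)                # each subject's total score

def item_rest_correlation(item_col, total):
    # Point-biserial correlation of a 0/1 item with the rest-of-test score
    return np.corrcoef(item_col, total - item_col)[0, 1]

discrimination = np.array([item_rest_correlation(scores[:, j], total)
                           for j in range(scores.shape[1])])
print(f"mean difficulty {difficulty.mean():.2f}, "
      f"mean discrimination {np.nanmean(discrimination):.2f}")
```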
After the necessary modifications and revisions, the 20-item FTGOT was developed (Kaltakci 2012b). In addition to removing or revising problematic items, blank alternatives, in which respondents could write answers different from the available alternatives, were added at the end of both the first and third tiers of each item on the FTGOT. Figure 1 is an example item from the FTGOT, and Table 1 shows the distribution of the items across the contexts and cases. Twenty-one misconceptions are intended to be assessed by the FTGOT through PSPTs' selection of the alternative sets shown in the Appendix. Each misconception is assessed by at least two items on the FTGOT for the validity of the test scores. Six experts in physics and physics education examined the final version of the FTGOT for face and content validity.
Figure 2. An example of how subjects' raw data were coded at the nominal level using misconception alternatives and how MISC1 scores were produced. To simplify the illustration, only the first and last three subjects (rows), and only the first three and last two misconceptions (M1, M2, M3, M20, M21) (columns), are shown.
If a subject's response to the first tier of Item 14 was d, it was coded as 1; otherwise it was coded as 0 for this tier of the item. Similarly, if a subject's response to the first tier of Item 15 was b, it was coded as 1, and otherwise as 0. As shown in the figure, each misconception was assessed by a different number of items (such as M1 with 7, M2 with 4, M3 with 3, and M21 with 2 items). Therefore, in calculating the MISC1 score, the weighted average of each misconception's alternative selections was calculated for each subject. That is, the sum of a student's raw nominal-level data for each misconception was divided by the number of items assessing that specific misconception. From the obtained MISC1 scores, summing the rows and dividing by the total number of subjects gave the percentage prevalence of that misconception (i.e. difficulty level), while summing the columns gave the MISC1 score obtained by each subject. To exemplify, 22% of the total subjects hold the first misconception (M1) according to only the first tiers of the FTGOT. The first subject (St1) received a MISC1 score of 4.5 out of 21; that is, out of the 21 misconceptions measured by the FTGOT, St1 holds 4.5 according to only the first tiers of the FTGOT.
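The MISC1 computation can be summarized in a few lines. The sketch below uses fabricated selections; only the item counts come from the text (M1: 7 items, M2: 4, M3: 3, M20 and M21: 2 each), so the printed numbers are illustrative only.

```python
# A sketch of the MISC1 computation. flags is a subjects x items 0/1
# matrix per misconception (1 = chose that misconception's alternative).
import numpy as np

items_per_misconception = {"M1": 7, "M2": 4, "M3": 3, "M20": 2, "M21": 2}
n_subjects = 243
rng = np.random.default_rng(1)

misc1 = {}
for m, k in items_per_misconception.items():
    flags = (rng.random((n_subjects, k)) < 0.2).astype(float)  # fake selections
    misc1[m] = flags.mean(axis=1)   # weighted average: sum / items assessing m

for m, col in misc1.items():
    print(m, f"prevalence {col.mean():.0%}")   # e.g. M1 ~ 22% in the study

# Each subject's MISC1 total out of 21 misconceptions (5 modeled here)
subject_totals = np.sum(list(misc1.values()), axis=0)
print("St1 MISC1 total:", subject_totals[0])
```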
Figure 3. Scatter plots of (a) COR1 vs. CONF1 Scores; (b) COR3 vs. CONF2 Scores;
(c) COR1&3 vs. CONF1&2 Scores.
In three-tier test studies, the correlation between students' correct scores and their confidence level on the third tier was calculated as construct-related validity evidence (Çataloğlu 2002; Peşman and Eryılmaz 2010). Since the confidence ratings were asked for the first tier and the reasoning tier separately, the calculation of these correlations was modified for the four-tier test format. Hence, in the present study, three different Pearson product-moment correlations were calculated: (1) between first-tier correct scores (COR1) and second-tier confidence scores (CONF1); (2) between third-tier correct scores (COR3) and fourth-tier confidence scores (CONF2); (3) between correct scores over the first and third tiers (COR1&3) and confidence scores over the second and fourth tiers (CONF1&2). There were small positive and significant correlations between COR1 and CONF1 scores (r = 0.194, p < 0.005) and between COR1&3 and CONF1&2 scores (r = 0.218, p < 0.005). This means that students with high scores have higher confidence than students with low scores in these two cases. However, there was no significant correlation between COR3 and CONF2 scores. The reasons for the small or absent correlations can be investigated using scatter plots. The sign and degree of relationship between variables is what a scatter plot illustrates
(Frankel and Wallen 2000). Figure 3 shows the scatter plots of COR1 and CONF1
scores (a); COR3 and CONF2 scores (b); COR1&3 and CONF1&2 scores (c).
From the three scatter plots it is obvious that there are no indications of curvilinear relationships between the variables. Therefore, it is appropriate to calculate Pearson product-moment correlations for those variable pairs (Pallant 2005). Although the correlation coefficients between COR1 and CONF1 scores, and between COR1&3 and CONF1&2 scores, are significant, they are small in magnitude. For a strong positive relationship, students with higher scores would be expected to be more confident about their answers on the test, whereas students with lower scores would likely be less confident. However, at the bottom-right side of the scatter plots there are some students with high confidence levels but low scores. Those students are confident about their wrong answers, some of which might be robust misconceptions, or they might have selected wrong answers by chance. As stated by Crocker and Algina (1986), 'nearly all total test score parameters are affected by item difficulty' (312). The other reason for the small or absent correlations between the variables might be the difficulty of the test. In the scatter plots, it can be seen that the proportion of students with high scores is very small; the slopes of the lines are small, which indicates that students' confidence scores are generally higher than their correct scores. The FTGOT, like other misconception tests, has very strong distractors, which might be the primary reason the test is rather difficult. In previous studies on the development of three-tier tests (Çataloğlu 2002; Peşman and Eryılmaz 2010), the correlations calculated between the first two tiers and confidence scores range between 0.11 and 0.51. The correlations obtained in the present study are thus evidence that the items of the FTGOT work properly.
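A sketch of the three validity correlations follows; the score vectors are simulated here (in the study they come from the FTGOT scoring described above), so the variable names merely mirror the text.

```python
# A sketch of the three validity correlations over simulated score vectors.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(2)
n = 243
conf1, conf2, conf12 = rng.random(n), rng.random(n), rng.random(n)
cor1 = 0.2 * conf1 + rng.random(n)    # weakly related, as in the study
cor3 = rng.random(n)                  # unrelated, mimicking the null result
cor13 = 0.2 * conf12 + rng.random(n)

pairs = {"COR1 vs CONF1": (cor1, conf1),
         "COR3 vs CONF2": (cor3, conf2),
         "COR1&3 vs CONF1&2": (cor13, conf12)}
for name, (x, y) in pairs.items():
    r, p = pearsonr(x, y)             # Pearson product-moment correlation
    print(f"{name}: r = {r:.3f}, p = {p:.4f}")
```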
One of the common uses of factor analysis is construct validation, that is, obtaining evidence of convergent and discriminant validity by demonstrating that indicators of selected constructs load onto separate factors in the expected manner (Brown 2006). Exploratory factor analysis is typically used early in the process of test development and for construct validation. In the present study, two exploratory factor analyses (using principal component factor analysis with varimax rotation) were conducted: one for the correct scores and the other for the misconception scores. The first factor analysis, for the correct scores, was conducted because related items in the test would be expected to form acceptable factors. The second factor analysis was conducted for the misconception scores, to check whether possibly related misconceptions form acceptable factors. The correct scores for the total test items, considering only the first tier, the first and third tiers, and all four tiers, produced meaningful factors, with items loading onto the factors quite strongly (> 0.40). The reliability of the items constituting the factors ranges between 0.50 and 0.56, and the total variance explained ranges between 53 and 63% (that is, these factors explain a total of 53–63% of the variance). Although not all items formed interpretable factors over the correct scores, some of the items loaded onto meaningful factors mainly according to either their contexts (i.e. plane mirrors, spherical mirrors, lenses) or the similarity of their item format. This is evidence for the construct validity of the FTGOT correct scores. For the factor analysis conducted over the misconception scores, on the other hand, the interpretation of the formed factors was even more difficult. The factor reliabilities for the misconception scores were found to be low. Since reliability is a measure of internal consistency, that is, how closely related a set of misconceptions are as a group, the low values show a low relation between the misconceptions.
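A sketch of such an analysis is below, assuming the third-party factor_analyzer package (an assumption; the authors do not name their software) and a hypothetical 0/1 correct-score matrix.

```python
# A sketch of principal-component factor analysis with varimax rotation,
# using the factor_analyzer package; X is a hypothetical 0/1 score matrix.
import numpy as np
from factor_analyzer import FactorAnalyzer

rng = np.random.default_rng(3)
X = (rng.random((243, 20)) < 0.15).astype(float)

fa = FactorAnalyzer(n_factors=5, rotation="varimax", method="principal")
fa.fit(X)
loadings = fa.loadings_                  # items x factors
salient = np.abs(loadings) > 0.40        # flag loadings above the .40 cut-off
print(salient.sum(axis=0))               # salient items per factor
print("cumulative variance explained:", fa.get_factor_variance()[2][-1])
```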
Table 3. The percentages of false positives, false negatives, and lack of knowledge on the FTGOT for correct scores.
Items (%)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 M SD
FP 7 3 4 0 8 2 0 5 1 8 0 2 0 8 6 1 2 3 5 5 3.5 2.8
FN 2 12 4 13 6 6 3 1 3 0 4 4 0 0 0 2 1 1 2 1 3.3 3.7
LK 4 6 5 4 3 4 5 5 12 8 3 2 9 1 5 5 5 8 3 5 5.1 2.6
FP: False positive, FN: False negative, LK: Lack of knowledge, M: Mean, SD: Standard deviation.
Table 4. Descriptive statistics of the FTGOT correct all four tier (COR1–4) and
misconception all four tier (MISC1–4) scores.
Statistics COR1–4 MISC1–4
Number of students 213 213
Number of items/misconceptions 20 21
Mean 2.65 1.98
Median 2 1.8
Min. score 0 0
Max. score 11 5
Standard deviation 2.17 1.09
Std. error of mean .15 .07
Skewness 1.10 .36
Kurtosis 1.50 −.30
Mean point-biserial coefficient .32 .13
Number with coefficient <.20 3 17
.20–.29 6 1
.30–.39 4 1
.40–.49 5 0
.50–.59 2 1
.60–.69 0 0
>.70 0 0
Mean difficulty level .13 .09
Number with difficulty .00–.09 11 15
.10–.19 3 4
.20–.29 5 1
.30–.39 0 1
.40–.49 1 0
.50–.59 0 0
As Table 3 shows, the percentage of lack of knowledge is not especially high in the present study. Only Items 9, 10, 13, and 18 have lack of knowledge percentages of about 10%. The high proportion of lack of knowledge for these items might stem from their contexts, since they are all unfamiliar types of questions for the subjects. Another reason might be their difficulty, which is discussed in the following sections. To conclude, the low percentages of false positives, false negatives and lack of knowledge are indicators of high content validity of the FTGOT scores.
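The Table 3 percentages themselves reduce to simple column means over per-item classifications; the toy matrix below (category labels as in the earlier classification sketch, data fabricated) stands in for the real data.

```python
# A sketch of producing Table 3's per-item FP/FN/LK percentages from
# category labels (toy matrix: 4 subjects x 3 items).
import numpy as np

categories = np.array([
    ["correct",        "false_positive",    "lack_of_knowledge"],
    ["false_negative", "correct",           "correct"],
    ["misconception",  "correct",           "false_negative"],
    ["correct",        "lack_of_knowledge", "correct"],
])

for label, name in [("false_positive", "FP"), ("false_negative", "FN"),
                    ("lack_of_knowledge", "LK")]:
    pct = (categories == label).mean(axis=0) * 100   # per-item percentage
    print(name, pct.round(1), "mean:", round(pct.mean(), 1))
```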
The Cronbach's alpha coefficient (α) was calculated as a measure of the internal consistency of the test scores. Two reliability coefficients were calculated, for the correct all-four-tier (COR1–4) scores and the misconception all-four-tier (MISC1–4) scores, and found to be 0.59 and 0.42, respectively. This means that at least 59% of the total score variance of the COR1–4 scores and 42% of the total score variance of the MISC1–4 scores are due to true score variance (Crocker and Algina 1986). Reliability coefficients calculated over misconception scores are generally low compared to those over correct scores (Eryılmaz 2010), so this situation is not surprising. Haladyna (1997) stated that:
Reliability strongly rests on a foundation of highly discriminating items. If items are highly discriminating and test scores vary, you can bet that the reliability coefficient will be high. (248)
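For reference, the alpha coefficients above follow the standard Cronbach formula; a minimal sketch over a 0/1 score matrix (simulated data, not the study's):

```python
# A sketch of Cronbach's alpha over a subjects x items 0/1 score matrix.
import numpy as np

def cronbach_alpha(scores):
    k = scores.shape[1]                             # number of items
    item_variances = scores.var(axis=0, ddof=1).sum()
    total_variance = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

rng = np.random.default_rng(4)
cor14 = (rng.random((213, 20)) < 0.13).astype(float)  # hypothetical COR1-4 data
print(f"alpha = {cronbach_alpha(cor14):.2f}")
```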
Hence, a discrimination index analysis was conducted to investigate whether the test items were able to discriminate between high-scoring and low-scoring subjects on the defined misconceptions. Item discrimination indices are discussed in the next sub-section.
Another reason is that the difficulty indices decrease from only the first tiers (ordinary MCTs) to all four tiers, since the proportion of correct answers decreases from one-tier to four-tier scoring. This decrease is related to the inclusion of false positive and lack of knowledge proportions in the correct answer proportions of the first-tier-only scores, which overestimates the proportions of correct answers. Small values for the standard deviations show that the scores are not widely spread, which in turn lowers the reliability coefficients.
The average discrimination indices for the correct and misconception scores were estimated by point-biserial coefficients for the individual items and by the mean item-total coefficient for the average. Discrimination indices over correct scores were found to be larger than those for the misconception scores, which indicates that misconceptions are quite pervasive among high scorers as well as low scorers. As mentioned before, this might be one of the reasons for the low reliability coefficients obtained for the misconception scores. Despite being low, the discrimination indices are acceptable values in general (Beichner 1994). Given the nature of misconception tests, low discrimination indices mean that the subjects are homogeneous with respect to the pre-defined misconceptions in geometrical optics: whether high scorers or low scorers, misconceptions are quite common among subjects.
Acknowledgement
This study was completed as part of the first author’s doctoral dissertation. We would like to
thank the Physics Education Group at the University of Washington for the opportunities they
provided during the first author’s doctoral dissertation process.
Disclosure statement
No potential conflict of interest was reported by the authors.
Funding
This work was supported by the Scientific and Technological Research Council of Turkey
(TÜBİTAK) [grant number BİDEB-2211]; the Faculty Development Programme at Middle East
Technical University (ÖYP).
ORCID
Derya Kaltakci-Gurel http://orcid.org/0000-0003-3727-7516
References
Abell, S. K. 2007. “Research on Science Teacher Knowledge.” In Handbook of Research on Science Education, edited by S. K. Abell and N. G. Lederman, 1105–1149. Mahwah, NJ: Lawrence Erlbaum.
Adadan, E., and F. Savasci. 2012. “An Analysis of 16–17 Year-Old Students’ Understanding
of Solution Chemistry Concepts Using a Two-Tier Diagnostic Instrument.” International
Journal of Science Education 34 (4): 513–544.
Al-Rubayea, A. M. 1996. An Analysis of Saudi Arabian High School Students’ Misconceptions about Physics Concepts. Unpublished doctoral dissertation, Kansas State University, Manhattan, KS.
Andersson, B., and C. Karrqvist. 1983. “How Swedish Pupils, Aged 12–15 Years, Understand Light and Its Properties.” International Journal of Science Education 5 (4): 387–402.
Arslan, H. O., C. Cigdemoglu, and C. Moseley. 2012. “A Three-Tier Diagnostic Test to Assess
Pre-Service Teachers’ Misconceptions about Global Warming, Greenhouse Effect, Ozone
Layer Depletion, and Acid Rain.” International Journal of Science Education 34 (11): 1667–
1686. doi: 10.1080/09500693.2012.680618
Ausubel, D. 1968. Educational Psychology. New York: Holt, Rinehart & Winston.
Barke, H. D., A. Hazari, and S. Yitbarek. 2009. Misconceptions in Chemistry: Addressing Perceptions in
Chemical Education. Heidelberg: Springer.
Beichner, R. J. 1994. “Testing Student Interpretation of Kinematics Graphs.” American Journal of Physics 62 (8): 750–762.
Bendall, S., F. Goldberg, and I. Galili. 1993. “Prospective Elementary Teachers’ Prior
Knowledge about Light.” Journal of Research in Science Teaching 30 (9): 1169–1187.
Brown, T. A. 2006. Confirmatory Factor Analysis for Applied Research. New York: Guilford.
Caleon, I. S., and R. Subramaniam. 2010a. “Development and Application of a Three‐Tier
Diagnostic Test to Assess Secondary Students’ Understanding of Waves.” International
Journal of Science Education 32 (7): 939–961.
Caleon, I. S., and R. Subramaniam. 2010b. “Do Students Know What They Know and What
They Don’t Know? Using a Four-Tier Diagnostic Test to Assess the Nature of Students’
Alternative Conceptions.” Research in Science Education 40: 313–337.
Çataloğlu, E. 2002. Development and Validation of an Achievement Test in Introductory Quantum Mechanics: The Quantum Mechanics Visualization Instrument (QMVI). Unpublished doctoral dissertation, The Pennsylvania State University, Pennsylvania.
Chang, H. P., J. Y. Chen, C. J. Guo, C. C. Chen, C. Y. Chang, S. Y. Lin, S. H. Lin et al. 2007.
“Investigating Primary and Secondary Students’ Learning of Physics Concepts in Taiwan.”
International Journal of Science Education 29 (4): 465–482.
Chen, S. M. 2009. “Shadows: Young Taiwanese Children’s Views and Understanding.” International Journal
of Science Education 31 (1): 59–79.
RESEARCH IN SCIENCE & TECHNOLOGICAL EDUCATION 257
Chen, C. C., H. S. Lin, and M. L. Lin. 2002. “Developing a Two-Tier Diagnostic Instrument to Assess High School Students’ Understanding: The Formation of Images by a Plane Mirror.” Proceedings of the National Science Council, Part D 12 (3): 106–121.
Crocker, L., and J. Algina. 1986. Introduction to Classical and Modern Test Theory.
Philadelphia, PA: Holt, Rinehart and Winston.
Dedes, C., and K. Ravanis. 2009. “Teaching Image Formation by Extended Light Sources: The Use of a
Model Derived from the History of Science.” Research in Science Education 39: 57–73.
Ehrlén, K. 2009. “Drawings as Representations of Children's Conceptions.” International
Journal of Science Education 31 (1): 41–57.
Eryılmaz, A. 2010. “Development and Application of Three-Tier Heat and Temperature Test: Sample of
Bachelor and Graduate Students.” Eurasian Journal of Educational Research 40: 53–76.
Fetherstonhaugh, A., and D. F. Treagust. 1992. “Students’ Understanding of Light and Its Properties:
Teaching to Engender Conceptual Change.” Science Education 76 (6): 653–672.
Frankel, J. R., and N. E. Wallen. 2000. How to Design and Evaluate Research in Education.
4th ed. New York: McGraw-Hill.
Galili, I. 1996. “Students’ Conceptual Change in Geometrical Optics.” International Journal of
Science Education 18 (7): 847–868.
Galili, I., S. Bendall, and F. Goldberg. 1993. “The Effect of Prior Knowledge and Instruction on
Understanding Image Formation.” Journal of Research in Science Teaching 30 (3): 271–301.
Gil-Perez, D., and J. Carrascosa. 1990. “What to Do about Science ‘Misconceptions’.”
Science Education 74 (5): 531–540.
Goldberg, F. M., and L. C. McDermott. 1986. “Student Difficulties in Understanding Image
Formation by a Plane Mirror.” The Physics Teacher 24 (8): 472–481.
Goldberg, F. M., and L. C. McDermott. 1987. “An Investigation of Student Understanding of Real Image
Formed by a Converging Lens or Concave Mirror.” American Journal of Physics 55 (2): 108–119.
Griffard, P. B., and J. H. Wandersee. 2001. “The Two-Tier Instrument on Photosynthesis: What Does It
Diagnose?” International Journal of Science Education 23 (10): 1039–1052.
Griffiths, A. K., and K. R. Preston. 1992. “Grade-12 Students’ Misconceptions Relating to Fundamental Characteristics of Atoms and Molecules.” Journal of Research in Science Teaching 29 (6): 611–628.
Haladyna, T. M. 1997. Writing Test Items to Evaluate Higher Order Thinking. Boston, MA: Allyn and Bacon.
Hammer, D. 1996. “More than Misconceptions: Multiple Perspectives on Student Knowledge
and Reasoning, and an Appropriate Role for Educational Research.” American Journal of
Physics 64 (10): 1316–1325.
Hasan, S., D. Bagayoko, and E. L. Kelley. 1999. “Misconceptions and the Certainty of
Response Index (CRI).” Physics Education 34 (5): 294–299.
Heller, P., and F. Finley. 1992. “Variable Uses of Alternative Conceptions: A Case Study in Current
Electricity.” Journal of Research in Science Teaching 29: 259–275.
Helm, H. 1980. “Misconceptions in Physics amongst South African Students.” Physics Education 15: 92–105.
Hestenes, D., and I. Halloun. 1995. “Interpreting the Force Concept Inventory: A Response to Huffman and Heller.” The Physics Teacher 33: 502, 504–506.
Hestenes, D., M. Wells, and G. Swackhamer. 1992. “Force Concept Inventory.” The Physics Teacher 30:
141–158.
Heywood, D. S. 2005. “Primary Trainee Teachers’ Learning and Teaching about Light: Some Pedagogic
Implications for Initial Teacher Training.” International Journal of Science Education 27 (12): 1447–1475.
Hubber, P. 2006. “Year 12 Students’ Mental Models of the Nature of Light.” Research in
Science Education 36: 419–439.
Kaltakci, D. 2012a. Development and Application of a Four-Tier Test to Assess Pre-Service Physics Teachers’ Misconceptions about Geometrical Optics. Unpublished PhD thesis, Middle East Technical University, Ankara. https://etd.lib.metu.edu.tr/upload/12614699/index.pdf.
Kaltakci, D. 2012b. “Four-Tier Geometrical Optics Test (FTGOT).” https://www.physport.org/assessments/assessment.cfm?I=90&A=FTGOT.
Kaltakci, D., and D. Didis. 2007. “Identification of Pre-Service Physics Teachers’ Misconceptions on Gravity Concept: A Study with a 3-Tier Misconception Test.” In AIP Conference Proceedings, Vol. 899, edited by S. A. Çetin and İ. Hikmet, 499–500.
Appendix. (Continued). Misconceptions and item alternative sets.

M18. An image can be seen if it is not at the focal point. No image is formed/seen of an object at the focal point in a convex mirror/lens.
  7.1.a,b,c; 7.2.a,b; 7.3.c; 7.4.a,b
  8.1.a,b; 8.2.a,b; 8.3.a; 8.4.a,b
  11.1.b; 11.2.a,b; 11.3.a; 11.4.a,b
  12.1.b; 12.2.a,b; 12.3.a; 12.4.a,b
  19.1.b; 19.2.a,b; 19.3.a; 19.4.a,b
  20.1.b; 20.2.a,b; 20.3.a; 20.4.a,b

M19. A real image can be seen of a different size/orientation if a screen is placed at another location from the image point.
  16.1.a; 16.2.a,b; 16.3.a; 16.4.a,b
  16.1.b; 16.2.a,b; 16.3.c; 16.4.a,b
  16.1.c; 16.2.a,b; 16.3.b; 16.4.a,b
  17.1.a,d; 17.2.a,b; 17.3.c; 17.4.a,b

M20. Only the number of mirrors determines the number of images seen in a hinged mirror.
  14.1.d; 14.2.a,b; 14.3.a; 14.4.a,b
  15.1.b; 15.2.a,b; 15.3.a; 15.4.a,b

M21. Only the angle between mirrors determines the number of images seen in a hinged mirror.
  14.1.b,d; 14.2.a,b; 14.3.d; 14.4.a,b
  14.1.c; 14.2.a,b; 14.3.c; 14.4.a,b
  14.1.e; 14.2.a,b; 14.3.b; 14.4.a,b
  15.1.a; 15.2.a,b; 15.3.d; 15.4.a,b