Exams in a classroom setting: When are they useful?

Examinations are widely used around the world to inform decisions about, and evaluations of, a
person. Physical, psychological, medical, military, civil, and educational exams are some examples.
For instance, the result of a medical exam given by a physician tells something about a person's
health; a decision on whether that person should take medication is then made. In education,
exams have a very large impact on the lives of individuals, as they usually shape their academic standing
and their careers after their years of study. One expert said that we, in our times, live in a test-conscious,
test-giving culture in which the lives of people are in part determined by their test performance (S. B.
Sarason, 1959, 26, qtd. in Zeidner, 1998, 4). There are many other types of exams, but this paper
focuses on the exam used in the field of education, specifically the one used by teachers in a
classroom setting. Teachers use these examinations as tools not only to measure the
cognitive ability of students but also to evaluate their classroom performance.
The idea of examinations has been present since the past, and it is still widely used in
many countries around the world, although there are rare instances in which exams are omitted from an
educational system. In contemporary culture, however, there is an irony: more and more critics
of examinations arise despite their long existence. In fact, the existence of exams is itself the
subject of many debates nowadays. Critics insist that exams should be abolished in favor of other forms of
assessment. To develop this idea, this paper discusses the history of examinations further.
This study was conducted to show that examinations cannot simply be abolished as critics propose.
However, they also cannot be the sole basis for measuring the intellectual ability of students, because of their
limitations and other extraneous factors that must be considered. To analyze this claim, the paper
depicts the difficulties of the mining process and analyzes the usefulness of exams by
comparing them to a method used by miners today: diamond drilling.

This study is noteworthy as it clarifies some points that may not have been clear to
those who have lost faith in the process of measurement in educational matters. In addition, this
paper has pedagogical importance, as it points out what improvements should be made to strengthen the
credibility of exams, a procedure that is essential to teaching. It also has significance in the
field of psychology, since the behaviour of students is incorporated in this study. All in all, this
paper intends to serve as evidence that examinations, although they cannot be the sole basis for the intellectual
measurement of students in a classroom setting, should not be abolished, as they are a useful device that helps
teachers measure students' intellectual ability when the concept of validity is met.
Over the years, the word "measurement" has attracted many flawed notions. It is often mistaken
for a synonym of the word "assessment," but the two terms are different. People
practice assessment every day, whether they are conscious of it or not. Observing is itself already
an act of assessment, a process by which things are differentiated. It becomes measurement only when
the observation or assessment has been quantified (K. Hopkins, Stanley, & B. Hopkins, 1990,
1, 3). For example, if we carry a gallon of water, we can readily say that it is heavy. This is
assessment. When we use a weighing scale and express that heaviness as a quantity, it
becomes measurement. Because precision and accuracy are fundamental,
we prefer measurement over assessment. The use of numbers reduces the subjectivity of an
observation, thus making it more dependable. However, not all things are directly
measurable, and some human qualities are not directly observable. Intelligence, the human quality related to
educational goals in this paper, for example, has no standard measuring device, unlike
temperature, height, weight, and volume, which have their respective measuring tools. Instead,
examinations, as already mentioned, are used to measure the intellectual ability of students.
The concept of examination originated from Francis Galton's interest in studying individual
differences by measuring a variety of human physical attributes in 1869. His motto, "Whenever you can,
count" (qtd. in Walsh & Betz, 1990), reflected his devotion to studying inherited genius. In 1879, Wilhelm
Wundt established the first psychological laboratory in Leipzig, Germany. Together with Hermann
Ebbinghaus, who explored learning and retention rates, he suggested that psychological events, like
physical events, can be interpreted in quantitative terms, in pursuit of Galton's interest in human
intelligence (Walsh & Betz, 1990, 2). In 1905, the first intelligence scale was
introduced by Alfred Binet, a French psychologist, and Theodore Simon. This test aimed to distinguish
children capable of normal functioning from those who are slow. The test, which
Lewis M. Terman revised in 1916, is considered the birth of intelligence testing and continued to be
followed in later years. In 1917, during World War I, Arthur Otis replaced this individual testing
with the first written group tests, namely the Army Alpha Test for literates and the Army Beta Test for
illiterates. After these tests successfully selected American Army personnel for differential
training, schools started to use the same type of measure to assess students
throughout their school years. Since then, numerous methods of measuring cognitive abilities
have been developed (Walsh & Betz, 1990, 1-4; J. Linden & K. Linden, 1968, 15-16).
Ironically, however, just as these advancements in educational measurement were being developed,
criticism of the method itself was becoming evident:
There is a paradox in educational measurement today. While assessments of achievement and
competence are being more urgently called for and more widely employed than ever before, tests
are, at the same time, being more sharply criticized and more strongly opposed. (Ebel and Frisbie,
1986, qtd. in K. Hopkins et al., 1990, 3)

At present, there are still many arguments, debates, proposals, and studies about the
examination process. But in spite of the many criticisms the process has drawn, removing
examinations from an educational system has been very rare.
One country that does not practice traditional exams is Finland. There, no exams are
mandatory until a student reaches about the age of 18 (Wilby, 2013, para. 1). Even so, Finland has
rated consistently highly in PISA (Programme for International Student Assessment) for many years,
which has led its methods to be regarded as a leading educational system around the world (Lopez,
2012, para. 4). However, critics say that Finland can afford not to put examination pressure on its
students because of its economic success (Wilby, 2013, para. 17). Another critique is that the Finnish
language is phonetic in nature, which gives students an advantage in literacy and
comprehension. Pasi Sahlberg, who heads an international center for education in Finland, does not
deny that fact but argues that Finnish students are also generally wider readers and thus have an
advantage over students in other countries. Finland's decentralised system of education, in which local
teachers are free to implement their own curricula in accordance with the personal needs of their
specific students, also contributes to its success (Wilby, 2013, para. 20-22; Lopez, 2012, para. 9).
Meanwhile, whether this system is exportable can be evaluated by looking
at the factors that made the Finnish system successful, as well as its process of development and its
historical and cultural background. Hence, Finland's no-mandatory-exams system is not readily feasible in other
countries, because these factors would still have to be considered.
Critics argue that exams are not reliable for intellectual measurement in education
on many grounds, such as individual differences, the constant environment-student transaction,
and other extraneous factors affecting performance during
exams. These grounds are well founded and add weight to the idea that
exams cannot be the sole basis for determining the intellectual ability of students. Individual differences,
particularly differences in mental ability, are an undeniable fact that is readily observable between
individuals. As suggested by Kluckhohn, Murray, and Schneider (1953, p. 53, qtd. in Walsh & Betz, 1990,
15), "Every person is in certain respects like all other people, like some other people, like no other
person." This observation implies that exams cannot fully determine the intellectual capability of students,
because each person tends to have a different kind of intelligence or inclination from another person.
Another factor is the student-environment transaction. Walsh and Betz said that for the
assessment of a person to be complete, an assessment of the environment must also be done, which
implies that there is a constant transaction between people and their environment. Environment has two
classifications, the physical and the psychological. The physical environment includes the biological
surroundings, from the place itself, its buildings, and its structures down to the smallest objects. The
psychological environment, as distinguished from the physical environment by Endler and Magnusson
(1976b, qtd. in Walsh & Betz, 1990, 318), is a subjective rather than objective world that comprises
an individual's perception of the situations in his or her life. Both the physical and the
psychological environment contribute to the intellectual ability and academic performance of students.
During exams, a student in a favorable location, for instance a well-ventilated and quiet classroom, is
more likely to perform better than a student who takes an exam in a classroom that is too cold or too hot, with
many disturbances such as noisy students or a visitor who comes in and starts a conversation with the
examiner. Furthermore, a student who has a family problem or has just lost a best friend, for example,
may perform poorly on the exam compared with a student who is not experiencing those situations.
Emotional factors, which are part of the psychological environment, have an impact, small or large, on a
student's performance. Anxiety is one such emotional factor. Exam-anxious students tend to perform
poorly on exams although they have the capacity to get high scores, because during exams their minds
go blank and they fail to cope with even the simplest instructions and easiest questions (Zeidner,
1998, 4). Additionally, many extraneous factors, such as anxiety and motivation, practice,
coaching, test complexity, mode of administration, and cheating, also influence the performance of
students during exams. Although some of them may have only a slight effect, the above arguments support
the claim that exams cannot be the sole basis for measuring the intellectual ability of students.
On the other hand, exams, despite numerous criticisms, serve many purposes in the field of
education. Exams, like gadgets and appliances, can be very useful if used properly, and an exam
can be said to be used properly if and only if it is reliable and valid. Reliability is defined in the
Merriam-Webster dictionary as "the extent to which an experiment, test, or measuring procedure yields the same
results on repeated trials." In exams, reliability is very important as it dictates the worth of the exam.
A question like "Is the exam worth conducting?" is apt for examining reliability, because
unreliable exams measure nothing but error variation (Walsh & Betz, 1990, 58). While
reliability is mainly concerned with the dependability, consistency, and precision of a procedure, it
does not imply validity; that is, a reliable procedure may not provide evidence that we are
measuring what we really want to measure. Validity, in psychometrics, refers to the ability of a
process to measure what it is supposed to measure. To analyze the relationship between reliability and
validity, the concept of measurement itself can be used. For example, if a 65-kilo person steps on a
bathroom scale five times and the scale reads 75, 65, 29, 200, and 150 kilos, then that scale is not reliable. If
that person steps on another scale and it reads 70, 70, 70, 70, and 70, then that scale is reliable but not
valid. If that person steps on a third scale and it reads 65, 65, 65, 65, and 65, then that scale is both
reliable and valid. This example shows that reliability does not imply validity. However, validity
implies reliability. The same goes for exams. Exams can be reliable but not valid, but exams
cannot be valid if they are not reliable in the first place. Exams need to be valid because validity dictates their
usefulness. To determine whether an exam is valid, the first thing to do is identify
what is going to be measured. In teacher-made exams, the teacher must first know what she wants to
obtain from the students before she makes an exam. Does the teacher want to measure the arithmetic,
analytical, memorization, comprehension, synthesis, or evaluation abilities of the students? Knowing the
purpose of an exam is crucial to achieving validity, since a valid exam measures what it
aims to measure. For example, if a mathematics teacher gives a computational exam from which she expects
to obtain samples of the mathematical abilities of the students, yet the exam contains long, vague
problem-solving questions with too many irrelevant words that need deep comprehension,
then that exam is not valid, simply because the teacher is no longer measuring the
mathematical ability of the students but their comprehension ability. Comprehension ability should
be measured in such tests only if it is included in the goals of the exam. In view of this, the teacher should
know the different types of exams as well as their purposes and limitations.
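The bathroom-scale example above can be restated as a short calculation. The following is only a minimal sketch: it treats reliability as a small spread across repeated readings and validity as a small distance between the average reading and the true weight. The function name describe_scale and the 1-kilo tolerance are illustrative assumptions, not standard psychometric cutoffs.

```python
from statistics import mean, pstdev

TRUE_WEIGHT_KG = 65  # the person's actual weight in the example above


def describe_scale(readings, true_value=TRUE_WEIGHT_KG, tolerance=1.0):
    """Classify a scale from repeated readings of the same person.

    tolerance (in kilos) is an illustrative assumption, not a standard cutoff.
    """
    spread = pstdev(readings)            # consistency across repeated trials
    bias = mean(readings) - true_value   # systematic distance from the truth
    reliable = spread <= tolerance
    # A scale can only be valid if it is reliable in the first place.
    valid = reliable and abs(bias) <= tolerance
    return {"spread": spread, "bias": bias, "reliable": reliable, "valid": valid}


scale_a = describe_scale([75, 65, 29, 200, 150])  # erratic readings
scale_b = describe_scale([70, 70, 70, 70, 70])    # consistent but 5 kilos off
scale_c = describe_scale([65, 65, 65, 65, 65])    # consistent and correct

for name, result in [("A", scale_a), ("B", scale_b), ("C", scale_c)]:
    print(name, "reliable:", result["reliable"], "valid:", result["valid"])
```

Under these assumptions, the first scale is neither reliable nor valid, the second is reliable but not valid, and the third is both, which mirrors the relationship described for exams.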

Generally, there are two classifications of exams: the subjective and the objective. Subjective
exams often include essay tests, in which most questions require students to organize, synthesize,
and analyze their thoughts on the particular topic posed in the question. However, although essay
tests give students freedom of writing and thought, essays also have sample model answers on
which the rating categories are based. This leads to their disadvantage: inconsistent scoring due to
subjectivity. The other classification, the objective exam, includes multiple-choice items, true-or-false
(two-option) items, matching exercises, short-answer items, and completion items, all of which
can be scored objectively, unlike subjective tests. Multiple-choice items offer two or more
choices, typically four to five, of which one is correct or better than the others. Although
this type of exam usually measures rote learning, it is not limited to that, as multiple-choice items can also
include questions that call for analysis, comprehension, and arithmetic problem solving. Such
questions require the skill of the student and could still be answered whether or not the choices
were given. Multiple-choice items, unlike essay tests, have more reliable and easier scoring because each
item has a fixed answer found among the choices. However, the limitation of multiple-choice items
is that they promote guessing, so the teacher will not know whether the student really understood the
question. True-or-false, or two-choice, items, on the other hand, receive more criticism than other
objective tests such as multiple-choice exams (K. Hopkins et al., 1990, 225).
"If you're smart, you can pass a true or false test without being smart" (Linus in Peanuts by
Schulz, qtd. in Hopkins et al., 1990, 248). This quote supports the idea that true-or-false exams have their
limitations because of the guessing factor or the instinctive analysis of students. Multiple-choice items also
carry the risk of guessing, but the chance of a correct guess is much lower than for true-or-false items.
Statistically, a student who does not know the correct answers in an exam has only a small probability
of getting a high score. Moreover, as this paper argues, a true-or-false exam can be very effective if it
is well constructed. Well-constructed true-or-false
questions do not include ambiguous words, obvious clues, or unnecessarily complex sentence structures.
Modified true-or-false exams have also become popular, in which students are asked to
identify what is wrong in the sentence and replace it with the right answer. This is another way of making
this type of exam more reliable and valid. It should be noted that true-or-false items are
advisable only for points that can be expressed unambiguously to the students (Cunningham, 1998, 69).
On the other hand, matching-type items consist of two columns, and each item in the first column needs to
be paired with the right answer in the other column. This type of exam reduces the probability of
guessing correctly because more choices are presented, and sometimes there is a distractor that
does not belong to any item in the first column. This type of exam usually measures the "what," "who,"
"when," and "where" situations (K. Hopkins et al., 1990, 252). It is most useful for measuring knowledge
of facts and terminology, rather than understanding and the ability to interpret
relationships, because it usually requires memorization skills. Meanwhile, short-answer items, often
referred to as identification items, consist of questions, statements, phrases, or even sentences that require
specific answers to be supplied by the students. In this type of exam, the guessing factor is also
reduced, since the student supplies the answer rather than choosing from presented choices. Short-answer
tests are more suitable for mathematics, computations, and charts that need to be analyzed,
because they are incapable of measuring complex learning. One limitation of this exam is that
varied answers may differ in wording yet still deserve to be considered correct, but this limitation is
lessened by careful phrasing. Completion exams usually involve a paragraph or a sentence that
needs to be completed. They share a property with short-answer items in that the student is also required to
supply the answer. Completion items, if prepared carefully, measure understanding rather than just rote
learning. There may be many more types of exams, but those mentioned above are the most common types
and patterns of exams in a classroom setting. Each type of exam has its own limitations, but each also
has its advantages, and it is the responsibility of the teacher, or anyone who makes the test, to prepare each
item carefully and in accordance with the purpose of the exam. Choosing the right kind of exam in the
first place helps the exam meet the concept of validity.
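The comparison of guessing risk between true-or-false and multiple-choice items in the paragraph above can be illustrated with a short calculation. This is only a sketch under a simplifying assumption: the student guesses blindly and independently on every item, so the number of correct answers follows a binomial distribution. The function name prob_at_least and the figures used (10 items, a passing score of 6, four choices per multiple-choice item) are hypothetical values chosen for illustration.

```python
from math import comb


def prob_at_least(n_items, passing_score, n_choices):
    """Probability of reaching passing_score or more by blind guessing.

    Assumes each item is guessed independently with chance 1 / n_choices.
    """
    p = 1 / n_choices
    return sum(
        comb(n_items, k) * p**k * (1 - p) ** (n_items - k)
        for k in range(passing_score, n_items + 1)
    )


# Hypothetical exam: 10 items, at least 6 correct needed to pass.
true_false = prob_at_least(10, 6, n_choices=2)
multiple_choice = prob_at_least(10, 6, n_choices=4)

print(f"true-or-false:   {true_false:.3f}")
print(f"multiple-choice: {multiple_choice:.3f}")
```

With these assumed figures, a blind guesser passes the true-or-false exam roughly 38 percent of the time but the four-choice exam only about 2 percent of the time, which is consistent with the point that guessing is a greater concern for true-or-false items.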

Exams have undeniable limitations, but if the nature of an exam reflects its purpose, the
concept of validity is met, and thus exams can be very useful in measuring the intellectual ability of
students. This relation can be likened to the work of a miner. In the earlier part of the century, miners used
underground excavation to get ores from below the earth. They dug tunnels underground, which was
quite expensive and uncertain of success. It has many disadvantages; Pfleider (1968) indicated
that underground mining is very costly (see Appendix A).
His study also showed that the productivity of underground mining was low, at 10-60 tons per
man-shift. But in later years and until today, miners have been able to save time and money and
increase their productivity by using diamond drills. According to a case study by J. G.
Stone, V. M. Mejia, and G. T. Newell in 1988, diamond drilling is superior to
other, classical techniques. With diamond drills, the miner need not resort to trial and error
as tunnelling does, because a diamond drill is simply driven into
the ground at different angles to obtain samples of the ores underground. The samples
obtained are then brought to assayers, and after their composition is examined and chemically
analyzed, the actual value of the ore underground can be estimated (Baron & Bernard, 1958, 1). This
gives miners a way to know what is underground without resorting to destructive, time-consuming, uncertain,
and costly tunnels. Here, the ore samples represent samples of the intellectual abilities of students,
while the diamond drills correspond to the exam itself. Exams, like diamond drills, are tools
available to teachers that are useful for obtaining samples from students without the costly
trial-and-error approach that other forms of assessment may entail. As T. S. C. Farrell (2009) puts
it, other forms of assessment are costly and are more likely to be unreliable when it comes to scoring (p.
93).
Finally, in the process of mining, after learning the estimated value of the ores
underground, the miner can excavate where the ores are found. However, his success is
still not guaranteed because of uncontrollable outside factors: faulty earth may cause the sudden collapse
of a geologic formation, mine, or structure; a crash in the market may lead to poor sales of his ore; and
advances in technology might lessen the demand for his ore (Baron & Bernard, 1958, 2). Similarly,
exams have limitations, and extraneous factors might affect their success. Exams are not the elixirs that
critics would expect them to be, which is why they cannot be the sole basis for determining intellectual ability.
However, exams also cannot be abolished, because they are helpful tools that
teachers can use to measure the intellectual abilities of students in a classroom setting when the concept
of validity is met.
The misuse of exams, as it turns out, is the major root of most, if not all, of the criticisms that
examinations have drawn (J. Linden & K. Linden, 1968, 2). Even the most carefully prepared exam may still be useless
if it is not used properly and does not serve its purpose. For this reason, test
constructors, who in a classroom setting are for the most part teachers, play a very important role in the
process of measuring the intellectual ability of students. Aside from constructing technically
well-prepared exams and using them appropriately according to their purpose, they also have to interpret the scores,
because exam scores by themselves do not readily express or mean anything. This makes the concept of validity
complete (AERA, APA, & NCME, 11, qtd. in Lee, 2008, 22). Messick (1989) also tied the validity
of exams to the appropriateness of the interpretations assigned to test scores rather than to the test scores
themselves (qtd. in Zeidner, 1998, 137). Just as the ore samples that diamond drills obtain need to
be chemically analyzed before the underground ore can be estimated, exam scores also need to be
interpreted so that the teacher can make a sound evaluation. This evaluation will lead to an
enhanced understanding of the intellectual ability of students, provided the objectivity of any
judgment is always kept in mind.
Truly, the success of any intellectual measurement process still lies in the hands of the test
constructor, and it would be better if educational systems around the world put greater weight on
informing and training test makers and teachers about this crucial part of educational measurement. In
doing so, greater success in the fundamentals of the educational process itself will be established.

BIBLIOGRAPHY
Baron, D., & Bernard, H. (1958). Evaluation techniques for classroom teachers. USA: McGraw-Hill
Book Company, Inc.
Cunningham, G. K. (1998). Assessment in the classroom: Constructing and interpreting tests. USA:
Falmer Press. Retrieved from:
http://books.google.com.ph/books?id=k9uRAgAAQBAJ&printsec=frontcover#v=onepage&q&f=false
Farrell, T. S. C. (2009). Teaching reading to English language learners: A reflective guide. USA: Corwin
Press. Retrieved from:
http://books.google.com.ph/books?id=NLiYADFg4HoC&printsec=frontcover#v=onepage&q&f=false
Hopkins, K., Stanley, J., & Hopkins, B. (1990). Educational and psychological measurement and
evaluation (7th ed.). New Jersey: Prentice Hall, Inc.
Lee, W. (2008). A study of the validity of essay test as a college admissions screening factor in Korea:
Evaluative approach to validation (Doctoral dissertation). Retrieved from ProQuest database. (UMI
No. 3347575)
Linden, J., & Linden, K. (1968). Tests on trial. Boston: Houghton Mifflin Company.
Lopez, A. (2012, May 21). How Finnish schools shine. The Guardian. Retrieved from
http://www.theguardian.com/teacher-network/teacher-blog/2012/apr/09/finish-school-system
Pfleider, E. P. (1968). Rapid underground excavation as an alternate. Society of Mining Engineers of
AIME. (Preprint number 68530). Retrieved from:
http://www.onemine.org/view/?d=1234567890123456789012345678901234567890123456789012345678901234155502
Stone, J. G., Mejia, V. M., & Newell, G. T. (1988). Using diamond drilling to evaluate a placer deposit:
A case study. Mining Engineering. Retrieved from:
http://www.onemine.org/view/?d=12345678901234567890123456789012345678901234567890123456789012346125
Walsh, W., & Betz, N. (1990). Tests and assessment (2nd ed.). New Jersey: Prentice Hall, Inc.
Wilby, P. (2013, July 1). Finland's ambassador spreads the word. The Guardian. Retrieved from
http://www.theguardian.com/education/2013/jul/01/education-michael-gove-finland-gcse
Zeidner, M. (1998). Test anxiety: The state of the art. New York: Plenum Press.
Retrieved from:
http://books.google.com.ph/books?id=oYBb7iLNiTkC&printsec=frontcover#v=onepage&q&f=false

APPENDIX A
