
CHAPTER 1 A Perspective on Educational Assessment, Measurement, and Evaluation

Since teaching is meant to bring about learning, teachers need to be thoroughly aware of the processes for determining how successful they are in this task. They need to know whether their students are successfully acquiring the knowledge, skills, and values inherent in their lessons. For this reason, it is critical for beginning teachers to build a repertoire of techniques for the measurement and evaluation of student learning. This chapter is geared towards equipping you with the basic concepts in educational assessment, measurement, and evaluation.

Measurement, Assessment, and Evaluation

Measurement, as used in education, is the quantification of what students have learned through the use of tests, questionnaires, rating scales, checklists, and other devices. A teacher, for example, who gives his class a 10-item quiz after a lesson on the agreement of subject and verb is undertaking measurement of what the students learned from that particular lesson.

Assessment, however, refers to the full range of information gathered and synthesized by teachers about their students and their classrooms (Arends, 1994). This information can be gathered in informal ways, such as through observation or verbal exchange. It can also be gathered through formal ways, such as assignments, tests, and written reports or outputs.

While measurement refers to the quantification of students' performance, and assessment to the gathering and synthesizing of information, evaluation is a process of making judgments, assigning value, or deciding on the worth of students' performance. Thus, when a teacher assigns a grade to the score you obtained in a chapter quiz or term examination, he is performing an evaluative act. This is because he places value on the information gathered through the test.

Measurement answers the question, how much does a student learn or know? Assessment looks into how much change has occurred in the student's acquisition of a skill, knowledge, or value before and after a given learning experience. Since evaluation is concerned with making judgments on the worth or value of a performance, it answers the question, how good, adequate, or desirable is it? Measurement and assessment are, therefore, both essential to evaluation.
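The distinction among the three terms can be illustrated with a short Python sketch. The scores, the function names, and the 75 percent passing mark are all hypothetical and are used only for illustration; they are not taken from the text. The raw scores stand for measurement, the gain from pretest to posttest for assessment of change, and the assignment of a grade for evaluation.

```python
# Hypothetical data: one student's scores on a 10-item quiz
# before and after the lesson (measurement: "how much?").
total_items = 10
pre_score = 4
post_score = 8


def gain(pre, post):
    """Assessment of change: how much the score moved after instruction."""
    return post - pre


def grade(score, total, passing_percent=75):
    """Evaluation: place a value on the score.

    The 75 percent passing mark is an assumed standard, not one
    prescribed by the text.
    """
    return "Passed" if score * 100 >= passing_percent * total else "Failed"


print("Score after the lesson:", post_score, "out of", total_items)
print("Gain from pretest to posttest:", gain(pre_score, post_score))
print("Grade:", grade(post_score, total_items))
```

The same score can lead to a different judgment if the standard changes, which is why evaluation depends on, but is not the same as, measurement.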

Educational Assessment: Context for Educational Measurement and Evaluation


As a framework for educational measurement and evaluation, educational assessment is quite difficult to define. According to Stiggins and his colleagues (1996), assessment is a method of evaluating personality in which an individual, living in a group, meets and solves a variety of lifelike problems. From the viewpoint of Cronbach, as cited by Jaeger (1997), three principal features of assessment are identifiable: (1) the use of a variety of techniques; (2) reliance on observations in structured and unstructured situations; and (3) integration of information.

The aforementioned definition and features of assessment are applicable to a classroom situation. The term personality in the definition of assessment refers to an individual's characteristics, which may be cognitive, affective, or psychomotor. The classroom setting is essentially social and provides both structured and unstructured phases. Problem solving is likewise a major learning task. Holistic appraisal of a learner, his or her environment, and his or her accomplishments is the principal objective of educational assessment.

Bloom (1970) has this to say on the process of educational assessment:

Assessment characteristically starts with an analysis of the criterion and the environment in which an individual lives, learns, and works. It attempts to determine the psychological pressures the environment creates, the role expected, and the demands and pressures, their hierarchical arrangement, consistency, as well as conflict. It then proceeds to the determination of the kinds of evidence that are appropriate about the individuals who are placed in this environment, such as their relevant strengths and weaknesses, their needs and personality characteristics, their skills and abilities.

From the foregoing description of the process of educational assessment, it is very clear that educational assessment concerns itself with the total educational setting and is a more inclusive term. This is because it subsumes measurement and evaluation. It focuses not only on the nature of the learner but also on what is to be learned and how it is to be learned. In a real sense, it is diagnostic in intent or purpose. This is due to the fact that through educational assessment the strengths and weaknesses of an individual learner can be identified and, at the same time, the effectiveness of the instructional materials used and of the curriculum can be ascertained.

Assessments are continuously being undertaken in all educational settings. Decisions are made about content and specific objectives, the nature of students and faculty, faculty morale and satisfaction, and the extent to which student performances meet standards. Payne (2003) describes a typical example of how assessments can be a basis for decision making:
1. The teacher reviews a work sample, which shows that some column additions are in error and that there are frequent carrying errors.
2. He / She assigns simple problems on preceding pages and finds consistent addition errors in some number combinations, as well as repeated errors in carrying from one column to another.
3. He / She gives instruction through verbal explanation, demonstration, trial, and practice.
4. The student becomes successful in the calculations made in each preparation step after direct teacher instruction.
5. The student returns to the original pages, completes them correctly, and is monitored closely when new processes are introduced.

From the foregoing example, it can be seen that there is a very close association between assessment and instruction. The data useful in decision making may be derived from informal assessments, such as observations from interactions, or from teacher-made tests. Informed decision making in education is very important owing to the obvious benefits it can bring about (Linn, 1999). Foremost among these benefits are feelings of competence in academic skills. The sense of being able to function effectively in society is likewise essential. Finally, the affective side of development is equally important. Personal dimensions, like feelings of self-worth and being able to adjust to people and cope with various situations, lead to better overall life adjustment.

Purposes of Educational Assessment, Measurement and Evaluation


Educational assessment, measurement, and evaluation serve the following purposes (Kellough et al., 1993):

Improvement of Student Learning. Knowing how well students are performing in class can lead teachers to devise ways and means of improving student learning.

Identification of Students' Strengths and Weaknesses. Through measurement, assessment, and evaluation, teachers can single out their students' strengths and weaknesses. Data on these strengths and weaknesses can serve as bases for undertaking reinforcement and / or enrichment activities for the students.

Assessment of the Effectiveness of a Particular Teaching Strategy. Accomplishment of an instructional objective through the use of a particular teaching strategy is important to teachers. Competent teachers continuously evaluate their choice of strategies on the basis of student achievement.

Appraisal of the Effectiveness of the Curriculum. Through educational measurement, assessment, and evaluation, various aspects of the curriculum are continuously evaluated by curriculum committees on the basis of achievement test results.

Assessment and Improvement of Teaching Effectiveness. Results of testing are used as a basis for determining teaching effectiveness. Knowledge of the results of testing can provide school administrators with inputs on the instructional competence of teachers under their charge. Thus, intervention programs to improve teaching effectiveness can be undertaken by principals or even supervisors on account of the results of educational measurement and evaluation.

Communication with and Involvement of Parents in Their Children's Learning. Results of educational measurement, assessment, and evaluation are utilized by school teachers in communicating to parents their children's learning difficulties. Knowing how well their children are performing academically can lead parents to forge a partnership with the school in improving and enhancing student learning.

Types of Classroom Assessment


There are three general types of classroom assessment that teachers engage in (Airisian, 1994): official, sizing-up, and instructional.

Official assessment is undertaken by teachers to carry out the bureaucratic aspects of teaching, such as giving students grades at the end of each marking period. This type of assessment can be done through formal tests, term papers, reports, quizzes, and assignments. Evidence sought by teachers in official assessment is mainly cognitive.

Sizing-up assessment, however, is done to provide teachers with information regarding the students' social, academic, and behavioral characteristics at the beginning of each school year. Information gathered by teachers in this type of assessment provides a personality profile of each student, which is used to boost instruction and foster communication and cooperation in the classroom.

Instructional assessment is utilized in planning instructional delivery and monitoring the progress of teaching and learning. It is normally done daily throughout the school year. It therefore includes decisions on the lessons to teach, the teaching strategy to employ, and the instructional materials and resources to use in the classroom.

Methods of Collecting Assessment Data


Airisian (1994) identified two basic methods of collecting information about learners and instruction, namely: paper-and-pencil and observational techniques.

When the learners put down in writing their answers to questions and problems, the assessment method is the paper-and-pencil technique. Paper-and-pencil evidence that teachers are able to gather includes tests taken by students, maps drawn, written reports, completed assignments, and practice exercises. By examining this evidence, teachers are able to gather information about their students' progress.

There are two general types of paper-and-pencil techniques: supply and selection. The supply type requires the student to produce or construct an answer to the question. Book reports, essay questions, class projects, and journal entries are examples of the supply type of paper-and-pencil technique. The selection type, on the other hand, requires the student to choose the correct answer from a list of choices or options. Multiple choice, matching, and alternate response tests are examples of this technique, as the students answer questions by simply choosing an answer from a set of options provided.

The second method teachers utilize is observation. This method involves watching the students as they perform certain learning tasks like speaking, reading, performing laboratory investigation and participating in group activities.

Sources of Evaluative Information


To be able to make correct judgments about students' performance, teachers need to gather accurate information. Thus, teachers have to be familiar with the different sources of evaluative information.

Cumulative Record. It holds all the information collected on students over the years. It is usually stored in the principal's office or guidance office and contains such things as vital statistics, academic records, conference information, health records, family data, and scores on tests of aptitude, intelligence, and achievement. It may also contain anecdotal and behavioral comments from previous teachers. These comments are useful in understanding the causes of the students' academic and behavioral problems.

Personal Contact. It refers to the teacher's daily interactions with his / her students. A teacher's observation of students as they work and relax, as well as daily conversation with them, can provide valuable clues that will be of great help in planning instruction. Observing students not only tells the teacher how well students are doing but also allows him / her to provide them with immediate feedback. Observational information is available in the classroom as the teacher watches and listens to students in various situations. Examples of these situations are as follows:
1. Oral Reading. Can the student read well or not?
2. Answering Questions. Does the student understand concepts?
3. Following Directions. Does the student follow specified instructions?
4. Seatwork. Does the student stay on task?
5. Interest in the Subject. Does the student participate actively in learning activities?
6. Using Instructional Materials. Does the student use the material correctly?

Through accurate observations, a teacher can determine whether the students are ready for the next lesson. He / She can also identify those students who are in need of special assistance.

Analysis. Through a teacher's analysis of the errors committed by students, he / she can obtain much information about their attitudes and achievement. Analysis can take place either during or following instruction. Through analysis, the teacher will be able to identify students' learning difficulties immediately. Thus, teachers have to file samples of students' work for discussion during parent-teacher conferences.

Open-ended Themes and Diaries. One technique that can be used to obtain information about students is to ask them to write about their lives in and out of school. Some questions that students can be asked to react to are as follows:
1. What things do you like and dislike about school?
2. What do you want to become when you grow up?
3. What things have you accomplished which you are proud of?
4. What subjects do you find interesting? Uninteresting?
5. How do you feel about your classmates?

The use of diaries is another method for obtaining data for evaluative purposes. A diary can consist of a record, written every 3 or 4 days, in which students write about their ideas, concerns, and feelings. An analysis of students' diaries often gives valuable evaluative information.

Conferences. Conferences with parents and the students' previous teachers can also provide evaluative information. Parents often have information which can explain why students are experiencing academic problems. Previous teachers can also describe students' difficulties and the techniques they employed in correcting them. Guidance counselors can also be an excellent source of information. They can shed light on test results and personality factors which might affect students' performance in class.

Testing. Through testing, teachers can measure students' cognitive achievement, as well as their attitudes, values, feelings, and motor skills. It is probably the most common measurement technique employed by teachers in the classroom.

Types of Evaluation
Teachers need continuous feedback in order to plan, monitor, and evaluate their instruction. Obtaining this feedback may take any of the following forms: diagnostic, formative, and summative.

Diagnostic evaluation is normally undertaken before instruction in order to assess students' prior knowledge of a particular topic or lesson. Its purpose is to anticipate potential learning problems and to group or place students in the proper course or unit of study. Placement of some elementary school children in special reading programs based on a reading comprehension test is an example of this type of evaluation. Requiring entering college freshmen to enroll in Math Plus based on the results of their entrance test in Mathematics is another example. Diagnostic evaluation can also be called pre-assessment, since it is designed to check the ability levels of the students in some areas so that instructional starting points can be established. Through this type of evaluation, teachers can obtain valuable information concerning students' knowledge, attitudes, and skills when they begin studying a subject, and this information can be employed as a basis for remediation or special instruction. Diagnostic evaluation can be based on teacher-made tests, standardized tests, or observational techniques.

Formative evaluation is usually administered during the instructional process to provide feedback to students and teachers on how well the former are learning the lesson being taught. Results of this type of evaluation permit teachers to modify instruction as needed. Remedial work is normally done to remedy deficiencies noted and to bring the slow learners to the level of their classmates or peers. Basically, formative evaluation asks, how are my students doing? It uses pretests, homework, seatwork, and classroom questions. Results of formative evaluation are neither recorded nor graded but are used for modifying or adjusting instruction.

Summative evaluation is undertaken to determine students' achievement for grading purposes. Grades provide teachers with the rationale for passing or failing students, based on a wide range of accumulated behaviors, skills, and knowledge. Through this type of evaluation, students' accomplishments during a particular marking term are summarized or summed up. It is frequently based on cognitive knowledge, as expressed through test scores and written outputs. Examples of summative evaluation are chapter tests, homework grades, completed project grades, periodical tests, unit tests, and achievement tests.

This type of evaluation answers the question, how did my students fare? Results of summative evaluation can be utilized not only for judging student achievement but also for judging the effectiveness of the teacher and the curriculum.

Approaches to Evaluation
According to Escarilla and Gonzales (1990), there are two approaches to evaluation, namely: norm-referenced and criterion-referenced.

Norm-referenced evaluation is one wherein the performance of a student in a test is compared with the performance of the other students who took the same examination. The following are examples of norm-referenced evaluation:
1. Karl's score in the periodical examination is below the mean.
2. Cynthia ranked fifth in the unit test in Physics.
3. Rey's percentile rank in the Math achievement test is 88.

Criterion-referenced evaluation, on the other hand, is an approach to evaluation wherein a student's performance is compared against a predetermined or agreed-upon standard. Examples of this approach are as follows:
1. Sid can construct a pie graph with 75% accuracy.
2. Yves scored 7 out of 10 in the spelling test.
3. Lito can encode an article with no more than 5 errors in spelling.
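The two approaches can be contrasted in a brief Python sketch. The class scores, the function names, and the 70 percent standard below are hypothetical and serve only to illustrate the idea. The percentile rank is computed here as the percentage of scores falling below a given score, which is only one of several definitions in use.

```python
from statistics import mean

# Hypothetical scores of ten students on the same test.
class_scores = [10, 12, 14, 15, 16, 18, 19, 20, 22, 24]


def rank_in_group(score, group_scores):
    """Norm-referenced: position in the group (1 = highest score)."""
    return 1 + sum(1 for s in group_scores if s > score)


def percentile_rank(score, group_scores):
    """Norm-referenced: percentage of scores in the group below this score."""
    below = sum(1 for s in group_scores if s < score)
    return 100 * below / len(group_scores)


def meets_criterion(score, total_items, required_percent):
    """Criterion-referenced: compare the score with a preset standard only."""
    return score * 100 >= required_percent * total_items


# Norm-referenced statements depend on how the other students performed.
print("Class mean:", mean(class_scores))
print("A score of 12 is below the mean:", 12 < mean(class_scores))
print("Rank of a score of 18:", rank_in_group(18, class_scores))
print("Percentile rank of a score of 24:", percentile_rank(24, class_scores))

# A criterion-referenced statement ignores the other students entirely.
# The 70 percent standard is an assumption made for this sketch.
print("7 out of 10 meets a 70% standard:", meets_criterion(7, 10, 70))
```

If the class scores change, the rank and percentile rank of a student change as well, while the criterion-referenced result stays the same as long as the standard is unchanged.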

REFERENCES
Airisian, P. W. (1994). Classroom Assessment, 2nd Ed. New York: McGraw-Hill, Inc.
Bloom, B. S. (1970). The Evaluation of Instruction: Issues and Problems. New York: Holt, Rinehart & Winston.
Clark, J. & I. Starr (1977). Secondary School Teaching Methods. New York: Macmillan Publishing Company.
Escarilla, E. R. & E. A. Gonzales (1990). Measurement and Evaluation in Secondary Schools. Makati: Fund for Assistance to Private Education (FAPE).
Jaeger, R. M. (1997). Educational Assessment: Trends and Practices. New York: Holt, Rinehart & Winston.
Kellough, R. D., et al. (1993). Middle School Teaching Methods and Resources. New York: Macmillan Publishing Company.
Payne, D. A. (2003). Measuring and Evaluating Educational Outcomes. New York: Macmillan Publishing Company.

CHAPTER 2 Tests and Their Uses in Educational Assessment


The most common and important aspect of student evaluation in most classrooms involves the tests teachers make and administer to their students (Grondlund & Linn, 1990). Teachers, therefore, need to understand the different types of tests and their uses in the assessment and evaluation of students' learning. This chapter orients prospective teachers on tests and their uses in education.

Test Defined
A test is a systematic procedure for measuring an individual's behavior (Brown, 1991). This definition implies that a test has to be developed following specific guidelines. It is a formal and systematic way of gathering information about the learners' behavior, usually through a paper-and-pencil procedure (Airisian, 1989). Through testing, teachers can measure students' acquisition of knowledge, skills, and values in any learning area in the curriculum.

While testing is the most common measurement technique teachers use in the classroom, there are certain limitations in its use. As pointed out by Moore (1992), tests cannot measure student motivation, physical limitations, and even environmental factors. The foregoing indicates that testing is only one measure of students' learning and achievement.

Uses of Tests
Tests serve many functions for school administrators, supervisors, teachers, and parents (Arends, 1994; Escarilla & Gonzales, 1990).

School administrators utilize test results for making decisions regarding the promotion or retention of students, the improvement or enrichment of the curriculum, and the conduct of staff development programs for teachers. Through test results, school administrators can also have a clear picture of the extent to which the objectives of the school's instructional program are achieved.

Supervisors use test results in discovering learning areas needing special attention and in identifying teachers' weaknesses and learning competencies not mastered by the students. Test results can also provide supervisors with baseline data for curriculum revision.

Teachers, on the other hand, utilize tests for numerous purposes. Through testing, teachers are able to gather information about the effectiveness of instruction, give feedback to students about their progress, and assign grades.

Parents, too, derive benefits from tests administered to their children. Through test scores, they are able to determine how well their sons and daughters are faring in school and how well the school is doing its share in educating their children.

Types of Tests

Numerous types of tests are used in school. There are different ways of categorizing tests, namely: mode of response, ease of quantification of response, mode of administration, test constructor, mode of interpreting results, and nature of the answer (Manarang & Manarang, 1983; Louisell & Descamps, 1992).

As to mode of response, tests can be oral, written, or performance.
1. Oral Test. It is a test wherein the test taker gives his answer orally.
2. Written Test. It is a test wherein answers to questions are written by the test taker.
3. Performance Test. It is one in which the test taker creates an answer or a product that demonstrates his knowledge or skill, as in cooking and baking.

As to ease of quantification of response, tests can either be objective or subjective.
1. Objective Test. It is a paper-and-pencil test wherein students' answers can be compared and quantified to yield a numerical score. This is because it requires a convergent or specific response.
2. Subjective Test. It is a paper-and-pencil test which is not easily quantified, as students are given the freedom to write their answer to a question, such as in an essay test. Thus, the answer to this type of test is divergent.

As to mode of administration, tests can either be individual or group.
1. Individual Test. It is a test administered to one student at a time.
2. Group Test. It is one administered to a group of students simultaneously.

As to test constructor, tests can be classified into standardized and unstandardized.
1. Standardized Test. It is a test prepared by an expert or specialist. This type of test samples behavior under uniform procedures. Questions are administered to students with the same directions and time limits. Results in this kind of test are scored following a detailed procedure based on its manual and interpreted based on specified norms or standards.
2. Unstandardized Test. It is one prepared by teachers for use in the classroom, with no established norms for scoring and interpretation of results. It is constructed by a classroom teacher to meet a particular need.

As to the mode of interpreting results, tests can either be norm-referenced or criterion-referenced.
1. Norm-referenced Test. It is a test that evaluates a student's performance by comparing it to the performance of a group of students on the same test.
2. Criterion-referenced Test. It is a test that measures a student's performance against an agreed-upon or pre-established level of performance.

As to the nature of the answer, tests can be categorized into the following types: personality, intelligence, aptitude, achievement, summative, diagnostic, formative, sociometric, and trade.
1. Personality Test. It is a test designed for assessing some aspects of an individual's personality. Some areas tested in this kind of test include the following: emotional and social adjustment; dominance and submission; value orientation; disposition; emotional stability; frustration level; and degree of introversion or extroversion.
2. Intelligence Test. It is a test that measures the mental ability of an individual.
3. Aptitude Test. It is a test designed for the purpose of predicting the likelihood of an individual's success in a learning area or field of endeavor.
4. Achievement Test. It is a test given to students to determine what a student has learned from formal instruction in school.
5. Summative Test. It is a test given at the end of instruction to determine students' learning and assign grades.
6. Diagnostic Test. It is a test administered to students to identify their specific strengths and weaknesses in past and present learning.
7. Formative Test. It is a test given to improve teaching and learning while it is going on. A test given after teaching the lesson for the day is an example of this type of test.
8. Sociometric Test. It is a test used in discovering learners' likes and dislikes, preferences, and their social acceptance, as well as the social relationships existing in a group.
9. Trade Test. It is a test designed to measure an individual's skill or competence in an occupation or vocation.

CHAPTER 3 Assessment of Learning in the Cognitive Domain


Learning and achievement in the cognitive domain are usually measured in school through the use of paper-and-pencil tests (Oliva, 1988). Teachers have to measure students' achievement in all the levels of the cognitive domain. Thus, they need to be cognizant of the procedures in the development of the different types of paper-and-pencil tests. This chapter is focused on acquainting prospective teachers with methods and techniques of measuring learning in the cognitive domain.

Behaviors Measured and Assessed in the Cognitive Domain

There are three domains of behavior measured and assessed in schools. The most commonly assessed, however, is the cognitive domain. The cognitive domain deals with the recall or recognition of knowledge and the development of intellectual abilities and skills. It is divided into six hierarchical levels, namely: knowledge, comprehension, application, analysis, synthesis, and evaluation.
1. Knowledge Level: behaviors related to recognizing and remembering facts, concepts, and other important data on any topic or subject.
2. Comprehension Level: behaviors associated with the clarification and articulation of the main idea of what students are learning.
3. Application Level: behaviors that have something to do with problem solving and expression, which require students to apply what they have learned to other situations or cases in their lives.
4. Analysis Level: behaviors that require students to think critically, such as looking for motives, assumptions, cause-effect relationships, differences and similarities, hypotheses, and conclusions.
5. Synthesis Level: behaviors that call for creative thinking, such as combining elements in new ways, planning original experiments, creating original solutions to a problem, and building models.
6. Evaluation Level: behaviors that necessitate judging the value or worth of a person, object, or idea, or giving an opinion on an issue.

Preparing for Assessment of Cognitive Learning


Prior to the construction of a paper-and-pencil test to be used in the measurement of cognitive learning, teachers have to answer the following questions (Airisian, 1994): what should be tested; what emphasis to give to the various objectives taught; whether to administer a paper-and-pencil test or observe each student directly; how long the test should take; and how best to prepare students for testing.

What Should Be Tested. Identification of the information, skills, and behaviors to be tested is the first important decision that a teacher has to make. Knowledge of what shall be tested will enable a teacher to develop an appropriate test for the purpose. The basic rule to remember, however, is that testing emphasis should parallel teaching emphasis.

How to Gather Information About What to Test. A teacher has to decide whether he should give a paper-and-pencil test or simply gather information through observation. Should he decide to use a paper-and-pencil test, he has to prepare the test items; if he decides to use observation of students' performance of the targeted skill, he has to develop appropriate devices to use in recording his observations. Decisions on how to gather information about what to test depend on the objective or the nature of the behavior to be tested.

How Long the Test Should Be. The answer to this question depends on the following factors: the age and attention span of the students, and the type of questions to be used.

How Best to Prepare Students for Testing. To prepare students for testing, Airisian (1994) recommends the following measures: (1) providing learners with good instruction; (2) reviewing students before testing; (3) familiarizing students with question formats; (4) scheduling the test; and (5) providing students with information about the test.

Assessing Cognitive Learning


Teachers use two types of tests in assessing student learning in the cognitive domain: objective tests and essay tests (Reyes, 2000). An objective test is a kind of test wherein there is only one correct answer to each item. On the other hand, an essay test is one wherein the test taker has the freedom to respond to a question based on how he feels it should be answered.
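Because each item in an objective test has only one correct answer, scoring can be reduced to comparing responses with an answer key. The following Python sketch illustrates this. The answer key, the responses, and the function name are hypothetical, and ignoring capitalization and extra spaces is an assumed scoring rule rather than one stated in the text.

```python
def score_objective_test(answer_key, responses):
    """Compare each response with the single correct answer in the key.

    Returns the number of correct items and the list of missed item numbers.
    Capitalization and surrounding spaces are ignored (an assumed rule).
    An unanswered item counts as missed.
    """
    missed = [
        item
        for item, correct in answer_key.items()
        if responses.get(item, "").strip().lower() != correct.lower()
    ]
    return len(answer_key) - len(missed), missed


# Hypothetical three-item test: multiple choice, true-false, and completion.
answer_key = {1: "a", 2: "True", 3: "Manila"}
responses = {1: "A", 2: "false", 3: " manila "}

score, missed = score_objective_test(answer_key, responses)
print("Score:", score, "out of", len(answer_key))
print("Items missed:", missed)
```

No comparable routine can be written for an essay test, since two acceptable answers to the same essay question may share few or no words. This is why essay tests call for scoring guides and teacher judgment.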

Types of Objective Tests


There are generally two types of objective tests: supply type and selection type (Carey, 1995). In the supply type, the student constructs his / her own answer to each question. Conversely, the student chooses the right answer to each item in the selection type of objective test.

Supply Types of Objective Tests. The following types of tests fall under the supply type of test: completion drawing type, completion statement type, correction type, identification type, simple recall type, and short explanation type (Ebel & Frisbie, 1998).

Completion Drawing Type an incomplete drawing is presented which the student has to complete. Example: In the following food web, draw arrow lines indicating which organisms are consumers and which are producers.

Completion Statement Type an incomplete sentence is presented and the student has to complete it by filling in the blank. Example: The capital city of the Philippines is

__________________.

Correction Type a sentence with underlined word or phrase is presented, which the student has to replace to make it right. Example: Change the underlined word / phrase to make each of the following statements correct. Write your answer on the space before each number. __________ 1. The theory of evolution was popularized by Gregor Mendel. __________ 2. Hydrography is the study of oceans and ocean currents.

Identification Type a brief description is presented and the student has to identify what it is. Example: To what does each of the following refer? Write your answer on the blank before each number. __________ 1. A flat representation of all curved surfaces of the earth. __________ 2. The transmission of parents characteristics and traits to their offsprings.

Simple Recall Type a direct question is presented for the student to answer using a word or phrase. Example: What is the product of two negative numbers? Who is the national hero of the Philippines?

Short Explanation Type similar to an essay test but requires a short answer. Example: Explain in a complete sentence why the Philippines was not really discovered by Magellan.

Selection Types of Objective Tests. Included in the category of selection type are the grouping (arrangement) type, matching type, multiple choice type, alternative response type, key list test, and interpretive exercise.

Arrangement Type Terms or objects are to be arranged by the students in a specified order.
Example 1: Arrange the following events chronologically by writing the letters A, B, C, D, E on the spaces provided.
_______ Glorious Revolution
_______ Russian Revolution
_______ American Revolution
_______ French Revolution
_______ Puritan Revolution
Example 2: Arrange the following planets according to their nearness to the sun by using the numbers 1, 2, 3, 4, 5.
_______ Pluto
_______ Venus
_______ Jupiter
_______ Mars
_______ Saturn

Matching Type A list of numbered items is related to a list of lettered choices. Example: Match the country in Column 1 with its capital city in Column 2. Write letters only.
Column 1
________ 1. Philippines
________ 2. Japan
________ 3. United States
________ 4. Great Britain
________ 5. Israel
Column 2
a. Washington D. C.
b. Jeddah
c. Jerusalem
d. Manila
e. London
f. Tokyo
g. New York

Multiple Choice Type this type contains a question, problem or unfinished sentence followed by several responses. Example: The study of value is (a) axiology (b) logic (c) epistemology (d) metaphysics.

Alternative Response Type A test wherein there are only two possible answers to the question. The true-false format is a form of alternative response type. Variations on the true-false format include yes-no, agree-disagree, and right-wrong. Example: Write True if the statement is true; False if it is false.
_________ 1. Lapulapu was the first Asian to repulse European colonizers in Asia.
_________ 2. Magellan's expedition to the Philippines led to the first circumnavigation of the globe.
_________ 3. The early Filipinos were uncivilized before the Spanish conquest of the archipelago.
_________ 4. The Arabs introduced Islam in Southern Philippines.

Key List Test A test wherein the student has to examine paired concepts based on a specified set of criteria (Olivia, 1998). Example: Examine the paired items in Column 1 and Column 2. On the blank before each number, write:
A = If the item in Column 1 is an example of the item in Column 2;
B = If the item in Column 1 is a synonym of the item in Column 2;
C = If the item in Column 2 is the opposite of the item in Column 1; and
D = If the items in Columns 1 and 2 are not related in any way.

Column 1                       Column 2
_____ 1. capitalism            economic system
_____ 2. labor intensive       capital intensive
_____ 3. planned economy       command economy
_____ 4. opportunity cost      demand and supply
_____ 5. free goods            economic goods

Interpretive Exercise It is a form of multiple choice test that can assess higher cognitive behaviors. According to Airisian (1994) and Mitchell (1992), an interpretive exercise provides students with some information or data followed by a series of questions on that information. In responding to the questions in an interpretive exercise, the students have to analyze, interpret, or apply the material provided, such as a map, an excerpt of a story, a passage of a poem, a data matrix, a table, or a cartoon. Example: Examine the data on child labor in Europe during the period immediately after the Industrial Revolution in the continent. Answer the questions given below by encircling the letter of your choice.

TABLE 1 Child Labor in the Years Right After the Industrial Revolution in Europe
Year      Number of Child Laborers
1750      1800
1760      3000
1770      5000
1780      3400
1790      1200
1800      600
1820      150

1. The employment of child labor was greatest in ____________.
a. 1750
b. 1760
c. 1770
d. 1780

2. As industrialization became rapid, what year indicated a sudden increase in the number of child laborers?
a. 1760
b. 1770
c. 1780
d. 1790

3. Labor unions and government policies were responsible for addressing the problems of child labor. In what year was this evident?
a. 1780
b. 1790
c. 1800
d. 1820

Essay Test
This type of test presents a problem or question, and the student is to compose a response in paragraph form, using his or her own words and ideas. There are two forms of the essay test: brief or restricted, and extended.

Brief or Restricted Essay Test This form of the essay test requires a limited amount of writing or requires that a given problem be solved in a few sentences. Example: Why did early Filipino revolts fail? Cite and explain 2 reasons.

Extended Essay Test This form of the essay test requires a student to present his answer in several paragraphs or pages of writing. It gives students more freedom to express ideas and opinions and use synthesizing skills to change knowledge into a creative idea. Example: Explain your position on the issue of charter change in the Philippines.

According to Reyes (2000) and Gay (1985), the essay test is appropriate to use when learning outcomes cannot be adequately measured by objective test items. Nevertheless, all levels of cognitive behaviors can be measured with the use of the essay test as shown below.

Knowledge Level Explain how Siddhartha Gautama became Buddha.

Comprehension Level What does it mean when a person has crossed the Rubicon?

Application Level Cite three instances showing the application of the Law of Supply and Demand.

Analysis Level Analyze the annual budget of your college as to categories of funds, sources of funds, major expenditures, and needs of your college.

Synthesis Level Discuss the significance of the People Power Revolution in the restoration of democracy in the Philippines.

Evaluation Level Are you in favor of the political platform of the People's Reform Party? Justify your answer.

Choosing the type of test depends on the teacher's purpose and the amount of time to be spent on the test. As a general rule, teachers must create specific tests that will allow students to demonstrate targeted learning competencies.

CHAPTER 4 An Introduction to the Assessment of Learning in the Psychomotor and Affective Domains

As pointed out in the previous chapter, there are three domains of learning objectives that teachers have to assess. While it is true that achievement in the cognitive domain is the one teachers measure most frequently, students' growth in the noncognitive domains of learning should also be given equal emphasis. This chapter expounds different ways by which learning in the psychomotor and affective domains can be assessed and evaluated.

Levels of Learning in the Psychomotor Domain

The psychomotor domain of learning is focused on processes and skills involving the mind and the body (Eby & Kujawa, 1994). It is the domain of learning which classifies objectives dealing with physical movement and coordination (Arends, 1994; Simpson, 1966). Thus, objectives in the psychomotor domain require significant motor performance. Playing a musical instrument, singing a song, drawing, dancing, putting a puzzle together, reading a poem, and presenting a speech are examples of skills developed in the aforementioned domain of learning. There are three levels of psychomotor learning: imitation, manipulation, and precision (Gronlund, 1970).

Imitation is the ability to carry out the basic rudiments of a skill when given directions and under supervision. At this level the total act is not performed skillfully. Timing and coordination of the act are not yet refined.

Manipulation is the ability to perform a skill independently. The entire skill can be performed in sequence. Conscious effort is no longer needed to perform the skill, but complete accuracy has not been achieved yet.

Precision is the ability to perform an act accurately, efficiently, and harmoniously. Complete coordination of the skill has been acquired. The skill has been internalized to such an extent that it can be performed unconsciously.

Based on the foregoing list of objectives, it can be noted that these objectives range from simple reflex reactions to complex actions, which communicate ideas or emotions to others. Moreover, these objectives serve as a reminder to every teacher that students under his charge have to learn a variety of skills and be able to think and act in simple and complex ways.

Measuring the Acquisition of Motor and Oral Skills

There are two approaches that teachers can use in measuring the acquisition of motor and oral skills in the classroom: observation of student performance and evaluation of student products (Gay, 1990).

Observation of Student Performance is an assessment approach in which the learner performs the desired skill in the presence of the teacher. For instance, in a Physical Education class, the teacher can directly observe how male students dribble and shoot the basketball. In this approach, the teacher observes the performance of a student, gives feedback, and keeps a record of his performance, if appropriate. Observation of student performance can either be holistic or atomistic (Louisell & Descamps, 1992). Holistic observation is employed when the teacher gives a score or feedback based on pre-established prototypes of how an outstanding, average, or deficient performance looks. Prior to the observation, the teacher describes the different levels of performance.

A teacher, for example, who required his students to make an oral report on a research they undertook describes the factors which go into an ideal presentation. What the teacher may consider in grading the report includes the following: knowledge of the topic; organization of the presentation of the report; enunciation; voice projection; and enthusiasm. The ideal presentation has to be described, and the teacher has to comment on each of these factors. A student whose presentation closely matches the ideal described by the teacher would receive a perfect mark.

The second type of observation that can be utilized is atomistic or analytic. This type of observation requires that a task analysis be conducted in order to identify the major subtasks involved in the student performance. For example, in dribbling the ball, the teacher has to identify the movements necessary to perform the task. Then, he has to develop a checklist which enumerates the movements necessary to the performance of the task. These movements are demonstrated by the teacher. As students perform the dribbling of the ball, the teacher assigns checkmarks for each of the various subtasks. After the student has performed the specified action, all checkmarks are considered and an assessment of the performance is made.

Evaluation of Student Products is another approach that teachers can use in the assessment of students' mastery of skills. For example, projects in different learning areas may be utilized in assessing students' progress. Student products include drawings, models, construction paper products, etc. The same principles involved in holistic and atomistic observations apply to the evaluation of projects. The teacher has to identify prototypes representing different levels of performance for a project or do a task analysis and assign scores by subtasks. In either case, the students have to be informed of the criteria and procedures to be used in the assessment of their work.

Assessing Performance through Student Portfolios


Portfolio assessment is a new form of assessing students' performance (Mitchell, 1992). A portfolio is a collection of the student's work (Airisian, 1994). It is used in the classroom to gather a series of students' performances or products that show their accomplishment and / or improvement over time. It consists of carefully selected samples of the student's work indicating growth and development toward some curricular goals. The following can be included in a student's portfolio: representative pieces of his / her writing; solved math problems; projects and puzzles completed; artistic creations; videotapes of performance; and even tape recordings. Wolf (1989) says that portfolios can be used for the following purposes: providing examples of student performance to parents; showing student improvement over time; providing a record of students' typical performances to pass on to the next year's teacher; identifying areas of the curriculum that need improvement; encouraging students to think about what constitutes good performance in a learning area; and grading students.

According to Airisian (1994), there are four steps to consider in making use of this type of performance assessment: (1) establishing a clear purpose; (2) setting performance criteria; (3) creating an appropriate setting; and (4) forming scoring criteria or predetermined ratings. Purpose is very important in carrying out portfolio assessment. Thus, there is a need to determine beforehand the objective of the assessment and the guidelines for student products that will be included in the portfolio prior to compilation. While teachers need to collaborate with their colleagues in setting common criteria, it is crucial that they involve their students in setting standards of performance. This will enable the latter to claim ownership over their performance. Portfolio assessment also needs to consider the setting in which students' performance will be gathered. Shall it be a written portfolio? Shall it be a portfolio of oral or physical performances, science experiments, artistic productions, and the like? Setting has to be looked into since arrangements have to be made on how the desired performance can be properly collected.

Lastly, methods for scoring and judging students' performance are required in portfolio assessment. Scoring students' portfolios, however, is time-consuming, as a series of documents and performances has to be scrutinized and summarized. Rating scales, anecdotal records, and checklists can be used in scoring students' portfolios. The content of a portfolio, however, can be reported in the form of a narrative.

Tools for Measuring Acquisition of Skills


As pointed out previously, observation of student performance and evaluation of student products are ways by which teachers can measure the students' acquisition of motor and oral skills. To overcome problems relating to validity and reliability, teachers can use rating scales, checklists, or other written guides to help them come up with unbiased or objective observations of student performance.

A rating scale is a series of categories arranged in order of quality. It can be helpful in judging skills, products, and procedures. According to Reyes (2000), the following steps are to be followed in constructing a rating scale:
1. Identify the qualities of the product to be assessed.
2. Create a scale for each quality or performance aspect.
3. Arrange the scales from positive to negative or vice versa.
4. Write directions for accomplishing the rating scale.

Following is an example of a rating scale for judging a student teacher's presentation of a lesson.

Rating Scale for Lesson Presentation


Student Teacher ___________________________ Date ______________
Subject _____________________________________________________
Rate the student teacher on each of the skill areas specified below. Use the following code: 5 = Outstanding; 4 = Very satisfactory; 3 = Satisfactory; 2 = Fair; 1 = Needs improvement. Encircle the number corresponding to your rating.
5 4 3 2 1   Audience contact
5 4 3 2 1   Enthusiasm
5 4 3 2 1   Speech quality and delivery
5 4 3 2 1   Involvement of the audience
5 4 3 2 1   Use of nonverbal communication
5 4 3 2 1   Use of questions
5 4 3 2 1   Directions and refocusing
5 4 3 2 1   Use of reinforcement
5 4 3 2 1   Use of teaching aids and instructional materials

A checklist differs from a rating scale in that it indicates the presence or absence of specified characteristics. It is basically a list of criteria upon which a student's performance or end product is to be judged. The checklist is used by simply checking off the criteria that have been met. The response on a checklist varies. It can be a simple check mark indicating that an action took place. For instance, a checklist for observing student participation in the conduct of a group experiment may appear like this:
_____ 1. Displays interest in the experiment.
_____ 2. Helps in setting up the experiment.
_____ 3. Participates in the actual conduct of the experiment.
_____ 4. Makes worthwhile suggestions.

The rater would simply check the items that occurred during the conduct of the group experiment. Another type of checklist requires a yes or no response. Yes is checked when the action is done satisfactorily; no is checked when the action is done unsatisfactorily. Below is an example of this type of checklist.

Performance Checklist for a Speech Class


Name ___________________________________ Date ______________
Check Yes or No as to whether the specified criterion is met.
Did the student:                          YES               NO
1. Use correct grammar?                   _______________   ______________
2. Make a clear presentation?             _______________   ______________
3. Stimulate interest?                    _______________   ______________
4. Use clear direction?                   _______________   ______________
5. Demonstrate poise?                     _______________   ______________
6. Manifest enthusiasm?                   _______________   ______________
7. Use appropriate voice projection?      _______________   ______________

Levels of Learning in the Affective Domain


Objectives in the affective domain are concerned with emotional development. Thus, affective domain deals with attitudes, feelings, and emotions. Learning intent in this domain of learning is organized according to the degree of internalization. Kratwhol and his colleagues (1964) identified four levels of learning in the affective domain.

Receiving involves being aware of and being willing to freely attend to a stimulus.

Responding involves active participation. It involves not only freely attending to a stimulus but also voluntarily reacting to it in some way. It requires physical, active behavior.

Valuing refers to voluntarily giving worth to an object, phenomenon or stimulus. Behaviors at this level reflect a belief, appreciation, or attitude.

Commitment involves building an internally consistent value system and freely living by it. A set of criteria is established and applied in making choices.

Evaluating Affective Learning


Learning in the affective domain is difficult and sometimes impossible to assess. Attitudes, values, and feelings can be intentionally concealed. This is because learners have the right not to show their personal feelings and beliefs, if they choose to do so. Although the achievement of objectives in the affective domain is important in the educational system, these objectives cannot be measured or observed like objectives in the cognitive and psychomotor domains. Teachers attempt to evaluate affective outcomes when they encourage students to express feelings, attitudes, and values about topics discussed in class. They can observe students and may find evidence of some affective learning. Although it is difficult to assess learning in the affective domain, there are some tools that teachers can use in assessing learning in this area. Some of these tools are the following: attitude scale; questionnaire; simple projective techniques; and self-expression techniques (Escarilla & Gonzales, 1990; Ahmann & Glock, 1991).

Attitude Scale is a form of rating scale containing statements designed to gauge students' feelings on an attitude or behavior. An example of an attitude scale is shown below.

An Attitude Scale for Determining Interest in Mathematics


Name __________________________________ Date _______________
Each of the statements below expresses a feeling toward mathematics. Rate each statement on the extent to which you agree. Use the following response code: SA = Strongly Agree; A = Agree; U = Uncertain; D = Disagree; SD = Strongly Disagree.
1. I enjoy my assignments in Mathematics.
2. The book we are using in the subject is interesting.
3. The lessons and activities in the subject challenge me to give my best.
4. I do not find exercises during our lesson boring.
5. Mathematical problems encourage me to think critically.
6. I feel at ease during recitation and board work.
7. My grade in the subject is commensurate to the effort I exert.
8. My teacher makes the lesson easy to understand.
9. I would like to spend more time in this subject.
10. I like the way our teacher presents the steps in solving mathematical problems.

Response to the items is based on the response code provided in the attitude scale. A value ranging from 1 to 5 is assigned to the options provided. The value of 5 is usually assigned to the option strongly agree and 1 to the option strongly disagree. When a statement is negative, however, the assigned values are usually reversed. The composite score is determined by adding the scale values and dividing the sum by the number of statements or items.

Questionnaire can also be used in evaluating attitudes, feelings, and opinions. It requires students to examine themselves and react to a series of statements about their attitudes, feelings, and opinions. The response style for a questionnaire can take any of the following forms: checklist type, semantic differential, and Likert scale.

The Checklist type of response provides the students a list of adjectives for describing or evaluating something and requires them to check those that apply. For example, a checklist questionnaire on students' attitudes in a science class may include the following:
This class is
________________ boring.
________________ exciting.
________________ interesting.
________________ unpleasant.
________________ highly informative.
I find Science
________________ fun.
________________ interesting.
________________ very tiring.
________________ difficult.
________________ easy.
The scoring of this type of test is simple. Subtract the number of negative statements checked from the number of positive statements checked.
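The subtraction rule for the checklist type can be illustrated with a minimal Python sketch. The function name `checklist_score` and the grouping of the sample adjectives into positive and negative sets are assumptions made for illustration; the text does not state which adjectives count as positive.

```python
# Assumed classification of the adjectives in the sample checklist.
POSITIVE = {"exciting", "interesting", "highly informative", "fun", "easy"}
NEGATIVE = {"boring", "unpleasant", "very tiring", "difficult"}


def checklist_score(checked):
    """Return positives checked minus negatives checked (hypothetical helper)."""
    # Normalize the entries so that "Exciting " and "exciting" match.
    checked = {item.strip().lower() for item in checked}
    return len(checked & POSITIVE) - len(checked & NEGATIVE)


# A student who checks two positive adjectives and one negative adjective.
score = checklist_score(["exciting", "interesting", "difficult"])
print(score)  # 2 positive - 1 negative = 1
```

A score above zero suggests a generally favorable attitude, and a score below zero an unfavorable one.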

Semantic differential is another type of response on a questionnaire. It is usually a five-point scale showing polar or opposite adjectives. It is designed so that attitudes, feelings, and opinions can be measured by degrees from very favorable to very unfavorable. Given below is an example of a questionnaire employing the aforementioned response type.
Working with my group members is:
Interesting  _____ : _____ : _____ : _____ : _____  Boring
Challenging  _____ : _____ : _____ : _____ : _____  Difficult
Fulfilling   _____ : _____ : _____ : _____ : _____  Frustrating

The composite score on the total questionnaire is determined by averaging the scale values given to the items included in the questionnaire.

Likert scale is one of the frequently used styles of response in attitude measurement. It is oftentimes a five-point scale that links the options strongly agree and strongly disagree. An example of this kind of response is shown below.

A Likert Scale for Assessing Students Attitude Towards Leadership Qualities of Student Leaders
Name ____________________________________ Date _____________
Read each statement carefully. Decide whether you agree or disagree with each of them. Use the following response code: 5 = Strongly agree; 4 = Agree; 3 = Undecided; 2 = Disagree; 1 = Strongly disagree. Write your response on the blank before each item.
Student leaders:
_____ 1. Have to work for the benefit of the students.
_____ 2. Should set an example of good behavior to the members of the organization.
_____ 3. Need to help the school in implementing campus rules and regulations.
_____ 4. Have to project a good image of the school in the community.
_____ 5. Must speak constructively of the school's teachers and administrators.

Scoring of a Likert scale is similar to the scoring of an attitude scale earlier presented in this chapter.
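The scoring rule described earlier in this chapter (values from 1 to 5, reversed for negative statements, then averaged over the number of items) can be sketched in Python as follows. The function name `composite_score` and the sample responses are hypothetical. The text fixes only the values for strongly agree (5) and strongly disagree (1); the intermediate values 4, 3, and 2 are an assumption.

```python
# Scale values: 5 and 1 follow the text; 4, 3, and 2 are assumed.
SCALE = {"SA": 5, "A": 4, "U": 3, "D": 2, "SD": 1}


def composite_score(responses, negative_items=()):
    """Average the scale values, reversing those of negative statements.

    responses: response codes in item order (item 1 first).
    negative_items: 1-based numbers of negatively worded statements.
    """
    total = 0
    for number, code in enumerate(responses, start=1):
        value = SCALE[code]
        if number in negative_items:
            value = 6 - value  # reverse: 5 becomes 1, 4 becomes 2, and so on
        total += value
    return total / len(responses)


responses = ["SA", "A", "A", "D", "SA"]
print(composite_score(responses))                      # (5+4+4+2+5)/5 = 4.0
print(composite_score(responses, negative_items={4}))  # (5+4+4+4+5)/5 = 4.4
```

If item 4 were a negatively worded statement, disagreeing with it would count in the student's favor, which is why the second composite score is higher.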

Simple projective techniques are usually used when a teacher wants to probe deeper into the students' feelings and attitudes. Escarilla and Gonzales (1990) say that there are three types of simple projective techniques that can be used in the classroom, namely: word association, unfinished sentences, and unfinished story. In word association, the student is given a word and asked to mention what comes to his / her mind upon hearing it. For example, what comes to your mind upon hearing the word corruption? In an unfinished sentence, the students are presented with partial sentences and are asked to complete them with words that best express their feelings, for instance:
Given the chance to choose, I _____________________________.
I am happy when _______________________________________.
My greatest failure in life was ______________________________.
In an unfinished story, a story with no ending is deliberately presented to the students, which they have to finish or complete. Through this technique, the teacher will be able to sense students' worries, problems, and concerns.

Another way by which affective learning can be assessed is through the use of self-expression techniques. Through these techniques, students are provided the opportunity to express their emotions and views about issues, themselves, and others. Self-expression techniques may take any of the following forms: log book of daily routines or activities, diaries, essays and other written compositions or themes, and autobiographies.

CHAPTER REVIEW
1. What is meant by psychomotor learning? What are the levels of learning under the psychomotor domain? Explain each.
2. What are the two general approaches in measuring the acquisition of motor and oral skills? Differentiate each.
3. What are the guidelines to observe in undertaking atomistic and holistic observation?
4. What is portfolio assessment? What are the advantages of using this type of assessment in evaluating student performance and student products?
5. What are the guidelines to observe in using portfolio assessment in the classroom?
6. What are the tools teachers can use in measuring students' acquisition of motor and oral skills? Briefly define each.
7. What do we mean by affective learning? What are the different levels of affective learning? Describe each briefly.
8. What are the techniques teachers can employ in evaluating affective learning? Discuss each very briefly.

CHAPTER 5 Constructing Objective Paper and Pencil Tests

Constructing paper-and-pencil tests is a professional skill. Becoming proficient at it takes study and practice. Owing to the recognized importance of a testing program, a prospective teacher has to assume this task seriously and responsibly. He / She needs to be familiar with the different types of test items and how best to write them. This chapter seeks to equip prospective teachers with the skill in constructing objective paper-and-pencil tests.

General Principles of Testing


Ebel and Frisbie (1999) listed five basic principles that should guide teachers in measuring learning and in constructing their own test. These principles are discussed below.

Measure all instructional objectives. The test a teacher writes should be congruent with all the learning objectives focused in class.

Cover all learning tasks. A good test is not focused only on one type of objective. It must be truly representative of all targeted learning outcomes.

Use appropriate test items. Test items utilized by a teacher have to be in consonance with the learning objectives to be measured.

Make test valid and reliable. Teachers have to see to it that the test they construct measures what it purports to measure. Moreover, they need to ensure that the test will yield consistent results for the students taking it for the second time.

Use test to improve learning. Test scores obtained by the students can serve as springboards for the teachers to re-teach concepts and skills that the former have not mastered.

Attributes of a Good Test as an Assessment Tool


A good test must possess the following attributes or qualities: validity; reliability; objectivity; scorability; administrability; relevance; balance; efficiency; difficulty; discrimination; and fairness (Sparzo, 1990; Reyes, 2000; Manarang and Manarang, 1993; Medina, 2002).

Validity It is the degree to which a test measures what it seeks to measure. To determine whether a test a teacher constructed is valid or not, he / she has to answer the following questions:
1. Does the test adequately sample the intended content?
2. Does it test the behaviors / skills important to the content being tested?
3. Does it test all the instructional objectives of the content taken up in class?

Reliability It is the accuracy with which a test consistently measures that which it does measure. A test, therefore, is reliable if it produces similar results when used repeatedly. A test may be reliable but not necessarily valid. On the other hand, a valid test is always a reliable one.

Objectivity It is the extent to which personal biases or subjective judgment of the test scorer is eliminated in checking the student responses to the test items, as there is only one correct answer for each question. For a test to be considered objective, experts must agree on the right or the best answer. Thus, objectivity is a characteristic of the scoring of the test and not of the form of the test questions.

Scorability It is easy to score or check as answer key and answer sheet are provided.

Administrability It is easy to administer as clear and simple instructions are provided to students, proctors, and scorers.

Relevance It is the correspondence between the behavior required to respond correctly to a test item and the purpose or objective in writing the item. The test item should be directly related to the course objectives and actual instruction. When used in relation to educational assessment, relevance is considered a major contributor to test validity.

Balance Balance in a test refers to the degree to which the proportion of items testing particular outcomes corresponds to the ideal test. The framework of the test is outlined by a table of specifications.

Efficiency It refers to the number of meaningful responses per unit of time. A compromise has to be made among the available time for testing, scoring, and relevance.

Difficulty The test items should be appropriate in difficulty level to the group being tested. In general, for a norm-referenced test, a reliable test is one in which each item is passed by half of the students. For a criterion-referenced test, difficulty can be judged relative to the percentage passing before and after instruction. Difficulty will definitely be based on the skill and knowledge measured and the students' ability.

Discrimination For a norm-referenced test, the ability of an item to discriminate is generally indexed by the difference between the proportion of good and poor students who respond correctly. For a criterion-referenced test, discrimination is usually associated with pretest and posttest differences or the ability of the test or item to distinguish competent from less competent students.

Fairness To ensure fairness, the teacher should construct and administer the test in a manner that allows students an equal chance to demonstrate their knowledge or skills.
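Two of the attributes above, difficulty and discrimination, lend themselves to simple computation for a norm-referenced test. The following Python sketch is illustrative only: the function names and sample data are hypothetical, and how the "good" and "poor" groups are formed (for example, by total test score) is an assumption, since the text does not specify it.

```python
def difficulty_index(item_scores):
    """Proportion of students who passed the item (1 = correct, 0 = wrong)."""
    return sum(item_scores) / len(item_scores)


def discrimination_index(upper_group, lower_group):
    """Proportion correct among good students minus that among poor students."""
    return difficulty_index(upper_group) - difficulty_index(lower_group)


# Hypothetical results on one item for five good and five poor students.
upper = [1, 1, 1, 0, 1]  # 4 of 5 correct
lower = [0, 1, 0, 0, 1]  # 2 of 5 correct

print(round(difficulty_index(upper + lower), 2))     # 6 of 10 passed: 0.6
print(round(discrimination_index(upper, lower), 2))  # 0.8 - 0.4 = 0.4
```

A difficulty index near 0.5 matches the guideline that each item be passed by about half of the students, while a positive discrimination index means the good students outperformed the poor students on the item.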

Steps in Constructing Classroom Tests


Constructing classroom tests is a skill. As such, there are steps that a teacher has to follow (Reyes, 2000). These steps are outlined and discussed below.

Identification of instructional objectives and learning outcomes. This is the first step a teacher has to undertake when constructing classroom tests. He / She has to identify instructional objectives and learning outcomes, which will serve as his / her guide in writing test items.

Listing of the topics to be covered by the Test. After identifying the instructional objectives and learning outcome, a teacher needs to outline the topics to be included in the test.

Preparation of Table of Specification (TOS). The table of specifications is a two way table showing the content coverage of the test and the objectives to be tested. It can serve as a blueprint in writing the test items later.

Selection of the Appropriate Types of Tests. Based on the TOS, the teacher has to select test types that will enable him / her to measure the instructional objectives in the most effective way. Choice of test type depends on what shall be measured.

Writing Test Items. After determining the type of test to use, the teacher proceeds to write the suitable test items.

Sequencing the Items. After constructing the test items, the teacher has to arrange them based on difficulty. As a general rule, items have to be sequenced from the easiest to the most difficult for psychological reasons.

Writing the Directions or Instructions. After sequencing items, the teacher has to write clear and simple directions, which the students will follow in answering the test questions.

Preparation of the Answer Sheet and Scoring Key. To facilitate checking of students' answers, the teacher has to provide answer sheets and prepare a scoring key in advance.
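The sequencing step above (easiest items first) can be illustrated with a short Python sketch. The item labels and the estimated proportions of students expected to pass each item are hypothetical; a teacher might base such estimates on judgment or on an earlier administration of the items.

```python
# Hypothetical items paired with the estimated proportion of students
# expected to answer each one correctly (higher = easier).
items = [
    ("Item A", 0.45),
    ("Item B", 0.90),
    ("Item C", 0.70),
]

# Arrange from the easiest to the most difficult item.
sequenced = sorted(items, key=lambda item: item[1], reverse=True)

for label, proportion_passing in sequenced:
    print(label, proportion_passing)
```

Sorting on the estimated proportion passing, in descending order, puts the easiest item first and the most difficult item last.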

Preparing the Table of Specifications (TOS)


As already mentioned, the table of specifications is the teacher's blueprint in constructing a test for classroom use. According to Arends (2001), the TOS is valuable to teachers for two reasons. First, it helps teachers decide on what to include and leave out in a test. Second, it helps them determine how much weight to give for each topic covered and objective to be tested. There are steps to observe in preparing a table of test specifications.

1. List down the topics covered for inclusion in the test.

2. Determine the objectives to be assessed by the test.

3. Specify the number of days / hours spent for teaching a particular topic.

4. Determine the percentage allocation of test items for each of the topics covered. The formula to be applied is as follows:

% for a Topic = (Number of days / hours spent teaching the topic ÷ Total number of days / hours spent teaching the unit) x 100

Example: Mrs. Sid Garcia utilized 10 hours for teaching the unit on Pre Spanish Philippines. She spent 2 hours in teaching the topic, Early Filipinos and their Society. What percentage of test items should she allocate for the aforementioned topic?

Solution: (2 ÷ 10) x 100 = 20%

5. Determine the number of items to construct for each topic. This can be done by multiplying the percentage allocation for each topic by the total number of items to be constructed.

Example: Mrs. Sid Garcia decided to prepare a 50 item test on the unit, Pre Spanish Philippines. How many items should she write for the topic mentioned in step number 4?

Solution: 50 items x 0.20 (20%) = 10 items

6. Distribute the number of items to the objectives to be tested. The number of items allocated for each objective depends on the degree of importance attached by the teacher to it.

After going through the six steps, the teacher has to write the TOS in a grid or matrix, as shown below.

Table of Specification for a 50 Item Test in Economics


Topic / Objective                       Knowledge   Comprehension   Application   Analysis   Total
The Nature of Economics                     2             2              1            5        10
Economic Systems                            3             2              3            2        10
Law of Demand & Supply                      3             3              3            6        15
Price Elasticity of Demand & Supply         2             3              3            7        15
Total                                      10            10             10           20        50
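The arithmetic in steps 4 and 5 can be sketched in Python as follows. This is a minimal illustration, not a prescribed procedure: the function names are hypothetical, and the percentage is taken as the hours spent on the topic divided by the total hours spent on the unit, times 100, which agrees with the worked example of Mrs. Sid Garcia (2 of 10 hours gives 20%).

```python
def percent_for_topic(hours_on_topic: float, total_hours: float) -> float:
    """Percentage of test items to allocate to one topic (step 4)."""
    return hours_on_topic * 100 / total_hours


def items_for_topic(percent: float, total_items: int) -> int:
    """Number of items to write for one topic (step 5), rounded to a whole item."""
    return round(percent * total_items / 100)


# Figures from the example: 2 of 10 hours on the topic, 50 item test.
percent = percent_for_topic(2, 10)
n_items = items_for_topic(percent, 50)

print(percent)   # 20.0
print(n_items)   # 10
```

When several topics are allocated this way, the rounded item counts may not add up exactly to the intended test length, so the teacher may need to adjust one or two topics by an item.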

General Guidelines in Writing Test Items


Airasian (1994) identified five basic guidelines in writing test items. These guidelines are as follows:

1. Avoid wording that is ambiguous and confusing.
2. Use appropriate vocabulary and sentence structure.
3. Keep questions short and to the point.
4. Write items that have one correct answer.
5. Do not provide clues to the answer.

Criteria for Providing Test Directions


Test directions are very important in any written test, as the inability of the test taker to understand them affects the validity of a test. Thus, directions should be complete, clear, and concise. The students must be aware of what is expected of them. The method of answering has to be kept as simple as possible. Test directions should also contain instructions on guessing. The following criteria should be kept in mind when writing directions for a test (Linn, 1999):

Assume that the examinees and the examiner know nothing at all about objective tests.

In writing directions, use a clear, succinct style. Be as explicit as possible but avoid long, drawn out explanations.

Emphasize the more important directions and key activities through the use of underlining, italics, or different type size or style.

Field test or pretest the directions with a sample of both examinees and examiners to identify possible misunderstandings and inconsistencies and to gather suggestions for improvement.

Keep directions for different forms, subsections, or booklets as uniform as possible.

Where necessary or helpful, give practice items before each regular section. This is very important when testing young children or those unfamiliar with objective tests or separate answer sheets.

Writing Multiple Choice Items


The most widely used form of test item is the multiple choice item. This is because of its versatility. It can be used in measuring different kinds of content and almost any type of cognitive behavior, from factual knowledge to analysis of complex data. Furthermore, it is easy to score. A multiple choice item is composed of a stem, which sets up the problem and asks a question, followed by a number of alternative responses. Only one of the alternatives is the correct answer; the other alternatives are distractors or foils. The principal goal in multiple choice item construction is to write clear, concise, and unambiguous items. Consider the example below.

Poor: The most serious disease in the world is
(A) Mental illness
(B) AIDS
(C) Heart disease
(D) Cancer

The correct answer depends on what is meant by "serious." Considering that heart disease leads to more deaths, mental illness affects a number of people, and AIDS is a worldwide problem nowadays, there are three possible answers. Nevertheless, the question can be reworded as follows, for example:

Improved: The leading cause of death in the world today is:
(A) Mental illness
(B) AIDS
(C) Heart disease
(D) Cancer

To be able to write effective multiple choice items, the following guidelines should be followed:

1. Each item should be clearly stated, in the form of a question or an incomplete statement.
2. Do not provide grammatical or contextual clues to the correct answer. For instance, the use of "an" before the options indicates that the answer begins with a vowel.
3. Use language that even the poorest readers will understand.
4. Write a correct or best answer and several plausible distractors.
5. Each alternate response should fit the stem in order to avoid giving clues to its correctness.
6. Refrain from using negatives or double negatives. They tend to make the items confusing and difficult.
7. Use "all of the above" and "none of the above" only when they will contribute more than another plausible distractor.
8. Do not use items directly from the textbook. Test for understanding, not memorization.

Examine the following multiple choice items.

Sample 1: A two way grid summarizing the relationship between test scores and criterion scores is sometimes referred to as an:
(A) Correlation coefficient.
(B) Expectancy table.
(C) Probability histogram.
(D) Bivariate frequency distribution.

Sample 1 is faulty because of the use of the article "an." This article can lead the student to the correct answer, which is B.

Improved: Two way grids summarizing test criterion relationships are sometimes called:
(A) Correlation coefficient.
(B) Expectancy table.
(C) Probability histogram.
(D) Bivariate frequency distribution.

Sample 2: Which of the following descriptions makes clear the meaning of the word electron?
(A) An electronic gadget
(B) Neutral particles
(C) Negative particles
(D) A voting machine
(E) The nuclei of atoms

Sample 2 is poorly written owing to its use of distractors that are not plausible or closely related to each other. Options A and D are not in any way associated with the remaining choices or alternatives.

Improved: Which of the following phrases is a description of an electron?
(A) Neutral particle
(B) Negative particle
(C) Neutralized proton
(D) Related particle
(E) Atom nucleus

Sample 3: What is the area of a right triangle whose sides adjacent to the right angle are 4 inches and 3 inches, respectively?

Sample 3 is also erroneously written, as it used the option "none of the above" without caution. Why? This is because the answer is 6 square inches and the bright student will definitely choose option D. On the other hand, the student who solved the problem incorrectly and came up with an answer not found among the choices would also choose D, thereby getting the correct answer for the wrong reason. The option "none of the above" can be a good alternative if the correct answer is included among the options or choices.

Improved: What is the area of a right triangle whose sides adjacent to the right angle are 4 inches and 3 inches, respectively?
(A) 6 square inches
(B) 7 square inches
(C) 12 square inches
(D) 13 square inches
(E) none of the above

Using Multiple Choice Items in Assessing Problem Solving and Logical Thinking
Schools today are stressing problem solving skills owing to society's pressure on them to produce individuals with significant skills in this area. A number of terms have been used to describe the basic operations of application. Terms like critical thinking and logical reasoning are used as rubrics under which the basic processes of problem identification, specification of alternative solutions, evaluation of consequences, and solution selection are grouped.

Creating problem solving measures follows a step by step procedure (Haladyna & Downing, 1999).

Step 1. Decide on the principle / s to be tested. The principles and materials selected should:
Be known principles, but the situation in which the principles are to be applied should be new.
Involve significantly important principles.
Be pertinent to a problem or situation common to all students.
Be within the range of comprehension of all students.
Use only valid and reliable sources from which to draw data.
Be interesting to students.

Step 2. Determine the phrasing of the problem situation so as to require the students, in drawing their conclusion, to do one of the following:
Make a prediction.
Choose a course of action.
Offer an explanation for an observed phenomenon.
Criticize a prediction or explanation made by others.

Step 3. Set up the problem situation in which the principle or principles selected operate. Present the problem to the class with directions to draw a conclusion or conclusions and give several supporting reasons for their answer.

Step 4. Edit the students' answers, selecting those that are most representative of their thinking. These will include both conclusions and supporting reasons that are acceptable and unacceptable.

Step 5. To the conclusions and reasons obtained from the students, the teacher now adds any others that he or she feels are necessary to cover the salient points. The total number of items should be at least 50% more than is desired in the final form to allow for elimination of poor items. Some types of statements that can be used are as follows:
True statements of principles and facts
False statements of principles and facts
Acceptable and unacceptable analogies
Appeal to acceptable or unacceptable authority
Ridicule
Assumption of the conclusion
Teleological explanations

Step 6. Submit tests to colleagues or evaluators for criticisms. Revise the test based on these criticisms.

Step 7. Administer the test. Follow with thorough class discussion.

Step 8. Conduct an item analysis.

Step 9. In the light of steps 7 and 8, revise the test.

Following are some examples of problem solving items.
1. Ulysses wanted to go to the US. But Ulysses' father, who is quite strict with him, stated emphatically that he could not go unless he got a grade of 1.25 in both his freshman English courses. Ulysses' father always keeps his promises. When summer came, Ulysses went to the US. If from this information, you conclude that Ulysses earned 1.25, you must be assuming that:
(A) Ulysses had never obtained a grade of 1.25 before.
(B) Ulysses had no money of his own.
(C) Ulysses' father was justified in saying what he did.
(D) Ulysses went to the US with his father's consent.
(E) Ulysses was very sure that he would be able to go.

2. Consider these facts about the coloring of animals:
Plant lice, which live on the stems of green plants, are green.
The grayish mottled moth resembles the bark of the trees on which it lives.
Insects, birds, and mammals that live in the desert are usually sandy or grey.
Polar bears and other animals living in the Arctic region are white.

Which one of the following statements do these facts tend to support?
(A) Animals that prey on others use colors as disguise.
(B) Some animals imitate the color and shape of other natural objects for protection.
(C) The coloration of animals has to do with their surroundings.
(D) Protective coloration is found more among insects and birds than among mammals.
(E) Many animals and insects have protective coloring.

Writing Alternate Response Items


An alternate response item is one wherein there are only two possible answers to the stem. The true false format is an alternate response item. Some variations of the basic true false item include yes no, right wrong, and agree disagree items. Alternate response items seem easy to construct. Writing good alternate response items, however, requires skill so as to avoid triviality. Writing good true false items is difficult, as there are few assertions that are unambiguously true or false. Besides, they are sensitive to guessing. Some guidelines to follow in writing alternate response items are given below.

1. Avoid the use of negatives.
2. Avoid the use of unfamiliar or esoteric language.
3. Avoid trick items that appear to be true but are false because of an inconspicuous word or phrase.
4. Use quantitative and precise rather than qualitative language where possible.
5. Don't make true items longer than false items.
6. Refrain from creating a pattern of response.
7. Present a similar number of true and false statements.
8. Be sensitive to the use of specific determiners. Words such as always, all, never, and none indicate sweeping generalizations, which are associated with false items. Conversely, words like usually and generally are associated with true items.
9. A statement must have only one central idea.
10. Avoid quoting exact statements from the textbooks.

Let us go over examples of alternate response test items.

Sample 1. The raison d'être for capital punishment is retribution according to some peripatetic politicians.

This sample alternate response item is poorly written, for it used words that are very unfamiliar or difficult for an average student to understand.

Improved: According to some politicians, the justification for the existence of capital punishment can be traced to the biblical statement, "an eye for an eye, a tooth for a tooth."

Sample 2. From time to time efforts have been made to explain the notion that there may be a cause and effect relationship between arboreal life and primate anatomy.

Sample 2 is again faulty, as it was copied exactly from the textbook.

Improved: There is a known relationship between primate anatomy and arboreal life.

Sample 3. Many people voted for Gloria Macapagal Arroyo in the last presidential election.

Sample 3 also violates the rules on writing alternate response items owing to its use of imprecise language. As such, it is open to numerous and ambiguous interpretations.

Improved: Gloria Macapagal Arroyo received more than 50% of the votes cast in the last presidential election.

Alternate response items allow teachers to sample a number of cognitive behaviors in a limited amount of time. Even the scoring of alternate response items tends to be simple and easy. Nonetheless, there are content and learning outcomes that cannot be adequately measured by alternate response items, like problem solving and complex learning.

Writing Matching Items


Matching items are designed to measure students' ability to single out pairs of matching phrases, words, or other related facts from separate lists. A matching exercise is basically an efficient arrangement of a set of multiple choice items with all stems, called premises, having the same set of possible alternative answers. Matching items are appropriate to use in measuring verbal associative knowledge (Moore, 1997) or knowledge such as inventors and inventions, titles and authors, or objects and their basic characteristics. To be able to write good matching items, the following guidelines have to be considered in the process.

1. Specify the basis for matching the premises with the responses. Sound testing practice dictates that the directions spell out the nature of the task. It is unfair and unreasonable that the student should have to read through the stimulus and response lists in order to discern the basis for matching.
2. Be sure that the whole matching exercise is found on one page only. Splitting the exercise is confusing, distracting, and time consuming for the student.
3. Avoid including too many premises in one matching item. If a matching exercise is too long, the task becomes tedious and the discrimination too fine.
4. Both the premises and responses should belong to the same general category or class (e.g., inventors and inventions; authors and literary works; objects and characteristics).
5. Premises or responses composed of one or two words should be arranged alphabetically.

Analyze the following matching exercise. Does it follow the suggestions on writing a matching exercise?

Directions: Match Column A with Column B. You will be given one point for each correct match.

Column A                                              Column B
1. Execution of Rizal                                 a. 1521
2. Pseudonym of Ricarte                               b. 1896
3. Hero of Tirad Pass                                 c. Gregorio del Pilar
4. Arrival of the Spaniards in the Philippines        d. Spoliarium
5. Masterpiece of Juan Luna                           e. Vibora

The matching exercise is poorly written, as the premises in Column A do not belong to the same category. Thus, answers can easily be guessed by the student. Below is the improved version of the above matching exercise.

Column A                                              Column B
1. National Hero of the Philippines                   a. Aguinaldo
2. Hero of Tirad Pass                                 b. Bonifacio
3. Brain of the Katipunan                             c. Del Pilar
4. Brain of the Philippine Revolution                 d. Jacinto
5. The Sublime Paralytic                              e. Mabini
                                                      f. Rizal

Writing Completion Items


Completion items require the students to associate an incomplete statement with a word or phrase recalled from memory (Ahman, 1991). Each completion test item contains a blank, which the student must fill in correctly with one word or a short phrase. Inasmuch as the student is required to write the answer, these test items are useful for the testing of specific facts. Guidelines in constructing completion items are as follows:

1. As a general rule, it is best to use only one blank in a completion item.
2. The blank should be placed near or at the end of the sentence.
3. Give clear instructions indicating whether synonyms will be correct and whether spelling will be a factor in scoring.
4. Be definite enough in the incomplete statement so that only one correct answer is possible.
5. Avoid using direct statements from the textbooks with a word or two missing.
6. All blanks for all items should be of equal length and long enough to accommodate the longest response.

Go over the following sample items:

Directions: On your answer sheet, write the expression that completes each of the following sentences.

1. __________ is money earned from the use of money.
2. The Philippines is at the _________ and ________ of ________.

Sample 1 is poorly written, as a well written completion item should have its blank either near or at the end of the sentence. In like manner, Sample 2 is also poorly written, as the statement is over mutilated. Following are the improved versions of these sample items.

1. Money earned from the use of money is called _________.
2. The Philippines is located in the continent of _________.

Writing Arrangement Items


Arrangement items are used for testing knowledge of sequence and order. Arrangement of words alphabetically, of events chronologically, of numbers according to magnitude, of stages in a process, and of incidents in a story or novel are a few cases of this type of test. Some guidelines on preparing this type of test are as follows:

1. Items to be arranged should belong to one category only.
2. Provide instructions on the rationale for arrangement or sequencing.
3. Specify the response code students have to use in arranging the items.
4. Provide sufficient space for the writing of the answer.

Following are examples of arrangement items.

Sample 1
Directions: Arrange the following decimals in the order of magnitude by placing 1 above the smallest, 2 above the next, 3 above the third, and 4 above the biggest.
(a) 0.2180 (b) 0.2801 (c) 0.2018 (d) 0.2081

Sample 2
Directions: The following words are arranged at random. On your answer sheet, rearrange the words so that they will form a sentence.
much the costs rose

Sample 3
Directions: Each group of letters below spells out a word if the letters are properly arranged. On your answer sheet, rearrange the letters in each group to form a word.
ybo ebul swie atgo

Writing Completion Drawing Items


As pointed out in the previous chapter, a completion drawing item is one wherein an incomplete drawing is presented which the student has to complete. The following guidelines have to be observed in writing this type of test item:

1. Provide instructions on how the drawing will be completed.
2. Present the drawing to be completed.

Writing Correction Items


The correction type of test item is similar to the completion item, except that some words or phrases have to be changed to make the sentence correct. The following have to be considered by the teacher in writing this kind of test item.

1. Underline or italicize the word or phrase to be corrected in a sentence.
2. Specify in the instructions where students will write their correction of the underlined or italicized word or phrase.
3. Write items that measure higher levels of cognitive behavior.

Following are examples of correction items written following the guidelines in constructing this kind of item.

Directions: Change the underlined word or phrase to make each of the following statements correct. Write your answer on the space before each number.

1. Inflation caused by increased demand is known as oil push.
2. Inflation is the phenomenon of falling prices.
3. Expenditure on non food items increases with increased income according to Keynes.
4. The additional cost for producing an additional unit of a product is average cost.
5. The sum of the fixed and variable costs is total revenue.

Writing Identification Items


An identification type of test item is one wherein an unknown specimen is to be identified by name or other criterion. In writing this type of item, teachers have to observe the following guidelines:

1. The directions of the test should indicate clearly what has to be identified, like persons, instruments, dates, events, steps in a process, and formulas.
2. Sufficient space has to be provided for the answer to each item.
3. The question should not be copied verbatim from the textbook.

Following are examples of identification items written following the guidelines in constructing this type of test item.

Directions: Following are phrase definitions of terms. Opposite each number, write the term defined.

1. Weight divided by volume
2. Degree of hotness or coldness of a body
3. Changing speed of a moving body
4. Ratio of resistance to effort

Writing Enumeration Items


An enumeration item is one wherein the student has to list down parts or elements / components of a given concept or topic. Guidelines to follow in writing this type of test item include the following:

1. The exact number of expected answers has to be specified.
2. Spaces for the writing of answers have to be provided and should be of the same length.

Below are examples of enumeration items.

Directions: List down or enumerate what are asked for in each of the following.

Underlying Causes of World War I and II
1. ______________________
2. ______________________
3. ______________________
4. ______________________
5. ______________________

Factors Affecting the Demand for a Product
6. ______________________
7. ______________________
8. ______________________
9. ______________________
10. _____________________

Writing Analogy Items


An analogy item consists of a pair of words, which are related to each other (Calmorin, 1994). This type of item is often used in measuring the students' skill in seeing the association between paired words or concepts. Examples of this type of item are given below.

Example 1: Black is to white, as peace is to ______________.
(a) Unity
(b) Discord
(c) Harmony
(d) Concord

Example 2: Bonifacio is for the Philippines, while ______________ is for the United States of America.
(a) Jefferson
(b) Lincoln
(c) Madison
(d) Washington

The following guidelines have to be considered in constructing analogy items (Calmorin, 1994):

1. The pattern of relationship in the first pair of words must be the same pattern in the second pair.
2. Options must be related to the correct answer.
3. The principle of parallelism has to be observed in writing the options.
4. More than three options have to be included in each analogy item to lessen guessing.
5. All items must be grammatically consistent.

Writing Interpretative Test Item


The interpretative test item is often used in testing higher cognitive behavior. This kind of test item may involve analysis of maps, figures, or charts, or even comprehension of written passages. Airasian (1994) suggested the following guidelines in writing this kind of test item:

1. The interpretative exercise must be related to the instruction provided to the students.
2. The material to be presented to the students should be new to them but similar to what was presented during instruction.
3. Written passages should be as brief as possible. The exercise should not be a test of general reading ability.
4. The students have to interpret, apply, analyze, and comprehend in order to answer a given question in the exercise.

Writing Short Explanation Items


This type of item is similar to an essay test but requires a short response, usually a sentence or two. This type of question is a good practice for the students in expressing themselves concisely. In writing this type of test item, the following guidelines have to be considered: 1. Specify in the instruction of the test, the number of sentences that students can use in answering the question. 2. Make the question brief and to the point for the students not to be confused.

CHAPTER REVIEW
1. What are the basic principles of testing that teachers must consider in constructing classroom tests? Explain each briefly.
2. What are the steps or procedures teachers have to follow in writing their own tests? Explain the importance of each of them.
3. What is the table of specification (TOS)? How is it prepared?
4. What are the general guidelines in writing test items?
5. What are the specific guidelines to be observed in writing the following types of test item:
5.1 Multiple choice;
5.2 True false;
5.3 Matching item;
5.4 Arrangement item;
5.5 Identification item;
5.6 Correction item;
5.7 Analogy;
5.8 Interpretative exercise;
5.9 Short explanation item?

CHAPTER 6 Constructing and Scoring Essay Tests


Many new teachers believe that essay tests are the easiest type of assessment instrument to construct and score. This is not actually true. The expenditure of time and effort is necessary if essay items and tests are to yield meaningful information. An essay test permits direct assessment of the attainment of numerous goals and objectives. In contrast with the objective test item types, an essay test demands less construction time per fixed unit of student time but a significant increase in labor and time for scoring. This chapter exposes you to the problems and procedures involved in developing, administering, and scoring essay tests.

General Types of Essay Items


There are two types of essay items: extended response and restricted response. An extended response essay item is one that allows for an in depth sampling of a student's knowledge, thinking processes, and problem solving behavior relative to a specific topic. The open ended nature of the task posed by an instruction such as "discuss essay and objective tests" is challenging to a student. In order to answer this question correctly, the student has to recall specific information and organize, evaluate, and write an intelligible composition. Since it is poorly structured, such a free response essay item would tend to yield a variety of answers from the examinees, both with respect to content and organization, and thus inhibit reliable grading. The potential ambiguity of an essay task is probably the single most important contributor to unreliability. In addition, the more extensive the responses required, the fewer questions a teacher may ask, which would definitely result in lower content validity of the test.

On the other hand, a restricted response essay item is one where the examinee is required to provide a limited response based on a specified criterion for answering the question. It follows, therefore, that a more restricted response essay item is, in general, preferable. An instruction such as "discuss the relative advantages and disadvantages of essay tests with respect to (1) reliability, (2) objectivity, (3) content validity, and (4) usability" presents a better defined task, more likely to lend itself to reliable scoring, and yet allows examinees sufficient opportunity or freedom to organize and express their ideas creatively.

Learning Outcomes Measured Effectively with Essay Items


Essay questions are designed to provide the students the opportunity to answer questions in their own words (Ornstein, 1990). They can be used in assessing the students' skill in analyzing, synthesizing, evaluating, thinking logically, solving problems, and hypothesizing. According to Gronlund and Linn (1990), there are 12 complex learning outcomes that can be measured effectively with essay items. These are the abilities to:

Explain cause effect relationships;
Describe relevant arguments;
Formulate tenable hypotheses;
State necessary assumptions;
Describe the limitations of data;
Explain methods and procedures;
Produce, organize, and express ideas;
Integrate learning in different areas;
Create original forms; and
Evaluate the worth of ideas.

Content versus Expression


It is frequently claimed that the essay item allows the student to present his or her knowledge and understanding and to organize the material in a unique form and style. More often than not, factors like expression, grammar, spelling, and the like are evaluated in relation to content. If the teacher has attempted to develop students' skills in expression, and if this learning outcome is included in the table of specifications, the assessment of such skills is just right and valid. If these skills are not part of the instructional program, it is not right to assess them. If the score of each essay question includes an evaluation of the mechanics of English, this should be made known to the student, and, if possible, separate scores should be given to content and expression.

Specific Types of Essay Questions


The following set of essay questions is presented to illustrate how an essay item is phrased or worded to elicit particular behaviors and levels of response.

I. Recall

A. Simple Recall
1. What is the chemical formula for sodium bicarbonate?
2. Who wrote the novel The Last of the Mohicans?

B. Selective Recall, in which a basis for evaluation or judgment is suggested
1. Who among the Greek philosophers affected your thinking as a student?
2. Which method of recycling is the most appropriate to use at home?

II. Understanding

A. Comparison of two phenomena on a single designated basis
1. Compare 19th century and present-day Filipino writers with respect to their involvement in societal affairs.

B. Comparison of two phenomena in general
1. Compare the Philippine Revolution of 1896 with the People Power Revolution of 1986.

C. Explanation of the use or exact meaning of a phrase or statement
1. The legal system of the Mesopotamians was anchored on the principle of "an eye for an eye, a tooth for a tooth." What does this principle mean?

D. Summary of a text or some portion of it
1. What is the central idea of communism as an economic system?

E. Statement of an artist's purpose in the selection or organization
1. Why did Hemingway describe in detail the episode in which Jordan, lying wounded, engages the oncoming enemy?
III. Application

It should be clearly understood that whether or not a question requires application depends on the students' preliminary educational experience. If an analysis has been taught explicitly, a question calling for that analysis requires only simple recall.

A. Causes or Effects
1. Why did Fascism prevail in Germany and Italy but not in Great Britain and France?
2. Why does frequent dependence on penicillin for the treatment of minor ailments result in its reduced effectiveness against major invasion of body tissues by infectious bacteria?

B. Analysis
1. Why was Hamlet torn by conflicting desires?
2. Why was the Propaganda Movement a successful failure?

C. Statement of Relationship
1. A researcher reported that teaching style correlates with student achievement at about 0.75. What does this correlation mean?

D. Illustrations or examples of principles
1. Identify three examples of the uses of the hammer in a typical Filipino home.

E. Application of rules or principles in specified situations
1. Would you weigh more or less on the moon? Why or why not?

F. Reorganization of facts
1. Some radical Filipino historians assert that the Filipino revolution against Spain was a revolution from the top, not from below. Using the same observation, what other conclusion is possible?

IV. Judgment

A. Decision for or against
1. Should members of the Communist Party of the Philippines be allowed to teach in colleges and universities? Why or why not?
2. "Nature is more influential than the environment in shaping an individual's personality." Prove or disprove this statement.

B. Discussion
1. Trace the events that led to the downfall of the dictatorial regime of Ferdinand Marcos.

C. Criticism of the adequacy, correctness, or relevance of a statement
1. Former President Joseph Estrada was convicted of plunder by the Sandiganbayan. Comment on the adequacy of the evidence used by the said tribunal in reaching a decision on the case filed against the former chief executive of the country.

D. Formulation of new questions
1. What should be the focus of research in education to explain the incidence of failure among students with a high intelligence quotient?
2. What questions should parents ask their children in order to determine the reasons why they join fraternities and sororities?

Following are examples of essay questions based on Bloom's Taxonomy of Cognitive Objectives.

A. Knowledge: Explain how Egypt came to be called "the gift of the Nile."

B. Comprehension: What is meant when a person says, "I had just crossed the bridge"?

C. Application: Give at least three examples of how the law of supply operates in our economy today.

D. Analysis: Explain the causes and effects of the People Power Revolution on the political and social life of the Filipino people.

E. Synthesis: Describe the origin and significance of the celebration of Christmas the world over.

Sources of Difficulty in the Use of Essay Tests

There are four sources of difficulty that teachers are likely to encounter in the use of essay tests (Greenberg et al., 1996). Let us go over each of these difficulties and look into ways to minimize them.

Question Construction. The preparation of the essay item is the most important step in the development process. Language usage and word choice are particularly important during construction. The language dimension is critical not only because it controls the comprehension level of the item for the examinee, but also because it specifies the parameters of the task. As a test constructor, you need to narrowly specify, define, and clarify what it is that you want from the examinees. Examine this sample essay question: "Comment on the significance of Darwin's Origin of Species." The question is quite broad, considering that there are several ways of responding to it. While the intention of the teacher who wrote this item was to give students the opportunity to display their mastery of the material, students could write for an hour and still not discover what their teacher really wants them to say about the topic. An improved version of the same question follows: "Do you agree with Darwin's concept of natural selection resulting in the survival of the fittest and the elimination of the unfit? Why or why not?"

Reader Reliability. A number of studies have been conducted over the years on the reliability of grading free response test items. The results of these studies failed to demonstrate consistently satisfactory agreement among essay raters (Payne, 2003). Some of the specific factors contributing to the lack of reader reliability include the following: quality of composition and penmanship; item readability; racial or ethnic prejudice in essay scoring; and the subjectivity of human judgment.

Instrument Reliability. Even if an acceptable level of scoring reliability is attained, there is no guarantee that the measurement of desired behaviors will be consistent. There remains the issue of the sampling of objectives or behaviors represented by the test. One way to increase the reliability of an essay test is to increase the number of questions and restrict the length of the answers. The more specific and narrowly defined the questions, the less likely they are to be ambiguous to the examinee. This procedure should result in more uniform understanding and performance of the assigned task and in more consistent scoring. It also helps ensure better coverage of the domain of objectives.

Instrument Validity. The number of test questions influences both the validity and the reliability of essay tests. As commonly constructed, an essay test contains a small number of items; thus, the sampling of desired behaviors represented in the table of specifications will be limited, and the test will suffer from lowered content validity.

There is another sense in which the validity of an essay test may be questioned. Theoretically, the essay test allows the examinees to construct a creative, organized, unique, and integrated communication. Nonetheless, examinees very frequently spend most of their time simply recalling and organizing information, rather than integrating it. The behavior elicited by the test, then, is not that hoped for by the teacher or dictated by the table of specifications. Again, one way of handling the problem is to increase the number of items on the test.

Guidelines for Constructing, Evaluating and Using Essay Tests


Consider the following suggestions for constructing, evaluating and using essay tests:

Limit the problem that the question poses so that it will have a clear or definite meaning to most students.

Use simple words which will convey clear meaning to the students.

Prepare enough questions to sample the material of the subject area broadly, within a reasonable time limit.

Use the essay question for the purposes it best serves, like organization, handling complicated ideas, and writing.

Prepare questions which require considerable thought, but which can be answered in relatively few words.

Determine in advance how much weight will be accorded each of the various elements expected in a complete answer.

Without knowledge of students' names, score each question for all students.

Require all students to answer all questions on the test.

Write questions about materials immediately relevant to the subject.

Study past questions to determine how students performed.

Make gross judgments of the relative excellence of answers as a first step in grading.

Word a question as simply as possible in order to make the task clear.

Do not judge papers on the basis of external factors unless they have been clearly stipulated.

Do not make a generalized estimate of an entire paper's worth.

Do not construct a test consisting of only one question.

Scoring Essay Tests


Most teachers would agree that the scoring of essay items and tests is among the most time consuming and frustrating tasks associated with classroom assessment. Teachers are frequently not willing to devote a large chunk of time necessary for checking essay tests. It almost goes without saying that if reliable scoring is to be achieved, there is a need for the teacher to spend considerable time and effort.

Before focusing on the specific methods of scoring essay tests, let us consider the following guidelines. First, it is critical that the teacher prepare in advance a detailed ideal answer. This is necessary as it will serve as the criterion by which each student's response will be judged. If this is not done, scoring suffers: the teacher's subjectivity could seriously undermine consistent scoring, and it is also possible that student responses, rather than the intended criteria, might dictate what constitutes a correct answer. Second, student papers should be scored anonymously, and all answers to a given item should be scored at one time, rather than grading each student's total test separately.

As already pointed out, essay questions are the most difficult to check owing to the absence of uniformity of response on the part of the students who took the test. Moreover, there are a number of distractors in the students' responses that can contribute to subjective scoring of an essay item (Hopkins et al., 1990). These distractors include the following: handwriting, style, grammar, neatness, and knowledge of the students. There are two ways of scoring an essay test: holistic and analytic (Kubiszyn & Borich, 1990).

Holistic Scoring. In this type of scoring, a total score is assigned to each essay question based on the teacher's general impression or overall assessment. Answers to an essay question are classified into any of the following categories: outstanding; very satisfactory; fair; and poor. A score value is then assigned to each of these categories. An outstanding response gets the highest score, while a poor response gets the lowest score.

Analytic Scoring. In this type of scoring, the essay is scored in terms of its components. An essay scored in this manner has separate points for organization of ideas; grammar and spelling; and supporting arguments or proofs.

As an essay test is difficult to check, there is a need for teachers to ensure objectivity in scoring students' responses (Hopkins et al., 1990). To minimize subjectivity in scoring an essay test, the following guidelines have to be considered by the teacher (Airasian, 1994):

Decide what factors constitute a good answer before administering an essay question.

Explain these factors in the test item.

Read all answers to a single essay question before reading other questions.

Reread each essay answer a second time after initial scoring.
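The difference between the holistic and analytic scoring described above can be made concrete with a small sketch. The point allocations and score values below are hypothetical and chosen only for illustration; the text names the categories and components but does not prescribe any numbers.

```python
# Hypothetical maximum points per component for one essay question
# (the components come from the text; the point values are assumptions).
MAX_POINTS = {
    "organization of ideas": 4,
    "grammar and spelling": 2,
    "supporting arguments": 4,
}

# Hypothetical score values attached to the holistic categories named in the text.
HOLISTIC_SCALE = {
    "outstanding": 10,
    "very satisfactory": 8,
    "fair": 5,
    "poor": 2,
}


def analytic_score(points_awarded):
    """Add up the points given to each component, checking each against its maximum."""
    total = 0
    for component, maximum in MAX_POINTS.items():
        awarded = points_awarded[component]
        if not 0 <= awarded <= maximum:
            raise ValueError(f"{component}: {awarded} is outside 0..{maximum}")
        total += awarded
    return total


def holistic_score(category):
    """Return the single score value for the teacher's overall impression."""
    return HOLISTIC_SCALE[category]


paper = {
    "organization of ideas": 3,
    "grammar and spelling": 2,
    "supporting arguments": 3,
}
print(analytic_score(paper))   # 8
print(holistic_score("fair"))  # 5
```

Fixing the components and their maximum points before reading any paper is one way to follow the guideline of deciding in advance what constitutes a good answer.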

CHAPTER 7 Administering and Scoring Objective Paper and Pencil Test

While it is true that test formats and content coverage are important ingredients in constructing paper and pencil tests, the conditions under which students shall take the test are equally essential. This chapter is focused on how tests should be administered and scored.

Arranging Test Items


Before administering a teacher-made test, test items have to be reviewed. Once the review is completed, these items have to be assembled into a test. The following guidelines should be observed in assembling a test (Airasian, 1994; Jacobsen et al., 1993):

1. Similar items should be grouped together. For example, multiple choice items should be together and separated from true-false items.

2. Arrange test items logically. Test items have to be arranged from the easiest to the most difficult.

3. Selection items should be placed at the start of the test and supply items at the end.

4. Short answer items should be placed before essay items.

5. Specify directions that students have to follow in responding to each set of grouped items.

6. Avoid cramming items too close to each other. Leave enough space for the students to write their answers.

7. Avoid splitting multiple choice or matching items across two different pages.

8. Number test items consecutively.
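For teachers who keep their items in a spreadsheet or item bank, the ordering guidelines above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the item records, the difficulty estimates, and the order among the selection formats are hypothetical, and sorting by difficulty within each format group is one possible reading of guidelines 1 and 2 taken together.

```python
# Assumed sequence of formats: selection items first, then short answer, then essay.
# The relative order of the three selection formats is an assumption.
FORMAT_ORDER = ["true-false", "multiple choice", "matching", "short answer", "essay"]


def assemble_test(items):
    """Group items by format, order each group from easiest to hardest, and number them."""
    ordered = sorted(
        items,
        key=lambda item: (FORMAT_ORDER.index(item["format"]), item["difficulty"]),
    )
    # Number the items consecutively across the whole test.
    return [dict(item, number=n) for n, item in enumerate(ordered, start=1)]


# Hypothetical draft items; difficulty is the teacher's own estimate (1 = easiest).
draft = [
    {"format": "essay", "difficulty": 3, "prompt": "E1"},
    {"format": "multiple choice", "difficulty": 2, "prompt": "M2"},
    {"format": "short answer", "difficulty": 1, "prompt": "S1"},
    {"format": "multiple choice", "difficulty": 1, "prompt": "M1"},
    {"format": "true-false", "difficulty": 2, "prompt": "T1"},
]

for item in assemble_test(draft):
    print(item["number"], item["format"], item["prompt"])
```

Directions for each group, spacing, and page breaks (guidelines 5 to 7) are layout decisions that this sketch does not attempt to cover.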

Administering the Test


Test administration is concerned with the physical and psychological setting in which students take the test, a setting that should allow them to do their best (Airasian, 1994). Some guidelines that teachers should observe in administering a test are discussed below.

Provide a quiet and comfortable setting. This is essential as interruptions can affect students' concentration and their performance in the test.

Anticipate questions that students may ask. This is also necessary as students' questions can interrupt test taking. In order to avoid questions, teachers have to proofread their test questions before administering them to the class.

Set proper atmosphere for testing. This means that students have to know in advance that they will be given a test. In effect, such information can lead them to prepare for the test and reduce test anxiety.

Discourage cheating. Students cheat for a variety of reasons. Some of these are pressures from parents and teachers, as well as intense competition in the classroom. To prevent and discourage cheating, Airasian (1994) recommends two sets of strategies: strategies before testing and strategies during testing.

Strategies before Testing


Teach well.

Give students sufficient time to prepare for the test.

Acquaint the students with the nature of the test and its coverage.

Define to the students what is meant by cheating.

Explain the disciplinary action to be imposed on students caught cheating.

Strategies during Testing

Require students to remove unnecessary materials from their desks.

Have students sit in alternating seats.

Go around the testing room and observe students during testing.

Prohibit the borrowing of materials like pens and erasers.

Prepare alternate forms of the test.

Implement established cheating rules.

Help students keep track of time.

Scoring Test

After the administration of a test, the teacher needs to check the students' test papers in order to summarize their performance on the test. The difficulty of checking a test differs with the kind of test items used. Selection items are the easiest to score, followed by short answer and completion items. The most difficult to score, however, is the essay item.

Scoring Objective Tests. The following guidelines have to be considered by a teacher in scoring an objective test:

A key to correction has to be prepared in advance for use in scoring the test papers.

Apply the same rules to all students in checking their responses to the test questions.

Score each part of the test to have a clear picture of how students fared and to determine the areas they failed to master.

Sum up the scores for grading purposes.
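The guidelines for scoring an objective test can be illustrated with a short sketch. The key to correction, the part names, and the student responses below are all hypothetical; the rule of ignoring letter case and surrounding spaces is likewise an assumption, applied identically to every paper.

```python
# Hypothetical key to correction, prepared in advance:
# each part of the test maps item numbers to correct answers.
ANSWER_KEY = {
    "Part I (true-false)": {1: "T", 2: "F", 3: "T"},
    "Part II (multiple choice)": {4: "b", 5: "d", 6: "a", 7: "c"},
}


def score_paper(responses):
    """Return the score for each part and the total score for one student's paper."""
    part_scores = {}
    for part, key in ANSWER_KEY.items():
        # The same matching rule is applied to every student: ignore case and
        # surrounding spaces; an unanswered item earns no point.
        part_scores[part] = sum(
            1
            for number, correct in key.items()
            if responses.get(number, "").strip().lower() == correct.lower()
        )
    return part_scores, sum(part_scores.values())


student = {1: "T", 2: "T", 3: "T", 4: "b", 5: "d", 6: "c"}  # item 7 left blank
parts, total = score_paper(student)
print(parts)  # {'Part I (true-false)': 2, 'Part II (multiple choice)': 2}
print(total)  # 4
```

Keeping the per-part scores alongside the total shows at a glance which parts of the test a student found hardest.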

Conducting Post test Review


After scoring a test and recording the results, teachers have to provide students with information on their performance. This can be done by writing comments on the test paper to indicate how students fared in the test. Answers to the items have to be reviewed in class for the students to know where they committed mistakes. In so doing, students will become aware of the right answers and of how the test was scored and graded.