Ete511 C1 Bi

ETE 511 LANGUAGE TESTING AND EVALUATION
CHAPTER 1
The Role and Purpose of Testing and Evaluation in TESL
LEARNING OUTCOME
Upon completion of this chapter, you should be able to:
1. Distinguish between tests, assessments and measurements;
2. Describe the basic parts of a test or evaluation;
3. Describe the role of tests in the instructional process.
INTRODUCTION
It is important to understand the role and purpose of testing and evaluation before we discuss different ways of testing in TESL. This chapter covers the distinctions among key terms related to basic concepts in testing, the basic constituent parts of a test, and the role of tests in the instructional and educational process, including the decisions that are made on the basis of test scores.
1.1
WHAT ARE TESTS, ASSESSMENT AND MEASUREMENTS?
A course on testing may be called Testing and Measurement at one institution, Testing and Evaluation at another, or simply Assessment at a third. These terms are obviously related, but what do they mean and how are they interconnected? Before we proceed further into the subject of testing, we should first understand several basic yet important terms. Perhaps the most important of these are test, assessment and measurement. Let us first look at the definitions of these three terms.
1.1.1
TEST
A test can be defined as a systematic procedure for measuring a sample of behaviour by posing a set of questions in a unified manner (Linn & Gronlund, 1995: 6). The key phrases in this definition are systematic procedure, measuring a sample of behaviour, and a set of questions in a unified manner. A test is a systematic procedure because it follows a planned format. A test cannot be haphazard, as a haphazard test would lose much of its credibility. A test also measures a sample of behaviour. In the case of language tests, the behaviour sampled would be language proficiency or any language-related construct we are interested in. Finally, the questions or items in a test are seen to be unified. A traditional view of test items is that they work in the same way by measuring the same construct. If the items in a test are not unified and measure different constructs, what then does the test measure?
1.1.2
ASSESSMENTS
Assessment is any of a variety of procedures used to obtain information on students' performance. Unlike a test, an assessment is seldom exclusively quantitative. A teacher may assess student learning simply by observing how students respond to instruction. Students' facial expressions can provide valuable information for assessment. A test is generally a form of assessment, although not all assessments need to be tests.
It should also be noted that the term evaluation can be considered synonymous with assessment, although some would limit its use to programme evaluation and not the evaluation of student performance. For the sake of brevity, I will consider both terms as synonymous.
1.1.3
MEASUREMENT
Measurement is a numerical description of a particular characteristic. We measure physical objects in terms of their height, weight and depth, and we can measure distance as well as length. Tests, however, tend to measure behavioural and cognitive aspects, which are a lot more abstract than physical objects. Nevertheless, all tests are measurements, although not all measurements are tests.
What do you think are the differences between tests, assessment and measurements?
The relationship between tests, assessments and measurement can be illustrated in the following diagram:
Figure 1.1: The relationship between tests, measurement and assessment. (Source: Bachman, 1990)
From Figure 1.1, we can conclude that all tests are measurements. Similarly, tests can be assessments as well. Bachman (1990) considers qualitative assessment of students as an example of what may fall in area A, and teacher ranking of students' performance as an example of area B. Area C is represented by tests which are also assessments and measurements. A clear example would be an achievement test given to students at the end of an instructional programme. Area D represents tests which are measurements but not assessments. One example would be research in which tests are given. Finally, there are many measurements that are neither tests nor assessments. The age of students is a measurement, but it is neither a test nor an assessment. These types of measurements are represented by area E.
In the next few chapters we will come across many different types of tests and assessments. We will also examine measurements commonly used in tests.
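The five areas of Figure 1.1 can also be read as combinations of three yes/no properties. The short Python sketch below is only an illustration of that reading, not part of Bachman's model: the function name classify and the flag names are assumptions made here, while the area letters and the examples are those given in the text.

```python
def classify(is_assessment: bool, is_measurement: bool, is_test: bool) -> str:
    """Return the area (A to E) of Figure 1.1 that a procedure falls in."""
    if is_test and not is_measurement:
        # The chapter states that all tests are measurements.
        raise ValueError("a test must also be a measurement")
    if is_test:
        return "C" if is_assessment else "D"
    if is_measurement:
        return "B" if is_assessment else "E"
    if is_assessment:
        return "A"
    return "outside the diagram"


# Examples from the text, as (is_assessment, is_measurement, is_test).
examples = {
    "qualitative assessment of students": (True, False, False),
    "teacher ranking of students' performance": (True, True, False),
    "end-of-programme achievement test": (True, True, True),
    "test given as part of research": (False, True, True),
    "age of students": (False, True, False),
}

for name, flags in examples.items():
    print(f"{name}: area {classify(*flags)}")
```

Writing the areas this way makes the one constraint of the diagram explicit: something can be a measurement without being a test, but not the other way round.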
1.2
WHAT CONSTITUTES A TEST?
There are a number of ways in which we can look at a test. We may want to examine the characteristics of a good test and the issues of validity and reliability. These issues, however, will be discussed in Chapter 7 of this module. Here, it may be more important to look at the basic structure of a test. If we were to dissect a test and examine its anatomy, what would it look like? Wesche (1983) suggested four major parts to a test. These four parts form a useful framework for examining any kind of test:
1. Stimulus material.
2. Task posed to the learner.
3. Learner's response.
4. Scoring criteria.
Share with your friends some tests that you find good or bad. What are the features of a good test?
1.2.1
STIMULUS MATERIALS
The stimulus material refers to the text or material presented to the learner or test taker. This can take many forms. In a reading comprehension test, for example, the stimulus material could be the reading passage. There may or may not be pictures or drawings accompanying the passage. The passage could be on any of many possible topics and written in a particular style. The stimulus material for this example would also include the questions themselves. After having read the text, are students presented with multiple choice objective type questions, or are they required to write a summary?
1.2.2
TASK POSED TO THE LEARNER
The second component of the framework, the task posed to the learner, is a somewhat abstract concept. It involves determining the mental or cognitive response required of the learner toward the stimulus. If the reading comprehension passage is again used as an example, the cognitive response required is for the learner to understand what is read. This may require comprehension at various levels: the word level, the sentence level and the discourse level. Reading for understanding also involves other abilities and knowledge, such as inferencing and cultural knowledge. Similarly, this component addresses how learners are expected to react mentally and cognitively to the format of the question, and what skills, subskills and abilities they are required to draw on in order to complete the task.
1.2.3
LEARNER'S RESPONSE
The learner's response is the actual demonstration of the student's ability within the limitations of the stimulus provided by the test. If a reading passage is the stimulus and the test items are multiple choice questions, then the learner's actual response is to select the correct answer based on the question and the options that follow. If a summary is required, then students will demonstrate their reading comprehension ability by writing a short summary of the stimulus reading passage. This component of Wesche's framework is closely related to the second component, except that it is the actual physical performance of the task which is seen to demonstrate the ability or behaviour being examined.
1.2.4
SCORING CRITERIA
Finally, the fourth component of the framework is the scoring criteria. As a test is a measurement, the scoring criteria are an important aspect of the overall structure of a test. So, once again, the four components of a test according to the Wesche framework are the stimulus material, the task posed to the learner, the learner's response, and the scoring criteria. All four components are highly interrelated and important in testing. Almost every test can be analysed according to these four components. More importantly, when we construct a test, we are able to make the test easier or more difficult by varying each component of this framework.
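One way to keep the four components in view when analysing or varying a test is to record them side by side. The Python sketch below is a minimal illustration under stated assumptions: the class ItemDescription, the helper changed_components and the scoring descriptions are hypothetical and are not taken from Wesche (1983); the two reading comprehension formats are the ones used as examples in this section.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class ItemDescription:
    """The four parts of a test in the Wesche framework."""
    stimulus_material: str
    task_posed: str
    learner_response: str
    scoring_criteria: str


def changed_components(a: ItemDescription, b: ItemDescription) -> list:
    """List the components in which two descriptions differ."""
    return [f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)]


# Reading comprehension tested with multiple choice questions.
mcq_version = ItemDescription(
    stimulus_material="Reading passage followed by multiple choice questions",
    task_posed="Understand the passage at word, sentence and discourse level",
    learner_response="Select the correct option for each question",
    scoring_criteria="One point per correct option (hypothetical)",
)

# The same passage tested through a summary instead.
summary_version = replace(
    mcq_version,
    stimulus_material="Reading passage followed by a summary prompt",
    learner_response="Write a short summary of the passage",
    scoring_criteria="Mark awarded against a summary rubric (hypothetical)",
)

print(changed_components(mcq_version, summary_version))
```

Comparing two versions of a test in this way shows which components were varied, which is where a change in difficulty would come from.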
1.3
A PRELIMINARY UNDERSTANDING OF TESTS AND THE INSTRUCTIONAL PROCESS
What role do tests have in instruction? It is important for an educator to understand how tests and instruction are related. Do we test what we teach? Or should we teach what we test? In an ideal world, these questions would be moot, because testing and teaching would relate seamlessly, both serving the purpose of helping students learn. In the real world, however, it is possible to justify an affirmative answer to each of the two questions. Yes, we should test what we teach in order to assess the extent to which our students have understood, or perhaps even mastered, what has been presented to them. However, in a world where examinations can play an important role in determining our future, who can blame teachers who teach what is being tested, i.e. prepare students only for the test?
Why do you think it is important for teachers to understand the test that they administer?
So what exactly is the relationship between testing and teaching? Perhaps we can try to get an initial idea with the help of the simple diagram in Figure 1.2.
Figure 1.2: Relationship between planning, instruction and testing
(Diagram: Curriculum Specification, followed by Instruction, followed by Testing, with Washback running from Testing back to the two earlier stages.)
In this model, we are reminded that instruction itself is guided by curriculum planning. Testing represents the final stage of a three-stage process that begins with curriculum planning or instructional objectives, continues with the actual instruction, and culminates in testing. The model also suggests a washback from testing to both the curriculum specification and instruction stages. The concept of washback will be discussed in greater detail later.
The model suggested by Figure 1.2, however, is clearly a simplified and idealised one. Such a model may work well if all three components are under the purview of a single person or a small group of people. However, when it is applied to a national scenario, the linear process is no longer so easy or so likely. Some of the objectives of the curriculum specifications may be lost in instruction, especially as those who carry out the teaching may not be directly involved in curriculum planning. Similarly, national standardised tests or examinations may fail to capture the emphases placed during instruction, as the test constructors for these examinations are not those who actually carried out the teaching. Nevertheless, as an initial conceptual idea of the position of testing in instruction, the simple model in Figure 1.2 will suffice for the moment. We will revisit the model in later chapters when we hopefully have a clearer and more comprehensive understanding of tests and instruction.
It should be noted here that the nature of tests is affected by the nature or approach of instruction. We need only look at the history of language testing to see the truth of this statement. It was once described to me that language testing had undergone three major historical shifts or phases.
(a) The first phase, the pre-scientific phase, coincides with a time when teachers were thought to be competent in constructing tests simply by virtue of being teachers. It was felt that if they could teach, then they could test.
(b) A more scientific era, heralded by behaviourism and audiolingualism, saw the notion of psychometric structuralism, in which measurement of structural knowledge of language was given top priority.
(c) Finally, language tests were influenced by the communicative approach movement, and a sociolinguistic integrative perspective in testing was adopted.
Each of the three phases, of course, coincided with the theories of and approaches to language learning and teaching of the time. This further reinforces the notion that there is a close relationship between teaching and testing.
1.3.1
TAXONOMIES OF INSTRUCTIONAL OBJECTIVES
Perhaps an even more important factor in examining the role of tests in an instructional process is the instructional objectives. Test items should be based on instructional objectives, especially if we wish to know whether the instruction has been effective. In this respect, taxonomies such as those suggested by Bloom and Barrett are useful tools for ensuring that the most appropriate questions are asked in tests.
Table 1.1: Bloom's Taxonomy and Representative Test Questions (Adapted from: Nitko, 2001: 27)
Level: Test Question
Knowledge: Who are the main characters of the story?
Comprehension: What is the main theme of the story?
Application: Can the solutions found to the problems in this story be used in solving problems that many of our youths face?
Analysis: What literary devices are being used to convey to the reader the characters' feelings?
Synthesis: Based on this story as well as other stories you have read, describe general strategies that main characters in stories have taken to overcome the problems that they face.
Evaluation: Develop a set of three or four criteria for assessing the quality of a story and use these criteria to assess any story that we have read.
Bloom's taxonomy consists of six levels which are generally considered to be hierarchical. This means that the higher level skills are not only more cognitively demanding but also assume that the skills lower in the taxonomy have been mastered. The levels of knowledge, comprehension and application are often referred to as the lower order skills, with knowledge at the lowest end. Analysis, synthesis and evaluation are considered the higher order skills, with evaluation occupying the highest end of the taxonomy. In Table 1.1, each level of Bloom's taxonomy is accompanied by a matching question that reflects the cognitive demands it places on students.
Bloom's taxonomy focuses on cognitive abilities and may have limitations when used in language teaching and learning. Other taxonomies, such as Barrett's taxonomy, have been developed for more language-related skills. Barrett's taxonomy consists of four levels: literal recognition or recall; inference; evaluation; and appreciation. Each level consists of several sub-levels. Barrett's taxonomy focuses on reading and is especially relevant for language teaching and learning. However, taxonomies of the productive language skills of writing and speaking are also needed. In second language situations, such taxonomies would be useful in charting progress in learning as well as in specifying a comprehensive teaching plan.
1.4
HOW DO STUDENTS BENEFIT FROM A TEST?
While the kinds of decisions made on the basis of tests (see Section 1.5) are largely teacher- and educator-centred, tests also provide students with several benefits. First, there is the benefit of motivation. Whenever a teacher announces that there will be a test, the tendency for most students is to study and revise material in preparation for the test. In other words, the test acts as an impetus for study. This form of motivation is useful when it is used sparingly, as teachers should not depend only on tests to motivate students.
The better students also use tests as a source of information. Feedback from test scores informs students of their strengths and weaknesses, whether their study approach has been beneficial, and whether they have understood the material taught. In other words, information in the form of test results is as important for the student as it is for the teacher. As such, it should be general practice to return test papers as often and as quickly as possible. A different way of looking at things is that teachers are now presented with a new responsibility, i.e. to develop in their students the ability and self-directedness to use information from sources such as test results to learn and to plan their own learning.
What do you think are students' reactions towards tests? Do they enjoy or fear tests?
1.5
DECISIONS MADE BASED ON TESTS
Why do we test? Do teachers and instructors have a sadistic streak, setting tests simply to see their students slog and burn the midnight oil in preparation? Certainly not! There are more noble intentions in testing. We can say that the main purpose of tests is to obtain information concerning a particular behaviour or characteristic. Based on information obtained from tests, several different types of decisions can be made. Kubiszyn & Borich (2000) mention eight different types of decisions made on the basis of information obtained from tests. These educational decisions are shown in Figure 1.3.
Figure 1.3: Eight different types of decisions made
The first three decisions are often within the domain of the classroom teacher. He or she can make decisions with respect to instruction, grading and diagnostic activities.
Instructional decisions are made based on test results when, for example, teachers decide to change or maintain their instructional approach. If a teacher finds that most of his class have failed his test, there are many possible reactions he can have. First, he could be very disappointed, blame the students for not studying and punish them in some way. Of course, this is not a wise decision to make. Instead, the teacher could evaluate the effectiveness of his own teaching or instructional approach. An instructional decision is made when the teacher decides whether to keep the approach currently used; he may decide that the approach is not suitable and that a different approach should be used.
Tests yield scores, and teachers have to make grading decisions about the kind of grades to give students. As grades are indicators of student performance, teachers need to decide whether a student deserves a high grade, perhaps an A, on the basis of some form of assessment. Traditionally, and perhaps for a long time to come, this assessment will be in the form of tests.
Sometimes, we give tests to find out the strengths and weaknesses of our students. Can they correctly construct a passive sentence? Do they use the different pronoun forms correctly? These kinds of questions can be answered by observing student performance on tests. When a teacher decides that he will spend more time teaching passive sentences because student performance on such sentences in a test was unsatisfactory, he has made a diagnostic decision.
Decisions related to selection, placement, counselling and guidance, programme or curriculum, and administrative policy are all made at levels higher than the classroom. Administrators, educational agencies and institutions may be involved in these decisions.
Selection and placement decisions are somewhat similar. However, a selection decision relates to whether or not a student is selected for a programme or for admission into an institution based on a test score. Tests such as TOEFL and IELTS are often used by universities to decide whether a candidate is suitable, and hence selected, for admission. A placement decision, on the other hand, deals with where a candidate should be placed based on performance on the test. A clear example is the language placement examination for newly admitted students commonly administered by many local and foreign universities. Based on their performance on such a test, students are placed into different language classes that are arranged according to proficiency levels.
Counselling and guidance decisions are also made by relevant parties such as counsellors and administrators on the basis of examination results. Counsellors often give advice on appropriate vocations for some of their students. This advice is likely to be given on the basis of the students' own test scores.
Programme or curriculum decisions reflect the kinds of changes made to the educational programme or curriculum based on examination results. Finally, there are also administrative policy decisions that need to be made, which are also greatly influenced by test scores.
(a) What do the terms tests, assessments and measurements mean and how are they interconnected?
(b) What constitutes a good test? What are the 4 major parts of a test as suggested by Wesche (1983)?
1.6
HOW DO WE CONSTRUCT A TEST?
The framework of a test is reflected in the way the test is constructed. The first stage in constructing a test is to determine what is to be tested. This is not as easy as it seems, because it requires determining the theoretical construct of what is to be tested. For example, let's assume that we are interested in testing communicative competence. This requires that a theoretical construct of communicative competence first be determined. Various theories of communicative competence have been suggested (cf. Bachman, 1990; Canale & Swain, 1980). We need to examine these theories and determine what communicative competence is to us for the purpose of our test.
The second step in test construction is to operationalise the theoretical construct. A theoretical construct must necessarily be an idealised and abstract notion. When it is operationalised, it is reduced in order to fit into the constraints of a test. The many different formats of tests (multiple choice, dictation, essay-type, matching, etc.) represent the different kinds of operationalisation available in tests.
Finally, the third step in constructing a test is quantification. As a test is a measure, numbers and quantities are a necessary element. Once again, just as with the previous stages, we may tend to take this stage of test construction for granted. There is more to quantification than simply assigning numbers or points to items in a test. If a test consists of two sections using different formats, such as multiple choice questions and short answer questions, what weightage of points would you assign to the items in each section? Even the assignment of these points must be justified.
The steps described above provide a general description of the test construction process. In actual practice, there may be some additional steps that need to be taken. Some time back, I was asked to construct a test of English language proficiency for a private company. When I set out to do the task, I listed the steps that I would probably have to take. One of the first steps I felt necessary was some form of needs analysis in order to determine what kind of language should be tested. I wanted to find out from the management what sort of test they wanted and whether what I had in mind fitted their requirements. My intention was to draft the test, show the draft to the management for approval, pilot it and later validate the test in some way.
I would also imagine that if I were teaching in the public schools, I would probably not spend so much time on the three steps described earlier (theoretical construct, operationalisation and quantification) because the test construction process has largely been determined by the Ministry of Education. The national standardised Sijil Pelajaran Malaysia is already an
embodiment of the three stages and teachers merely need to follow the model examination paper with respect to these three elements. However, it may be helpful to construct a test blueprint in order to ensure that my test spans the necessary content and that there is a variety of skills or abilities being tested.
Table 1.2: Example of Test Blueprint According to Bloom's Taxonomy

Section            Knowledge       Comprehension  Application  Analysis  Synthesis  Evaluation  Total
A. Comprehension   1, 3            2, 4, 5        8            6         7, 10      9           10
B. Grammar         12, 16, 18, 19  11, 14, 20     17           13        15         -           10
C. Functions       21, 23          22, 29         24, 25, 26   27        28         30          10
Total              8               8              5            3         4          2           30
There are numerous ways of forming test blueprints, some more comprehensive than others (see Nitko, 2001 for several examples), but an important point to remember is that the test blueprint should be used only as a tool rather than to promote exact or rigorous classification (Nitko, 2001: 113). Nevertheless, the most common form of test blueprint in schools in Malaysia incorporates Bloom's taxonomy as its primary method of classification. In the example in Table 1.2, the 30 items in the test are categorised according to Bloom's taxonomy. The numbers 1 to 30 in the blueprint refer to the test item numbers. Items 1 and 3, for example, are items in the Comprehension section which test knowledge, while item 8 tests application. A blueprint such as this is useful in ensuring that different kinds of questions are asked. In this particular example, most questions are knowledge and comprehension type questions (8 each), which tends to be quite common. However, all six question types are quite well represented and, as such, the test itself can be considered acceptable.
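The totals quoted for the blueprint can be checked mechanically. The Python sketch below is a hedged illustration rather than a prescribed procedure: the names BLUEPRINT, items_per_level, items_per_section and covers_all_items are assumptions introduced here, and the allocation of items to cells is read off Table 1.2 using the item numbers and column totals, with the Evaluation cell for Section B assumed to be empty.

```python
from collections import Counter

BLOOM_LEVELS = ["Knowledge", "Comprehension", "Application",
                "Analysis", "Synthesis", "Evaluation"]

# Item numbers per section and Bloom level, as read from Table 1.2.
# The empty Evaluation cell for Section B is an assumption.
BLUEPRINT = {
    "A. Comprehension": {
        "Knowledge": [1, 3], "Comprehension": [2, 4, 5], "Application": [8],
        "Analysis": [6], "Synthesis": [7, 10], "Evaluation": [9],
    },
    "B. Grammar": {
        "Knowledge": [12, 16, 18, 19], "Comprehension": [11, 14, 20],
        "Application": [17], "Analysis": [13], "Synthesis": [15],
        "Evaluation": [],
    },
    "C. Functions": {
        "Knowledge": [21, 23], "Comprehension": [22, 29],
        "Application": [24, 25, 26], "Analysis": [27], "Synthesis": [28],
        "Evaluation": [30],
    },
}


def items_per_level(blueprint):
    """Count how many items test each Bloom level (the bottom row)."""
    totals = Counter()
    for cells in blueprint.values():
        for level, items in cells.items():
            totals[level] += len(items)
    return {level: totals[level] for level in BLOOM_LEVELS}


def items_per_section(blueprint):
    """Count how many items each section contains (the Total column)."""
    return {section: sum(len(items) for items in cells.values())
            for section, cells in blueprint.items()}


def covers_all_items(blueprint, n_items):
    """Check that items 1..n_items each appear exactly once."""
    placed = sorted(i for cells in blueprint.values()
                    for items in cells.values() for i in items)
    return placed == list(range(1, n_items + 1))


print(items_per_level(BLUEPRINT))
print(items_per_section(BLUEPRINT))
print(covers_all_items(BLUEPRINT, 30))
```

A check of this kind is useful when drafting a blueprint by hand, since an item that is left out or placed in two cells is easy to miss.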
What should you take into consideration when constructing a test?
The following is a test question taken from an English textbook.
Identify the noun phrases in the following sentences (1 point for every correctly identified noun phrase):
(a) Do you know where he is?
(b) She didn't know if the teacher was coming.
(c) The policeman stopped me as I was parking the car.
Use the Wesche (1983) framework to describe this test item.
SUMMARY
This chapter has presented a discussion on various basic issues dealing with tests and measurements. It has looked at terminology related to tests and measurements and attempted to distinguish between terms which are similar. It has also attempted to situate testing within the instructional process, taking into consideration instructional objectives as well as decisions.
GLOSSARY
Assessment: An assessment can be any procedure that is used to obtain information regarding a student's performance or ability.
Evaluation: For some, evaluation is synonymous with assessment, while for others, evaluation is a more formal form of assessment and may even be specific only to the evaluation of programmes.
Measurement: A measurement is a quantitative description of a particular characteristic.
Test: Linn and Gronlund (1995) define a test as a systematic procedure for measuring a sample of behaviour by posing a set of questions in a unified manner.
www.prsd.k12.pa.us/esl/Media/ell2_files/ell2.ppt
http://www.malaysianmonarchy.org.my/portal_bi/rk1/rk1.php
www.cesa7.org/ellcenter/Resources/documents/March212007.ppt
http://www.2dix.com/pdf-2011/testing-and-evaluation-in-esl-pdf.php
Real English Clip R (83) Second Language Series # 3: http://www.youtube.com/watch?v=_uMeMChXpfE
Sudbury Schools: #8: Grades, evaluation and testing: http://www.youtube.com/watch?v=KyONG225aKQ