
CHAPTER 3

Limitations of Tests

LEARNING OUTCOME

Upon completion of this chapter, you should be able to:

1. Identify and describe the major limitations of tests and measurements in TESL;
2. Compare traditional assessment with alternative assessments; and
3. Describe two forms of alternative assessment and their benefits and shortcomings.

INTRODUCTION

Tests are not perfect measures; they certainly have limitations. These limitations should remind us to be careful when interpreting test scores and not to place complete trust in them. This chapter discusses five major limitations of tests and measurements, as well as the alternative forms of assessment that have been suggested in order to overcome these limitations.

3.1 LIMITATIONS OF TESTS AND MEASUREMENT IN TESL

The five major limitations identified and discussed by Bachman (1990) are as follows:

(a) Subjectivity
(b) Under-specification of domain
(c) Incompleteness
(d) Indirectness
(e) Imprecision

Each of these limitations is described and discussed in the following paragraphs.

3.1.1 SUBJECTIVITY

The most obvious form of subjectivity in tests is seen in the grading of supply-type or subjective-format tests such as essays and interviews. This issue has been addressed at some length in previous chapters. However, subjectivity does not refer only to grading; it also affects other elements of the test. Even an objective, select-type or multiple-choice test involves some amount of subjectivity, found in the selection of passages, item formats and the content to be tested. In a test that contains a reading comprehension passage, for example, why was one passage selected over another? Was it because of its content? If so, there are surely many other passages on the same content, so why was this passage used and not another? The answer is that decisions that affect the test and its ability to measure precisely are made by individuals, and these decisions involve some degree of subjectivity.

3.1.2 UNDER-SPECIFICATION OF DOMAIN

A test is a measurement of some content. This content can be referred to as the domain or the construct of the test. However, while it may be quite easy to specify the domain as, for example, listening comprehension, it is not as easy to test or measure that domain. Whenever a theoretical domain or construct is operationalised, some aspect of it is bound to resist translation into a test. The test therefore under-specifies the domain, and it is this under-specification that limits a test as a measure of ability, knowledge or a sample of behaviour.


3.1.3 INCOMPLETENESS

Incompleteness refers to the student's inability to demonstrate the entire repertoire of the construct being measured. Because a test is constrained by time and physical setting, a student will never be able to show all of what he or she is able to do. Time constraints mean that only a few questions can be asked, and these questions may not elicit the student's true or complete ability. Similarly, the physical setting of the test may prevent the student from demonstrating specific kinds of abilities. We should therefore note that even when a student scores zero points in a test, this does not mean that he or she is completely ignorant of the subject or ability being tested; it may simply mean that the test has not elicited the knowledge or abilities that the student is able to convey or perform.

3.1.4 INDIRECTNESS

While we are aware of the importance of having direct tests, it is unlikely that any test will be a completely direct measure of ability. This limitation is inherent in the testing situation itself. Many of us have experienced test anxiety: once the word "test" or "assessment" is mentioned, the entire situation changes. Some students who speak well in situations outside the classroom lose this ability once they become aware that they are being tested. In addition, every test situation has elements that are not related to the construct being tested. Messick (1989) refers to this as construct-irrelevant variance; examples include the test rubrics or instructions, time constraints, and other rules and regulations of the test. None of these are present in the actual real-world situation, and they must be considered an aspect of indirectness. We can therefore conclude that the test situation is indirect because it is inauthentic, and, being indirect, it fails to capture the true ability that the students would show if they were to perform in the real world.

3.1.5 IMPRECISION

Finally, we need to acknowledge that there is a degree of imprecision in all tests. While we may be able to justify some of the weightage in marks or points given to some items, we can never be completely accurate and fair. Even in a test of twenty multiple-choice items, each assigned one point, it is almost impossible to claim that all twenty items are of equal difficulty. As such, we cannot justify giving each item an equal weightage of one point. This imprecision must be acknowledged as another constraint of tests.

In addition to the above, Herman et al. (1992) point out other limitations, such as the mismatch between test content and curriculum and instruction; the overemphasis on routine and discrete skills to the neglect of complex thinking and problem-solving skills; and the limited relevance of major test formats, such as the multiple-choice format, to either classroom or real-world learning (pp. 5-6).
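The point that equally weighted items are rarely equally difficult can be checked with a simple item analysis. The Python sketch below is an illustration added to this chapter, not a procedure from Bachman (1990): it computes item facility, the proportion of test takers who answer each item correctly. The response matrix is invented, and it uses five items rather than twenty for brevity.

```python
# Hypothetical responses: each row is one student, each column one item
# (1 = correct, 0 = incorrect). Every item is worth one point.
responses = [
    [1, 1, 0, 1, 0],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 1, 0],
    [1, 0, 0, 0, 0],
    [1, 1, 0, 1, 1],
]


def item_facility(matrix):
    """Return, for each item, the proportion of students who answered it correctly."""
    n_students = len(matrix)
    n_items = len(matrix[0])
    return [sum(row[i] for row in matrix) / n_students for i in range(n_items)]


for number, p in enumerate(item_facility(responses), start=1):
    print(f"Item {number}: facility = {p:.2f}")
```

In this invented data, items 1 and 3 each carry one point, yet every student answered item 1 correctly while only one student in five answered item 3 correctly.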

As a teacher, is it important that you take these limitations into consideration when conducting tests and measurements? Why, and what are the consequences if you fail to do so?


3.2 ALTERNATIVE ASSESSMENT

You have read about the five major limitations of tests and measurements. What alternative methods of assessment can you think of?

Alternative assessments are assessment procedures that differ from the traditional notions and practice of tests with respect to format, performance or implementation. Alternative assessment probably has its roots in writing assessment, because of the need to provide continuous assessment rather than a single impromptu evaluation (Alderson & Banerjee, 2001). Hamayan (1995) considers alternative assessments to be procedures and techniques which can be used within the context of instruction and can be easily incorporated into the daily activities of the school or classroom (p. 213). As the term indicates, alternative assessments are assessment proposals that present alternatives to the more traditional examination formats. They have become more popular recently because of doubts raised about the ability of traditional assessment to elicit a fair and accurate measure of a student's performance. Alternative assessment brings with it a set of perspectives that contrast with those of traditional tests and assessments. Table 3.1 illustrates some of the major differences between traditional and alternative assessments.

Table 3.1: Contrasting Traditional and Alternative Assessment

Traditional Assessment            Alternative Assessment
One-shot tests                    Continuous, longitudinal assessment
Indirect tests                    Direct tests
Inauthentic tests                 Authentic assessment
Individual projects               Group projects
No feedback to learners           Feedback provided to learners
Speeded exams                     Power exams
Decontextualised test tasks       Contextualised test tasks
Norm-referenced score reporting   Criterion-referenced score reporting
Standardised tests                Classroom-based tests
Summative                         Formative
Product of instruction            Process of instruction
Intrusive                         Integrated
Judgmental                        Developmental
Teacher proof                     Teacher mediated

Source: Adapted from Bailey (1998: 207) and Puhl (1997: 5)

In discussing alternative assessments, Herman et al. (1992: 6) list several of their common characteristics. These characteristics are as follows:

(a) Ask the students to perform, create, produce, or do something.
(b) Tap higher-level thinking and problem-solving skills.
(c) Use tasks that represent meaningful instructional activities.
(d) Invoke real-world applications.
(e) People, not machines, do the scoring, using human judgment.
(f) Require new instructional and assessment roles for teachers.

Alternative assessments are suggested largely because of a growing concern that traditional assessments cannot accurately measure the ability we are interested in. They are also seen as more student-centred, as they cater for different learning styles, cultural and educational backgrounds, and language proficiencies. Nevertheless, although alternative assessments are compatible with the contemporary emphasis on the process as well as the product of learning (Croker, 1999), several shortcomings have been noted. Perhaps one of the major limitations is that accounts of the benefits of alternative assessment tend to be descriptive and persuasive rather than research-based (Alderson & Banerjee, 2001: 229). Alternative assessments are also said to be limited to the classroom; they have not become part of mainstream assessment.

Brown and Hudson (1998), in advocating alternative assessment, seem to have taken a safer approach by suggesting the term "alternatives in assessment". They believe that educators should be familiar with all possible formats of assessment and decide on the format that best measures the ability or construct they are interested in. Hence, these alternatives would include all possible assessment formats, both traditional and informal.
Despite these limitations, alternative assessments present a viable and exciting option for eliciting and assessing students' actual abilities. At present, a number of test formats are considered alternative assessment formats, and Figure 3.1 lists several of the more common ones. Tannenbaum (1996) comments that alternative assessments focus on documenting individual strengths and development, which would assist in the teaching and learning process.

Physical demonstration
Dialogue journals
Pictorial products
Checklists
K-W-L (what I know/what I want to know/what I've learned) charts
Reading response logs
Teacher-pupil conferences
Interviews
Performance tasks
Portfolios
Self assessment
Peer assessment

Figure 3.1: Sample formats in alternative assessment
Source: Tannenbaum (1996); Short (1993)

In this chapter, however, only two of these formats will be discussed further, in order to provide a glimpse of what alternative assessment can offer.

3.2.1 PORTFOLIOS

Perhaps the best known alternative assessment is the portfolio. Although relatively new in language teaching and assessment, the portfolio is actually quite a common form of assessment, as many professions place great importance on the development of personal portfolios. Architects and artists, for example, develop portfolios in order to show their work to potential clients or employers. The contents of the portfolio become evidence of their abilities, much as we would use a test to measure the abilities of our students.

Paulson, Paulson and Meyer (1991) define a portfolio as "a purposeful collection of student work that exhibits the student's efforts, progress and achievements in one or more areas" (p. 60). They stress that the collection must include criteria for judging merit as well as evidence of student participation in selecting content and in self-reflection. A portfolio is therefore not simply a file folder or manila card folder containing a hotchpotch collection of student work, but a careful selection of their work. We will see that the portfolio provides not only a source for assessment but also learning opportunities.

Bailey (1998: 218) describes a portfolio as containing four primary elements. First, it should have an introduction that provides an overview of the content of the portfolio. Bailey even suggests that this section include a reflective essay in which the student expresses his or her thoughts and feelings about the portfolio, perhaps explaining its strengths and possible weaknesses as well as why certain pieces are included. Secondly, she argues that portfolios should have what she refers to as an academic works section, which is meant to demonstrate the student's improvement or achievement in the major skill areas (p. 218). The third section is a personal section, in which students may wish to include their journals, score reports of tests they have sat for, and photographs and other items that illustrate their experiences with, and achievements in, the English language. Finally, an assessment section may contain evaluations made by peers and teachers as well as self-evaluations.

Table 3.2: Contents of a Portfolio

Introductory Section: Overview; Reflective essay
Academic Works Section: Samples of best work; Samples of work demonstrating development
Personal Section: Journals; Score reports; Photographs; Personal items
Assessment Section: Evaluation by peers; Self-evaluation

Source: Adapted from Bailey (1998: 218)

The portfolio can be said to be a student's personal documentation of his or her ability and successes in the language. It may even require students to consciously select items that document their own progress as learners, so the actual compilation of the portfolio is in itself a learning experience. Some suggest that students should attach a short reflection to each piece or item placed in the portfolio. Portfolio assessment is therefore both a learning and an assessment experience, and this dual function can be considered one of its benefits. Brown and Hudson (1998) summarise several other advantages of using portfolios in assessment, grouped according to how the portfolio strengthens students' learning, enhances the teacher's role and improves the testing process. With respect to testing, they list the following advantages of the portfolio as an assessment instrument (pp. 664-665):


(a) enhancing student and teacher involvement in assessment;
(b) providing opportunities for teachers to observe students using meaningful language to accomplish various authentic tasks in a variety of contexts and situations;
(c) permitting the assessment of the multiple dimensions of language learning;
(d) providing opportunities for both students and teachers to work together and reflect on what it means to assess students' language growth;
(e) increasing the variety of information collected on students; and
(f) making teachers' ways of assessing student work more systematic.

However, portfolios are not without problems. In particular, portfolios can become rather problematic assessment devices when they are used on a large scale, especially with respect to grading. Brown and Hudson (1998) also point out a number of other concerns related to the design, logistics, interpretation, reliability and validity of portfolio assessment.

The design of the portfolio is an issue because it is quite subjective. A portfolio must be considered a personal student product. If the teacher becomes completely involved in determining its content, especially with regard to which student work should be included, some of the benefits of the portfolio will be lost. Hence, questions such as who will determine the grading criteria, how these criteria will be established, who determines what the portfolio will contain, and how much of the daily authentic classroom activity should be included all become difficult design issues that portfolio assessment needs to contend with. Logistically, the portfolio also poses several real problems. These include the increased time and resources needed not only to develop the portfolio but also to assess it.
Another concern is the need to train teachers to assess portfolios fairly and accurately. Portfolios are also problematic in terms of interpretation, which includes the setting of standards and criteria for grading them. Assessing a portfolio involves evaluating a student's personal interests, and it may not be appropriate for a teacher to consider an item as having no value when it is invaluable to the person involved. This is a dilemma that teachers will face when they attempt to interpret and assess a student's personal portfolio. Similarly, reporting the result of a portfolio assessment is problematic because, in most cases, it can only take the form of suggestions to the student rather than clear indications of strengths and weaknesses. Finally, there is an obvious problem of reliability. Because a portfolio contains many different pieces or items, high inter-rater reliability is likely to be even more difficult to achieve than with written essays.

While these problems hinder the wider use of portfolios, they should not deter us from using the portfolio, at least in a controlled manner, in our classrooms. We should remember that the portfolio is not only an assessment tool but also a learning experience. Even when a portfolio raises problems as an assessment instrument, compiling it may still benefit the students in their learning. Furthermore, some of the problems raised in this section may be addressed by self assessment, an assessment technique discussed in the following section.
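The inter-rater reliability problem raised for portfolios can be quantified. One widely used index for two raters who assign categorical grades is Cohen's kappa, which corrects the raw proportion of agreement for the agreement expected by chance. The Python sketch below is an illustration only: the chapter does not prescribe this statistic, and the band labels and the ten pairs of ratings are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who each grade the same items into categories."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must grade the same non-empty set of items.")
    n = len(rater_a)
    # Proportion of items on which the two raters gave the same grade.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's own distribution of grades.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1:
        # Both raters gave one identical grade throughout; kappa is undefined,
        # so this sketch simply reports perfect agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical bands (A, B, C) given by two teachers to the same ten portfolios.
rater_a = ["A", "B", "B", "C", "A", "B", "C", "C", "B", "A"]
rater_b = ["A", "B", "C", "C", "B", "B", "C", "B", "B", "A"]
print(f"kappa = {cohens_kappa(rater_a, rater_b):.2f}")
```

With these invented ratings, the two teachers agree on 7 of the 10 portfolios (70%), but kappa is only about 0.54 once chance agreement is taken into account.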
(a) In your opinion, what are the advantages of using portfolios as a form of alternative assessment?
(b) Look at the characteristics of alternative assessment as opposed to traditional assessment in Table 3.1. How many of these characteristics accurately describe a portfolio?

3.2.2 SELF ASSESSMENT AND PEER ASSESSMENT

Two other common forms of alternative assessment are self assessment and peer assessment. Both are strongly advocated by Puhl (1997), who believes that they are essential to continuous assessment, a cornerstone of alternative assessment. The benefits of self and peer assessment are found especially in the formative stages of assessment, in which the development of the students' abilities is emphasised. Black and Wiliam (1998) point out that all the students in their study said that work involving self and peer assessment made them think more, and that a large proportion of them (85%) said that it made them learn more (p. 29).

Self assessment can take several forms, including a yes-no checklist, a Likert-type scale or even an open-ended format. Self assessment should, however, be distinguished from self-marking, in which students mechanically check their own answers (Freeman & Lewis, 1998). Self appraisals are also thought to be quite accurate and are said to increase student motivation. Puhl (1997) describes a case study in which, she believes, self assessment forced the students to reread their essays and make the necessary edits and corrections before handing them in.

Nevertheless, for self assessment to be useful rather than a futile exercise, learners need to be trained and initially guided in performing it. This training involves explaining to students the rationale for self assessment, how it is intended to work and how it can help them. Brooks (2002: 70-72) lists several other aspects of training, such as:

(a) teacher modelling of the use of metacognitive processes and skills;
(b) student practice of their assessment skills;
(c) introduction to relevant assessment criteria;
(d) clarification of abstract assessment criteria;
(e) the use of self assessment during, rather than at the end of, an instructional unit.

In order to conduct such training, the teacher must be conversant with the concept of self assessment. This is an important prerequisite, as some teachers may not be clear about what self assessment entails and may hence dismiss the importance of this assessment technique. In language teaching and learning, self assessment is relevant to all the language skills. Cohen (1994) suggests the following example of self assessment of the listening skill, particularly the comprehension of questions:

These questions are useful in the formative stages of assessment, as they help students identify their own strengths and weaknesses and respond accordingly. Through asking these types of self assessment questions, students are expected to become more sensitive to their own learning and ultimately to perform better in the final summative evaluation at the end of the instructional programme.

Luoma and Tarnanen (2003) provide an interesting description of the use of benchmarks in self assessment. Their project involved a self-rating instrument that was part of DIALANG, an Internet-based diagnostic language assessment system for 14 languages. In the self-rating instrument, students write a text and compare it to benchmarks representing six different levels or bands, then determine their own level on the basis of these benchmarks. Luoma and Tarnanen (2003) further compared student self-ratings with teacher ratings and found that although the students tended to rate themselves fairly high on the scale, the self-ratings were fairly realistic and tended to match teacher ratings. They report that most of the mismatches between teacher ratings and self-ratings were overestimations (p. 452), and they consider that this tendency to overestimate may be due to the influence of student background variables.

The validity of self assessment can be affected by various factors. Based on their review of sixteen studies on self assessment, Blanche and Merino (1989) identify five such factors:

(a) student lack of training in how to perform self assessment;
(b) lack of generally accepted criteria for learner self-ratings and for subsequent teacher interpretations of those ratings;
(c) conflict between the students' cultural background and the culture of self assessment;
(d) intervening related variables such as students' professional aspirations or academic training;
(e) student inability to accurately perform aspects of self assessment, such as reporting on subconscious behaviour or reporting post hoc on their performance.

These observations underscore the importance of proper student training in self assessment. They also indicate that different types of self assessment techniques should be used in order to accommodate the different students involved.

Peer assessment differs from self assessment in that it involves social and emotional dimensions to a much greater extent. Puhl (1997: 8) defines peer assessment as a response in some form to other learners' work; it can be given by a group or an individual, and it can take any of a variety of coding systems: the spoken word, the written word, checklists, questionnaires, non-verbal symbols, numbers along a scale, colours, etc. Peer assessment requires that a student take up the role of a critical friend to another student in order to support, challenge and extend each other's learning (Brooks, 2002: 73). Peer assessment has been reported to:

(a) remind learners they are not working in isolation;
(b) help create a community of learners;
(c) improve the product ("Two heads are better than one");
(d) improve the process (motivates, even inspires);
(e) help learners be reflective;
(f) stimulate meta-cognition.

Each of these benefits has real-world importance, as they all stress how awareness of the other person, the peer, can change perceptions of how the individual should work in a society or community. While the potential benefits of both self and peer assessment are quite apparent, especially in the context of today's educational emphasis on self-directed, independent and autonomous learning, both forms of assessment need to be prepared and implemented with considerable care. In addition to the training mentioned earlier, correct attitudes towards these forms of assessment must also be formed.
Students, for example, need to develop appropriate interpersonal skills, such as attentive listening and respectful questioning techniques, for peer assessment to proceed without obstacles. Negative views and perceptions of these assessment techniques, such as the views that self assessment is a way of reducing teachers' marking burden and that it lowers standards, must also be addressed. It may not be sufficient to dismiss these negative views, as Brooks (2002) suggests, as misconceptions or as confusion with other activities such as self-marking, noted earlier in this section. Teachers must once again prove their mettle and provide the necessary conditions and accompanying training in order for these alternative types of assessment to succeed.
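Luoma and Tarnanen's (2003) comparison of self-ratings with teacher ratings, described earlier in this section, amounts to a tally of matches and mismatches. The Python sketch below shows one way a teacher might make such a tally for a class. The function name and the ratings are hypothetical: the values sit on an assumed six-band scale (1 = lowest, 6 = highest) and do not come from the DIALANG study.

```python
def compare_ratings(self_ratings, teacher_ratings):
    """Count how often students' self-ratings match, exceed or fall below the teacher's."""
    if len(self_ratings) != len(teacher_ratings):
        raise ValueError("Each student needs both a self-rating and a teacher rating.")
    summary = {"match": 0, "over": 0, "under": 0}
    for own, teacher in zip(self_ratings, teacher_ratings):
        if own == teacher:
            summary["match"] += 1
        elif own > teacher:
            summary["over"] += 1   # student overestimates relative to the teacher
        else:
            summary["under"] += 1  # student underestimates relative to the teacher
    return summary


# Hypothetical bands for eight students (1 = lowest, 6 = highest).
self_ratings = [4, 5, 3, 6, 4, 2, 5, 3]
teacher_ratings = [4, 4, 3, 5, 4, 2, 4, 4]
print(compare_ratings(self_ratings, teacher_ratings))
```

In this invented class, four self-ratings match the teacher's rating, three are overestimations and one is an underestimation, the kind of pattern that would prompt further training in self assessment.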


Truth about Testing: An Educator's Call to Action http://site.ebrary.com/lib/aeu/Doc?id=10044769&ppg=4

Understanding Language Tests and Testing Practices http://www.tesol.org/s_tesol/docs/3500/3467.pdf

TOEFL Structure & Skills for iBT success! http://www.youtube.com/watch?v=Em0woQvskjY

http://www.2dix.com/pdf-2011/testing-and-evaluation-in-eslpdf.php

SUMMARY

This chapter has highlighted the limitations of tests with regard to subjectivity, under-specification of domain, incompleteness, indirectness and imprecision. It has also discussed alternative ways of assessing students' performance, such as self assessment, peer assessment and portfolios.


GLOSSARY

Alternative assessment: Non-traditional assessment, often involving formats such as the portfolio, simulation and other generally subjective types of test.

Peer evaluation: Evaluation that involves providing a response in some form to the work of a peer or fellow student; it may be performed individually or in groups and may take any of a variety of coding systems.

Portfolio: "A purposeful collection of student work that exhibits the student's efforts, progress, and achievements in one or more areas" (Paulson, Paulson & Meyer, 1991, p. 60), most often compiled by the student himself or herself.

Self assessment: Assessment in which students assess themselves with respect to how well they have performed or progressed. Also often referred to as self-appraisal, self-evaluation or self-rating.
