Assessing Interactive Oral Skills in EFL Contexts

By Jason Beale, M.Ed (TESOL) Monash University


2001

Word count 3,825 (excluding index and bibliography)


Jason Beale
Email: jasongbeale@hotmail.com


Introduction
1. The purpose of assessment
  1.1 Testing general proficiency
  1.2 Educational placement & diagnosis
  1.3 Formative and summative assessment
  1.4 Testing for special purposes
2. Establishing assessment criteria
  2.1 The importance of validity
  2.2 The components of language use
  2.3 Specifying performance criteria
  2.4 Global rating scales
  2.5 Analytic rating scales
3. Choosing the best test format
  3.1 Interview tasks
    3.1.1 Structured interviews
    3.1.2 Unstructured interviews
  3.2 Role play tasks
    3.2.1 Structured role plays
    3.2.2 Unstructured role plays
4. Special issues
  4.1 Practicality
  4.2 Bias for best
  4.3 Marking
Conclusion
Bibliography
Appendix: Sample rating scales
1. Speaking Proficiency English Assessment Kit (SPEAK), Educational Testing Service, USA.
2. Test in English for Educational Purposes (TEEP), Associated Examining Board, England.
3. Negotiated grading scheme, Tokyo Denki University, Japan.
4. Placement rating scale, Nova conversation school, Japan.

Assessing Interactive Oral Skills in EFL Contexts


Introduction
There are many English language teachers working in EFL contexts overseas. Their work often
requires the quick assessment of a student's oral ability, usually during a brief initial interview or
even in the very first class. This can help determine the choice of class material and the overall
aims of a course of instruction. Informal assessment also continues throughout any teaching
program, as a way of ensuring that desired outcomes are being achieved and students' needs are
being met.

Such informal assessment is clearly a central part of language teaching. It is no less important
than the formal testing of achievement, or the testing of employment and academic-related
proficiency. It follows that all teachers in EFL contexts, whatever their positions and duties, ought
to have a basic understanding of the principles underlying assessment of oral language skills.

1. The purpose of assessment


Before designing oral assessment tasks there needs to be a clear idea of the purpose of
assessment. This is essential because the same degree of detail is not required in every testing
situation. The purpose of the test will determine the overall shape of the assessment criteria to be
used.

1.1 Testing general proficiency


The assessment of general proficiency is independent of a particular syllabus, and provides a
broad view of a person's language ability. Ideally it focuses on fundamental oral skills, as well as
on common communicative functions. Tasks such as summarising technical data, or describing
statistics, require a grasp of fundamental skills of course, but they are clearly limited to academic
or employment settings. They are formal tasks requiring particular language and presentation
skills.

The term 'proficiency' refers to the practical use of language as a whole. It is therefore best
assessed directly by eliciting extended samples of interactive language use in realistic contexts.
The indirect assessment of oral language, through controlled response to single test items, has
limited value as an indicator of real-life oral proficiency.

Unfortunately there is no such thing as a definitive test of general oral ability that can be applied
in any situation. The standard of 'native-like' proficiency is only a convenient abstraction - one
that ignores the personal and cultural differences that make communication real and complex. In
EFL contexts, such as Japan, testees are often quite unfamiliar with Western cultural references
and modes of behaviour, and so the design of test items needs to be as culturally neutral as
possible without being too vague.

1.2 Educational placement and diagnosis


Assessment for educational placement assigns the student to a suitable level of instruction.
It does not require the same degree of detail as a general proficiency test, but involves matching a
student against fairly broad criteria in a band scale. Each 'band' describes the minimum level of
ability needed for each stage of instruction. The most basic band scale would consist of only three
levels: beginner, intermediate, and advanced.
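The threshold logic of such a band scale can be sketched in code. The numeric cut-offs below are invented purely for illustration; the paper specifies no scores, only the idea that each band names the minimum level needed for a stage of instruction:

```python
# Hypothetical placement bands: each entry is (minimum score, band name).
# The 0-100 scale and the cut-offs are assumptions for illustration only.
PLACEMENT_BANDS = [
    (0, "beginner"),
    (40, "intermediate"),
    (75, "advanced"),
]

def place_student(score: int) -> str:
    """Return the highest band whose minimum threshold the score meets."""
    band = PLACEMENT_BANDS[0][1]
    for minimum, name in PLACEMENT_BANDS:
        if score >= minimum:
            band = name
    return band
```

A score of 50 would place a student in the intermediate band under these assumed cut-offs; a finer-grained scheme simply adds more (threshold, band) pairs to the list.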

Diagnostic assessment provides more detailed information on a learner's strengths and
weaknesses. It requires descriptive analysis, both by impressionistic description and by rating
specific aspects of language use. Such information is valuable for tailoring lessons more closely
to learners' needs, and as a standard for evaluating progress at a later stage.

1.3 Formative and summative assessment


Formative assessment indicates a learner's ongoing progress during a course. It need not involve
testing under formal conditions, but may simply consist of various impressions and notes that the
teacher takes while observing students engage in communicative tasks. Summative assessment on
the other hand is the formal measurement of a learner's achievement at the end of a unit or course
of instruction. This involves matching student achievement with the stated objectives of the
course. Summative assessment is different from general proficiency testing in that the assessment
tasks of the former are based on the representative sampling of a syllabus.

1.4 Testing for special purposes


Outside of a specific syllabus of instruction, many people sit self-contained language tests that are
recognized by higher education and employment bodies (e.g. TOEIC and TOEFL). The design of
assessment tasks and choice of language is intended to reflect the skills and knowledge needed in
special contexts of work or study.

These kinds of tests are used as high-grade filters that discriminate between learners and rank
them against a sliding scale. The main purpose of assessment here is to identify candidates for
access to limited opportunities such as scholarships and promotions. As such they are not
particularly suitable for assessing an individual's particular level of proficiency in detail.

2. Establishing assessment criteria


According to the Australian Oxford Mini Dictionary a criterion is a "principle or standard by
which (a) thing is judged" (Kent 1998, 118). To test oral language skills there need to be such
criteria to act as guidelines for judgement. These should describe the various levels of
performance in a way that can be tested both logically and consistently. The last two points are
often called 'validity' and 'reliability' in the literature on assessment.

2.1 The importance of validity

Validity has been described as "the single most critical element in constructing foreign language
tests" (Nakamura 1995, 126). A valid test has a recognizable logic to it that makes the test a
meaningful tool of assessment. The most fundamental kind of validity relates to the underlying
theory of language on which the test is constructed (construct validity). This influences the
sampling of language material and tasks (content validity), which in turn has an effect on the
appearance of the test to the teachers and learners who use it (face validity).

Construct validity requires a set of principles that can adequately describe real-life language use.
In the case of oral language skills this is not such a simple matter. Speaking may seem to be a
general-purpose ability, but it occurs in many contexts, under many conditions, and for many reasons.
Each has its own characteristics and demands, especially when seen as an interactive skill. In the
last few decades a great deal of effort has been made to describe language use as an interactive or
communicative system. Canale and Swain's (1980) model of 'communicative competence' is
certainly the best known example in the literature on applied linguistics.

2.2 The components of language use


Grammar, vocabulary, and pronunciation all fall under the general category of grammatical (or
linguistic) competence in Canale and Swain's influential model. These are the basic skills,
traditionally taught and tested in isolation from a communicative context. Yet in order to predict
real language use successfully, higher level skills and knowledge also need to be considered.

A second category called discourse competence concerns the way language is conventionally
shaped in different communicative contexts. Describing a suspect during a police interview, for
example, requires more than basic grammatical skills - it involves selecting, organising and
linking elements together to create a structured and coherent whole. Canale and Swain distinguish
a third category called sociocultural competence, which covers the cultural forms of speech
deemed appropriate in a particular community.

Weir (1993), drawing from Bygate, conveniently includes both discourse and sociocultural
aspects of language use under the single heading "routine skills". These are "frequently recurring
ways of structuring speech, such as descriptions, comparisons, instructions, telling stories", and
include the patterns of interactional language use seen in such things as "buying goods in a shop,
or telephone conversations, interviews, meetings, discussions, decision making, etc" (Weir 1993,
32).

Canale and Swain's fourth category is strategic competence, which covers the various techniques
people use to manage and enhance communication. This category is covered by Weir under the
heading "improvisation skills" (1993, 32-4). Communication is a faulty and chaotic process and
speakers need to be able to improvise when their conventional language routines fail. This
includes both the "negotiation of meaning" in various ways to enhance understanding, as well as
the "management of interaction" to establish "who is going to speak next and what the topic is
going to be" (turn taking and topic initiation).

2.3 Specifying performance criteria


As the preceding section has shown, interactive oral skills involve different categories of practical
knowledge (or know-how), each effectively building on the one before: first, basic grammatical and
linguistic knowledge (core skills); then discourse and sociocultural knowledge (routine skills); and
finally strategic knowledge (improvisational skills). Having this general framework is helpful in
identifying the various components of oral ability that can be assessed. Yet deciding what
weighting to give each category of skill is still not a straightforward matter.

Improvisational skills are useful in every general context. For example, "Excuse me, what did you
say?", or its equivalent, is an essential phrase. In particular contexts, such as business negotiation,
there is a greater need for highly developed improvisational skills. In choosing or designing
specific performance criteria for an oral test it is important to decide which of these categories are
important and to what extent at each level of a candidate's ability. Different criteria will produce
different results. As noted by Brown, "if each group were to develop its own assessment
framework..., they may, in fact, through the inclusion or weighting of specific criteria, produce
schemes which lead to quite different evaluations of candidates ability." (cited in Turner 1998,
198) The assessment criteria need to be related to the actual purpose of the test. This is sometimes
called systemic validity. It requires close consultation with the relevant educational and
employment bodies to help determine in detail what they intend the assessment instrument to
achieve.

2.4 Global rating scales


Performance criteria are usually displayed in a rating scale. A global or holistic scale provides a
general description of ability, in which the various components of language use are grouped
together in a single 'band' descriptor:
Band 6: Competent Speaker. Is able to maintain the theme of dialogue, to follow topic
switches and to use and appreciate main attitude markers. Stumbles and hesitates at times
but is reasonably fluent otherwise. Some errors and inappropriate language but these will
not impede exchange of views. Shows some independence in discussion with ability to
initiate. (Carroll cited in Weir 1993, 44)

Global descriptors are not always so brief as this. The Australian Second Language Proficiency
Ratings (ASLPR) scale, developed by Ingram and Wylie in 1982, uses an A4 page to present each
band descriptor in considerable detail. This allows for increased accuracy of identification, but at
the cost of flexibility of assessment. Detailed global scales effectively dictate what combination of
skills is to be recognized at each level, although in practice the particular features "may not cooccur in actual student performance" (Turner 1998, 200).

2.5 Analytic rating scales


The term analysis strictly refers to the breaking down of an object into its constituent parts or
aspects. This is the opposite of synthesis or the putting together of parts to make a whole.

Although the general components of oral language use are those discussed above in 2.2, there are
various ways in which this "cake" of abilities can be sliced for assessment. Following are
examples of assessment categories from four different analytic rating scales:

FLUENCY, PRONUNCIATION, GRAMMAR, COMPREHENSIBILITY
Speaking Proficiency English Assessment Kit (SPEAK), Educational Testing Service, USA (Clankie 1995, 124).

FLUENCY, ACCURACY, COMPREHENSION, COMMUNICATIVE ABILITY
Placement rating scale, Nova conversation school, Japan (unpublished).

ATTITUDE & CONFIDENCE, EXPRESSIVENESS (pronunciation, intonation & volume), BODY LANGUAGE, UNDERSTANDABILITY (for the listener, is the message delivered clearly?), COMMUNICATIVE ABILITY (can the speaker say what he/she wants to say?)
Negotiated performance profile, Tokyo Denki University, Japan (McClean 1995, 142-3).

FLUENCY, GRAMMATICAL ACCURACY, INTELLIGIBILITY, APPROPRIATENESS, ADEQUACY OF VOCABULARY FOR PURPOSE, RELEVANCE AND ADEQUACY OF CONTENT
Test in English for Educational Purposes (TEEP), Associated Examining Board, England (Weir 1993, 43-44).

Within each category, different levels of ability need to be distinguished clearly using descriptive
language that can be matched against test results. With clear criteria determined by the overall
purpose of assessment (systemic validity) and founded on a clear theory of language use
(construct validity), it is possible to choose relevant assessment tasks. The choice of relevant tasks
is an important step in itself, for as shown in one study of interview-format discourse (cited in
Turner 1998, 195), "some of the supposed characteristics of intermediate versus advanced
learners represented in the rating scales were not substantiated in the actual performance of
intermediate and advanced learners."

3. Choosing the best test format


Since Canale and Swain presented their model of communicative competence twenty years ago
(see 2.1-2.2), the communicative approach has spread into both teaching and testing methodology.
According to Weir (1988, 82) communicative testing is purposive, interesting, motivating,
interactive, unpredictable and realistic.

Assessing interactive language means by definition that there is someone else actively taking part.
The person being tested is not only producing language, but is also responding in a
communicative way with another interlocutor. This is quite different from non-interactive
stimulus response tasks. Techniques that use written or visual prompts to elicit language samples
are very straightforward and time-efficient to administer, and can also help to gauge the general
educational level of the student. The SPEAK test of oral proficiency is one example of a test
composed mostly of non-interactive tasks (Clankie 1995). Unfortunately they fulfil very few of
the qualities of communicative testing listed above.

There are many kinds of oral assessment task that can be used; one writer lists over sixty
variations (Underhill 1987). In essence there are two general approaches that meet the criteria for
interactive assessment: interview and role play.

3.1 Interview tasks


Interview tasks are a direct test of language use; that is, "they measure oral skills by having the
examinees actually speak" (Turner 1998, 194). Even so, the ostensible context remains that of a
language test. Beyond making the candidate feel at ease, there is no attempt to simulate a non-test
setting. Interview tasks thus represent a compromise solution to the problem of how to control
something that is inherently unpredictable.

3.1.1 Structured interviews


A structured interview composed of set questions has many advantages. It can be reliably used to
determine someone's general level in terms of grammatical knowledge, vocabulary, pronunciation
and fluency. It can also be used to find out how well the candidate can structure a short narrative,
and to what degree they can express more complex points of view. It is relatively cost and time
efficient to administer, and if the interview is recorded properly then marking can be a fairly
reliable standardised procedure.

A common interview structure has four stages - 1) a friendly warm up, 2) a level check to
determine the candidate's overall ability in terms of the criteria, 3) challenging probes to find
where performance drops, and 4) a final wind down at a less challenging level (Nagata 1995). In
EFL contexts, such as Japan, the structured interview is readily accepted by test users since it
mirrors the often formal social relationship that exists between teacher and student. This high face
validity makes it a popular method of oral assessment, despite its limitations as a measure of
real-life oral ability.

The structured interview allows only a partial assessment of routine and improvisation skills (as
defined in 2.2 above). However keen the candidates may be, they remain passive respondents.
Interactive routine and improvisational skills require greater freedom on the part of the candidate
to direct and initiate the conversational flow.

3.1.2 Unstructured interviews


Standardized assessment is not always necessary, especially in more informal settings. A less
structured interview format can more closely approximate the conditions of free conversation, and
there is ideally a greater use of interactive skills - including the strategic skills of negotiation of
meaning and turn taking.
Of course, this depends on there being suitable motivation for conversation and a positive
atmosphere in which communication can happen.

Unfortunately an unstructured interview is not easily applied to large numbers of candidates. It
requires experienced interviewers who can facilitate the conversation in an unforced manner,
allowing the testee to interact on an equal footing. Regardless of the interviewer's skill, however,
the unpredictable nature of the unstructured interview format means it lacks reliability, and is
unsuitable for large scale assessment.

3.2 Role play tasks


A role play is language use in a simulated real life situation. Unlike the interview format, role
play can focus on a variety of different language functions. This is especially useful for the
assessment of specific work-related oral performance. It is a better indicator of real life
performance than the interview format, although it tends to favour extroverted candidates with a
degree of acting ability (Weir 1988, 88). The assessor can be involved as a participant in the role
play, or simply as an observer of two or more testees.

3.2.1 Structured role plays (information gap)


The structured or controlled role play gives the candidates a detailed set of instructions to follow,
usually with some kind of form to complete as they go. These are usually called information gap
activities since they involve the transfer of information with others to complete a set task. This is
a popular method of language practice in EFL classrooms, and there are many published
resources available to teachers that can be photocopied for immediate use.

The major drawback of information gap activities is that they are often no more than mechanical
exercises requiring the production of linguistic forms on cue. There is very little scope for the
purposeful creative use of language, which makes it difficult for students to identify with the role.
Tightly scripted information gap tasks have little predictive validity, since real interactive
language use is much more unpredictable.

3.2.2 Unstructured role plays


Unstructured role play allows the participants to select and structure language more freely.
Instructions on small role play cards can provide more or less detail, depending on the ability of
the testees to improvise and initiate. Instructions need to be made clear to the testees before the
role play begins so that it is not "a test of comprehension of instructions" (Underhill 1987, 52).
Participants should also be able to identify with the role and understand what communication in
the role play is meant to achieve.

Role plays can be designed to test language use in various settings, such as at a hotel, a doctor's
office, a supermarket, or a boardroom. The role play may focus on general language functions (or
purposes), such as asking, checking, describing, complaining, apologizing, or giving advice, to
take only a few examples. Unlike information gap activities, role play instructions do not usually
specify particular language structures to be used, though they may be implied in the way the
instructions are written.

The lack of obvious manipulation of the testees' responses is the main strength of this format.
Well-designed role plays are purposive, interesting, motivating, interactive, unpredictable and
realistic, to use the characteristics of communicative language given by Weir (1988, 82). This
means that there is more scope for higher level testees to display a range of interactional and
improvisational skills.

The advantages of this approach mean increased validity as a test of real-life oral skills, but at the
cost of reliability of measurement due to the unpredictability of testees' responses. To some extent
this can be balanced out by ensuring there are well defined procedures of assessment based on
clear criteria. Testees themselves also need to understand the criteria under which their own
language performances will be judged.

4. Special Issues
Some special issues that influence the design and implementation of assessment also need to be
mentioned.

4.1 Practicality
The practicality of a test refers to the degree to which it is cost effective and easy to administer.
The number of testees, the time constraints for testing and marking, and the available human and
physical resources all need to be considered carefully before an assessment scheme is chosen. This is
not only an issue of money, but also of the perceptions of those who will be taking and using the
test. Also, if a test can be administered efficiently by assessors and markers, this increases the
validity and reliability of the results as a whole.

4.2 Bias for best


Testing language skills requires getting a representative sample of optimum performance. To 'bias
for best' means to elicit a candidate's best performance on a test. A poorly designed or delivered
test will not provide consistent results. This may be because confusing instructions favour some
students over others, or perhaps because role play situations require specific knowledge or
vocabulary that only some of the candidates possess. Also, generally distracting or stressful
conditions of assessment will clearly disadvantage some students over others in a way that is
unrelated to language ability.

4.3 Marking
Applying descriptive assessment criteria to a candidate's oral performance requires making
subjective (or impressionistic) judgements. This is in contrast to objective marking, in which a
quantitative marking scheme is mechanically applied to structured tasks, such as multiple choice
and sentence completion exercises.

A descriptive scale of oral performance, with clearly defined levels, can be combined with
quantitative grades. Subjective judgements matching performance to such descriptors will then
generate a quantitative grade score useful for ranking candidates. Analytic rating scales, which
describe specific language skills (see 2.5 above), can be graded differently to emphasize the
relative importance of different skills. This is called 'weighting' the assessment criteria, and needs
to be based on a clear understanding of the stages of language development (construct validity)
and the purpose of the assessment instrument (systemic validity). A graded analytic scale can then
be combined with a global scale, for example as shown by McClean (1995) in her description of a
negotiated grading scheme at a Japanese university.
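The arithmetic of weighting can be illustrated with a short sketch. The category names echo the analytic scales quoted in 2.5, but the weights and the 1-5 band range are assumptions made purely for illustration, not values taken from any of the published schemes:

```python
# Hypothetical weights for four analytic categories; weights sum to 1 so the
# combined score stays on the same 1-5 band scale as the individual ratings.
WEIGHTS = {
    "fluency": 0.3,
    "grammatical_accuracy": 0.2,
    "intelligibility": 0.2,
    "appropriateness": 0.3,
}

def weighted_grade(ratings: dict) -> float:
    """Combine per-category ratings (each a 1-5 band) into one weighted score."""
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9  # weights must sum to 1
    return sum(weight * ratings[category] for category, weight in WEIGHTS.items())
```

Under these assumed weights, a candidate rated 4 for fluency but only 2 for grammatical accuracy is pulled toward the fluency rating; halving the fluency weight would rank the same performance lower, which is exactly the point made above about different criteria producing different evaluations.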

Grading is very much dependent on the purpose of the test and the way this is reflected in the
criteria. An achievement test that is criterion referenced will judge candidates individually on
their achievement of learning outcomes. Score distribution depends solely on learning success,
and it is theoretically possible for all candidates to receive 100%. On the other hand, a test for
selection purposes will need to separate candidates, making fine distinctions between their
performances. This kind of comparative assessment is called norm referenced, and the scores are
ideally distributed on a bell-shaped curve, so that most candidates are placed at the centre of the
distribution.
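The contrast between the two referencing schemes can be made concrete with a minimal sketch; the pass mark and cohort scores are invented for illustration:

```python
def criterion_referenced(score: float, pass_mark: float = 60.0) -> bool:
    """Judge a candidate against a fixed standard: all candidates can pass."""
    return score >= pass_mark

def norm_referenced_percentile(score: float, cohort: list) -> float:
    """Rank a candidate against the cohort: the result depends on everyone else."""
    below = sum(1 for s in cohort if s < score)
    return 100.0 * below / len(cohort)
```

With a cohort of [45, 55, 60, 70, 85], the criterion-referenced view passes every candidate at or above 60, whereas the norm-referenced view places a score of 70 at the 60th percentile; the same score means something different under each scheme.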
Conclusion
An effective test of interactive oral skills is not a haphazard selection of tasks.
Instead each assessment situation presents a set of practical demands that need to be specifically
addressed. The principles of validity, reliability, practicality and bias for best provide basic
guidelines for evaluating the effectiveness of a test instrument.

A theoretical model of oral skills is also necessary to structure what is fundamentally fleeting and
changeable. At the same time, it needs to be remembered that human skills are highly dependent on a
variety of internal and external factors that are independent of language ability per se. The art of
testing involves minimising the influence of such extraneous factors and creating conditions under
which all candidates can display their genuine abilities.

Bibliography
Canale, M. and M. Swain. 1980. Theoretical bases of communicative approaches to second
language teaching and testing. Applied Linguistics (1): 1-47.
Clankie, S. 1995. The SPEAK test of oral proficiency: A case study of incoming freshmen. In
JALT Applied Materials: Language Testing in Japan, eds. J. D. Brown and S. O. Yamashita,
119-125. Tokyo: The Japan Association for Language Teaching.
Kent, H. 1998. The Australian Oxford Mini Dictionary. 2nd ed. Melbourne: Oxford University
Press.
McClean, J. 1995. Negotiating a spoken-English scheme with Japanese university students. In
JALT Applied Materials: Language Testing in Japan, eds. J. D. Brown and S. O. Yamashita,
119-125. Tokyo: The Japan Association for Language Teaching.
Nagata, H. 1995. Testing oral ability: ILR and ACTFL oral proficiency interviews. In JALT
Applied Materials: Language Testing in Japan, eds. J. D. Brown and S. O. Yamashita,
119-125. Tokyo: The Japan Association for Language Teaching.
Nakamura, Y. 1995. Making speaking tests valid: Practical considerations in a classroom setting.
In JALT Applied Materials: Language Testing in Japan, eds. J. D. Brown and S. O.
Yamashita, 119-125. Tokyo: The Japan Association for Language Teaching.
Turner, J. 1998. Assessing speaking. Annual Review of Applied Linguistics 18: 192-207.
Underhill, N. 1987. Testing Spoken Language: A Handbook of Oral Testing Techniques.
Cambridge: Cambridge University Press.
Weir, C. J. 1988. Communicative Language Testing with Special Reference to English as a
Foreign Language. Exeter: University of Exeter.
Weir, C. J. 1993. Understanding and Developing Language Tests. New York: Prentice Hall.