Introduction
Effective learner assessment is an important part of a workplace English program
because the outcomes and results can have serious impacts on the learners in terms of
employment options. While adult assessment has traditionally involved standardized tests such
as the BEST Plus or BEST Literacy (Center for Applied Linguistics, 2014), many programs are
moving to other, more qualitative means of assessment such as portfolios, periodic observations
with focused checklists, and interviews with learners and supervisors. These methods can assess
a learner's progress as well as reflect learning outcomes in ways that are more representative of
the real world (Lytle & Wolfe, 1989). The test proposed in this project combines
standardized items with alternative means of assessment, including a mock interview, in order to
prepare adults for the reality of the target language domain of applying and interviewing for jobs
and promotions.
One major factor contributing to the need to adequately assess language ability and
proficiency in adult language learners stems from the current shifting demographics in the
United States, which has resulted in a growing linguistic diversity in the workforce. According
to the 2007 census, 20% of individuals in the United States reported speaking a language other
than English in the home and this number is expected to continue to increase (U.S. Census
Bureau). This amounts to a large population of people adjusting to a new culture, a new
language, and new employment. Many companies are taking a proactive approach to dealing
with the language barriers of their employees. One such approach is offering Workplace English
classes to their non-native English speaking employees. Many companies view the investment in
such programs as a way to retain productive and satisfied employees, thus increasing the morale
and potentially shrinking the tremendous cost of high employee turnover. This represents a
soft benefit for a small investment. The second language proficiency test proposed in this
project could fulfill a company's desire to assess language ability in a Workplace English
program for the purpose of properly placing employees in appropriate levels of English language
training. This could maximize employees' time in class as well as give an employer a
basis for determining and comparing the level of English ability and performance their
employees have gained through Workplace English classes. Unlike many other standardized
proficiency tests, this test is competency-based and is aimed specifically at Workplace English
content and objectives by including tasks which replicate real-world communicative acts.
Overall Organization of This Proposal
This paper will detail the overall proposed pilot test project by describing the test through
the following characteristics: purpose, type, how the scores are to be interpreted, the TLU
domain from which the tasks can be used to make inferences, a definition of the constructs
assessed by the test, a description of the table of specification and the test tasks, and the
development and characteristics of the rubrics. Because this is a proposal for a test
that has not been administered, the ideal participants and procedure for administration, as
well as the scoring procedures, will only be described for the proposed test. The qualitative
and quantitative analyses for the proposed pilot test will be discussed in terms of item statistics,
descriptive statistics, reliability statistics, and the standard error of measurement derived from the
reliability statistics. Additionally, a section of this paper will demonstrate ways in which
item performance can be used to evaluate the proposed pilot test's usefulness in terms of
reliability, construct-related evidence for validity, and practicality.
particular level than they had anticipated. The expense of financially supporting programs such
as Workplace English programs also has an impact on a company's fiscal budget. However,
companies may ultimately benefit from more linguistically competent
employees. Additionally, there may also be a positive impact on society, as learners could
increase their ability to communicate in a wide range of social contexts.
The usefulness of a test depends on a variety of criteria, one of which is authenticity or
the degree to which an assessment task corresponds to the target-language use domain (TLU).
Bachman and Palmer (2010) have formulated a framework which uses a set of characteristics to
adequately describe and correlate assessment tasks with real-world tasks. The characteristics of
real-world tasks outlined in the framework inform the conceptualization and design of
assessment tasks which simulate the skills and abilities required in the TLU domain. A more
in-depth description of the TLU domain is given in a subsequent section. Assessment tasks
which are based upon the TLU domain characteristics allow us to determine the ways in which,
and the extent to which, a learner's language ability is engaged. These characteristics help test
designers devise tasks which can be generalized to the setting beyond the test itself. See
Appendix A for a description of common TLU tasks.
Type of Test
Green (2013) defines a proficiency assessment as an instrument used to measure
whether or not a person's language ability adequately satisfies a
predetermined standard or need; it is distinguished from educational assessments (i.e.,
assessments that center on learning outcomes from a particular course of study). For this reason,
proficiency assessments are often used for the purpose of placement and/or gatekeeping
decisions such as immigration, educational, and employment opportunities. Adult proficiency
assessments, therefore, can have a profound effect and impact on quality of life, success level
achieved, and opportunities for advancement.
Interpretation of Scores
The interpretation of scores on the proposed test will be norm-referenced. The basic
purpose of any norm-referenced test is to spread students out on a continuum of language
abilities (Brown, 2003). This type of interpretation describes a learner's performance as a
position relative to a known group. In this test, scores are interpreted according to performance
standards and measurable outcomes. The scores on this test are interpreted as indicators of a
learner's proficiency level, such as beginner, intermediate, or advanced. In this way, the
emphasis is placed on discriminating among learners in order to measure language abilities upon
entrance into a language learning level. Miller, Linn, and Gronlund (2009) add that the goal of a
placement assessment is to determine the relative position of each student in the instructional
sequence as well as the mode of instruction that would most benefit them.
TLU Domain
A primary interest in language assessment is the ability to make generalized
interpretations about a test taker's language ability through their performance on test tasks
relative to the language required in similar tasks in the target language use domain. Bachman and
Palmer (2010) define a target language use (TLU) domain as a specific setting outside of the test
itself which requires learners to perform similar language use tasks. As the tasks on this
assessment come from a specific target language use domain (the workplace), they are called target
language use tasks and are the sources from which interpretations of the test taker's language
abilities are generalized. The tasks on this test represent one TLU domain, that of language for
obtaining employment. Each TLU task, however, can be used to make a generalization about
language ability within the TLU domain. For example, interpretations from a test taker's
performances on tasks 1, 2, and 3 can be used to make generalizations about their language
abilities within the real-world domain of applying and interviewing for employment within the TLU
domain of the workplace. See Appendix A for an extended description of the TLU domain
according to a framework of characteristics by Bachman and Palmer (2010).
Construct Definition
An important component for test validity is defining the construct(s) which assessment
tasks are measuring. As this test is a proficiency test, it has a theory-based construction
according to a theoretical framework of language ability. The constructs for the tasks on this test
are defined in terms of a framework for language ability proposed by Bachman and Palmer
(1996; 2010). Task 1 assesses language ability within the construct of grammatical knowledge.
Specifically, the constructs of task 1 are receptive, recall, and meaning knowledge of vocabulary.
Task 2 also assesses language ability within the construct of grammatical knowledge.
Specifically, the constructs of this task are syntactic knowledge, functional knowledge of
interpreting relationships between sentences and paragraphs, and vocabulary knowledge of
meaning in context; the strategy of skimming and scanning is assumed. Task 2 also measures
textual knowledge (cohesion and rhetorical organization), comprehension of meaning in context,
and recognition of information and vocabulary. The constructs of task 3 are productive and are
measured in terms of language ability relating to pragmatic knowledge: functional knowledge
(ideational functions and manipulative [interpersonal] functions) and sociolinguistic
knowledge of register.
Design of the Test: Table of Specifications
exercise in the form of a job application. An analysis using Compleat Lexical Tutor (Cobb,
2002; Heatley et al., 2002) reveals that the language used in the paragraphs is restricted to K-1,
K-2, and K-3 words. K-1 and K-2 words come from the General Service List (West, 1953,
cited in Bauman & Culligan, 1995) and represent the first and second 1000 most frequently used
words in English. Likewise, approximately 3% of the words are also represented on the AWL
(Coxhead, 2000, as cited in Cobb, n.d.). The rationale for constraining the vocabulary to these
levels was to make the paragraphs accessible to learners with beginning and intermediate
vocabulary levels. Of particular interest to test developers is the "type-token ratio," the number
of different words in a text (types) divided by the total number of running words (tokens). If
learners are being encouraged to increase the variety of words (breadth) in their vocabulary,
they should be reading texts with a higher type-token ratio. The following chart breaks down
the token % for each list:
Current profile (token %)

List        Token %    Cumul. token %
K-1         76.16       76.16
K-2         10.60       86.76
AWL          3.31       90.07
Off-list     9.93      100.00
The off-list tokens are the proper names included in the paragraphs.
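The type-token ratio described above can be sketched in a few lines of Python. The sample sentence below is invented for illustration; a real profile would use the actual test paragraphs and the frequency lists themselves (e.g., via Compleat Lexical Tutor).

```python
def type_token_ratio(text):
    # types = distinct word forms; tokens = running words
    tokens = text.lower().split()
    return len(set(tokens)) / len(tokens)

# invented sample sentence, not from the actual test paragraphs
sample = "the manager asked the new employee to complete the job application"
ttr = type_token_ratio(sample)   # 9 types over 11 tokens
```

A higher ratio indicates greater lexical variety, which is the "breadth" the paragraph above refers to.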
The following chart illustrates the lexical density for the paragraphs:

Freq. Level   Families (%)   Types (%)    Tokens (%)    Cumul. token %
K-1 Words     76 (69.72)     82 (65.08)   230 (76.16)   76.16
K-2 Words     24 (22.02)     25 (19.84)    32 (10.60)   86.76
AWL Words      9 (8.26)       9 (7.14)     10 (3.31)    90.07
The third task assesses the learner's receptive and productive knowledge by integrating
speaking and listening skills. In this task, the learners should be able to apply and synthesize
information from the previous tasks in their oral responses to six speaking prompts relevant to
the TLU domain of interviewing for a job. In total, the test includes 36 items, worth a total
of 36 points. Task 1 accounts for 15 matching items, task 2 includes 15 gap-filling,
information-transfer items where the input comes from four paragraphs, and task 3 includes 6
prompts where the oral response is graded according to a holistic rubric, with the task receiving
an overall single score on a 1 to 6 scale. As this assessment represents a proposed pilot test, the
time allotment, as well as the task order, may change according to observation and feedback
collected during a trialing stage. The decision to allow 60 minutes is based on an example
description of assessment parts and the time allowed for each part in Bachman and
Palmer (2010, p. 389), where the parts and number of tasks in each part were very similar to the
pilot test in this project.
The instructions for tasks 1 and 2 are briefly explained in writing before the task along
with the scoring method and recommended time allotment for the task. Task 1 and task 2 are
non-reciprocal tasks where there is a direct relationship between the information supplied in the
input and a successful response. The items in task 1 have a limited amount of input and can be
characterized as narrow in scope. Task 2, however, can be characterized as having a broader scope
than task 1, as the language user has to process a large amount of input in order to give a successful
response. Task 3 is a reciprocal task, as the test taker must engage in language with an
interlocutor for a successful response to each prompt. This task also has an indirect relationship
between the input and a successful response: language users cannot draw upon the language in
the input to give a successful response (see Appendix A for a TLU language task description).
The justification for the tasks (i.e., their sequence, the number of items, and the time
allotted) is that the test will allow decisions to be made for proper placement into levels/classes.
The sequence is set so that a learner is able to complete the easier, more receptive knowledge
tasks first in the event that they are not able to complete the productive tasks. The stakeholders
who are most directly affected by this test, then, are the test takers. According to Bachman and
Palmer (2010), the potential consequences of this assessment for the test takers are: 1) a negative
or stressful experience in preparing for and/or taking the test, 2) the negative or disappointing
feedback they may receive about their performance on the test, and 3) the decisions, i.e.,
placement, that may be made about them on the basis of their performance. A justification for
the content, learning objectives, and sequencing of the tasks, in terms of teachers and
educational institutions, is that the test is designed to help alleviate the potential for a very
mixed classroom in that it places learners according to their language ability as inferred from
their test performance. The underlying motivation for developing this test stems from my own
experience teaching a mixed-level workplace English classroom, which resulted in what I
perceived as a somewhat unfair instructional situation for all parties. The implication of that
situation, then, is that an assessment of proficiency
for the purpose of placement has the goal, as in all classroom testing and assessment, to improve
learning and instruction (Miller, Linn, & Gronlund, 2009). See Appendix C for a copy of the test
and scoring key.
Development of Rubrics
The rubric used in task 3, the speaking portion of the test, scores levels of language
ability, in this case speaking ability. The listening and speaking portion of the test (task 3) is
scored using a holistic rubric adapted from the National Reporting System (NRS) rubric used in
many other adult education performance assessments. The language in the criteria differs from
the language used to describe outcomes in the NRS in that it is specific to workplace English.
Also, score reporting numbers were modified in this rubric to correspond to the scoring in the
proposed test. The NRS was established by the Department of Education as a result of the
Workforce Investment Act of 1998, which requires that each state develop and implement a
comprehensive accountability system to interpret and demonstrate individual learner progress
and performance (Mislevy & Knowles, 2002). Similar to the criteria which Bachman and Palmer
(2010) use to define the usefulness of performance assessments, the NRS considers performance
assessments to be assessments which require test takers to demonstrate their language skills and
knowledge through tasks which closely resemble real-world situations or settings (the TLU
domain). Each level in the oral proficiency rubric, except the beginning level, contains three
criteria, of which the test taker must meet two in order to score into that language level. The
level number from the oral proficiency rubric is then added to the scores of tasks 1 and 2 to
indicate an overall level of proficiency. Each level has a descriptor of skills and language
ability. All components of language ability are considered together as a single unitary ability.
In this way, the emphasis is placed upon what the test taker can do instead of what they cannot
do. This type of rubric often has a higher inter-rater reliability because it is easier to train raters
to use it, and it can save time by minimizing the number of decisions a rater has to make
compared with analytic scales, which consider each language component separately. See
Appendix D for the rubrics.
Scoring Procedures
The test includes a total of 36 items worth 36 total points. Tasks 1 and 2 are scored
dichotomously (0 = incorrect, 1 = correct) for a total of 30 points. The speaking portion of the
assessment is scored using a 6-point holistic rubric. This rubric score is then added to the
number of correct items from tasks 1 and 2 to obtain an overall score. This overall score can be
interpreted as a language level according to an overall language level rubric which describes
functional and workplace skills and outcome measures for each level. Test takers will see their
score and be able to interpret it as a language level according to a chart included as part of the
scoring form. See Appendix E for an example of the score reporting form.
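The scoring procedure just described can be sketched as follows. The level cut-offs in `LEVELS` are invented placeholders for illustration only; the actual level chart appears in Appendix E and is not reproduced here.

```python
def overall_score(task1_correct, task2_correct, rubric_level):
    # tasks 1 and 2: one point per correct item (15 each); task 3: holistic 1-6
    return task1_correct + task2_correct + rubric_level

# hypothetical cut-off scores, NOT the chart from Appendix E
LEVELS = [(13, "beginner"), (25, "intermediate"), (36, "advanced")]

def placement_level(score):
    # return the label of the first level whose upper bound covers the score
    for upper_bound, label in LEVELS:
        if score <= upper_bound:
            return label

total = overall_score(12, 10, 4)   # 26 out of a possible 36
level = placement_level(total)
```

The point of the sketch is only that the holistic rubric level is added directly to the dichotomous item scores before the overall score is mapped to a level.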
Test Results
Item Statistics
There are many types of item analyses that can be done once a pilot test has been
administered and scored. For norm-referenced tests, these analyses are important for
determining which items are at an appropriate difficulty and best discriminate between high-
and low-achieving students, as well as for identifying faulty items. Miller, Linn, and Gronlund (2009) suggest using
item analyses to answer the following questions:
1. Did the item function as intended?
2. Was the test item of appropriate difficulty?
3. Was the test item free of irrelevant clues and other defects?
4. Were the distracters effective (in multiple choice items)?
(p. 351)
The second question in the list is especially relevant when analyzing items in a norm-referenced
assessment such as the pilot test developed for this project. One simple way to analyze items for
difficulty is to rank the scores in order from highest to lowest and compare the responses from
the highest scoring students to those of the lowest scoring students. For example, if this pilot test
were administered to 40 students, we would rank the tests in order from highest to lowest and
select the 10 tests with the highest scores and the 10 tests with the lowest scores for item
difficulty comparison. In a chart we could tabulate, for each item, the number of students in the
upper group and the lower group who selected each alternative. From this
information we could calculate the difficulty of each item by finding the percentage of students who got
the item correct. Then we could calculate the discriminating power of each item by finding the
difference between the number of students in the upper group and lower group who got the item
correct. If more students in the upper group get an item correct than in the lower group, the item is
discriminating positively because it is distinguishing between the high and low scorers. To find
the difficulty percentage for an item we would find the number of students in both
groups who got the item correct out of the total number of students. For example, if 9 students in
the upper group and 5 students in the lower group got the item correct out of 20 students, the
item difficulty would be 70%. If we analyzed the lower group's alternative selections for an item
and found that each alternative was selected by at least one student in the group, we could
conclude that the other alternatives (distracters) for that item are operating effectively. These
types of analyses could be done for task 1 and also for task 2, but the analysis would be more
subjective since no distracters are offered.
Another way to calculate item difficulty is using the formula:
P=100R/T
In this formula, P represents item difficulty, R equals the number of students who got the item
correct, and T equals the total number of students who tried the item. If we apply this to an item
in task 1, use the scores from the highest 5 tests and the lowest 5 tests, and suppose that 5
students in the upper group and 3 students in the lower group got the item correct, the item
difficulty would be:
P=100*8/10=80%
Similarly, we could use the item discriminating power formula to find the index of
discrimination for an item:
D=(RU-RL)/(.5T)
D =(5-3)/(5)=.40
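The two worked calculations above can be reproduced directly; the functions below are a minimal sketch of the P and D formulas as stated, using the same example figures.

```python
def item_difficulty(num_correct, num_tried):
    # P = 100R / T: percentage of students who answered the item correctly
    return 100 * num_correct / num_tried

def discrimination_index(correct_upper, correct_lower, total):
    # D = (RU - RL) / (.5T): difference between upper- and lower-group
    # correct counts, scaled by half the combined group size
    return (correct_upper - correct_lower) / (0.5 * total)

# 5 of the upper group and 3 of the lower group (10 students total) were correct
p = item_difficulty(5 + 3, 10)       # 80.0
d = discrimination_index(5, 3, 10)   # 0.4
```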
The index of discriminating power for this item is lower than the ideal index of .50, but anything
greater than or equal to .30 is thought to discriminate well on a norm-referenced assessment.
Miller, Linn, and Gronlund (2009, p. 362) state that when using norm-referenced classroom
tests, item analysis provides "a general appraisal of the functional effectiveness of the test
items, a means for detecting defects, and a method for identifying instructional weaknesses."
These types of analyses have a more limited applicability for performance-based assessments
such as a writing task. These analyses are very helpful for test development (pilot projects)
because they lead to the formation of an item bank consisting of strong items which can
discriminate between ability levels.
One other item statistic that can be used to evaluate items in task 1 is the item mean. The
mean, or central tendency, is the average student response to an item. It can be calculated by
adding the number of students who got the item correct (or the number of points earned by all
students on the item) and dividing that total by the total number of students.
We could also look at the frequency and distribution of responses for an item. The
frequency is the number of students who chose an alternative, and the distribution refers to the
percentage of students who chose each alternative. Task 1 has 15 items and 17 alternatives, so we
could look at how many students chose a specific alternative for item 1 and compare that to the
percentage of the group overall who chose that alternative. Incorrect alternatives which are
frequently chosen may indicate common misunderstandings in a group of students, which could
mean the item is faulty or ambiguous.
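A frequency and distribution tally of this kind can be sketched with a Counter. The response letters below are invented example data, not actual student responses.

```python
from collections import Counter

# invented: the alternative each of 20 students chose for one matching item
responses = ["c", "c", "a", "c", "b", "c", "c", "a", "c", "c",
             "b", "c", "c", "a", "c", "c", "d", "c", "c", "b"]

frequency = Counter(responses)                   # counts per alternative
distribution = {alt: count / len(responses)      # proportion choosing each one
                for alt, count in frequency.items()}
```

Alternatives that attract almost no responses, or wrong alternatives that attract many, are the ones worth reviewing.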
Descriptive Statistics
Descriptive statistics are used to investigate relationships between variables in a sample
of test takers and between two or more groups of test takers. Descriptive statistical data helps to
describe the distributions of data in order to decide whether or not significant relationships or
differences exist between test takers and groups of test takers. Describing the data also lets us
look at performances as percentages. For example, for tasks 1 and 2, test takers would have to
receive a raw score of 30 to receive 100% on the two tasks. If a test taker had a raw score of
27 across the two sections, their percentage score would be 90%. These two types of scores are
examples of interval data, which help us represent how much or how little of the skill measured has
been demonstrated by the test taker (Flahive, 2014). The scores could also be ranked as a means
of comparison by listing them from highest to lowest; however, ranking the scores
(ordinally) does not always show the interval differences, so using interval data gives a more
precise picture of the data. For example, if we show all of the interval differences for task 1:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
We can find the central tendencies: mean, median, and mode of the data. The mean is found by
dividing the total of the scores or data points by the number of cases in the set. The median
is the point in the data above and below which half of the cases fall. The median
is a good central tendency statistic when a distribution has outliers because it is less sensitive to
them than the mean. The mode is the most common value in a distribution of data. The range
in the data can be calculated by subtracting the lowest number in a set from the highest number
and then adding one.
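Applied to the seven-score example used in the chart that follows, these central tendency statistics can be computed with the standard library:

```python
from statistics import mean, median, mode

scores = [23, 17, 21, 28, 27, 17, 26]

m = mean(scores)                             # 159 / 7, about 22.7
md = median(scores)                          # middle value of the sorted scores: 23
mo = mode(scores)                            # most common value: 17
score_range = max(scores) - min(scores) + 1  # 28 - 17 + 1 = 12
```

Note that the median (23) and mean (22.7) are close here because the set has no extreme outliers.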
Score   Mean    Difference Score   Difference²
23      22.7     0.3                 0.09
17      22.7    -5.7                32.49
21      22.7    -1.7                 2.89
28      22.7     5.3                28.09
27      22.7     4.3                18.49
17      22.7    -5.7                32.49
26      22.7     3.3                10.89

Sum = 125.43
Variance = 125.43/6 = 20.90
SD = 4.572
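The variance and standard deviation computed step by step above can be checked with the statistics module, which likewise divides the sum of squared deviations by n - 1 for a sample:

```python
from statistics import stdev, variance

scores = [23, 17, 21, 28, 27, 17, 26]

var = variance(scores)   # sum of squared deviations / (n - 1), about 20.90
sd = stdev(scores)       # square root of the variance, about 4.57
```

The library result differs from the hand calculation only in rounding, since the chart rounds the mean to 22.7 before squaring the deviations.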
Reliability and Standard Error of Measurement
Reliability in a test refers to the consistency of scores across tasks, forms, raters, and
time. Reliability is considered a statistical argument in that it is a quantitative measurement and
gives no qualitative information. Reliability is a characteristic of the results, not of the test. The
coefficient alpha is a reliability measure of internal consistency among items. In this procedure,
the test is given once and Cronbach's alpha formula is applied to the scores:

alpha = (N x c-bar) / (v-bar + (N - 1) x c-bar)

In this formula, N is the total number of items, c-bar is the average inter-item covariance, and
v-bar is the average item variance. Basically, this formula shows how closely related a set of items
are as a whole group by giving a reliability estimate between -1 and 1; the closer to 1, the more
reliable. The split-half method is also a measure of internal consistency. In this method the test
is given once and two equivalent halves are scored (odd items and even items) and the
Spearman-Brown formula is applied to correct the correlation between the two halves to fit the
whole test. Both formulas can be applied to a set of scores using SPSS software.
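As a sketch of the coefficient alpha computation described above, the pure-Python function below applies the formula alpha = (N x c-bar) / (v-bar + (N - 1) x c-bar) to a score matrix given as one list of scores per item. It is illustrative only, not a substitute for the SPSS procedure.

```python
from itertools import combinations
from statistics import mean, variance

def covariance(x, y):
    # sample covariance between two items' score lists
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def cronbach_alpha(items):
    # items: one list per item, holding that item's scores across all test takers
    n = len(items)
    v_bar = mean(variance(item) for item in items)      # average item variance
    c_bar = mean(covariance(a, b)                       # average inter-item covariance
                 for a, b in combinations(items, 2))
    return (n * c_bar) / (v_bar + (n - 1) * c_bar)
```

As a sanity check, three items with identical score patterns (perfect internal consistency) yield an alpha of 1.0.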
On more subjective tests, such as task 3 in the pilot test, inter-rater reliability becomes an
important quantitative measurement of reliability. Inter-rater reliability measures the
correlation between raters' rankings and scorings over time. It is used to assess each rater's
consistency over time, to show the degree to which different raters give consistent estimates of
the same phenomenon, and to show agreement between the scores assigned by two different
raters. Training raters and using a clear, concise, and reasonable rating scale can increase
inter-rater reliability. The Spearman-Brown formula can be applied to measure inter-rater reliability.
The Pearson formula is the most common formula applied to compute inter-rater reliability on
speaking assessment scores. Both of these formulas are included in Microsoft Excel and SPSS.
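A minimal sketch of the Pearson correlation between two raters' scores follows; the six holistic scores are invented example data, not results from an administration.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    # Pearson product-moment correlation between two score lists
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

rater_a = [4, 3, 5, 2, 4, 6]   # invented holistic scores from rater A
rater_b = [4, 3, 5, 2, 5, 6]   # invented holistic scores from rater B
r = pearson_r(rater_a, rater_b)
```

A value near 1 indicates the two raters rank and score the test takers in close agreement; well-trained raters using a clear rubric should approach that.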
The standard error of measurement (SEM) is directly related to the reliability of a test. It
is a measurement of the amount of variability in a student's performance due to random
measurement error. Because it is not possible to administer an infinite number of parallel forms,
the SEM is an estimate of the variation that should be considered when interpreting scores. The
SEM defines a window of performance within which the true score lies; the true score cannot be
calculated exactly because every assessment has some error. The SEM helps determine the
boundaries of that error. Whereas the reliability of a test is expressed between -1 and 1, the
SEM is described on the same scale as the test scores. A higher SEM indicates a lower
reliability. The SEM depends on the assessment itself and not on the test taker. The formula
for calculating the SEM is the SD times the square root of 1 minus the reliability.
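The SEM formula just stated can be sketched directly; the SD and reliability values below are invented example figures, not statistics from an administered test.

```python
from math import sqrt

def standard_error_of_measurement(sd, reliability):
    # SEM = SD * sqrt(1 - reliability)
    return sd * sqrt(1 - reliability)

# invented example values: SD of 4.57 and a reliability estimate of .85
sem = standard_error_of_measurement(4.57, 0.85)   # about 1.77
```

Read as a window: a test taker's true score would be expected to lie within roughly one SEM (here about 1.77 points) of the observed score about two-thirds of the time.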
Discussion
Critique of Item Performance
As discussed above, the ways to critique an item's performance are by determining the
item's difficulty and its discriminating power. It is also a good idea to have another professional,
preferably someone teaching the same class, look over the items for any ambiguities or defects.
As I was not able to pilot my test, it is impossible to critique the items here; however, I think
task 2 may produce interesting results, as it is a gap-filling task scored on a binary scale. I think
this task might be better if turned into a more objective multiple-choice task type.
Evaluation of Test Usefulness
A test is useful if it has utility for a given purpose. I believe this test includes tasks which
are familiar to the test takers as their purpose for taking a Workplace English class is to help
them with the language skills needed to secure and maintain employment. Therefore, I believe
that having the students do a mock interview represents a very authentic and interactive
performance task. I think the test would not cause washback for either the
employees or the employers, although there would be a financial investment for the employer.
The practicality of the assessment depends on the investment that an employer wants to make in
their employees. If they only hired one instructor, then this pilot test would not be practical, as
there would be no need to place students into levels.
Reliability
The reliability of this test would depend heavily on the inter-rater reliability of task 3. Some
strategies to increase the reliability of the pilot test would be to train raters well and to correlate
the scoring scale to a benchmark. Also, task 2 could be made into a multiple-choice task type,
which would make the items more objective.
Construct Related Evidence for Validity
I feel the test has very good face validity, as the task characteristics are very similar to the
TLU domain, but without a sample it is difficult to assess whether or not there is construct-related
evidence for validity. One piece of evidence for construct validity would be if the higher
language ability students scored high on the test and the lower language ability students scored
low. Another scenario that would provide evidence for construct validity is if students who had
filled out job applications or done job interviews (in the TLU domain, i.e., the real world) before
taking the assessment placed higher than students who had not. This finding would provide
evidence that the task characteristics clearly represent the construct of interest, i.e., the real
world. The tasks require language which is representative of language in the same types of tasks
in the TLU domain, making the tasks very interactive and authentic.
Consequential Evidence for Validity
I feel this test would have a positive impact on test takers because it would place them in
the correct level within a Workplace English program; likewise, it could also be used as an
indicator of language ability by an employer. Therefore, I think there is evidence of validity in
that the assessment would be useful to all individuals involved.
Reflection on Personal Significance of Test
This test proposal project was a challenge. However, I think everything that was
included in this project was relevant in helping us as future teachers see the importance of
developing test items and administering tests which are fair, reliable and valid. The descriptive
statistics used for item difficulty and discrimination are easy measures that can be used to glean a
lot of information from test items. One thing I will take away from this project pertains to
writing instructions for items and developing rubrics for outcome measures in that fairness is
paramount. I also realize how difficult it is to write questions that are interactive, authentic, and
valid.
References
Bachman, L. F., & Palmer, A. S. (2010). Language assessment in practice: Developing language
assessments and justifying their use in the real world. Oxford: Oxford University Press.
Brown, J. D. (2003). Norm-referenced item analysis (item facility and item discrimination).
Shiken: JALT Testing & Evaluation SIG Newsletter, 7(2), 16-19.
Chapelle, C. A., Enright, M. K., & Jamieson, J. M. (2008). Test score interpretation and use. In
Building a validity argument for the Test of English as a Foreign Language (pp. 1-25).
Appendix A
TLU Domain Description

Characteristics of the setting
  physical characteristics: Business or institution
  participants: English language learners
  time of task: Morning, afternoon, or evening

Characteristics of the test rubric
  instructions
    language: English
    channel: Written and spoken
    specification of procedures and tasks: Briefly explained in speaking and writing
  structure: 15 matching items, 15 information-transfer items, 6 speaking prompts
  time allotment: 60 minutes
  scoring method
    criteria for correctness: 0 = wrong, 1 = correct
    procedures for scoring the response: Each correct answer = 1 point
    explicitness of criteria and procedures: Included in instructions

Characteristics of the input
  format (channel, form, language, length, type, degree of speededness, vehicle): n/a
  language of the input (organizational characteristics: grammatical, textual;
    pragmatic characteristics: functional, sociolinguistic; topical characteristics): n/a

Characteristics of the expected response
  format (channel, form, language, length, type, degree of speededness): n/a
  language of the expected response (organizational characteristics: grammatical, textual;
    pragmatic characteristics: functional, sociolinguistic; topical characteristics): n/a

Relationship between input and response
  reactivity: Non-reciprocal and reciprocal
  scope of relationship: n/a
  directness of the relationship: n/a
Appendix B
Table of Specifications

Learning Objectives /            Knowledge      Interpret and     Analyze, Apply,   # items   % items
Text/Task                        (recall)       Transfer Info     and Synthesize
Workplace Vocabulary             15: 1.1-1.15   0                 0                 15        42
Filling out a job application    0              15: 2.1-2.15      0                 15        42
Interviewing for a job           0              0                 6: 3.1-3.6        6         16
# items                          15             15                6                 36
% items                          42             42                16                100

Appendix C
Workplace English Test
Directions: Next to each vocabulary item listed in Column A, write the letter of the best
definition for the item from Column B. Each definition in Column B may be used once, more
than once, or not at all. Each item is worth one point. This section should take you no longer
than 20 minutes. An example has been done for you.
Column A
___E__ 0. Surname
______ 1. Position desired
______ 4. Duties
______ 5. Skills
______ 6. Qualifications
______ 7. Salary
______ 8. Wage
______ 9. References
______ 10. Applicant
______ 12. Relocate
______ 13. N/A

Column B
A. Place you last worked
Mary Ortez is looking for a job as a head cook. She can start immediately. Her last job
was as a head cook in a kitchen at Freedom College in Arvada, Colorado. She worked in the
college kitchen for 4 years. Her duties included: ordering food and supplies each week, cleaning
the kitchen, and supervising the employees during her shift. Her position as a head cook had a
lot of responsibilities which required her to be flexible and dependable.
Before being a head cook, Mary worked as a baker for 2 years at Delicious Bakery in
Denver, Colorado. The bakery specialized in wedding cakes. In that position, Mary had to be
polite, organized, and creative to satisfy the customers. She also had to be very efficient in order
to make and deliver the wedding cakes on time.
In her first job, shortly after she moved to Colorado from Bogota, Colombia, Mary was a
dishwasher at Hungry's Family Restaurant, located in Colorado Springs, Colorado. Shortly after
being hired, she was promoted in the family restaurant to a line cook position. In this position,
Mary was in charge of making the food orders quickly and accurately. At the family restaurant,
Mary worked the graveyard shift. Mary worked for a total of 5 years at Hungry's.
Although Mary moved to the United States 12 years ago from Colombia, she mostly
speaks Spanish at home with her family, but she can speak English well. Mary graduated from
high school in Colombia and has attended 2 years of General English as a second language
training. She has been a citizen of the United States for 8 years. Mary has a valid driver's
license, but she prefers to take the bus to work if possible.
Job Application

Position Desired: 1. __________

Name/Address:
3. Last: __________   4. First: __________
Street: 2550 Central Ave. West   Apt #: 25 D
City, State: Denver, CO 80511
Phone: 303-897-4562

Length of      Name of Company         Job Title     Duties
Employment
7.             Freedom College,        8.            9.
               Arvada, Colorado
11.            10.                     Baker         Designed cakes
                                                     Baked cakes
                                                     Delivered cakes
12.            Hungry's Family         13.           14.
               Restaurant,
               Colorado Springs, CO

…training: 15. __________

Task 3: Mock Interview (6 points)
Instructor reads the directions to the participant before beginning the prompts.
Directions: In this task you will be interviewing for a management position at a restaurant.
Listen to the interview prompts and respond to the best of your ability according to the
information in the prompt. There will be two warm-up questions and 6 interview questions.
Suggested warm-up questions:
What is your name?
Where were you born?
How long have you lived here?
Interview prompts:
1. What is your current job and what position do you work in at your job?
2. What are your current duties in your job or in your last job?
3. What do you most enjoy/least enjoy about your current position?
4. What are your strengths as an employee?
5. What qualities do you think are important for a manager to have?
6. Why are you the right person for this job?
*****Thank you for your answers; you did great!*****
Scoring Key:
Part 1: Each item is worth one point, for a total of 15 points.
K, I, A, N, B, L, F, C, P, G, O, M, Q, D, J
Part 2: Each blank is worth one point, for a total of 15 points. Test takers are expected
to transfer the information directly from the paragraphs.
1. Head Cook
2. Immediately
3. Ortez
4. Mary
5. Yes
6. Yes
7. 4 years
8. Head Cook
9. Ordering food and supplies
Cleaning the kitchen
Supervising the employees
10. Delicious Bakery, Denver, CO
11. 2 years
12. 5 years
13. Line cook (will also accept dishwasher)
14. Making food orders quickly
15. 2 years of General English as a second language
Appendix D
Rubrics
Scoring Rubric for Task 3: Listening and Speaking

OUTCOME MEASURES DEFINITIONS: LISTENING AND SPEAKING

Score interpretation: one point per criterion met (e.g., a response that meets two
criteria scores 2).

Criteria/descriptors for scoring:
1. … vocabulary … contexts.
2. Shows ability to go beyond learned patterns to construct new and more complex
sentences. Shows control of grammar and pronunciation.
3. Can clarify own or others' meaning through rewording.

Interpretation (including listening and speaking score):

Total score   Level
0-5           Beginning ESL Workplace Literacy
6-11          Low Beginning ESL Workplace Literacy
12-17         High Beginning ESL Workplace Literacy
18-23         Low Intermediate ESL Workplace Literacy
24-29         High Intermediate ESL Workplace Literacy
30-36         Advanced ESL Workplace Literacy
Appendix E
Score Reporting Form

Name: ___________________________________

Score     Class/Level
0-5       Beginning
6-11      Low Beginning
12-17     High Beginning
18-23     Low Intermediate
24-29     High Intermediate
30-36     Advanced