
Scenarios:

***There is no right or wrong answer. Provide an argument and make an informed decision.
1. Is it an assessment or an evaluation?
Think about significance = evaluation
Your assessment strategy feeds into your evaluation
2. Consider the 3 steps in the decision-making process
       Conditions for the collected info (physical and emotional context, letting
       students show their typical behaviour, quality of instruction)
          o Which of the 3 primary methods is used to collect data (paper
            and pencil, observation techniques, asking oral questions)
       Quality of the instruments
       Objectivity of the info, including unbiased scoring
3. Is it a performance (actively engaged in the task) or authentic (replicates real-world
   contexts) assessment?
4. Is it a standard assessment (administered under identical conditions and scoring, for
   fairness) or a non-standard one (created by the teacher and not used to compare students)?
5. Is it norm-referenced (comes from within the group, your own criteria in the
   classroom, bell curve) or criterion-referenced (designed outside of the group, e.g.
   criteria made up by the ministry)?
6. Is it a pluralistic assessment, i.e. responsive to cultural diversity in the
   classroom?
7. Does it demonstrate a fixed or growth mindset?
       This is why one assessment is not reliable for all.
       A fixed mindset (e.g. lack of motivation or a poor attitude) can represent a random
       error. Norm-referenced assessment can lead to a fixed mindset where teachers teach
       to the test rather than to skills and higher-order thinking. Teachers are not
       encouraging students to learn beyond what is needed. This also affects teachers,
       e.g. low passion towards the job and pedagogy when teaching in a low-achieving school.
       Praising students for their intelligence puts students into a fixed mindset.
       Growth mindset = encourages teamwork and collaboration rather than an
       individualist mindset.
8. What is the washback effect?
       The intended and unintended impacts of assessment strategies and tools.
       What does this mean for the organization? It is the effect of the test
       on teachers' and students' actions. Washback can be positive (expected) or
       negative (unexpected, harmful).
       4 aspects
          o Practicality
                Is what you are offering practical? Feasibility of the test given
                materials, funding, time, expertise, and staff.
          o Reliability
                Will performance be the same if I test it again
                (consistency)?
                Consider the type of test, the reliability estimate reported,
                and the context in which the test will be used.
                Evaluating reliability coefficients: which types of reliability were
                used (sources of random error)? How were the reliability studies
                conducted (conditions under which the data were obtained, length of time
                that passed between administrations of the test)? What were the
                characteristics of the sample group? (A sketch of how such
                coefficients are computed follows this list.)
          o Validity
                Measures what it is supposed to measure.
                Is there a psychological reality (the norms of society)? E.g.
                thousands of years ago, people thought the sun rotated
                around the earth, and that was their psychological reality.
                A construct that can be tested.
                What level of validity coefficient would you use for the
                scenario?
                Construct validity (not static - changes from time to time; a norm) or
                face validity (first impressions; shock factor; e.g. the
                interview is supposed to be informal but when you arrive it
                is not)?
                3 types of methods for conducting validity studies
                a. Criterion-related - a statistical relationship between test
                   performance and job performance, e.g. people who
                   score high on the test do better at the job (concurrent or
                   predictive)
                b. Content-related - the content represents important job-related
                   behaviours. Test items should be relevant to,
                   and directly measure, important requirements and
                   qualifications for the job.
                c. Construct-related - the test measures the construct or
                   characteristic it claims to measure, and this characteristic is
                   important for successful performance
          o Impact
                The positive or negative effects of testing.
                A tool might be valid, reliable, and practical, but still not a
                good tool because of its impact.
                A question of ethics. The "so what?" of what you do.
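
The reliability and validity coefficients referred to above are, in the simplest cases, correlations between paired scores. The sketch below is only illustrative: the student scores, job ratings, and sample size are all hypothetical.

import numpy as np

# Hypothetical scores for 8 students (values invented for illustration).
test_time_1 = np.array([72, 85, 60, 90, 78, 66, 88, 70])            # first administration
test_time_2 = np.array([75, 83, 62, 92, 74, 70, 86, 73])            # retest of the same students
job_ratings = np.array([3.1, 4.2, 2.8, 4.5, 3.6, 3.0, 4.4, 3.2])    # external criterion measure

# Test-retest reliability coefficient: consistency across two administrations.
reliability = np.corrcoef(test_time_1, test_time_2)[0, 1]

# Criterion-related validity coefficient: relation between test and criterion.
validity = np.corrcoef(test_time_1, job_ratings)[0, 1]

print(f"test-retest reliability r = {reliability:.2f}")
print(f"criterion-related validity r = {validity:.2f}")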

9. Consider the types of reliability
       Test-retest
          o Through the passage of time.
          o Memory is an issue - when students retake the test they might
            remember more and thus get a higher grade than they should, or too
            much time passes in between tests.
          o Variance may happen
          o Systematic error due to learning or forgetting
       Parallel forms
          o A test that mirrors the first test
          o Assumes the validity of the first test
          o Challenging to write another test that has the same form
          o Still cannot completely eliminate the problem of memory
       Inter-rater reliability
          o Usually for open-ended questions
          o Indicates how consistent test scores are likely to be if the test is
            scored by more than one rater
       Internal consistency
          o The extent to which items on a test measure the same thing (see the
            sketch after this list)
          o Problem: the test taker may be tired by item #25. To combat this order
            problem (which is a systematic error), half of the class takes the test in
            one item order and the other half in a second order
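
One common estimate of internal consistency is Cronbach's alpha. A minimal sketch follows, assuming a small hypothetical matrix of item scores (rows are students, columns are items); the numbers are made up for illustration.

import numpy as np

# Hypothetical score matrix: 5 students (rows) x 4 items (columns).
scores = np.array([
    [4, 5, 4, 3],
    [2, 3, 2, 2],
    [5, 5, 4, 5],
    [3, 4, 3, 3],
    [1, 2, 2, 1],
], dtype=float)

k = scores.shape[1]                          # number of items
item_vars = scores.var(axis=0, ddof=1)       # variance of each item across students
total_var = scores.sum(axis=1).var(ddof=1)   # variance of the total scores

# Cronbach's alpha: the extent to which the items "hang together".
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(f"Cronbach's alpha = {alpha:.2f}")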
10. How do the aspects you can control compensate for the ones you cannot?
11. What are your limitations?
12. Formative or Summative assessment?
13. Feedback?
14. Do the outcomes contain the content and the process?
15. What types of errors are there?
       Systematic - the only type of error we can prevent. E.g. race, gender, etc. -
       topics that refugees might not be able to understand.
       Random - *it is better to have this. E.g. environment, assessment instrument
       form, the test taker's temporary psychological or physical state, multiple raters
       (subjectivity of the rater), fatigue
16. What is the purpose, to control and command or to be transformative?
Definitions:
Assessment: Collecting data and weighing opinions in order to make decisions
Classical Measurement Theory or true score theory
- control errors (reduce systematic errors) through reliability
- trying to control things we know and things we do not know
- reduce measurement error by
       o pilot testing
       o training
       o double-checking data
       o multiple measures of the same construct (using different tools, e.g. not all
         oral questions)
- reliability is dependent on the sample (e.g. for test-retest it is dependent upon how
  the student performs)
- test- and sample-dependent
- problem: it does not differentiate between test takers' abilities
- problem: it is based on validity and reliability
- classical analyses are done on the whole rather than on individuals
- although item statistics can be generated, they apply to the whole group
- in CTT we assume that error is normally distributed, uncorrelated with the true
  score, and has a mean of zero (see the sketch below)
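
A minimal simulation of the true-score model described above (observed score = true score + random error). The chosen standard deviations are arbitrary; the point is that reliability can be read as the share of observed-score variance that is true-score variance.

import numpy as np

rng = np.random.default_rng(0)
n_students = 1000

true_scores = rng.normal(70, 10, n_students)   # T: the trait we care about
random_error = rng.normal(0, 5, n_students)    # E: mean zero, uncorrelated with T
observed = true_scores + random_error          # X = T + E

# Reliability under CTT: proportion of observed variance that is true variance.
reliability = true_scores.var() / observed.var()
print(f"estimated reliability = {reliability:.2f}")   # roughly 10**2 / (10**2 + 5**2) = 0.8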

Evaluation:
       Making judgments about merit, worth, and significance. Is it worth it?
       When you evaluate, you go beyond assessment. Are we achieving what
       we should achieve? E.g. is the school meeting the expectations?

Factor Analysis
- a statistical model that tries to answer: what does income/diversity/socioeconomic status
  (or any general term) mean?
- these questions are not as simple as they seem
- looks at a concept to see what factors come together to make that global concept, so
  you can then write 30 questions and see which ones are close together (see the sketch below)
- regression vs factor analysis
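
A minimal sketch of the idea using scikit-learn's FactorAnalysis on simulated questionnaire data. The two latent factors and the loadings are invented for illustration; items driven by the same underlying factor end up grouping on the same component.

import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(1)
n = 300
ses = rng.normal(size=n)          # latent "socioeconomic status" factor (invented)
engagement = rng.normal(size=n)   # latent "engagement" factor (invented)

# Six observed questionnaire items, each mostly driven by one latent factor.
items = np.column_stack([
    ses + 0.3 * rng.normal(size=n),
    ses + 0.3 * rng.normal(size=n),
    ses + 0.3 * rng.normal(size=n),
    engagement + 0.3 * rng.normal(size=n),
    engagement + 0.3 * rng.normal(size=n),
    engagement + 0.3 * rng.normal(size=n),
])

fa = FactorAnalysis(n_components=2, random_state=0).fit(items)
# Loadings: items 1-3 group on one factor, items 4-6 on the other.
print(np.round(fa.components_, 2))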
IRT
- how different people react to different items
- can show questions that need to be dropped from the test or rearranged
- item difficulty and learner ability
- item specific
- the probability of a learner with a specific ability answering an item (this is the value
  to large-scale assessment)
- juxtaposition of learner ability/proficiency and item difficulty (see the sketch below)
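
A minimal sketch of the simplest IRT model (the one-parameter / Rasch model), with hypothetical ability and difficulty values: the probability of a correct answer depends only on the gap between learner ability and item difficulty.

import numpy as np

def p_correct(theta, b):
    """Rasch (1-parameter IRT) model: P(correct) = 1 / (1 + exp(-(theta - b)))."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

abilities = np.array([-1.0, 0.0, 1.0])       # three learners, weak to strong (hypothetical)
difficulties = np.array([-0.5, 0.5, 1.5])    # three items, easy to hard (hypothetical)

# Probability matrix: rows = learners, columns = items.
probs = p_correct(abilities[:, None], difficulties[None, :])
print(np.round(probs, 2))
# Ability and difficulty sit on the same scale, so a learner's chance on an
# item depends only on the gap between the two.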

Latent trait model
- one of these is IRT
- aims to look at the underlying traits through specific items rather than the test as a
  whole
- sample-free assessment
- sometimes called modern psychometrics because in large-scale assessments it has
  almost entirely replaced CTT
Measurement Theory:
       Observed score = true score plus random error
       At the heart, there is always error

Paradigm = use this word to refer to your philosophy, how you do things, the tools you
choose. Also try not to evaluate certain philosophies. You can evaluate them, but you
cannot really say that one is better than the other.
Pragmatist = a person who says "it depends"
Test: A technique or tool that helps obtain (obtain indicates that it is a test) info about
the characteristics of a person. A test is not an assessment!
