LET Review Assessment of Learning

LET Review
Professional
Education
ASSESSMENT OF LEARNING
Mr. Angelo Unay
*BEED, PNU-Manila (Cum Laude)
*PGDE-Math & English NTU-NIE, Singapore
BASIC CONCEPTS
Test
An instrument designed to measure any quality,
ability, skill or knowledge.
Comprised of test items of the area it is designed
to measure.
Measurement
A process of quantifying the degree to which
someone/something possesses a given trait (i.e.
quality, characteristics or features)
A process by which traits, characteristics and
behaviors are differentiated.
BASIC CONCEPTS
Assessment
A
process of gathering and

organizing data into an interpretable
form to have basis for decision- making.
It is a prerequisite to evaluation. It
provides the information which enables
evaluation to take place.
BASIC CONCEPTS
Evaluation
A process of systematic analysis of
both qualitative and quantitative data in

order to make sound judgment or
decision.
It involves judgment about the
desirability of changes in students.
MODES OF ASSESSMENT
MODE
TRADITIONAL
DESCRIPTION
EXAMPLES
ADVANTAGES
DISADVANTAGES
The objective
paper-andpen test which
usually
assesses lowlevel thinking
skills
Standardized
Tests
Teacher-made
Tests
Scoring is
objective
Administration
is easy
because
students can
take the test at
the same time
Preparation of
instrument is
time-consuming
Prone to
cheating
Question:
Which is an advantage of teacher-made tests
over those of standardized tests?
Teacher-made tests are:
a. highly reliable
b. better adapted to the needs of the
pupils
c. more objectively scored
d. highly valid
MODES OF ASSESSMENT
MODE
PERFORMANCE
DESCRIPTION
EXAMPLES
A mode of
assessment that
requires actual
demonstration
of skills or
creation of
products of
learning
Practical Test
Oral and
Aural Tests
Projects
ADVANTAGES
Preparation of
the instrument is
relatively easy
Measures
behaviours that
cannot be
deceived
DISADVANTAGES
Scoring tends to
be subjective
without rubrics
Administration
is time
consuming
MODES OF ASSESSMENT
MODE
PORTFOLIO
DESCRIPTION
EXAMPLES
ADVANTAGES
DISADVANTAGES
A process of
gathering
multiple
indicators of
student
progress to
support course
goals in dynamic,
ongoing and
collaborative
process
Working
Portfolios
Show
Portfolios
Documentary
Portfolios
Measures
students
growth and
development
Intelligencefair
Development is
time-consuming
Rating tends to
be subjective
without rubrics
Question:
Which is the least authentic mode of
assessment?
a. Paper-and-pencil test in vocabulary
b. Oral performance to assess students
spoken communication skills
c. Experiments in science to assess skill
in the use of scientific methods
d. Artistic production for music or art
subject
A COMPARISON OF THE FOUR EVALUATION

PROCEDURES
Placement
Evaluation
done before instruction
determines mastery
of prerequisite
skills
not graded
Summative
Evaluation
done after
instruction
certifies
mastery of the
intended
learning
outcomes
graded
determines the extent of what the pupils have achieved or

mastered in the objectives of the intended instruction
determine the students strengths and weaknesses
place the students in specific learning groups to facilitate
teaching and learning
serve as a pretest for the next unit
serve as basis in planning for a relevant instruction
A COMPARISON OF THE FOUR EVALUATION

PROCEDURES
Formative Evaluation
Diagnostic Evaluation
reinforces successful
learning
determine recurring or
persistent difficulties
provides continuous
feedback to both
students and teachers
concerning learning
success and failures
searches for the underlying
not graded
administered during instruction
designed to formulate a plan for remedial instruction
modify the teaching and learning process
not graded
causes of these problems

that do not respond to first
aid treatment
helps formulate a plan for a

detailed remedial instruction
PRINCIPLES OF HIGH QUALITY

ASSESSMENT
1.Clarity of Learning Targets
Clear and appropriate learning targets
include (1) what students know and can do
and (2) the criteria for judging student
performance.
2. Appropriateness of Assessment Methods
The method of assessment to be used
should match the learning targets.

ASSESSMENT
3. Validity
This refers to the degree to which a scorebased
inference
is
appropriate,
reasonable, and useful.
4. Reliability
This refers to the degree of consistency
when several items in a test measure the
same thing, and stability when the same
measures are given across time.

ASSESSMENT
5. Fairness
Fair assessment is unbiased and provides
students with opportunities to demonstrate what
they have learned.
6. Positive Consequences
The overall quality of assessment is enhanced
when it has a positive effect on student
motivation and study habits. For the teachers,
high-quality assessments lead to better
information and decision-making about students.

ASSESSMENT
7. Practicality and efficiency
Assessments
should
consider
the
teachers familiarity with the method, the
time
required,
the
complexity
of
administration, the ease of scoring and
interpretation, and cost.
TAXONOMY OF EDUCATIONAL
OBJECTIVES
OBJECTIVES
COGNITIVE DOMAIN
(Bloom, 1956)
KNOWLEDGE
Remembering of previously learned material
Recall of a wide range of material, but all
that is required is the bringing to mind of the
appropriate information
Represents the lowest level of learning
outcomes in the cognitive domain
OBJECTIVES
COGNITIVE DOMAIN
(Bloom, 1956)
COMPREHENSION
Ability to grasp the meaning of material
Shown by translating material from one form
to another, by interpreting material, and by
estimating future trends
OBJECTIVES
COGNITIVE DOMAIN
(Bloom, 1956)
APPLICATION
Ability to use learned material in new and
concrete situations
Application of rules, methods, concepts,
principles, laws, and theories
OBJECTIVES
COGNITIVE DOMAIN
(Bloom, 1956)
ANALYSIS
Ability to break down material into its
component parts so that its organizational
structure may be understood
Include identification of parts, analysis of the
relationships between parts, and recognition
of the organizational principles involved
OBJECTIVES
COGNITIVE DOMAIN
(Bloom, 1956)
SYNTHESIS
Ability to put parts together to form a new
whole
Stress creative behaviors, with major
emphasis on the formulation of new patterns
or structures
OBJECTIVES
COGNITIVE DOMAIN
(Bloom, 1956)
EVALUATION
Ability to judge the value of material for a
given purpose
Judgments are to be based on definite
criteria [internal (organization) or external
(relevance to the purpose)]
OBJECTIVES
COGNITIVE DOMAIN
(Bloom, 1956)
READING
K: Knows vocabulary
U: Reads with comprehension
Ap: Reads to obtain information to solve a problem
An: Analyzes text and outlines arguments
S: Integrates the main ideas across two or more passages
E: Critiques the conclusions in a text and offers alternatives
OBJECTIVES
COGNITIVE DOMAIN
(Bloom, 1956)
MATHEMATICS
K: Knows the number system and basic operations
U: Understands math concepts and processes
Ap: Uses mathematics to solve problems
An: Shows how to solve multistep problems
S: Derives proofs
E: Critiques proofs in geometry
OBJECTIVES
COGNITIVE DOMAIN
(Bloom, 1956)
SCIENCE
K:
Knows terms and facts
U:
Understands scientific principles
Ap: Applies principles to new situations
An: Analyzes chemical reactions
S:
Conducts and reports experiments
E:
Critiques scientific reports
Question:
With SMART lesson objectives in the
synthesis in mind, which one does
NOT belong to the group?
a. Formulate
b. Judge
c. Organize
d. Build
Question:
Which test item is in the highest level of
Blooms taxonomy of objectives?
a. Explain how a tree functions in
relation to the ecosystem.
b. Explain how trees receive nutrients.
c. Rate three different methods of
controlling tree growth.
d. List the parts of a tree.
Question:
Which behavioral term describes a
lesson outcome in the highest level
of Blooms taxonomy?
a. Analyze
b. Create
c. Infer
d. Evaluate
MAIN POINTS
FOR COMPARISON
TYPES OF TESTS
Psychological
Aims to measure
students intelligence or
mental ability in a large
degree without reference
to what the students has
Purpose learned
Educational
Aims to measure
the result of
instructions and
learning (e.g.
Measures the intangible Achievement Tests,
characteristics of an
Performance Tests)
individual (e.g. Aptitude
Tests, Personality Tests,
Intelligence Tests)
MAIN POINTS
FOR
COMPARISON
TYPES OF TESTS
Survey
Scope of
Content
Mastery
Covers a broad
range of objectives
Covers a
specific
objective
Measures general
achievement in
certain subjects
Measures
fundamental
skills and
abilities
MAIN POINTS
FOR
COMPARISON
TYPES OF TESTS
Verbal
Language
Mode
Non-Verbal
Students do not
use words in
Words are used by
attaching meaning
students in attaching
to or in responding
meaning to or
to test items (e.g.
responding to test items
graphs, numbers,
3-D subjects)
MAIN POINTS
FOR
COMPARISON
TYPES OF TESTS
Standardized
Construction
Informal
Constructed by a
professional item writer
Constructed by a
classroom teacher
Covers a broad range of

content covered in a subject
area
Covers a narrow
range of content
Uses mainly multiple choice
Various types of
items are used
Items written are screened

and the best items were
chosen for the final
instrument
Teacher picks or
writes items as
needed for the test
MAIN POINTS
FOR
COMPARISON
TYPES OF TESTS
Standardized
Construction
Informal
Scored
Can be scored by a
manually by the
machine
teacher
Interpretation of
results is usually
norm-referenced
Interpretation
is usually
criterionreferenced
MAIN POINTS
FOR
COMPARISON
TYPES OF TESTS
Individual
Mostly given orally or
requires actual
demonstration of skill
Group
This is a paperand-pen test
Loss of rapport,
One-on-one situations,
insight and
Manner of
thus, many opportunities
knowledge about
Administration for clinical observation
each examinee
Chance to follow-up
examinees response in
order to clarify or
comprehend it more
clearly
Same amount of
time needed to
gather information
from one student
MAIN POINTS
FOR
COMPARISON
TYPES OF TESTS
Objective
Effect of
Biases
Subjective
Scorers personal
judgment does not
affect the scoring
Affected by
scorers personal
opinions, biases
and judgments
Worded that only one

answer is acceptable
Several answers
are possible
Little or no
disagreement on what
is the correct answer
Possible to
disagreement on
what is the correct
answer
MAIN POINTS
FOR
COMPARISON
TYPES OF TESTS
Power
Consists of series of
items arranged in
ascending order of
Time Limit and
difficulty
Level of
Difficulty
Measures students
ability to answer more
and more difficult items
Speed
Consists of items
approximately
equal in difficulty
Measures
students speed or
rate and accuracy
in responding
MAIN POINTS
FOR
COMPARISON
TYPES OF TESTS
Selective
Supply
Multiple choice, True or Short answer,

Completion, Restricted
False, Matching Type
or Extended Essay
Format
There are choices for

the answer
There are no choices

for the answer
Can be answered
quickly
May require a longer

time to answer
Prone to guessing
Less chance to
guessing but prone to
bluffing
Time consuming to
construct
Time consuming to
answer and score
MAIN POINTS
FOR COMPARISON
TYPES OF TESTS
Maximum
Performance
Determines what
Nature individuals can do
when performing at
of
Assess their best
ment
Aptitude tests,
achievement tests
Typical
Performance
Determines what
individuals will do
under natural
conditions
Attitude, interest, and
personality inventories;
observation
techniques; peer
appraisal
MAIN POINTS
FOR
COMPARISON
TYPES OF TESTS
Norm-Referenced
Criterion-Referenced
Result is interpreted
by comparing one
students
performance with
Interpretation other students
performance
Result is interpreted
by comparing
students performance
based on a predefined
standard/criteria
Some will really

pass
All or none may pass
Constructed by
trained professional
Typically constructed
by the teacher
MAIN POINTS
FOR
COMPARISON
TYPES OF TESTS
Norm-Referenced
There is
competition for a
limited percentage
of high scores
There is no
competition for a
limited percentage of
high score
Typically covers a
Interpretation large domain of
learning tasks
Typically focuses on
a delimited domain of
learning
Emphasizes
discrimination
among individuals in
terms of level of
learning
Emphasizes
description of what
learning tasks
individuals can and
cannot perform
MAIN POINTS
FOR
COMPARISON
Interpretation
TYPES OF TESTS
Norm-Referenced
Favors items of
average difficulty
and typically omits
very easy and very
hard items
Matches item
difficulty to learning
tasks, without altering
item difficulty or
omitting easy or hard
items
Interpretation
requires a clearly
defined group
Interpretation
requires a clearly
defined and delimited
achievement domain
Similarities Between NRTs and CRTs

1. Both require specification of the
achievement domain to be measured.
2. Both
require
a
relevant
representative sample of test items.
and
3. Both use the same types of test items.
Similarities Between NRTs and CRTs

4. Both use the same rules for item writing
(except for item difficulty).
5. Both are judged by the same qualities of
goodness (validity and reliability).
6. Both are useful in educational
assessment.
Question:
A test consists of a graph showing the
relationship between age and population.
Following it is a series of true-false items
based on the graph. Which type of test does
this illustrate?
a. Laboratory exercise
b. Problem solving
c. Performance
d. Interpretive
Types of Test According to FORMAT

Selective Type provides choices for the answer
a. Multiple Choice consists of a stem which
describes the problem and 3 or more alternatives which
give the suggested solutions. The incorrect alternatives
are the distractors.
b. True-False or Alternative Response

consists of declarative statement that one has to mark
true or false, right or wrong, correct or incorrect, yes or
no, fact or opinion, and the like.
c. Matching Type consists of two parallel columns:

Column A, the column of premises from which a match is
sought; Column B, the column of responses from which
the selection is made.
Types of Test According to FORMAT

Supply Test
a. Short Answer uses a direct question that can be
answered by a word, phrase, a number, or a symbol
b. Completion Test it consists of an incomplete
statement
Essay Test
c. Restricted Response limits the content of the
response by restricting the scope of the topic
d. Extended Response allows the students to select
any factual information that they think is pertinent, to
organize their answers in accordance with their best
judgment
Question:
Which assessment tool will be most
authentic?
a. Short answer test
b. Alternate-response test
c. Essay test
d. Portfolio
Question:
Which does NOT belong to the
group?
a. Short Answer
b. Completion
c. Multiple Choice
d. Restricted-response essay
ALTERNATIVE ASSESSMENT
PERFORMANCE & AUTHENTIC ASSESSMENTS
Specific behaviors are to be
observed
When To
Use
Possibility
of
judging
the
appropriateness of students actions
A process or outcome cannot be
directly measured by paper-andpencil test
Allow evaluation of complex skills
which are difficult to assess using
written tests
Advantages Positive effect on instruction and
learning
Can be used to evaluate both the
process and the product
Time-consuming
develop, and score
to
administer,
Limitations Subjectivity in scoring

Inconsistencies in performance on
alternative skills
PORTFOLIO ASSESSMENT
CHARACTERISTICS:
1) Adaptable to individualized instructional goals
2) Focus on assessment of products
3) Identify students strengths rather than
weaknesses
4) Actively involve students in the evaluation
process
5) Communicate student achievement to others
6) Time-consuming
7) Need of a scoring plan to increase reliability
RUBRICS scoring guides, consisting of specific
pre-established performance criteria, used in
evaluating
student
work
on
performance
assessments
Types:
1) Holistic Rubric requires the teacher to score
the overall process or product as a whole,
without judging the component parts separately
2) Analytic Rubric requires the teacher to score
individual components of the product or
performance first, then sums the individual
scores to obtain a total score
Types of NON-COGNITIVE TEST

1. Closed-Item or Forced-choice Instruments
ask for one or specific answer
a. Checklist measures students preferences,
hobbies, attitudes, feelings, beliefs, interests, etc.
by marking a set of possible responses
b. Scales these instruments that indicate the
extent or degree of ones response
1) Rating Scale measures the degree or
extent of ones attitudes, feelings, and
perception about ideas, objects and people
by marking a point along 3- or 5- point scale

2.) Semantic Differential Scale measures the
degree of ones attitudes, feelings and perceptions
about ideas, objects and people by marking a point
along 5- or 7- or 11- point scale of semantic
adjectives
Ex:
Math is
easy __ __ __ __ __ __ __ difficult
important __ __ __ __ __ __ __ trivial
useful __ __ __ __ __ __ __ useless

3) Likert Scale measures the degree of ones
agreement or disagreement on positive or
negative statements about objects and people
Ex:
Use the scale below to rate how much you agree or
disagree about the following statements.
5 Strongly Agree
4 Agree
3 Undecided
2 Disagree
1 Strongly Disagree
1. Science is interesting.
2. Doing science experiments is a waste of time.

c. Alternative Response measures students
preferences, hobbies, attitudes, feelings, beliefs,
interests, etc. by choosing between two possible
responses
Ex:
T F 1. Reading is the best way of spending leisure time.
d. Ranking measures students preferences or
priorities by ranking a set of responses
Ex: Rank the following subjects according to its importance.
___ Science
___ Math
___ English
____ Social Studies

____ Arts

2. Open-Ended Instruments
open to more than one answer
Sentence Completion measures students preferences
over a variety of attitudes and allows students to
answer by completing an unfinished statement which
may vary in length
Surveys measures the values held by an individual by
writing one or many responses to a given question
Essays allows the students to reveal and clarify their
preferences, hobbies, attitudes, feelings, beliefs, and
interests by writing their reactions or opinions to a
given question
Question:
To evaluate teaching skills, which is
the most authentic tool?
a. Observation
b. Non-restricted essay test
c. Short answer test
d. Essay test
GENERAL SUGGESTIONS
IN WRITING TESTS
1. Use your test specifications as guide to
item writing.
2. Write more test items than needed.
3. Write the test items well in advance of the
testing date.
4. Write each test item so that the task to be
performed is clearly defined.
5. Write each test item in appropriate reading
level.
GENERAL SUGGESTIONS
IN WRITING TESTS
6. Write each test item so that it does not
provide help in answering other items in
the test.
7. Write each test item so that the answer is
one that would be agreed upon by experts.
8. Write test items so that it is the proper level
of difficulty.
9. Whenever a test is revised, recheck its
relevance.
SPECIFIC SUGGESTIONS
Supply Type
1. Word the item/s so that the required
answer is both brief and specific.
2. Do not take statements directly from
textbooks to use as a basis for short
answer items.
3. A direct question is generally more
desirable than an incomplete statement.
4. If the item is to be expressed in numerical
units, indicate the type of answer needed.
Supply Type
5. Blanks should be equal in length.
6. Answers should be written before the item
number for easy checking.
7. When completion items are to be used, do
not have too many blanks. Blanks should
be at the center of the sentence and not at
the beginning.
Selective Type
Alternative-Response
1. Avoid broad statements.
2. Avoid trivial statements.
3. Avoid the use of negative statements
especially double negatives.
4. Avoid long and complex sentences.
5. Avoid including two ideas in one sentence
unless cause and effect relationship is
being measured.
Selective Type
Alternative-Response
6.If opinion is used, attribute it to some source
unless the ability to identify opinion is being
specifically measured.
7. True statements and false statements should be
approximately equal in length.
8. The number of true statements and false
statements should be approximately equal.
9. Start with a false statement since it is a common
observation that the first statement in this type is
always positive.
Selective Type
Matching Type
1. Use only homogeneous materials in a
single matching exercise.
2. Include an unequal number of responses
and premises, and instruct the pupils that
response may be used once, more than
once, or not at all.
3. Keep the list of items to be matched brief,
and place the shorter responses at the
right.
Selective Type
Matching Type
4. Arrange the list of responses in logical
order.
5. Indicate in the directions the basis for
matching the responses and premises.
6. Place all the items for one matching
exercise on the same page.
Selective Type
Multiple Choice
1. The stem of the item should be meaningful
by itself and should present a definite
problem.
2. The item should include as much of the
item as possible and should be free of
irrelevant information.
3. Use a negatively stated item stem only
when a significant learning outcome
requires it.
Selective Type
Multiple Choice
4. Highlight negative words in the stem for
emphasis.
5. All the alternatives should be grammatically
consistent with the stem of the item.
6. An item should only have one correct or
clearly best answer.
7. Items used to measure understanding
should contain novelty, but beware of too
much.
Selective Type
Multiple Choice
8. All distractors should be plausible.
9. Verbal association between the stem and the
correct answer should be avoided.
10. The relative length of the alternatives should not
provide a clue to the answer.
11. The alternatives should be arranged logically.
12. The correct answer should appear in each of the
alternative positions and approximately equal
number of times but in random number.
Selective Type
Multiple Choice
13. Use of special alternatives (e.g. None of
the above; all of the above) should be done
sparingly.
14. Do not use multiple choice items when
other types are more appropriate.
15. Always have the stem and alternatives on
the same page.
16. Break any of these rules when you have a
good reason for doing so.
Question:
In preparing a multiple-choice test,
how many options would be ideal?
a. Five
b. Three
c. Any
d. Four
Essay Type
1. Restrict the use of essay questions to
those learning outcomes that cannot be
satisfactorily measured by objective items.
2. Formulate questions that will bring forth the
behavior specified in the learning outcome.
3. Phrase each question so that the pupils
task is clearly defined.
4. Indicate an approximate time limit for each
question.
5. Avoid the use of optional questions.
Question:
What should a teacher do before
constructing items for a particular test?
a. Prepare the table of specifications.
b. Review the previous lessons.
c. Determine the length of time for
answering it.
d. Announce to students the scope of
the test.
CRITERIA TO CONSIDER IN
CONSTRUCTING GOOD TESTS
VALIDITY - is the degree to which a test measures
what is intended to be measured. It is the
usefulness of the test for a given purpose. It is the
most important criteria of a good examination.
FACTORS influencing the validity of tests in general
Appropriateness of test
Directions
Reading Vocabulary and Sentence Structure
Difficulty of Items
Construction of Items
Length of Test
Arrangement of Items
Patterns of Answers
WAYS of Establishing Validity

Face Validity is done by examining the
physical appearance of the test
Content Validity is done through a careful

and critical examination of the objectives of the
test so that it reflects the curricular objectives

Criterion-related validity is established statistically
such that a set of scores revealed by a test is correlated
with scores obtained in another external predictor or
measure.
Has two purposes:
a. Concurrent Validity describes the present status of
the individual by correlating the sets of scores obtained
from two measures given concurrently
b. Predictive Validity describes the future performance
of an individual by correlating the sets of scores obtained
from two measures given at a longer time interval

Construct Validity is established statistically by
comparing psychological traits or factors that influence
scores in a test, e.g. verbal, numerical, spatial, etc.
a. Convergent Validity is established if the instrument
defines another similar trait other than what it intended
to measure (e.g. Critical Thinking Test may be
correlated with Creative Thinking Test)
b. Divergent Validity is established if an instrument can
describe only the intended trait and not other traits (e.g.
Critical Thinking Test may not be correlated with
Reading Comprehension Test)
RELIABILITY - it refers to the consistency of

scores obtained by the same person when retested
using the same instrument or one that is parallel to
it.
FACTORS affecting Reliability
Length of the test
Difficulty of the test
Objectivity
Administrability
Scorability
Economy
Adequacy
Type of
Reliability
Measure
Procedure
Statistical
Measure
Test-Retest
Measure of
stability
Give a test twice to the same

group with any time interval
between sets from several
minutes to several years
Pearson r
Equivalent
Forms
Measure of
equivalence
Give parallel forms of test at

the same time between forms
Pearson r
Test-Retest
with
Equivalent
Forms
Measure of
stability and
equivalence
Give parallel forms of test

with increased time intervals
between forms
Pearson r
Split Half
Measure of
Internal
Consistency
Method
KuderRichardson
Measure of
Internal
Consistency
Give a test once. Score

Pearson r and
equivalent halves of the test
Spearman Brown
(e.g. odd-and even numbered
Formula
items)
Give the test once, then
correlate the
proportion/percentage of the
students passing and not
passing a given item
Kuder
Richardson
Formula 20 & 21
Question:
Setting up criteria for scoring essay
tests is meant to increase their:
a. Objectivity
b. Reliability
c. Validity
d. Usability
Question:
The same test is administered to different
groups at different places at different
times. This process is done in testing
the:
a. Objectivity
b. Validity
c. Reliability
d. Comprehensiveness
ITEM ANALYSIS
STEPS:
1. Score the test. Arrange from lowest to
highest.
2. Get the top 27% (T27) and below 27% (B27)
of the examinees.
3. Get the proportion of the Top and Below who
got each item correct. (PT) & (PB)
4. Compute for the Difficulty Index.
Df = (PT + PB) / N
5. Compute for the Discrimination Index.
Ds = (PT - PB) / n
ITEM ANALYSIS
INTERPRETATION
Difficulty Index (Df)
0.76 1.00
0.25 0.75
0.00 0.24
=
=
=
easy
(revise)
average
(accept)
very difficult (reject)
Discrimination Index (Ds)

0.40 above = very good (accept)
0.30 0.39 = good
(accept)
0.20 0.29 = moderate (revise)
0.19 and below = poor
(reject)
ITEM ANALYSIS
Example:
Question
1
2
A
0
12*
B
3
13
C
24*
3
D
3
2
Df
# of students: 30
*To compute the Df:
Divide the number of students who choose the
correct answer by the total number of students.
ITEM ANALYSIS
Example:
Question
1
2
A
0
12*
B
3
13
C
24*
3
D
3
2
Df
0.80
# of students: 30
*To compute the Df:
ITEM ANALYSIS
Example:
Question
1
2
A
0
12*
B
3
13
C
24*
3
D
3
2
Df
0.80
0.40
# of students: 30
*To compute the Df:
Example:
ITEM ANALYSIS
Student
Score (%)
Q1
Q2
Q3
Joe
90
Dave
90
Sujie
80
Darrell
80
Eliza
70
Zoe
60
Grace
60
Hannah
50
Ricky
50
Anita
40
* 1 corrrect; 0 - incorrect
ITEM ANALYSIS
Example:
Question
1
2
3
PT
PB
Df
Ds
ITEM ANALYSIS
Example:
Question
1
2
3
PT
4
0
5
PB
4
3
1
Df
Ds
ITEM ANALYSIS
Example:
Question
1
2
3
PT
4
0
5
PB
4
3
1
Df
0.80
Ds
ITEM ANALYSIS
Example:
Question
1
2
3
PT
4
0
5
PB
4
3
1
Df
0.80
0.30
Ds
ITEM ANALYSIS
Example:
Question
1
2
3
PT
4
0
5
PB
4
3
1
Df
0.80
0.30
0.60
Ds
ITEM ANALYSIS
Example:
Question
1
2
3
PT
4
0
5
PB
4
3
1
Df
0.80
0.30
0.60
Ds
0
ITEM ANALYSIS
Example:
Question
1
2
3
PT
4
0
5
PB
4
3
1
Df
0.80
0.30
0.60
Ds
0
- 0.6
ITEM ANALYSIS
Example:
Question
1
2
3
1.
2.
3.
4.
PT
4
0
5
PB
4
3
1
Df
0.80
0.30
0.60
Ds
0
- 0.6
0.8
Which question was the easiest?

Which question was the most difficult?
Which item has the poorest discrimination?
Which question would you eliminate (if any)?
Why?
Question:
A negative discrimination index means that:
a. More from the lower group answered
the test items correctly.
b. The items could not discriminate
between the lower and upper group.
c. More from the upper group answered
the test item correctly.
d. Less from the lower group got the test
item correctly.
Question:
A test item has a difficulty index of 0.89
and a discrimination index of 0.44. What
should the teacher do?
a. Reject the item.
b. Retain the item.
c. Make it a bonus item.
d. Make it a bonus item and reject it.
SCORING ERRORS AND BIASES

Leniency error: Faculty tends to judge better than it really is.
Generosity error: Faculty tends to use high end of scale only.
Severity error: Faculty tends to use low end of scale only.
Central tendency error:
Faculty avoids both extremes of the scale.
Bias:
Letting other factors influence score (e.g., handwriting,
typos)
Halo effect:
Letting general impression of student influence rating of
specific criteria (e.g., students prior work)
Contamination effect:
Judgment is influenced by irrelevant knowledge about the
student or other factors that have no bearing on
performance level (e.g., student appearance)
SCORING ERRORS AND BIASES

Similar-to-me effect:
Judging more favorably those students whom faculty see
as similar to themselves (e.g., expressing similar interests
or point of view)
First-impression effect:
Judgment is based on early opinions rather than on a
complete picture (e.g., opening paragraph)
Contrast effect:
Judging by comparing student against other students
instead of established criteria and standards
Rater drift:
Unintentionally redefining criteria and standards over time
or across a series of scorings (e.g., getting tired and
cranky and therefore more severe, getting tired and
reading more quickly/leniently to get the job done)
SCALES OF MEASUREMENT
NOMINAL
ORDINAL
RATIO
INTERVAL
frequency
TYPES OF DISTRIBUTION
low scores
Normal Distribution
Symmetrical
Bell Curve
scores
high scores
frequency
low scores
scores
Rectangular Distribution
high scores
Unimodal Distribution
Bimodal Distribution
high scores
Multimodal / Polymodal
Distribution
frequency
low scores
scores
high scores
Positively Skewed Distribution

Skewed to the Right
frequency
low scores
scores
high scores
Negatively Skewed Distribution

Skewed to the Left
KURTOSIS
Leptokurtic
distributions are tall and
peaked. Because the scores
are clustered around the
mean, the standard deviation
will be smaller.
Mesokurtic
distributions are the ideal
example
of
the
normal
distribution,
somewhere
between the leptokurtic and
playtykurtic.
Platykurtic
distributions
and flat.
are
broad
Question:
Which statement applies when score
distribution is negatively skewed?
a. The scores are evenly distributed
from the left to the right.
b. Most pupils are underachievers.
c. Most of the scores are high.
d. Most of the scores are low.
Question:
If the scores of your test follow a
positively skewed score distribution,
what should you do? Find out _______.
a. why your items are easy
b. why most of the scores are high
c. why some pupils scored low
d. why most of the scores are low
ASSUMPTIONS
WHEN USED
APPROPRIATE STATISTICAL TOOLS

MEASURES OF
CENTRAL TENDENCY
(describes the
representative value of a
set of data)
When
the
frequency Mean
distribution is regular or average
symmetrical (normal)
Usually used when data
are numeric (interval or
ratio)
the
arithmetic
MEASURES OF
VARIABILITY
(describes the degree of
spread or dispersion of a
set of data)
Standard Deviation
root-mean-square
of
deviations from the mean
the
the
When
the
frequency Median the middle score Quartile Deviation the
distribution is irregular or in a group of scores that are average deviation of the 1st and
skewed
ranked
3rd quartiles from the median
Usually when the data is
ordinal
When the distribution of Mode the most frequent Range
the
difference
scores is normal and quick score
between the highest and the
answer is needed
lowest score in the distribution
Usually used when the
data are nominal
Question:
Teacher B is researching on a family
income distribution which is quite
symmetrical. Which measure/s of
central
tendency
will
be
most
informative and appropriate?
a. Mode
b. Mean
c. Median
d. Mean and median
Question:
What measure/s of central tendency does
the number 16 represent in the following
score distribution?
14, 15, 17, 16, 19, 20, 16, 14, 16
a. Mode only
b. Median only
c. Mode and median
d. Mean and mode
INTERPRETING MEASURES OF VARIABILITY

STANDARD DEVIATION (SD)
The result will help you determine if the group is
homogeneous or not.
The result will also help you determine the number of
students that fall below and above the average performance.
Main points to remember:

Points above Mean + 1SD = range of above average
Mean + 1SD
Mean - 1SD
= give the limits of an average ability
Points below Mean 1SD = range of below average
Example:
A class of 25 students was given a 75-item test. The
mean average score of the class is 61. The SD is 6.
Lisa, a student in the class, got a score of
Describe the performance of Lisa.
X = 61
SD = 6
63.
X = 63
X + SD = 61 + 6 = 67
X - SD = 61 6 = 55
All scores between 55-67 are average.
All scores above 67 or 68 and above are above average.
All scores below 55 or 54 and below are below average.
Therefore, Lisas score of 63 is average.
Question:
Zero standard deviation means that:
a. The students scores are the same.
b. 50% of the scores obtained is zero.
c. More than 50% of the scores
obtained is zero.
d. Less than 50% of the scores
obtained is zero.
Question:
Nellies score is within x 1 SD. To which
of the following groups does she belong?
a. Below Average
b. Average
c. Needs Improvement
d. Above Average
Question:
The score distribution of Set A and Set B have
equal mean but with different SDs. Set A has an
SD of 1.7 while Set B has an SD of 3.2. Which
statement is TRUE of the score distributions?
a. The scores of Set B has less variability than
the scores in Set A.
b. Scores in Set A are more widely scattered.
c. Majority of the scores in Set A are clustered
around the mean.
d. Majority of the scores in Set are clustered
around the mean.
INTERPRETING MEASURES OF VARIABILITY

QUARTILE DEVIATION (QD)
The result will help you determine if the group is
homogeneous or not.
The result will also help you determine the number of
students that fall below and above the average performance.
Main points to remember:

Points above Median + 1QD = range of above average
Median + 1QD
Median 1QD
= give the limits of an average ability
Points below Median 1QD = range of below average
Example:
A class of 30 students was given a 50-item test. The
median score of the class is 29. The QD is 3. Miguel,
a student in the class, got a score of
performance of Miguel.
~
X = 29
QD = 3
~
X + QD = 29 + 3 = 32
~
X - QD = 29 3 = 26
33. Describe the
X = 33
All scores between 26-32 are average.

All scores above 32 or 33 and above are above average.
All scores below 26 or 25 and below are below average.
Therefore, Miguels score of 33 is above average.
INTERPRETATION of Correlation Value

1
----------- Perfect Positive Correlation

high positive correlation
0.5 ----------- Positive Correlation
low positive correlation
0
----------- Zero Correlation
low negative correlation
-0.5 ----------- Negative Correlation
high negative correlation
-1 ----------- Perfect Negative Correlation
.81 1.0 = very high correlation

.61 - .80 = high correlation
.41 - .60 = moderate correlation
.21 - .40 = low correlation
0 - .20 = negligible correlation
for Validity:
computed r should be at least
0.75 to be significant
for Reliability:
computed r should be at least
0.85 to be significant
Question:
The computed r for scores in Math and
Science is 0.92. What does this mean?
a. Math score is positively related to
Science score.
b. Science score is slightly related to Math
score.
c. Math score is not in any way related to
Science score.
d. The higher the Math score, the lower
the Science score.
STANDARD SCORES
Indicate the pupils relative position by showing
how far his raw score is above or below
average
Express the pupils performance in terms of
standard unit from the mean
Represented by the normal probability curve or
what is commonly called the normal curve
Used to have a common unit to compare raw
scores from different tests
Corresponding Standard Scores and Percentiles

in a Normal Distribution
Z-Scores
-3
-2
-1
+1
+2
+3
T-Scores
20
30
40
50
60
70
80
Percentiles
16
50
84
98
99.9
PERCENTILE
tells the percentage of examinees that lies below
ones score
Example:
Joses score in the LET is 70 and his percentile
rank is 85.
P85 = 70 (This means Jose, who scored 70,
performed better than 85% of all the examinees
)
Z-Score
tells the number of standard deviations equivalent

to a given raw score
Formula:
XX
Z
SD
Where:
X individuals raw score
X mean of the normative group
SD standard deviation of the
normative group
Example:
Jenny got a score of 75 in a 100-item test. The mean
score of the class is 65 and SD is 5.
Z = 75 65
5
=2
(Jenny is 2 standard deviations above the mean)
Example:
Mean of a group in a test:
Josephs Score
X = 27
X X 27 26 1
Z
SD
2
2
Z = 0.5
= 26
SD = 2
Johns Score
X = 25
X X 25 26
1
Z
SD
2
2
Z = -0.5
T-Score
refers to any set of normally distributed standard deviation
score that has a mean of 50 and a standard deviation of 10
computed after converting raw scores to z-scores to get rid
of negative values
Formula:
T score 50 10(Z )
Example:
Josephs T-score = 50 + 10(0.5)
= 50 + 5
= 55
Johns T-score = 50 + 10(-0.5)
= 50 5
= 45
ASSIGNING GRADES / MARKS / RATINGS

Marking or Grading is the process of assigning value to a
performance
Marks / Grades / Rating SYMBOLS:
Could be in:
1. percent such as 70%, 88% or 92%
2. letters such as A, B, C, D or F
3. numbers such as 1.0, 1.5, 2.75, 5
4. descriptive expressions such as Outstanding
(O), Very Satisfactory (VS), Satisfactory (S),
Moderately Satisfactory (MS), Needs Improvement
(NI)

Could represent:
1. how a student is performing in relation
to other students (norm-referenced
grading)
2. the extent to which a student has
mastered a particular body of knowledge
(criterion-referenced grading)
3. how a student is performing in relation
to a teachers judgment of his or her
potential

Could be for:
Certification that gives assurance that a student has
mastered a specific content or achieved a certain level
of accomplishment
Selection that provides basis in identifying or grouping
students for certain educational paths or programs
Direction that provides information for diagnosis and
planning
Motivation that emphasizes specific material or skills to
be learned and helping students to understand and
improve their performance

Could be assigned by using:
Criterion-Referenced Grading or grading based on
fixed or absolute standards where grade is assigned
based on how a student has met the criteria or a welldefined objectives of a course that were spelled out in
advance. It is then up to the student to earn the grade he
or she wants to receive regardless of how other students
in the class have performed. This is done by transmuting
test scores into marks or ratings.

Norm-Referenced Grading or grading based on
relative standards where a students grade reflects his
or her level of achievement relative to the performance
of other students in the class. In this system, the grade
is assigned based on the average of test scores.
Point or Percentage Grading System whereby the
teacher identifies points or percentages for various tests
and class activities depending on their importance. The total
of these points will be the bases for the grade assigned to
the student.
Contract Grading System where each student agrees to
work for a particular grade according to agreed-upon
standards.
Question:
Marking on a normative basis means that
__________.
a. the normal curve of distribution should
be followed
b. The symbols used in grading indicate
how a student achieved relative to
other students
c. Some get high marks
d. Some are expected to fail
Here is a set of scores for a class of 24 students:

Student
A
B
C
D
E
F
G
H
I
J
K
PT
78
67
88
74
97
84
57
65
81
58
70
Student
M
N
O
P
Q
R
S
T
U
V
W
PT
65
92
53
65
83
79
45
95
62
74
85
81
76

LET Review Assessment of Learning

Caricato da

Informazioni sul documento

Copyright

Formati disponibili

Condividi questo documento

Condividi o incorpora il documento

Opzioni di condivisione

Hai trovato utile questo documento?

Questo contenuto è inappropriato?

Copyright:

Formati disponibili

LET Review Assessment of Learning

Caricato da

Copyright:

Formati disponibili

LET Review

process of gathering and

both qualitative and quantitative data in

A COMPARISON OF THE FOUR EVALUATION

determines the extent of what the pupils have achieved or

A COMPARISON OF THE FOUR EVALUATION

searches for the underlying

administered during instruction

designed to formulate a plan for remedial instruction

modify the teaching and learning process

causes of these problems

helps formulate a plan for a

PRINCIPLES OF HIGH QUALITY

PRINCIPLES OF HIGH QUALITY

PRINCIPLES OF HIGH QUALITY

PRINCIPLES OF HIGH QUALITY

Covers a broad range of

Uses mainly multiple choice

Items written are screened

Worded that only one

Multiple choice, True or Short answer,

There are choices for

There are no choices

May require a longer

Some will really

All or none may pass

Similarities Between NRTs and CRTs

3. Both use the same types of test items.

Similarities Between NRTs and CRTs

Types of Test According to FORMAT

b. True-False or Alternative Response

c. Matching Type consists of two parallel columns:

Types of Test According to FORMAT

Limitations Subjectivity in scoring

Types of NON-COGNITIVE TEST

Types of NON-COGNITIVE TEST

Types of NON-COGNITIVE TEST

Types of NON-COGNITIVE TEST

____ Social Studies

Types of NON-COGNITIVE TEST

WAYS of Establishing Validity

Content Validity is done through a careful

WAYS of Establishing Validity

WAYS of Establishing Validity

RELIABILITY - it refers to the consistency of

Give a test twice to the same

Give parallel forms of test at

Give parallel forms of test

Give a test once. Score

Discrimination Index (Ds)

Which question was the easiest?

SCORING ERRORS AND BIASES

SCORING ERRORS AND BIASES

Positively Skewed Distribution

Negatively Skewed Distribution

APPROPRIATE STATISTICAL TOOLS

INTERPRETING MEASURES OF VARIABILITY

Main points to remember:

= give the limits of an average ability

Points below Mean 1SD = range of below average

INTERPRETING MEASURES OF VARIABILITY

Main points to remember:

= give the limits of an average ability

Points below Median 1QD = range of below average