ASSESSMENT OF LEARNING
Assessment – refers to the process of gathering, describing or quantifying information about student performance. It includes paper-and-pencil tests, extended responses (for example, essays) and performance assessments, which are usually referred to as "authentic assessment" tasks (for example, presentation of research work).
Measurement – is a process of obtaining a numerical description of the degree to which an individual possesses a particular characteristic. Measurement answers the question "how much?"
Evaluation – refers to the process of examining the performance of students. It also determines whether or not the students have met the instructional objectives of the lesson.
Test – is an instrument or systematic procedure designed to measure the quality, ability, skill or knowledge of students by giving a set of questions in a uniform manner. Since a test is a form of assessment, tests also answer the question "how does an individual student perform?"
Testing – is a method used to measure the level of achievement or performance of the learners. It also refers to the administration, scoring and interpretation of an instrument (procedure) designed to elicit information about performance in a sample of a particular area of behavior.
Types of Measurement
Norm-referenced test – a test designed to measure the performance of a student compared with other students.
Criterion-referenced test – a test designed to measure the performance of students with respect to some particular criterion or standard.
TYPES OF ASSESSMENT
A. Placement Assessment is concerned with the entry performance of students. The purpose of placement evaluation is to determine the prerequisite skills, the degree of mastery of the course objectives and the best mode of learning.
B. Diagnostic Assessment is a type of assessment given before instruction. It aims to identify the strengths and weaknesses of the students regarding the topics to be discussed. The purposes of diagnostic assessment are:
1. To determine the level of competence of the students;
2. To identify the students who already have knowledge about the lesson;
3. To determine the causes of learning problems and formulate a plan for remedial action.
C. Formative Assessment is a type of assessment used to monitor the learning progress of the students during or after instruction. The purposes of formative assessment are:
1. To provide feedback immediately to both student and teacher regarding the success and failure of learning;
2. To identify the learning errors that need correction;
3. To provide information to the teacher for modifying instruction and for improving learning and instruction.
D. Summative Assessment is a type of assessment usually given at the end of a course or unit. The purposes of summative assessment are:
1. To determine the extent to which the instructional objectives have been met;
2. To certify student mastery of the intended outcomes and to assign grades;
3. To provide information for judging the appropriateness of the instructional objectives;
4. To determine the effectiveness of instruction.
MODE OF ASSESSMENT
A. Traditional Assessment
1. Assessment in which students typically select an answer or recall information to complete the assessment. Tests may be standardized or teacher-made, and may be multiple-choice, fill-in-the-blanks, true-false or matching type.
2. Indirect measures of assessment, since the test items are designed to represent competence by extracting knowledge and skills from their real-life context.
3. Items on standardized instruments tend to test only the domain of knowledge and skill to avoid ambiguity for the test takers.
4. One-time measures that rely on a single correct answer to each item. There is limited potential for traditional tests to measure higher-order thinking skills.
B. Performance Assessment
1. Assessment in which students are asked to perform real-world tasks that demonstrate meaningful application of essential knowledge and skills.
2. Direct measures of student performance, because tasks are designed to incorporate the contexts, problems and solution strategies that students would use in real life.
3. Designed as ill-structured challenges, since the goal is to help students prepare for the complex ambiguities in life.
4. Focus on processes and rationales. There is no single correct answer; instead, students are led to craft polished, thorough and justifiable responses, performances and products.
5. Involve long-range projects, exhibits and performances that are linked to the curriculum.
6. The teacher is an important collaborator in creating tasks, as well as in developing guidelines for scoring and interpretation.
C. Portfolio Assessment
1. A portfolio is a collection of a student's work selected specifically to tell a particular story about the student.
2. A portfolio is not a pile of student work that accumulates over a semester or year.
3. A portfolio contains a purposefully selected subset of student work.
4. It measures the growth and development of students.
B. RELIABILITY refers to the consistency of the scores obtained by the same person when retested using the same instrument or one that is parallel to it.
C. ADMINISTRABILITY: the test should be administered uniformly to all students so that the scores obtained will not vary due to factors other than differences in the students' knowledge and skills. There should be clear instructions for the students, the proctors and even the person who will check the test (the scorer).
D. SCORABILITY: the test should be easy to score; the directions for scoring should be clear, and the answer sheet and the answer key should be provided.
E. APPROPRIATENESS: the test items that the teacher constructs must assess the exact performances called for in the learning objectives. Each test item should require the same performance of the student as specified in the learning objectives.
F. ADEQUACY: the test should contain a wide sampling of items to determine the educational outcomes or abilities, so that the resulting scores are representative of the total performance in the areas measured.
G. FAIRNESS: the test should not be biased against the examinees, and it should not be offensive to any examinee subgroup. A test can only be good if it is also fair to all test takers.
H. OBJECTIVITY represents the agreement of two or more raters or test administrators concerning the score of a student. If two raters who assess the same student on the same test cannot agree on the score, the test lacks objectivity and the score of neither judge is valid. Thus, lack of objectivity reduces test validity in the same way that lack of reliability does.
TABLE OF SPECIFICATIONS
A table of specification is a device for describing test items in terms of the content and process dimensions, that is, what a student is expected to know and what he or she is expected to do with that knowledge. Each item is described by a combination of content and process in the table of specification.
Sample of a one-way table of specification in Linear Function:
$$\text{Number of items} = \frac{2 \times 40}{20} = 4$$
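The sample computation above is consistent with the common rule of allotting items to a topic in proportion to instructional time: items for a topic = (time spent on the topic × total number of items) / total time. Reading 2 as the sessions spent on the topic, 40 as the total number of items and 20 as the total number of sessions is an assumption, and the topic names and session counts below are hypothetical. A minimal sketch of building a one-way table under that assumption:

```python
def items_for_topic(topic_time: float, total_items: int, total_time: float) -> float:
    """Items allotted to one topic, in proportion to the time spent teaching it."""
    return topic_time * total_items / total_time

# Hypothetical one-way table: topic -> number of class sessions (assumed data).
sessions = {
    "Slope of a line": 2,
    "Graphs of linear functions": 6,
    "Equations of lines": 8,
    "Applications": 4,
}
total_items = 40
total_sessions = sum(sessions.values())  # 20 sessions in all

table = {topic: items_for_topic(t, total_items, total_sessions)
         for topic, t in sessions.items()}

for topic, n in table.items():
    print(f"{topic:<28} {sessions[topic]:>2} sessions  {n:>4.0f} items")
print("Total items:", sum(table.values()))
```

In practice the allotments rarely come out as whole numbers, so a teacher still has to round them and check that the total matches the intended test length.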
ITEM ANALYSIS
Item analysis refers to the process of examining the students' responses to each item in the test. According to Abubakar S. Asaad and William M. Hailaya (Measurement and Evaluation: Concepts & Principles, Rex Bookstore, 2004 edition), an item can have two kinds of characteristics: desirable and undesirable. An item that has desirable characteristics can be retained for subsequent use, while an item with undesirable characteristics should be either revised or rejected.
Difficulty index refers to the proportion of the students in the upper and lower groups who answered an item correctly. In a classroom achievement test, the desired indices of difficulty are not lower than 0.20 nor higher than 0.80, and the average index of difficulty ranges from 0.30 or 0.40 to a maximum of 0.60.
$$\text{DF} = \frac{\text{PUG} + \text{PLG}}{2}$$
PUG = proportion of the upper group who got an item right
PLG = proportion of the lower group who got an item right
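The difficulty formula can be sketched directly from the two proportions. This assumes equal-sized upper and lower groups, as in the 27% rule used in this section; the function names and the counts (6 and 4 correct out of 22 per group) are illustrative.

```python
def difficulty_index(upper_correct: int, lower_correct: int, group_size: int) -> float:
    """DF = (PUG + PLG) / 2, with equal-sized upper and lower groups."""
    pug = upper_correct / group_size  # proportion of the upper group who got the item right
    plg = lower_correct / group_size  # proportion of the lower group who got the item right
    return (pug + plg) / 2

def is_desirable_difficulty(df: float) -> bool:
    """True when DF lies within the 0.20-0.80 band cited above."""
    return 0.20 <= df <= 0.80

df = difficulty_index(upper_correct=6, lower_correct=4, group_size=22)
print(round(df, 2), is_desirable_difficulty(df))
```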
Index of Discrimination
Discrimination index is the difference between the proportion of high-performing students who got the item right and the proportion of low-performing students who got the item right. The high- and low-performing students are usually defined as the upper 27% and the lower 27% of the students based on the total examination score. Discrimination is classified as positive discrimination if the proportion of students who got an item right in the upper performing group is greater than that in the lower performing group, negative discrimination if the proportion in the lower performing group is greater than that in the upper performing group, and zero discrimination if the proportions of students who got an item right in the upper and lower performing groups are equal.
Maximum discrimination is the sum of the proportions of the upper and lower groups who answered the item correctly. The maximum possible discrimination will occur if half or less of the sum of the upper and lower groups answered an item correctly.
Discriminating Efficiency is the index of discrimination divided by the maximum discrimination.
PUG = proportion of the upper group who got an item right
PLG = proportion of the lower group who got an item right
Di = discrimination index
DM = maximum discrimination
DE = discriminating efficiency
Formulas:
$$\text{Di} = \text{PUG} - \text{PLG}$$
$$\text{DM} = \text{PUG} + \text{PLG}$$
$$\text{DE} = \frac{\text{Di}}{\text{DM}}$$
Example: Eighty students took an examination in Algebra. For item number 6, 6 students in the upper group and 4 students in the lower group got the correct answer. Find the discriminating efficiency.
Given:
Number of students who took the exam = 80
27% of 80 = 21.6, or 22, which means that there are 22 students in the upper performing group and 22 students in the lower performing group.
PUG = 6/22 ≈ 27%
PLG = 4/22 ≈ 18%
$$\text{Di} = \text{PUG} - \text{PLG} = 27\% - 18\% = 9\%$$
$$\text{DM} = \text{PUG} + \text{PLG} = 27\% + 18\% = 45\%$$
$$\text{DE} = \frac{\text{Di}}{\text{DM}} = \frac{0.09}{0.45} = 0.20 \text{ or } 20\%$$
This can be interpreted as follows: on average, the item is discriminating at 20% of the potential of an item of its difficulty.
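The Algebra example can be reproduced with a short sketch of the three formulas. The function names are illustrative, and returning 0.0 for DE when nobody in either group answered correctly (so DM = 0 and DE is undefined) is an assumption of this sketch, not a rule from the text.

```python
def group_size(n_students: int, fraction: float = 0.27) -> int:
    """Size of each of the upper and lower groups (27% of the class, rounded)."""
    return round(n_students * fraction)

def discrimination(upper_correct: int, lower_correct: int, size: int):
    """Return (Di, DM, DE) for one item."""
    pug = upper_correct / size
    plg = lower_correct / size
    di = pug - plg                 # index of discrimination
    dm = pug + plg                 # maximum discrimination
    de = di / dm if dm else 0.0    # discriminating efficiency (0.0 when DM = 0, assumed)
    return di, dm, de

size = group_size(80)              # 27% of 80 = 21.6, rounded to 22
di, dm, de = discrimination(6, 4, size)
print(size, round(di, 2), round(dm, 2), round(de, 2))
```

Working from the raw counts rather than the rounded percentages gives exactly DE = 2/10 = 0.20, which matches the hand computation.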
Measures of Attractiveness
To measure the attractiveness of the incorrect options (distracters) in multiple-choice tests, we count the number of students who selected each incorrect option in both the upper and lower groups. An incorrect option is said to be an effective distracter if more students in the lower group chose it than students in the upper group.
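The distracter rule above can be sketched as a comparison of option tallies. The tallies below (22 students per group, answer key "B") are hypothetical, and labeling ineffective options "revise" is an illustrative choice.

```python
def evaluate_distracters(upper: dict, lower: dict, key: str) -> dict:
    """Map each incorrect option to True if it is an effective distracter."""
    return {
        option: lower.get(option, 0) > upper.get(option, 0)
        for option in upper.keys() | lower.keys()
        if option != key
    }

# Hypothetical tallies for one item (answer key "B"), 22 students per group.
upper = {"A": 2, "B": 15, "C": 4, "D": 1}
lower = {"A": 7, "B": 6, "C": 3, "D": 6}
for option, effective in sorted(evaluate_distracters(upper, lower, key="B").items()):
    print(option, "effective" if effective else "revise")
```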
Steps of Item Analysis
1. Rank the scores of the students from highest score to lowest score.
2. Select 27% of the papers within the upper performing group and 27% of the papers within the lower
performing group.
3. Set aside the remaining 46% of the papers because they will not be used for item analysis.
4. Tabulate the number of students in the upper group and lower group who selected each alternative.
5. Compute the difficulty index of each item.
6. Compute the discriminating power of each item.
7. Evaluate the effectiveness of the distracters.
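Steps 1 to 6 can be sketched for items scored right (1) or wrong (0). The 0/1 response layout and the sample papers are assumptions for illustration; ties in total score keep their input order, which is one of several reasonable conventions. Step 7 needs option-level tallies rather than 0/1 scores, so it is left out here.

```python
def item_analysis(responses, fraction=0.27):
    """responses[s][i] is 1 if student s got item i right, else 0."""
    # Step 1: rank the papers from highest to lowest total score (ties keep input order).
    ranked = sorted(responses, key=sum, reverse=True)
    # Steps 2-3: keep the upper and lower 27%; the middle papers are set aside.
    k = max(1, round(len(ranked) * fraction))
    upper, lower = ranked[:k], ranked[-k:]
    results = []
    for i in range(len(responses[0])):
        # Step 4: tally the correct answers in each group.
        pug = sum(paper[i] for paper in upper) / k
        plg = sum(paper[i] for paper in lower) / k
        results.append({
            "item": i + 1,
            "difficulty": (pug + plg) / 2,   # Step 5: DF = (PUG + PLG) / 2
            "discrimination": pug - plg,     # Step 6: Di = PUG - PLG
        })
    return results

# Hypothetical 1/0 scores of 10 students on 3 items.
papers = [
    [1, 1, 1], [1, 1, 0], [1, 0, 1], [1, 1, 1], [0, 1, 0],
    [1, 0, 0], [0, 0, 0], [1, 1, 1], [0, 1, 0], [0, 0, 0],
]
for row in item_analysis(papers):
    print(row)
```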
VALIDITY OF A TEST
Validity refers to the appropriateness of score-based inferences or decisions made based on the students' test results; it is the extent to which a test measures what it is supposed to measure.
TYPES OF VALIDITY
1. Content Validity – a type of validation that refers to the relationship between a test and the instructional objectives; it establishes content so that the test measures what it is supposed to measure.
Things to remember about content validity:
a. The evidence of the content validity of your test is found in the Table of Specification.
b. This is the most important type of validity to you, as a classroom teacher.
c. There is no coefficient for content validity. It is determined judgmentally, not empirically.
2. Criterion-related Validity – a type of validation that refers to the extent to which scores from a test relate to theoretically similar measures. It is a measure of how accurately a student's current test score can be used to estimate a score on a criterion measure, like performance in courses, classes or another measurement instrument. For example, classroom reading grades should indicate similar levels of performance as Standardized Reading Test scores.
a. Predictive Validity – a type of validation that refers to the extent to which a person's current test results can be used to estimate accurately what that person's performance on another criterion, such as test scores, will be at a later time.
b. Concurrent Validity – a type of validation that requires the correlation of the predictor or concurrent measure with the criterion measure. Using this, we can determine whether a test is useful to us as a predictor or as a substitute (concurrent) measure. The higher the validity coefficient, the better the validity evidence of the test. In establishing concurrent validity evidence, no time interval is involved between the administration of the new test and the criterion or established test.
3. Construct Validity – a type of validation that refers to the extent to which a test measures a hypothetical and unobservable variable or quality such as intelligence, math achievement or performance anxiety. It is established through intensive study of the test or measurement instrument.
Reliability of a Test
Reliability refers to the consistency of measurement, that is, how consistent test results or other assessment results are from one measurement to another. We can say that a test is reliable when it yields practically the same scores when administered twice to the same group of students, with a reliability index of 0.50 or above. The reliability of a test can be determined by means of the Pearson Product-Moment Correlation Coefficient, the Spearman-Brown Formula and the Kuder-Richardson Formula.
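The three procedures named above can be sketched as small functions: Pearson r for paired scores (test-retest or parallel forms), the Spearman-Brown formula to step a split-half correlation up to the full test, and KR-20 for items scored right or wrong. All sample scores are hypothetical, and dividing by n (the population variance of total scores) in KR-20 is an assumed convention; some texts use n - 1.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation for paired scores (e.g. test and retest)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman_brown(r_half):
    """Whole-test reliability estimated from the correlation between two half-tests."""
    return 2 * r_half / (1 + r_half)

def kr20(responses):
    """Kuder-Richardson Formula 20; responses[s][i] is 1 (right) or 0 (wrong)."""
    n, k = len(responses), len(responses[0])
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n
    var = sum((t - mean_total) ** 2 for t in totals) / n  # population variance (assumed)
    if var == 0:
        raise ValueError("total scores have no variance")
    pq = 0.0
    for i in range(k):
        p = sum(row[i] for row in responses) / n  # proportion who got item i right
        pq += p * (1 - p)
    return (k / (k - 1)) * (1 - pq / var)

# Hypothetical data for illustration only.
first = [45, 35, 48, 60, 44]
second = [47, 36, 50, 58, 43]
print("test-retest r:", round(pearson_r(first, second), 2))
print("Spearman-Brown from r_half = 0.60:", round(spearman_brown(0.60), 2))
items = [[1, 1, 1, 0], [1, 0, 1, 0], [1, 1, 1, 1], [0, 0, 1, 0], [1, 1, 0, 1]]
print("KR-20:", round(kr20(items), 2))
```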
Measures of Central Tendency: a measure of central tendency is a single value that is used to identify the center of the data; it is thought of as the typical value in a set of scores. It tends to lie at the center when the scores are arranged from lowest to highest or vice versa. There are three commonly used measures of central tendency: the mean, the median and the mode.
The Mean
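The mean is the most common measure of center and is also known as the arithmetic average.
$$\text{Sample Mean} = \frac{\sum x}{n}$$
Example: for the ten scores 45, 35, 48, 60, 44, 39, 47, 55, 58 and 54,
$$\text{Mean} = \frac{\sum x}{n} = \frac{485}{10} = 48.5$$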
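The same computation in a few lines, shown both by hand (Σx / n) and with Python's standard `statistics.mean`, using the ten scores of the example:

```python
from statistics import mean

scores = [45, 35, 48, 60, 44, 39, 47, 55, 58, 54]
manual_mean = sum(scores) / len(scores)   # Σx / n
print(sum(scores), len(scores), manual_mean, mean(scores))
```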
Properties of Mean
1. Easy to compute
2. It may not be an actual observation in the data set
3. It can be subjected to numerous mathematical computations
4. Most widely used
5. Every score in the data set contributes to its value
6. It is easily affected by extreme values
7. Applied to interval-level data
The Median
The median is the point that divides the scores in a distribution into two equal parts when the scores are arranged according to magnitude, that is, from lowest to highest or from highest to lowest. If the number of scores is odd, the median is the middle score. If the number of scores is even, the median is the average of the two middle scores.
First, arrange the scores from lowest to highest. Since the number of cases is even, find the average of the two middlemost scores.
35, 39, 44, 45, 47, 48, 54, 55, 58, 60
$$\text{Median} = \frac{47 + 48}{2} = 47.5$$
The median is 47.5, which means that 50% of the scores fall below 47.5.
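The odd/even rule can be sketched directly and checked against Python's standard `statistics.median`, using the ten scores of the example:

```python
from statistics import median

def median_score(scores):
    ordered = sorted(scores)                        # arrange from lowest to highest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                                  # odd number of scores: the middle score
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2    # even: average of the two middle scores

scores = [45, 35, 48, 60, 44, 39, 47, 55, 58, 54]
print(median_score(scores), median(scores))
```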
Properties of Median
1. It is not affected by extreme values
2. It is applied to ordinal-level data
3. It is the middlemost score in the distribution
4. Most appropriate when there are extreme scores
The Mode
The mode refers to the score or scores that occur most frequently in the distribution. There are three classifications of mode: a) unimodal, a distribution that has only one mode; b) bimodal, a distribution of scores that has two modes; and c) multimodal, a score distribution that has more than two modes.
Properties of Mode
1. It is the score or scores that occur most frequently
2. Nominal average
3. It can be used for qualitative and quantitative data
4. Not affected by extreme values
5. It may not exist
Example 1. Find the mode of the scores of students in an algebra quiz: 34, 36, 45, 65, 34, 45, 55, 61, 34, 46
Mode = 34, because it appears three times. The distribution is called unimodal.
Example 2. Find the mode of the scores of students in an algebra quiz: 34, 36, 45, 61, 34, 45, 55, 61, 34, 45
Mode = 34 and 45, because both appear three times. The distribution is called bimodal.
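A sketch that finds the mode(s) with `collections.Counter` and applies the unimodal/bimodal/multimodal labels above to the two quiz examples. Treating a set where every score occurs only once as having no mode follows property 5; other conventions exist.

```python
from collections import Counter

def modes_of(scores):
    counts = Counter(scores)
    top = max(counts.values())
    if top == 1:
        return [], "no mode"           # every score occurs once: the mode does not exist
    modes = sorted(s for s, c in counts.items() if c == top)
    kind = {1: "unimodal", 2: "bimodal"}.get(len(modes), "multimodal")
    return modes, kind

quiz1 = [34, 36, 45, 65, 34, 45, 55, 61, 34, 46]
quiz2 = [34, 36, 45, 61, 34, 45, 55, 61, 34, 45]
print(modes_of(quiz1))
print(modes_of(quiz2))
```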
Measures of Variability
Measures of Variability: a measure of variability is a single value that is used to describe how spread out the scores in a distribution are above or below the measure of central tendency. There are three commonly used measures of variability: the range, the quartile deviation and the standard deviation.
The Range
Range is the difference between highest and lowest score in the data set.
R=HS-LS
Properties of Range
1. Simplest and crudest measure
2. A rough measure of variation
3. The smaller the value, the closer the scores are to each other; the higher the value, the more scattered the scores are.
4. The value fluctuates easily: if either the highest score or the lowest score changes, the range changes.
Example: The scores of 10 students in Mathematics and Science are given below. Find the range of each set of scores. Which subject has greater variability?
Mathematics   Science
35            35
33            40
45            25
55            47
62            55
34            35
54            45
36            57
47            39
40            52
Mathematics: HS = 62, LS = 33, R = HS - LS = 62 - 33 = 29
Science: HS = 57, LS = 25, R = HS - LS = 57 - 25 = 32
Based on the computed values of the range, the scores in Science have greater variability. That is, the scores in Science are more scattered than the scores in Mathematics.
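The range comparison can be checked with a short sketch over the two columns of the table:

```python
math_scores = [35, 33, 45, 55, 62, 34, 54, 36, 47, 40]
science_scores = [35, 40, 25, 47, 55, 35, 45, 57, 39, 52]

def score_range(scores):
    return max(scores) - min(scores)   # R = HS - LS

for subject, scores in [("Mathematics", math_scores), ("Science", science_scores)]:
    print(subject, "HS =", max(scores), "LS =", min(scores), "R =", score_range(scores))
```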
The Quartile Deviation
Quartile deviation is half of the difference between the third quartile (Q3) and the first quartile (Q1). It is based on the middle 50% of the scores instead of the range of the entire distribution. In symbols:
$$QD = \frac{Q_3 - Q_1}{2} = \frac{50.25 - 25.4}{2} \approx 12.4$$
The value QD = 12.4 indicates the distance we need to go above or below the median to include approximately the middle 50% of the scores.
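The data behind Q3 = 50.25 and Q1 = 25.4 are not given, so the sketch below applies the formula to the ten scores used in the mean and median examples instead. It uses Python's standard `statistics.quantiles`; quartile conventions differ between textbooks and software, and the `method` argument shows how the result depends on that choice.

```python
from statistics import quantiles

def quartile_deviation(scores, method="exclusive"):
    """QD = (Q3 - Q1) / 2, with quartiles from statistics.quantiles."""
    q1, _q2, q3 = quantiles(scores, n=4, method=method)
    return (q3 - q1) / 2

scores = [45, 35, 48, 60, 44, 39, 47, 55, 58, 54]
print(quartile_deviation(scores))                      # "exclusive" interpolation
print(quartile_deviation(scores, method="inclusive"))  # a different convention
print((50.25 - 25.4) / 2)                              # the worked values in the text
```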
The standard deviation
The standard deviation is the most important and useful measure of variation; it is the square root of the variance. It is an average of the degree to which each score in the distribution deviates from the mean value. It is a more stable measure of variation than the range and the quartile deviation because it involves all the scores in a distribution.
$$SD = \sqrt{\frac{\sum (x - \text{mean})^2}{n - 1}}$$
x     (x - mean)^2
45    12.25
35    182.25
48    0.25
60    132.25
44    20.25
39    90.25
47    2.25
55    42.25
58    90.25
54    30.25
Σx = 485    Σ(x - mean)^2 = 602.5
n = 10
$$\text{Mean} = \frac{\sum x}{n} = \frac{485}{10} = 48.5$$
$$SD = \sqrt{\frac{\sum (x - \text{mean})^2}{n - 1}} = \sqrt{\frac{602.5}{10 - 1}} = \sqrt{66.944} \approx 8.18$$
This means that, on average, the scores deviate from the mean value of 48.5 by 8.18.
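The hand computation can be mirrored step by step (sum of squared deviations, divide by n - 1, take the square root) and checked against Python's standard `statistics.stdev`, which also uses n - 1:

```python
from math import sqrt
from statistics import stdev

scores = [45, 35, 48, 60, 44, 39, 47, 55, 58, 54]
mean = sum(scores) / len(scores)
ss = sum((x - mean) ** 2 for x in scores)   # Σ(x - mean)^2
sd = sqrt(ss / (len(scores) - 1))           # divide by n - 1, then take the square root
print(mean, ss, round(sd, 2), round(stdev(scores), 2))
```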
Example 2: Find the standard deviation of the scores of the 10 students below. In which subject is the variability greater?
Mathematics   Science
35            35
33            40
45            25
55            47
62            55
34            35
54            45
36            57
47            39
40            52
Mathematics:
$$\text{Mean} = \frac{441}{10} = 44.1$$
$$SD = \sqrt{\frac{936.9}{10 - 1}} = \sqrt{104.1} \approx 10.20$$
Science:
$$\text{Mean} = \frac{430}{10} = 43$$
$$SD = \sqrt{\frac{918}{10 - 1}} = \sqrt{102} \approx 10.10$$
The standard deviation of the Mathematics scores is 10.20 and the standard deviation of the Science scores is 10.10, which means that the Mathematics scores have greater variability than the Science scores. In other words, the scores in Mathematics are more scattered than those in Science.
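The two-subject comparison can be checked with `statistics.mean` and `statistics.stdev` (sample standard deviation, n - 1 in the denominator):

```python
from statistics import mean, stdev

subjects = {
    "Mathematics": [35, 33, 45, 55, 62, 34, 54, 36, 47, 40],
    "Science": [35, 40, 25, 47, 55, 35, 45, 57, 39, 52],
}
sds = {name: stdev(scores) for name, scores in subjects.items()}
for name, scores in subjects.items():
    print(name, "mean =", mean(scores), "SD =", round(sds[name], 2))
print("Greater variability:", max(sds, key=sds.get))
```

Note that the range and the standard deviation disagree here about which subject is more variable; the standard deviation uses every score, while the range uses only the two extremes.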
Interpretation of Standard Deviation
When the value of the standard deviation is large, on average, the scores will be far from the mean. On the other hand, if the value of the standard deviation is small, on average, the scores will be close to the mean.
Coefficient of Variation
Coefficient of variation is a measure of relative variation expressed as a percentage of the arithmetic mean. It is used to compare the variability of two or more sets of data even when the observations are expressed in different units of measurement. The coefficient of variation can be solved using the formula
$$CV = \frac{SD}{\text{Mean}} \times 100\%$$
The lower the value of the coefficient of variation, the more closely the data cluster around the mean, that is, the more homogeneous the performance of the group.
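A minimal sketch of the CV formula, applied to the Group A and Group B figures (mean 87, SD 8.5; mean 90, SD 10.25) from the example in this section:

```python
def coefficient_of_variation(sd: float, mean: float) -> float:
    """CV = SD / mean x 100%, returned as a percentage."""
    return sd / mean * 100

groups = {"A": (87, 8.5), "B": (90, 10.25)}   # group: (mean, standard deviation)
cv = {g: coefficient_of_variation(sd, m) for g, (m, sd) in groups.items()}
for g, value in cv.items():
    print(f"Group {g}: CV = {value:.2f}%")
print("More homogeneous:", min(cv, key=cv.get))
```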
Example: A study showed the performance of two groups, A and B, in a certain test given by a researcher. Group A obtained a mean score of 87 points with a standard deviation of 8.5 points, while Group B obtained a mean score of 90 points with a standard deviation of 10.25 points. Which of the two groups has a more homogeneous performance?
Group   Mean   Standard deviation
A       87     8.5
B       90     10.25
$$CV_{\text{Group A}} = \frac{8.5}{87} \times 100\% \approx 9.77\%$$
$$CV_{\text{Group B}} = \frac{10.25}{90} \times 100\% \approx 11.39\%$$
The CV of Group A is 9.77% and the CV of Group B is 11.39%, which means that Group A has the more homogeneous performance.
Percentile Rank
The percentile rank of a score is the percentage of the scores in the frequency distribution that are lower than that score, that is, the percentage of the examinees in the norm group who scored below the score of interest. Percentile ranks are commonly used to clarify the interpretation of scores on standardized tests.
Z-SCORE
The z-score (also known as the standard score) measures how many standard deviations an observation is above or below the mean. A positive z-score gives the number of standard deviations a score is above the mean, and a negative z-score gives the number of standard deviations a score is below the mean.
Example: In what subject did James Mark perform best? In what subject did he perform very poorly? His grades, with the mean and standard deviation of each subject, were: Math Analysis 95 (mean 88, SD 10); Natural Science 80 (mean 85, SD 5); Labor Management 94 (mean 92, SD 7.5).
$$z = \frac{x - \text{mean}}{SD}$$
$$z_{\text{math analysis}} = \frac{95 - 88}{10} = 0.70$$
$$z_{\text{natural science}} = \frac{80 - 85}{5} = -1$$
$$z_{\text{labor management}} = \frac{94 - 92}{7.5} \approx 0.27$$
James Mark had a grade in Math Analysis that was 0.70 standard deviation above the mean of the Math Analysis grades, while in Natural Science he was 1.0 standard deviation below the mean of the Natural Science grades. He also had a grade in Labor Management that was 0.27 standard deviation above the mean of the Labor Management grades. Comparing the z-scores, James Mark performed best in Math Analysis, while he performed very poorly in Natural Science in relation to the group performance.
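The z-score comparison can be sketched as follows, with each subject's grade, mean and standard deviation taken from the computations above:

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Subject: (James Mark's grade, subject mean, subject SD)
grades = {
    "Math Analysis": (95, 88, 10),
    "Natural Science": (80, 85, 5),
    "Labor Management": (94, 92, 7.5),
}
z = {subject: z_score(*values) for subject, values in grades.items()}
for subject, value in z.items():
    print(f"{subject}: z = {value:.2f}")
print("Best:", max(z, key=z.get), "| Poorest:", min(z, key=z.get))
```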
T-score
A T-score can be obtained by multiplying the z-score by 10 and adding 50 to the product. In symbols, T-score = 10z + 50.
Using the same example, compute the T-scores of James Mark in Math Analysis, Natural Science and Labor Management.
T-score (math analysis) = 10(0.70) + 50 = 57
T-score (natural science) = 10(-1) + 50 = 40
T-score (labor management) = 10(0.27) + 50 = 52.7
Since the highest T-score is in Math Analysis (57), we can conclude that James Mark performed better in Math Analysis than in Natural Science and Labor Management.
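The T-score conversion is a one-line linear transformation of z; this sketch applies it to the three z-scores of the example:

```python
def t_score(z: float) -> float:
    return 10 * z + 50   # T = 10z + 50

z_scores = {"Math Analysis": 0.70, "Natural Science": -1.0, "Labor Management": 0.27}
t = {subject: t_score(value) for subject, value in z_scores.items()}
for subject, value in t.items():
    print(f"{subject}: T = {value:.1f}")
```

Because the transformation is linear, the T-scores rank the subjects in the same order as the z-scores; T-scores simply avoid negative values and decimals near zero.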
Stanine
Stanine, also known as standard nine, is a simple type of normalized standard score that illustrates the process of normalization. Stanines are single-digit scores ranging from 1 to 9.
The distribution of new scores is divided into nine parts.
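The text does not give a conversion rule, so this sketch assumes a common approximation: stanines have a mean of 5 and a standard deviation of 2, so stanine ≈ 2z + 5, rounded to the nearest whole number and limited to 1-9. Methods that instead assign fixed shares of scores (4%, 7%, 12%, 17%, 20%, 17%, 12%, 7%, 4% for stanines 1 to 9) can give slightly different results.

```python
import math

def stanine(z: float) -> int:
    """Approximate stanine from a z-score: round 2z + 5 half-up, clamp to 1..9."""
    s = math.floor(2 * z + 5 + 0.5)
    return max(1, min(9, s))

for subject, z in [("Math Analysis", 0.70), ("Natural Science", -1.0), ("Labor Management", 0.27)]:
    print(subject, stanine(z))
```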
Skewness
Skewness describes the degree to which the distribution of the data departs from symmetry. The degree of skewness is measured by the coefficient of skewness, denoted as SK and computed as
$$SK = \frac{3(\text{mean} - \text{median})}{SD}$$
Normal curve is a symmetrical, bell-shaped curve whose end tails are continuous and asymptotic. The mean, median and mode are equal. The scores are symmetrically distributed, as in the normal curve, if the computed value of SK = 0.
Positively skewed: the curve is skewed to the right; it has a long tail extending off to the right but a short tail to the left. It indicates the presence of a small proportion of relatively large extreme values (SK > 0). When the computed value of SK is positive, most of the students' scores are very low, meaning that they performed poorly in the examination.
Negatively skewed: the distribution is skewed to the left; it has a long tail extending off to the left but a short tail to the right. It indicates the presence of a small proportion of relatively small extreme values (SK < 0). When the computed value of SK is negative, most of the students got very high scores, meaning that they performed very well in the examination.
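The SK formula can be sketched with the standard `statistics` module. Using the sample standard deviation (n - 1) is an assumption chosen to match the SD section; the sample data are the ten scores from the mean, median and SD examples.

```python
from statistics import mean, median, stdev

def pearson_skewness(scores):
    """SK = 3(mean - median) / SD, with the sample SD (n - 1)."""
    return 3 * (mean(scores) - median(scores)) / stdev(scores)

def describe(sk, tol=1e-9):
    if abs(sk) <= tol:
        return "symmetrical"
    return "positively skewed" if sk > 0 else "negatively skewed"

scores = [45, 35, 48, 60, 44, 39, 47, 55, 58, 54]
sk = pearson_skewness(scores)
print(round(sk, 2), describe(sk))
```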
Rubrics
A rubric is a scoring scale and instructional tool used to assess the performance of students using a task-specific set of criteria. It contains two essential parts: the criteria for the task and the levels of performance for each criterion. It provides teachers an effective means of student-centered feedback and evaluation of the work of students. It also enables teachers to provide detailed and informative evaluations of students' performance.
Rubrics are very important, especially when measuring the performance of students against a set of standards or a predetermined set of criteria. Through the use of scoring rubrics, teachers can determine the strengths and weaknesses of the students, which in turn enables the students to develop their skills.
Types of Rubrics
1. Holistic Rubrics
A holistic rubric does not list separate levels of performance for each criterion. Rather, a holistic rubric assigns a level of performance along multiple criteria as a whole; in other words, all the components are put together.
Advantage: quick scoring; provides an overview of student achievement.
Disadvantage: does not provide detailed information about student performance in specific areas of the content and skills; it may be difficult to provide one overall score.
2. Analytic Rubrics
In an analytic rubric, the teacher or rater identifies and assesses the components of a finished product. The final product is broken down into component parts, and each part is scored independently. The total score is the sum of the ratings for all the parts that are to be assessed or evaluated. In analytic scoring, it is very important for the rater to treat each part separately to avoid bias toward the whole product.
Advantage: more detailed feedback; scoring is more consistent across students and graders.
Disadvantage: time-consuming to score.
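Analytic scoring can be sketched as a rubric with its two essential parts: criteria, each with its own levels of performance. Each criterion is rated independently and the ratings are summed. The criteria, level labels and 4-point scale below are hypothetical.

```python
# Hypothetical analytic rubric: each criterion has its own levels of performance.
RUBRIC = {
    "Content": {4: "Excellent", 3: "Good", 2: "Fair", 1: "Poor"},
    "Organization": {4: "Excellent", 3: "Good", 2: "Fair", 1: "Poor"},
    "Delivery": {4: "Excellent", 3: "Good", 2: "Fair", 1: "Poor"},
}

def analytic_score(ratings: dict) -> int:
    """Score each criterion independently, then sum the ratings."""
    if set(ratings) != set(RUBRIC):
        raise ValueError("every criterion must be rated exactly once")
    for criterion, level in ratings.items():
        if level not in RUBRIC[criterion]:
            raise ValueError(f"{level} is not a level of {criterion!r}")
    return sum(ratings.values())

score = analytic_score({"Content": 4, "Organization": 3, "Delivery": 2})
maximum = sum(max(levels) for levels in RUBRIC.values())
print(f"Total: {score} out of {maximum}")
```

A holistic rubric, by contrast, would map the whole performance to a single level, so there would be one rating instead of one per criterion.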
The final step in performance assessment is to assess and score the student's performance. To assess the performance of the students, the evaluator can use the checklist approach, the narrative or anecdotal approach, the rating scale approach or the memory approach. The evaluator can give feedback on a student's performance in the form of a narrative report or a grade. There are different ways to record the results of performance-based assessments.
1. Checklist Approach: checklists are observation instruments that divide a performance into elements that are either present or not present. The teacher has to indicate only whether or not certain elements are present in the performance.
2. Narrative/Anecdotal Approach: a continuous description of student behavior as it occurs, recorded without judgment or interpretation. The teacher writes narrative reports of what was done during each of the performances. From these reports, teachers can determine how well their students met their standards.
3. Rating Scale Approach: a checklist that allows the evaluator to record information on a scale, noting finer distinctions than just the presence or absence of a behavior. The teacher then indicates to what degree the standards were met. Usually, teachers use a numerical scale. For instance, a teacher may rate each criterion on a scale of one to five, with one meaning "skill barely present" and five meaning "skill extremely well executed."
4. Memory Approach: the teacher observes the students performing the tasks without taking any notes and uses the information from memory to determine whether or not the students were successful. This approach is not recommended for assessing the performance of the students.
PORTFOLIO ASSESSMENT
Portfolio assessment is the systematic, longitudinal collection of student work created in response to specific, known instructional objectives and evaluated in relation to the same criteria. A student portfolio is a purposeful collection of student work that exhibits the student's efforts, progress and achievements in one or more areas. The collection must include student participation in selecting contents, the criteria for selection, the criteria for judging merit and evidence of student self-reflection.
1. Working Portfolio
The working portfolio may be used to diagnose student needs. In it, both student and teacher have evidence of the student's strengths and weaknesses in achieving learning objectives, information that is extremely useful in designing future instruction.
2. Showcase Portfolio
Showcase portfolio is the second type of portfolio and is also known as the best works portfolio or display portfolio. This kind of portfolio focuses on the student's best and most representative work and exhibits the best performance of the student. A best works portfolio may document student activities beyond school, for example, a story written at home. It is just like an artist's portfolio, where a variety of work is selected to reflect breadth of talent; painters, for example, exhibit their best paintings. Hence, in this portfolio the student selects what he or she thinks is representative work. This folder is most often seen at open houses and parent visitations.
The most rewarding use of student portfolios is the display of the students' best work, the work that makes them proud. In this case, it encourages self-assessment and builds students' self-esteem. The pride and sense of accomplishment that students feel make the effort well worthwhile and contribute to a culture for learning in the classroom.
3. Progress Portfolio
The third type of portfolio is the progress portfolio, also known as the Teacher Alternative Assessment Portfolio. It contains examples of the student's work of the same types done over a period of time, which are used to assess the student's progress.
All the works of the students in this type of portfolio are scored, rated, ranked or evaluated. Teachers can keep individual student portfolios that are solely for the teacher's use as an assessment tool. This is a focused type of portfolio and is a model approach to assessment.
Assessment portfolios are used to document student learning on specific curriculum outcomes and to demonstrate the extent of mastery in any curricular area.
Uses of Portfolios
1. Portfolios can provide both formative and summative opportunities for monitoring progress toward reaching identified outcomes.
2. Portfolios can communicate concrete information about what is expected of students in terms of the content and quality of performance in specific curriculum areas.
3. Portfolios allow students to document aspects of their learning that do not show up well in traditional assessments.
4. Portfolios are useful to showcase periodic or end-of-year accomplishments of students, such as poetry, reflections on growth and samples of best works.
5. Portfolios may also be used to facilitate communication between teachers and parents regarding their child's achievement and progress over a certain period of time.
6. Administrators may use portfolios for national competency testing, to grant high school credit and to evaluate education programs.
7. Portfolios may be assembled for a combination of purposes, such as instructional enhancement and progress documentation. A teacher reviews students' portfolios periodically and makes notes for revising instruction for the next year.
According to Mueller (2010), there are seven steps in developing student portfolios. Each step is stated below as a guiding question.
1. Purpose: What is the purpose of the portfolio?
2. Audience: For what audience will the portfolio be created?
3. Content: What samples of student work will be included?
4. Process: What processes (e.g., selection of work to be included, reflection on work, conferencing) will be engaged in during the development of the portfolio?
5. Management: How will time and materials be managed in the development of the portfolio?
6. Communication: How and when will the portfolio be shared with pertinent audiences?
7. Evaluation: If the portfolio is to be used for evaluation, when and how should it be evaluated?