
Figure 2: Summary of the NAEP Item Scoring Process

Phase I. Scoring Guide Development and Pilot Scoring

Stage 1: Develop Scoring Guides
  Scoring guides are developed along with the items and refined during
  item reviews by:
    • Item development contractor
    • NAGB reviews
    • Standing committee (content experts)
    • State item review
    • Ongoing NCES reviews
  Pilot administration follows.

  Multiple-choice items are scanned and processed electronically, with
  quality control and validity checks.

Stage 2: Score Pilot Constructed-Response Items
  Scanned responses are scored by qualified and trained scorers using an
  electronic image-processing and scoring system. Pilot scoring guides
  and training packets are reviewed by NCES and the standing committee.

  Stage 2A: Refine Scoring Guides and Prepare Training Materials
    Content experts check student papers against scoring guides
    Refine scoring guides
    Select examples and practice papers for training packets

  Stage 2B: Score with Quality Control Checks
    Train scorers
    Ensure scorer quality
    Monitor scoring accuracy and reliability
      • Individual scorers (backreading)
      • Between scorers (inter-rater reliability; 25% of responses
        double scored)
    Submit data for item analysis

  Stage 2C: Conduct Scoring Debriefing and Document Potential
  Refinements to Items and Scoring Guides

Items are then selected for the operational assessment and refined
during further item reviews (NAGB reviews, standing committee, ongoing
NCES reviews). Reading and mathematics items move to a pre-calibration
administration; items in all other subjects move directly to the first
operational administration.
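The between-scorer check on double-scored responses can be sketched in code. The functions and the 0–3 rubric scores below are illustrative, not NAEP's actual statistics: a minimal version compares the two independent scores for exact agreement, and Cohen's kappa corrects that rate for chance agreement.

```python
from collections import Counter

def exact_agreement(first_scores, second_scores):
    """Fraction of double-scored responses on which both scorers
    assigned the same score category."""
    matches = sum(a == b for a, b in zip(first_scores, second_scores))
    return matches / len(first_scores)

def cohens_kappa(first_scores, second_scores):
    """Chance-corrected agreement between two scorers."""
    n = len(first_scores)
    p_observed = exact_agreement(first_scores, second_scores)
    # Expected chance agreement from each scorer's score distribution
    c1, c2 = Counter(first_scores), Counter(second_scores)
    p_chance = sum(c1[k] * c2[k] for k in c1) / (n * n)
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical first and second scores (0-3 rubric) on ten
# double-scored responses
first  = [3, 2, 2, 1, 0, 3, 2, 1, 1, 2]
second = [3, 2, 1, 1, 0, 3, 2, 1, 2, 2]
print(exact_agreement(first, second))  # 0.8
print(cohens_kappa(first, second))
```

A scoring operation would track these rates per item and retrain or remove scorers whose agreement falls below its quality threshold.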

Phase II. First Operational (or Pre-Calibration) Scoring

  Multiple-choice items are scanned and processed electronically, with
  quality control and validity checks.

Stage 3: Score First Operational (or Pre-Calibration)
Constructed-Response Items
  Scanned responses are scored by qualified and trained scorers using an
  electronic image-processing and scoring system. Operational scoring
  guides and training packets are reviewed by NCES and the standing
  committee.

  Stage 3A: Refine Scoring Guides and Prepare Training Materials
    Content experts check student papers against scoring guides
    Refine scoring guides
    Select examples and practice papers for training packets

  Stage 3B: Score with Quality Control Checks
    Train scorers
    Ensure scorer quality
    Monitor scoring accuracy and reliability
      • Individual scorers (backreading)
      • Between scorers (inter-rater reliability; 5–25% of responses
        double scored)
    Submit data for analysis

  Stage 3C: Archive Final Scoring Guides and Training Materials

Reading and mathematics then proceed to item pre-calibration, followed
by the first operational administration; all other subjects proceed to
scaling and reporting, followed by the second operational
administration.
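Drawing the 5–25% double-scoring sample can be sketched as a simple random selection of response identifiers. The function name, rate, and seed below are assumptions for illustration; the source does not specify how NAEP draws the sample.

```python
import random

def sample_for_double_scoring(response_ids, rate, seed=0):
    """Randomly select a fixed share of responses to be scored a
    second time by a different scorer. `rate` is a fraction, e.g.
    0.25 for 25%."""
    rng = random.Random(seed)  # fixed seed for a reproducible sample
    k = max(1, round(len(response_ids) * rate))
    return rng.sample(response_ids, k)

# Hypothetical batch of 1,000 scanned responses, 25% double scored
ids = list(range(1000))
double_scored = sample_for_double_scoring(ids, 0.25)
print(len(double_scored))  # 250
```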

Phase III. Subsequent Operational Scoring

  Multiple-choice items are scanned and processed electronically, with
  quality control and validity checks.

Stage 4: Score Operational Administration Constructed-Response Items
(Trend Scoring)
  Scanned responses are scored by qualified and trained scorers using an
  electronic image-processing and scoring system.

  Stage 4A: Score with Quality Control Checks
    Train scorers
    Ensure scorer quality
    Calibrate scoring
      • Qualify scorers for consistency with scoring in previous years
    Monitor scoring accuracy and reliability
      • Individual scorers (backreading)
      • Between scorers (inter-rater reliability; 5–25% of responses
        double scored)
    Monitor across-year scoring consistency
      • Meet criteria for trend
    Submit data for analysis

  Stage 4B: Update Documentation on Scoring and Training After Each
  Administration

Scaling and reporting follow.
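Qualifying scorers for consistency with previous years can be sketched as rescoring a calibration set of archived papers and requiring a minimum agreement with the scores of record. The 80% threshold and the sample scores below are illustrative assumptions, not NAEP's actual criterion.

```python
def qualifies_for_trend(new_scores, archived_scores, threshold=0.8):
    """Return True if a scorer's scores on a calibration set of
    previous-year papers agree with the archived scores of record
    at or above `threshold` (an illustrative cutoff, not NAEP's
    actual criterion)."""
    matches = sum(a == b for a, b in zip(new_scores, archived_scores))
    return matches / len(archived_scores) >= threshold

# Hypothetical calibration set of eight previous-year papers
archived = [2, 3, 1, 1, 0, 2, 2, 1]   # scores of record
candidate = [2, 3, 1, 1, 0, 2, 1, 1]  # this year's scorer
print(qualifies_for_trend(candidate, archived))  # True (7/8 agreement)
```

Scorers who fall short would be retrained and requalified before scoring live responses, so that year-to-year score trends reflect students rather than drift in scoring.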

April 2005