Nothing is more central or more basic to a music department than playing and
singing music. Although no one would wish to diminish the critical importance of
music theory, music history, music education, entrepreneurship and other vital
components of a healthy and vibrant music department, it is unlikely that students will
get very far in their discipline unless they are actively engaged with making music. Yet it is surprisingly difficult to accurately measure and assess students' progress in the applied studio, and the applied jury remains the standard tool used to assess private instruction. What exactly does the private lesson jury measure? Does it measure whether a student has met the expectations of the syllabus, in the manner of every other class in the university, or whether she has met the subjective expectations of her applied teacher on matters of taste and preference? Or, consider the difficulty in assessing repertoire of varying difficulty: should a first-year student who performs a simple piece well on her instrument earn the same grade as another first-year student who is performing a more difficult concerto movement? Questions like these highlight the need for a more reliable and consistent approach to jury assessment.
This article presents one approach to tackling some of these issues through a department-wide calibration exercise in which small groups of faculty worked together while watching and grading video-recorded juries. We felt that this process would create meaningful dialogue among our performance faculty, develop common standards across applied areas, help to develop a shared vocabulary about measuring standards, and generate useful assessment data that would help our programs.
Assessment of applied study has long remained clouded by professor bias on one end and, on the other, by the question of what is being measured: student achievement, student progress, or both. How do we remove these confounding issues and move toward objectivity and precision? The question remains whether there exists a calibration process that can remove professor bias and accurately measure student performance achievement while also accounting for student progress.
The National Association of Schools of Music (1999) suggested that competence on at least one major performing medium should be expected of students pursuing any baccalaureate degree in music. Since the applied jury remains the primary means of assessing that competence (Lebler, Carey & Scott, 2014), the challenge is to create a method that is a reliable assessment of both the performance and student achievement. The potential for subjectivity confounds the reliability of any performance assessment (Ciorba & Smith, 2009). Fiske (1983) noted the need to establish reliable evaluation criteria, and twenty years later Asmus (2003) established that reliable performance assessment depends on clearly defined measurement strategies. Wesolowski (2012) also emphasized the importance of clearly defining the criteria to both judges and students, while Lebler, Carey, and Scott (2014) extended these recommendations from policy into practice.
There has been an increase in applying assessment criteria in the area of fine arts, and through the use of rubrics, those assessing musical performance are moving closer towards more equitable and useful results. Parkes (2010) and Mintz (2015) supported the importance of criteria-based assessment as a critical link in the teaching and learning process. Rubrics provide specific advantages when used to assess music performances; a key element of a music performance rubric is its descriptors for what a performance is like within the full range of proficiency levels. Gordon (2002) supported this claim, finding that the more descriptors included for each dimension, the more reliable the rubric becomes, as long as that number does not exceed five.
Bergee (2003) tested the reliability and validity of specific "criteria rating scales," or rubrics. His findings supported the concept that the criteria helped applied music faculty grade more consistently in the jury setting, especially when they had access to a specific tool rather than just commenting on a broader impression (Bergee, 2003). He found that when faculty used a rating scale, the feedback provided to performers was more accurate and balanced. Ciorba and Smith (2009) likewise suggested that a multidimensional assessment rubric is a good way to achieve this consistency across juries.
Rubric Creation
Rubrics have proven effective across content areas both for the purpose of assessment and for clarifying expectations for students (Ciorba & Smith, 2009). A rubric can define the difference between levels of achievement so that the jury process can be more fair and useful to students (Fiske, 1983). It is also important that rubric criteria are understandable to faculty. Fiske (1983) found greater consistency when evaluation criteria were developed and shared amongst faculty. A study by Ciorba and Smith (2009) showed high inter-judge reliability when a faculty panel developed and shared a common rubric across all performance areas. This was especially true when the rubric contained clearly defined criteria.
Professor bias must also be minimized for student scores to be reliable. Chase, Ferguson, and Hoey (2014) encouraged the use of highly detailed rubrics; in their work, rater training, the development of exemplars, and a clear benchmarking system proved effective in minimizing rater bias.
Assessment Models
Music research within the last twenty years demonstrates increased reliability
when using rubrics (Asmus, 1999; Bergee, 2003; Ciorba & Smith, 2009; Wesolowski,
2012; Fuller, 2014; Mintz, 2015). Researchers agree that consistent, clear criteria embedded within the rubrics provide the highest rate of reliability in performance assessment.
The exemplary assessment process used for voice students at Brigham Young
University (BYU) is described in detail by Clayne Robison in his book, Beautiful Singing
(2001). Robison described the systematic way the voice faculty track student progress
from one semester to another, from one year to another, and from one teacher to another. It is based around the "Voice Progress Score Chart" (VPS), a simple 1-5 rubric with the additional option to add a "+" or "-" to the score. At every audition, every jury, and every solo performance, students are given a VPS that is tallied each semester and tracked across their time in the program.
Figure 1—The Voice Progress Score Chart from Beautiful Singing (2001) by Clayne Robison
Rating 5: In a fully professional setting (e.g., a leading role in a regional professional company), this performance would have received favorable press reviews and a significant 'bravo' response from the audience. (None of us has yet given a 5 in an audition.)

Rating 4: In a featured university setting (e.g., a leading role in a major opera, oratorio, or music theatre production with orchestra), this performance would have been completely successful. I would enjoy hearing this student sing for an hour-long senior recital.

Rating 3: In a modest university public performance setting (e.g., a secondary role in an opera, oratorio, or musical theater production with orchestra), this performance would have been successful. I would enjoy hearing this student sing for half an hour in a junior recital.

Rating 2: In a university classroom performance setting (e.g., in an opera scenes class or a short recital with piano), this performance would have been satisfactory. This student's technique is sufficiently solid to permit concentration on character projection.
Robison identified several benefits of the VPS system. One of them was helping to alleviate bias and potential conflicts of interest: the voice faculty began attending opera auditions and giving each student a Voice Progress Score, and Robison then based his casting on an overall score instead of his personal impressions alone.
"The best assessment methods are those that enable us to connect the dots,
adjustments to facilitate their learning" (Chase, Ferguson, & Hoey, 2014, pg. 70). The
VPS allows faculty to "connect the dots" from semester to semester, assessing students
on a professional scale that is not based on their progress from the previous semester,
but based on a professional standard, separate from the jury rubric criteria.
For the purpose of testing the reliability of using a rubric to score choral music festival assessment, Latimer, Bergee, and Cohen (2010) evaluated a large-group festival given by the Kansas State High School Activities Association. They examined four questions of reliability, asking among other things whether the rubric resulted in internally consistent scores; what the correlation was between performance dimension scores, global scores, and ratings; and what level of pedagogical utility adjudicators and directors perceived the rubric to have, along with the changes they recommended. Their conclusions demonstrated good internal consistency when compared to other, non-rubric forms (Latimer, Bergee, & Cohen, 2010).
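For departments that want to run a similar internal-consistency check on their own jury data, the sketch below computes Cronbach's alpha across rubric dimensions. This is a minimal illustration with invented placeholder scores, not the study's data or exact procedure; by convention, values around 0.8 or above are usually read as good internal consistency.

```python
# A minimal sketch of an internal-consistency check on jury rubric scores.
# The data below are invented placeholders, not figures from the study.
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha; rows are performances, columns are rubric dimensions."""
    k = scores.shape[1]                          # number of rubric dimensions
    item_vars = scores.var(axis=0, ddof=1)       # variance of each dimension
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of the total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical scores for five juries across four rubric dimensions (0-5).
juries = np.array([
    [5, 4, 4, 5],
    [3, 3, 2, 3],
    [4, 4, 5, 4],
    [2, 1, 2, 2],
    [5, 5, 4, 4],
])
print(f"Cronbach's alpha: {cronbach_alpha(juries):.2f}")  # ~0.95 for this sample
```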
Our own department faced the same challenge of measuring progress toward our goals. We have collected data for years in academic areas within the music field (e.g., history, theory, composition), but struggled to find a way to generate meaningful assessment data about applied studies. For this reason we set four goals: 1) create meaningful dialogue among our performance faculty about standards, 2) find common ground across applied areas about measuring these standards, 3) develop a shared vocabulary that we can employ across applied areas, and 4) generate useful assessment data that would help our programs. We centered our assessment day around videos of student juries held a few days before. Our faculty watched the videos, trying to justify the scoring of a previous jury. The entire day was devoted to this calibration exercise.
Pre-Workshop Planning
Our assessment activity was relatively compact, lasting four hours on a single
day. Advanced planning was needed to maintain efficiency and ensure the success of
the day’s activities. The department agreed to video record the juries in a standardized
manner, develop a set of scoring rubrics for each applied area, store these two
documents (the video and the marked rubric) for the duration of the student’s tenure in
the department, and make both of these documents available for quick retrieval by the teacher and student. We also wanted to avoid poor-quality recordings that might bias the outcome, so we selected some small, inexpensive video cameras that could record in HD and stereo sound onto large SD cards. The cameras were extremely simple to operate, and their setup and use was quick, efficient, and non-threatening. The only responsibility at the end of the jury session was for one instructor to return the camera to the music office, where an assistant would copy and upload the video files.
Of critical importance to the entire process was ensuring that all videos and
rubrics went into secure storage that guaranteed the student’s privacy and allowed
sharing between the teacher and student. The storage had to have the capacity to securely hold large video files, and we settled on an online storage solution that provided us with almost unlimited storage capacity. The
caveat with online storage is that video files in HD are quite large and require time to
upload to cloud storage across HTTP. We surmounted this obstacle by setting up a queue and letting the files upload at night. The following morning, the departmental assistant
emailed links to the jury video and the scored and scanned rubrics to the students so
that they could review and discuss them with their applied instructors.
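The sketch below illustrates this overnight workflow. The directory names are invented for the example, and the local file copy merely stands in for whatever upload client the chosen storage provider offers; scheduling is left to something like cron so that the transfer runs at night.

```python
# A sketch of the overnight upload queue described above. Directory names
# are illustrative; shutil.copy stands in for the real cloud upload call.
import shutil
from pathlib import Path

VIDEO_DIR = Path("jury_videos")   # files copied from the cameras' SD cards
CLOUD_DIR = Path("cloud_mount")   # stand-in for the secure online storage

def upload_overnight() -> list[Path]:
    """Send the day's HD jury videos to storage one file at a time."""
    if not VIDEO_DIR.exists():
        return []
    CLOUD_DIR.mkdir(exist_ok=True)
    uploaded = []
    for video in sorted(VIDEO_DIR.glob("*.mp4")):
        shutil.copy(video, CLOUD_DIR / video.name)  # replace with a real upload
        uploaded.append(video)
    return uploaded

if __name__ == "__main__":
    # Run from a scheduler (e.g., a late-night cron job) so uploads finish by morning.
    for sent in upload_overnight():
        print(f"uploaded {sent.name}")
```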
For the workshop itself, each group received a flash drive with all of the videos and scanned jury rubrics that the group would need copied onto it. The videos and rubrics were put on the flash drives ahead of time by the department chair, who also made sure that every group had a variety of juries to assess: some that demonstrated very high-level performance, some at a middle level, and also a few that were weaker. Each group met in a classroom with a video projector and a flash drive, and watched and scored at the same time.
Grading Rubrics
A crucial component of the assessment process was getting the music faculty to
start grading the applied lesson jury from a rubric instead of simply providing
qualitative comments and a grade. This shift in process and thinking required both the
consent of and participation by the faculty, and a willingness to change habits and routines. We began the shift towards grading rubrics a year in advance of actually implementing their use.
Faculty members in each performance area were responsible for developing a basic rubric that outlined what the area's faculty envisioned, what they valued in applied instruction, and what they wanted the student to demonstrate at the applied jury. We did not attempt
to standardize or drill down the language of the rubrics at this early stage but allowed
each area to freely explore format, language and scoring. What was important was
participation across the department: thinking about the rubrics and engaging with the assessment process. By the time of the assessment day and calibration exercise, all of the faculty had scored applied lesson juries twice using paper and pencil copies of the jury rubrics they had developed, and were familiar enough with the process that they were ready to look at it with a critical eye.
On the day itself, we worked to eliminate stress and to ensure that everyone was comfortable and ready to work. We broke the faculty participants into groups of three to four in such a way that no one graded the same jury that they had scored a few days earlier at the end-of-semester juries. We also ensured that no group contained two faculty members from the same musical discipline.
Each group began by watching a jury video and scoring it on the appropriate rubric while discussing the scoring as a group. Once the group scored the performance, they were then asked to look at the score they assigned and compare it to the actual jury score, discussing how the scores differed. This step was important because it asked each group to justify their score and to benchmark the manner in which the jury scored the performance.
The next step in the process was to score as many videos as possible without
discussion in a manner similar to that of a jury where every member of a jury scores
individually. We limited the scoring activity to an hour and a half in order to alleviate
fatigue. The process took about ten minutes per performance, so in the time allotted for
this activity, the groups were able to score seven to nine videos. The groups were next
instructed to open the original scored rubrics created a few days earlier in the actual
jury. The members of the group then compared their scores to each other as well as to
the original jury scores, keeping note of discrepancies. They were instructed to note if
they were able to accurately predict and replicate the jury grades.
The final activity asked each group to work through a set of discussion questions (see Figure 2). There were nine questions that were to be filled out by each group as a whole and submitted by one person in the group. The groups' responses then formed the basis for the post-workshop debrief held over lunch. We purposefully arranged the setting of our lunch to promote dialogue, and moved through the questionnaire group by group as we ate. In fact, everything throughout the day was designed to generate discussion among the faculty.

Figure 2—Group discussion questions
1. How would you rate the overall quality of the juries that you observed today? Do our
students seem to be doing good work?
2. If you were going to offer one piece of advice that would improve the jury process (not
necessarily the rubric) as you either observed it or experienced it, what would it be?
3. Of the juries that you were assigned, how many were you actually able to assess? How
many juries are actually practical in the time that we have available?
4. When you looked at the jury sheets/grades, were you able to consistently match the
jury’s assessment? Were your standards higher or lower? Was the scoring consistent?
5. Most of us also listen to/serve on juries during finals week. How did the rubric that you
looked at today differ from the one that you use in your own area?
6. What did you see on the rubric that you used today that you thought was a good idea
and would consider adopting for your own area?
7. If you were going to offer one improvement for the rubric that you used today what would
it be?
8. Did the rubric that you used today actually assess all of the areas that you felt needed to
be assessed? Are there missing areas? Are there redundancies, unnecessary items or
inconsistencies?
9. If you were going to improve the assessment process that we did today, what would you
improve? What worked, what did not? How could we improve what we did today?
The debrief generated dozens of ideas for improving our work and process. In fact, there were far more ideas generated than could be reasonably developed and implemented over the course of the following year. However, an important part of the assessment cycle, and one of our stated goals, was generating useful data about the program and using it to find areas for improvement. To this end we identified five areas to concentrate on, each of which we felt we could accomplish within the following year:
1. Improve the language and descriptors of our jury rubrics
2. Standardize the rubric format across applied areas
3. Separate the jury grade from an assessment of the student's overall performance level
4. Move the rubrics from paper to a technology-based solution
5. Change the culture of the jury to function more like a performance and less like an exam
Generally, our faculty felt that our students were doing good work and were
pleased with the performances that they observed on the videos. However, our
assessment day workshop was the first time that our faculty had examined the jury
process itself and the mechanics of grading. Faculty also had strong feelings about how
standards differed across disciplines. To this end we agreed to work on our rubrics and
devoted discussion time over the next year towards improving them.
We took two existing rubrics (one originally created for academic papers and a second from winds and brass) and combined them into a new, standardized rubric (see Appendix). We liked the larger format of a legal-size rubric with wide columns and lots of room for comments. From the winds/brass rubric we liked the scoring columns and totals at the ends of rows, arranged in such a way that the jury grade was calculated from the rubric instead of assigned separately. Our new, standardized rubric used descriptors that described what we were hearing instead of simply offering qualitative statements such as "good" or "needs work."
The look and feel of the rubric provided a starting point for designing an online interface that implements the same behavior as the paper copy. That is, by selecting a
box in the rubric the instructor is generating a numerical score. The columns in the
rubric are labeled "Exemplary," "Proficient," "Developing," and "Initial," and each has below it a range of 3-4 numbers, so that a score must be selected within the rubric box.
Unfortunately, our faculty could not agree on a single way to score the rubrics. A
portion of our faculty wanted their area rubrics to score to 100, a move that runs counter to the research cited earlier suggesting that a rubric should remain simple and not contain more than five categories.
to scoring, we still wound up with several different scoring methods, and continue to
discuss and work towards a department-wide solution. The standardized rubric that
we include in the appendix offers one approach along with one scoring solution (see
Figure 3) that is a snapshot of where we are now. Our work in this area is ongoing.
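To make the scoring mechanics concrete, here is a sketch of how an online rubric can turn box selections into a grade, either as a raw total or rescaled to 100 for the areas that prefer a 100-point scale. The three item names come from the appendix rubric; treating the 100-point version as a simple rescaling of the 0-5 totals is an illustrative assumption, not settled departmental policy.

```python
# A sketch of computing a jury grade from rubric box selections. Rescaling
# to 100 is one possible compromise, shown here only for illustration.
ITEMS = [
    "Metrical and Rhythmical Accuracy",
    "Musicianship/Communication",
    "Appearance and Performance",
]

def jury_grade(selections: dict[str, int], out_of_100: bool = False) -> float:
    """Sum the number selected inside each rubric box (0-5 per item)."""
    for item, score in selections.items():
        if item not in ITEMS or not 0 <= score <= 5:
            raise ValueError(f"invalid selection: {item}={score}")
    total = sum(selections.values())
    return round(total / (5 * len(ITEMS)) * 100, 1) if out_of_100 else float(total)

# One hypothetical marked rubric.
marks = {
    "Metrical and Rhythmical Accuracy": 4,  # Proficient
    "Musicianship/Communication": 3,        # Proficient
    "Appearance and Performance": 5,        # Exemplary
}
print(jury_grade(marks))                   # 12.0: raw total, as on the paper rubric
print(jury_grade(marks, out_of_100=True))  # 80.0: rescaled for a 100-point area
```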
The final change to our rubrics was separating the jury grade from an assessment of the student's overall level as a performer. For the latter, we imagined a scale that ranged from "0," representing an absolute entry-level student, up to a student who is ready to begin embarking on a career as a performer. The intent was to avoid having the jury grade stand in for professional readiness. A first-semester freshman, who works hard and meets all of the expectations of her teacher and her teacher's syllabus, might still earn an "A" for the semester but would score near the bottom of the performance scale.
We adapted our new, separate rubric (see Figure 4) from the model developed by
Clayne Robison for the vocalists at Brigham Young University that was given as Figure
1. Dr. Robison's model (2001) was specifically tailored for vocalists who are planning
on a professional singing career, but we felt that we could adapt it for our own needs.
This rubric is scored only after the jury rubric has been graded. In this way, we try to ensure that the jury and semester grade are based on the syllabus and the studio's expectations, while the performance score reflects a separate, professional standard.
Figure 4—Our new Applied Performance Assessment Rubric, based on a model by Clayne Robison (2001)

Rating 1: Preliminary technical work is still needed before attempting any significant public performance. However, this student shows potential as a performer.

Rating 2: Preliminary technical work is still needed before attempting any significant public performance. However, this student should consider performance.

Rating 3: In a university classroom setting (e.g., a studio class or short, non-public recital with piano) this performance would have been almost satisfactory.

Rating 9: In a featured university setting (opera, oratorio, full recital, solo with the orchestra) this performance would have been mostly successful. I would enjoy a 50-minute recital with this performer.

Rating 10: In a featured university setting (opera, oratorio, full recital, solo with the orchestra) this performance would have been successful. I would enjoy a one-hour recital with this performer.

Rating 11: In a featured university setting (opera, oratorio, full recital, solo with the orchestra) this performance would have been completely successful. I would enjoy a one-and-a-half-hour recital with this performer.
The final observation offered by our faculty was to replace the paper copy and roll the rubric into a technology-based solution. A first attempt used Google Docs, specifically a Google Form that tallies the group's responses into a Google Sheet. This solution presented all of the functionality of the paper rubric but in a different format that looked more like a questionnaire than a scoring rubric. The advantages of a Google Docs solution were obvious: it is simple, free, has collaboration built in, and is easy to modify for quick tweaking. The disadvantages are minimal error checking and security features, a different look and feel (it no longer looks like a rubric), and the amount of manual data entry required. Our long-term goal is to move our rubrics into our LMS (such as Canvas, Blackboard, Moodle, and similar), a project that has already begun.
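As one way to reduce the manual data entry, the sketch below tallies each student's rubric scores from a CSV export of the Google Sheet that collects the form responses. The file name and column headings are assumptions made for the example, not our form's actual layout.

```python
# A sketch of tallying exported Google Form rubric responses. Column names
# and the sample file written below are illustrative assumptions.
import csv
from collections import defaultdict

def tally_jury_scores(csv_path: str) -> dict[str, int]:
    """Sum each student's rubric-item scores from the exported responses."""
    totals: dict[str, int] = defaultdict(int)
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            for column, value in row.items():
                if column not in ("Timestamp", "Student", "Comments"):
                    totals[row["Student"]] += int(value)  # one rubric item per column
    return dict(totals)

if __name__ == "__main__":
    # Write a two-response sample so the sketch runs end to end.
    with open("jury_responses.csv", "w", newline="") as f:
        f.write("Timestamp,Student,Rhythm,Musicianship,Appearance,Comments\n")
        f.write("12/10,Student A,4,3,5,solid jury\n")
        f.write("12/10,Student B,2,2,3,needs work\n")
    print(tally_jury_scores("jury_responses.csv"))  # {'Student A': 12, 'Student B': 7}
```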
In April 2016, we sent out a short survey to poll opinions about the calibration
process and the common rubric that was developed for music juries (see Figure 6). The
survey was sent out electronically to all music faculty and adjuncts, 23 people in all.

Figure 6—Jury calibration survey (excerpt)

In May 2015, the music professors spent several hours working to calibrate jury
expectations and results for the purpose of equalizing and norming the jury process. They
also worked to develop a common rubric to be used in all juries by all instruments. In
December, the common rubric and new process were used for the first time. Please answer
the following questions about your experience and opinion about that process and the new
common rubric.
1. Were you involved in the calibration process in May 2015? (If "No," branch to question 6.)
Yes   No
2. I think the calibration process helped create a shared vocabulary across applied
areas.
Strongly disagree Disagree Undecided Agree Strongly agree
4. I think the calibration process clarified performance standards across applied areas.
Strongly disagree Disagree Undecided Agree Strongly agree
8. What changes, if any, resulted from the new rubric in the way you prepare students
for juries?
The branching ensured that respondents saw only the questions that pertained to them. All respondents answered a series of eight questions addressing their experience and opinions about using a common jury rubric, with additional space to write comments. Since only the full-time faculty and two part-time faculty participated in the calibration exercise, the branch flow in the survey directed the remaining respondents past the questions about the calibration process. Of the eight full-time and part-time faculty who participated in calibration, seven responded to the survey.
The responses to the survey questions about the calibration process suggest that
faculty appreciated this process for a number of reasons. Respondents indicated that
they valued having colleagues from different applied areas participating together in the
calibration process. They felt that calibrating across applied areas helped to create a
shared vocabulary, standardize the jury process, and clarify performance standards (see the table below).

Statement | Rated "Agree" or "Strongly Agree" | N
I think the calibration process helped create a shared vocabulary across applied areas. | 100% | 7
I think it was valuable to calibrate the common rubric. | 100% | 7
I think it was valuable to calibrate performance standards across applied areas. | 100% | 7
I think it was valuable to calibrate with someone outside my applied area. | 100% | 7
I think it was valuable to calibrate the jury process across applied areas. | 86% | 7
I think the calibration process clarified performance standards across applied areas. | 86% | 7
When asked about the calibration process and common rubric, respondents' comments were positive overall: "We made our rubric more clear"; "Better rubric, better discussion and shared vocabulary. We can now talk about how we want the jury to work and whether or not we are successfully meeting our goals"; and the new rubric "can give students a strong …".
In spite of these positive comments, the survey results indicated that participants
did not feel a strong sense of ownership of either the jury process or the common rubric
even after they had gone through the calibration process. This is not surprising
considering that faculty had used the new rubric for juries only once.
The next series of questions were answered by all respondents, some who went
through the calibration process and some who did not. These questions focused on the
common rubric and indicated that the majority of respondents found the common
rubric improved the jury process and performance standards, and has the potential to
validate jury feedback to students. These findings supported the responses from those
who went through calibration discussed earlier, namely that faculty value dialoguing across applied areas. One respondent emphasized this, saying, “It was beneficial to see what other areas were using to assess
progress. This type of collaboration allowed for growth in the assessment of our
students department-wide.”
There were additional comments indicating mixed feelings about the rubric.
Some respondents felt the rubric was too complicated and that it distracted them from
focusing on students’ performances during juries, saying, “The entire jury was spent
attending to the… rubric.” In contrast, another commented that because of the rubric,
“students now have a clearer understanding of what is expected and what the jury saw ….” Such mixed reactions are unsurprising given that faculty were only one semester into using the new rubric, and they emphasize the need for continued refinement of the rubric and the process.
Though our sample size was small, it seems reasonable to assert from the
collected data that faculty found worth and value from discussions and calibration
across applied areas that have led to a shared vocabulary and a standardization of jury expectations and process. Respondents who did not participate in the calibration exercise also saw value in the common rubric, while admitting the need to simplify it and continue the discussion across applied areas.
There is every reason to think that these benefits will grow over time as the faculty use
the common rubric and go through more calibration exercises across disciplines.
Conclusions
Our Music Department implemented the assessment day exercise outlined in this
article with the goals of creating meaningful dialogue among our performance faculty, developing common standards across applied areas, developing a shared vocabulary about measuring these standards, and generating useful assessment data. Throughout
the day, our music faculty watched, scored, and discussed multiple video-recorded
student juries, then compared their scores against the original jury scores. Through this
process our faculty were able to get a feel for how the jury process worked in our
department and how scores and standards compared across performance disciplines.
They also were able to outline areas for improvement across our applied juries.
Although the discussions generated through the calibration process proved valuable to
our department, we did not find agreement in all areas and are still working on some of
the key components of our jury process, specifically the format of the rubrics and how they are scored. We have presented a snapshot of where we are in our process, with one approach to a common rubric, a simple scoring system that scores from 0 to 5, and the grading scheme that accompanies it. We believe the process we have outlined is both simple and universal enough that it can be implemented in a wide number of music departments. The calibration process should prove
equally useful, and generate meaningful discussion and useful data for other
departments. We also feel that there is room for further development of our ideas, and
are already developing our own online solution that we hope to roll into our
campus LMS. In addition, we feel that a logical next step would be to encourage
students to use the assessment data to reflect and journal about their jury performance
and scores, and to use this data to map out a learning plan. In this way the assessment
data comes full circle and informs the work and direction of study for both student and
instructor.
References

Asmus, E. P. (1999). Rubrics: Definitions, benefits, history and types. Music Educators Journal, 85(4).

Asmus, E. P. (2003). Music assessment concepts. Music Educators Journal, 86(2), 19-24. doi: 10.2307/3399585

Chase, D. M., Ferguson, J. L., & Hoey IV, J. J. (2014). Assessment in creative disciplines: Quantifying the aesthetic. Champaign, IL: Common Ground Publishing.

Ciorba, C. R., & Smith, N. Y. (2009). Measurement of instrumental and vocal undergraduate performance juries using a multidimensional assessment rubric. Journal of Research in Music Education. doi: 10.1177/0022429409333405

Fuller, J. A. (2014). Music assessment in higher education. Open Journal of Social Sciences, 2, 476-484. doi: 10.4236/jss.2014.26056

Gordon, E. (2002). Rating scales and their uses for evaluating achievement in music performance. Chicago: GIA.

Latimer, M. E., Bergee, M. J., & Cohen, M. L. (2010). Reliability and perceived pedagogical utility of a weighted music performance assessment rubric. Journal of Research in Music Education. doi: 10.1177/0022429410369836

Lebler, D., Carey, G., & Scott, D. (2014). Assessment in music education: From policy to practice. Springer. doi: 10.1007/978-3-319-10274-0

Robison, C. W. (2001). Beautiful singing: "Mind warp" moments (1st ed.). Provo, UT: Clayne W. Robison.

Wesolowski, B. C. (2012). Understanding and developing rubrics for music performance assessment. Music Educators Journal, 98(3), 35-42.

Wesolowski, B. C., Wind, S. A., & Engelhard, G. (2015). Rater fairness in music performance assessment: Evaluating model-data fit and differential rater functioning. Musicae Scientiae, 19(2), 147-170. doi: 10.1177/1029864915589014
Appendix—Standardized jury rubric (excerpt). Each item is scored 5 4 3 2 1 0.

Item: Metrical and Rhythmical Accuracy
Exemplary (5): Nuanced use of tempo and rhythm is used to communicate at a high level. Tempos are technically brilliant.
Proficient (4-3): Tempos are secure and convey a strong grasp of playing style. Rhythmic nuance is used to communicate lines and emotional connection.
Developing (2-1): Tempo is significantly slower/faster than the suggested tempo. Misplaced rhythms and/or discrepancies in rhythm are uncomfortable. Limited use of rhythmic nuance.
Initial (0): Misplaced rhythms and rhythmic discrepancies mar the performance. Tempos are inappropriate. Technical limitations prohibit the use of rhythmic nuance.

Item: Musicianship/Communication
Exemplary (5): Exceptionally high level of emotional involvement conveys a deep understanding of the music and a desire to communicate an emotional connection with the music.
Proficient (4-3): Appropriate style is maintained throughout the selections and emotional involvement is readily visible. Strong growth from previous semesters.
Developing (2-1): Communicates appropriate style, and emotional connection is evident at times. Some growth is visible but more is needed.
Initial (0): Incorrect style or lack of any stylistic change from piece to piece. Performer is emotionally detached from the music. No growth from previous semesters.

Item: Appearance and Performance
Exemplary (5): Appearance and deportment are professional, sophisticated, and contribute to an impressive and well-planned performance.
Proficient (4-3): Appearance and deportment are appropriate and thoughtfully planned.
Developing (2-1): Appearance and deportment are acceptable and do not detract from the performance.
Initial (0): Appearance and/or deportment are noticeably inappropriate and are visually uncomfortable.

SCORE:
Comments: