Assessment in The Applied Studio

Assessment in the Applied Studio
Bill Clemmons, Mary Jo Clemmons, and Christine Phillips
Point Loma Nazarene University
Nothing is more central or more basic to a music department than playing and
singing music. Although no one would wish to diminish the critical importance of
music theory, music history, music education, entrepreneurship and other vital
components of a healthy and vibrant music department, it is unlikely that students will
get very far in their discipline unless they are actively engaged with making music. Yet,
as basic as these activities are to a healthy music program, it is often difficult to
accurately measure and assess students' progress in the applied studio and the applied
jury, the standard tool used to assess private instruction. What exactly does the private
lesson jury measure? Does it measure whether a student has met the expectations of the
syllabus in the manner of every other class in the University, met the subjective
approval of the instructor, or does it measure something larger such as “professional
quality tone,” or “mature interpretation?” Each of these would resist objective
measurement, often generating disagreement even among experts, frequently hinging
on matters of taste and preference. Or, consider the difficulty in assessing varying
levels of experience. Can a first-semester student who is just beginning instruction on
her instrument earn the same grade as another first-year student who is performing a
more difficult concerto movement? Questions like this highlight the need for a
deliberate assessment process.

Statement of the Problem
This article presents one approach to tackling some of these issues through a
process known as calibrating or benchmarking. In the calibration exercise developed at
our institution, private-lesson instructors were asked to agree on grading standards
while watching and grading video recorded juries. We felt that this process would
create meaningful dialog among our performance faculty, develop common standards
across applied areas, help to develop a shared vocabulary about measuring standards,
and generate useful assessment data that would help our programs.
A Review of the Literature
Higher education has traditionally struggled with finding an objective
assessment to measure student, juried performances, even though performance is an
essential outcome to music education. Assessment practices in Higher Education have
remained clouded with professor bias on one end, and deciding what is being
measured, whether it is student achievement or student progress (or both), on the other
end. How do we remove these confounding issues and move toward objectivity and
precision? The question remains whether there exists a calibration process that can
remove professor bias and accurately measure student performance achievement while
at the same time provide a measure of individual student progress.
Assessment in the Applied Studio 2

Performance Assessment
Music performance is considered a foundational content standard for the
National Association for Music Education. The National Association of Schools of
Music (1999) suggested that competence on at least one major performing medium
should be expected for students pursuing any baccalaureate degree in music. Since the
dominant method for scoring music performance in higher education is assessment by
juries, (Lebler, Carey & Scott, 2014) the challenge is to create a method that is both a
reliable assessment of the performance and of student achievement. The potential for
subjectivity confounds the reliability of any performance assessment (Ciorba & Smith,
2009) and must be taken into account.
Fiske (1983) noted the need to establish reliable evaluation criteria in 1983, and
establishing reliable criteria was still noted as being under-investigated by Bergee
twenty years later (2003). Additionally, Asmus established that reliable performance
scoring is essential to be useful to both student and teacher to improve instructional
strategies (2003). Wesolowski (2012) also emphasized the importance of clearly defining
the criteria to both judges and students, while Lebler, Carey and Scott added the
dimension of aligning program objectives to improve the educational experience (2014).
There has been an increase in applying assessment criteria in the area of fine arts,
and through the use of rubrics, those assessing musical performance are moving closer
towards more equitable and useful data results. Parkes (2010) and Mintz (2015)
supported the importance of criteria assessment as a critical link in the teaching and

learning process throughout a variety of music settings. According to Asmus (1999),
rubrics provide specific advantages when used to assess music performances and the
author contended that the key elements of a music performance rubric are the
descriptors for what a performance is like within the full range of proficiency levels.
Gordon (2002) supported this claim that the more descriptors included for each
dimension, the more reliable the rubric will become, as long as that number does not
exceed five.
Bergee tested the reliability and validity of specific "criteria rating scales," or
rubrics (2003). His findings supported the concept that the criteria helped applied music
faculty grade more consistently in the jury setting especially when they had access to a
specific tool rather than just commenting on a more broad impression (Bergee, 2003).
He found that when faculty used a rating scale, the feedback provided to performers
was more accurate and balanced. Ciorba and Smith also suggested a good way to
provide strong evidence of student achievement by "combining several performance
dimensions into one standardized multidimensional assessment rubric” (2009, p. 14).
Rubric Creation
Rubrics have proven effective across content areas for the purpose of assessment
in higher education to measure student progress and establish clear learning
expectations for students (Ciorba & Smith, 2009). A rubric can define the difference
between a formative progress assessment, and a summative performance assessment.

These two aspects often stand in conflict with one another and need to be made visible
so that the jury process can be more fair and useful to students (Fiske, 1983). It is also
important that rubric criteria are understandable to faculty. A study by Fiske (1983)
showed reliability was increased when department-wide criterion scales were
developed and shared amongst faculty. A study by Ciorba and Smith (2009) showed a
high inter-judge reliability when a faculty panel developed and shared a common
rubric across all performance areas. This was especially true when the rubric
development process was spread out over a period of at least a semester.
Professor bias must also be minimized for student scores to be reliable. Chase,
Ferguson, and Hoey (2014) encouraged the use of highly detailed rubrics with
underlying definitions to aid in precision to improve a higher degree of consistency
among raters. As reported by Wesolowski, Wind, and Engelhard, a combination of
rater training, the development of exemplars, and a clear benchmarking system proved
optimum in performance assessment reliability (2016).
Assessment Models
Music research within the last twenty years demonstrates increased reliability
when using rubrics (Asmus, 1999; Bergee, 2003; Ciorba & Smith, 2009; Wesolowski,
2012; Fuller, 2014; Mintz, 2015). Researchers agree that consistent, clear criteria
embedded within the rubrics provide the highest rate of reliability in performance

assessment. The reliability of rubrics can be verified in multiple mediums using both
individual performance assessment, and group performance assessment.
Vocal Progress Score Metric (BYU)
The exemplary assessment process used for voice students at Brigham Young
University (BYU) is described in detail by Clayne Robison in his book, Beautiful Singing
(2001). Robison described the systematic way the voice faculty track student progress
from one semester to another, from one year to another, from one teacher to another. It
is based around the "Voice Progress Score Chart" (VPS) - a simple 1-5 rubric with the
additional option to add a "+" or "-" to the score. At every audition, every jury, and
every solo performance, students are given a VPS that is tallied each semester and
plotted into a graph (see Figure 1).
Figure 1—The Voice Progress Score Chart from Beautiful Singing (2001) by Clayne Robison
Rating Explanation
5 In a fully professional setting (e.g., a leading role in a regional
professional company), this performance would have received favorable
press reviews and a significant 'bravo' response from the audience (None
of us has yet given a 5 in an audition).
4 In a featured university setting (e.g., a leading role in a major opera,
oratorio, or music theatre production with orchestra), this performance
would have been completely successful. I would enjoy hearing this
student sing for an hour-long senior recital.
3 In a modest university public performance setting (e.g., a secondary role
in an opera, oratorio or musical theater production with orchestra), this
performance would have been successful. I would enjoy hearing this
student sing for half an hour in a junior recital.
2 In a university classroom performance setting (e.g., in an opera scenes
class or a short recital with piano) this performance would have been
satisfactory. This student's technique is
Sufficiently solid to permit concentration on character projection. I

would remain comfortable during a 15-minute recital
1 Preliminary vocal technical work is still needed before attempting any
significant public singing. This student however shows promise as a voice
major at BYU.
0 Not yet ready to be considered as a voice major.
Robison sights multiple positive outcomes at BYU from the implementation of
the VPS system. One of them was helping to alleviate bias and potential conflicts of
interest since the voice faculty began attending opera auditions and giving each student
a Vocal Progress Score. Robison then based his casting on an overall score instead of his
own personal choice (2001).
"The best assessment methods are those that enable us to connect the dots,
recognize patterns of achievement or non-achievement across our students, and make
adjustments to facilitate their learning" (Chase, Ferguson, & Hoey, 2014, pg. 70). The
VPS allows faculty to "connect the dots" from semester to semester, assessing students
on a professional scale that is not based on their progress from the previous semester,
but based on a professional standard, separate from the jury rubric criteria.
Kansas State High School
For the purpose of testing the reliability of using a rubric to score choral music
festival assessment, Latimer, Bergee and Cohen (2010) evaluated a large-group festival
given by the Kansas State High School Activities Association. They used metrics to
score responses to four questions of reliability: 1) Did the rubric result in internally
consistent ratings? 2) How reliable were adjudicators in assigning performance

dimension scores, global scores, and ratings when using the rubric? 3) What is the
correlation between performance dimension scores, global scores, and ratings? 4) What
was the rubric's perceived level of pedagogical utility, as reported by adjudicators and
directors, and what changes did they recommend? Their conclusions demonstrated
good internal consistency when compared to other, non-rubric forms (Latimer, Bergee,
& Cohen, 2010).
The Assessment Activity and our Goals
As with many music programs, our music department at XXX University is
actively engaged in collecting assessment data to generate useful feedback while
measuring the progress of our goals. We have collected data for years in academic
areas within the music field (e.g. history, theory, composition, etc.), but struggled to
find a way to generate meaningful assessment data about applied studies. For this
reason, we structured an assessment day focused on applied instruction. Our four
assessment-day goals were to 1) promote meaningful discussion about performance
standards, 2) find common ground across applied areas about measuring these
standards, 3) develop a standardized vocabulary, expressed in our shared rubrics, that
we can employ across applied areas, and 4) generate useful assessment data that would
inform departmental decisions. To accomplish these goals we structured our
assessment activity around videos of student juries held a few days before. Our faculty
watched the videos, trying to justify the scoring of a previous jury. The entire day was

structured with an eye towards examining standards, evaluating the jury process, and
creating conversation across performing areas.
Pre-Workshop Planning
Our assessment activity was relatively compact, lasting four hours on a single
day. Advanced planning was needed to maintain efficiency and ensure the success of
the day’s activities. The department agreed to video record the juries in a standardized
manner, develop a set of scoring rubrics for each applied area, store these two
documents (the video and the marked rubric) for the duration of the student’s tenure in
the department, and make both of these documents available for quick retrieval by the
members of each group on the assessment day.
We wanted a standardized recording solution to avoid the possibility of unequal
quality recordings that biased the outcome. We selected some small, inexpensive video
cameras that could record in HD and stereo sound onto large SD cards. The cameras
were extremely simple to operate and the setup and use of the cameras was quick,
efficient and non-threatening. The only responsibility at the end of the jury session was
for one instructor to return the camera to the music office where an assistant would
remove the cards and upload the videos to a shared drive.
Of critical importance to the entire process was ensuring that all videos and
rubrics went into secure storage that guaranteed the student’s privacy and allowed
sharing between the teacher and student. They had to have the capacity to securely

store vast amounts of data for multiple semesters. Our University recently moved to an
online storage solution that provided us with almost unlimited storage capacity. The
caveat with online storage is that video files in HD are quite large and require time to
upload to cloud storage across HTTP. We surmounted this obstacle by setting a queue
and letting the files upload at night. The following morning, the departmental assistant
emailed links to the jury video and the scored and scanned rubrics to the students so
that they could review and discuss them with their applied instructors.
In order to eliminate dead time, we selected inexpensive 8 Gb flash drives and
copied onto them all of the videos and scanned jury rubrics that the group would need.
The videos and rubrics were put on flash drives ahead of time by the department chair
who also made sure that every group had a variety of juries to assess; some that
demonstrated very high-level performance, some at a middle level, and also a few that
demonstrated entry-level performance. Each faculty group met separately in a
classroom with a video projector and a flash drive, and watched and scored at the same
time.
Grading Rubrics
A crucial component of the assessment process was getting the music faculty to
start grading the applied lesson jury from a rubric instead of simply providing
qualitative comments and a grade. This shift in process and thinking required both the
consent of and participation by the faculty, and a willingness to change habits and

process in favor of a new approach that creates shared accountability. We started the
shift towards grading rubrics a year in advance of actually implementing their use.
Faculty members in each performance area were responsible for developing a basic
outline of what its faculty envisioned, what they valued in applied instruction, and
what they wanted the student to demonstrate at the applied jury. We did not attempt
to standardize or drill down the language of the rubrics at this early stage but allowed
each area to freely explore format, language and scoring. What was important was
participation across the department, thinking about the rubrics, and engaging with
language used in measuring performance standards. By the time we ran our
assessment day and calibration exercise, all of the faculty had scored applied lesson
juries twice using paper and pencil copies of the jury rubrics they had developed, and
were familiar enough with the process that they were ready to look at it with a critical
eye towards improvement.
Structuring the Assessment Day
We began our assessment workshop after a light breakfast to minimize or
eliminate stress and to ensure that everyone was comfortable and ready to work. We
broke the faculty participants into groups of three to four in such a way that no one
graded the same jury that they had scored a few days earlier at end of the semester
juries. We also ensured there were no two faculty members in the same musical
discipline in a group.

After watching the video, the group scored the performance against the area
rubric while discussing the scoring as a group. Once the group scored the performance,
they were then asked to look at the score they assigned to the performance and compare
it to the actual jury score discussing how the scores differed. This step was important to
justify their score, or benchmark the manner in which the jury scored the performance
in order to predict and replicate the scores of subsequent performances.
The next step in the process was to score as many videos as possible without
discussion in a manner similar to that of a jury where every member of a jury scores
individually. We limited the scoring activity to an hour and a half in order to alleviate
fatigue. The process took about ten minutes per performance, so in the time allotted for
this activity, the groups were able to score seven to nine videos. The groups were next
instructed to open the original scored rubrics created a few days earlier in the actual
jury. The members of the group then compared their scores to each other as well as to
the original jury scores, keeping note of discrepancies. They were instructed to note if
they were able to accurately predict and replicate the jury grades.
The final component of our assessment day was a set of post-assessment
discussion questions (see Figure 2). There were nine questions that were to be filled out
by each group as a whole and submitted by one person in the group. The groups'
responses then formed the basis for the post-workshop debrief held over lunch. We
purposefully arranged the setting of our lunch to promote dialogue, and moved
through the questionnaire group by group as we ate. In fact, everything throughout the

day was purposefully designed to communicate to the faculty that their time was
valued and their voices were important.
Figure 2—The post assessment day discussion questions.
1. How would you rate the overall quality of the juries that you observed today? Do our
students seem to be doing good work?
2. If you were going to offer one piece of advice that would improve the jury process (not
necessarily the rubric) as you either observed it or experienced it, what would it be?
3. Of the juries that you were assigned, how many were you actually able to assess? How
many juries are actually practical in the time that we have available?
4. When you looked at the jury sheets/grades, were you able to consistently match the
jury’s assessment? Were your standards higher or lower? Was the scoring consistent?
5. Most of us also listen to/serve on juries during finals week. How did the rubric that you
looked at today differ from the one that you use in your own area?
6. What did you see on the rubric that you used today that you thought was a good idea
and would consider adopting for your own area?
7. If you were going to offer one improvement for the rubric that you used today what would
it be?
8. Did the rubric that you used today actually assess all of the areas that you felt needed to
be assessed? Are there missing areas? Are there redundancies, unnecessary items or
inconsistencies?
9. If you were going to improve the assessment process that we did today, what would you
improve? What worked, what did not? How could we improve what we did today?
Closing the Assessment Loop and Follow Up
Our Assessment Day de-brief created a vast amount of discussion, generating
dozens of ideas for improving our work and process. In fact, there were far more ideas
generated than could be reasonably developed and implemented over the course of the
following year. However, an important part of the assessment cycle, and one of our
stated goals, was generating useful data about the program, and both finding areas for

improvement and implementing the improvements in a reasonable timeframe. To this
end we identified five areas to concentrate on and which we also felt we could
accomplish over the course of the following year:
1. Standardize the look and feel of the rubrics
2. Standardize how the rubrics score
3. Implement an online version of the rubrics
4. Create a separate, non-graded rubric that measures the student's performance

level against professional-level standards
5. Change the culture of the jury to function more like a performance and less like
an exam
Generally, our faculty felt that our students were doing good work and were
pleased with the performances that they observed on the videos. However, our
assessment day workshop was the first time that our faculty had examined the jury
process itself and the mechanics of grading. Faculty also had strong feelings about how
standards differed across disciplines. To this end we agreed to work on our rubrics and
devoted discussion time over the next year towards improving them.
We selected two rubrics whose format we especially appreciated (one originally
created for academic papers and a second from winds and brass) and combined them
into a new, standardized rubric (see Appendix). We liked the larger format of a legal-
size rubric with wide columns and lots of room for comments. From the winds/brass
rubric we liked the scoring columns and totals at the ends of rows in such a way that
the jury grade was calculated from the rubric instead of assigned separately. Our new,

standardized rubric combined these two features and provided generic language that
described what we were hearing instead of simply offering qualitative statements such
as "excellent, good or poor."
The look and feel of the rubric provided a starting point to designing an online
interface that implements the same behavior as the paper copy. That is, by selecting a
box in the rubric the instructor is generating a numerical score. The columns in the
rubric are labeled "Exemplary" "Proficient" "Developing" and "Initial" and have below
them a range of 3-4 numbers, so that a score must be selected within the rubric box.
Unfortunately, our faculty could not agree on a single way to score the rubrics. A
portion of our faculty wanted their area rubrics to score to 100, a move that runs
counter to best practices research which encourages scoring to be as simple as possible
and not contain more than five categories. In spite of a great deal of discussion devoted
to scoring, we still wound up with several different scoring methods, and continue to
discuss and work towards a department-wide solution. The standardized rubric that
we include in the appendix offers one approach along with one scoring solution (see
Figure 3) that is a snapshot of where we are now. Our work in this area is ongoing.
Figure 3—One scoring solution for the new, standardized rubric.
Description Grade Points Range Spread

Highest (40) All 4 A+ 40-39 2
Mostly 4 and some 3 A 38-36 3 7
Split 4 and 3 A- 35-34 2
Mostly 3 and some 4 B+ 33-32 2
Middle (30) All 3 B 31-28 4 9
Mostly 3 and some 2 B- 27-25 3
Mostly 2 and some 3 C+ 24-22 3
Middle (20) All 2 C 21-18 4 10
Mostly 2 and some 1 C- 17-15 3

Mostly 1 and some 2 D+ 14-12 3
Middle (10) All 1 D 11-8 4 10
Mostly 1 and some 0 D- 7-5 3
Mostly or all 0 F 4-0 5 5
The final change to our rubrics was separating the jury grade from an assessment
of the student's performance level against professional performance standards. We
imagined a scale that ranged from "0", representing an absolute entry-level student who
should not be performing publicly, to a "13", representing a young professional
embarking on a career as a performer. The intent was to avoid having the jury grade
convey an unrealistic career expectation. Thus, an inexperienced performer, such as a
first-semester freshman, who works hard and meets all of the expectations of her
teacher and her teacher's syllabus, might still earn an "A" for the semester but would
place low on the Applied Performance Assessment Rubric.
We adapted our new, separate rubric (see Figure 4) from the model developed by
Clayne Robison for the vocalists at Brigham Young University that was given as Figure
1. Dr. Robison's model (2001) was specifically tailored for vocalists who are planning
on a professional singing career, but we felt that we could adapt it for our own needs.
This rubric is only scored after the jury rubric has been graded. In this way, we try to
ensure that the jury and semester grade are based on the syllabus and the studio
teacher's stated requirements.
Figure 4—Our new, Applied Performance Assessment Rubric, based on a model by Clayne Robison (2001)

Preliminary technical work is still needed before attempting any significant public
0
performance. Not convinced that this student should pursue performance.
1
performance. However, this student shows potential as a performer.
2
performance. However, this student should consider performance.
In a university classroom setting (e.g., a studio class or short, non-public recital with
3
piano) this performance would have been almost satisfactory.
In a university classroom performance setting (e.g., a studio class or short, non-public

4 recital with piano), this performance would have been satisfactory. I would enjoy a 15-
minute recital with this performer.
In a university classroom performance setting (e.g., a studio class or short, non-public

5 recital with piano) this performance would have been very satisfactory. I would enjoy a
20-minute recital.
In a modest university public performance setting (a Monday-afternoon recital, a public,

6 shared recital program) this performance would have been mostly successful. I would
enjoy a 25-minute recital.

7 shared recital program) this performance would have been successful. I would enjoy a
30-minute recital.

8 shared recital program) this performance would have been completely successful. I
would enjoy a 40-minute recital.
In a featured university setting (opera, oratorio, full recital, solo with the orchestra) this
9 performance would have been mostly successful. I would enjoy a 50 minute recital with
this performer.
10 performance would have been successful. I would enjoy a one-hour recital with this
performer.
11 performance would have been completely successful. I would enjoy a one-and-a-half-
hour recital with this performer.
In an apprenticeship program or as an emerging artist in opera, oratorio, musical

12
theatre, or orchestral program, this performance would be successful.

In a professional opera, oratorio, recital, or concert this performance would be
13
completely successful.
The final observation offered by our faculty was to replace the paper copy and
roll it into a technology-based solution. A first attempt used Google Docs, specifically a
Google Form that tallies the group's responses into a Google Sheet. This solution
presented all of the functionality of the paper rubric but in a different format that
looked more like a questionnaire rather than a scoring rubric. The advantages of a
Google Docs solution were obvious: simple, free, with collaboration built in, and they
can be set up relatively quickly, requiring a minimal amount of behind-the-scenes
tweaking. The disadvantages of a Google Docs solution are minimal error checking or
security features, the look and feel are different (it no longer looks like a rubric) and the
amount of manual data entry required. Our long-term goal is to move our rubrics into
our CMS (such as Canvas, Blackboard, Moodle and similar); a project that has already
begun.
Music Survey Results
In April 2016, we sent out a short survey to poll opinions about the calibration
process and the common rubric that was developed for music juries (see Figure 6). The
survey was sent out electronically to all music faculty and adjuncts, 23 people in all. Of

those 23, six were full-time faculty and 17 were adjuncts, two of whom are classified as
“part-time.” Eleven took the survey, giving a 52% response rate.

Figure 5—The post-assessment day survey that was completed by the faculty.
In May 2015, the music professors spent several hours working to calibrate jury
expectations and results for the purpose of equalizing and norming the jury process. They
also worked to develop a common rubric to be used in all juries by all instruments. In
December, the common rubric and new process were used for the first time. Please answer
the following questions about your experience and opinion about that process and the new
common rubric.
1. BRANCH to 6--Were you involved in the calibration process in May 2015?
Yes No
2. I think the calibration process helped create a shared vocabulary across applied
areas.
Strongly disagree Disagree Undecided Agree Strongly agree
3. I think it was valuable to calibrate

a. the common rubric.
b. performance standards across applied areas.
c. the jury process across applied areas.
d. with someone outside my applied area.
4. I think the calibration process clarified performance standards across applied areas.
5. Since participating in the calibration process, I feel more ownership of

a. the jury process.
b. the common rubric

6. I think that having a common rubric …

a. changed the dynamics of the jury between faculty members.
b. clarifies performance standards.
c. makes my jury feedback to students more valid and reliable.
d. makes the jury process more equalized across all applied areas.
e. makes me better able to express to students how they will be evaluated.
Please give specific feedback in as much detail as you like.

7. What changes, if any, have resulted from the calibration process and common
rubric?
8. What changes, if any, resulted from the new rubric in the way you prepare students
for juries?
9. Any other comments?

The survey used a branch flow to allow respondents to answer only those
questions that pertained to them. All respondents answered a series of eight questions
addressing their experience and opinions about using a common jury rubric, with
additional space to write comments. Since only the full-time faculty and two part-time
faculty participated in the calibration exercise, the branch flow in the survey directed
those respondents to a series of seven additional questions about the calibration
process. Of the eight full-time and part-time faculty who participated in calibration,
seven responded, giving us an 87% response rate on the calibration questions.
The responses to the survey questions about the calibration process suggest that
faculty appreciated this process for a number of reasons. Respondents indicated that
they valued having colleagues from different applied areas participating together in the
calibration process. They felt that calibrating across applied areas helped to create a
shared vocabulary, standardize the jury process, and clarify performance standards
meeting three of our four goals (see Figure 6).
Figure 6—Responses to the post-assessment day survey, questions 1-4.
Rated “Agree” or N
“Strongly Agree”
I think the calibration process helped create 100% 7
a shared vocabulary across applied areas.
I think it was valuable to calibrate:
the common rubric. 100% 7
performance standards 100% 7
across applied areas.
with someone outside my 100% 7
applied area.
I think it was valuable to calibrate the jury 86% 7
process across applied areas.
I think the calibration process clarified 86% 7
performance standards across applied areas.

When asked what changes faculty experienced as a result of the calibration
process and common rubric, respondents’ comments were positive overall, such as “We
made our rubric more clear”; "Better rubric, better discussion and shared vocabulary.
We can now talk about how we want the jury to work and whether or not we are
successfully meeting our goals"; The new rubric “can give students a strong
understanding of where they are compared to the rest of the world.”
In spite of these positive comments, the survey results indicated that participants
did not feel a strong sense of ownership of either the jury process or the common rubric
even after they had gone through the calibration process. This is not surprising
considering that faculty had used the new rubric for juries only once.
Figure 7—Responses to the post-assessment survey, questions 5-8.
Rated “Agree” or “Strongly Agree” N

Since participating in the calibration process, 29% 7
I feel more ownership of the jury process.
Since participating in the calibration process, 57% 7
I feel more ownership of the common rubric.
The next series of questions were answered by all respondents, some who went
through the calibration process and some who did not. These questions focused on the
common rubric and indicated that the majority of respondents found the common
rubric improved the jury process and performance standards, and has the potential to
validate jury feedback to students. These findings supported the responses from those
who went through calibration discussed earlier, namely that faculty value dialoging

with each other across applied areas about juries and the jury process. One comment
emphasized this, saying, “It was beneficial to see what other areas were using to assess
progress. This type of collaboration allowed for growth in the assessment of our
students department-wide.”
Figure 8—Responses to the post-assessment survey, questions 7-9
Rated “Agree” or “Strongly Agree” N

Having a common rubric makes the jury process 82% 11
more equalized across all applied areas.
Having a common rubric clarifies performance 73% 11
standards.
Having a common rubric makes my jury 73% 11
feedback to students more valid and reliable.
There were additional comments indicating mixed feelings about the rubric.
Some respondents felt the rubric was too complicated and that it distracted them from
focusing on students’ performances during juries, saying, “The entire jury was spent
attending to the… rubric.” In contrast, another commented that because of the rubric,
“students now have a clearer understanding of what is expected and what the jury saw
in the students' performance.” Again, these varying comments are understandable
given that faculty were only one semester in to using the new rubric and emphasize the
need to continue to refine and simplify the common rubric.
Though our sample size was small, it seems reasonable to assert from the
collected data that faculty found worth and value from discussions and calibration
across applied areas that have led to a shared vocabulary, and standardization of jury
expectations and process. Respondents who did not participate in the calibration

process indicated that they valued the common rubric that was developed even while
admitting the need to simplify it and continue the discussion across applied areas.
There is every reason to think that these benefits will grow over time as the faculty use
the common rubric and go through more calibration exercises across disciplines.
Conclusions
Our Music Department implemented the assessment day exercise outlined in this
article with the goals of creating meaningful dialogue among our performance faculty,
developing common standards across applied areas, developing a shared vocabulary
about measuring these standards, and generating useful assessment data. Throughout
the day, our music faculty watched, scored, and discussed multiple video-recorded
student juries, then compared their scores against the original jury scores. Through this
process our faculty were able to get a feel for how the jury process worked in our
department and how scores and standards compared across performance disciplines.
They also were able to outline areas for improvement across our applied juries.
Although the discussions generated through the calibration process proved valuable to
our department, we did not find agreement in all areas and are still working on some of
the key components of our jury process, specifically the format of the rubrics and how
they are scored.
We have presented a snapshot of where we are in our process with one approach
to a common rubric, a simple scoring system that scores from 0 to 5, and the grading

scale that could be used to score the rubric. We believe the jury assessment process that
we have outlined is both simple and universal enough that it can be implemented in a
wide number of music departments. The calibration process should prove itself to be
equally useful, and generate meaningful discussion and useful data for other
departments. We also feel that there is room for further development of our ideas, and
are already in process developing our own online solution that we hope to roll into our
campus LMS. In addition, we feel that a logical next step would be to encourage
students to use the assessment data to reflect and journal about their jury performance
and scores, and to use this data to map out a learning plan. In this way the assessment
data comes full circle and informs the work and direction of study for both student and
instructor.

References
Asmus, E. P. (January, 1999). Rubrics: Definitions, benefits, history and types. Music
Educators Journal, 85(4).
Asmus, E. P. (March, 2003). Music assessment concepts. Music Educators Journal, 86(2),
pp. 19-24. doi: 10.2307/3399585 .
Bergee, M.J. (Summer, 2003). Faculty interjudge reliability of music performance

evaluation. Journal of Research in Music Education, 51(2). doi: 10.2307/3345847.
Chase, D. M., Ferguson, J. L., & Hoey IV, J. J. (2014). Assessment in Creative Disciplines;
Quantifying the Aesthetic. Common Ground Publishing. Champaign, Il.
Ciorba, C. R., & Smith, N. Y. (April, 2009). Measurement of instrumental and vocal
undergraduate performance juries using a multidimensional assessment
rubric. Journal of Research in Music Education. doi: 10.1177/0022429409333405.
Fiske, H. E. (1983). Judging musical performances: Method or madness? The Applications

of Research in Music Education, 1(3), 7-10.
Fuller, J. A. (June 2014). Music assessment in higher education. Open Journal of Social
Sciences, (2) 476-484. doi: 10.4236/jss.2014.26056.
Gordon, E. (2002). Rating scales and their uses for evaluating achievement in music
performance. Chicago: GIA.
Latimer, M. E., Bergee, M. J., & Cohen, M. L. (2010). Reliability and perceived
pedagogical utility of a weighted music performance assessment rubric. The
National Association for Music Education. doi: 10.1177/0022429410369836.
Lebler, D., Carey, G, & Scott, D. (2014) Assessment in music education: from policy to
practice. doi: 10.1007/978-3-319-10274-0.
Mintz, S. (April 2015). Performance-based Assessment. Higher – ed. doi:

10.4135/9781412952644.

National Association of Schools of Music (1999). National Association of Schools of Music
handbook 1999-2000. Reston, VA: Author.
Parkes, K. A. (2010). Performance assessment: Lessons from performers. International

Journal of Teaching and Learning in Higher Education. 22 (1), 98-106.
Robison, C. W. (2001). Beautiful singing: "Mind warp" moments (Ed.1). Provo, UT: Clayne
W. Robison.
Wesolowski, Brian C., (March 2012). Understanding and developing rubrics for music
performance assessment. Music Educators Journal, doi: 00274321, 98(3) 35-42.
Wesolowski, B. C., Wind, S. A., & Engelhard, G. (2015). Rater fairness in music
performance assessment: Evaluating model-data fit and differential rater
functioning. Musicae Scientiae 19(2) 147-170. DOI: 10.1177/1029864915589014.

Appendix--Revised Assessment Rubric for Applied Juries
Item Exemplary (5) Proficient (4-3) Developing (2-1) Initial (0) Comments
Repertory and Style

Selections are appropriate to Selections demonstrate essen-tial Repertory is either well below or
Repertory is exceptional, creative course level and provide musical musical skills and offer some well beyond the student’s ability
Repertory and and innovative and tech-nical challenges that opportunities for the student to and displays minimal evidence of
Selection demonstrate growth display progress progress
5 4 3 2 1 0
Tone can tend to feel unsecure
Professional, full and character- Tone often loses focus and/or
Tone is characteristic, secure, and and tenuous at times. Tone is not
istically mature tone. Exceptional support and is uncharacteristic.
supported. The improvement and always centered or characteristic.
Tone Quality support, depth and volume Little or no improvement from
growth is evident. Some improvement is visible but
throughout selections previous semester
more is needed.
5 4 3 2 1 0
Incorrect pitches and/or serious
Accuracy and intonation are secure Intonation suffers at time and
Intonation is secure and profes- intonation problems visibly mar
and contribute to the musical missed notes interfere with the
sional technique evidenced even in the performance and make the
Pitch Accuracy technically difficult and awkward
presentation. Technical passages performance. Some improvement
listener uncomfortable. No visible
and Intonation are secure and improvement is has been made but more is
passages improvement from previous
evident. needed.
semesters
5 4 3 2 1 0
Technical Progress
Fluid technique and technical Technique is awkward and
Technique is improving although
Smooth, natural, and seemingly growth is evident throughout noticeably hampers the perfor-
difficulties are still evident. Tech-
effortless throughout selections. selections. Technical passages are mance. Technical difficulties from
nical passages are limited. Im-
Technical Facility Professional technique is im- secure and the performance
provement is visible but more is
previous semesters are still
pressive and technically brilliant demonstrates a wide range of evident, unchanged and
needed.
technical work unaddressed.
5 4 3 2 1 0
Inaccuracies and muddiness

Full range of articulations are Inaccuracies and muddiness
Wide range of articulations demon- visibly mar the performance.
accurate and effortless through-out communicate a lack of knowledge
strates an understanding of playing Little or no evidence of
selections and communicate a of or inability to engage playing
Articulation style. Musical style changes knowledge of playing style and
sophisticated and professional styles. Some improvement is
appropriately from piece to piece. little or no improvement from
understanding of playing style visible but more is needed.
previous semesters.
5 4 3 2 1 0
Tempo is significantly Misplaced rhythms and rhythmic
Tempos are secure and convey a
Nuanced use of tempo and rhythm slower/faster than suggested discrepancies mar the perfor-
strong grasp of playing style.
Metrical and is used to communicate at a high tempo. Misplaced rhythms and/or mance. Tempos are
Rhyth-mic nuance is used to
Rhythmical level. Tempos are technically discrepancies in rhythm are inappropriate. Technical
commu-nicate lines and emotiional
Accuracy brilliant. uncomfortable. Limited use of limitations prohibit the use of
connection.
rhythmic nuance. rhythmic nuance.
5 4 3 2 1 0
Item Exemplary (5) Proficient (4-3) Developing (2-1) Initial (0) Comments
Dynamic markings are not

Played as written and observed Observed most of the written
communicated and performance
Exceptional use of dynamic con- dynamic contrasts. Dynamics dynamics and at times used
does not engage the full dynamic,
Dynamics and trasts to richly communicate full creatively communicated an dynamics in a creative manner to
performing range. Little or no
Contrast range of dynamic possibilities. appropriate level of musical fashion the line. Some
progress from previous
understanding. improvement is visible.
semesters.
5 4 3 2 1 0
Musicality and Performance
The musical line suffers at times Performance visibly suffers from
Phrasing clearly used to commu-
Exceptionally planned and from unclear, poorly executed or phrasing that is either inconsistent
nicate the musical line. Strong evi-
executed phrasing communicates missing phrasing. Improvement or completely missing. The
Phrasing dence of musical growth from
mature and professional musicality from previous semesters is musical line is not communicated
previous semesters.
evident but more is needed. and no improvement is evident.
5 4 3 2 1 0
Exceptionally high level of Appropriate style is maintained Incorrect style or lack of any
Communicates appropriate style
emotion-al involvement conveys a throughout the selections and stylistic change from piece to
and emotional connection is
deep understanding of the music emotional involvement is readily piece. Performer is emotionally
Musicianship/ and a desire to communicate an visible. Strong growth from
evident at times. Some growth is
detached from the music. No
Communication visible but more is needed.
emotional connection with the previous semesters. growth from previous semesters.
music.
5 4 3 2 1 0
Appearance and deportment are
Appearance and deportment are Appearance and deportment are Appearance and/or deportment
professional, sophisticated and
appropriate and thoughtfully acceptable and do not detract are noticeably inappropriate and
Appearance and contribute to an impressive and
planned. from the performance. are visually uncomfortable.
Performance well-planned performance
5 4 3 2 1 0
SCORE
Comments:

Assessment in The Applied Studio

Caricato da

Informazioni sul documento

Titolo originale

Copyright

Formati disponibili

Condividi questo documento

Condividi o incorpora il documento

Opzioni di condivisione

Hai trovato utile questo documento?

Questo contenuto è inappropriato?

Copyright:

Formati disponibili

Assessment in The Applied Studio

Caricato da

Copyright:

Formati disponibili

Assessment in the Applied Studio

Bill Clemmons, Mary Jo Clemmons, and Christine Phillips

Point Loma Nazarene University

as basic as these activities are to a healthy music program, it is often difficult to

approval of the instructor, or does it measure something larger such as “professional

quality tone,” or “mature interpretation?” Each of these would resist objective

measurement, often generating disagreement even among experts, frequently hinging

levels of experience. Can a first-semester student who is just beginning instruction on

deliberate assessment process.

process known as calibrating or benchmarking. In the calibration exercise developed at

our institution, private-lesson instructors were asked to agree on grading standards

A Review of the Literature

Higher education has traditionally struggled with finding an objective

assessment to measure student, juried performances, even though performance is an

essential outcome to music education. Assessment practices in Higher Education have

at the same time provide a measure of individual student progress.

Assessment in the Applied Studio 2

Music performance is considered a foundational content standard for the

National Association for Music Education. The National Association of Schools of

dominant method for scoring music performance in higher education is assessment by

2009) and must be taken into account.

establishing reliable criteria was still noted as being under-investigated by Bergee

scoring is essential to be useful to both student and teacher to improve instructional

dimension of aligning program objectives to improve the educational experience (2014).

Assessment in the Applied Studio 4

provide strong evidence of student achievement by "combining several performance

dimensions into one standardized multidimensional assessment rubric” (2009, p. 14).

in higher education to measure student progress and establish clear learning

between a formative progress assessment, and a summative performance assessment.

Assessment in the Applied Studio 5

showed reliability was increased when department-wide criterion scales were

development process was spread out over a period of at least a semester.

underlying definitions to aid in precision to improve a higher degree of consistency

among raters. As reported by Wesolowski, Wind, and Engelhard, a combination of

optimum in performance assessment reliability (2016).

Assessment in the Applied Studio 6

individual performance assessment, and group performance assessment.

Vocal Progress Score Metric (BYU)

plotted into a graph (see Figure 1).

Assessment in the Applied Studio 7

Robison sights multiple positive outcomes at BYU from the implementation of

own personal choice (2001).

recognize patterns of achievement or non-achievement across our students, and make

Kansas State High School

consistent ratings? 2) How reliable were adjudicators in assigning performance

Assessment in the Applied Studio 8

& Cohen, 2010).

The Assessment Activity and our Goals

As with many music programs, our music department at XXX University is

actively engaged in collecting assessment data to generate useful feedback while

reason, we structured an assessment day focused on applied instruction. Our four

assessment-day goals were to 1) promote meaningful discussion about performance

standards, 3) develop a standardized vocabulary, expressed in our shared rubrics, that

inform departmental decisions. To accomplish these goals we structured our

Assessment in the Applied Studio 9

creating conversation across performing areas.

members of each group on the assessment day.

We wanted a standardized recording solution to avoid the possibility of unequal

remove the cards and upload the videos to a shared drive.

Assessment in the Applied Studio 10

In order to eliminate dead time, we selected inexpensive 8 Gb flash drives and

demonstrated entry-level performance. Each faculty group met separately in a

Assessment in the Applied Studio 11