The psychometric development and review of an evaluation system for string ensemble performance using Rasch measurement theory
Edwards, Kinsey Emiline
The purpose of these studies was to develop a valid and reliable rubric for the evaluation of large ensemble string performances using psychometric principles of invariant measurement. The three papers seek to define assessment within the music classroom, create and validate a rubric for performance evaluation, and review how the newly designed rubric operates in a live performance evaluation setting. The first portion of the study was guided by the following research questions: (a) What does Rasch measurement analysis reveal about the psychometric quality (i.e., validity and reliability) of items, raters, and ensembles within the context of a large ensemble string performance assessment? (b) How do the items vary in difficulty, raters vary in severity, and ensembles vary in achievement? and (c) How does the rating scale structure vary across individual items? Music content experts (N = 25) were solicited to evaluate string ensemble performances. Response categories were optimized in order to increase measurement accuracy and precision. Implications for the improvement of music assessment practices are discussed.

The second part of the study was guided by the following research questions: (a) How do the numerical ratings from the condition A rating scale compare with the numerical results yielded by the newly developed condition B rubric? (b) How do the written forms of feedback given to the directors of the ensembles from the two systems compare? and (c) How do the two forms compare in terms of overall usability for the raters? A side-by-side comparison of the condition A rating scale and the condition B rubric was conducted. Music content experts (N = 3) were solicited to evaluate string ensemble performances using the condition A rating scale, while three additional content experts used the condition B rubric to evaluate the same performances. Results from the condition A rating scale were analyzed using both Rasch analysis and Classical Test Theory, and results from the condition B rubric were analyzed using Rasch analysis. The two sets of results were compared to determine which method yielded the more accurate measurement of the actual performances. Implications for the improvement of music assessment practices are discussed.