ENBIS-16 in Sheffield

11 – 15 September 2016; Sheffield Abstract submission: 20 March – 4 July 2016

Reliability of Student Evaluations of Teaching

13 September 2016, 16:20 – 16:40


Submitted by
Maria Sole Pellegrino
Amalia Vanacore (University of Naples Federico II, Department of Industrial Engineering), Maria Sole Pellegrino (University of Naples Federico II, Department of Industrial Engineering)
Student Evaluations of Teaching (SETs) are the most common way to measure teaching quality in Higher Education (HE). They play an increasingly strategic role in monitoring teaching quality and inform major formative academic decisions aimed at improving and shaping future courses on the basis of the evidence they provide.
In HE, students have been the primary raters of teaching quality worldwide for the past 40 years. Nevertheless, the specialized literature shows that researchers still debate whether SETs can be considered reliable measures of quality. Although some researchers believe that SETs are the best option for obtaining quantifiable and comparable measures, others point out that student ratings can be influenced by factors such as the student's own classroom experience and class size, and thereby lose reliability. Most studies of SET reliability focus on the instruments and procedures adopted to collect students' evaluations, or on inter-rater reliability, rather than on intra-rater reliability.
This paper presents the main results of a reliability study on SETs provided for a university course. The data sets were collected during three supervised experiments carried out on successive classes. Each class had more than 20 students and was homogeneous in curriculum and instruction.
The SETs collected in each experiment were analysed by measuring the reliability of each student (i.e. a single rater), the reliability of the whole class (i.e. multiple raters) and the agreement among students. Specifically, reliability was measured as the coherence of the evaluations provided on two occasions, using either the same rating instrument (i.e. same items and same rating scale) or partially different rating instruments (i.e. same items but different rating scales), whereas agreement was measured for students' ratings collected under exactly the same conditions (i.e. same occasion, same items and same rating scale).
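The abstract does not say which coefficient was used to quantify the coherence between the two occasions. As an illustration only, the sketch below assumes a linear-weighted Cohen's kappa, a common choice for ordinal rating scales, applied to one student's ratings of the same items on two occasions with the same rating scale. The function name `linear_weighted_kappa`, the 4-point scale and the example ratings are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def linear_weighted_kappa(first, second, categories):
    """Linear-weighted Cohen's kappa for two rating vectors on one ordinal scale.

    Assumption: `first` and `second` hold the same student's ratings of the
    same items on two occasions; `categories` lists the scale points in order.
    """
    if len(first) != len(second) or not first:
        raise ValueError("rating vectors must be non-empty and of equal length")
    k = len(categories)
    if k < 2:
        raise ValueError("the rating scale needs at least two categories")
    index = {c: i for i, c in enumerate(categories)}
    n = len(first)

    def weight(i, j):
        # Full credit for identical ratings, partial credit for near misses.
        return 1.0 - abs(i - j) / (k - 1)

    # Observed weighted agreement between the two occasions.
    observed = sum(weight(index[a], index[b]) for a, b in zip(first, second)) / n

    # Weighted agreement expected by chance from the marginal distributions.
    m1 = Counter(index[a] for a in first)
    m2 = Counter(index[b] for b in second)
    expected = sum(
        weight(i, j) * m1[i] * m2[j] for i in range(k) for j in range(k)
    ) / (n * n)

    if expected == 1.0:
        # Both occasions use a single identical category: kappa is undefined.
        return float("nan")
    return (observed - expected) / (1.0 - expected)


# Hypothetical ratings of six items by one student on a 4-point scale.
scale = [1, 2, 3, 4]
occasion_1 = [4, 3, 3, 2, 4, 1]
occasion_2 = [4, 3, 2, 2, 3, 1]
print(linear_weighted_kappa(occasion_1, occasion_2, scale))
```

A value of 1 indicates identical ratings on both occasions, and values near 0 indicate coherence no better than chance. Comparing ratings given on different rating scales would first require mapping the two scales onto common categories, which this sketch does not attempt.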