Peer review of teaching. Do we know good teaching when we see it?

Jack Heinemann

When I was applying for my first academic position I was advised to ask my referees not to say that I’d be a good teacher, because that was “code” for saying I’d never be a top researcher. Whether or not many of us think the way this mentor did, this advice bears some resemblance to how the research university works.

Your job application to a research university begins and ends with what you have done, and what you plan to do, in research. To get an interview you have to demonstrate your research worth. Even if the interview includes a separate event to showcase your teaching talents, more of your future colleagues will attend your research seminar.

Your future colleagues often think that they can extrapolate your teaching talents from how slick your lecture is. Indeed, most teaching components of an interview process are just mock lectures to hypothetical undergraduates. You have to be pretty diabolically bad for the job to be lost because of this performance.

Endemic in our culture is the quest for knowledge through research. That is not a bad thing! One of the characteristics of a research university that makes it different from primarily tertiary teaching institutions and from primarily research institutions is that research is a vehicle of teaching. Students ride this vehicle with mentors whose choice of research questions is (sometimes, and hopefully mostly) influenced by how well those questions align with students’ capabilities to use them to learn to become independent researchers.

This raises the question, then, of what counts as evidence of effective teaching. The most common measurement tool is the SET (student evaluation of teaching) survey. While this tool has value when applied well and for the right purpose (which, sadly, is almost never[1]), we have few alternatives. Most universities appear to be too insecure to lead the way away from this situation. However, the search is on for more effective measures of teaching effectiveness.

The advantage of survey tools is scalability. SET not only thrives on large numbers of respondents, but the cost per respondent declines as the number of students increases. Thus the cost of using the tool is predictable and minimal, regardless of whether the information is fit for purpose. Alternatives that involve careful measurement of student achievement against carefully defined, and possibly customised, learning goals are not scalable and are therefore potentially much more expensive.

Scalability is likely to guide a criterion for auditioning the next wave of SET companions. A tool I hear spoken about more and more often is peer review/observation of teaching. There is research that supports the efficacy of this tool, again when it is applied well and for the right purpose.

Notwithstanding that evidence, poor implementation of peer review of teaching could put us right back where we are now. How will we know when, or if, peer review of teaching enriches the culture of learning? The tool is particularly attractive if you believe that we know good teaching when we see it. The challenge is to design a system of peer review that does something SET surveys do not: make visible previously invisible good teaching.

Challenges include:

  • decades of academics defining themselves as good or bad teachers based on SET surveys. This practice has embedded within the culture the same perceptions of good and bad teaching that students hold, perceptions that a mountain of evidence has shown can be contraindicative of benefit to students. How will our peers manage, or even perceive, the effects of career-long SET grooming?
  • assessing whether academics are any better at gauging the learning happening in students by watching another academic guide students through a learning activity. If the observer has the same biases and the same ability to observe students as the instructor, and neither has access to objective and independent evaluation of the changes in students’ minds, then peer review of teaching could be merely correlated with SET.
  • distinguishing high achievement in a poor teaching activity from mediocre achievement in a good teaching activity. The difference is important because improvement in the latter has more potential to improve learning outcomes than does improvement in the former. We academics still primarily use the lecture format both to share our research and to demonstrate our teaching ability. Why would we suddenly see that lectures are less effective than other approaches by watching our peers give lectures, especially when some of our colleagues could be much better at giving lectures than we are?

I prefer to identify good teaching through evidence of its effectiveness rather than by reference to an internal standard of goodness. One source of evidence comes from comparing peer review evaluations with the outcomes of careful measurement of student achievement against carefully defined, and possibly customised, learning goals. Oops, back to the unscalable and therefore unaffordable standard of measurement!

Measuring poorly can cause more harm than not measuring at all, especially when the measurement is strongly linked to promotion. The accountability-through-metrics generation will not like to read this. However, the evidence for this statement is now too large to ignore.[2] But I can throw a bone here, and will: measuring well but infrequently might cause more good than not measuring at all.

There are no easy answers to how to measure effective teaching. In part this is because teachers and learners can have different goals, and teachers can have different goals from one another and in different courses with different learners. There is a lot to measure. This might all sound depressing but to me it isn’t. Academics have a selfish interest in being effective teachers. We are dependent on a society that is made competent through our teaching. So good teaching really does matter.


[1] A particularly blunt statement from the research literature: “…our findings indicate that depending on their institutional focus, universities and colleges may need to give appropriate weight to SET ratings when evaluating their professors. Universities and colleges focused on student learning may need to give minimal or no weight to SET ratings. In contrast, universities and colleges focused on students’ perceptions or satisfaction rather than learning may want to evaluate their faculty’s teaching using primarily or exclusively SET ratings, emphasize to their faculty members the need to obtain as high SET ratings as possible (i.e., preferably the perfect ratings)…” Source: Uttl, B.; White, C.A.; Wong Gonzalez, D. Meta-analysis of faculty’s teaching effectiveness: student evaluation of teaching ratings and student learning are not related. Studies in Educational Evaluation 2017;54:22-42.

[2] Braga, M.; Paccagnella, M.; Pellizzari, M. Evaluating students’ evaluations of professors. Economics of Education Review 2014;41:71-88. See also: http://www.insidehighered.com/views/2019/06/24/relying-often-biased-student-evaluations-assess-faculty-could-lead-lawsuits-opinion
