Rating Items as Measures of the Quality of Knowledge Representation

In traditional paper-and-pencil tests, rating items are used to measure teachers’ declarative-conceptual or procedural knowledge in a standardized and economical way. To account for the quality of knowledge representations, items that target different cognitive processes are formulated. To interpret the quality of knowledge, participants’ responses are compared with a criterion-related norm as reference (resulting, for example, in right or wrong answers). In this section, we describe how we transferred this approach to the video-based assessment of teachers’ knowledge. We explain how to develop rating items that target professional vision skills based on a theoretically conceptualized model. Furthermore, given that teaching effectiveness research does not provide right or wrong answers with regard to the quality of the events observed in the videos, we detail how we use expert judgments as the qualitative norm for the assessment.

Rating Items Targeting Different Cognitive Processes

Based on our theoretical model of the structure of professional vision, we aimed to develop rating items that capture the quality of teachers’ knowledge representations. For this reason, we constructed rating items for the combination of declarative-conceptual knowledge about the three TL components (goal clarity, teacher support, and learning climate) and the three reasoning skills (description, explanation, and prediction). Yet, what does it actually mean to describe, explain, or predict a classroom situation with regard to knowledge concerning a certain TL component? In order to provide evidence-based reasoning for the noticed classroom events, models and knowledge regarding these processes are important. In this context, teaching effectiveness researchers refer to self-determination theory (SDT) in order to model the processes involved in the creation of learning environments by teachers as well as their effective use by learners. SDT proposes three basic conditions that a learning environment needs to satisfy in order to make learning processes likely: the experience of competence, autonomy, and social relatedness (Deci & Ryan, 2004). A substantial body of research has shown that the perception of these conditions in a learning environment is positively related to both intrinsic motivation and human development. With regard to the three selected teaching and learning components derived from the teaching effectiveness research, it has been shown that goal clarity and orientation are important for students to experience competence, autonomy, and social relatedness (Kunter, Baumert, & Koller, 2007; Seidel, Rimmele, & Prenzel, 2005), with positive effects on students’ motivation and knowledge development over time. In addition, teacher support and guidance in classroom discourse are positively related to the three conditions, with positive effects on intrinsic learning motivation and interest development (Lipowsky et al., 2009; Seidel et al., 2003). Furthermore, a positive learning climate positively affects perceptions of the three conditions, again with positive effects on students’ learning (Buff, Reusser, Rakoczy, & Pauli, 2011).

In this vein, the construction of our rating items was based on the framework in Figure 3. Questions measuring description targeted the specific observation of the three TL components using knowledge about aspects of each component in naming and differentiating an observed event. Questions tapping into explanation focused on the link between an observed event and knowledge about the corresponding TL component, specifically with regard to how a teaching component addresses students’ individual perceptions of the supportiveness (e.g., autonomy, competence) of a classroom situation. Questions assessing prediction focused on the potential consequences of an observed situation in terms of students’ learning, including the consequences for learning motivation, cognitive processing, and affect.

For each TL component, 18 rating items were developed, with nine per content aspect (three for description, three for explanation, and three for prediction). A four-point Likert scale ranging from 1 (disagree) to 4 (agree) was used. After watching a video representing the relevant TL component, participants were asked to indicate the extent to which they agreed with each item.
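The item counts above can be made concrete with a small sketch. It assumes two content aspects per TL component (inferred from 18 items per component at nine per aspect; the aspects are not named in this section, so they are only numbered here), and all identifiers are hypothetical.

```python
from itertools import product

TL_COMPONENTS = ["goal clarity", "teacher support", "learning climate"]
REASONING_SKILLS = ["description", "explanation", "prediction"]
ASPECTS_PER_COMPONENT = 2  # assumption: inferred from 18 items / 9 per aspect
ITEMS_PER_SKILL = 3        # three items per reasoning skill within an aspect


def build_item_grid():
    """Enumerate one record per rating item in the assumed design."""
    grid = []
    for component, aspect, skill, number in product(
        TL_COMPONENTS,
        range(1, ASPECTS_PER_COMPONENT + 1),
        REASONING_SKILLS,
        range(1, ITEMS_PER_SKILL + 1),
    ):
        grid.append(
            {
                "component": component,
                "aspect": aspect,   # placeholder index, not a real label
                "skill": skill,
                "item": number,
            }
        )
    return grid


items = build_item_grid()
per_component = sum(1 for i in items if i["component"] == "goal clarity")
per_aspect = sum(
    1 for i in items if i["component"] == "goal clarity" and i["aspect"] == 1
)
```

Under these assumptions, each component carries 18 items and each content aspect nine, for 54 items across the three components.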

Figure 3. Frame of reference for item construction.

Expert Judgments as the Frame of Reference for the Rating Items

When assessing teachers’ professional vision using rating items, it is necessary to identify a suitable frame of reference for comparing participants’ responses. In competence assessment, various approaches can be used to define the relevant criteria. In qualitative research concerning professional vision, for example, the individual approach (Fuchs, Benowitz, & Barringer, 1987) has been used to describe the development of individuals’ performance over time (Star & Strickland, 2008). According to this approach, growth is measured within individuals over time. However, differences between individuals cannot be compared using an individual norm. To address this shortcoming, the traditional comparative approach has been used in addition to the individual approach to compare a person’s performance to a norm based on the performance of other individuals with similar characteristics—a social reference norm (Fuchs et al., 1987). However, given that the significance of a participant’s performance is dependent on his/her relative position in comparison to that of the other participants, significant variability is required to apply a norm-based reference to performance (Popham, 1971). Thus, it is essential to utilize representative samples with a great deal of variety.

When it comes to assessing professional vision at the level of teacher education, for example, in initial teacher education programs at universities, only limited variance within the sample can be guaranteed. One potential approach to dealing with this issue is the use of criterion-referenced norms (Goldstein & Hersen, 2000). Criterion-referenced norms use content-related criteria for comparison. One well-established criterion-referenced norm is the expert norm (e.g., Oser, Heinzer, & Salzmann, 2010). This approach is based on the assumption that experts can be characterized as exhibiting a large number of domain-specific organized knowledge structures that they can draw on to successfully deal with the specialized tasks of their profession (Kalyuga, 2007; Ericsson, Krampe, & Tesch-Romer, 1993).

In our study, we used an expert norm as the criterion norm to measure teachers’ reasoning skills. However, a question still arises regarding the most appropriate type of expert to act as a suitable reference for the target competence assessment. With the Observer Research Tool, we aim to assess teachers’ knowledge representation regarding effective teaching and learning. With this in mind, we chose persons with an elaborate base of evidence-based knowledge, such as researchers in the field of teaching effectiveness, as a suitable frame of reference. Furthermore, the assessment targets the application of knowledge to practice by observing and interpreting classroom videos. Thus, the second criterion for being an expert was extensive experience in classroom observation.

To create our norm, three expert researchers, each with 100-400 h of experience observing classroom situations according to the teaching and learning components under investigation, independently answered all four-point Likert-type items included in the Observer Research Tool (Seidel & Sturmer, 2014). Cohen’s kappa (κ) was calculated to determine the consistency of the expert ratings, with a mean Cohen’s κ of .79 across the raters indicating a satisfactory level of consistency (Seidel et al., 2010b). In cases where the experts initially disagreed, agreement was reached by consensus validation. With the expert norm thus established, participants’ responses could be compared in terms of the extent to which they concurred with the expert judgment. With respect to how stringent the comparison to the expert norm should be for a reliable measure, two different strategies for calculating agreement were established and tested: (1) a stricter measure of ‘0’ (miss expert rating) and ‘1’ (hit expert rating); and (2) a less strict measure of ‘0’ (miss expert rating), ‘1’ (correct direction on the scale), and ‘2’ (hit expert rating). The strict recoding proved to be superior to the less strict version, which took tendency into account (Seidel & Sturmer, 2014).
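The agreement statistic and the two recoding strategies described above can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: kappa is computed unweighted and averaged over all rater pairs, and "correct direction on the scale" is read as the response falling on the same side of the scale midpoint as the expert rating (1-2 versus 3-4). All function names and the example ratings are invented for illustration.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters (assumed variant).

    Undefined (division by zero) if both raters use a single identical
    category throughout, because chance agreement is then 1.
    """
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)


def mean_pairwise_kappa(raters):
    """Average kappa over every pair of raters (assumed aggregation)."""
    pairs = list(combinations(raters, 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


def strict_score(response, expert):
    """Strategy (1): 1 = hit expert rating, 0 = miss."""
    return 1 if response == expert else 0


def lenient_score(response, expert):
    """Strategy (2): 2 = hit, 1 = correct direction, 0 = miss.

    Assumption: "correct direction" means the same half of the 1-4 scale.
    """
    if response == expert:
        return 2
    same_side = (response >= 3) == (expert >= 3)
    return 1 if same_side else 0


# Invented example data: consensus expert norm and one participant.
expert_norm = [4, 1, 3, 2, 4]
participant = [4, 2, 1, 2, 3]

strict_total = sum(strict_score(r, e) for r, e in zip(participant, expert_norm))
lenient_total = sum(lenient_score(r, e) for r, e in zip(participant, expert_norm))
```

With these invented ratings the participant hits the expert norm on two of five items under the strict recoding, while the less strict recoding additionally credits the two responses that fall on the same side of the scale as the expert rating.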
