Methodological Limitations of Student Engagement and Experience Surveys
The widespread use of student experience and engagement survey data raises questions about the reliability, validity and other quality characteristics of such data when used as evidence in higher education decision-making. Validity concerns “whether the surveys measure what they are designed to measure and to provide evidence that supports inferences about the characteristics of individuals being tested” (OECD 2013, p. 12). The key aspect of validating a survey instrument lies in assessing whether the assumptions included in the theory defining the construct (in this case student satisfaction/student experience and student engagement) are credible. Reliability concerns whether surveys “provide stable and consistent results over repeated measures allowing for results to be replicable across different testing situations?” (OECD 2013, pp. 12–14). Here the focus is much more on how respondents respond to the questions, i.e. whether the interpretation of questions is consistent among participants, but also on the guidelines for administering the survey and scoring individual items. In this section we synthesise the key points of criticism and the defence of student surveys, with a specific focus on student engagement surveys. We believe that decision makers ought to be aware of these discussions so as to be able to evaluate the rigour and appropriateness of the specific survey instruments at hand.
Critics point to two major areas of contention in student engagement surveys: (1) the accuracy of student self-reported information on engagement and learning gains, and (2) the selection of the standards of educational practice and student behaviour implied in the questions (Campbell and Cabrera 2011; Gordon et al. 2008; Porter 2013; Porter et al. 2011). The proposition on the former is that survey designers often overestimate students' cognitive ability to comprehend the survey question and retrieve the information. On this point Porter (2011, p. 56) illustratively suggests that the surveys are built with the view of students “as having computer hard drives in their head allowing them to scan the reference period of matching behaviour, process it and provide an answer of the frequency of the behaviour”. Particular criticism of student engagement surveys concerns students' ability to make an informed judgement of their self-reported learning gains, i.e. the growth in knowledge, skills, abilities and other attributes that they have gained during their studies. Porter et al. (2011) offer empirical evidence of the inaccuracy of self-reported learning gains. They argue that such questions are highly susceptible to social desirability bias (ibid.). When students wish to provide an answer in the survey but cannot retrieve the information, they resort to intelligent guessing, often based on what they think should be happening or on what would make them look favourable to others (Porter 2011).
Another criticism concerns the selection of the standards of institutional practice and student behaviour implied in the survey questions, i.e. the factors that are expected to influence student learning and development, and how these relate to other external measures. The important question here is what is measured and what is not. Standardised surveys imply an established (fixed) standard of process or outcome against which institutions are evaluated and to which they need to demonstrate conformity (Ewell 2009). This raises the question of how these “standards” have been established: have they been derived from theory or from other empirical findings, or do they reflect certain policy objectives? Survey research is prone to observational biases when researchers look “where they think they will find positive results, or where it is easy to record observations”, i.e. the so-called 'streetlight effect', coined by Friedman (2010) after the joke about a drunken man who has lost his key and looks for it under the streetlight because that is where the light is. In this respect, surveys tend to give more attention to institutional factors that shape student experience and less to other contextual and psycho-socioecological factors which are much more difficult to measure, such as the role of the broader socio-cultural context, university culture, family support, psycho-social influences (Kahu 2013), emotions (Beard et al. 2007; Kahu 2013), student and academic identities, and disciplinary knowledge practices (Ashwin 2009).
A problem specific to inter-institutional and system-wide surveys lies in the level of contextualisation. To allow for comparisons, these surveys are conceived in a generic and highly abstract way. This makes it difficult to account adequately for the organisational differences between institutions in terms of their specific missions and objectives, resources, profiles of student population, and the various unique arrangements that give each institution a certain distinct flavour. If survey tools are generic enough to allow for comparison of very different institutions within a national system or internationally, then their use by any of the intended users (institutions, students or governments) is fairly limited. In their generic form these surveys cannot discern the contextual dimensions and variables which could add most value to a formative use of such data. International comparisons, or international adaptations of instruments initially developed for a particular higher education system (such as the US, Australia or the UK), present a number of challenges associated with adequate translation and cultural localisation of survey items. More contextualised survey designs become possible when very similar institutions are compared, and the lower we go within the institutional hierarchy, i.e. down to the programme level.
The rebuttals of the criticism are equally numerous. The key response to the criticism regarding the accuracy of self-reported learning gains is that surveys such as NSSE never claimed to collect precise responses about either learning gains or behaviours, but are based on the principle of a reasoned and informed judgement, which allows institutions to use the data to screen for major occurrences and major trends over time and across institutions (Ewell 2009; McCormick and McClenney 2012; Pike 2013). The criticism regarding the selection of “benchmarks” has been refuted by pointing out that major surveys rely on interviews and focus groups both in formulating and in pilot-testing the questions. The key focus of these qualitative appraisals is precisely to test participants' understanding and the consistency of their interpretation of the questions (McCormick and McClenney 2012; Pike 2013). Pike (2013) notes that critics often ignore the primary use of student surveys, and that the major validation lies in these surveys' appropriateness for institution- and group-level decision-making. In the case of NSSE he offers empirical evidence that the NSSE benchmarks can be used to assess the extent to which an institution's students are engaged in educationally purposeful activities, and the extent to which colleges and universities are effective in facilitating student engagement (Pike 2013). Furthermore, several authors highlight that the survey benchmarks were designed so as to “represent clusters of good educational practices and to provide a starting point for examining specific aspects of student engagement” (Ewell et al. 2011; Kuh 2001; McCormick and McClenney 2012; Pike 2013, p. 163).
Survey designs have also been improved. A welcome modification has been the introduction of longitudinal designs with repeated measures, which allow for tracking changes in student behaviour and in perceptions of student experience over time. Another helpful revision has been the introduction of questions on student expectations and aspirations in surveys targeted at students at the beginning of their studies. Importantly, longitudinal designs have also been extended into the labour market, since the effects of educational provision on students may become more apparent after completion of studies (cf. Kim and Lalancette 2013). Promising complementary research lies in student social network analyses, which depict the complex web of relationships and interactions, both historic and present, both within and outside academic settings, both physical and virtual, that shape individual students' (perception of) learning and experience (Biancani and McFarland 2013).
One implication of the eagerness of institutional decision makers and policy makers to collect data directly from students is survey fatigue. Students are increasingly tired of surveys; they complete them carelessly or do not complete them at all. Institutional surveys compete with hundreds of other surveys (including those by businesses eager to understand the millennials' consumer habits), and students do not differentiate between them or do not care to respond. Low response rates accentuate possible biases in survey responses; the most common among them is the underrepresentation of disengaged, non-traditional and minority students. Low response rates remain a major challenge in student survey methodology despite numerous attempts to find better ways to raise them (Porter 2004; Porter and Whitcomb 2004; Porter et al. 2004). Inevitably, we will need to look for new ways of collecting data from students on their behaviour, preferences and opinions.
In sum, there are convincing arguments on both sides. Researchers ought to continue to work towards improving student survey instruments, as such data helps us better understand how students experience higher education and devise interventions for improvement. While survey data is an important source of evidence, it is by no means sufficient. As mentioned by Alderman et al. (2012, p. 273), greater reliability of data is achieved when student survey data are used “in conjunction with information from other sources and robust links are established between the data and the institution's overall quality management system”. For the purposes of formative decision-making oriented towards institutional and programme improvements, student data needs to come from several sources and be validated through cross-verification of data from different sources (i.e. triangulated). At best, student surveys are used as screening instruments to discover major deficiencies in the educational environment and provision, and major discrepancies between actual and expected student behaviour. Such diagnostic results in turn guide institutional managers to explore the causes and consequences of various practices and processes. This is done through qualitative methods which can generate contextualised data (indeed richer, deeper and more authentic data) on student experience and behaviour, albeit on a smaller scale, by focusing on the 'particular'.
The advantage of qualitative methods is that they can generate richer, deeper and more authentic data on student experience and behaviour. However, their major drawback is their limited scope: they focus on a particular case or phenomenon, which makes generalisations to large populations problematic. The intensive fieldwork (through in-depth interviews, focus groups, direct observation, etc.) makes it simply too time-consuming and too costly to reach large numbers of students. The question that arises is whether these limitations could be overcome with the use of new technology, the universal use of social media by students and the advances in big data science. Frontier research agendas lie in exploring digital adaptations of qualitative methods of data collection, such as digital ethnography and digital phenomenology, which give access to more contextualised data on human behaviour and lived experiences on a large scale (Klemenčič 2013). It is plausible to expect that, in the very near future, data on student experience will be collected from students not through invitations to answer on-line student surveys but rather, seamlessly and in great volumes, through social media platforms adapted for use by institutional researchers (Klemenčič 2013). Advancements in educational technology and students' near-universal use of mobile technology present enabling conditions for such innovation. The major challenge to this promising method, however, lies in safeguarding private or individually identifying information and in other ethical concerns that arise from research using the Internet.
Before we continue to describe the various approaches to student data analytics, one concession is in order. Student data analytics to generate evidence for decision-making is inevitably reductionist: it means capturing aspects of student experience which are general to most students, rather than particular to a few. There is no way that we can turn every idiosyncratic aspect of individual student experience into evidence that can inform institution-wide or system-wide decisions. Again, the best we can do as researchers and decision-makers who seek “intelligence” for their decisions is to use data from several sources, obtained through both quantitative and qualitative methods.