DESIGNING TESTS TO MEASURE PERSONAL ATTRIBUTES AND NONCOGNITIVE SKILLS
Patrick C. Kyllonen
In the past decade there has been increasing interest in noncognitive skills, sometimes called soft skills, character skills, social-emotional skills, self-management skills, psychosocial skills, behavioral skills, interpersonal and intrapersonal skills, or 21st-century skills. The purpose of this chapter is to explain why these factors are important for both education and the workplace, to review various construct frameworks for them and then to survey a wide variety of methods for measuring them.

Surveys have shown that higher education faculty value and seek out students with strong noncognitive skills as well as cognitive ones (e.g., Walpole, Burton, Kanyi & Jackenthal, 2002), and for good reason. Several meta-analyses of predictors of college performance (e.g., Casillas et al., 2012; Richardson, Abraham & Bond, 2012; Robbins et al., 2004; Robbins, Oh, Le & Button, 2009) have shown that a variety of noncognitive factors, such as effort regulation, achievement motivation, academic and performance self-efficacy and grade goal, predict both grades and persistence in college. This is true even after controlling for grade point average, standardized achievement test scores (e.g., SAT, ACT) and socioeconomic status, and in some cases the prediction is comparable in strength to that afforded by these other scores. Burrus et al. (2013) developed a model of persistence in higher education that reflects these findings. Major test publishers have recently made available noncognitive behavioral assessments, such as ACT's Engage and ETS's SuccessNavigator, designed to identify students at academic risk and to boost retention rates. All of this represents a new direction for student assessment in higher education.
A similar story can be told about noncognitive assessment in K-12. Poropat's (2009) meta-analysis showed that personality ratings were as strong a predictor of school grades as cognitive ability test scores were, from elementary school through college. A study by Segal (2013) showed that teacher ratings on a five-item checklist of eighth-grade student misbehavior predicted labor market outcomes 20 years later, specifically employment and earnings, even after controlling for educational attainment and standardized test scores. Two meta-analyses by Durlak, Weissberg and colleagues, conducted for the Collaborative for Academic, Social, and Emotional Learning (CASEL), showed the benefits of school-based (Durlak, Weissberg, Dymnicki, Taylor & Schellinger, 2011) and after-school (Durlak, Weissberg & Pachan, 2010) social and emotional learning programs on achievement as well as on social and emotional skills, attitudes and behavior. Paul Tough's (2013) best-selling book on the keys to academic accomplishment argued that character qualities, such as perseverance, curiosity, optimism and self-control, are as important as or more important than cognitive abilities for children's success in school and beyond.
Similarly, licensing agencies in the health-care professions and employers in business and industry report that they are looking for noncognitive skills, such as teamwork and collaboration, professionalism, work ethic, leadership, creativity, adaptability, positive attitude, interpersonal skills, communication skills and goal orientation (Casner-Lotto, Barrington & Wright, 2006; Haist, Katsufrakis & Dillon, 2013; Raymond, Swygert & Kahraman, 2012). Approximately 15% of employers report using personality tests for hiring workers (Schmitt & Ryan, 2013), and both employers and higher education institutions use interviews and letters of recommendation to assess applicants’ noncognitive skills (Kyllonen, 2008; Walpole et al., 2002).
Given this context, it is clear that noncognitive assessment is emerging as an increasingly important topic for test development, and the time is right for a chapter addressing some of the unique issues associated with it. Some issues in noncognitive assessment—sources of validity evidence, evaluating reliability, fairness, threats to validity, item development processes, norms, cut scores and others—are similar to issues in cognitive assessment. But the constructs themselves are different (e.g., conscientiousness), and the methods for assessing them (e.g., rating scales) typically differ from those used in cognitive assessment. The purpose of this chapter is to give the reader a sense of both the similarities and the differences in the test development process for noncognitive assessments.
In this chapter I outline some of the more popular constructs and general frameworks. Some of the key constructs include work ethic, self-regulation, teamwork, creativity, leadership and communication skills, as well as attitudes, interests and subjective well-being. There have been several notable attempts to summarize these constructs in frameworks relevant to K-12 assessment, including those of the Collaborative for Academic, Social, and Emotional Learning (CASEL, 2013), the Consortium on Chicago School Research (Farrington et al., 2012) and several 21st-century skills reviews conducted by the National Research Council (e.g., Pellegrino & Hilton, 2012), as well as frameworks for large-scale assessments, such as the Organisation for Economic Co-operation and Development's (OECD) Program for the International Assessment of Adult Competencies (PIAAC) (Allen et al., 2013) and the Program for International Student Assessment (PISA) (OECD, 2013). I also review the Big 5 personality theory, which can serve as a foundation for noncognitive skills assessment and has been particularly influential in the workforce and increasingly in education.
Following this I review the assessment methods one commonly finds in the noncognitive assessment literature. Measurement issues may be particularly important in noncognitive skills assessment; as the foregoing brief review suggests, there are many noncognitive constructs. Further, they often tend to be only loosely defined, and definitions differ across test developers, so generalizing beyond the defined construct is hazardous. There have been noteworthy attempts to standardize definitions (e.g., Goldberg et al., 2006; Pellegrino & Hilton, 2012), but many more such efforts are needed to achieve standardization in noncognitive construct definitions.
Rating scales are widely used in noncognitive assessment, but methods for moderating rating scale responses (e.g., anchoring vignettes), forced choice (rank-and-preference methods), situational judgment tests and performance measures have also been used extensively. I provide a constructs-by-methods taxonomy indicating which methods have been used for which constructs, and where measures do not yet exist or remain in a research phase. I conclude with a discussion of various uses of noncognitive assessments.