What We Learned from Assessing Astronaut Applicants
Future research regarding selection for spaceflight must resort to more creative tactics for quantifying performance and validating predictors. For example, space agencies could conduct studies to generalize and validate predictors among samples of teams whose work approximates some portion of astronaut responsibilities or tasks. Synthetic validation tactics will also hold more appeal as collaborations between the commercial and international space agencies continue expanding to accomplish long-duration missions.
In the meantime, 50 years of ground-based research on individual selection for work done in teams, including small group research conducted in analog and/or extreme environments, should inform astronaut selection. Ground-based studies show that new teams choosing individuals who are skilled at training and articulating their roles to others, compromising, and helping other team members take on their tasks, and who also understand effective team processes, achieved better team performance than teams that ignored these individual skills during selection (Jones, Stevens, & Fischer, 2000). Evidence suggests that individual characteristics (in addition to individual skills and values) influence performance in a teamwork setting. For example, Barrick, Stewart, Neubert, and Mount (1998) found that a team member with a very low score on Conscientiousness (as measured by the NEO PI-R) acted as the “weakest link,” constraining team performance. In assembly and maintenance work teams, team averages on three personality factors (Emotional Stability, Conscientiousness, and Agreeableness) and general mental ability were positively correlated with supervisor ratings of team effectiveness. In addition, team average general mental ability and two personality factors (Extraversion, Emotional Stability) were positively related to supervisor ratings of the team’s ability to maintain itself over time (Barrick et al., 1998). One meta-analysis found that interpersonal facilitation was significantly predicted by three personality factors (Conscientiousness, Emotional Stability, and Agreeableness) (Hurtz & Donovan, 2000). Studies like these provide ample evidence that individual factors, such as personality and general mental ability, help predict the quality of performance in a teamwork setting.
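The difference between a team average and a “weakest link” score can be illustrated with a minimal sketch. The function name and the scores below are hypothetical and are not drawn from Barrick et al. (1998); the sketch only shows how two common ways of aggregating member scores to the team level (mean and minimum) can diverge.

```python
from statistics import mean

def team_composition_scores(member_scores):
    """Return (team mean, team minimum) for one trait across team members."""
    if not member_scores:
        raise ValueError("a team needs at least one member score")
    return mean(member_scores), min(member_scores)

# Hypothetical Conscientiousness scores on an arbitrary 1-5 scale (not real data).
team_a = [4.0, 4.0, 4.0, 4.0]   # uniformly moderate-to-high
team_b = [5.0, 5.0, 5.0, 1.0]   # three very high members, one very low

mean_a, min_a = team_composition_scores(team_a)
mean_b, min_b = team_composition_scores(team_b)

# Both teams share the same mean, but only the minimum flags
# the very low scorer on team_b.
print("team_a mean/min:", mean_a, min_a)
print("team_b mean/min:", mean_b, min_b)
```

In this illustration the two teams are indistinguishable on the mean, so a selection or composition study that tracked only team averages would miss the single low scorer that a minimum-based index picks up.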
Research on pilots offers further evidence that individual personality factors are relevant to selecting individuals capable of effective teamwork. With regard to interpersonal characteristics, a “right stuff” cluster based on the Personality Characteristics Inventory was composed of high levels of expressivity (warmth, sensitivity) and low levels of both negative instrumentality (arrogance/hostility) and verbal aggressiveness (complaining, nagging, passive-aggressive) (Chidester, Helmreich, Gregorich, & Geis, 1991; Gregorich, Helmreich, Wilhelm, & Chidester, 1989; Musson & Helmreich, 2005). A “wrong stuff” cluster included high levels of verbal aggressiveness and low levels of positive expressivity, whereas a “no stuff” cluster included low scores on expressiveness, instrumentality, mastery, etc. Observers considered the “right stuff” cluster pilots more effective than “wrong stuff” and “no stuff” pilots in a one-and-a-half-day simulated trip with a crew (Chidester, Kanki, Foushee, Dickinson, & Bowles, 1990).
Navy research in Antarctica suggests that while technical competence is necessary, it is also important to select individuals who exhibit “social compatibility or likeability, emotional control, patience, tolerance of others, self-confidence without egotism, the capacity to subordinate routinely one’s own interests to work harmoniously as a member of a team, a sense of humor, and the ability to be easily entertained” and who are practical and hardworking (Stuster, 1996, p. 268). NASA has historically used personality measures during astronaut selection as one source of data to capture key characteristics underpinning successful performance in the role. One challenge to evaluating the effectiveness of these tools is that there is a substantial delay (e.g., 10 years) between the time that these characteristics are initially measured and the time when astronauts are “tested” in actual space environments. Changes in personality may occur during that interval, which complicates the task of identifying the best predictors of astronaut mission success (Sgobba et al., 2018).
Another promising tool is the biographical data (biodata) questionnaire, which has been used effectively to predict success in military, government, and civilian training and job performance (Breaugh, 2009; U.S. OPM, 2018). Reviews indicate that biodata predict a wide range of criteria (e.g., leadership performance, teamwork behaviors) in a wide variety of occupations and typically yield cross-validities between .30 and .40 (Bobko, Roth, & Potosky, 1999; Breaugh, 2009; Cook & Cripps, 2005; Schmidt, Ones, & Hunter, 1992). Strong research evidence indicates that empirically keyed biodata instruments can produce validities approaching or exceeding those obtained with cognitive ability measures when predicting a wide spectrum of performance criteria, ranging from leadership to absenteeism (Asher, 1972; Bobko et al., 1999; Hunter & Hunter, 1984; Mumford & Owens, 1987; Reilly & Chao, 1982; Schmitt, Gooding, Noe, & Kirsch, 1984). However, limitations on the use of biodata instruments include the large samples needed to develop valid empirical keys, low face validity for items (Smither, Reilly, Millsap, AT&T, & Stoffey, 1993), potential perceptions of invasiveness (Mael, Connerley, & Morath, 1996), and the possibility of faking or distortion by candidates (McFarland & Ryan, 2000). Given these limitations, effective use of biodata instruments during the psychological screening of astronaut applicants is extremely difficult. Organizations selecting for work in extreme environments cannot always manage the logistics of implementing biodata assessments with the larger applicant pools. For space agencies to benefit from biodata assessments, research will have to find ways to develop valid empirical keys with small, international samples.
Ground-based research also suggests that cognitive ability tests (both general and specific measures) are among the most valid predictors of job performance, especially for more complex jobs (Hunter & Hunter, 1984). Despite their potential for disparate impact, cognitive ability measures remain a useful and legally defensible form of assessment in selection processes for work in extreme environments, given their strong validity and the criticality of cognition-based skills and activities for successful astronaut job performance (e.g., identifying and gathering information, problem solving, and decision making). To this point, cognitive ability tests have only been introduced at a late stage of the astronaut selection process (i.e., for the final 200 to 50 applicants), and the limited range of the scores coming from such a small subset creates complications for conducting validation research. Because cognitive ability test scores are among the most robust predictors of job success, a greater variety of cognitive ability tools could be used at an earlier stage in astronaut selection to more effectively select subsequent pools with greater diversity of sub-factors of general ability. Popular theories and ground-based studies suggest that well-roundedness among sub-factors or different conceptions of intelligence contributes to successful work performance in extreme environments. However, research is required to determine whether this is the case for the astronaut role before taking this approach to selecting individuals for that position.
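To make the range-restriction complication concrete, the following is a minimal sketch of the standard Thorndike Case II correction for direct range restriction, a general psychometric formula that the source itself does not apply. The function name and the example numbers are hypothetical and are not taken from any astronaut selection data. The correction assumes that selection occurred directly on the test in question and that the unrestricted standard deviation is known, conditions that may not hold for late-stage astronaut finalists.

```python
import math

def correct_for_range_restriction(r_restricted, sd_ratio):
    """Thorndike Case II correction for direct range restriction.

    r_restricted: validity observed in the restricted (selected) group.
    sd_ratio: u = SD of test scores in the full applicant pool
              divided by SD in the restricted group.
    """
    if not -1.0 <= r_restricted <= 1.0:
        raise ValueError("correlation must lie in [-1, 1]")
    if sd_ratio <= 0:
        raise ValueError("sd_ratio must be positive")
    r, u = r_restricted, sd_ratio
    return r * u / math.sqrt(1.0 + r * r * (u * u - 1.0))

# Hypothetical example: a validity of .20 observed among finalists whose
# score spread is half that of the full applicant pool (u = 2.0).
estimate = correct_for_range_restriction(0.20, 2.0)
print(round(estimate, 3))
```

With these illustrative inputs, the estimated validity in the unrestricted pool is noticeably larger than the observed value, which is why validities computed only on a small, highly selected finalist group can understate a test's usefulness.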
A preponderance of evidence from ground-based studies supports the general fairness, validity, and commonality of conducting structured interviews to screen applicants across industries and cultures (Carson, Carson, Fontenot, & Burdin, 2005; Fox & Spector, 2000; Huffcutt, Weekley, Wiesner, Degroot, & Jones, 2001; Pulakos & Schmitt, 1996; Schmidt & Hunter, 1998). Additionally, the literature indicates that structured interviews increase the veracity of clinical judgments and screening decisions by standardizing practices across clinicians (and/or assessors) within spaceflight (Endo, Ohbayashi, Yumikura, & Sekiguchi, 1994; Fassbender & Goeters, 1994; Santy & Jones, 1994) and in other industries. For these reasons, NASA has adopted a semi-structured approach to interviews designed to evaluate whether applicants are psychiatrically qualified, as well as those evaluating candidates’ level of psychological suitability for the astronaut role.
The likely multi-national nature of future long-duration spaceflight missions will increase the need to use interviewers/assessors from multiple cultures to jointly assess applicants for the astronaut role. Traditional work is also more global than ever before in the course of human history, so there is good motivation for doing more research to understand how to best train interviewers from different cultures to assess fairly together and how to build interviews that accurately assess qualified applicants from multiple cultures.
Finally, we consider simulations. Work simulations present “applicants with a task stimulus that mimics an actual job situation [including teamwork requirements] and elicit responses that are interpreted as direct indicators of how applicants would handle the task situation if it were actually to occur on the job” (Motowidlo, Dunnette, & Carter, 1990, p. 640). Simulations vary in the amount of fidelity with which they present a task stimulus and elicit a response. Among astronauts, teamwork has consistently been mentioned as an important task and remains an area of interest for creating relevant, validated work samples. Such samples or “teamwork reaction exercises” have great public face validity and are highly valued by many past and current job incumbents in extreme environments (e.g., NASA’s Astronaut Office).
Selection tools with high face validity often improve selection decisions because they function as realistic job previews that help applicants make better informed decisions about whether to join the organization. In addition, when face validity is high, applicants are more likely to perceive the selection process as fair and less likely to initiate litigation. Selection tools with high face validity have also been linked to other important outcomes, such as job satisfaction, organizational commitment, and tendency to recommend the employer to others (Kelechi, 2012). Additionally, validity coefficients for work samples equal or exceed those of other predictors (Schmidt & Hunter, 1998) because they provide a realistic set of test conditions that generalize to those of the actual job (Lance, Johnson, Douthitt, Bennett, & Harville, 2000). Work simulations also have been shown to exhibit less adverse impact than other selection tools (Cascio & Phillips, 1979; Hough, Oswald, & Ployhart, 2001; Motowidlo & Tippins, 1993).
The high validity, perceived fairness, and reduced adverse impact associated with work samples depend on the point-to-point correspondence between the simulation and job content (Asher & Sciarrino, 1974). However, high-fidelity work simulations are usually expensive and time-consuming to administer (Cascio & Phillips, 1979). Ground-based research suggests that low-fidelity simulations have validities comparable to those of high-fidelity simulations (Motowidlo et al., 1990; Motowidlo & Tippins, 1993). Thus, low-fidelity work (and teamwork) samples remain a very attractive option for building sound psychological selection systems for future long-duration spaceflight endeavors. NASA has already introduced some team exercises to provide additional data on candidate competencies that were not thoroughly evaluated in other parts of the selection process. In 2009, candidates were assigned to teams responsible for completing tasks designed to mimic many of the cognitive and social problem-solving challenges that behavioral health and performance experts expected crews to face during expeditionary missions to the moon or Mars (Sipes, Polk, Beven, & Shepanek, 2016). Psychologists observed candidates in real time and collected observations of their behavior. Review of the effectiveness of this selection method revealed that successful completion of the exercise was often characterized by an “aha” moment experienced by a single team member who identified a method for overcoming the technical task presented to the team. From that point, the exercise became primarily about executing a plan to implement the solution identified by that team member and was less effective in providing a forum that fostered opportunities for all participants to demonstrate their team-related capabilities. In addition, the exercise had been chosen without extensive forethought regarding the competencies that were most important to evaluate in a real-time setting.
The team exercise was modified for the 2013 selection cycle to alleviate these limitations (Beven, Holland, Picano, Moomaw, Slack, & Vander Ark, 2018). First, critical team-related competencies that were not effectively evaluated in other parts of the selection system were identified. The design of the exercise was modified to create opportunities for all participants to demonstrate those skills as the team worked to execute its task, and a peer review component was added to the end of the exercise. Further, a second source of peer feedback was added at the end of the selection week. After having spent multiple days working and living together in crew quarters, candidates provided feedback on one another regarding their experiences during the final stage of the selection process (e.g., can you live with this person?). Further, in 2017, the MARS team simulation and an individual “day in the life of an astronaut” simulation were added (Beven et al., 2018). The latter simulation requires applicants to interface with “Mission Control,” execute mission-related tasks, and simulate the multi-team system conditions inherent in the role today.
The teamwork exercises and peer-feedback aspects of these work samples are innovative and promising, but remain difficult to validate because of our three recurring primary challenges. The sample sizes will likely remain small for both developing and validating such work samples. The specific teamwork contexts and performance requirements will continue to evolve as new vehicles, equipment, and systems are created to enable longer-duration space exploration by humans.
Other space agencies have looked to real-time exercises as a context for evaluating team skills as well. For example, ESA astronaut candidates complete group and didactic team exercises that provide opportunities to display their competency in interacting with others and personality traits suitable for future astronauts during the final week of selection at the European Astronaut Centre (Maschke et al., 2011). JAXA has observed candidates as they complete tasks in a group setting since the early 1990s (Endo et al., 1994). More recently, the agency added a one-week stay in an isolation chamber to its selection process. Here, candidates are continuously observed as they perform group and personal tasks, providing a rich environment in which to assess their leadership, teamwork skills, and productivity. Although this selection activity has not been formally validated, JAXA’s BHP experts have provided considerable positive feedback on its effectiveness. It provides critical information that confirms (or corrects) information gathered in earlier phases of the selection process regarding candidates’ skill in critical intra-personal and interpersonal competencies (Inoue & Tachibana, 2013).