Participants were expected to rely mostly on the cluster display, audio cues, and vocalisations. This expectation rested on their familiarity (Gough, Green and Billinghurst 2006) with current car interfaces, which widely use the former two systems (Akamatsu, Green and Bengler 2013), and on research showing higher usability of speech-based vocalisations (Forster, Naujoks and Neukum 2017).
The non-normally distributed results for reaction time, driving performance, and subjective workload (Eriksson and Stanton 2017) demonstrate considerable differences among drivers for the takeover. Therefore, a large amount of variability in chosen customisation settings in order to increase or decrease salience was expected between participants.
A significant effect of OOTL time on customisation settings was expected due to research showing increased drowsiness for longer OOTL times (Bourrelly et al. 2019). This was believed to influence drivers’ interface requirements, resulting in higher intensity characteristics for longer OOTL times.
The OOTL times, short (1 min) and long (10 min), were expected to significantly affect the takeover time needed. This also tests whether the findings in Bourrelly et al. (2019) are observable for short and long OOTL times.
The data analysis was performed using ‘IBM SPSS© Statistics 25’, and the significance threshold was set at 0.05. The statistical methods to be applied were chosen based on Field (2017).
To measure to what extent participants explored specific settings, the number of changes of each setting over all participants and over all trials was counted. In addition, the number of changes for each setting over all participants between the first and the last trial was counted. The second number was subtracted from the first to get the measurement of exploration. One change constitutes the adjustment from ‘On’ to ‘Off’, or vice versa.
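The exploration measure described above can be sketched as follows. This is a minimal illustration under one reading of the text: the first count is the number of On/Off flips between consecutive trials, and the second is the number of participants whose setting differs between their first and last trial. The function names and the example data are hypothetical and not taken from the study.

```python
def count_changes(sequence):
    """Number of On/Off flips between consecutive trials for one participant."""
    return sum(a != b for a, b in zip(sequence, sequence[1:]))


def exploration_score(trials_by_participant):
    """Total flips over all trials minus net first-to-last changes.

    trials_by_participant maps a participant id to the list of values
    (True = 'On', False = 'Off') that one setting took across trials.
    """
    sequences = [seq for seq in trials_by_participant.values() if seq]
    total_changes = sum(count_changes(seq) for seq in sequences)
    net_changes = sum(seq[0] != seq[-1] for seq in sequences)
    return total_changes - net_changes


# Hypothetical data for one binary setting over four trials.
example = {
    "p1": [True, False, True, True],    # 2 flips, no net change
    "p2": [False, True, True, True],    # 1 flip, 1 net change
    "p3": [True, True, True, True],     # never touched
}
print(exploration_score(example))
```

Under this reading, a flip that is later undone counts as exploration, whereas a single lasting change does not.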
The effect of OOTL time sequence on the binary settings was analysed using a between-subjects Pearson chi-square test because of the settings’ categorical character. Since the Pearson chi-square test needs to be performed for each binary setting independently, only statistical results for significant observations are reported.
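For illustration, the Pearson chi-square statistic for a single binary setting can be computed from a contingency table as sketched below. The analysis itself was run in SPSS; this plain-Python version is only meant to show the calculation. It assumes two sequence groups (giving a 2 x 2 table), applies no continuity correction, and uses invented counts.

```python
import math


def pearson_chi_square(table):
    """Pearson chi-square statistic for an r x c table of observed counts.

    Returns (chi2, dof, p). The p-value is computed only for dof == 1
    (a 2 x 2 table), using the closed form for one degree of freedom.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    p = math.erfc(math.sqrt(chi2 / 2.0)) if dof == 1 else None
    return chi2, dof, p


# Hypothetical counts: rows = OOTL sequence group, columns = setting Off / On.
example_table = [[20, 10], [10, 20]]
chi2, dof, p = pearson_chi_square(example_table)
print(f"chi2 = {chi2:.3f}, dof = {dof}, p = {p:.4f}")
```

Because one such test is run per binary setting, the same function would be called once for each setting's table.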
The nonparametric Wilcoxon signed-rank test was applied to detect significant effects of OOTL time on the binary settings, since OOTL time is a within-subject variable. Again, only statistically significant results are reported due to the number of tests performed.
Ordinal Settings and Takeover Time
The Kolmogorov-Smirnov (K-S) test was used to test the ordinal settings and takeover time for normal distribution. No test was performed for the ordinal setting ‘Vibrate’: because of its small number of levels, a normal distribution was not expected.
Because of the interval character of the ordinal settings and the expected non-normal distribution of takeover time (Eriksson and Stanton 2017), the effect of OOTL time sequence and age on these was tested using the Kruskal-Wallis test. When significance was detected, potential linear trends were assessed using a Jonckheere-Terpstra test.
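As an illustration of the trend test, the Jonckheere-Terpstra statistic can be sketched in a few lines. It counts, over every pair of groups taken in their hypothesised order, how often a value in the earlier group is smaller than a value in the later group, with ties counted as one half. The group data below are invented, and the sketch returns only the statistic and its expectation under the null hypothesis, not a p-value.

```python
def jonckheere_terpstra_statistic(ordered_groups):
    """J statistic for groups listed in their hypothesised order."""
    j = 0.0
    for a in range(len(ordered_groups)):
        for b in range(a + 1, len(ordered_groups)):
            for x in ordered_groups[a]:
                for y in ordered_groups[b]:
                    if x < y:
                        j += 1.0
                    elif x == y:
                        j += 0.5
    return j


def jt_null_mean(ordered_groups):
    """Expected J under the null hypothesis: (N^2 - sum of n_i^2) / 4."""
    sizes = [len(g) for g in ordered_groups]
    total = sum(sizes)
    return (total ** 2 - sum(n ** 2 for n in sizes)) / 4.0


# Hypothetical ordinal setting levels for three age bands, youngest first.
groups = [[1, 2, 2], [2, 3, 3], [3, 4, 5]]
print(jonckheere_terpstra_statistic(groups), jt_null_mean(groups))
```

A value of J well above the null mean suggests that the setting level increases across the ordered groups.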
For the effect of OOTL time on ordinal settings and takeover time, the Wilcoxon signed-rank test was applied. For the same reasons as before, only statistically significant results are reported.
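The paired comparison can be illustrated with a small sketch of the Wilcoxon signed-rank statistic. Zero differences are dropped, the absolute differences are ranked with tied values sharing their mean rank, and the ranks are summed separately for positive and negative differences. The takeover times below are hypothetical values in milliseconds, and no p-value is computed here.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def wilcoxon_signed_rank(first, second):
    """Return (W+, W-, n) for paired samples, dropping zero differences."""
    diffs = [b - a for a, b in zip(first, second) if b != a]
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus, len(diffs)


# Hypothetical takeover times (ms) per participant: short vs long OOTL time.
short_ms = [2100, 3000, 2500, 4000, 3200]
long_ms = [2600, 3000, 2200, 5100, 3900]
print(wilcoxon_signed_rank(short_ms, long_ms))
```

The smaller of the two rank sums is the value usually compared against the reference distribution.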
Hierarchical agglomerative cluster analysis (Everitt et al. 2011; Langdon et al. 2003) was used to group participants and interfaces based on similarities in their final customisation profiles. The decision to apply cluster analysis is based on its wide application to divide human participants into homogeneous groups, as used in market research and psychiatry (Everitt et al. 2011). The measured similarity can be interpreted as the psychological distance participants perceived between option settings (Langdon et al. 2003; Nosofsky 1985).
The squared Euclidean distance was used as the similarity measure due to its known suitability and wide acceptance to assess differences in human participants (Clatworthy et al. 2005; Nosofsky 1985). All values were standardised by variable in the range 0-1 to account for different measurement scales among the settings. After exploring different linkage methods, namely single, complete, and Ward’s, it was decided to apply Ward’s linkage as the underlying intergroup proximity measure since it revealed clearly observable clusters. In order to decide on the cut-off distance, a scree-like plot indicating the distance on the y-axis against the number of groups on the x-axis was applied.
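The preprocessing that precedes the linkage step can be sketched as follows: each setting is rescaled to the range 0-1, and the squared Euclidean distance is then computed between every pair of customisation profiles. This is only an illustration with invented profiles. It does not implement Ward’s linkage itself, and mapping a constant column to zero is an assumption made here to avoid division by zero.

```python
def min_max_scale(rows):
    """Rescale each column (setting) of a list of profiles to [0, 1]."""
    scaled_columns = []
    for column in zip(*rows):
        low, high = min(column), max(column)
        span = high - low
        # Assumption: a setting with no variation is mapped to 0 for everyone.
        scaled_columns.append(
            [(value - low) / span if span else 0.0 for value in column]
        )
    return [list(profile) for profile in zip(*scaled_columns)]


def squared_euclidean(u, v):
    """Squared Euclidean distance between two equally long profiles."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def distance_matrix(rows):
    """Pairwise squared Euclidean distances after min-max scaling."""
    scaled = min_max_scale(rows)
    return [[squared_euclidean(u, v) for v in scaled] for u in scaled]


# Hypothetical profiles: [binary setting, 1-5 ordinal level, volume].
profiles = [[0, 1, 10], [1, 3, 20], [1, 5, 30]]
for row in distance_matrix(profiles):
    print(row)
```

Without the rescaling, the setting with the largest numeric range would dominate the distances and therefore the resulting clusters.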
A Friedman test was performed to evaluate participants’ interface reliance rankings. Dunn-Bonferroni post hoc tests were used to determine which interfaces showed different reliance rankings.
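Because the reliance data are already rankings, the Friedman statistic can be computed directly from the rank sums of each interface. The sketch below assumes that every participant ranked all interfaces without ties, and the rankings shown are invented. It returns the chi-square statistic and its degrees of freedom only; the post hoc comparisons are not covered.

```python
def friedman_statistic(rank_rows):
    """Friedman chi-square for complete, tie-free rankings.

    rank_rows holds one list per participant with ranks 1..k,
    one rank per interface. Returns (statistic, degrees of freedom).
    """
    n = len(rank_rows)
    k = len(rank_rows[0])
    rank_sums = [sum(column) for column in zip(*rank_rows)]
    statistic = (12.0 / (n * k * (k + 1))) * sum(r * r for r in rank_sums)
    statistic -= 3.0 * n * (k + 1)
    return statistic, k - 1


# Hypothetical rankings of three interfaces by four participants.
rankings = [
    [1, 2, 3],
    [1, 2, 3],
    [1, 3, 2],
    [2, 1, 3],
]
print(friedman_statistic(rankings))
```

If every interface received the same rank sum, the statistic would be zero, so larger values indicate stronger agreement among participants about the ordering.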
Participants consisted of 32 female and 33 male UK drivers who were recruited by an external agency and aged between 21 and 75 (M = 44.1, SD = 16.2). Adhering to the principles of inclusive design (Langdon and Thimbleby 2010), participants were recruited and divided into three different age bands: (1) 18-34 years (N = 21, 11 females), (2) 35-56 years (N = 21, 10 females), and (3) 57-82 years (N = 19, 9 females). The definition of these age bands was decided on a principled basis using statistical analysis (Langdon and Stanton, personal communication, 2017). An equal distribution of gender was pursued within each age group. The participants had driving experience of 2 up to 57 years (M = 24.21, SD = 16.22) and drove between 4,300 and 36,000 miles per year (M = 11,793, SD = 5,512).
The study complied with the American Psychological Association Code of Ethics and was approved by the Department of Engineering Research Ethics Committee at the University of Cambridge. Informed consent was obtained from each participant.