Experiment 1

A repeated-measures experimental design was used to evaluate the four TOR interfaces developed. The order of the interfaces was counterbalanced across participants. For each interface, one trial with three TORs was performed. During autonomous mode, participants performed a secondary task, which kept them out of the loop. There were 49 participants (24 females and 25 males), aged 17 to 86 years (M = 45.51, SD = 17.36), recruited from the general public. Participants represented a wide range of ages and were gender-balanced: (1) age band 17-34 (12 participants, 5 females), (2) age band 35-56 (24 participants, 13 females), and (3) age band 57-86 (13 participants, 6 females). Participants had an average


FIGURE 5.7 The simulator set-up showing the tablet in use and the cluster simulation.

driving experience of 25 years, with a maximum of 70 years and a minimum of less than a year.
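Counterbalancing the order of four interfaces across participants is commonly done with a balanced Latin square. The sketch below is illustrative only (the function name and construction are assumptions; the chapter does not state how the counterbalancing was generated):

```python
def balanced_latin_square(n):
    """Generate an n x n balanced Latin square (valid for even n):
    each condition appears once per serial position, and each
    condition immediately precedes every other condition once."""
    # First row interleaves ascending and descending indices:
    # 0, n-1, 1, n-2, ...
    first = []
    lo, hi = 0, n - 1
    for t in range(n):
        if t % 2 == 0:
            first.append(lo)
            lo += 1
        else:
            first.append(hi)
            hi -= 1
    # Each subsequent row shifts every entry by one (mod n).
    return [[(c + i) % n for c in first] for i in range(n)]

orders = balanced_latin_square(4)
# orders[p % 4] would give the interface sequence for participant p
```

With 49 participants, assigning participant p to row p mod 4 makes each interface appear near-equally often in each serial position and before each other interface.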

A low-fidelity driving simulator, pictured in Figure 5.7, presented a ten-mile-long route on rural roads and the motorway with different curve radii using ‘STISIM Drive®’ (STISIM 2017). Displays were used to depict the road scene, the dashboard, and the instrument cluster, which showed driving and car-related information, such as the state of automation, current speed, and surrounding traffic. A secondary task was performed on a 10-inch tablet. On both sides of the driving seat, an LED strip was mounted on the wall to indicate traffic movement and position. A haptic device (Karam, Russo and Fels 2009) was implemented in the driving seat to deliver vibrotactile stimuli based on the audio information presented. After a short welcome, participants gave consent, were briefed on safety procedures, and were familiarised with the experiment. A memory game served as the secondary task (Politis et al. 2017).


A short acclimatisation drive was completed. Trials started under the participant’s control before control was handed over to the vehicle. In autonomous mode, participants interacted with the secondary task until a TOR was issued, timed at points approximately 10%, 40%, and 70% into the route. The automation was toggled by pressing two steering wheel buttons. A Wizard of Oz protocol was implemented, with vocal utterances from the vehicle and pseudo-responses produced by the experimenter using a dialogue generator. The participant interacted in dialogue with the interface as they deemed appropriate, but the instructions did not require the completion of the protocol. On completing the handover, participants took control of the car using the wheel button and drove manually for 1 minute before handing control back with the same button. At the end of each trial, participants completed a range of standardised questionnaires: the NASA Task Load Index (NASA-TLX), the system acceptance scale, the system usability scale (SUS), and the situation awareness rating technique. After completing all four trials, participants were asked to rank the experienced concepts from their most to their least preferred.


The experimental design was a one-factor repeated-measures design, and the qualitative and quantitative measures reported were as follows:

  • Takeover time (TOT): Total duration in seconds from the start of the dialogue protocol to the point the participant pressed the takeover button on the steering wheel.
  • Driving performance: The absolute steering angle input, measuring lateral driving stability. After each takeover, data were logged for 60 s.
  • Workload: NASA-TLX (Hart and Staveland 1988).
  • Acceptance: System acceptance scale (Van Der Laan, Heino and De Waard 1997).
  • Usability: System usability scale (SUS) (Brooke 1996).
  • Post-experiment ranking: Interfaces were ranked by the participants from 1, most preferred, to 4, least preferred, at the end of the trials.

The average TOTs in seconds for the CB protocol were significantly lower than those for the remaining three protocols, which were essentially undifferentiated (see Figure 5.8): ‘CB countdown’ (M = 19.79, SD = 9.3), ‘RepB readback’ (M = 48.61, SD = 4.67), ‘ResB


FIGURE 5.8 A box-and-whisker plot showing medians and interquartile ranges for time to takeover.

answering questions’ (M = 41.19, SD = 10.04), and ‘MB multimodality’ (M = 49.65, SD = 6.39).

A one-way repeated-measures analysis of variance (ANOVA) determined that the interface type significantly affected TOT (F(2.563, 112.785) = 128.053, p < 0.001; Greenhouse-Geisser ε = 0.854). Post hoc comparisons indicated that TOT was significantly lower for ‘countdown’ than for all other interfaces (p < 0.001).
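The F statistic in a one-way repeated-measures ANOVA partitions total variance into condition, subject, and residual components. A minimal numpy sketch (not the authors’ analysis code; the function name and data layout are assumptions):

```python
import numpy as np

def rm_anova_f(data):
    """One-way repeated-measures ANOVA.

    data: (n_subjects, k_conditions) array of scores.
    Returns (F, df_conditions, df_error), uncorrected.
    """
    n, k = data.shape
    grand = data.mean()
    ss_total = ((data - grand) ** 2).sum()
    ss_cond = n * ((data.mean(axis=0) - grand) ** 2).sum()  # between conditions
    ss_subj = k * ((data.mean(axis=1) - grand) ** 2).sum()  # between subjects
    ss_err = ss_total - ss_cond - ss_subj                   # residual
    df_cond, df_err = k - 1, (n - 1) * (k - 1)
    f = (ss_cond / df_cond) / (ss_err / df_err)
    return f, df_cond, df_err
```

A Greenhouse-Geisser correction leaves F unchanged but multiplies both degrees of freedom by ε (here 0.854), which is how fractional degrees of freedom such as F(2.563, 112.785) arise from the nominal integer values.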

The system acceptance scale consisted of two subscales: usefulness and satisfaction. The ‘countdown’ interface received the highest usefulness and satisfaction ratings from the participants. However, only the satisfaction subscale reached statistical significance (see Figure 5.9): ‘CB countdown’ (M = 1.05, SD = 0.76), ‘RepB readback’ (M = 0.14, SD = 0.76), ‘ResB answering questions’ (M = 0.37, SD = 1.01), and ‘MB multimodality’ (M = 0.2, SD = 1.08).

The type of interface showed a significant effect on the satisfaction subscale of the acceptance scale (F(3, 141) = 11.979, p < 0.001). The ‘CB countdown’ concept obtained a higher satisfaction score than all other concepts (p < 0.001). The ‘countdown’ concept was also the highest rated on the SUS, with the other concepts again performing at similar levels (F(3, 138) = 6.826, p < 0.001). ‘CB countdown’ gained a higher perceived usability score than ‘RepB readback’ (p = 0.005), ‘ResB answering questions’ (p < 0.001), and ‘MB multimodality’ (p < 0.001).

Workload TLX

There was a significant effect of interface on perceived workload (F(3, 141) = 6.826, p < 0.001). Pairwise comparisons revealed that ‘answering questions’ was perceived to induce higher workload than ‘countdown’ (p = 0.008) and ‘readback’ (p = 0.003).


FIGURE 5.9 A box-and-whisker plot showing medians and interquartile ranges for the system acceptance scale, satisfaction subscale.

Post-Experiment Rankings

For the post-experiment interview rankings, Wilcoxon tests with Bonferroni correction showed that ‘CB countdown’ was ranked significantly higher than ‘ResB answering questions’ (T = -0.896, p = 0.004), ‘RepB readback’ (T = -1.521, p < 0.001), and ‘MB multimodality’ (T = -1.583, p < 0.001). There was no significant difference in average completion time of the secondary task between the TOR interfaces.

Driving Performance
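The pairwise ranking comparisons rest on the Wilcoxon signed-rank statistic: per-participant differences are signed and rank-ordered, and the raw T is the smaller of the positive- and negative-rank sums (the values reported above appear to be standardised statistics rather than raw rank sums). With six pairwise comparisons among four interfaces, the Bonferroni-corrected threshold is 0.05/6 ≈ 0.0083. A pure-Python sketch of the raw statistic (the function name is an assumption; the published analysis presumably used a statistics package):

```python
def wilcoxon_t(x, y):
    """Wilcoxon signed-rank T for paired samples: the smaller of the
    positive- and negative-rank sums. Zero differences are discarded;
    tied absolute differences receive averaged ranks."""
    diffs = [b - a for a, b in zip(x, y) if b != a]
    absd = sorted(abs(d) for d in diffs)
    ranks = {}
    i = 0
    while i < len(absd):
        j = i
        while j < len(absd) and absd[j] == absd[i]:
            j += 1
        for v in absd[i:j]:
            ranks[v] = (i + 1 + j) / 2  # average of ranks i+1 .. j
        i = j
    w_pos = sum(ranks[abs(d)] for d in diffs if d > 0)
    w_neg = sum(ranks[abs(d)] for d in diffs if d < 0)
    return min(w_pos, w_neg)

ALPHA_BONFERRONI = 0.05 / 6  # six pairwise comparisons among four interfaces
```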

Driving performance was assessed from the means and standard deviations of the absolute steering angle input, calculated from the logged data for 60 s after each takeover: ‘CB countdown’ (M = 0.36, SD = 0.34), ‘RepB readback’ (M = 1.12, SD = 1.22), ‘ResB answering questions’ (M = 1.45, SD = 2.53), and ‘MB multimodality’ (M = 1.21, SD = 1.19). Driving performance was not significantly affected by the interface concept.
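Reducing a post-takeover steering log to the reported M and SD is straightforward; the sketch below is a minimal illustration (the function name, units, and sample rate are assumptions, since the simulator’s logging format is not specified):

```python
import numpy as np

def steering_stability(angles_deg, sample_rate_hz, window_s=60):
    """Mean and sample SD of the absolute steering-wheel angle
    over the first `window_s` seconds after takeover.

    angles_deg: sequence of logged steering angles (degrees).
    """
    n = int(window_s * sample_rate_hz)              # samples in the window
    window = np.abs(np.asarray(angles_deg[:n], dtype=float))
    return window.mean(), window.std(ddof=1)        # ddof=1: sample SD
```

Larger means indicate more corrective steering, i.e. lower lateral stability after the handover.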

Discussion: Experiment 1

To summarise the main findings:

  • There was an overall preference for the countdown concept. Time to takeover was significantly lower, as were disturbances in driving post-takeover. System acceptance scale scores were significantly higher, especially for satisfaction, and system usability was rated significantly higher.
  • Concepts that involved longer and more scripted dialogues with the vehicle were less preferred.
  • The workload for the ResB ‘answering questions’ condition was significantly higher than for the others, and more errors were made in its dialogues.

It was apparent from the data and the post-experiment interviews that the dialogue-based protocols were largely seen as repetitive and tedious. It should be noted that the participants were instructed to complete the protocols, resulting in more protracted interactions in the non-countdown conditions. No shortcuts to regaining control were offered, although it was notable that some participants took control early anyway. The ‘countdown’ condition was the shortest, the most preferred, and was used by participants to take control when they wanted. Although ‘answering questions’ increased workload, it did not improve SA, based on participants’ self-reports. This was contrary to expectations and may be attributed to the fact that perceived SA is a subjective measure influenced by participants’ impressions. A more objective measure would be their response to a simulated critical road event, which may reflect more accurately the degree of in-depth SA attained.

The most dialogue errors were made for ‘answering questions’, which was expected, since this interface required thinking rather than pure repetition. Participants mentioned in the interviews that they would prefer ‘answering questions’ over ‘readback’, since it was less laborious. However, they had a clear preference for a shortcut allowing them to decide the moment of takeover independently, irrespective of the protocol.

Interviews revealed that participants perceived ‘countdown’ as the least supportive of their SA but appreciated the short duration and sense of control it elicited. The ‘RepB readback’ and ‘MB multimodality’ conditions were based on the same principle and were ranked the lowest. While such an interface may prove beneficial in aviation, it was not appreciated by the participants in this context. One possibility is that a main function of readbacks in aviation is to ensure accurate comprehension of utterances exchanged over radiotelephony between the pilot and air traffic control (Civil Aviation Authority 1999); any SA improvements were apparently not directly perceived by drivers. The benefits of the information presented in the MB ambient, multimodal interface were not apparent in the objective or subjective results. The design of these cues may have been too subtle, as some participants mentioned that they did not notice them.


The pattern of results suggests that the predominant effect was elicited by the repetitive and tedious nature of the dialogue protocols, especially when repeated multiple times. This apparently led to a preference for the clean interaction of the countdown, which required little cognition and gave the drivers a stronger perception of control. Interviews suggested that some drivers were clearly prepared to trade SA against speed of takeover in the simplified simulation task to achieve a satisfactory result.
