A total of 49 participants took part, 24 of whom were female; ages ranged from 17 to 86 years (mean = 45.51, SD = 17.36). All participants had held a full driving licence for between 1 and 70 years (mean = 25.68, SD = 17.58). An approximately even gender split across age ranges was achieved: 7 males and 5 females aged 17-34; 11 males and 13 females aged 35-56; and 7 males and 6 females aged 58-86. This was in line with the principles of inclusive design (Keates and Clarkson 2002). Ethical approval was granted by the Ethics Review Committee, University of Cambridge, Department of Engineering.

Experimental Design

The experiment employed a repeated-measures design with four VtD control transfer conditions, which formed the single independent variable. Conditions were counterbalanced using a Latin square design. All conditions began with an initial stage in which the system verbally asked whether the driver was ready to resume control; following verbal confirmation, the protocol started. The first condition, ‘Timer’, was based on a simple timer that appeared on the dashboard display when the automation detected a need to hand control back to the driver. Once the driver had confirmed readiness, the timer counted down from 60 s in 10-s intervals, and the driver was required to take control by pressing a button on the steering wheel before the countdown reached zero. This was based on an existing design undergoing testing by Volvo Cars at the time (2015), with an auditory rather than visual countdown.
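The text states that condition order was counterbalanced with a Latin square but does not give the square itself. The sketch below shows one possible construction, assuming a balanced (Williams) Latin square: with four conditions, each condition appears once in every serial position and immediately follows every other condition exactly once. The function names, and the label used for the unnamed fourth condition, are hypothetical.

```python
# Hypothetical condition labels; the fourth condition is unnamed in the text.
CONDITIONS = ["Timer", "HazLan", "VAA", "Multimodal HazLan"]


def williams_square(n: int) -> list[list[int]]:
    """Return a balanced (Williams) Latin square for an even number of conditions."""
    if n < 2 or n % 2:
        raise ValueError("this sketch only handles an even number of conditions")
    # First row interleaves from the front and the back: 0, 1, n-1, 2, n-2, ...
    first = [0]
    lo, hi = 1, n - 1
    while len(first) < n:
        first.append(lo)
        lo += 1
        if len(first) < n:
            first.append(hi)
            hi -= 1
    # Each later row shifts every entry of the first row by the row index (mod n).
    return [[(c + r) % n for c in first] for r in range(n)]


def order_for_participant(p: int) -> list[str]:
    """Condition order for participant p (0-based), cycling through the rows."""
    square = williams_square(len(CONDITIONS))
    return [CONDITIONS[i] for i in square[p % len(CONDITIONS)]]
```

Note that 49 participants cannot be spread evenly over four rows: cycling with `p % 4` uses the first row 13 times and each of the others 12 times.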

The second condition, ‘HazLan’, used a ‘readback’ principle to raise situation awareness: the system vocalised five elements of situation awareness (potential hazards, current lane, current speed, the next required exit, and the next required action). After each vocalisation, the participant was required to repeat it back. An incorrect or missing readback resulted in the system repeating the original vocalisation. Once all readbacks were complete, the participant was able to resume manual control.
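The HazLan readback loop can be modelled as a simple repeat-until-correct sequence. This is an illustrative sketch of the protocol as described, not the study's software: in the experiment a Wizard of Oz operator judged the responses, whereas the sketch substitutes a naive exact-match comparison after normalising case and whitespace. `speak`, `listen`, and the message texts are hypothetical stand-ins for speech output and input.

```python
from typing import Callable, Optional

# The five situation-awareness elements vocalised in the HazLan condition.
ELEMENTS = ["potential hazards", "current lane", "current speed",
            "next required exit", "next required action"]


def normalise(text: Optional[str]) -> str:
    """Lower-case and collapse whitespace; a missing readback (None) becomes ''."""
    return " ".join((text or "").lower().split())


def run_readback(messages: dict[str, str],
                 speak: Callable[[str], None],
                 listen: Callable[[], Optional[str]]) -> int:
    """Vocalise each element until it is read back correctly.

    Returns the total number of vocalisations, including repeats."""
    vocalisations = 0
    for element in ELEMENTS:
        message = messages[element]
        while True:
            speak(message)
            vocalisations += 1
            if normalise(listen()) == normalise(message):
                break  # correct readback: move on to the next element
            # Incorrect or missing readback: the loop repeats the original vocalisation.
    # All readbacks complete: the driver may now resume manual control.
    return vocalisations
```

Because the text specifies no limit on repeats, the loop keeps repeating an element until it is read back correctly; a practical implementation might add a retry limit.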

The third condition, ‘VAA’ (voice autopilot assistant), was response-based and used the same element sequence as the second condition; however, instead of requesting a readback, the system asked the participant a question about each element. If the participant answered correctly, the next question was presented. Upon completion of the sequence, the participant was able to resume manual control.

The fourth condition was an augmented version of the HazLan condition. In addition to the HazLan situation awareness sequence, it incorporated multiple modalities: audio-driven seat-based haptics (whereby audio signals, including vocalisations, were transmitted to the driver via a pad on the driver’s seat) and two LED strips, one mounted on each side of the driving position. The LED strips presented continuous information on the longitudinal positions of cars in neighbouring lanes, thus providing a dynamic blind spot warning system.
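The blind spot display logic can be illustrated as a mapping from the longitudinal position of a neighbouring car to an LED on one strip. The study implemented this in C on Arduino hardware; the Python sketch below only shows the idea, and the strip length (30 LEDs) and the covered window (20 m behind to 10 m ahead of the driver) are assumed values, not ones reported in the text. The function name is hypothetical.

```python
def led_indices(offsets_m: list[float], n_leds: int = 30,
                behind_m: float = 20.0, ahead_m: float = 10.0) -> list[int]:
    """Map neighbouring cars' longitudinal offsets to LED indices on one strip.

    offsets_m: position of each car in the adjacent lane relative to the driver,
    in metres (negative = behind). Index 0 is the rear end of the strip.
    Cars outside the assumed window [-behind_m, +ahead_m] light no LED."""
    span = behind_m + ahead_m
    lit = set()
    for x in offsets_m:
        if -behind_m <= x <= ahead_m:
            fraction = (x + behind_m) / span  # 0.0 at the rear edge, 1.0 at the front
            lit.add(min(int(fraction * n_leds), n_leds - 1))
    return sorted(lit)
```

In a running system, this mapping would be re-evaluated on every simulator update, once for the left lane and once for the right, with each result driving one strip.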

Data on multiple aspects was collected. However, for the validation experiment presented in this paper, the dependent variable was driver behaviour, in terms of the processes carried out by participants during VtD control transitions. This driver behaviour data was collected using four webcams, generating footage from multiple angles within the vehicle. The processes consisted of actions, inactions, and vocalisations. Actions included those expected from the driver as predicted by the associated OESD, as well as unexpected actions, such as placing a finger on a button early or making an exaggerated glance. Inactions consisted of the failure to carry out a process predicted by the associated OESD. Vocalisations included those expected by the system as specified in the associated OESD.


Experiments were carried out using a lower-fidelity driving simulator consisting of a gaming seat, a Logitech G25 steering wheel and pedal set, and three screens to provide a wide field of view (as shown in Figure 8.2). An additional tablet was employed to act as a pseudo-dashboard, displaying speed, lane position relative to other cars, fuel level, automation mode, ideal lane, and the next required exit. The driving scenario featured a route approximately 10 miles long, consisting of a combination of highways with gentle bends and urban roads without corners. A Java-based memory game application was installed on a tablet-based PC to provide the participant with a secondary task when automation was enabled. Participant behaviour was recorded using a camera with a wide-angle lens. Two Arduino-based LED strips were fitted to the wall, one on each side of the driving position, and C code was written to enable them to act as blind spot indicators. A tactile acoustic device was placed on the driving seat to provide sound-based haptic information.

FIGURE 8.2 Study 1 (lower-fidelity) experimental configuration (Politis et al. 2018).


After being welcomed, participants were briefed on the experiment and asked to complete a demographics questionnaire. The simulator was then introduced, and participants completed a short introductory test drive. Apart from a brief overview of the vocalisation system, no further information was provided, in order to avoid training effects. The driving scenarios were then run in counterbalanced order to mitigate order effects. During each automation phase, participants were asked to play the tablet-based memory game; three VtD transitions were performed per scenario, at approximately 10%, 40%, and 70% of the way through the route. VtD transition dialogues used synthetic vocalisations, and participants were expected to respond vocally before switching to manual control using a button on the steering wheel. A Wizard of Oz-based system was used to manage the synthetic vocalisations in response to the participant. At the end of the study, participants received a £20 web voucher as remuneration for their time.


The primary use of signal detection theory (SDT) is to discriminate between ‘signals’ and noise (Abdi 2007). There are four stimulus-response events: hits, misses, false alarms, and correct rejections (Nevin 1969). In the context of this experiment, SDT provided a method by which to compare the participant behaviour observed during the experiments with the predicted driver behaviour illustrated on the OESDs (see Figure 8.3).

For the lower-fidelity experiments, analysis was limited to the second control transition for each condition. Three conditions were analysed: Timer, HazLan, and VAA.

Labelled template forms consisting of SDT matrices were created to allow the paper-based recording of the analysis. Footage was displayed on a large LCD screen, and printed sequential lists of VtD driver processes for each condition, drawn from the OESDs, were used for reference. The footage of the selected VtD transfers was viewed, and participants’ behaviours were noted on the SDT matrices using the OESD-derived driver processes as a template of expected behaviour. To aid the analysis, the driver processes were split into three phases, namely A, B, and C, representing the participant preparing to take control, proceeding through the protocol, and taking control back from the automation, respectively. The ‘Timer’ condition was particularly short and therefore had no requirement for phase B. Figure 8.1 shows the six driver-based processes of phase A in the driver column.

FIGURE 8.3 Signal detection theory (SDT) matrix.

A ‘perfect’ SDT score was attained when the participant carried out every predicted process, and only those processes, as part of the VtD control transfer; in this case, the number of ‘hits’ in the SDT matrix would equal the number of predicted driver processes. For each predicted process that a participant failed to carry out, a ‘false alarm’ was recorded. Where a driver exhibited a behaviour in addition to those predicted, a ‘miss’ was recorded. Correct rejections were calculated at the end of the SDT analysis by subtracting the participant’s number of false alarms from the total pool of false alarms across all participants.
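The scoring rules above can be expressed as a short tallying routine. It follows the convention used in this chapter, where an omitted predicted process counts as a false alarm and an unpredicted extra behaviour counts as a miss. Process names are plain strings here, and `SDTCounts`, `score_participant`, `add_correct_rejections`, and `is_perfect` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class SDTCounts:
    hits: int
    misses: int
    false_alarms: int
    correct_rejections: int = 0  # filled in once all participants are scored


def score_participant(predicted: list[str], observed: list[str]) -> SDTCounts:
    """Score one VtD transfer against the OESD-derived list of predicted processes."""
    observed_set = set(observed)
    predicted_set = set(predicted)
    hits = sum(1 for p in predicted if p in observed_set)
    # Chapter convention: a predicted process that was not carried out is a false alarm...
    false_alarms = len(predicted) - hits
    # ...and a behaviour that was not predicted is a miss.
    misses = sum(1 for o in observed if o not in predicted_set)
    return SDTCounts(hits, misses, false_alarms)


def add_correct_rejections(all_counts: list[SDTCounts]) -> None:
    """Correct rejections = pooled false alarms minus the participant's own false alarms."""
    pool = sum(c.false_alarms for c in all_counts)
    for c in all_counts:
        c.correct_rejections = pool - c.false_alarms


def is_perfect(counts: SDTCounts, n_predicted: int) -> bool:
    """A 'perfect' score: every predicted process carried out and nothing extra."""
    return counts.hits == n_predicted and counts.misses == 0
```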

The results from all SDT matrices were collated in a spreadsheet and used to calculate the Matthews correlation coefficient (Phi). Phi quantified the correlation between expected and observed behaviour, as a means of validating the OESDs. The Matthews correlation coefficient is given by the following equation, where H, M, FA, and CR denote the numbers of hits, misses, false alarms, and correct rejections, respectively:

$$\phi = \frac{H \cdot CR - FA \cdot M}{\sqrt{(H + FA)(H + M)(CR + FA)(CR + M)}}$$
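A direct translation of the Matthews correlation coefficient into code, taking the four SDT counts as inputs, is sketched below. The coefficient is symmetric in misses and false alarms, so the chapter's labelling of those two cells does not change the result. Returning 0.0 when a marginal total is zero is an assumed convention (the coefficient is otherwise undefined), and the function name is hypothetical.

```python
import math


def matthews_phi(hits: int, misses: int, false_alarms: int,
                 correct_rejections: int) -> float:
    """Matthews correlation coefficient (Phi) from SDT counts."""
    numerator = hits * correct_rejections - false_alarms * misses
    denominator = math.sqrt((hits + false_alarms) * (hits + misses)
                            * (correct_rejections + false_alarms)
                            * (correct_rejections + misses))
    # Assumed convention: Phi is undefined when any marginal total is zero; return 0.0.
    return numerator / denominator if denominator else 0.0
```

For example, `matthews_phi(10, 0, 0, 10)` returns 1.0 (perfect agreement between expected and observed behaviour), while equal counts in all four cells give 0.0.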

Inter-Rater Reliability Method

Inter-rater reliability testing was carried out because analysing, interpreting, and categorising driver behaviour is inherently subjective. An analyst was provided with approximately 10% of the video footage files, together with the associated SDT analysis forms, a list of exceptions, and a list of driver processes split across the three phases. The analyst watched the footage and compared the driver behaviour with that expected, as specified in the list of driver processes. SDT results were recorded on the SDT analysis forms, together with any exceptions that occurred. This was identical to the method used by the original analysts. Equal-weighted Cohen’s kappa values were calculated and are reported in Sections 8.2.2 and 8.3.2.
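For reference, Cohen's kappa compares the observed agreement between two raters, p_o, with the agreement expected by chance, p_e, as kappa = (p_o - p_e) / (1 - p_e). The sketch below assumes that 'equal-weighted' means the unweighted form, in which every disagreement counts the same, and that each rater assigns one SDT category (hit, miss, false alarm, or correct rejection) to each coded item; the function name is hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: proportion of items given the same category.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal proportions, summed over categories.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # assumed convention: both raters used one identical category throughout
    return (p_observed - p_expected) / (1 - p_expected)
```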


For the lower-fidelity simulator, equal-weighted Cohen’s kappa was 0.773. This represents substantial agreement between the analysts in their classification of hits, misses, false alarms, and correct rejections (Landis and Koch 1977).

As shown in Figure 8.4, all three experimental conditions exhibited a relatively high number of hits, and all shared an interquartile range of 1. The Timer condition did not require verbal interaction with the driver, resulting in fewer possible hits than in the HazLan and VAA conditions. All experimental conditions contained outliers.

In terms of misses, the HazLan and VAA conditions had identical median values (3) and interquartile ranges (4). The shorter interaction in the Timer condition may have contributed to its lower median (1) and interquartile range (3).

FIGURE 8.4 Lower-fidelity experiment hits, misses, false alarms, and correct rejections, by condition.

FIGURE 8.5 Study 1, lower-fidelity experiment Phi values, by condition.

In terms of false alarms, all conditions showed equal interquartile ranges of 1 and shared identical outlier values. Median values for the HazLan and VAA conditions were also equal at 1. The VAA condition exhibited a slightly higher number of correct rejections (16) than the HazLan (12) and Timer (13) conditions.

As shown in Figure 8.5, the median Matthews correlation coefficient (Phi) for every condition was greater than 0.8 (the minimum acceptable criterion), indicating a strong positive relationship between expected and observed behaviour. The Timer condition scored particularly high, with a median of around 0.9, but also exhibited a large interquartile range, spanning from slightly above 0.7 to 1.0. The HazLan condition interquartile range was slightly narrower, between around 0.75 and 1.0, whereas the VAA condition had the narrowest interquartile range, between around 0.8 and 1.0.
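The per-condition summaries discussed for Figure 8.5 (median, interquartile range, and comparison against the 0.8 criterion) can be computed from per-participant Phi values with Python's standard library, as sketched below. The quartile method is an assumption: the text does not say how quartiles were computed, and methods differ slightly; `method="inclusive"` interpolates linearly between order statistics. `PhiSummary` and `summarise_phi` are hypothetical names, and the demo values are placeholders, not study data.

```python
import statistics
from dataclasses import dataclass

PHI_CRITERION = 0.8  # minimum acceptable median Phi stated in the text


@dataclass
class PhiSummary:
    median: float
    q1: float
    q3: float

    @property
    def iqr(self) -> float:
        return self.q3 - self.q1

    @property
    def meets_criterion(self) -> bool:
        # Treats reaching the minimum as acceptable (an assumption).
        return self.median >= PHI_CRITERION


def summarise_phi(phi_by_condition: dict[str, list[float]]) -> dict[str, PhiSummary]:
    """Median and quartiles of per-participant Phi values for each condition."""
    summary = {}
    for condition, values in phi_by_condition.items():
        # Assumed quartile method: linear interpolation ("inclusive").
        q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
        summary[condition] = PhiSummary(statistics.median(values), q1, q3)
    return summary


if __name__ == "__main__":
    # Placeholder values for illustration only; not data from the study.
    demo = summarise_phi({"Example": [0.7, 0.8, 0.9, 1.0, 1.0]})
    for name, s in demo.items():
        print(name, s.median, round(s.iqr, 3), s.meets_criterion)
```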
