Success and Failure in Detection and Recovery

In most systems errors are relatively frequent, but few affect safety because humans and organisations have the capacity to recover from them. In aviation, for example, numerous studies show that professional pilots make at least one clear error per hour, whatever the circumstances and the quality of the workplace design (Helmreich 2000; Amalberti 2001). The great majority of errors are rapidly detected by the person who made them, with routine errors being better detected than mistakes. Experts of course make fewer errors overall than novices, but the best marker of high-level expertise is success in detecting errors rather than the rate at which they are produced. Detection and recovery are sensitive to high workload, task interruptions, and system time management (Amalberti et al. 2011; Degos et al. 2009).

What are the implications for safety and for the analysis of incidents? We commonly assume that the best way to make a system safer is to reduce the number of errors and failures. This is, in many cases, entirely reasonable. Automation, for instance, or reminder systems, can have a massive impact on minor errors. A more organized handover process might enhance the transfer of essential information. However, eliminating all errors, which would mean considerably restricting human behaviour, is not possible and arguably not desirable.

We need in practice to distinguish between errors that have immediate consequences for the patient and those that can be considered minor deviations in the work process, which can be noticed and corrected. The first class of errors does indeed need formal, rigorous rules to protect the patient, such as clear protocols for the management of electrolytes or multiple and redundant patient identification checks. For the many millions of other minor errors it is more efficient and effective to rely on detection and recovery by means of self-awareness and good coordination and communication within the team. These findings also suggest that reliable human-system interaction will be best achieved by designing interfaces that minimize the potential for control interference and support recovery from errors. In other words, the focus should be on control of the effect of errors rather than on the elimination of error per se (Rasmussen and Vicente 1989).

The standard approach for incident analysis in healthcare has primarily focused on identifying the causes and contributory factors of the event, with the idea that this will allow us to intervene to remove these problems and improve safety. These strategies make perfect sense in any system which is either highly standardised or at least reasonably well controlled, since there it is clearly possible to implement changes that address these vulnerabilities. The recommendations from many analyses of healthcare incidents are essentially recommendations to improve reliability (such as more training or more procedures) or to address the wider contributory factors such as poor communication or inadequate working conditions. In all cases we attempt, quite reasonably, to make the system more reliable and hence safer.

We could, however, expand the scope of the inquiry and the analysis. There is much to learn from the ability of the system to detect and recover from failures and close calls (Wu 2011). For example, in addition to identifying failures and contributory factors, we could also ask 'what failures of recovery occurred in the care of this patient?' and 'how can we improve detection and recovery in settings such as these?' This would have implications both for our understanding of events and, more importantly, for the recommendations that follow such analyses, which might expand to include a much stronger focus on developing detection and recovery strategies.
