One of the most common, and most vexing, examples of technology trying to emulate humans is the disembodied voice. As voice-based interfaces become more common, it’s worth addressing the problems with voice interaction specifically, which is why it gets its own Calm Design principle. [1]


A couple of years ago, I had a conversation with one of the cofounders of Siri. We talked about how Siri was “trained” on a Californian English vocal pattern and accent, because that’s where Siri was designed. Within minutes, Siri’s cofounder started showing me videos in which Siri completely fails to understand users, even though they lacked any trace of an accent that differs from Siri’s own. This kind of experience can be uniquely frustrating for users, because it forces them to modify their own behavior for the benefit of a machine, and the machine is only demanding such contortions because it’s trying to “communicate like a human.” Making a computer speak like a human without instilling it with a sense of human context or relationships ultimately leads to a sense of dissonance in the person using it: exactly what affective design seeks to remedy.

Siri is still considered a failure by some groups because it was advertised as having far more accurate capabilities than it turned out to have. We’re accustomed to hearing computer voices in movies, but movies have post-production. They are polished and cut to look perfect. Real life doesn’t work that way. Many of us have been led to believe we can have computers as accurate as the ship’s computer in Star Trek, but that voice is a carefully scripted plot device intended to help bring the computer to life in a way that looks (and performs) better than text on a screen. The computer voice lets the device be a part of the action instead of a simple terminal. When we watch these films, we get used to the idea of talking to a computer, even though human-computer communication relies much more on context than we realize.

One of the biggest draws of robotic voice systems is the idea that they could anticipate our emotional and physical needs without judgment. They could be our faithful servants and provide unwavering emotional support. And while there are some examples of Japanese virtual boyfriend and girlfriend systems, as well as the virtual chatbot psychologists of the early days, the best way to train an AI system is by connecting it to human context. The Google Search engine does this well. Google uses bots to index content created by people and provides suggested search results. In the end, it is the human who decides which site is most relevant to them. Google just does a bit of the heavy lifting.

Voice interfaces rarely work, for the same reason that visual interfaces at the center of our field of vision rarely work (Principle II): they both require the majority of our limited attention. As we discussed in Principle II, the path to calmer interaction is primarily about presenting information in parallel, and matching the information density of the interaction with the channel through which it communicates.

A user interface requiring all of our visual focus distracts us from doing anything else. An interface that requires our complete auditory focus (or perfect enunciation) is equally distracting. In place of a voice interface, consider making use of tones, lights, or other sensory stimulation to get the point across.

Voice recognition works best in a quiet environment, but most environments are not quiet. I once watched a woman get frustrated while attempting to use a voice-recognition system at a kiosk in an airport. Halfway through the automated menu, the noise of her kids, combined with the background noise of the airport paging system, threw her back to the beginning of the menu, interrupting her progress and forcing her to do it all over again. An auditory-based machine on a busy street faces similar problems.

Or take the example of a parking ticket machine that uses a prerecorded human voice to communicate. First, it speaks in a slow, disjointed voice that’s more disorienting than helpful. Second, it takes your card but offers no feedback about whether the transaction went through. Third, if nothing happens, there is no call button to ask for help from a person, and no person nearby. People get stuck in parking lots this way, a deeply disconcerting experience that puts the user, not the machine, on pause (the next principle addresses this issue directly).

Human voice, then, should be used only when absolutely necessary. Introducing a voice creates a variety of new issues: the need to string words together through voice “concatenation,” the near certainty of misinterpretation, and the accent issue that Siri so clearly demonstrates. Human voices must also be translated into multiple languages for accessibility purposes, while a simple positive or negative tone, symbol, or light can be designed in a way that’s universally understood.

Instead, create ambient awareness through different senses. Use a status tone instead of a spoken voice. Use a buzz instead of a voice-based alert. Use a status light instead of a display. If done well, a simple light or tone can clearly convey information without forcing users to pay all of their attention.

One example might be the light status indicator you find on a convection stove after the stove is turned off, but the surface is still hot. This indicator is not necessary on gas stoves, as the burners quickly cool down when the gas is turned off. Another example is the recording indicator light on a standard video camera.


Examples of various light status indicators.

Many of us remember the brief period in the 1980s when several carmakers, led by BMW, started putting voice alerts in their cars to convey extremely simple messages. Suddenly, showrooms around the world were full of luxury cars saying, “Your door is ajar! Your door is ajar!” every time you left the door open. Consumer response was swift, strong, and decidedly negative: nobody wants to be lectured by their car, and a speaking voice is overkill for such a basic piece of information. BMW switched to a gentle, wordless tone the following year, other makers followed suit, and now the talking car door is just a footnote, and the subject of a handful of '80s-era comedy sketches.

Getting the alert tone right takes careful consideration. There’s no question that our day-to-day technological lives are currently filled with too much beeping and buzzing, but this is largely a result of how those alerts are designed. It’s still rare to find an audible alert that uses a calming tone. Most are sharp and distracting, a result, most often, of the hubris of designers and engineers who believe that their technology’s alert is the most important thing in the user’s environment, so it must be unmistakable. But getting a loud buzz for every email, status update, and news item quickly makes every buzz meaningless. Strive to match the urgency of the tone with the urgency of the alert, and recognize that many pieces of information aren’t time-sensitive enough to need one.
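As a rough illustration, this urgency-matching rule can be sketched as a small lookup table. The urgency levels and alert styles below are hypothetical examples invented for the sketch, not part of any real notification API:

```python
from enum import Enum

class Urgency(Enum):
    AMBIENT = 0   # no alert needed; surface passively (e.g., a status light)
    LOW = 1       # can wait; deliver silently or in a batch
    NORMAL = 2    # worth a gentle nudge
    CRITICAL = 3  # time-sensitive; justified in interrupting the user

def choose_alert(urgency: Urgency) -> str:
    """Return an alert style whose intrusiveness matches the message's urgency."""
    return {
        Urgency.AMBIENT: "status light only",
        Urgency.LOW: "silent badge, delivered in a batch",
        Urgency.NORMAL: "single soft tone or short haptic buzz",
        Urgency.CRITICAL: "loud, repeating tone",
    }[urgency]

# A routine email does not deserve the same buzz as a smoke alarm.
assert choose_alert(Urgency.LOW) == "silent badge, delivered in a batch"
assert choose_alert(Urgency.CRITICAL) == "loud, repeating tone"
```

The point of the table is that most messages land in the first two rows; reserving the loud, repeating tone for genuinely critical events is what keeps it meaningful.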

The Roomba robotic vacuum cleaner, for example, emits a happy tone when cleaning is complete, and a sad tone when it gets stuck. The tone is unambiguous but unintrusive, and needs no translation. Additionally, a light display on the Roomba shows green when clean and orange when dirty or stuck.

Where does voice interaction make sense, then? Under certain controlled circumstances: where it’s reliably quiet, where the task is simple, or where a tone doesn’t convey enough information. It also makes sense where there are clear benefits to not having to look at or touch your device. Turn-by-turn interactive directions while driving are the most common and most successful example of this kind of interaction.

A car is a closed, quiet, and controlled environment. Driving directions follow a very consistent format, but differ in content every time. More important, driving is an activity that demands complete visual attention, making a strong safety case for voice interaction. Audible driving directions aren’t chatty or quirky; they provide a secondary focus that can help someone reliably get to a destination without being distracted from the road. A vehicle is a realm in which the user’s humanness isn’t really at stake in the same way, so ignoring social and emotional cues is perfectly acceptable. This is the principle of parsimony, or “minimal technology,” at work.

Smartphones that use voice interaction successfully also take advantage of previously entered information: for example, learning what address constitutes “home,” then letting the user simply say “give me directions home” and resolving it to that address.
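A minimal sketch of this kind of shortcut resolution, assuming a hypothetical saved-places table (the place names and addresses here are invented for illustration, not any real assistant's API):

```python
# Hypothetical saved places a user has previously entered into the device.
saved_places = {
    "home": "123 Main St, Portland, OR",
    "work": "456 Market St, Portland, OR",
}

def resolve_destination(utterance: str) -> str:
    """Map a spoken destination phrase to a full address when a saved place matches."""
    phrase = utterance.lower().strip()
    # "give me directions home" -> look for a known place name in the phrase
    for name, address in saved_places.items():
        if name in phrase:
            return address
    # Otherwise, treat the utterance as a literal address or search query.
    return utterance

assert resolve_destination("give me directions home") == "123 Main St, Portland, OR"
```

The design win is that the spoken command stays short and natural, while the ambiguity is resolved by information the user supplied once, in advance, through a calmer channel than speech.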

We’ve already talked extensively about using the periphery to convey information in parallel. But given three types of peripheral notification, visual, haptic (related to touch or proprioception), and audible, how do you know when to use which?

The answer is context.

Where is the tech going to be used? Is it a loud environment? A quiet one? A messy one? A dark or light one? If it’s in bright sun, the user might not be able to see an indicator light. If it’s a personal notification, a haptic notification might be the most appropriate. Haptic notifications can consist of anything that involves the sense of touch, including texture, braille, vibration, electricity, or temperature. A haptic alert can be very useful for personal notifications, as touch has the greatest proximity of any alert; it can be configured so that just one person feels it, especially if it’s a personal device worn on the user’s body. A calm alert is almost always better than a sharp one: a haptic buzz doesn’t have to be intense, a light doesn’t have to be blinding, and a tone doesn’t have to be obnoxiously loud.
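One way to think about these questions is as a small, context-driven selection function. This is only a sketch of the reasoning above; the context flags and channel names are assumptions for illustration, not a real device API:

```python
from dataclasses import dataclass

@dataclass
class Context:
    noisy: bool      # a loud environment drowns out tones
    bright: bool     # direct sun washes out indicator lights
    personal: bool   # should only one person notice the alert?

def pick_channel(ctx: Context) -> str:
    """Pick the peripheral notification channel that fits the environment."""
    if ctx.personal:
        return "haptic"  # touch reaches one person without broadcasting
    if ctx.noisy and ctx.bright:
        return "haptic"  # neither a light nor a tone is reliable here
    if ctx.noisy:
        return "light"   # visible even when a tone would be lost
    if ctx.bright:
        return "tone"    # audible even when a light would be washed out
    return "light"       # default to the calmest passive channel

assert pick_channel(Context(noisy=True, bright=False, personal=False)) == "light"
assert pick_channel(Context(noisy=False, bright=False, personal=True)) == "haptic"
```

A real product would weigh more factors (user preference, accessibility needs, whether the device is worn), but the shape of the decision is the same: the environment, not the designer's enthusiasm, picks the channel.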

Sometimes having two notifications is useful, as it increases the likelihood of the user noticing it without demanding their full attention.

As a primary notification method, simulated human voice has many downsides. Sometimes it is justified and necessary, but often it can be replaced by something simpler, calmer, and more appropriate. Think very hard about why you might need to put a human voice into a product, and first consider if any of the other alerts described in Chapter 3 might be more appropriate. If there’s a better way to do it, don’t hesitate to change how the product communicates with the user! [2]

There’s a strong temptation for designers and developers to shrug these cases off, either because they’re too rare to address, or because they’re the result of a user who’s “too dumb,” but the fact is that everyone is an edge case at one time or another. Sometimes it’s because they’re just learning the technology, and don’t realize exactly what’s expected of them. Sometimes they have an unusual need and they’re trying to stretch the tech’s capabilities. Sometimes, as in the example discussed in “The False Alarm That Wouldn’t Stop,” it’s not their fault at all, and they’re just trying to deal with a rare but unpleasant exception.

The problem with edge cases is that their impact far outweighs their frequency. Hue is a lighting system developed by Philips that gives users unprecedented control over the lighting in their homes, through a well-designed app that governs a system of adjustable-color LEDs. When it works, which is the vast majority of the time, it’s quite magical, and the setup and installation are surprisingly straightforward. But in 2014, if you asked Hue owners about their systems, most of them would probably tell you about the time that it crashed, leaving them with a house full of lights at full brightness, and no obvious way to turn them off.

A bug in the most recent automatic firmware update was the culprit, and once alerted, Philips did a decent job of rushing out a patch. But try telling that to a family trying to go to sleep in a fully lit bedroom.

In fact, there was a temporary fix: you could simply turn the lights off at the wall switch. But many users, accustomed to controlling their lights via the app, didn’t realize this. Philips eventually announced the fact via the Hue Twitter account, and apologized profusely, but the damage had been done. Thousands of users were left with the uneasy feeling that the lights in their house could crash, and Philips has struggled to rebuild that faith ever since.

What the designers should have done was anticipate this edge case and build guidance for it into their marketing materials. When a user is untrained in what to do during a technology failure, confidence in a product can shatter.

We, as humans, hate bumping up against these kinds of gaps in the user experience because they lay bare the difference between people and machines. People have a built-in capacity for flexibility and empathy, and machines don’t, so “crashing” is about the least human thing a piece of technology can do.

When you design, put yourself in the shoes of your users: not just the competent, experienced users doing the thing you want them to do, but also the users just figuring it out, pushing the edges, or dealing with a bug. A simple “off” switch can work wonders. So can a fallback mode that offers less functionality but easier access to the basics.

In general, though, the key to dealing with edge cases is providing redundancy. Make sure your system can still work when a portion of it fails, and give users a choice of options for getting crucial tasks done. Designing and building multiple parallel action paths may not feel like the most efficient solution, but then, neither is training jet pilots to fly a glider.

The False Alarm That Wouldn’t Stop

A few months ago, I saw an update from Facebook friend and author William Hertling on a false alarm from his Nest smoke detector (Figure 2-8). The device gave him a lot of trouble before he was able to turn it off.


The Internet-connected Nest smoke detector and the mobile Nest control application.

Hertling reported "three piercing loud beeps, and then a voice saying 'Smoke detected in the entranceway. Smoke detected in the entranceway.' There were five Nest smoke detectors in the house, all of them doing the same thing, all slightly out of sync with each other, so you hear a weird echoing of the beeps and the announcement about 2-4 seconds apart.”

Was there any way to stop the alarm, or disassemble it?

"At first I could silence it,” Hertling said, "but then it started again, and said ‘This alarm can not be silenced.’ If it had been a cheaper smoke detector, I might have smashed it on the sidewalk outside. But, of course, it's stupendously expensive, so I wasn't about to do that. To try to find the right size screwdriver and disassemble this thing calmly, without any coffee, and with alarms blaring all over the place, kids evacuating the house and trying to find things to put cats in, it was not an easy task.”

That's not even the only trouble Nest systems have run into. Wired reported in April 2014: "After a nearly blemish-free record that culminated in a $3 billion acquisition by Google, Nest today issued a surprising halt to sales of Protect, its gesture-controlled smoke alarm. One of the device's key features was that you could wave at it to turn it off. Turns out, other movements might also mute the alarm inadvertently. Thus, as CEO Tony Fadell put it, 'This could delay the alarm going off if there was a real fire.' Oops.”

Why not have a simple button that stops the alarm? A button is simple to press, and easy for a computer to understand. There's no ambiguity in intent. The product was designed well in general, but missed a core interactive feature: "How do you turn the alarm off?” Sometimes, companies with exceptional design practices can miss the most basic interactions. Testing your device in many different environments is a way to help prevent this. [3]

When a product works with us in our existing environment and fits into our existing workflow, we begin to ignore it, or take it for granted. This may sound defeatist, but the alternative is far worse: using poorly designed tech is like solving a story problem every time we use it. It requires us, the users, to adapt to clunky technology and hunt down the features we actually need to use.

But while good technology is often simple, a good design process almost never is. Good designers aren’t afraid of working through all the tiny details and all the edge cases they can conceive, removing unnecessary features until there is nothing left to take away. They design with the fewest number of components, because more features make for more failures, and complex systems leave more room for security flaws.

For hardware products, good design means fewer things that can break, easier assembly, and fewer failed units. It means less physical tech support, shorter onboarding, and a more beloved product. Each new feature must be developed, tested, explained to the market, and then supported. It must also be updated when underlying systems change. Add a feature only when absolutely necessary.

  • [1] principles of calm technology
  • [2] TECHNOLOGY SHOULD WORK EVEN WHEN IT FAILS An airplane whose engines have failed defaults to a glider. On the less catastrophic side of things, escalators are more resilient than elevators because they revert to stairs when they stop working. This mindset should govern the design of technology as much as possible. Designers and developers tend to focus their efforts on the use cases that they foresee as most common, putting tremendous thought into how they might make these cases work faster and more smoothly. This is necessary and admirable, but it does little to address the cases when failure is most likely to occur. The edge cases are where things go wrong. When users try to do something unusual with the technology, or get to an outcome without going through all the correct steps, that’s when failure tends to strike, and calmness disappears.
  • [3] THE RIGHT AMOUNT OF TECHNOLOGY IS THE MINIMUM NEEDED TO SOLVE THE PROBLEM Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away. ANTOINE DE SAINT-EXUPÉRY All products might begin as a simple idea, but bringing that idea to life involves a series of complex processes and design decisions. Designing a simple thing often requires a complicated process. A product that utilizes the right amount of technology becomes invisible more quickly, which is a hallmark of effective Calm Design. When a product works with us in our existing environment and fits into our existing workflow, it recedes from our attention.