Taking Control of Gesture Interaction
GERSHOM KUTLIROFF AND YARON YANAI
Reinventing the User Experience
For those of us old enough to remember a world before iPods, the computer we used when we were 15 years old looked very similar to the computer we were using when we were 35. There was a (generally boxy) console, a monitor for display, and a keyboard and mouse for input. Now, it seems we have a new class of devices every other year — smartphones, tablets, Google Glass, and now smartwatches (not to mention “phablets,” “two-in-ones,” and the various other hybrids). Many factors are driving this rapid introduction of new products: cheap (and plentiful) processing, new display technologies, and more efficient batteries, to name a few.
One commonality shared by all of these devices is the critical role user interaction plays in their design. Indeed, today the size of a portable device is largely limited by input/output considerations — the screen size and keyboard — and no longer by the requirements of the different technology components. As devices are further integrated into our daily activities (think “wearables”), the importance of reinventing the way we communicate with them increases.
Gesture control is an intriguing solution to this problem because it promises to enable our devices to understand us the way other people understand us. When we want to indicate an object (virtual or real), we point at it; when we want to move something, we pick it up. We don’t want to be constrained to a keyboard or a mouse or a touchscreen to communicate. This potential has begun to be realized over the past few years as gesture control technology reaches end users in the form of Microsoft’s Kinect sensor, Samsung’s Galaxy smartphone, and Intel’s RealSense initiative.
As with many emerging technologies, gesture control has enjoyed some early successes as well as some clumsier, less successful attempts at reinventing the user experience (UX). The difficulty of the problem becomes all the more evident when we pause to consider the complexity (and immaturity) of the different technology components that must work together smoothly: the sensor, the camera, middleware solutions, and, of course, the applications that must bring all these elements together to create a great user experience.
Moreover, the general difficulty of working with early technology is compounded by the specific design challenges inherent to gesture recognition interfaces: How can a user understand the effects of his actions when there is no tactile feedback? How can false positives be avoided? What can be done to address user fatigue?
Thus, for all of its promise, the futuristic dream of ubiquitous gesture control remains... well, futuristic. Yet there is a growing sense that although many of the technical challenges will eventually be solved, the most important question of all remains a riddle: what will designers do with this technology?
In our former company, Omek Interactive, we developed middleware to accurately and robustly track hand and finger movements in real time, by interpreting the data generated by 3D cameras (see Figure 3-1). 3D cameras compute, for each point in the scene, the distance between the camera and the objects in front of it; this “depth” data is very effective in solving many hard problems in computer vision and is therefore a key enabler of the tracking algorithms. Our objective was to enable a compelling gesture-based interaction driven by our hand and finger tracking solution.
Figure 3-1. Depth map generated by a 3D camera (different shades of gray indicate proximity to the camera) (Omek Interactive Ltd. © 2013)
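To make the idea of depth data concrete, the following is a minimal sketch (not Omek’s actual code) of how a per-pixel depth map like the one in Figure 3-1 might be rendered as a grayscale image, with nearer surfaces drawn brighter. The function name, the millimeter units, and the near/far range limits are all illustrative assumptions, not taken from any particular camera.

```python
import numpy as np

def depth_to_grayscale(depth_mm, near=200, far=1200):
    """Map per-pixel depth values (assumed here to be in millimeters)
    to 8-bit grayscale, so that nearer surfaces appear brighter.
    The near/far clipping range is illustrative."""
    d = np.clip(depth_mm.astype(np.float64), near, far)
    # Normalize so that near -> 1.0 (bright) and far -> 0.0 (dark)
    norm = (far - d) / (far - near)
    return (norm * 255).astype(np.uint8)

# Toy 2x2 "depth map": a nearest pixel, a farthest pixel, two in between
frame = np.array([[200, 1200],
                  [700,  950]])
gray = depth_to_grayscale(frame)
```

A real depth camera would, of course, produce a full-resolution frame (and typically also invalid-depth pixels that need masking), but the mapping from distance to shade is essentially this simple linear normalization.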
Gesture-based interaction can be supported by other technologies such as conventional (2D) cameras and ultrasound signals. However, we believed that the fully immersive, natural, and intuitive user experience we envisioned required robust and finely nuanced skeleton tracking, possible only with 3D camera technology. Concurrent with the development of the skeleton tracking middleware, the Omek Studio worked with early versions of the technology to realize a new paradigm for human-machine interaction. In this chapter, we discuss how we approached this challenging task, the inherent usability issues we faced, and especially the tactics we employed to resolve these difficulties. In particular, we describe in some detail how we progressively evolved the gesture-based experience through an iterative prototyping process that made it possible for us to take advantage of the benefits of gesture-based interaction and to compensate for its flaws. We submit that our specific experience of designing a next-generation gesture interface and the lessons we learned are broadly applicable to a range of design problems, characteristic of many emerging technologies.