STREAM COMPUTING TO ADDRESS VELOCITY
Traditional computing was built on a batch paradigm of observed data augmented with reported data wherever observations could not be made. Consider, for example, a call center. All the customer call information was collected at the call center and extracted, transformed, and loaded into a data warehouse, which provided trend analysis and reporting on the call center data. Some of this data was observed data. How many customers called on a particular day? What was the average wait time? What was the average handling time? Either the agent or the customer reported the rest of the data. At the end of the call, the customer had the option of reporting his/her satisfaction with the call. Did the customer provide feedback at the end of the call? How many rated the company a 1, or “very poor”? Was that trend up or down from prior days? Also, the agents provided their recollection of the call purpose and other relevant information. With any reported information, the information may not be consistently collected or properly keyed. In a call center study I carried out, I found the busy call center agent was being asked to use 87 keywords to describe a call at the end of the conversation. Most agents used the top ten keywords to codify the call. Were the calls accurately reported, or did the call center agents memorize a couple of keywords and use them repeatedly because it was hard to memorize 87 keywords?
Increasingly, call centers are moving to an event-driven, continuous intelligence view of operations. This approach enables the immediate detection and correction of problems as they appear, rather than after-the-fact changes. It also allows marketers to observe and codify customer conversations using a set of tools that record observations and do not rely on the recollection of the facts by either the agent or the customer. As the conversations are carried out, relevant data can be extracted from these conversations and forwarded to the marketing organization. The process involves creating a stream-computing engine, which can observe conversations and identify relevant information during the observation.
Stream computing is a new paradigm. In “traditional” processing, one may think of running analytic queries against historical data, for instance, calculating the average time for a call last month for a call center. With stream computing, a process can be executed that is similar to a “continuous query” that keeps running totals, as observed data is collected moment by moment. In the first case, questions are asked of historical data, while in the second case, observed data is continuously evaluated via static questions. For example, stream computing can be used to observe the conversations in order to classify the mood of the caller: happy, sad, or angry. It can be used to monitor customer reactions in social media to a new product launch, and all this information can be analyzed and reported in real time.
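The contrast between a query over historical data and a continuous query can be illustrated with a minimal Python sketch. The class name RunningCallStats, the function batch_average, and the sample handling times are hypothetical, chosen only for illustration; a production stream engine would do far more than this.

```python
from dataclasses import dataclass


@dataclass
class RunningCallStats:
    """Continuous query: keeps running totals as each call is observed."""
    count: int = 0
    total_seconds: float = 0.0

    def observe(self, handling_seconds: float) -> float:
        """Fold one observed call into the totals and return the current average."""
        self.count += 1
        self.total_seconds += handling_seconds
        return self.average

    @property
    def average(self) -> float:
        return self.total_seconds / self.count if self.count else 0.0


def batch_average(history):
    """Traditional query: asked once, after the fact, over stored history."""
    return sum(history) / len(history) if history else 0.0


# Hypothetical handling times (in seconds) arriving one call at a time.
calls = [300.0, 180.0, 420.0]
stats = RunningCallStats()
running = [stats.observe(seconds) for seconds in calls]
print(running)               # the answer is available after every call
print(batch_average(calls))  # the same figure, but only once the batch is complete
```

Both approaches arrive at the same average for the full set of calls. The difference is that the running version has an up-to-date answer after every observation and never needs to store or rescan the history.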
Stream computing is best used when a marketer is dealing with a high volume or variety of data throughputs and when the data requires filters, counts, or scoring in real time. Sentiment analytics of the presidential State of the Union address is a good example of a situation in which incoming data is unstructured social media comments, and the researcher uses the incoming data to extract and report sentiment information in real time. A number of marketing activities described in chapter 4 are prime candidates for stream computing. The number of events created by the customer or the environment is staggering. For example, as a customer initiates shopping for an electronic device, marketers can observe patterns associated with him/her by analyzing a chain of events as they occur. Stream computing enables analysis and identification of the context of a customer's behavior. It can examine a number of campaigns and compare them using prespecified scoring models to trigger certain actions that collaboratively influence customer actions using advertising, expert testimonials, or targeted promotions. An orchestrator can change stream-computing parameters, thereby making its action dynamic as well as adaptive to change.
Three parallel technical concepts (complex event processing, streams, and ETL) built the momentum and gave rise to the sophistication behind stream computing, making it the most powerful big data technology for marketing analysts. Complex event processing owes its genesis to simulation technologies and deals with identifying complex patterns of events.[8] As marketers collect raw data such as web searches, browsing of product sites, and visits to stores, they can assemble the events to recognize someone shopping for a product. Complex event processing provides the concepts for representing these events. Unix Streams[9] provided the concept of streams of data as input to a computer system. Thus, a video stream from a surveillance camera or an audio stream from a call center conversation can be ingested into a program for the detection of patterns. Last, but not least, ETL tools in business intelligence provided the mechanism for extract-transform-load of the data from a variety of operational systems.
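To make the idea of assembling raw events into a recognizable pattern concrete, here is a minimal Python sketch. It assumes that a customer is "shopping" once a web search, a product page view, and a store visit have all been seen within a time window, and that events arrive in timestamp order. The event names, the window length, and the class ShoppingPatternDetector are hypothetical, not taken from any particular complex event processing product.

```python
from collections import defaultdict, deque

# Hypothetical raw event types that together suggest someone is shopping.
SHOPPING_SIGNALS = {"web_search", "product_page_view", "store_visit"}


class ShoppingPatternDetector:
    """Flags a customer once all shopping signals occur inside one time window."""

    def __init__(self, window=7 * 24 * 3600):
        self.window = window              # window length in seconds (assumed)
        self.recent = defaultdict(deque)  # customer_id -> (timestamp, event_type)

    def on_event(self, customer_id, event_type, timestamp):
        """Ingest one event; return True if the pattern is now complete."""
        if event_type not in SHOPPING_SIGNALS:
            return False
        events = self.recent[customer_id]
        events.append((timestamp, event_type))
        # Drop events that have aged out of the window (assumes ordered arrival).
        while events and timestamp - events[0][0] > self.window:
            events.popleft()
        seen = {etype for _, etype in events}
        return seen == SHOPPING_SIGNALS


detector = ShoppingPatternDetector(window=100)
results = [
    detector.on_event("c1", "web_search", 0),
    detector.on_event("c1", "product_page_view", 10),
    detector.on_event("c2", "store_visit", 20),   # a different customer
    detector.on_event("c1", "store_visit", 50),   # completes the pattern for c1
]
print(results)
```

Keeping a small per-customer window, rather than the full history, is what lets this kind of detection run as the events stream in.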
Stream computing benefited from these technical concepts, although it is not a replacement for the respective tools that perform these functions. Instead, it combines the best of these concepts for the marketing problem at hand: the ability to detect patterns of events in streaming data sets and analyze the alternatives in real time. Because stream computing operates at very high velocity, it is a tool for instant response and collaborative conversation.
Stream computing can perform three functions in real time for marketers. First, it can sense, identify, and align incoming streaming data to known customers or events, and join these events to identify the customers with a specific context and intent (for example, “customer at a store”). Second, it can categorize, count, and focus. These capabilities provide important real-time attributions to the source data, for example, “more than two web searches on a specified topic” or “a user who clicks an advertisement and goes to the advertiser’s site to shop.” These functions use a set of dynamic parameters that are constantly updated based on deep analytics on historical data. Third, it provides capabilities to score and decide. A set of scoring models might be promoted through predictive modeling that uses historical data. These models can be scored in real time using streaming data and be used for decision-making. In addition, complex decision trees or other rule-based strategies can also be executed with run-time engines that take their rules from a business rule management system (BRMS).
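The three functions can be sketched as a small pipeline in Python. Everything here is an illustrative assumption: the device-to-customer lookup, the event fields, the logistic weights (which in practice would come from predictive modeling on historical data), and the 0.5 cutoff. It is a sketch of the sense, count, and score steps, not a real stream-computing engine or BRMS.

```python
import math
from collections import Counter

# Hypothetical lookup used to align an incoming device to a known customer.
KNOWN_CUSTOMERS = {"cookie-123": "cust-1"}

# Hypothetical model weights; assumed to be fitted offline on historical data.
WEIGHTS = {"bias": -2.5, "search_count": 1.0, "at_store": 2.0}

search_counts = Counter()  # (customer_id, topic) -> number of web searches


def sense(event):
    """Step 1: align the event to a known customer, or return None."""
    customer_id = KNOWN_CUSTOMERS.get(event["device_id"])
    return None if customer_id is None else {**event, "customer_id": customer_id}


def categorize_and_count(event):
    """Step 2: keep a running count of searches per customer and topic."""
    key = (event["customer_id"], event["topic"])
    if event["type"] == "web_search":
        search_counts[key] += 1
    return {"search_count": search_counts[key],
            "at_store": 1.0 if event.get("at_store") else 0.0}


def score_and_decide(features, cutoff=0.5):
    """Step 3: score with a logistic model, then apply a simple decision rule."""
    z = WEIGHTS["bias"] + sum(WEIGHTS[name] * value for name, value in features.items())
    probability = 1.0 / (1.0 + math.exp(-z))
    return "send_promotion" if probability >= cutoff else "no_action"


def process(event):
    aligned = sense(event)
    if aligned is None:
        return "no_action"  # unknown device: nothing to act on
    return score_and_decide(categorize_and_count(aligned))


search = {"device_id": "cookie-123", "type": "web_search", "topic": "tablet"}
decisions = [process(search) for _ in range(3)]
print(decisions)
```

With these assumed weights, the decision changes only on the third search, which mirrors the "more than two web searches on a specified topic" attribution described above.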
Stream computing is typically performed on a massively parallel platform (MPP) to achieve high velocity and throughput. Let me discuss next the notion of an MPP system and show how these architectures support big data volumes.