HYBRID SOLUTION ARCHITECTURES
The architecture components described in the previous chapter must be placed in an integrated architecture in which they can all coexist and provide overall functionality and performance consistent with our requirements. However, the requirements are at odds with each other. On the one hand, we are dealing with unstructured data discovery over very large data sets that may have very high latency. On the other hand, the adaptive analytics activities are bringing the analytics to a conversation level requiring very low latency. How do we establish an overall architecture that respects both of these components equally while establishing a formalized process for data integration? This chapter describes an integrated architecture that responds to these challenges and establishes a role for each component that is consistent with its capabilities. The architecture outlined in this chapter is the advanced analytics platform (AAP).12
IT organizations in all major corporations face several important architecture decisions. First, the existing infrastructure, along with the large body of professionals who care for and feed the current analytics platform, is severely constrained by the growing demands of the four Vs and the big data tsunami. Meeting future demand through continued investment in the current infrastructure alone is next to impossible. Second, market forces are pushing organizations to become analytics driven, which is forcing massive changes in how those organizations deal with marketing, sales, operations, and revenue management. Intelligent consumers, greenfield competitors, and smart suppliers are forcing organizations to rapidly bring advanced analytics to all major business processes. Third, the new MPP platforms, open-source technologies, and cloud-based deployment are rapidly changing how new architecture components are developed, integrated, and deployed. AAP grew under these architecture demands with four architecture principles:
1. It integrates with the current analytics architecture and caps it at the mature functions, which continue to require the current warehouses and structured reporting environments. This integration includes important functions such as financial reporting, operational management, human resources, and compliance reporting. Most organizations have mature data flows, analytics solutions, and support environments. These environments will gradually change, but a radical change takes time and investment, and might not result in the biggest payback.
2. It overlays a big data architecture that shares critical reference data with the current environment and provides the necessary extensions to deal with semistructured and unstructured data. It also facilitates complex discoveries, predictive modeling, and engines to carry out the decisions driven by the insight created through advanced analytics.
3. It adds a necessary real-time streaming layer, which is adaptive, uses discovery and predictive modeling components, and offers decision-making in seconds or milliseconds as needed for business execution.
4. It uses a series of interfaces to open up the data and analytics to external parties: business partners, customers, and suppliers.
I will use an analogy from sports television coverage to demonstrate how this architecture closely follows the working behavior of highly productive teams. I have always been fascinated by how a sports television production is able to cover a live event and keep us engaged as an audience using a combination of real-time and batch processing. The entire session proceeds like clockwork. It is almost like watching a movie, except that the movie is playing live with just a small time buffer to deal with catastrophic events (like wardrobe malfunctions!).
As the game progresses, the commentators use their subject knowledge to observe the game, prioritize areas of focus, and make judgments about good or bad plays. The role of the director is to align a large volume of data, synthesize the events into meaningful insight, and direct the commentators to specific focus areas. This includes replays of moves to focus on something we may have missed, statistics about the pace of the game, or details about the players. At the same time, statisticians and editors are working to discover and organize past information, some of which is structured (e.g., the number of double faults in tennis, or how much time the ball was controlled by one side in American football). However, other information that is being organized is unstructured, such as an instant replay, where the person editing the information has to make decisions about when to start, how much to replay, and where to make annotations on the screen to provide focus for the audience. The commentators have the experience and expertise to observe the replays and statistics, analyze them in real time, and explain them as they do for the game itself.
The commentators process and react to information in real time. There cannot be any major gaps in their performance. Most of the data arrives in real time, and is processed and responded to in real time as well. The director has access to all the data that the commentators are processing, as well as the commentators’ responses. The director then has to script the next couple of minutes, weighing whether to replay the last great tennis shot or football catch, focus on the cheering audience, or display some statistics. In the course of these decisions, the director scans through many camera views, statistics, and replay collections, and synthesizes the next scenes of this live “movie.” Behind the scenes, the statisticians and editors are working in a batch mode. They have all the history, including decades’ worth of statistics and stock footage of past game coverage. They must discover and prioritize what information to bring to the director’s attention.
Let me now apply this analogy to the big data analytics architecture, which consists of three analytics layers. The first is a real-time architecture for conversations; this layer closely follows the working environment of the commentators. The second is the orchestration layer that synthesizes and directs the analysis process. Last, the discovery layer uses a series of structured and unstructured tools to discover patterns and then passes them along to the orchestration layer for synthesis and modeling.
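The division of labor among the three layers can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the AAP implementation: the class names, the `(customer_id, event)` history format, and the "most frequent event" pattern are all hypothetical. The point it shows is that the conversation layer answers from a small precomputed model, while the discovery layer is the only one that scans history.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class DiscoveryLayer:
    """Batch layer: scans history for patterns (the statisticians and editors)."""
    history: list  # hypothetical list of (customer_id, event) tuples

    def discover(self):
        # Toy "pattern": the most frequent event seen for each customer.
        per_customer = {}
        for customer_id, event in self.history:
            per_customer.setdefault(customer_id, Counter())[event] += 1
        return {cid: counts.most_common(1)[0][0]
                for cid, counts in per_customer.items()}


@dataclass
class OrchestrationLayer:
    """Synthesizes discovered patterns into a model (the director)."""
    model: dict = field(default_factory=dict)

    def publish(self, patterns):
        # Replace the lookup model that the real-time layer reads.
        self.model = dict(patterns)


@dataclass
class ConversationLayer:
    """Real-time layer: answers from the current model (the commentators)."""
    orchestrator: OrchestrationLayer

    def respond(self, customer_id):
        # Low-latency path: a dictionary lookup, never a scan of history.
        return self.orchestrator.model.get(customer_id, "default_offer")


history = [("c1", "sports"), ("c1", "sports"), ("c1", "news"), ("c2", "movies")]
discovery = DiscoveryLayer(history)
orchestrator = OrchestrationLayer()
conversation = ConversationLayer(orchestrator)

before = conversation.respond("c1")          # no model published yet
orchestrator.publish(discovery.discover())   # batch results reach the director
after = conversation.respond("c1")           # real-time answer now uses them
```

In this sketch, the high-latency discovery work and the low-latency response path never block each other; they meet only at the model that the orchestration layer publishes.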
There have been four significant developments in recent years to make such platforms real-time and actionable.
Reporting versus insight: Many people believe that reports are the key mechanism for gaining insight into data. Reporting is typically the first task for an analytics system, but it is definitely not the last. You often build on reporting with various forms of visualization, including geospatial overlays, and with new semantic models. Doing so helps you gain insights that lead to new abstracted data. These insights can be broad, ranging from mobility patterns to micro-segments. As you gain insight, you uncover previously unseen patterns through discovery. This pattern discovery, which leads to deep insights, is core to effectively using big data to transform the enterprise.
Sources of data and data integration: Merely having data does not mean you can start applying analytics tools to it. You often must extract, transform, and load (ETL) the data before you can apply these tools effectively. Beyond ETL, it is important to integrate multiple data sources so that the analytics tools can identify key patterns. This integration is especially important given the wide variety of data sources available today. Departments create new intradepartment data every day, including sensor, networking, and transaction data that affect the department. The enterprise creates data such as billing, customer, and marketing data, which are essential for it to operate effectively. Third-party data, often sourced from social media or purchased from data providers, also becomes critical. These various sources of data, which are often difficult to correlate, must be integrated to gain insights that are not currently possible.
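As a minimal sketch of why integration matters, the following Python example merges three small record sets on a shared key. Everything in it is an assumption made for illustration: the `customer_id` key, the field names, the sample values, and the "at risk" rule. Real sources rarely share a clean key, which is exactly the correlation difficulty described above.

```python
def integrate(*sources):
    """Merge records from several named sources on a shared customer_id."""
    merged = {}
    for source_name, records in sources:
        for record in records:
            cid = record["customer_id"]
            row = merged.setdefault(cid, {"customer_id": cid})
            for key, value in record.items():
                if key != "customer_id":
                    # Prefix each field with its source so its origin stays traceable.
                    row[f"{source_name}.{key}"] = value
    return merged


# Hypothetical departmental, enterprise, and third-party data.
network = [{"customer_id": "c1", "dropped_calls": 7},
           {"customer_id": "c2", "dropped_calls": 0}]
billing = [{"customer_id": "c1", "monthly_bill": 80.0},
           {"customer_id": "c2", "monthly_bill": 45.0}]
social = [{"customer_id": "c1", "sentiment": "negative"}]

customers = integrate(("network", network), ("billing", billing), ("social", social))

# A pattern that no single source can reveal on its own.
at_risk = [cid for cid, row in customers.items()
           if row.get("network.dropped_calls", 0) > 5
           and row.get("social.sentiment") == "negative"]
```

Neither the network data nor the social data alone identifies the customer flagged in `at_risk`; the pattern appears only after the sources are joined.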
Latency and historical analytics tradeoffs: The latency associated with the data can have a huge impact on how one analyzes the data and responds to the insights gained from the analytics. The perception often is that by increasing the data-gathering speed or fine-tuning the hardware and software, you can move from historical analytics to real time. However, historical analytics often cannot be performed in real time, for a variety of reasons: critical data may not be accessible in a synchronized manner at the right time, tools may not be able to perform analytics in real time, and tools built for historical data may not support the dynamic model changes that are required. Real-time analytics introduces extra complications, such as the need for logic and models to change dynamically as new insights are discovered. In addition, real-time analytics can be more expensive than historical analytics, so you must consider return on investment to justify the additional expense.
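One way to see the difference is to compare a statistic computed over the full history with one that updates as each event arrives. The Python sketch below is a deliberately simple stand-in, assuming a running average of hypothetical call durations as the "model"; real adaptive models are far richer, but the structural contrast is the same.

```python
class RunningMean:
    """Real-time style: the model changes with every event; no history is kept."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, value):
        # Incremental update: adjust the mean by the new value's share.
        self.count += 1
        self.mean += (value - self.mean) / self.count
        return self.mean


def batch_mean(values):
    """Historical style: requires the complete data set before answering."""
    return sum(values) / len(values)


call_durations = [10.0, 20.0, 30.0]  # hypothetical event stream

online = RunningMean()
snapshots = [online.update(v) for v in call_durations]  # an answer after each event
final_batch = batch_mean(call_durations)                # one answer at the end
```

The incremental version has an answer available after every event, whereas the batch version must wait for all the data. Making a tool designed for the second style behave like the first takes more than faster hardware.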
Veracity and data governance: As mentioned earlier, veracity represents both the credibility of the data source and the suitability of the data for the target audience. Governance deals with issues such as how to cleanse and use the data; how to ensure data security while still enabling users to gain valuable insight from the data; and how to determine the source of truth when multiple sources supply the same data. In most environments, data is a mixture of clean, trusted data and dirty, untrustworthy data. One key challenge is how to apply governance in such an environment.
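A simple governance rule for picking a source of truth can be sketched as follows. The trust scores, source names, and threshold are hypothetical values chosen for illustration; in practice they would come from a data governance program rather than being hard-coded.

```python
# Hypothetical credibility scores assigned to each source.
TRUST = {"billing_system": 0.9, "crm": 0.7, "social_media": 0.3}


def resolve(candidates, trust=TRUST, min_trust=0.5):
    """Pick the value from the most trusted source.

    candidates: list of (source, value) pairs for the same attribute.
    Returns (value, source), or None if no source meets min_trust.
    """
    trusted = [(trust.get(source, 0.0), source, value)
               for source, value in candidates
               if trust.get(source, 0.0) >= min_trust]
    if not trusted:
        return None  # only untrusted data is available; do not guess
    _, source, value = max(trusted, key=lambda item: item[0])
    return value, source  # keep the source alongside the value for lineage


address = resolve([("crm", "12 Oak St"),
                   ("billing_system", "14 Oak St"),
                   ("social_media", "Oakland")])
unverified = resolve([("social_media", "Oakland")])
```

Returning the winning source with the value keeps lineage visible, and returning nothing when only low-trust data exists keeps dirty data from silently becoming the source of truth.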
This chapter described prominent technological enablers that support marketing analytics. These enablers are integrated in an advanced analytics platform to support marketing use cases such as intelligent campaigns or real-time targeted advertising. They scale to the challenges posed by the large volume, velocity, variety, or veracity of data.
The enablers use a number of newly developed big data analytics techniques. To develop these solutions, marketers may need to upgrade the analytics skills in their organization. In the next chapter, I will address the skills gap and how these skills can be cultivated.