We now have an interesting challenge. There are several sources of customer data. This data may be generated at different levels in the hierarchy. How is this data aligned across sources to create a unified customer profile? For example, the cable provider may provide channel-surfing information for my household, while the wireless provider may provide mobility patterns for my son, who does not live with me, and hence does not participate in cable viewing in my living room except when he visits my house. However, he shares a family contract with me, and shares the billing address. Each of us tweets to our respective social groups using a Twitter handle and uses Endomondo to post our bike riding and jogging records on Facebook. His tweet and jogging locations do not correlate with his billing address. At the same time, our family vacation brings us together, and a marketing campaign for travel spots can be directed to either of us. The telecom provider may be willing to sell the mobility data to the airline interested in offering me travel deals. However, the telecom provider may only provide the data at an aggregate level for 25 or more customers in a micro-segment to protect individual identities. How should a marketer organize and align this data? In order to have a meaningful dialogue, a marketer must bring this data to a unit that can be related to a marketing action. For a television advertisement on regular television, that unit is the household. To a multidevice consumer, the unit could be a specific device used by an individual. The marketer needs a common denominator and an aggregation mechanism to roll up or down the hierarchy. In the case of the Obama campaign, the individual voter data was periodically aggregated for television media-planning decisions.
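The roll-up idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record fields (`customer_id`, `segment`) and the function names are hypothetical, and the only figure taken from the text is the minimum of 25 customers per micro-segment.

```python
from collections import defaultdict

# Threshold from the text: share a micro-segment only if it has 25+ customers.
MIN_SEGMENT_SIZE = 25


def roll_up(records, key):
    """Group individual-level records under a higher-level key.

    `key` names the hierarchy level to roll up to, for example a
    household or a micro-segment (hypothetical field names).
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec)
    return dict(groups)


def segment_summary(records, key, min_size=MIN_SEGMENT_SIZE):
    """Count distinct customers per group, dropping groups that are too small."""
    summary = {}
    for group, members in roll_up(records, key).items():
        customers = {m["customer_id"] for m in members}
        if len(customers) >= min_size:  # suppress small, identifiable groups
            summary[group] = {"customers": len(customers)}
    return summary
```

Calling `segment_summary(records, "segment")` on individual-level records would return only the micro-segments large enough to share, while `roll_up(records, "household")` would bring the same records to the household level for a television campaign.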
Customer profiles have been a subject of focus for decades. Customer relationship management (CRM) tools were the first to offer an integrated customer database, one that would unite sales, revenue, and services views of a customer. Aaron Zornes has been a father figure to the customer data integration/master data management (CDI/MDM) groups and has provided a much-needed backbone to this area with his MDM Institute.4 His graceful beard is gradually turning white as he patiently tracks the progress of the MDM community. They did a fair amount of pioneering work in tearing down the organizational silos. As organizations merged and departmental information technology (IT) investments were centralized, they found that each organization had a different view of the customer. For instance, in a newly merged diversified insurance organization, the health insurance department tracked various health stages of a customer, while the life insurance department cared about only two issues: whether the customer was alive or dead. The telco provisioning department carried 96 states of customer order, while the sales team had only 6. The billing team was focused on the billing address, while the trucks were rolled out to the service address. To make the situation more confusing, all of these organizations used the same words (customer, product, address, order, and so forth) to mean different things. The first attempt through corporate initiatives, driven by regulations such as the Sarbanes-Oxley Act on corporate reporting compliance,5 was to establish a centralized data model that served everyone. While the attempts succeeded partially, they resulted in models that were hard to change and too static for most businesses. Gentler approaches using registry-style ID mapping or coexistence MDM, in which master data was consolidated as needed, found more popularity than consolidation MDMs, in which a central customer master supported diverse needs.6
The central technical capability in any MDM is its ability to match identities across diverse data sources. How do we integrate big data with the matching capabilities of the MDM solution? Most MDM solutions offer matching capabilities for structured data. MDM software matches customers and creates new IDs that combine customer data from a variety of sources. These solutions are also providing significant capabilities for using customer hierarchies to normalize data across systems. However, in most cases, the format for the data is known, and the content is primarily structured. What does source data for big data’s single view of a customer look like?
Blogs and tweets posted by consumers on social media sites provide a wealth of information for sentiment analysis; however, this data is not structured. Consumers do not always use proper company or product names. The data contains a fair number of slang words, and there is a mix of languages in a multiethnic, multilanguage environment. Consumers may use a variety of words to convey positive or negative sentiments. The link to the author is not very well articulated. Therefore, analysts start with scant information, such as Twitter handles and unstructured references, and filter and link this data to decipher demographics, location, and other important characteristics required to make this data meaningful to a marketer. At the end of the day, the social media data hangs from its own customer ID, which can be aggregated or abstracted by a marketer.
In chapter 3, I covered several other sources of observations that provide a wealth of customer data. However, customer data belongs to one industry and must be filtered or transformed before it is used by another industry. Let us consider the example of location data. For a wireless company, location data may include its source—whether cell tower, Wi-Fi, or Global Positioning System (GPS) data collected from a device. This would be very useful for network performance analysis, but has limited value outside the wireless industry. Possibly, a marketer would be interested in a polygon that defines a geographic boundary, time duration, and a list of people who were inside the polygon in the specified time period. If the polygon represented a mall, the marketer could assign a context and use the data for a variety of purposes.
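To make the polygon example concrete, here is a minimal sketch of the filtering step, assuming each observation is a dictionary with hypothetical fields `person_id`, `x`, `y`, and `time`, and that the polygon is a list of `(x, y)` vertices. It uses the standard ray-casting test for point-in-polygon and treats coordinates as flat, which is a simplification for small areas such as a mall.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: count how many edges a ray from (x, y) to the right crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def people_in_polygon(observations, polygon, start, end):
    """Return the distinct people seen inside the polygon between start and end."""
    return sorted({
        o["person_id"]
        for o in observations
        if start <= o["time"] <= end
        and point_in_polygon(o["x"], o["y"], polygon)
    })
```

The marketer never sees whether a point came from a cell tower, Wi-Fi, or GPS; only the boundary, the time window, and the resulting list cross the industry line.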
The hardest job in data sharing is to come to an agreement on the definitions and to align collaborating parties to the same definition of the data being exchanged. A marketer in a telco may find ways to mine location data and provide a large number of new attributes, which would be of tremendous interest to a consuming industry. However, now we are talking about MDM data-governance issues on a much larger scale than ever before discussed or tackled by the MDM industry. What if a telco can track web-browsing activities by location and turn that data into micro-segments, with the ability to identify hockey fans visiting a specific sports bar, and package this data for a marketer who is selling sporting goods near that location? How do we govern this data so that the two industries can relate to each other’s definitions without creating a static model that cannot be changed with the new fashion trend?
This data coming from a variety of sources could be linked together to get a holistic understanding of the customer. Privacy laws are still evolving in understanding what data can be linked, with or without customer permission. Identity resolution is the next step in the evolution of matching technologies for MDM. Jeff Jonas has been working on IBM’s entity analytics technologies. His initial work was for the casino industry, where he developed technologies to identify casino visitors who profited through fraudulent gambling. This is a powerful technology that takes into account both normal and deceptive data from customers.7 The technology is based on a set of rules that places the probability of a match on a set of seemingly unrelated facts. As hard facts match, the probabilities are altered to reflect newly discovered information. Customer-initiated actions, such as accepting a promotion, can be hard evidence added to customer handles or user IDs, connecting them to device IDs, product IDs, or customer account information. Jonas has been studying identity resolution in a number of big data entity analytics problems, including the US voter registration process.8
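The way a match probability shifts as evidence accumulates can be illustrated with a simple Bayesian odds update. This is a generic textbook sketch, not the method used in IBM's entity analytics; the function name and all the numbers in the usage note are hypothetical.

```python
def update_match_probability(prior, evidence):
    """Update the probability that two records refer to the same person.

    prior    -- starting probability of a match, strictly between 0 and 1
    evidence -- list of (p_if_same, p_if_different) pairs, one per observed
                fact: how likely the fact is if the records belong to the
                same person versus to different people
    """
    odds = prior / (1.0 - prior)
    for p_same, p_diff in evidence:
        odds *= p_same / p_diff  # each fact scales the odds by its likelihood ratio
    return odds / (1.0 + odds)
```

For example, starting from an assumed prior of 0.01, a shared billing address scored as `(0.8, 0.05)` nudges the probability upward, while hard evidence such as a promotion accepted on a known device, scored as `(0.95, 0.001)`, moves it close to certainty.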
Customer data can be organized at different levels of hierarchy. As organizations begin to share this data with outsiders, they may restrict data access to a certain level of customer hierarchy and may share selected attributes at each level. For example, a telco may share its mobility patterns for a community, specifying the percentage of a community that travels by bus to work, without specifying the members of the community and their specific mobility patterns. While a community mobility pattern is very useful information for marketers, individual data could be both intrusive and ineffective. Let me discuss the example of using location-based mobility pattern data for targeted advertising. A telco would share community patterns, which can be used by a DSP to decide which advertisement to place for a community, for example, putting greater emphasis on leisure travel promotions to a “globe trotter” community. If the consumer gives approval, it may be appropriate to target daily promotions for restaurants close to the consumer’s most frequent hangout.
Whenever I have made this idea part of a presentation, I have seen several raised eyebrows and been asked questions about customer privacy. Customer privacy is always an area of major concern. For years, corporations collected all types of private information and matched it from a variety of sources to obtain a single view of the customer. However, most of that information collection was invisible to the customer and happened without full disclosure. Now, however, big data has the potential to correlate data across industries and across sources far more extensively than in the past. As a result, privacy is a major issue that I address in the next section.