In 1980, I was conducting day-after-recall research for advertisements aired on television. A typical day-after-recall study requires finding a statistically significant sample of television viewers who watched the programming the previous day and asking them many questions about what they remember seeing. The advertisement was considered successful if a large proportion of those who saw it remembered both the advertisement and its marketing message. As I started to collect the data, I used census information to design our survey collection, providing adequate coverage of various income groups in the city of Mumbai, and trained a number of my interviewers on the survey questionnaire. To understand the data collection accuracy, I followed a couple of interviewers throughout the day all over the city. Dogs and security guards often chased us, and many prospective respondents slammed their doors before we could ask any qualifying questions. At one house, the main decision-maker did not have the time to answer, so she directed me to her teenage daughter, who was eager to answer the questions but was not the primary decision-maker. My field interviewer and I argued at length about the validity of that observation. He told me the teenage daughter met the criteria specified in the interview, and so the interview was valid. I kept thinking about the family's next trip to the grocery store and the role the daughter was likely to play in using the advertisement to decide on the product purchase. As the day progressed, I started to get a realistic view of the “statistically-significant perfectly-random sample.” Irrespective of how hard we tried, the sample remained biased toward those who were eager to respond and were easily accessible to us.
That was 33 years ago. As I recollect the experience, it really feels like another century! Today, I have some of that data available from the set-top boxes (STBs) of all digital cable subscribers in the city, so instead of chasing 100 subscribers, I can look at data from 5 million subscribers. I can use the data to identify subscribers who saw an advertisement and determine whether they reached for the remote halfway through the advertisement to switch the channel or reduce the volume, and by analyzing social media messages, I can gauge their sentiments about the commercial. Yes, that is still a biased sample, but it is a much bigger one. Depending on the geography, the STB data may represent a biased majority, and the social media messages belong only to those who are eager to respond, like the teenage daughter of the busy housewife I interviewed 33 years ago. The difference is we now have a lot more observations, not just reported samples of data. Also, we may find an overlapping set of data. Each big data source brings its own biases, but each truly represents an individual. Despite these biases, we may be able to map a set of customers almost perfectly: where they dine, what television programs they watch, how far they commute to work, when they take coffee breaks, which brands they prefer, and how they respond to different campaigns.
So, how does this change marketing? For decades, statisticians built processes that worked well on random samples of real data. We now have real data. There are no more samples. Also, if we are able to build a relationship with a customer, we can track that customer through different stages of purchases.
Marketing is about making customers aware of the offerings, supporting the buying process via a variety of persuasions culminating in a purchase, and then using this affinity to sell the next product, expand to the customer's circle of friends, or design a new product based on the customer's ideas. This chapter established the first proposition: that marketers have a lot of observations they can use for anything they would like to do. It also provided a new task for statisticians: to work on systematic biases and remove their bad effects. Now that we have found a new frontier where there are no more small samples and where marketers have access to an enormous number of observations about each customer, do we continue to broadcast messages to our customers? In the next chapter, I will explore the actions a marketer can take and how big data radically changes how marketers interact with their customers.