Business analytics

Big Data is all the rage because of the opportunities it offers in many and diverse areas. This is especially true for new product development because of the amount of data available on the VOC. Big Data, however, is fraught with problems often overlooked or ignored in the many commentaries extolling its usefulness and potential for advancing understanding in any area. There is great potential, but also great problems that have to be noted. See Paczkowski [2018] for a discussion of some problems of Big Data and how to handle them.

The analysis of Big Data falls into two domains: Business Intelligence and Business Analytics. Business Intelligence is concerned with analyzing data about what did happen and, more important, what is currently happening to the business. It is used to alert management teams to problems, real and potential, so they can take preemptive actions. Business Analytics is more forward looking, telling the same management teams what will happen under different conditions. I further discuss the distinction between the two in Chapter 7, where I emphasize Business Analytics for tracking a new product’s post-launch performance. Predictive modeling, Machine Learning, and Text Analysis are the main tool sets.

Both Business Intelligence and Business Analytics rely on Big Data. What is Big Data? The common way to define it is by listing its three main features:

  1. Volume;
  2. Variety; and
  3. Velocity.

Volume is the characteristic people immediately think about when they hear or read about Big Data. This should be expected since it is “Big” and not “Small” Data. The volume of data has definitely been exploding because of the different types of data now collected. We used to discuss data in terms of kilobytes, then megabytes, and eventually gigabytes. Now we discuss it in terms of terabytes, petabytes, exabytes, zettabytes, and yottabytes. A kilobyte is 1,000 (10³) bytes, a byte being a basic unit of measure in computer science. A megabyte is 1 million (= 1000² = 10⁶) bytes. A yottabyte is 1000⁸ (= 10²⁴) bytes, which is 1 followed by 24 zeros. Table 1.1 lists the sizes commonly discussed.

TABLE 1.1 These volumes are in decimal measures. Comparable binary measures are available.

Value    Label       Symbol
1000     Kilobyte    KB
1000²    Megabyte    MB
1000³    Gigabyte    GB
1000⁴    Terabyte    TB
1000⁵    Petabyte    PB
1000⁶    Exabyte     EB
1000⁷    Zettabyte   ZB
1000⁸    Yottabyte   YB

Source: https://en.wikipedia.org/wiki/Yottabyte. Last accessed January 26, 2018.
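The pattern in Table 1.1 is easy to compute. As a quick illustration, here is a minimal Python sketch (the function name human_readable and the example values are my own, not from the text) that expresses a raw byte count in the table’s decimal units, each step up the scale being a factor of 1,000.

    # Decimal byte units from Table 1.1, smallest to largest.
    UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

    def human_readable(n_bytes: float) -> str:
        """Express a byte count in the largest decimal unit with value >= 1."""
        value = float(n_bytes)
        for unit in UNITS:
            # Stop once the value fits below 1,000, or we run out of units.
            if value < 1000 or unit == UNITS[-1]:
                return f"{value:,.1f} {unit}"
            value /= 1000

    print(human_readable(1_500_000))  # 1.5 MB
    print(human_readable(1000**8))    # 1.0 YB, a yottabyte
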

These volumes definitely present many challenges, from storage, to access, to analysis. They also increase the pressure placed on analysts and heighten misconceptions about what is possible. Analysts are now expected to analyze all the data to find something of use for every aspect of their business, including new product ideas. Unfortunately, much of the data is useless or so complex that analyzing it with current tools is not trivial and is, in fact, daunting. There is also the misconception that statistics and sampling are no longer needed or relevant for Big Data. This view holds that they are needed only to provide measures, or estimates of measures, of key population aspects (e.g., means) when you do not have complete data on the population under study because of the cost of collecting all those data. With Big Data, the argument goes, you have the population data, so you can measure those aspects directly. Unfortunately, this is not true, since the population may not be represented at all. Only those who choose, or self-select, to be in a database are represented. For example, not all buyers of automobiles are in any one auto manufacturer’s database because people do not buy just one make of automobile. The population is all automobile buyers, not the buyers of one make.

The velocity with which data are arriving into databases is increasing all the time. This is due primarily to new technologies that capture data almost in real time. Transaction data from grocery store checkouts are an example. In addition to technologies that allow more efficient data capture, new technologies also allow people to almost constantly create data in huge volumes that other technologies capture. Social media in the form of Twitter and Facebook and others of their kind allow people to create text messages at any time and any place they choose. Consider Twitter. One estimate is that there are “350,000 tweets sent per minute” or “500 million tweets per day.”6 These two figures are consistent: 350,000 tweets per minute times the 1,440 minutes in a day is roughly 504 million tweets per day. This is high velocity. The volume is dependent on this velocity: the higher the velocity, the larger the volume, as is evident from these Twitter numbers.

Variety is the main issue for us. Since the advent of the Internet, and social media in particular, there has been an explosion in the types of data captured and stored in data warehouses: pure text, photos, video and audio, and maps. Text data in the form of tweets on Twitter and product reviews on company and review websites (e.g., Amazon) have added a new dimension to the data traditionally collected. Prior to the collection of this type of data, the majority of data were well defined and organizationally structured: amounts (dollars and units sold), dates, order numbers, and names and addresses in specific formats. They were well defined because those who collected the data knew beforehand the likely values for key concepts such as phone numbers, dollar amounts, dates, ZIP codes, and so forth, including predefined key words and phrases. The database designers and managers could plan for these. They were organizationally structured in the sense that the data were in rectangular arrays, or data tables, with predefined columns for variables and one object (i.e., observation, case, individual, etc.) per row. The type and nature of the data contained in each cell of the data table were pre-specified and filtered before they could be placed in the cell. This is a structured data paradigm.

The text data now collected are ill-defined and unstructured. They are ill-defined because the content can be anything the writer of the text chooses: differing lengths, symbols, words, and even languages. Tweets on the social media platform Twitter used to be restricted to 140 characters; now they are restricted to 280, but despite this increase this is still a tight space in which to express a complex thought. Nothing is well defined or structured the way data used to be. Text data are unstructured, so now, in addition to a structured data paradigm, there is an unstructured data paradigm. This new type of data is potentially rich in information but, like the traditional well-defined structured data, the information must still be extracted. Since this new form of data, this new variety, arrives so quickly and in such large volumes, the analytical tasks associated with it are now an order of magnitude larger. This is where Big Data Analytics becomes important. This form of analytics addresses not only the volume and velocity issues (the “Big” part) but also the variety issue (the “Data” part).
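To make the structured/unstructured contrast concrete, here is a minimal Python sketch (the tweets below are invented for illustration) of a typical first step with unstructured text: because each record is free-form, the information must be extracted, here by tokenizing the messages and counting words, before any analysis can begin.

    import re
    from collections import Counter

    # Invented free-form messages: no predefined columns, lengths, or values.
    tweets = [
        "Love the new battery life, but the case scratches easily!",
        "Battery died after 3 hours?? Not impressed.",
        "Great camera. Battery could be better.",
    ]

    # Lowercase each message and split it into word tokens.
    tokens = [w for t in tweets for w in re.findall(r"[a-z']+", t.lower())]

    # A simple frequency count hints at what customers are talking about.
    print(Counter(tokens).most_common(5))
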

The sheer volume of data necessitates a new view of analytics, but without dropping or ignoring the “old” tools of statistics and econometrics. Those tools, the statistical theory and techniques, still hold and must be relied on. Statistical theory posits a population of objects (e.g., people, companies) that exists but that, for practical reasons, is too costly to measure completely, so you cannot directly compute key parameters of that population. One such parameter is, of course, the mean. Statistical theory provides tools, via sampling, that allow you to infer the population measure and to establish ranges (i.e., confidence intervals) around it. Big Data gives the false impression that you now have the population, and therefore that the statistical theory and tools, which are based on sampling principles, are no longer necessary. This is incorrect and misleading. Big Data itself, the sheer volume aspect, not to ignore the velocity and variety aspects, presents problems that statistical theory directly addresses. See Paczkowski [2018] for a discussion of these problems and how sampling can help.
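As a small illustration of this point (a sketch with simulated data, not an example from the text), the classical sampling machinery applies unchanged to a large database: draw a modest sample, estimate the mean, and attach a confidence interval.

    import random
    import statistics

    # Simulated "Big Data": one million records standing in for a population.
    random.seed(42)
    population = [random.gauss(100, 15) for _ in range(1_000_000)]

    # Classical inference: a modest random sample, its mean, and its
    # standard error, rather than processing every record.
    sample = random.sample(population, 1_000)
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / len(sample) ** 0.5

    # Approximate 95% confidence interval for the population mean.
    print(f"estimate: {mean:.2f}, "
          f"95% CI: ({mean - 1.96 * se:.2f}, {mean + 1.96 * se:.2f})")
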

Although Big Data has issues with the volume aspect that are important to recognize and deal with, it has definite advantages with regard to the variety aspect, especially the text part of variety. Modern technology has enabled everyone to voice their opinions on any topic from politics, to economics, to science. This is especially true for customer opinions about the products they buy. Customer reviews are now the norm rather than the exception on most producer and retailer websites. These review capabilities allow customers to write their opinions, both good and bad, about a product for others to read. Most sites even provide a rating system, usually a five-point “star” system with one star (or less!) indicating a terrible product and five stars a great product. So the reviews have two parts. The star portion can be viewed as a quantitative measure not unlike a five-point Likert Scale measure of customer satisfaction commonly used in many market research surveys. The text portion of the review can be viewed as a qualitative assessment of the product that amplifies or explains the quantitative star rating. It is the qualitative portion that is important for new product development since this is where customers not only voice their opinions but also make their suggestions for improvement. It is the latter that is the key to new product development ideas. These data are not only important for new product ideas but also for follow-up tracking once the product has been launched. You also need to understand how well the product is performing in the market. Sales data can certainly tell you this - this is the “how” of performance - but the customer reviews are invaluable for informing you about the “why” of performance.
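Here is a minimal sketch of how the two parts of a review can be used together (the reviews below are invented): the star ratings give the quantitative “how” of performance, while word frequencies in the low-rated reviews hint at the qualitative “why.”

    import re
    import statistics
    from collections import Counter

    # Invented reviews: (star rating, review text) pairs.
    reviews = [
        (5, "Great blender, smooth results every time."),
        (2, "Motor is far too loud and the lid leaks."),
        (1, "Lid leaks everywhere; returned it."),
        (4, "Works well, a little loud."),
    ]

    # Quantitative part: the star ratings, like a Likert-scale measure.
    stars = [s for s, _ in reviews]
    print(f"average rating: {statistics.mean(stars):.1f} stars")

    # Qualitative part: pool the low-rated (1-2 star) review text and
    # count word frequencies to surface what customers complain about.
    low_text = " ".join(text for s, text in reviews if s <= 2)
    words = re.findall(r"[a-z]+", low_text.lower())
    print(Counter(words).most_common(5))
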

The use of text mining and analytics for new product ideas and tracking will be discussed in Chapters 2 and 7. Methods for Business Analytics are discussed in Chapter 7.

 