What Is Data Mining?
Data mining, sometimes called data discovery or knowledge discovery, is the process of analyzing data from different perspectives and summarizing it into useful information. On a more technical level, data mining is the process of finding correlations or patterns among many fields in large relational databases (McCue and Parker, 2003).
For the most part, data mining is applied to very large and complex data sets, and these days many such data sets are available. In Chapter 8, for instance, we listed some data sets (or databases) that are often used by law enforcement. One large database is maintained by the motor vehicles department (or secretary of state) in your state. If, for example, you live in Illinois, the secretary of state manages one of the largest computer databases in the state, keeping track of approximately 8.7 million drivers, 11 million registered vehicles, 466,000 corporations, 230,000 limited liability entities, 159,000 registered securities salespersons, and 16,000 investment advisor representatives. This one database illustrates something that most of us know or, at least, realize on some level: there is far more information available than anyone can digest, let alone analyze, without a computer. And that amount of information is growing every day. The reason is simple: nearly every one of our transactions leaves a data signature that someone (or, more likely, some computer) is capturing and storing.
The sheer scale and volume of the data collected by businesses and the government defy our imagination; it is beyond our sense-making capabilities (Furnas, 2012). Relationships and patterns in such data are therefore often too complex to find simply by looking at the data. For instance, using our example of the Illinois secretary of state, if you wanted to find out whether there is a relationship between having a license suspended during the years 2008 and 2012 and driving a Chevy Impala, you could not possibly determine this by hand; you could find such relationships, though, with data mining.
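To make the Impala example concrete, here is a minimal sketch in Python of how such a question might be put to a relational database. Everything in it is an assumption made for illustration: the table names (drivers, vehicles, suspensions), the column names, and the handful of invented rows do not describe the actual Illinois secretary of state database, and the years are read as the inclusive range 2008 through 2012. The sketch simply compares the share of drivers with a suspension among those who drive the given model and among those who do not.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a few invented rows (not real records)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE drivers (driver_id INTEGER PRIMARY KEY);
        CREATE TABLE vehicles (driver_id INTEGER, model TEXT);
        CREATE TABLE suspensions (driver_id INTEGER, year INTEGER);
        """
    )
    conn.executemany("INSERT INTO drivers VALUES (?)", [(i,) for i in range(1, 7)])
    conn.executemany(
        "INSERT INTO vehicles VALUES (?, ?)",
        [(1, "Impala"), (2, "Impala"), (3, "Impala"),
         (4, "Civic"), (5, "Civic"), (6, "F-150")],
    )
    conn.executemany(
        "INSERT INTO suspensions VALUES (?, ?)",
        [(1, 2009), (2, 2011), (4, 2010), (5, 2015)],
    )
    return conn


def suspension_rates(conn, model, start_year, end_year):
    """Return {True: rate for drivers of `model`, False: rate for everyone else}.

    The rate is the share of drivers in each group with at least one
    suspension between start_year and end_year, inclusive.
    """
    sql = """
        SELECT (v.model = :model) AS drives_model,
               COUNT(DISTINCT d.driver_id) AS drivers,
               COUNT(DISTINCT s.driver_id) AS suspended
        FROM drivers AS d
        JOIN vehicles AS v ON v.driver_id = d.driver_id
        LEFT JOIN suspensions AS s
               ON s.driver_id = d.driver_id
              AND s.year BETWEEN :start_year AND :end_year
        GROUP BY drives_model
    """
    params = {"model": model, "start_year": start_year, "end_year": end_year}
    rows = conn.execute(sql, params).fetchall()
    return {bool(flag): suspended / drivers for flag, drivers, suspended in rows}


conn = build_demo_db()
rates = suspension_rates(conn, "Impala", 2008, 2012)
print(f"Impala drivers: {rates[True]:.2f}, other drivers: {rates[False]:.2f}")
```

A difference between the two rates would only suggest a relationship worth examining further; it would not by itself show that driving a particular model has anything to do with suspensions. On millions of real records, the same kind of query is what lets a computer answer in seconds a question that could not be answered by hand.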
In other words, data mining is used to simplify and summarize the data in a manner that we can understand and use. For example, we are all familiar with the way Amazon.com and Netflix use data mining, although you may not have known that these companies were using data mining techniques. Every time you log in to Amazon to look for a book or a DVD, you will see on the website that Amazon knows what you previously looked at, what you previously bought, and what you might like to buy this time. Netflix does much the same thing, recommending what movie you would probably enjoy watching next. MasterCard and Visa use data mining to target you for deals or advertising. Most such major companies use sophisticated data mining software to track your data.
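The idea behind a "what you might like to buy" suggestion can be illustrated with a very simple technique: counting which items tend to be purchased together. The Python sketch below is an assumption-laden toy, not the method Amazon, Netflix, or any other company actually uses; the function names and the purchase histories are invented for illustration.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_co_purchase_counts(baskets):
    """Count, for every item, how often each other item appears in the same basket."""
    counts = defaultdict(Counter)
    for basket in baskets:
        # set() ensures an item listed twice in one basket is counted once
        for item, other in permutations(set(basket), 2):
            counts[item][other] += 1
    return counts


def recommend(counts, item, k=2):
    """Return up to k items most often bought together with `item`."""
    together = counts.get(item, Counter())
    # Sort by descending count, then by name so ties are resolved predictably
    ranked = sorted(together.items(), key=lambda pair: (-pair[1], pair[0]))
    return [other for other, _ in ranked[:k]]


# Invented purchase histories, one list per customer
baskets = [
    ["book_a", "book_b", "dvd_x"],
    ["book_a", "book_b"],
    ["book_a", "dvd_x"],
    ["book_b", "dvd_y"],
    ["book_a", "book_b", "dvd_y"],
]

counts = build_co_purchase_counts(baskets)
print(recommend(counts, "book_a"))
```

Here a customer looking at book_a would be shown the items that other customers most often bought along with it. Real recommendation systems draw on far more data and more sophisticated models, but the underlying principle of finding patterns in past transactions is the same.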