Data Mining and Analyzing Intelligence
- 1 What is data mining?
- 2 Understanding data mining
- 3 Computer-based transaction processing
- 4 Analytical systems
- 5 Software available for criminal justice data mining
- 6 What kinds of relationships can data mining software find?
Learning Objectives for Chapter 9
- 1 Understand data mining
- 2 Learn how crime data are analyzed
- 3 Begin to understand how to explore large databases
- 4 Explore examples of data mining
A major challenge facing all law enforcement and intelligence-gathering organizations is accurately and efficiently analyzing the growing volumes of crime data. For example, complex conspiracies are often difficult to unravel because information on suspects can be geographically diffuse and span long periods of time. Detecting cybercrime can likewise be difficult because busy network traffic and frequent online transactions generate large amounts of data, only a small portion of which relates to illegal activities. Data mining is a powerful tool that enables criminal investigators who may lack extensive training as data analysts to explore large databases quickly and efficiently.
(Chen et al., 2003, p. 50)
What Is Data Mining?
Data mining, sometimes called data discovery or knowledge discovery, is the process of analyzing data from different perspectives and summarizing them into useful information. On a more technical level, data mining is the process of finding correlations or patterns among many fields in large relational databases (McCue and Parker, 2003).
For the most part, data mining tells us about very large and complex data sets, and these days many such data sets are available. In Chapter 8, for instance, we listed some data sets, or databases, that are often used by law enforcement. One large database is maintained by the Motor Vehicles Department (or secretary of state) in your state. If, for example, you live in Illinois, the secretary of state manages one of the largest computer databases in the state, keeping track of approximately 8.7 million drivers, 11 million registered vehicles, 466,000 corporations, 230,000 limited liability entities, 159,000 registered securities salespersons, and 16,000 investment advisor representatives. This one database illustrates something that most of us know or, at least, realize on some level: There is far more information available than anyone can digest, let alone analyze, without a computer. And that amount of information is growing every day. The reason is simple: nearly every one of our transactions leaves a data signature that someone (or, more likely, some computer) is capturing and storing.
The sheer scale and volume of the data collected by businesses and the government defy our imagination; they are beyond our sense-making capabilities (Furnas, 2012). Relationships and patterns in such data are therefore often too complex to find simply by looking at the records. For instance, returning to our example of the Illinois secretary of state, if you wanted to find the licensed drivers who had their license suspended during the years 2008 and 2012 and who drove a Chevy Impala, you could not possibly determine this by hand; you could find such relationships, though, with data mining.
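To make the Illinois example concrete, the following is a minimal sketch of how such a question could be put to a relational database. It uses Python's built-in sqlite3 module and a tiny in-memory database. The table names, column names, and sample records are hypothetical and invented purely for illustration; they do not describe the actual secretary of state system.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE drivers (driver_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE suspensions (driver_id INTEGER, year INTEGER);
        CREATE TABLE vehicles (driver_id INTEGER, make TEXT, model TEXT);
    """)
    # Invented sample records, for illustration only.
    conn.executemany("INSERT INTO drivers VALUES (?, ?)",
                     [(1, "Driver A"), (2, "Driver B"), (3, "Driver C")])
    conn.executemany("INSERT INTO suspensions VALUES (?, ?)",
                     [(1, 2008), (2, 2012), (3, 2015)])
    conn.executemany("INSERT INTO vehicles VALUES (?, ?, ?)",
                     [(1, "Chevrolet", "Impala"),
                      (2, "Ford", "Focus"),
                      (3, "Chevrolet", "Impala")])
    return conn


def suspended_impala_drivers(conn, years):
    """Return names of drivers suspended in any of `years` who have an Impala."""
    placeholders = ",".join("?" for _ in years)
    query = f"""
        SELECT DISTINCT d.name
        FROM drivers d
        JOIN suspensions s ON s.driver_id = d.driver_id
        JOIN vehicles v ON v.driver_id = d.driver_id
        WHERE s.year IN ({placeholders})
          AND v.make = 'Chevrolet' AND v.model = 'Impala'
        ORDER BY d.name
    """
    return [row[0] for row in conn.execute(query, tuple(years))]


if __name__ == "__main__":
    conn = build_demo_db()
    print(suspended_impala_drivers(conn, [2008, 2012]))
```

With three records the answer is obvious at a glance. The point of the sketch is that the same query runs unchanged against millions of rows, which is where searching by hand stops being possible.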
In other words, data mining is used to simplify and summarize the data in a manner that we can understand and use. For example, we are all familiar with the way Amazon.com and Netflix use data mining, although you may not have known that these companies were using data mining techniques. Every time you log in to Amazon to look for a book or a DVD, you will see that Amazon knows what you previously looked at, what you previously bought, and what you might like to buy this time. Netflix does the same thing, recommending the movie you would probably enjoy watching next. Mastercard and Visa use data mining to target you for deals or advertising. Most such major companies use sophisticated data mining software to track your data.
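The idea behind a recommendation of this kind can be sketched in a few lines. The example below is a deliberately simple illustration and not the method Amazon or Netflix actually uses: it suggests items bought by other customers whose purchase histories overlap with yours. The function name, customer names, and purchase data are all hypothetical.

```python
from collections import Counter


def recommend(purchases, customer, top_n=2):
    """Suggest items bought by customers whose purchases overlap with `customer`'s.

    `purchases` maps each customer name to the set of items that customer bought.
    """
    owned = purchases[customer]
    scores = Counter()
    for other, items in purchases.items():
        if other == customer:
            continue
        overlap = len(owned & items)  # number of items both customers bought
        if overlap == 0:
            continue  # no shared purchases, so this customer tells us nothing
        for item in items - owned:
            scores[item] += overlap  # weight each suggestion by the overlap
    # Highest score first; ties are broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:top_n]]


# Invented purchase histories, for illustration only.
PURCHASES = {
    "ann": {"crime mapping", "forensic science"},
    "bob": {"crime mapping", "forensic science", "criminal profiling"},
    "cat": {"crime mapping", "intelligence analysis"},
    "dan": {"cookbook"},
}

if __name__ == "__main__":
    print(recommend(PURCHASES, "ann"))
```

Here "ann" shares two purchases with "bob" and one with "cat", so the item only "bob" bought is ranked ahead of the item only "cat" bought. Commercial systems work from far more data and far more elaborate models, but the underlying step of finding patterns across many customers' records is the same.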
How Does Data Mining Work?
Large-scale information technology (IT) has been evolving for years. In fact, the term information technology is so common that it is used in every large organization. Yet when we use the term, we are usually referring to the IT department of a business or university. We say, “I should call IT to come and fix my computer” or “See if IT can install the new software we ordered.” We rarely stop to ask exactly what the term information technology means.
For our purposes, IT is the use of any computers, storage, networking, and other physical devices, infrastructure, and processes to create, process, store, secure, and exchange all forms of electronic data. The term information technology was coined in the Harvard Business Review in order to make a distinction between purpose-built machines designed to perform a limited scope of functions and general-purpose computing machines that could be programmed for various tasks (Applegate, Cash, and Mills, 1988). As the IT industry evolved from the mid-twentieth century, it came to encompass transistors and integrated circuits, and our computing capabilities made giant leaps forward.
IT usually includes several layers of physical equipment (hardware), virtualization and management or automation tools, operating systems, and applications (software) used to perform essential functions. User devices, such as laptops, smartphones, or even recording equipment, along with peripherals and software, can be included in the IT domain. IT can also refer to the architectures, methodologies, and regulations governing the use and storage of data. It is important to be aware, however, that over the past two decades IT has been evolving into separate transaction and analytical systems.
A transaction processing system (TPS) supports the processing of a company’s or organization’s business transactions. For instance, the TPS of a university helps perform such tasks as enrolling students in courses, billing students for tuition, and issuing paychecks to faculty. In addition, the TPS associated with a university’s large employee and faculty pension fund may assist stockbrokers in executing buy and sell orders, while also helping with accounting for the transaction (Mahar, 2003).
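A defining feature of transaction processing is that each business transaction is recorded completely or not at all. The sketch below illustrates this with the university example, again using Python's built-in sqlite3 module: enrolling a student takes a seat in the course and creates a tuition bill as a single unit of work. The schema, course code, student identifiers, and tuition amount are hypothetical and do not describe any real university system.

```python
import sqlite3


def open_demo_db():
    """Create a hypothetical enrollment database with one course and one seat."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE courses (
            course_id  TEXT PRIMARY KEY,
            seats_left INTEGER CHECK (seats_left >= 0),
            tuition    REAL
        );
        CREATE TABLE enrollments (
            student_id TEXT,
            course_id  TEXT,
            PRIMARY KEY (student_id, course_id)
        );
        CREATE TABLE bills (student_id TEXT, course_id TEXT, amount REAL);
        INSERT INTO courses VALUES ('CRJ-101', 1, 1200.0);
    """)
    return conn


def enroll(conn, student_id, course_id):
    """Enroll and bill a student as one transaction; return True on success."""
    try:
        with conn:  # commits if the block succeeds, rolls back on an exception
            conn.execute("INSERT INTO enrollments VALUES (?, ?)",
                         (student_id, course_id))
            cur = conn.execute(
                "UPDATE courses SET seats_left = seats_left - 1 "
                "WHERE course_id = ?", (course_id,))
            if cur.rowcount != 1:
                raise sqlite3.IntegrityError("unknown course")
            conn.execute(
                "INSERT INTO bills "
                "SELECT ?, course_id, tuition FROM courses WHERE course_id = ?",
                (student_id, course_id))
        return True
    except sqlite3.IntegrityError:
        # Course full, course unknown, or student already enrolled:
        # every step of this attempt has been undone.
        return False


if __name__ == "__main__":
    conn = open_demo_db()
    print(enroll(conn, "S1", "CRJ-101"))  # the one available seat is taken
    print(enroll(conn, "S2", "CRJ-101"))  # the course is now full
```

When the second student tries to enroll in the full course, the seat count would drop below zero, the database rejects the update, and the enrollment record inserted a moment earlier is rolled back. No student ends up enrolled without a seat or billed without an enrollment.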
Transaction processing systems keep an organization running smoothly by automating the processing of the voluminous amounts of paperwork that must be handled daily. If we again use the example of a large university, these systems provide accurate recording of transactions, as well as the control procedures used in issuing paychecks, invoices, customer statements, payment reminders, tuition bills, and student schedules (Mahar, 2003).
The TPS of an organization may be far-reaching, extending throughout the organization and linking together its entire financial system.