Software Available for Data Mining in Criminal Justice
Several types of analytical software systems are currently available for use in the criminal justice system. The following sections cover some categories of software that are suitable for criminal justice and law enforcement applications.
Statistical Analysis System (SAS) is a software suite that can mine, alter, manage, and retrieve data from a variety of sources and perform statistical analysis on those data. SAS provides a graphical point-and-click user interface for nontechnical users and more advanced options through the SAS programming language.
Machine learning and data mining often employ the same methods and overlap significantly. Machine learning, by definition, is a type of artificial intelligence (AI) that provides computers with the ability to learn without being explicitly programmed. Machine learning focuses on the development of computer programs that can teach themselves to grow and change when exposed to new data.
The process of machine learning is similar to that of data mining. Both search through data to look for patterns. However, instead of extracting data for human comprehension, as is the case in data mining applications, machine learning uses those data to improve the program’s own understanding. Machine learning programs detect patterns in data and adjust program actions accordingly. For example, Facebook’s News Feed changes according to the user’s personal interactions with other users. If a user frequently tags a friend in photos, writes on that friend’s wall, or “likes” that friend’s links, the News Feed will show more of that friend’s activity because the two users are presumed to be close.
Machine learning and data mining can be generally distinguished as follows:
- Machine learning focuses on prediction, based on known properties learned from the training data. (The data used to construct or discover a predictive relationship are called the training data set. Most approaches that search through training data for empirical relationships tend to overfit the data, meaning that they can identify apparent relationships in the training data that do not hold in general. A test set is a set of data that is independent of the training data but follows the same probability distribution, so it can be used to check whether a relationship found in the training data holds in general.)
- Data mining focuses on the discovery of previously unknown properties in the data. This is the analysis step of knowledge discovery in databases.
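The difference between training and test performance can be seen in a small experiment. The sketch below is illustrative only: the synthetic data, the 20 percent label noise, and the function names are assumptions made for this example and do not come from any real crime data set. It builds a model that simply memorizes its training data and then compares its accuracy on the training set and on an independent test set.

```python
import random

def make_data(n, rng, noise=0.2):
    """Synthetic records: one feature x in [0, 1); the true label is 1 when
    x > 0.5, and each label is flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.random()
        label = int(x > 0.5)
        if rng.random() < noise:
            label = 1 - label  # simulate a mislabeled record
        data.append((x, label))
    return data

def nearest_neighbor_predict(train, x):
    """A memorizing model: copy the label of the closest training point."""
    return min(train, key=lambda rec: abs(rec[0] - x))[1]

def accuracy(predict, data):
    """Fraction of records whose predicted label matches the stored label."""
    return sum(predict(x) == y for x, y in data) / len(data)

rng = random.Random(0)
train = make_data(200, rng)   # used to build the model
test = make_data(200, rng)    # independent, same distribution

memorizer = lambda x: nearest_neighbor_predict(train, x)
train_acc = accuracy(memorizer, train)
test_acc = accuracy(memorizer, test)

print(f"training accuracy: {train_acc:.2f}")
print(f"test accuracy:     {test_acc:.2f}")
```

Because the model reproduces every training label, including the noisy ones, its training accuracy is perfect, while its accuracy on the test set is lower. That drop is the overfitting described above.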
The two areas overlap in many ways: data mining uses many machine learning methods, but often with a slightly different goal in mind. On the other hand, machine learning also employs data mining methods as “unsupervised learning” or as a preprocessing step to improve learner accuracy. Much of the confusion between these two research communities comes from the basic assumptions they work with: in machine learning, performance is usually evaluated with respect to the ability to reproduce known knowledge, while in knowledge discovery and data mining (KDD), the key task is the discovery of previously unknown knowledge. Evaluated with respect to known knowledge, an uninformed (unsupervised) method will easily be outperformed by supervised methods, while in a typical KDD task, supervised methods cannot be used due to the unavailability of training data.
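To make the unsupervised case concrete, the following minimal sketch groups unlabeled values with a one-dimensional k-means procedure. The values are invented for illustration, and `kmeans_1d` is a hypothetical helper written for this example rather than part of any package discussed here. No labels are supplied; the grouping is discovered from the data alone.

```python
def kmeans_1d(values, k=2, iterations=20):
    """Group numbers into k clusters by repeatedly assigning each value to
    its nearest center and moving each center to the mean of its cluster."""
    ordered = sorted(values)
    # Start the centers at evenly spaced positions in the sorted data.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        new_centers = [
            sum(c) / len(c) if c else centers[i]  # keep an empty cluster's center
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:  # assignments have stopped changing
            break
        centers = new_centers
    return centers, clusters

# Made-up, unlabeled measurements (for example, some numeric incident feature).
values = [1, 2, 2, 3, 1, 14, 15, 13, 16, 15]
centers, clusters = kmeans_1d(values, k=2)

print("centers:", centers)
print("clusters:", [sorted(c) for c in clusters])
```

The procedure separates the low values from the high values without being told that two groups exist in advance, other than the choice of k. A supervised method given correct labels for the same data would be evaluated against those labels instead.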
Trying to detect specific patterns of crime and criminal behavior is extremely challenging. Crime analysts can spend countless hours sifting through data to determine whether a crime fits into a known pattern and to discover new patterns. Once a pattern is detected, the information can be used to predict, anticipate, and prevent crime.
A machine learning method called “Series Finder” was developed by Wang et al. (2013) to assist the police in discovering crime series. Initially, Series Finder was trained to detect housebreak patterns, and it “learned” how to do this using historical data from one police department’s crime analysis unit. (An algorithm is a predetermined set of rules, consisting of a finite number of steps, for solving a given problem, whether that problem is simple multiplication or a complicated calculus problem.) The algorithm used in Series Finder tries to construct a modus operandi (MO) of the offender. As Series Finder grows the pattern from the database, the MO for the pattern becomes better defined.
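The idea of growing a pattern from a few known crimes can be sketched in code. The example below is a loose illustration and not the published Series Finder algorithm: the attribute names, the similarity measure (the share of pattern members that match on each attribute), the threshold of 0.6, and all of the records are assumptions invented for this sketch.

```python
from collections import Counter

# Hypothetical categorical attributes describing each burglary.
ATTRIBUTES = ("entry", "time_of_day", "premise")

def similarity_to_pattern(crime, pattern):
    """Average, over attributes, of the fraction of pattern members that
    share the candidate crime's value for that attribute."""
    total = 0.0
    for attr in ATTRIBUTES:
        matches = sum(1 for member in pattern if member[attr] == crime[attr])
        total += matches / len(pattern)
    return total / len(ATTRIBUTES)

def grow_pattern(seed, candidates, threshold=0.6):
    """Greedily add the most similar remaining crime until none is
    similar enough to the current pattern."""
    pattern = list(seed)
    remaining = list(candidates)
    while remaining:
        best = max(remaining, key=lambda c: similarity_to_pattern(c, pattern))
        if similarity_to_pattern(best, pattern) < threshold:
            break
        pattern.append(best)
        remaining.remove(best)
    return pattern

def modus_operandi(pattern):
    """Summarize the pattern by the most common value of each attribute."""
    return {
        attr: Counter(m[attr] for m in pattern).most_common(1)[0][0]
        for attr in ATTRIBUTES
    }

# Invented records: two crimes an analyst already believes are linked ...
seed = [
    {"id": 1, "entry": "rear door", "time_of_day": "day", "premise": "apartment"},
    {"id": 2, "entry": "rear door", "time_of_day": "day", "premise": "house"},
]
# ... and other crimes in the database that might belong to the same series.
candidates = [
    {"id": 3, "entry": "rear door", "time_of_day": "day", "premise": "apartment"},
    {"id": 4, "entry": "window", "time_of_day": "night", "premise": "house"},
    {"id": 5, "entry": "rear door", "time_of_day": "night", "premise": "apartment"},
    {"id": 6, "entry": "front door", "time_of_day": "night", "premise": "store"},
]

pattern = grow_pattern(seed, candidates)
mo = modus_operandi(pattern)

print([c["id"] for c in pattern])  # [1, 2, 3]
print(mo)
```

In this toy run, crime 3 joins the pattern and the remaining crimes fall below the threshold. Once it is added, the summary settles on rear-door, daytime apartment burglaries, which mirrors in a simplified way how the MO becomes better defined as the pattern grows.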