What Kinds of Relationships Can Data Mining Software Reveal?
Generally, data mining software looks for four types of relationships:
1. Entity extraction
2. Clusters
3. Associations
4. Sequential patterns
Entity extraction identifies particular patterns in data such as text, images, or audio materials. It has been used to identify a person's addresses, vehicles, and personal characteristics, which means that entity extraction can provide basic information for crime analysis.
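To make the idea concrete, the sketch below pulls the kinds of entities mentioned above (an address, a vehicle, a personal characteristic) out of a free-text sentence. It is a minimal, rule-based illustration built on Python's standard `re` module. The patterns, the `PATTERNS` table, and the `extract_entities` function are hypothetical and deliberately tiny. They are not how EntityCube, REX, or any other product works; real extractors rely on statistical models and far richer rules.

```python
import re

# Hypothetical, deliberately tiny patterns for illustration only; real
# entity extractors combine statistical models with much larger rule sets.
PATTERNS = {
    # A house number followed by capitalized words and a street suffix.
    "address": re.compile(r"\b\d{1,5}\s+(?:[A-Z][a-z]+\s+)+(?:Street|Avenue|Road)\b"),
    # A color followed by a vehicle make (small illustrative word lists).
    "vehicle": re.compile(r"\b(?:red|blue|black|white|silver)\s+(?:Ford|Honda|Toyota)\b",
                          re.IGNORECASE),
    # One kind of personal characteristic: a tattoo and where it is.
    "characteristic": re.compile(r"\btattoo on (?:his|her) (?:left|right) (?:arm|hand|neck)\b"),
}


def extract_entities(text: str) -> dict[str, list[str]]:
    """Return every match of each pattern, keyed by entity type."""
    return {label: pattern.findall(text) for label, pattern in PATTERNS.items()}


report = ("Witness saw a man with a tattoo on his left arm leave "
          "1420 Elm Street in a blue Ford.")
entities = extract_entities(report)
# {'address': ['1420 Elm Street'], 'vehicle': ['blue Ford'],
#  'characteristic': ['tattoo on his left arm']}
```

Hand-written patterns like these are brittle (a misspelled street suffix or an unlisted car make is silently missed), which is one reason commercial tools favor learned models.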
In criminal justice, as in other fields such as business, there is a need to collect and understand web information about a real-world entity, such as a person of interest or a suspect. Most of us, if we want to find out more about a person, use a search engine. However, if you do a Google or Yahoo search to learn more about an individual (say, Mark Robinson, to pick a name at random), you will get 174 million hits. Learning more about the particular Mark Robinson you are interested in would mean scrolling through thousands of web pages. Even if Google or any other search engine could find all the relevant web pages about Mark Robinson, how long would it take you to sift through all these pages to get a complete view of who Mark Robinson is? A few hours? A few days? Maybe. But it might take even longer.
Entity extraction addresses this problem. Microsoft developed EntityCube to help users search and browse summaries of entities, including people, organizations, and locations.
Software such as EntityCube or Rosette Entity Extractor (REX) automatically mines billions of web pages to extract entity information and detect relationships, covering a spectrum of everyday individuals and well-known people, locations, conferences, journals, and organizations.
A cluster is a subset of objects that are similar. Clustering is the process of grouping data into a set of meaningful subclasses, called clusters. For example, in the insurance industry, you may want to group together certain policyholders, such as all policyholders with high average claim requests and payouts. Once this cluster is identified, you can decide how to target that group to reduce its claims.
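The insurance example can be sketched with k-means, one common clustering algorithm (the text does not name a specific one, so k-means is a swapped-in choice). The policyholder figures below are invented for illustration, and the function name `kmeans` and its parameters are hypothetical; this is a plain, dependency-free version rather than a production implementation.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Plain k-means on a list of numeric tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # centroids stopped moving: converged
        centroids = new_centroids
    return centroids, labels


# Hypothetical policyholders: (average claim in $1,000s, claims per year).
policyholders = [(1.0, 0.5), (1.2, 0.4), (0.8, 0.6), (1.1, 0.5),
                 (9.0, 4.0), (8.5, 4.2), (9.5, 3.8)]
centroids, labels = kmeans(policyholders, k=2)
high = max(range(2), key=lambda c: centroids[c][0])  # cluster with the larger average claim
high_claimants = [p for p, lab in zip(policyholders, labels) if lab == high]
```

With data this well separated, `high_claimants` holds the three high-claim policyholders. On real data the number of clusters `k` and the choice of variables matter a great deal and usually take several trial runs.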
In criminal justice, crime analysts have started helping detectives and other law enforcement officers speed up the process of solving crimes. More specifically, a data mining approach using clustering-based models can help identify crime patterns. But providing that help has not been easy, because data related to crime and criminals are often scattered across various databases and around the Internet. Some data are kept confidential, while other data are public information. Data about county prisoners are usually found on county or sheriff's office websites. However, data about crimes related to narcotics, or about juvenile cases, are often more restricted. Similarly, information about sex offenders is made public to warn others in the area, but the identity of the victim is often not accessible. Thus, as a data miner, the analyst has to deal with various issues and databases to mine crucial data for detectives.
Furthermore, sheriffs’ offices and police departments may use a computerized reporting system, or they may still use traditional paper-based crime reports. Whether computerized or on paper, these reports almost always contain certain basic information: the type of crime, the date and time of the crime, the location of the crime, the names and addresses of the victims, the names and addresses of the witnesses, and the name and address of the suspect. Additionally, there is the narrative or description of the crime and the modus operandi (MO), both of which are usually recorded as text. That is, police officers and detectives use free text to record certain facts, observations, and conclusions, information that cannot be captured by checking boxes on a police department form. While some information can be stored in computer databases as numeric, character, or date fields of tables, the observations and conclusions are often stored as free text.
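The split between structured fields and free text can be pictured as a simple record type. The sketch below is a hypothetical layout (the class name `CrimeReport`, its field names, and the `mentions` helper are illustrative, not any agency's schema), using Python's standard `dataclasses` module. Structured fields can be compared directly, whereas the narrative and MO can only be searched as text.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class CrimeReport:
    """Hypothetical record layout; field names are illustrative, not a standard."""
    # Structured fields: fit numeric, character, or date columns.
    crime_type: str
    occurred_at: datetime
    location: str
    victims: list[tuple[str, str]] = field(default_factory=list)    # (name, address)
    witnesses: list[tuple[str, str]] = field(default_factory=list)  # (name, address)
    suspect: Optional[tuple[str, str]] = None                       # (name, address)
    # Free-text fields: the officer's narrative and the modus operandi (MO).
    narrative: str = ""
    modus_operandi: str = ""


def mentions(report: CrimeReport, term: str) -> bool:
    """Case-insensitive keyword search over the free-text fields only."""
    text = f"{report.narrative} {report.modus_operandi}".lower()
    return term.lower() in text


report = CrimeReport(
    crime_type="burglary",
    occurred_at=datetime(2006, 3, 14, 2, 30),
    location="Elm Street",
    narrative="Rear window forced open; jewelry taken from bedroom.",
    modus_operandi="Night entry through a rear window",
)
print(report.occurred_at.hour, mentions(report, "rear window"))  # 2 True
```

A simple substring search like `mentions` misses synonyms and misspellings ("back window", "rear windw"), which illustrates why free-text narratives are the hard part of mining crime reports.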
And therein lies the challenge of mining crime data. Combing through hundreds or thousands (or even more) of crime reports to locate data (such as a description of crime perpetrators or the names of suspects) and gather it into data mining categories is not always an easy job. That’s where clustering in data mining comes in. A cluster is a group of crimes, people, or other kinds of data that are similar and may represent a geographical region, a crime hot spot, or a possible crime pattern.
Clustering in data mining is the task of identifying groups of records that are similar to one another but different from the rest of the data. In some instances, clusters will be useful for identifying a crime spree committed by one person or a group of suspects. Given this information, the next challenge is to find the variables that produce the best clustering. These clusters are then presented to the detectives, who “drill down” (move to another, often lower or more basic, level of analysis) using their expertise as detectives.
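One simple way to group crime records that are "similar to one another but different from the rest" is threshold-based single-linkage clustering: treat each record as a set of features, link any two records whose Jaccard similarity (shared features divided by total distinct features) meets a threshold, and call each connected group a cluster. This technique, the feature sets, the threshold of 0.5, and the names `jaccard` and `cluster_records` are all assumptions for illustration, not the method used by any particular agency or by Nath (2006).

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Fraction of features shared by two records (0.0 = none, 1.0 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_records(records: dict[str, set[str]], threshold: float = 0.5) -> list[list[str]]:
    """Single-linkage grouping: link every pair with similarity >= threshold,
    then return each connected group of record IDs as one cluster."""
    parent = {rid: rid for rid in records}  # union-find forest

    def find(rid: str) -> str:
        # Walk up to the group's representative, halving the path as we go.
        while parent[rid] != rid:
            parent[rid] = parent[parent[rid]]
            rid = parent[rid]
        return rid

    ids = list(records)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if jaccard(records[a], records[b]) >= threshold:
                parent[find(a)] = find(b)  # merge the two groups

    groups: dict[str, list[str]] = {}
    for rid in ids:
        groups.setdefault(find(rid), []).append(rid)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical crime records described by MO, time, and place features.
crime_records = {
    "R1": {"burglary", "rear window", "night", "north district"},
    "R2": {"burglary", "rear window", "night", "north district", "jewelry"},
    "R3": {"burglary", "rear window", "night", "jewelry"},
    "R4": {"robbery", "handgun", "daytime", "south district"},
    "R5": {"robbery", "handgun", "daytime", "convenience store"},
}
clusters = cluster_records(crime_records)
print(clusters)  # [['R1', 'R2', 'R3'], ['R4', 'R5']]
```

The burglaries sharing a rear-window, night-time MO fall into one group, a candidate pattern a detective could then drill into. Which features go into each set, and where the threshold sits, is exactly the "finding the variables" problem the text describes.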
However, clustering requires a skilled crime analyst who is aware that data mining is sensitive to the quality of input data. In practice, this means that law enforcement officers’ reports may be inaccurate or incomplete, or that the data entry step was flawed, for instance because names or locations were misspelled. The skilled and experienced data miner must have a good knowledge of clustering and know what software will perform the required tasks, all the while working closely with a detective, at least in the initial phases of the investigation (Nath, 2006).
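Misspelled names are a typical data-quality problem to catch before clustering. The sketch below flags pairs of name entries that are probably spelling variants of one another, using `SequenceMatcher` from Python's standard `difflib` module. The 0.85 cutoff, the normalization rule (lowercase, collapse spaces), and the helper names are assumptions chosen for illustration; flagged pairs would still need a human check, since two different people can have similar names.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def likely_same_name(a: str, b: str, cutoff: float = 0.85) -> bool:
    """Treat two names as probable variants if their similarity ratio >= cutoff."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= cutoff


def flag_variants(names: list[str]) -> list[tuple[str, str]]:
    """Return pairs of differently written entries that look like spelling variants."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if names[i] != names[j] and likely_same_name(names[i], names[j]):
                pairs.append((names[i], names[j]))
    return pairs


# Hypothetical name entries pulled from several reports.
entries = ["Mark Robinson", "Mark Robinsen", "mark  robinson", "Maria Lopez"]
pairs = flag_variants(entries)
```

Here the three spellings of Mark Robinson are flagged as one another's variants, while Maria Lopez is left alone.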