Predictive policing

Since Snowden, everyone knows about the use of technology for surveillance. Law enforcement, counter-terrorism and intelligence agencies are constantly innovating, and many innovations can be named. Among the most obvious are wiretaps (Bloss 2009), CCTV (Welsh & Farrington 2009) and surveillance cameras (Leman-Langlois 2002; Alexandrie 2017). Some of these methods were first introduced a long time ago but continue to evolve with technology and changes in society and the law. Wiretapping, for example, has been around since the 1890s and was widely used during the Prohibition era (Berger 1938). Once it became widely known as a police surveillance technique, it immediately became the subject of intense public debate (see Brownell 1954). This debate, though on again, off again, has never completely gone away.

A newer innovation is ‘predictive policing’. Predictive policing refers to the use of different analytic tools, including algorithms and computer software, to forecast the location and timing of crimes (Moses & Chan 2018, p.806). These forecasts add to the stock of ‘intelligence’ used to allocate police resources. The essence of the idea is not really new. It is just the natural development of crime analysis, which promises to be able to direct police resources more effectively. A related approach, for example, is intelligence-led policing (ILP). ILP emerged during the 1990s in the UK, though its roots might be traced as far back as the 1970s. In response to high and increasing crime rates, there were calls for a more proactive approach to policing (Maguire 2000). Rather than responding to reports of crimes, police would try to pre-empt crimes. Information, including that derived from surveillance and the use of informants, would be interpreted and analysed to provide intelligence that could be used to better guide police activity. Following 9/11, ILP received a significant boost in popularity. Predictive policing is another step towards the integration of intelligence techniques and everyday policing (Moses & Chan 2018, p.808).

Predictive policing uses data and software, and the software usually embeds an algorithm. The idea is to forecast the location and timing of crimes and then direct police to patrol those areas. Sometimes the forecasts are driven purely by the algorithm. At other times, an analyst combines the algorithm’s results with other information to form a more comprehensive intelligence assessment. An example of an algorithm is the ‘self-exciting point process’ applied to the predictive policing of residential burglaries by Mohler et al. (2011). The process was originally developed by seismologists. When there is an earthquake, seismologists use this type of mathematical model to predict the location and timing of ‘nearby’ earthquakes. Using data for past burglaries, the general idea is that the math can help identify crime hotspots on a city grid. It is essentially a contagion process. One crime in one location increases the chance of future crimes at the same location and at locations nearby. This should work reasonably well, at least in principle, for crimes such as burglaries, which do have characteristics that make them amenable to study by diffusion or contagion processes (e.g. Johnson 2008). Evidence remains mixed, however (Meijer & Wessels 2019).
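To make the contagion idea concrete, a minimal sketch of a self-exciting intensity follows. It is not the estimation procedure of Mohler et al. (2011). The exponential decay in time, the Gaussian decay in space, the function names and every parameter value are illustrative assumptions chosen only to show how one past crime raises the forecast risk nearby and how that risk fades.

```python
import math

# Illustrative parameters only (assumptions, not estimates from any study).
MU = 0.05      # background rate of burglaries per square km per day
K = 0.6        # expected number of follow-on crimes triggered by one crime
OMEGA = 0.15   # temporal decay rate per day
SIGMA = 0.2    # spatial spread of the elevated risk, in km


def intensity(t, x, y, past_crimes):
    """Conditional intensity at time t (days) and location (x, y) (km).

    past_crimes is a list of (t_i, x_i, y_i) tuples. Each earlier crime
    adds a bump of risk that fades with elapsed time and with distance.
    """
    rate = MU
    for t_i, x_i, y_i in past_crimes:
        elapsed = t - t_i
        if elapsed <= 0:
            continue  # only crimes strictly in the past contribute
        time_decay = OMEGA * math.exp(-OMEGA * elapsed)
        dist_sq = (x - x_i) ** 2 + (y - y_i) ** 2
        space_decay = math.exp(-dist_sq / (2 * SIGMA ** 2)) / (2 * math.pi * SIGMA ** 2)
        rate += K * time_decay * space_decay
    return rate


def rank_cells(t, cells, past_crimes):
    """Return grid-cell centres sorted from highest to lowest forecast risk."""
    return sorted(cells, key=lambda c: intensity(t, c[0], c[1], past_crimes), reverse=True)


crimes = [(0.0, 1.0, 1.0)]                      # one burglary at day 0
cells = [(1.0, 1.0), (1.2, 1.0), (3.0, 3.0)]    # three hypothetical cell centres
ranking = rank_cells(1.0, cells, crimes)        # cells ordered by risk on day 1
```

In this sketch the cell where the burglary occurred ranks first, its neighbour second, and the distant cell falls back to the background rate. The ranking is driven entirely by where and when past crimes were reported.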

Apart from the question, still unresolved, about whether predictive policing ‘works’, most researchers have been concerned with the possibility of bias and discrimination (e.g. Brantingham, Valasik & Mohler 2018). But there are plenty of other areas for debate. What is most interesting about predictive policing is that Gary Becker himself could have come up with it. The very vision of it is straight out of Becker (1968). That is, police resources are allocated seamlessly to problem areas on the city grid to deter crime. This deterrence effect works even without any arrests being made. Consider Brantingham, Valasik & Mohler’s (2018, p.1) comments: ‘The prevailing view, derived from experiments in hot spot policing, is that the presence of police in a given place removes opportunities for crime even without any direct contact with potential offenders’. Beneath the surface, we have Becker’s (1968) narrative. Police officers are resources. Efficient allocation can reduce the costs. Predictive policing increases the efficiency of the allocation of police resources. Efficient allocation is something beyond sending police to arrest more criminals. Criminals respond to incentives and the deterrence effect of police presence. By picking the best spots to send police to, predictive policing spreads a deterrence effect over the locations where it will have the biggest impact per unit of cost. Classic Becker.
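The ‘biggest impact per unit of cost’ logic can be sketched as a simple greedy rule. The example is hypothetical throughout: the function name, the cell labels, the deterrence figures and the patrol costs are assumptions for illustration, not values or methods drawn from Becker (1968) or from any predictive policing system. A greedy rule of this kind is also only a heuristic and need not find the best possible allocation.

```python
def allocate_patrols(cells, budget):
    """Greedy allocation: fund cells in order of deterrence per unit cost.

    cells maps a cell label to (expected_crimes_deterred, patrol_cost).
    Both numbers are hypothetical inputs; patrol_cost must be positive.
    Returns the labels funded, in funding order, and the total spent.
    """
    ranked = sorted(cells.items(), key=lambda kv: kv[1][0] / kv[1][1], reverse=True)
    chosen, spent = [], 0.0
    for label, (deterred, cost) in ranked:
        if spent + cost <= budget:   # skip any cell that would exceed the budget
            chosen.append(label)
            spent += cost
    return chosen, spent


# Four hypothetical grid cells: (crimes deterred, cost of a patrol).
cells = {"A": (4.0, 2.0), "B": (3.0, 1.0), "C": (1.0, 2.0), "D": (2.5, 1.0)}
chosen, spent = allocate_patrols(cells, budget=4.0)
# chosen is ["B", "D", "A"] and spent is 4.0
```

The sketch takes the deterrence value of each cell as fixed and known. Nothing in it represents a decision by an offender or an officer.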

This leaves out a lot of agency: for the criminals, the police officers, the designers of the models and the assessors of the results. The crimes are treated as contagion processes rather than decision processes and the police response is treated as a resource allocation problem without a decision-making dimension. The impact of predictive policing on the decisions that officers make has been the implied concern of those researchers interested in the potential for predictive policing to introduce greater prejudice into police practice. Of course, we would also question the ‘point prediction’ nature of some applications of predictive policing. We have argued that pattern prediction is far sounder. Particular types of crime exhibit patterns because the offenders exhibit patterns in their choices. Contagion and diffusion processes capture none of the underlying behaviour. A particular location might be flagged as a hot spot because a crime was reported there. Another crime might occur there because the outcomes of crimes committed there are a reference point for local offenders. However, by this time, offenders are already expected to be ‘on the move’ because the chances of exceeding the reference point with another ‘score’ at the same location are slim and the offenders probably expect an increased police presence. What is needed then is not a prediction of the offenders’ (already outdated) reference point, but a geographic prediction of the direction of their domain of gains.

These weaknesses are apparent in the modelling undertaken by Mohler et al. (2011). Using the self-exciting point process model from seismology and data for 5,376 burglaries reported in an 18 x 18 kilometre grid for the San Fernando Valley in Los Angeles, Mohler et al. (2011) demonstrate that burglaries diffuse from a single point outwards along streets heading north-south and east-west (burglars do not travel at angles through backyards). Of these burglaries, however, 63 percent occurred in the same house and nearby houses during a 1-2 day period. The risk remains elevated for 7-10 days before a quick decline and gradual decay back to baseline. While the analysis seems impressive, the initial burglaries are most likely the work of a single offender or group and could not have been predicted beforehand. Their activity created a hot spot. The original offenders continue to work the same area for a week or so. It is possible that new offenders are initially attracted, but not in great numbers and not for very long. Rather, we expect criminals to begin to drift, especially if they have been successful, because they are then in the domain of gains and, with the police closing in, are motivated by their risk aversion to protect the gains they have made.
