By Pete Mains
In 2020, Santa Cruz, California banned predictive policing and facial recognition technology to the applause of civil liberties groups like the ACLU, NAACP and the Electronic Frontier Foundation (EFF). Facial recognition technology is invasive, and the case against it rests on moral rather than technical grounds. Predictive policing, less widely understood, simply means applying analytics and statistics to predict where crimes are likely to occur, and therefore where a police department should patrol and otherwise monitor.
The arguments against predictive policing betray a misunderstanding of the technology. Writing at EFF.org, Matthew Guariglia and Yael Grauer lay out a case that misrepresents the abilities and limitations of the technology. They wrongly conclude that predictive policing by its nature leads to increased racial discrimination. With predictive policing banned, Santa Cruz residents—and probably minorities more than others—will suffer from wasted tax dollars and needlessly high crime rates.
The emotional case painted is compelling. Guariglia and Grauer compare predictive policing to “Minority Report”. “Minority Report” is science fiction, though. AI researchers don’t worry that their models will become sentient and develop consciousness or human emotions. Alas, my computer will never love me back. Nor is there any movement afoot to arrest and charge Americans with crimes before they happen. Predictive policing merely helps police departments rationally allocate scarce resources. Artificial Intelligence at its core is just applied statistics. If you change “Santa Cruz bans predictive policing” to “Santa Cruz bans math,” the problem becomes obvious. We are depriving the Santa Cruz police department of objective analytical tools to combat crime.
The EFF article does get beyond superficial comparisons, but it manages to get most of its facts wrong, starting with the headline, “Technology Can’t Predict Crime, It Can Only Weaponize Proximity to Policing.” The second part of that statement is at least defensible, but the idea that we cannot predict crime is baffling. Technology identifies and predicts criminal behavior in many different ways outside of law enforcement. Your bank makes generalizations about what is “normal” behavior and uses this baseline to identify potential fraud. Modern antivirus software works largely the same way. Google monitors activity on your devices and accounts and predicts how likely you are to fall victim to cyber attacks. How is it that non-governmental organizations can predict crime using technology but police departments cannot?
In a trivial example, I recently created a model that predicted with 80% accuracy* which traffic offenders in a major American city were likely to pay fines. It’s likely that my model was indirectly ingesting information about “proximity to policing,” but it did predict the crime (civil offense, technically) of failing to pay traffic and parking tickets. Maybe there is a more charitable reading of the phrase, “technology can’t predict crime,” but in the most obvious and important sense, it is false. This model was hardly groundbreaking. The code itself was only a few pages long and took a few hours to write. The techniques used are well known and understood tools of applied statistics. The academic literature is filled with examples of far more sophisticated and reliable prediction models, many of which explicitly and effectively address the concerns raised.
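To give a flavor of how small such a model can be, here is a minimal sketch in the same spirit: plain logistic regression trained by gradient descent, one of the best-understood tools of applied statistics. The features, data and parameter values below are made up for illustration; the actual model described above used different data and is not reproduced here.

```python
import math

def train_logistic(X, y, lr=0.1, epochs=500):
    """Plain logistic regression via stochastic gradient descent.
    X: list of feature vectors, y: list of 0/1 labels (1 = fine was paid)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            # Sigmoid of the linear score gives a payment probability.
            p = 1 / (1 + math.exp(-(sum(wj * xj for wj, xj in zip(w, xi)) + b)))
            err = p - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict(w, b, xi):
    """Predicted probability that this offender pays the fine."""
    return 1 / (1 + math.exp(-(sum(wj * xj for wj, xj in zip(w, xi)) + b)))

# Made-up toy data: [fine amount in $100s, number of prior unpaid tickets]
X = [[0.5, 0], [1.0, 0], [3.0, 4], [2.5, 3], [0.8, 1], [3.5, 5]]
y = [1, 1, 0, 0, 1, 0]
w, b = train_logistic(X, y)
```

The whole thing fits on one page, which is the point: the barrier to this kind of prediction is data, not exotic algorithms.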
The chief technical concern raised is that models can get stuck in vicious circles—what Guariglia and Grauer call an “algorithmic loop.” An area is identified as high crime or an individual is identified as high risk. The police focus resources on that individual or area. They detect violations they otherwise would not have. That in turn justifies more policing, which turns up more violations and so on.
This is an accurate explanation of the way that AI models can fail. It is also a class of problem that has well known solutions. In more general terms, this type of model failure is known as a “rich-get-richer” or “circular contribution effect” problem. Left unaddressed, this problem afflicts the PageRank algorithm, the heart of Google’s search engine. Pages are ranked on the basis of inbound links. This puts new content at a disadvantage for being discovered, which undermines the purpose of having a search engine in the first place. The problem is solved by adding what’s called a “damping parameter.” Simply put, a little bit of randomness—10-20% in the case of PageRank—allows new content to be discovered while prioritizing content established to be useful.
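The damping idea is easy to see in code. Below is a toy PageRank sketch; the link graph and parameter values are illustrative, not Google’s production system. The `(1 - d)` term is the dose of randomness that keeps new, unlinked pages discoverable.

```python
def pagerank(links, d=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to.
    d is the damping factor; (1 - d) is the random-jump share."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {}
        for p in pages:
            # Rank flowing in from pages that link to p.
            inbound = sum(rank[q] / len(links[q]) for q in pages if p in links[q])
            # Every page, even one with zero inbound links, gets a
            # baseline (1 - d) / n share -- the "damping" randomness.
            new_rank[p] = (1 - d) / n + d * inbound
        rank = new_rank
    return rank

ranks = pagerank({
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "new": ["a"],  # a new page nobody links to yet
})
```

Even with zero inbound links, `"new"` keeps a nonzero rank, so it can still surface; without the damping term its rank would collapse to zero and the rich-get-richer loop would be unbreakable.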
We can analogize this to predictive policing. Data collected from police patrols are not the only source of information for these models, but they can be important inputs. If police only patrol areas that they know to have a history of crime problems, they will be slow to detect emerging threats and problems with existing predictions. To compensate for this circular contribution effect, have police randomly patrol areas without known elevated crime rates. If areas were misidentified as high crime by the algorithm, this would give the Santa Cruz police the opportunity to correct the mistakes. In scientific terms, this establishes a control group against which to judge previous predictions.
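This explore-versus-exploit tradeoff has a standard formulation in statistics, often called an epsilon-greedy strategy. Here is a minimal sketch; the area names, risk numbers and 15% exploration rate are all hypothetical.

```python
import random

def choose_patrol_area(predicted_risk, epsilon=0.15, rng=random):
    """predicted_risk: dict mapping area -> model-predicted crime risk.
    With probability epsilon, patrol a uniformly random area (exploration,
    generating control-group data that can falsify bad predictions);
    otherwise patrol the highest-risk area (exploitation)."""
    if rng.random() < epsilon:
        return rng.choice(list(predicted_risk))
    return max(predicted_risk, key=predicted_risk.get)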
Another approach would be to cap the amount of resources devoted to any given area. The Santa Cruz police could set a hard cap for a given area or they could taper increases in police presence beyond a certain threshold. The precise mechanism can be adjusted. One great strength of Artificial Intelligence is that it allows us to experiment with different approaches, see their results and respond accordingly.
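A tapered cap might look like this in code; the thresholds are hypothetical and would need tuning against a department’s actual resources.

```python
def allocated_patrols(recommended, cap=10, taper_start=6):
    """Convert the model's recommended patrol-hours into an actual
    allocation: grant the recommendation in full up to taper_start,
    grant only half of each additional hour beyond that, and never
    exceed the hard cap."""
    if recommended <= taper_start:
        return recommended
    tapered = taper_start + 0.5 * (recommended - taper_start)
    return min(tapered, cap)
```

The cap ensures that even a runaway prediction cannot translate into unbounded police presence in one neighborhood, which directly blunts the feedback loop the EFF authors worry about.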
Flows, not Proximity
One potentially valid component of this critique is that proximity to past offenses may not be the best means to predict future crime patterns. AI researchers know this. A 2019 paper in the Journal of Quantitative Criminology proposes a more dynamic approach. According to the researchers, tracking traffic throughout a city can be effective. Thieves move to where potential victims are. As thousands or millions of residents of a city adjust their daily routines, so do criminals. The researchers contrast their “mobility flows” approach to proximity-based approaches. They conclude that the two approaches are complementary. This approach was pioneered in China, and the method of tracking traffic through cell phones is ethically dubious, but anonymized traffic data could, in theory, provide similar benefits.
Think of it like this. When children first learn to play soccer, their instinct is to directly chase the ball. They move around on the field as a giant herd of kids, all competing to attack the ball directly. As they become more experienced, they learn to predict where the ball will go in the future. It matters where the ball is right now, because the player who controls the ball now controls the game. Having a dynamic mental model of where the ball will be, though, gives players the ability to better direct their energy and effort throughout the game. Police can and do take a dynamic approach when using predictive policing tools. DUI checkpoints typically aren’t set up at 2PM on a Thursday afternoon. Cutting edge predictive policing allows this insight to be applied more broadly and effectively.
As with the problem of circular contribution effects, the problem raised by proximity models is not that police are using predictive technology. The problem is that police need more and better data, and it needs to be synthesized more effectively. When we find that there are shortcomings to the models, that is an opportunity to learn, grow and improve. The nature of Artificial Intelligence is that bad predictions are detected and the algorithms are adjusted accordingly. By incorporating innovative approaches, AI researchers can give police better tools to reconsider bad assumptions. Without such tools, past mistakes are more likely to persist.
Garbage In, Garbage Out
Guariglia and Grauer implicate another well-known AI problem in their analysis: the classic “garbage in, garbage out” problem. These models rely on data provided by police. Other data, like crimes reported by citizens, are also influenced by those citizens’ biases and willingness to talk to the police. This data, presumably tainted by individual biases, then justifies bigoted or at least disproportionate behavior by the police.
The incorrect assumption is that such models must rely on this data. In the “mobility flows” model above, crime is predicted using foot traffic data. Businesses could provide estimates of foot traffic derived from CCTV footage or cellular network traffic, while the data could remain agnostic regarding race. Maybe, in America’s car culture, car traffic would be a better indicator. Other indicators, like the number of residences with barred windows, hospitalizations, deaths ruled as homicides and so on, are less directly influenced by human discretion than citizen and police reports. This is only scratching the surface of the possibilities. The beauty of data science is that it allows you to find indirect, objective indicators that a human might not think to correlate.
There’s a larger problem with this complaint, though. Minority communities are less likely, not more likely, to cooperate with police. This is well known both anecdotally and in the academic literature. If minorities are less likely to contact the police, then it does not follow that predictive policing models are being fed disproportionately many reports of incidents in minority communities.
Over and Under-Policing
Dr. Rod K. Brunson and Brian Wade, criminologists at Northeastern University, wrote a paper called “Oh hell no, we don’t talk to police,” which delves into this issue. From their perspective, black communities are victims of both over-policing and under-policing. Minor offenses are over-policed, so African Americans might receive more traffic and parking fines. On the other hand, more serious crimes are not given the investigative resources that might be allocated to crimes in predominantly white neighborhoods. A lack of cooperation makes protecting minorities from violent crime more difficult than it otherwise would be. This has the effect of eroding trust between African Americans and law enforcement, and the downward spiral continues.
This probably isn’t surprising to advocates at the NAACP, ACLU and EFF. Their expressed concern was that predictive policing would give legitimacy to unfair treatment of minorities. That is a valid concern. The solution, though, is not to prohibit tools that can improve and even save the lives of African Americans. We should embrace data science in policing, whether it’s called predictive policing, machine learning, artificial intelligence or applied statistics. If we prohibit these tools, racism and inequality will persist but be less visible.
Think of the evergreen debate over why minorities are arrested and convicted at disproportionate rates. Civil rights advocates blame racism in policing. Law and order Republicans blame minorities for committing crimes at disproportionate rates. These two explanations are not mutually exclusive and they’re hard to disentangle. How much of the disproportionate treatment is the fault of police and how much can be explained as the fault of criminals themselves? Another round of racial sensitivity training is not going to solve the problem. Neither is sentencing reform.
If we can identify causal factors and make valid predictions, we can propose smarter interventions. For example, the hypothesis that black communities are both over-policed and under-policed contradicts the “broken windows” hypothesis. Broken Windows suggests that police departments should focus on curtailing minor crimes in order to reduce the incidence of larger crimes. This is at odds with the view of Brunson and Wade. Predictive policing, because it embraces modern data science and other tools, has the potential to tell us whether Brunson and Wade are right or Broken Windows is. My bet is on Brunson and Wade, but the debate will likely continue, unproductively, unless we apply the rigor of modern analytics to our most important social problems. Predictive policing is not the problem. It is the solution.
* “80% accuracy” is ambiguous in data science. There are different ways to balance false positives and negatives. In this example, I used the AUC score.
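For the curious, AUC has a simple interpretation: it is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. A minimal implementation (the data in the test case is made up):

```python
def auc_score(labels, scores):
    """AUC: probability a random positive outranks a random negative.
    labels: 0/1 ground truth; scores: model scores. Ties count as half."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A perfect ranker scores 1.0, a coin flip 0.5, which is why AUC is a fairer summary than raw accuracy when the classes are imbalanced.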
Pete Mains is a data scientist, political consultant and technology entrepreneur. He has run a number of political campaigns for local and state offices in Arizona and Virginia. In 2016, he was elected an Arizona Republican National Delegate for Ted Cruz.
You can follow Pete on Twitter @pmains.