Police Data Meet Big Data

Want to get the attention of the police department?  You don’t have to zoom by at 120MPH – there is a better way.  Tell them that you used their redacted license-plate reader data to locate their secret cameras, and then give them the camera coordinates.

Trust me, it works.

Within three minutes of sending the email to my contact at the Minneapolis Police Department Records Information Unit, my phone was ringing.

I should start by saying that I have great respect for our police, and I support them in their mission to protect and serve.  For that reason, my first action after finding the vulnerability was to contact the MPD and explain how the 2.1 million-row database they had just released to the public could be used to locate their secret cameras.  I gave MPD the information they would need to better redact the data to prevent someone with bad intentions from doing so.  And as long as the locations of the cameras remain confidential, I plan to keep them that way.  I have also remained silent about my discovery to give the City time to complete a special request that immediately classifies the data as private.  That request was made by the Mayor last week.

But it doesn’t end there.  I’m going to show you how data like these can be used to do much more than find hidden cameras.  Much has been made of the privacy implications to individual private citizens, but, as you will see, those most exposed are the police themselves.

First, for those who may be unfamiliar with the technology, Automated License Plate Readers (ALPRs) are cameras that can be mounted on a vehicle or in a stationary location.  The cameras have a single purpose: to read every license plate they see and store the plate number, date, time, and GPS coordinates where the read occurred.  The cameras can immediately compare the plate number to a list of wanted and stolen vehicles and alert the police.  The data can also be stored and analyzed later to identify potential witnesses or suspects who may have been in the vicinity of a crime.

In three months with just eight mobile readers and two stationary readers, the MPD collected over 2.1 million license plate reads.  Before releasing the data, the MPD deleted the GPS coordinates of data collected by the stationary hidden cameras to protect their locations.  Of the 2.1 million reads, almost 1.3 million came from the two stationary cameras.  You can see the number of reads by day/reader in this chart.  Also clear are the regular fluctuations in daily traffic volume and the drop in traffic for the Thanksgiving holiday.
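For readers who want to build a reads-by-day/reader tally from a similar file, here is a minimal sketch. The record layout (a reader ID paired with an ISO timestamp), the reader names, and the dates are all made up for illustration; the MPD file's actual columns may look quite different.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample records: (reader_id, ISO timestamp).
# These values are invented for illustration, not taken from the MPD data.
reads = [
    ("mobile_1", "2000-01-01T08:15:00"),
    ("mobile_1", "2000-01-01T08:16:00"),
    ("fixed_A", "2000-01-01T08:15:30"),
    ("fixed_A", "2000-01-02T09:00:00"),
]


def reads_per_day(records):
    """Count plate reads for each (calendar day, reader) pair."""
    counts = Counter()
    for reader_id, timestamp in records:
        day = datetime.fromisoformat(timestamp).date()
        counts[(day, reader_id)] += 1
    return counts


daily = reads_per_day(reads)
for (day, reader_id), n in sorted(daily.items()):
    print(day, reader_id, n)
```

Plotting these counts with one line per reader would give a chart of the kind described above, where weekly traffic cycles and holiday dips show up as regular rises and falls.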

The mobile readers covered most of the City during the three months.  Each location where a mobile plate read occurred is a single dot in this graph.  The dots are colored by reader.  As you can see, they form a pretty good street map of Minneapolis.

What are the privacy implications?  Is your plate in this database?  Could a stalker use these data against you?  If you commute into or through Minneapolis, there’s a chance your plate is in this file.  (And kudos to those of you with creative personalized plates – some of these are pretty funny).

But are you really at risk?  Let’s look at some numbers.  Of the 2.1 million reads, there are just over 621,000 unique plate numbers in these data.  The majority of those, 360,000, were read only once.  In fact, 530,000 were read four times or fewer in the three-month period.  If we exclude the reads by the stationary cameras, which are not located any place you should worry about being seen or stalked, and we exclude police and other government plates, there are only about 8,000 unique plate numbers that have been tagged ten times or more in the three-month period.  To track someone, you would typically need more hits to establish a pattern (unless you really got lucky).  Only 75 plates have been hit 40 times or more by a mobile reader.  While you could certainly use these data and a little luck to track and find a vehicle, I would estimate the risk to any single individual is very low.
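The counts above come down to tallying reads per plate and then asking how many plates fall above or below a threshold. A minimal sketch of that tally follows. The plate strings and helper names are hypothetical, and the sketch skips the filtering of stationary-camera and government plates described in the text.

```python
from collections import Counter

# Hypothetical plate reads, one entry per read. Invented for illustration.
plate_reads = ["AAA111"] * 1 + ["BBB222"] * 4 + ["CCC333"] * 12


def reads_by_plate(reads):
    """Map each plate number to the number of times it was read."""
    return Counter(reads)


def plates_at_most(per_plate, limit):
    """Plates read `limit` times or fewer."""
    return {plate for plate, n in per_plate.items() if n <= limit}


def plates_at_least(per_plate, limit):
    """Plates read `limit` times or more."""
    return {plate for plate, n in per_plate.items() if n >= limit}


per_plate = reads_by_plate(plate_reads)
unique_plates = len(per_plate)
read_once = plates_at_most(per_plate, 1)
read_four_or_fewer = plates_at_most(per_plate, 4)
read_ten_or_more = plates_at_least(per_plate, 10)

print(unique_plates, len(read_once), len(read_four_or_fewer), len(read_ten_or_more))

# Shares implied by the rounded figures quoted in the text.
share_read_once = 360_000 / 621_000
share_four_or_fewer = 530_000 / 621_000
print(f"{share_read_once:.0%} read once, {share_four_or_fewer:.0%} read four times or fewer")
```

On the rounded figures quoted above, roughly 58% of plates were read once and roughly 85% were read four times or fewer, which is why so few plates have enough hits to show a pattern.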

And by the way, the readers aren’t perfect – they make a lot of errors.  For example, there are read errors like POL1CE, POLICB, POLTCE, and POLYCE that occur many times.  Apparently, I, E, and sometimes Y are challenging.  In looking at just the POLICE plates, it appears that read errors occur about 10% of the time.  That suggests errors are occurring in all plate reads fairly frequently.  If the 10% error rate applies across all plates, that means about one out of every ten reads returns the wrong plate number.  This is consistent with one ALPR manufacturer’s website, which advertises a 90% accuracy rate.
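One way to estimate a misread rate like this is to count reads that differ from a known plate by a single character and compare them with the exact matches. The sketch below does that. The one-character criterion is an assumption on my part, not necessarily how the figure above was produced, and the sample counts are invented. A genuinely different plate that happens to sit one character away from the target would be miscounted as an error.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return sum(x != y for x, y in zip(a, b))


def estimate_misread_rate(reads, target):
    """Share of near-miss reads among all reads that match or nearly match target."""
    exact = 0
    near_miss = 0
    for plate in reads:
        if len(plate) != len(target):
            continue  # ignore plates that cannot be a one-character misread
        distance = hamming_distance(plate, target)
        if distance == 0:
            exact += 1
        elif distance == 1:
            near_miss += 1
    total = exact + near_miss
    return near_miss / total if total else 0.0


# Hypothetical sample: nine correct reads, one misread, one unrelated plate.
sample = ["POLICE"] * 9 + ["POL1CE"] + ["ABC123"]
rate = estimate_misread_rate(sample, "POLICE")
print(f"estimated misread rate: {rate:.0%}")
```

All four misreads mentioned above (POL1CE, POLICB, POLTCE, and POLYCE) differ from POLICE by exactly one character, so each would be counted as a near miss by this criterion.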

Still worried about being stalked?  Well, I’m not.

But I am still concerned.  The history of technology is a history of unintended consequences, and I am about to show you some examples.  While the plate readers might not be much use in tracking any particular private individual (unless a lot more of them get installed), they do a great job of tracking the vehicles that carry them.   These data contain the exact GPS coordinates of each ALPR-equipped vehicle for every minute of every day its reader was in operation during the last three months.  You can see the patrol patterns, plot the behavior of individual vehicles over time, and, of course, use the data to determine the secret location of the stationary readers.  There are many ways a bad actor could potentially use these data to impact public safety.

In this video, you can see an accelerated movie of reader-equipped vehicles on patrol on the same day.  The larger the circle, the more plates the reader was encountering at that location.  You can even see one vehicle getting caught in rush hour at a series of intersections.  Please note there are not nearly enough readers to play big brother over the whole city – and this was a busy day with four mobile readers operating at once.

In this video, you can see a single vehicle’s patrol pattern over several different days plotted on the same map.  You can easily see when and where this vehicle generally starts its patrol, and, if I hadn’t cut the animation short, even some of the places where it regularly stops (hint – not a donut shop).

Admittedly, your average criminal knucklehead is not going to have the chops to make use of these data to foil law enforcement or to find hidden cameras with statistical wizardry.  I’m not worried about those guys.  I am concerned about individuals, organizations, or foreign governments who might use these data to assist in planning and executing operations intended to do harm, and who do have the chops to use these data in combination with other intelligence sources.

Just like the rest of society, governments are challenged to keep up with technological innovations.   Sometimes those innovations have unintended consequences, and most police agencies are probably ill-equipped to handle them. I got the distinct impression in my interactions with MPD that this whole ALPR data issue has been a big distraction for them.  They aren’t data scientists or information security experts – they are police officers.  From what I can tell, they don’t have enough readers, enough technical expertise, or the desire to use these devices as some sort of Orwellian big brother.  At this point, it’s much ado about nothing.

That isn’t to say that we shouldn’t take action.  These technologies will likely become as common as video cameras in a few years, and, as I’ve shown, in the hands of skilled individuals these data can be very powerful.  As the legislature considers the classification of these data, it should also consider providing both oversight and support.  Legislative oversight of the use and disposition of these and other intelligence assets has much precedent, and oversight should include input from citizens with the right skill set.  Only people with deep knowledge of data science can understand the possibilities and pitfalls of large datasets like these.  As watchdogs, they can also help dispel unfounded public fears, and even help law enforcement personnel use their data in ways they did not know were possible.

Perhaps someday these data could be used to do things like find a kidnapped child during an Amber Alert.  Imagine combining and storing data from all agencies into a centralized, secure repository (with the proper oversight, of course).  The historical patterns could be used, for example, to determine in real-time likely routes the suspect would take, and thus give police a better chance of finding the child quickly.  That’s just one example – the possibilities are many.  Reaping the benefit of these technologies while protecting civil rights will require careful deliberation and an investment in building the right oversight.  Let’s not just delete the data or lock it away.  Let’s determine how we can use it for the greater good within the framework of a free society.

Want to Learn More?

These books are a great introduction to big data and analytics. You don't have to be a statistician; they were written for everyone.
