New York State approved the use of automated speed cameras within a specific range of schools. Tim Schmeier did an analysis of publicly available data related to these cameras as part of a project at the NYC Data Science Academy. Tim's work leverages several open data sets to ask the question: are the speed cameras succeeding in their intended purpose of increasing public safety near schools? What he found using open data may surprise you.
You can read Tim's write-up, titled Speed Cameras: Revenue or Public Safety?, on the NYC Data Science Academy blog. His original write-up, reproducible analysis, and figures are a great complement to this episode.
For his benevolent recommendation, Tim suggests listeners visit Maddie's Fund, a data-driven charity devoted to helping achieve and sustain a no-kill pet nation. And for his self-serving recommendation, Tim Schmeier will very shortly be on the job market. If you, your employer, or someone you know is looking for data science talent, you can reach Tim at his gmail account, which is timothy.schmeier at gmail dot com.
(upbeat music) - The Data Skeptic Podcast is a weekly show featuring conversations about skepticism, critical thinking and data science. - Welcome to another episode of the Data Skeptic Podcast. I'm joined this week by Tim Schmeier. How are you doing, Tim? - Good, how you doing? - Very well, and I appreciate you coming on the show. So I came across, when I was just, you know, doing some casual reading online, this really interesting analysis you did as part of the New York City Data Science Academy. To get started, tell me a little bit about how you got involved with that organization. - I recently graduated with my PhD in chemistry and kind of didn't really want to pursue the academic route. So I was kind of trying to find out what I'd like to do for a career and kind of came across the program at NYC Data Science Academy, and it looked really good. I kind of like working with data, but I kind of lacked the programming background, and it seemed like that was kind of a specialty or a strong suit of theirs. And so I got involved there and I'm hoping to get a job at the end of the course, which is in May. - So the analysis, at least from my reading, looked at whether or not the speed cameras that have been placed in New York City are contributing to public safety or seem to be more of a nuisance to the public and a source of revenue. Is that a fair assessment of kind of the work you did? - I think that's kind of what I was trying to explore. Yeah, I think that's a good summary. - Yeah, how'd you get interested in that as your chosen project? - There is another great guy who works with the NYC Open Data and his website or his Tumblr is I Quant NY. And he basically tells a lot of stories about New York City through the use of their Open Data. We just got these speed cameras this past year. And so it was kind of an unmined data set. I had no competition from the guys who already work in this space.
So I decided to pull that data set and see what I could come up with, and kind of concurrently a lot of the other data sets that are ancillary to this question are also available. So the program is purported to protect school children. And so you can also find a data set that's available with all the addresses of the schools in New York. And it's also supposed to increase safety for pedestrians, and it just so happens the NYPD data set with all of the pedestrian vehicle collisions is also available. So you have a bunch of different rich data sources to pull from. - Yeah, it's really awesome that Open Data in three relatively unrelated ways has enabled the project. - Exactly, and that's kind of not what I expected, but I was able to find everything I wanted in order to answer all the questions that I had. And I was really kind of excited to find all that data in one place. - So I think you did some research too into kind of what legislation was set out in order to even establish those cameras. Could you share what you learned about at least the lawmakers' original intent when saying that these are things we should put into public service? - I believe the legislation is set at the state level and they kind of allow local jurisdictions to deploy these speed camera systems. And the original legislation is aimed at protecting school children. And with that aim in mind, the cameras are only allowed to operate within a quarter mile of a given school and then also only during school hours, which I think they've defined as like eight in the morning until six or seven in the evening. - Tell me a little bit about those three data sets that you got and if there were any challenges kind of merging them together. - There's three data sets. The first is basically the addresses of all of the primary, middle and high schools in New York, pretty straightforward. That required almost no cleaning or work whatsoever.
Another data set, which is kind of your collision data set, summarizes all the vehicle collisions in New York City. And that includes both pedestrian and vehicle-vehicle collisions. And so with fairly minor filtering out of the vehicle-vehicle collisions, you're left with vehicle-pedestrian collisions only. And then the third data set required much more labor. And that may be because this program is so new, they haven't figured out a great way to code the data up yet. The speed camera data set basically lists all of the dates and times of tickets and also the intersections where those tickets were given out. The problem with that is that the way that New York has coded that data is you have a street that the ticket was given on and then you have a range of intersecting streets. And I'm not sure if that's because they want to keep the location kind of hidden or if they didn't want to code it. And then supposedly they also have kind of these roving mobile vans with cameras. And so perhaps they parked on one block and then moved up the street a few weeks later and gave some tickets there. And so that kind of required a lot of manual recoding in order to geocode and get the exact location of those. - If I was a supporter of speed cameras and I liked what was there and maybe wanted to add more, my claim would be that these are reducing accidents, not only collisions but making it safer for pedestrians, and that they're protecting children in school areas. It seems that that was sort of your take on it and then you set out to test that hypothesis. Is that a fair assessment? - So how I like to approach problems like these is, if I was tasked with deploying these cameras in order to maximize safety, where would I put them? In my mind, if the goal is to protect school children, you would want to put these cameras in very school-dense areas of New York. - Seems logical. - Yeah, exactly.
And then the other way you might think about that is I'm going to just try to protect all of the pedestrians and I'm just going to place these in kind of the areas with the highest density of pedestrian vehicle collisions. That's how I would approach that problem if I was working for the government to deploy this speed camera program. So obviously that's kind of the direction I chose to analyze it. And so my logic was if the distribution of these speed cameras kind of mirrors the distribution of schools in New York or pedestrian vehicle collisions, then you would assume, or rather you would conclude, this camera rollout was indeed to protect the public. And if it kind of deviates, you might start to ask the question, well, is revenue the goal here, and public safety not really the focus? - Yeah, I'd have to agree. That's how one would want to distribute them if those were the exact intentions. So what'd you find when you double checked on that? - Yeah, so what you find is there's significant deviation from kind of both of those distributions. Actually, most of the schools in New York, or I should say there are 1,200 schools. So New York is very school rich, but kind of the densest clustering of schools is actually in Manhattan and the Bronx. There are fewer speed cameras there than in some of the other outer boroughs like Brooklyn and Queens. As you would expect, probably just from the density alone, Manhattan has the highest overall pedestrian vehicle collision rate. And so you would expect the speed cameras to be almost completely deployed in that area if you were trying to reduce the number of fatalities or collisions, as opposed to where they actually are deployed, which is in the outer boroughs, where you see very few pedestrian vehicle collisions. - You'd also mentioned earlier that the law states these should be within a quarter mile of schools. How well was that followed? - Well, I just took one example.
I figured it made sense to look at kind of the highest grossing camera. This camera's given out in excess of 10,000 tickets over the first five months of operation. It's actually located in Queens, on Queens Boulevard, and for your listeners that don't know a whole lot about the geography of New York, Queens Boulevard is an eight-lane roadway. The lights are timed and people generally drive fairly quickly down this roadway. It's actually mined by these cameras at three or four different locations. And so this roadway is really kind of the cash cow of this program. Looking at this one single camera, you can get the distance. Google Maps will return a distance to you with a couple of commands in R, and you find that actually the speed camera location is outside of the quarter-mile radius that's required by law. And so I think for all of the drivers that receive tickets from that exact camera, they may have some legal recourse to try to get that dismissed. - Interesting. So the highest, I'm correct in saying the highest grossing camera in the whole program is also illegally placed? - It does appear that way. - Very interesting find. Let's just, for whatever reason, let somebody off the hook on the placement. Maybe they weren't the greatest at optimizing where to place cameras. We can at least look at the question for the places they happen to have been placed. Did we see a decline or reduction in fatalities or just general accidents or that sort of thing? Was that something you looked at as well? - Yeah, so that's kind of the next question, right? Maybe the bureaucrats who were in charge of this program just didn't really look at the data. They just kind of distributed the cameras wherever they thought it might have an impact. You kind of give them the benefit of the doubt and you say, well, let's see if these cameras actually reduced accidents in the locations where they are. You find out that they have no real impact.
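The quarter-mile check described for the Queens Boulevard camera can be sketched in a few lines. This is a minimal illustration in Python, not Tim's R code: it swaps the Google Maps lookup for a straight-line (haversine) distance, and the coordinates, the `haversine_m` and `within_quarter_mile` helpers, and the `QUARTER_MILE_M` constant are all hypothetical names and values chosen for the example, not real camera or school locations.

```python
import math

QUARTER_MILE_M = 0.25 * 1609.344  # a quarter mile expressed in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    r = 6371000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def within_quarter_mile(camera, schools):
    """Return True if at least one school lies within a quarter mile of the camera."""
    return any(haversine_m(*camera, *school) <= QUARTER_MILE_M for school in schools)


# Hypothetical coordinates for illustration only (not actual locations).
camera = (40.7300, -73.8600)
schools = [(40.7300, -73.8500), (40.7400, -73.8600)]

print(within_quarter_mile(camera, schools))
```

A straight-line distance is the most generous reading of "within a quarter mile"; a driving or walking distance like the one Google Maps returns can only be equal or longer, so a camera that fails this check would fail that one as well.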
And so if you look again at the density map of pedestrian vehicle collisions after installation of the speed cameras and you compare it to the same time period in the year prior, you find that the densities are approximately the same. And so looking at a map like that, you would expect that if cameras did influence kind of your rate of collisions, the density around each of those areas would be significantly reduced as compared to the year before they were installed. - If I were maybe really trying to push hard for the efficacy of these cameras, I could claim that, well, had they not been there, there would have been an increase, that maybe they didn't show an improvement, but they mitigated what would have otherwise been a surge in accidents or something like that. Obviously, the burden of proof is on me, but what did you find trend-wise looking at some of the historical data? - Yeah, that was an interesting question, right? If you look year over year, all of New York City actually experienced a 10% reduction in pedestrian vehicle collisions. And so I wanted to know, is that due to the speed cameras? I mean, I don't think so for reasons we just discussed, but let's take a look back further in history. This is kind of more evidence that the speed cameras are ineffective. What you find is that New York's pedestrian vehicle collision rate has been decreasing since about 2012, which is as far back as the data is available. And what you find is it's decreasing at the same rate before the cameras were installed as it is after the cameras were installed. And so you just kind of see the same trend continuing without any effect due to or attributed to the speed cameras. - Interesting, yeah, if you don't have it in context, one could kind of mislead with their claims a little bit, I would think, just given the trend that was already there.
- Right, and actually it's exactly the pitch that a lot of these camera companies come with. Basically how most of these rollouts of speed cameras work is they place them at locations that are known over the past few years to have high rates of collisions. And what you can do is you can simulate kind of accident rates with a Poisson distribution. And you find that if you just put speed cameras at locations that happen to have a high collision rate by chance, and then you track that over a two or three year period after the installation, you find pretty significant reductions in crashes as simulated by the data, just because it's gonna return to the mean, you know? And so that's kind of how a lot of these companies pitch their speed cameras: X town after installation saw a 40% drop in collision rate, and that's really just due to kind of the nature of the Poisson distribution. - Tell me a little bit about the tools you use to do the analysis. - Yeah, so I'm an R user primarily. There's a great package called ggmap that all the analysis was done in. It basically sits on top of another package called ggplot. Both of them were made by a guy named Hadley Wickham, who is one of the [unclear] and author of a lot of the great packages in R. And it makes it really easy to do a lot of these sorts of things. Mapping, geocoding coordinates, plotting points and density maps right on top of Google Maps, Stamen Maps. It's really, really powerful and easy to use software in R. - Yeah, so I found your analysis really convincing and it's troubling for me, if I were like, someone in the legislature, as to whether or not A, the program was executed well, and B, whether or not it even makes sense to have it to begin with. I'm curious if you've heard any response back from your work, if there's maybe been an impact or the start of a good discussion on the sort of legislative side of things? - You know, I feel like it's not been found quite yet.
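The regression-to-the-mean effect Tim describes with the Poisson distribution can be reproduced with a small simulation. The sketch below is in Python rather than the R Tim used, and every number in it (2,000 sites, an average of 5 collisions per period, cameras at the worst 10% of sites) is an assumption made up for illustration. It also simplifies his description to a single period before and after. All sites share the same true collision rate and the "cameras" do nothing at all, so any drop at the selected sites is purely a selection effect.

```python
import math
import random


def poisson(lam, rng):
    """Draw one Poisson(lam) sample using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate(n_sites=2000, rate=5.0, top_fraction=0.1, seed=0):
    """Pick the sites with the most collisions 'before', then re-measure them 'after'.

    Every site has the same true rate in both periods, so nothing changes
    except which sites happened to be unlucky in the first period.
    """
    rng = random.Random(seed)
    before = [poisson(rate, rng) for _ in range(n_sites)]
    after = [poisson(rate, rng) for _ in range(n_sites)]
    n_top = int(n_sites * top_fraction)
    # Indices of the sites that looked worst in the 'before' period.
    worst = sorted(range(n_sites), key=lambda i: before[i], reverse=True)[:n_top]
    mean_before = sum(before[i] for i in worst) / n_top
    mean_after = sum(after[i] for i in worst) / n_top
    return mean_before, mean_after


mean_before, mean_after = simulate()
drop = 1 - mean_after / mean_before
print(f"selected sites: {mean_before:.2f} -> {mean_after:.2f} ({drop:.0%} apparent drop)")
```

Because the sites were chosen for having unusually high counts, their later counts fall back toward the shared average even though no intervention took place, which is the pattern a before-and-after sales pitch can mistake for a camera effect.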
It's only been up for about 10 or 12 days now. And I don't know if a lot of people actually go out and search for good examples of data science like you and I would. So as far as I've heard, there hasn't been any sort of legislative impact. What I'm hoping to do is, WNYC, NPR's branch out here in New York, does a lot of stuff with open data and how it affects New Yorkers. And so I'm hoping to be a guest on one of their shows and really kind of lead listeners through the same sort of analysis in hopes of catalyzing a change here locally. - Yeah, I think it'd be great. Are there any other next steps you want to explore or ways you might extend the analysis? - I would have liked to have done the same sort of thing with parking tickets and stuff like that. However, kind of the motivation behind this is this other data science guy at I Quant NY, and he's actually done a lot of that analysis already. And so you can kind of go back and see where the highest rate of hydrant tickets come from in New York. And you can see all of that stuff on his blog. And so he's really mined a lot of the data that I would be interested in kind of pursuing further. - Well, still a great resource over there as well. So lastly, I like to ask my guests for two recommendations. They can be, you know, anything from a book to a software package or whatever. The first is the benevolent reference, which is something you're not associated with but would like to give a nod to through appearing here. And the second is the self-serving reference, something that ideally you get direct benefit from by getting exposed to my listeners. - I guess let me tackle the benevolent recommendation first. I have a Siberian Husky as a pet. As you and your listeners probably know, there's a huge problem with dogs being returned to shelters and euthanized every year. And so I'd like to point out Maddie's Fund. They're doing some great work to reduce that number.
And more interestingly for people like us, they're using data to do it. And so kind of using data and diving deep into who's most at risk in these shelters, these cats and dogs, they've been able to reduce kill rates from approximately 45% down to about 5% in certain shelters across the U.S. - Oh, that's fantastic. - It's really amazing work. And a lot of their data and analysis is up. And they actually collect data from across the U.S. from different shelters. And so it's a great source of rich data for your listeners as well. It's a great organization. And if you care about animals, it's a great place to make a donation. - Fantastic. - I guess now for the self-serving reference, I will be looking for a job come May once the program is done. And so if anyone wants to reach out and speak to me more, they can reach me online at timothy dot schmeier at Gmail. - Do you mind spelling your last name just so people have it if they're typing? They can find it in the show notes as well, but you know. - Sure. S-C-H-M-E-I-E-R. - Cool. And let's share a little background too. You're in New York City now? - I am. - And are you open to relocation? - I am. Hopefully somewhere warm. - And what's your ideal job just in case your future employer happens to be listening? - Looking for data science or something closely related to that. - Cool. All right. Well, I hope maybe I can send a few leads your way. - Great. That would be awesome. - Well, Tim, this has been great. I really appreciate not only the great analysis you did, I'd like to see lots more work like this all the time, but also taking a few minutes to come share it with the listeners. - Yeah, absolutely. Keep an eye out on that same website. My next post is gonna go up in another week or so. - Cool. I'll be looking forward to it. - Great. Thanks, Kyle. - Thanks again. Enjoy the rest of your day. (upbeat music)