Data Skeptic

Early Identification of Violent Criminal Gang Members

Duration:: 27m
Broadcast on:: 15 Apr 2016
Audio Format:: other

This week I spoke with Elham Shaabani and Paulo Shakarian (@PauloShakASU) about their recent paper Early Identification of Violent Criminal Gang Members (also available on arXiv). In this paper, they use social network analysis techniques and machine learning to provide early detection of known criminal offenders who are in a high-risk group for committing violent crimes in the future. Their techniques outperform existing techniques used by the police. Elham and Paulo are part of the Cyber-Socio Intelligent Systems (CySIS) Lab.

(upbeat music) - Data Skeptic features interviews with experts on topics related to data science, all through the eye of scientific skepticism. (upbeat music) - My guests today are Elham Shaabani and Paulo Shakarian. Elham is a PhD student at Arizona State University's School of Computing, Informatics, and Decision Systems Engineering. Paulo is an Assistant Professor at ASU, where he directs the Cyber-Socio Intelligent Systems Lab. I invited them here today to discuss their recent paper, Early Identification of Violent Criminal Gang Members. Elham and Paulo, welcome to Data Skeptic. - Thank you, Kyle. - Thanks for having me here. - So maybe to get started and to have some background, Paulo, could you share sort of the objectives of the lab, and then maybe Elham, could you tell us a little bit about your interests and role there? - At the Cyber-Socio Intelligent Systems Lab, our focus is on artificial intelligence and machine learning problems, particularly in the areas of social network analysis, cybersecurity, and law enforcement. And we look to enhance these application areas by developing novel methods, as well as applying existing ones. - Very neat. And Elham, how do you fit in there? - I'm part of the CySIS Lab under the supervision of Dr. Paulo Shakarian. In our lab, our goal is to address real-world problems using mathematical models. I find it interesting to be part of this lab. - So I really enjoyed the paper you guys wrote. Maybe to get started, could you give us a little summary and talk about the problem you were trying to solve, or at least the analysis you were trying to perform? - Here we are trying to answer an important question: can we identify potential violent offenders ahead of time? Answering this question allows the police to use their resources more smartly to avoid violence and reduce homicides. Our aim here is not to direct arrests, but to direct the police presence in time and toward violent hot spots. - Makes sense. 
So could we talk a little bit perhaps about how a police officer will end up benefiting from the use of your project? - What we are doing here is addressing one of the major problems in the United States, which is gang violence. It accounts for 20 to 50% of homicides in many major cities. What we found is that law enforcement has existing data on these groups, and that the underlying social network structure is important and can be predictive for identifying violent individuals, which helps enable the police to do smart policing. We try to leverage these social networks and create features to classify individuals as potentially violent, so the police can use our output to use their resources more effectively. - Yeah, I've seen a lot of interesting work where people are using graphs to do social network analysis. A lot of times that's more social or cultural projects, but what's really interesting about yours is you're working on something that I think can have a bigger benefit to society. And I imagine you're able to leverage a lot of the same techniques as general social network analysis. So to get started, maybe we could talk about the graph and how you represent that co-offender network in terms of nodes and edges. - In our network, the nodes represent the individuals who have committed a crime. There is a link between two nodes if they have committed a crime together. In our network, there were about 9K individuals and around 17K edges among these nodes. - In terms of that data, when you say two offenders committed a crime together, what does that actually mean in the data set? I could imagine a scenario where maybe three people do something they shouldn't have done and the police catch them, and two are charged and one is let go, maybe on a technicality or because it wasn't clear if they were involved or not. Does that third person show up in the network? 
- For the arrests, in this case, all three of them would form a triangle. Charges are not considered, only arrests, in the data. - Got it. Yeah, it makes sense that if you're arrested together, you certainly know one another. And if you're indeed a criminal, you're probably in cahoots to some extent. But from what I understand, part of the way you model the problem treats the network as unknown, in that there could be two criminal associates, two nodes, for which there's no link between them. Is that correct? - What we mean by unknown is that the network is discovered over time. We don't know the whole network before we do the experiments. We try to mimic real-world operations in this model, so we consider that the network is discovered as time goes on. - Yeah, it makes sense. So I guess in that way, it might help to capture that someone who has unfortunately chosen crime as a profession has to start at some point, and they might get away with a few things or build their criminal network, and that evolves as they grow. Is that more or less what you're trying to represent? Or is it because the police don't always have the full picture of the associations between co-offenders? - I think it's somehow both of them. We are considering that the network is discovered over time. Some of the individuals who haven't committed any violent crime might commit violent crimes later. So at the current time, the police don't have any idea who is going to change from nonviolent to violent, or who is going to commit violent crimes. So to mimic real-world operations, we designed the experiments this way. However, we made this problem more difficult than what it is in the real world, since in the earliest stages we are not considering the whole data that we have. 
So our neighborhood-based features and network-based features, which are based on the structure of the network, become less powerful. And as time goes on, when we have the social network structure, the problem is that there is a lower number of violent individuals to predict. So the imbalanced data problem is amplified here. - Yeah, that makes a lot of sense. I see many machine learning applications, in fact pretty much all of the ones I've worked on in my career, tend to have this class imbalance problem, where the set that I may be most interested in is the one I have the fewest observations of. Did you have to do anything special in your analysis to account for that? - Since the minority class was the most important for us, we tried to evaluate our model based on that class, because we already have high values of precision and recall for the non-violent class. What we focus on in our experiments is getting high precision and high recall for the minority class. - And can we take maybe a step backwards? I think I didn't ask you enough about the data set you guys have. So you'd mentioned it's arrest records and it's who was arrested in the same incident. Could you maybe describe what you had access to and what were some of the valuable fields, the time period, and things like that? - Our data set consists of gang-related arrest incidents gathered from August 2011 to August 2014 in Chicago. We had some information about the locations, dates, and the links between joint arrests, and also the gang affiliation of the offenders. - Interesting. So I don't happen to know much about gang affiliation. What does that look like? Are there many gangs or two gangs? With different gangs, is it hierarchical in nature? - There are maybe about 10 gangs, 15 gangs. They are what is defined by the police. We really don't know how they are structured. We have some names and labels. 
- The interesting thing in working with the Chicago police, and a big problem they have, is that Chicago's a really big city with a large gang problem. Throughout the city there are, like Elham said, between 10 and 20 sort of dominant gangs. What happens, though, is that many of these gangs start to become more like franchises, and certain elements within the gangs will kind of coalesce to carry out their own agenda. And sometimes the agenda will be around violent behavior. A big problem they have in Chicago from the analytical side was: how do we identify these pockets of people, who could be as few as five or 10 individuals? How do we identify when they are sort of forming an offshoot group before they become formalized as such? And usually this is when they're trying to make a name for themselves and conducting the most violent actions. - Ah yeah, that would be very helpful to have early predictions of. I imagine law enforcement must have been working on this before you guys took a look at the problem. Could you discuss some of the current approaches, or maybe previous approaches, that are in use? - There are two existing methods. One is past violent activities (PVA), where we claim that an individual will commit a violent crime in the future if he has committed a violent crime in the past. As you may guess, this method creates too many false positives, which is not practical. The other method is the two-hop heuristic (THH). This technique was designed by Papachristos to predict gunshot victims in Boston. What they found was that there is an inverse relationship between the probability of being a gunshot victim and the shortest path distance on the network to the nearest previous gunshot victim. The Chicago police have adopted a variant of this method to identify potential gang violence. To do this, they consider all neighbors one and two hops away from previous violent criminals as potentially violent individuals. 
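[Editor's aside] The co-offender network and the two-hop heuristic described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the code used by the guests or by the Chicago police: the incident list, the offender IDs, and the function names `build_co_offender_graph` and `within_hops` are all hypothetical. Every pair of people arrested in the same incident gets an edge (so three people arrested together form a triangle), and anyone within two hops of a previously violent offender is flagged.

```python
from collections import defaultdict, deque
from itertools import combinations


def build_co_offender_graph(incidents):
    """Build an undirected graph: one node per arrested person,
    one edge per pair of people arrested in the same incident."""
    graph = defaultdict(set)
    for people in incidents:
        unique = set(people)
        for person in unique:
            graph[person]  # touch the key so solo arrestees still get a node
        for a, b in combinations(unique, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def within_hops(graph, sources, max_hops=2):
    """Breadth-first search from all sources at once; return every node
    whose shortest-path distance to some source is 1..max_hops."""
    dist = {s: 0 for s in sources if s in graph}
    queue = deque(dist)
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return {n for n, d in dist.items() if d > 0}


# Hypothetical toy data: each inner list is one arrest incident.
incidents = [["A", "B", "C"], ["C", "D"], ["D", "E"], ["E", "F"]]
graph = build_co_offender_graph(incidents)

# Assume "A" is the only previously violent offender.
flagged = within_hops(graph, {"A"}, max_hops=2)
print(sorted(flagged))
```

With this toy data, B and C are one hop from A and D is two hops away, so those three are flagged while E and F are not. Because the heuristic flags by distance alone, it ignores everything else known about a person, which is part of why the guests turn to a supervised model with richer features.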
- How did you guys find that those techniques worked in practice when you applied them to your data set? - For PVA, as I said, it produces a large number of false positives, and false positives are really important because they prevent law enforcement from using this effectively in practice. THH works better. It creates a lower number of false positives. However, in comparison with our method, THH is far behind in terms of true positives: we are producing four times more true positives than this method. And the number of false positives in our method is comparable with THH. - So just to also make one comment about THH: this idea was developed out of the sociology and criminology literature. It became popular in Chicago because of the findings. However, this is an example of a statistical result that got adopted and, to some extent, applied a little bit improperly in practice. And this is sort of where what all of us do comes into play: to better help with practical applications, take a more data-driven approach. What does the data say? What really works on the historical data? For THH, there's some good evidence that in certain situations, when you know about victims and so on, it can provide predictive results. But for this particular problem, it was never shown to work, and it was just used as a heuristic by the police, whereas we were able to find things that were more data-driven. - I suppose that's what really got you guys into looking at the supervised learning approach, which I was really excited to get a chance to speak with you about. To start with, I've always found that feature engineering is the toughest and most crucial part of a supervised learning problem. Can you guys discuss your process in arriving at a good set of features that describe that network? - We leverage intuition from social network analysis as well as the criminology and sociology literature. We try to represent these models mathematically. 
So what we did was take qualitative ideas from this literature and turn them into a mathematical model to solve our problem. - Makes sense. I think you've broken it down into kind of four general areas that I was hoping we could touch on, 'cause I thought they were really novel. Well, I don't know, maybe they're standard approaches you're using, but they're a novel application here. Could you talk a little bit about those neighborhood-based approaches? Are we talking about a neighborhood like, you know, Bronzeville, or a neighborhood as in the relationships of different offenders in the network? - What we mean here by the neighborhood is the immediate neighbors of the individual in the social network that we create, the individuals who are one hop away or two hops away from the individual, and we predict based on that. For example, one of the features that we found interesting among the neighborhood-based features was whether the majority of the individuals one and two hops away from the individual are violent or not. - So you also looked at some network-based features. Could you describe what those are and how they're distinct from the neighborhood-based features? - In the network-based features, we are considering the whole network and the role the individual plays in that network. For example, the tipping point model was one of the important and predictive features that we had, which considers whether violent behavior is going to propagate through the network or not. - Ah, so that would maybe capture a case where, if perhaps not my direct associates or known associates, but the associates of my associates tend to be violent criminals, then maybe that violence can propagate or sort of get carried over or mimicked by the people closest to them. Is that the basic idea? - Yes, exactly. - And could you touch on some of the temporal characteristics you looked at? 
- For the temporal features, we consider, for example, the average time between consecutive crimes for an individual. So if they are committing crimes in a short time, in a row, we think it's more probable for them to become violent or to commit crimes again and again. - Ah, so would it be fair to maybe say that that's sort of a feature that could capture a kind of forgiveness, that if someone has gone quite some time without committing a violent crime, maybe they found a better path in life? - Yes, exactly, but also it depends on whether they are increasing or not. - Ah, yes, of course. - I guess it's hard to commit a crime if you're behind bars. And lastly, could you touch on the geographic characteristics you used? - In the geographic features, we want to see, if the individuals are committing crimes in a violent district or violent beat, how are they going to change? Are they going to commit violent crimes because of the neighborhood that they are located in? So we got our intuition from the criminology literature that these kinds of features are important and predictive. - Do you find that the features you guys looked at were chosen because they're well-studied features used in other social network analysis? Or were they in some way tailored to this specific use case you were discussing, of identifying that little clique of individuals who want to make a name for themselves? - The features that we're using are based on both the criminology literature and social network analysis, features that are predictive for different kinds of problems. So we adapted those features for our problem to get a good model. - So things like centrality measurements, which are well studied in social network analysis for sort of a plain graph, are often used for feature generation. We use these concepts, but we adapted them to the specific domain. 
So for example, the nodes in our graph were labeled with gang affiliation, as well as with the type of crime that the people committed. The paper describes customized versions of these centrality measurements that take these labels into account as well, and do so with an eye to what the application was. - You have this great ROC curve and a nice presentation of some other model diagnostics in the paper. Can you discuss how you use those methods to evaluate the quality of your model? - As I said, our data set was highly imbalanced, around 60K non-violent offenders and 4K violent offenders. And what we were looking at was the minority class. So we tried to use precision, recall, and the ROC curve for the minority class to see how our model is working. Maybe it's better if I start with precision. What precision means in our problem is the ratio of the number of individuals who are correctly predicted as violent to the number of individuals who are predicted as violent. And what is important here for us is getting a high value of precision and true positives, so the police can use our model in the real world to predict, to cover some of the areas in order to prevent a high rate of homicides. The other thing was recall, which is the number of individuals correctly identified as violent over the total number of violent individuals in the specific time duration. Recall is also important, but what was most important, as I said, was getting a high value of precision, and we were able to get four times more true positives in comparison with the existing methods. - Well, so it sounds like your approach is a notable iterative, maybe even order-of-magnitude, improvement on the two current systems. So I imagine that's probably exciting for the police to use. Perhaps we could talk a little bit about the actual use case for them. 
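[Editor's aside] The precision and recall definitions given above for the violent (minority) class can be written out directly. This is a minimal sketch with made-up labels, not the paper's data or evaluation code; the function name `precision_recall` and the label strings are assumptions for illustration.

```python
def precision_recall(y_true, y_pred, positive="violent"):
    """Precision and recall for one class of interest.

    precision = correctly predicted positive / all predicted positive
    recall    = correctly predicted positive / all actually positive
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical toy labels: 2 violent and 8 non-violent individuals.
y_true = ["violent", "violent"] + ["nonviolent"] * 8
# A model that flags two people, only one of them correctly.
y_pred = ["violent", "nonviolent", "violent"] + ["nonviolent"] * 7

precision, recall = precision_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(precision, recall, accuracy)
```

In this toy case accuracy is 0.8 even though only half of the flagged people are actually violent and half of the violent people are missed, which illustrates why the guests evaluate on the minority class rather than on overall accuracy.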
It seems to me if the police had infinite time, they could check in all the time with every known violent offender and make sure that they're sort of keeping an eye out that that person's not about to commit another crime. But of course, the police don't have infinite time. They have to prioritize who they might investigate. How are they able to use the model to help with that? - Yes, exactly. If they had infinite time, maybe PVA would be better than our method, because it considers all the individuals who have committed a violent crime, but it's really impossible to go through all of those individuals. So our method reduces the number of false positives. - A real challenge with this work was ensuring we had the right metrics for the application domain. With this type of applied work, that is why it was good to work with experts in law enforcement when doing this research, because if we had blown the doors off and, say, had 70, 80% precision, and we had hundreds or thousands of false positives, the police would still not have been able to use the results of our work. And the reason for this is simple: like you pointed out, they don't have time to explore everything. So it was really important that we looked at, hey, if we found precision, how much recall can we get? And that's why we even looked at multiple models, where we had one that had maybe a little less recall, but fewer false positives. So again, this was a big challenge with the work, and it really helped to be working with the domain experts in doing this. - So I think at some point, someone will certainly compare your work to the classic Philip K. Dick book, and later movie, Minority Report, which is all about predicting future crime. How do you respond to that comparison? - What we did here was create a data-driven model, which is different from what they're doing in Minority Report. 
Their prediction was like a crystal ball, and it leads to arresting individuals. However, in what we are doing, we are not going to arrest anyone, but we are directing the police presence so they can use their resources more smartly in time and toward violent hot spots. - The other thing the police also do with the results of work such as ours, which does not occur in Minority Report, is outreach to gang members who are at risk for violence, and community interventions. So the swath of tools available to the police is much wider and more nuanced than what is shown in the Tom Cruise movie. - I think it's really great personally that the police are taking an interest in more data-driven methods and in crime prevention over just arrests later. I think it's all around better for all parties involved. So I'm glad to see collaborations like this taking place. I was wondering if you could talk about how this collaboration got established between you guys and the Chicago Police Department. - So back in 2012, I had a paper on some social network analysis I was doing for a counterterrorism application that received some press attention. And I got an email from the Chicago police saying that at the end of that year, they were expecting to reach 500 homicides, which was a huge spike. And they realized that their tactics, where they were going after certain individuals in the gangs, seemed to in some ways be making their problems worse. And they wondered if they could do something more data-driven. And so this kicked off a collaboration that has spanned several years now, where we have analyzed their data and provided them software. So it's been really great getting to work with these guys. I've been to Chicago several times now to present the work and work with them to integrate it into their workflow. I've even been on ride-alongs and got to see the territory and understand the situation firsthand. 
And so this has just really made for some exciting research and things that we hope can make a difference in those communities. - Yeah, very interesting. How do the officers leverage your work today? - So right now we're in a bit of a transition situation, because the big proponent of our work was Superintendent McCarthy, who has recently resigned his position. So right now we're working with them through this transition period, which is always difficult in a politically charged police department like that in Chicago. - Sure. Do you happen to know if the data set you worked on is part of Chicago's Open Data Portal, or is this a closed, more private data source that needs to be a bit more protected for personal information? - Yeah, this was closed and provided particularly for us. - Makes sense. I'm curious as to what's next for your work. Are there any future steps along the same lines? - Yeah, we have a couple of initiatives that are related to this. Of course, we are continuing to pursue the violent offender prediction problem. In particular, what we've been working on since this paper was published is understanding how our classification methods arrived at the results they obtained. And this allows a couple of things. It allows us to examine whether the system is becoming biased toward cultural or racial groups. It allows us to understand which features caused the system to arrive at a certain decision. And it just gives the police a better level of comfort with the results of the analysis. We are also working on a couple of different law enforcement applications, including location of missing persons with a local 501(c)(3) here in the Valley called the Find Me Group, as well as human trafficking crimes. - Do you happen to have one, or several if you'd like, links or places people should look into if they want to follow up on either this paper or some of those other lines of research? 
- I would say you could follow my Twitter account, which is @PauloShakASU, or visit our webpage at lab.engineering.asu.edu/cysis. - Excellent. I'll be sure to put those both in the show notes as well. Elham, are you by chance on Twitter, or is there maybe a link you'd like people to check out if they want to follow up on your research? - They can follow my research on LinkedIn, www.linkedin.com/in/alham. There they will find my latest publications and the projects I'm doing. - Excellent. I'll put that in the show notes as well. Anything else to touch on or promote before we sign off? I think you guys have a conference coming up. Is that correct? - So this summer on August 9th, we will be holding CySIS Tech 2016, which is our annual conference where we showcase the research in our lab. We really encourage people from industry, government, and law enforcement to attend. And we will soon be posting information about this conference on Twitter, LinkedIn, and our webpage. - Excellent. Well, it sounds like an exciting event for people in the field. So Elham and Paulo, thank you so much for your time today and for coming out to talk about this work. I thought it was really interesting and I hope people will check it out. - No, thank you, Kyle. We appreciate your interest in the lab. - Yeah. Thank you very much for your time. - Thank you. - Thank you. - For more on this episode, visit dataskeptic.com. If you enjoyed the show, please give us a review on iTunes or Stitcher. (upbeat music) (beep)