(upbeat music)

- Data Skeptic features interviews with experts on topics related to data science, all through the eye of scientific skepticism. (upbeat music)

- Christian Sandvig has a PhD in communications from Stanford and is currently an associate professor of communication studies and information at the University of Michigan. His research studies the predictable and unpredictable effects that algorithms have on our culture. His work exploring the topic of auditing algorithms has framed the conversation of how and why we might want to have oversight on the way algorithms affect our lives. His writing appears in numerous publications, including The Huffington Post and Wired. Christian, welcome to Data Skeptic.

- Oh, thank you very much for having me.

- Oh, my pleasure. To begin, could you speculate on the degree to which algorithms impact people's everyday lives? Is this something that happens, you know, every couple of months when I buy a plane ticket, I'm affected, or are there more general cases than that?

- Well, I think it's pervasive. Today, we interact with computers in all kinds of ways when we don't particularly think that we're interacting with computers. I don't know if you remember, but a few years ago, there was a lot of enthusiasm about smart cities and smart infrastructure. And now you hear a lot about the internet of things, but what people don't really understand, I think, is that, you know, the computers are already there and they're already embedded in water towers and in cars and in toll pricing systems and in transit and obviously in your phone. And so really, whenever a computer is making some decisions, we could call that an algorithm. And so this happens every minute.

- Yeah, to what degree, if at all, do people have transparency about the algorithms that affect them?

- Well, I mean, I don't think they have much. I'm not sure exactly how worried to be about the transparency issue, because a lot of things happen in our life that we don't have transparency into. We've been using complicated technological systems for a long time before they were computers, and we didn't necessarily know how they operated as users. I think the real risk is not that we don't individually understand algorithms, but that we collectively don't understand them. So for example, I don't think it's important that everyone who has a phone understand all the algorithms that are operating on that phone, but rather that someone should be looking out to see if there's any kind of misbehavior or something worrying happening with these automated systems that we use every day.

- You've spoken about two particular criteria which might help frame a conversation about unexpected consequences of algorithms, and that's how predictable the outcomes are and how discoverable the outcomes are. Could you describe these criteria and why they're important?

- If we think about algorithmic systems, one of the things that's happened lately is that when people have complained about the way that some of these systems operate, the people that design the systems have gotten very defensive, and that's a shame, because really I think that if someone says, gosh, it looks like this system isn't working the way we want it to, it might be because there's something intentional going on, but it might just be that the systems are in fact very complex. There's an anthropologist named Nick Seaver who is kind of, I don't know, the Jane Goodall of audio recommendation algorithms.
So he's kind of like sitting in the jungle with the apes, but the apes are software engineers that design recommendation algorithms, and he pointed out that people who design the algorithms aren't necessarily that clear about what they're doing. And in fact, they're often not clear at all, because these are systems that are huge. So if we say, how did Google recommend that search result to me right now? You know, there are hundreds of factors, and Google has said in its public statements that it has at least 150 engineers that touch the algorithm. So if you're one of 150 people working on a small part of something complex, how could you know exactly what it's doing? So this is the criterion of predictability. There are some things that algorithms do where if something goes wrong, maybe we should be able to predict it, because the problem is so obvious or the algorithm is so simple. But then there are probably a lot of situations where the point is not to be angry at the developer. It's not that they could have foreseen the problem. Some problems are really very difficult to foresee. So that's the predictability criterion.

So the other problem we have with algorithms that's a little different than other systems is that because the algorithms are often used to personalize our experiences, it might be very hard for us to know if we're experiencing something that other people are also experiencing. And so that might mean that if an algorithm is behaving in some objectionable way, like for example, it's behaving anti-competitively or fraudulently, or it's doing something we don't want with our data, or it's discriminating in some illegal or at least unwanted way, it's hard to say, is this a widespread problem? Is it systematic? Is it just a particular instance of something that happened? And so this idea of discoverability is a big problem that we have, because with some of our personalized systems, it's very hard for us to know what's going on.

So let me give you an example. With television, you could say that the old way we used to recommend television shows would be that an executive at NBC or something would sit in an office and decide what everybody got to watch. Now the process was opaque, right? We don't know really how the executive decided. Something went on in the executive's head. The process is opaque, but everybody has the same process. So if someone says, gosh, it really seems like there really aren't that many people of color on television shows in prime time, or a lot of the news appears to be missing this important story, we can all agree that that probably is happening, because we all watch the same news and we all see the same shows on prime time. But with today's recommendation algorithms, that kind of discoverability of a problem really becomes hard.

- In the machine learning literature, we sometimes use the word discriminatory to essentially mean selective. There's this famous corpus of images of cats and dogs, and we can say a good algorithm discriminates and says there is or is not a picture of a dog in here. But there's really a second definition, or maybe for a more general audience, what I described is the second definition, and the primary definition of discrimination, or specifically algorithmic discrimination, is more about how an algorithm might end up doing the things we consider discriminatory. Do you think that that's something that has to be deliberate?
You know, some nefarious real estate baron says, I'm not gonna sell to people of a certain shape or color, or are these things that we think can happen accidentally?

- Well, I mean, I think accidentally is definitely most of the problem here. That doesn't mean that the people who design these algorithms don't necessarily have some responsibility to think about these issues. So let's think about an example. So discrimination can certainly be legal and desirable, but we know there are elements of illegal discrimination. A good example is discrimination in housing. We know that if I go to a landlord and ask if there are any apartments available, it shouldn't matter what race I am. I should get the same answer no matter what race I am. And in fact, if the answer's different, it's illegal. And so that's an example of illegal discrimination. The trick with the algorithms that are now mediating all kinds of services, like for example, you know, what we used to think of as taxis and now we call Ubers, or who looks at our resume on LinkedIn, and so on, is that there may be nothing written in the code that would say anything like, you know, if this person is a woman, be sure to show some worse jobs. You know, there's nothing like that. We're not saying that there ever would be anything like that. That's ridiculous to think that that would happen. But rather that there are a number of, a very large number of factors, you know, in these equations, and just because we don't put in a particular factor that we're worried about, like race or gender or income or geography or occupation, doesn't mean that the machine learning, as you know, Kyle, doesn't come up with it anyway. It could easily come to a discriminatory solution that was never written into the algorithm or intended.

- Maybe I'm being a bit fanciful here. I don't know if this is my pitch for an episode of Law & Order or a genuine concern, but do you think we're gonna face the day where some legitimate, provable digital redlining happens and a corporation or an individual says, you know, in a courtroom, don't blame me, we didn't know what the algorithm was gonna do?

- Well, that's a great scenario, because it illustrates that in some of the laws about discrimination, we have sort of two objectives. And, you know, one objective is to punish those who have committed a crime. And another objective is to eliminate the problem or to make a fair society. And some of the laws on discrimination really don't care if anyone intended to do anything. And so in this debate, we've been really focused on, oh, well, no one meant to put that in there. And that could be important, but equally important is simply realizing that there is some problem and correcting it. So for example, in many scenarios, we don't really care about whether there is a particular, let's say, racist thought in someone's head, because how would we prove that? It would make it impossible to ever address discrimination, because we would need some kind of amazing smoking gun every time. And so in fact, in the courtroom, cases about discrimination have really pioneered the use of statistical approaches to say, we don't know how it happened, but there's some sort of generalized problem here that we should take steps to address. And I think that's what we should think about with algorithms.
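To make that statistical style of reasoning concrete, here is a minimal sketch in Python. The counts are invented for illustration, and the "four-fifths rule" threshold is just the familiar rule of thumb from US employment-discrimination practice; none of the numbers come from the interview itself.

```python
# Minimal sketch: flagging a disparity in outcome rates between two groups,
# using invented counts purely for illustration.
from math import sqrt

def disparate_impact_ratio(hits_a, n_a, hits_b, n_b, threshold=0.8):
    """Selection-rate ratio plus the 'four-fifths rule' heuristic:
    flag if the lower rate is less than 80% of the higher rate."""
    rate_a, rate_b = hits_a / n_a, hits_b / n_b
    ratio = min(rate_a, rate_b) / max(rate_a, rate_b)
    return ratio, ratio < threshold

def two_proportion_z(hits_a, n_a, hits_b, n_b):
    """Is the gap between the two rates bigger than chance alone would explain?"""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# Hypothetical audit data: 90 of 600 applications from one group succeeded,
# versus 150 of 600 otherwise-comparable applications from another group.
ratio, flagged = disparate_impact_ratio(90, 600, 150, 600)
z = two_proportion_z(90, 600, 150, 600)
print(f"rate ratio = {ratio:.2f}, below four-fifths threshold: {flagged}, z = {z:.2f}")
```

A test like this says nothing about what was in anyone's head or in anyone's code, which is exactly the point: it is evidence of a generalized problem that someone should take steps to address.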
The challenge there is that a lot of the people who are now working on these algorithms really have backgrounds where they've never had to think about this stuff, and it's not their fault, it's just that the field has evolved, so now they're responsible for things like hiring decisions. And that's significant in a way that just generating a sort algorithm is not significant, and so it's time for the profession, as some of the people in machine learning have said, to kind of step up and realize that the ethics of what they're doing need to be addressed more systematically.

- Yeah, hiring's an interesting one in that, maybe I'm being too optimistic here, but I would very much hope that anyone building algorithms to work on some sort of hiring platform would not allow gender or race to even be variables in the system, because they're completely irrelevant, or should be. That isn't to say that they can't creep in there, like a zip code could be somewhat correlated with race, but I would hope the makers are smart enough not to put gender and race in as inputs. But there is something that creators of sites do a lot, which is A/B testing, trying out different experiments to see how they work, which in some sense then, yes, does show two very different results to two very different people, hopefully at random. Do you have any thoughts on the balance, or in terms of when I'm seeing different results, how I know if I'm part of a test versus an intended or unintended piece of discrimination?

- You know, that's an interesting point, because there was a controversy recently when Facebook admitted that it was performing a psychological experiment on its newsfeed. Maybe you remember this. It was the emotional contagion study. Basically what they were doing is randomly selecting people into two groups, and well, actually it was more than that, but let's just for simplicity say two groups, and one group saw a different newsfeed than another group. And this was particularly controversial for people because they changed the newsfeed in order to try and manipulate the degree of sad or happy messages that you saw, so they reduced some of the sad or happy messages that you saw. People felt very betrayed by this, and one of the responses from the software development community and from Facebook was that, you know, we do experiments all the time. In fact, if you use one of our products, you're probably part of one or multiple experiments right now. So what could possibly be the difference between this experiment and that experiment? And so I think that's a key point. There was a really spirited defense of experimentation in online platforms from the CEO of OkCupid, who maybe was also trying to sell his book on that topic. And so there was a lot of talk about this idea of experimentation and A/B testing. My reaction is that the word experiment, you know, it has a number of senses, and we're talking about something else. I mean, one of the things that people reacted to in this Facebook example, they didn't object to the fact that companies try to make their products better by refining their designs. I don't think anybody says no company can make their product better by refining their designs. I mean, I think what they objected to particularly was a psychological experiment written by psychologists, published in an academic journal, without the consent of the people participating or even their awareness.
It turns out that Facebook did not have in its terms of use anything about the fact that the feed could be used in this way. So there would be no way to ever consent to this. So I think A/B tests are great, and we're going to see a lot of them, and you're part of all kinds of product improvement experiments right now, and that shouldn't be a cause for alarm. But I think the thing that really gets people nervous is the idea where experimentation refers to scientific experimentation, and then we don't have the safeguards that we would expect, like informed consent and a corresponding concern about ethics that we would see in a psychology department, present in these online platforms.

- Do terms of use, in your opinion, or perhaps if you can comment legally, if that's buried in 30 pages of small-print font that I just click accept or cancel on, does that qualify as consent?

- So ethically, I think we want consent to be something that's effective, but it's very difficult to do. So typically in a university, if you do a study involving humans, you'll have a consent form that's really quite long, and it's a challenge to actually convey what's happening. Nonetheless, I think that these university protocols do convey something. So they have check boxes at the bottom that specify how your data can be used, and those are relatively clear, it seems to me, and people do check or uncheck them depending on their preference. So I think something is getting through there. The online platforms have argued that when they do experimentation, consent online is impossible, and sometimes they've said it's because their platform is too big. Well, this doesn't make any sense. I mean, obviously putting a little pop-up window on there is definitely not beyond the technical capability of a Facebook or a Twitter or any of these companies. It's just that they don't want to, because why would they want to put a whole bunch of complicated text in front of you? The main result of these consent processes is probably for you to think about the implications of your participation and perhaps decide not to participate. So they obviously don't want to do that to all of their customers.

- I'm of the opinion that people developing algorithms, be they data scientists or software engineers, have a certain responsibility. I mean, we don't have a Hippocratic oath, but I like the idea of do no harm, and that's in the ACM's charter if people are members of that. But I've seen enough accidents and sort of accidental discrimination go on that I'm not so arrogant as to assume that I or anyone else like me can do a good job for sure creating algorithms that are not going to discriminate in some way. I think in some sense that's why I'm really excited about this concept of algorithmic auditing, but before we jump head first into that, do you have any thoughts on where one's responsibilities lie as a designer of products and algorithms, given that you might have a boss saying, hey, we gotta ship the product as fast as you can? What's my social responsibility of due diligence?

- I really appreciate the comments you made, and I would love to go even further and frame it a little bit in an opposite way, and that's that one way to think about this is sort of, here we are designing these systems, we have to be careful not to screw up, and so what are our obligations to be sure we don't screw up?
In fact, I really am excited about the potential of computational systems, and I think sometimes in the past I've spoken about the dangers of algorithms and it leads people to think that I hate them, but I don't, I actually love these systems and I think they're very promising. And so I'd love to see the reverse question, which would be: what can advances in machine learning and data science do to eliminate discrimination? Because we've been talking about examples where these systems accidentally or intentionally do things that are undesirable, but we can use machine learning and train our algorithms to do things like explicitly watch for harmful consequences and eradicate them. I mean, it could be the case that a computational system that selects resumes as a part of a hiring process is much more fair than one that is done in an old-fashioned way with people just looking through a pile of what's mailed to them, and I think that's really exciting. And so what I'd like to see is a kind of frontier of machine learning and data science that was about sort of fairness and justice. And given the history of what we know about just generally how these kinds of media platforms will operate, we know there's going to be problems, so we know that there's sort of work to be done in the area of representation, in the area of privacy, in the area of responsibly dealing with people's data, in the area of representing their identity. We've earlier talked about discrimination of various kinds in hiring, by geography, so I would really love to see that as an exciting new area of machine learning, and I think we're getting there. I mean, there's some really interesting work; there was a keynote at a machine learning conference last year that was really an interesting sort of take on this. It said, well, you know, we always want more data, and usually we end up serving the majority, and we're happy if we make the most people moderately successful on some outcome that we're measuring, but what if we started thinking about machine learning as sort of lifting up these smaller groups that may not be well served by the summary statistics of old? I mean, it used to be that statistical reasoning was very crude and everything was about normality and the mean and the average, and now we have all kinds of other options. So it's not like Netflix is recommending to me one movie that it recommends to everyone, that's the average movie. No, so why don't we use this personalization to address issues of fairness? That's really exciting to me.

- Yeah, that's very exciting to me as well. I think there's an opportunity here given that, you know, algorithms are somewhat blind. They don't bring biases to the table. Maybe when we hear about these studies where they find that, you know, certain minorities don't get callbacks from resumes as often, things like that could be eliminated perhaps. So I'd like to get into a discussion about your paper, Auditing Algorithms: Research Methods for Detecting Discrimination on Internet Platforms. There'll be a link in the show notes, 'cause I would definitely recommend everyone check the paper out as well. But to begin that discussion, could you start, maybe the way you begin the paper, with a historic example of an early case where people started to question the operations of an algorithm behind the scenes?
- Fans of computing may know that a really important, historically significant use of computing has been in airline reservation systems, because it's a really complicated problem to manage a fleet of airliners and the seats on them. And so initially, a lot of advanced computing was used in defense, but one of the first commercial applications of heavy-duty computing by IBM and others was airline reservation systems. And in fact, the Sabre system was a pioneer, and that's the system that's still around, and it's still behind a lot of the travel sites. I don't remember if it's still at the bottom, but it used to be that if you went to some of the travel sites like Travelocity and Expedia, some of them would say at the bottom, powered by Sabre. So reportedly Sabre came about because the head of American Airlines took a flight with the head of IBM. First class, I imagine. And they were talking to each other, and the IBM person was talking about the different things they had done in coordinating radar installations over large distances. They thought maybe there was an application to airline reservations, and so American and some other companies, but I'm just focusing on American, invested a lot of money in designing one of the first large-scale airline reservation systems. And in fact, initially every American agent at the counter had this terminal, and then later travel agents could get them as well, and American expanded to do airline reservations for all airlines and not just American, because they had already built the infrastructure to do this, and that's a big advance. I mean, they actually used to have these bulletin boards located in strategic warehouses, and the bulletin boards would have pushpins to indicate if seats were occupied, and so to reserve a seat it would require calling these different warehouses and moving the pushpins, but then computers changed all that.

One of the things that happened with the Sabre system, though, is that some of the competitors to American Airlines, 'cause remember, American was an airline but then it also owned the reservation system, some of its competitors noticed that there were flights recommended first that did not seem like the flights that were the most likely to serve the request made by the travel agent. So the travel agent would put in some request, I want to fly to Miami, and they would get some sort of really expensive, really long American flight with a lot of connections before the flights of American's competitors. And so they got suspicious and they started looking into this. This all culminated, I write about this more in the paper, but it all culminated in a hearing before Congress where Bob Crandall, then the CEO of American Airlines, had to account for himself. Why is the Sabre system recommending these American flights above all the competitors? And he could have gone a bunch of different directions here. I mean, he could have said, "Oh well, it's our business. We're a private company. I'm not gonna tell you." Or he could have said, "Oh, it's an accident." If it was an accident, I hope he wouldn't lie. But he actually just said, "Why on earth would American invest all this money in airline reservation systems if we weren't gonna bias the results?" What's the point? Why would we invest all this money if we couldn't make some advantage out of it? It's not altruism, it's business.
So in the paper, I call this Crandall's law, and that's that when you're looking at some system that uses an algorithm to organize something that it is giving you, you should probably think of that fact: the fact that Bob Crandall said, "Why would we invest all this money in this system if it doesn't help us?" It might help you as well, but the goal is really to help the platform. Obviously the platform can't do a terrible job of helping you continuously, or you might notice and, if there's an alternative, go somewhere else, but generally we should be skeptical about these platforms. Actually, one of the first interesting uses of human-computer interaction was a unit Crandall came up with at American that he called the Screen Science Group, and that group just sat around figuring out how to order their results so that American would make more money, not so that the users of the system would have the best flights. So the idea behind auditing is just that we should take as a default position that some of the systems we interact with right now today are probably out to get us in some way that's not obvious, just as it wasn't obvious for Sabre until the competitors noticed, when the screen science became so blatant and so egregious that the 14-hour, $15,000 flight was the first one recommended. So yeah, and then in the paper I try to lay out what we might do if we take this assumption that some algorithms are probably out to get us: how would we see inside them, or how would we do something about this?

- Do you think the objective is for me as a consumer to be aware of the possible consequences of that interaction so I can maybe make an informed, free-market-style decision, or is it so that we have a lot of eyes on the processes that are being initiated, or perhaps it's all of the above? Well, I guess my question is, what's the objective of doing an audit of an algorithm?

- A lot of people have said that the way forward in this area, which is getting a lot of attention now, is algorithmic literacy. So algorithmic literacy is actually a stated goal of sort of US policy, and it's important in computer science education, but what they mean by it now is sort of this skepticism that I was talking about. I'm not sure that's really the right way to think about it, because I guess as an optimist about the power of computing, I don't think what I wanna do is take my children and scare them about all the terrible things that the computer is doing, so that you can be worried all the time and guard yourself. I think in fact that we collectively have the ability to build the systems that we want, and if we decide that systems should work a different way, we can make that happen, both as designers and also as a society. So that's what led me to this idea of auditing. So auditing is actually a term you're probably familiar with from finance, but there's a second meaning, which is my meaning, and that's the meaning from social science, originally from economics. Auditing was a process by which testers would vary the inputs to a system systematically to see if the outputs changed. This was originally done in the 1970s by economists at the US Department of Housing and Urban Development in order to try and combat segregation. So they would test landlords and realtors by doing things like sending white people and then sending black people in a random order to ask for listings. And then they would see if the listings varied based on the race of the people that they sent.
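Here is a minimal, self-contained sketch of that paired-testing recipe applied to a black box. The "platform" below is a toy stand-in with a disparity deliberately planted in it so the audit has something to find; in a real audit you would swap it for actual requests to the system under study, and all of the names and numbers here are assumptions for illustration only.

```python
# Paired (correspondence) audit in the HUD style: send two inputs that are
# identical except for one attribute, then look at the average difference
# in what the system sends back.
import random
import statistics

random.seed(0)

def toy_platform(profile):
    """Stand-in for an opaque scoring system (which listings, jobs, or prices
    you are shown). It quietly scores one group slightly lower."""
    score = 50.0 + 2.0 * profile["years_experience"] + random.gauss(0, 5)
    if profile["name_group"] == "B":
        score -= 4.0          # the hidden disparity the audit should surface
    return score

def paired_audit(n_pairs=1000):
    diffs = []
    for _ in range(n_pairs):
        base = {"years_experience": random.randint(0, 20)}  # held fixed within each pair
        a = toy_platform({**base, "name_group": "A"})
        b = toy_platform({**base, "name_group": "B"})
        diffs.append(a - b)
    mean_gap = statistics.mean(diffs)
    std_err = statistics.stdev(diffs) / len(diffs) ** 0.5
    return mean_gap, std_err

gap, err = paired_audit()
print(f"average A-minus-B gap: {gap:.2f} (standard error {err:.2f})")
```

Because everything except the audited attribute is held fixed within each pair, a gap that is large relative to its standard error is hard to explain away as noise, which is what gives correspondence studies their force.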
Very famous audit studies have done things like test car dealers, test employers. There was a recent audit study just last year that tested professors. I'm a professor, so I care about that. It worked by sending appointment requests that were identical but varying the ethnicity of the name requesting the appointment. And this found that the least likely group to receive a reply, if you didn't know the person and you requested an appointment, was actually, I believe, Asians, so Asian names. So usually when we do these audit studies we find widespread discrimination. It's very depressing, actually. So we find that black people are more likely to pay more for a used car, and that resumes from people with African American sounding names are less likely to receive callbacks, and so on. So we find that there is discrimination in society using these methods, and my paper just says, what if we apply these methods systematically to algorithms?

- So do I have to be a PhD computer scientist or data scientist or something like that to do such an audit?

- I think that there are some things that you can do on your own. One thing that we see in blogs and news stories about online systems is that a lot of bloggers have actually found interesting consequences of algorithms accidentally, simply by noticing something. A number of people have done things like, against the terms of use, they've made a second Facebook account under a different name, and then they've looked at their own Facebook account and they've been surprised that some things don't operate the way that they thought. This is what led bloggers to discover that the way that likes work on Facebook is not particularly intuitive. Some bloggers noticed that dead people were often reported to like things, and that puzzled them, and so they would ask their readers to try different things: has this ever happened to you? What are the circumstances? So I think yes, I think individually we can make some small steps toward discovering some opaque systems by sort of comparing notes with each other, sometimes by using more than one view into our own data, like by using multiple accounts. Sometimes that's called a sock puppet, using a fake account. But there are other problems that we talked about earlier that really require more systematic action, because we don't know if there's a serious problem in terms of, sort of, well, it's difficult on LinkedIn, if you're the one looking for a job, to tell if there's a problem in hiring, because you don't see the employer interface unless you make an account as an employer. So there's obviously some things that an individual messing around on their own is never gonna be able to do. And so there's also a role for some trusted third party that's going to be looking into this. We might call it like a Consumer Reports for algorithms, if you've ever seen that magazine.

- I think there's one more type of audit we can touch on. You'd mentioned the sock puppet audit. And then in some sense, well, actually two more. I think you've also implicitly mentioned the noninvasive user audit, where people are just comparing notes about experiences. Can you talk a bit about the scraping audit?

- Some computer scientists have really done some excellent work where they try to understand how proprietary, opaque systems operate by gathering all the public information they can about them. Some of these systems do things like, I guess a good example might be Netflix, which provides a lot of information about itself publicly on its website.
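As a concrete picture of the kind of "web browser and a script" approach he describes next, here is a minimal scraping-audit sketch. The URL, page structure, and CSS selector are hypothetical placeholders rather than any real site's layout, and, as the discussion below makes clear, a real audit would first have to reckon with terms of service and laws like the Computer Fraud and Abuse Act.

```python
# Hypothetical scraping-audit sketch: pull a publicly listed set of categories
# and compare what two different sessions are shown. The URL and the selector
# below are placeholders, not a description of any actual site.
import requests
from bs4 import BeautifulSoup

def fetch_categories(url, session=None):
    """Download a public page and collect the category labels it lists."""
    http = session or requests.Session()
    resp = http.get(url, timeout=10)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    # Placeholder selector; a real audit would inspect the actual page markup.
    return {a.get_text(strip=True) for a in soup.select("a.category-link")}

if __name__ == "__main__":
    URL = "https://example.com/browse/categories"   # hypothetical
    first_session = requests.Session()
    second_session = requests.Session()
    second_session.headers.update({"User-Agent": "audit-script/0.1 (research)"})
    seen_first = fetch_categories(URL, session=first_session)
    seen_second = fetch_categories(URL, session=second_session)
    # If a supposedly public listing differs between sessions, that's the kind
    # of pattern worth documenting and investigating further.
    print("categories only one session saw:", sorted(seen_first ^ seen_second))
```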
You can, for example, with a web browser and a script, try to figure out what are all the categories that Netflix uses to categorize movies, because that's listed on their public website. So the scraping audit, it's an attempt to get more data than you would get just messing around on your own, because that's not practical. But it just uses public data to try and figure out if there's some pattern that we might want to address. It doesn't necessarily prove that there's a pattern, but it can give you some evidence. So a scraping audit could do something like test for how the prices of something might change on different websites, or look at how, as I said earlier, the categories of something might vary, or what categories there are. A challenge of that kind of audit is that these are private companies and they've generally been pretty negative about the idea of even a well-meaning computer scientist scraping their publicly available web presence. And in some countries, there are laws that are really quite restrictive. In the US, there's the Computer Fraud and Abuse Act, which is really restrictive about the use of computer systems without permission. And so I think that with the companies, it's not that they're really that concerned about wrongdoing or discrimination being uncovered. I think if there was a problem, they would want to address it. I think instead, they're more concerned about their competitors. And so they just generally have a position that they don't want anyone going around scraping all of their data and taking it, 'cause they feel their data is what they have to offer.

- I'm somewhat sympathetic to that perspective, I think. In a similar vein, one of the other audits you'd mentioned is the code audit, where a third party would go and do some sort of inspection or test. I certainly can't speak for any of the big LinkedIns or Googles of the world, but I would imagine they would be apprehensive about that idea, and the response would be something to the effect of, this is proprietary, it's a trade secret. A Google or a search engine is always fighting against people trying to manipulate their system, so they don't want their code getting out, which would be the skeleton key for someone to go and do some black hat SEO type work. Is there any room for a code audit to happen? Or is this just something that, because of the proprietary nature of commercial software, we're unlikely to see?

- Well, I think there's a couple of things we could do. The first proposals about algorithms that came out were about transparency, and you still hear a lot of talk about that. You know, recently Google has had some antitrust troubles in Europe. I believe it was the German justice minister who was quoted in a news story, and he said, you know, the way to get out of these antitrust problems is for Google to just reveal its algorithm. And I think that demonstrates some naivete about the way that these systems operate. Like, I imagine there's a room somewhere and the German justice minister is sitting at the table and there's someone from Google and he says, show me the algorithm, and I'm just not sure what they would produce at that point that would be of any use to anyone, because this is gonna be very complicated, it's gonna be very long. And conceptually we know that the algorithm produces results by interacting with the users. And so it's the algorithm combined with the prior data that produces the outcome. And so just the algorithm without the data may not tell you anything.
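The point that the source code alone tells you little is easy to see with a toy example. The sketch below is not anyone's production system; it simply runs the same ranking code over two invented interaction logs and gets two different orderings, so reading the code would not tell a regulator what users actually end up seeing.

```python
# The same "algorithm," two different data histories, two different outputs.
# A toy popularity-weighted ranker; all of the data here is invented.
from collections import Counter

def train(click_log):
    """'Learn' item weights by counting clicks in a log of (user, item) events."""
    return Counter(item for _user, item in click_log)

def rank(items, weights):
    """Identical ranking code no matter what data it was trained on."""
    return sorted(items, key=lambda item: weights[item], reverse=True)

items = ["local news", "sports", "celebrity gossip", "city council meeting"]

log_a = [("u1", "sports"), ("u2", "sports"),
         ("u3", "celebrity gossip"), ("u4", "celebrity gossip")]
log_b = [("u5", "city council meeting"), ("u6", "city council meeting"),
         ("u7", "local news"), ("u8", "sports")]

print(rank(items, train(log_a)))   # gossip and sports float to the top
print(rank(items, train(log_b)))   # civic content floats to the top
```

Whatever behavior a "show me the algorithm" demand is worried about lives in the interaction between the code and the accumulated data, which is why code-only transparency, or the escrow idea discussed next, only gets you part of the way.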
And so I think, you know, some people have addressed your concern a little bit. Frank Pasquale wrote a fantastic book about this called The Black Box Society. And he has argued for a kind of escrow, where there would be a trusted third party that would keep the algorithm secret. So you would have to show the algorithm, but only to this trusted third party, and it would remain secret. I mean, I think there you still have the problem of complexity, and you have the problem that, not having the data, you wouldn't be able to predict the consequences of the algorithm. So I guess I'm a little skeptical about code audits. A related proposal by Ed Felten that I think is interesting, and it's kind of like a code audit, might be that there's a way to use computing to prove that an algorithm wasn't doing something or was doing something. So in other words, to come up with a proof in the sense of a mathematical proof, so that you might not see the algorithm, it may never be revealed, but an expert can assure you that the algorithm isn't doing something. And that's an interesting idea. Maybe we could share these proofs as opposed to the algorithms themselves.

- Yeah, there's something for me analogous to some of the points you made earlier about discrimination in a courtroom. We can't see what a person was thinking, realistically, because we're not psychics, but even if we were, the complexities of all the elements that go into one's decision are arguably analogous to the complexities of hundreds of engineers working on source code and data that maybe no one person really understands, yet we can measure both of these things statistically. So your proposal of perhaps a Consumer Reports for algorithms I think is a good one. Is there any movement in that direction? Are we seeing bodies forming around doing these sorts of things?

- Well, I mean, the New America Foundation has really done some great work in this area, and they've produced a kind of policy report which is called Big Data and Discrimination, which is an interesting one. And that's also been taken up. It's been mentioned in Slate and The Washington Post. And I think generally there's just some interest in this idea. There's a feeling that the news regularly includes stories about algorithms doing things that we don't want them to do. There was a cover story in New Scientist last year that kind of summed it up for me, and it was sort of like, the secret algorithms that run your life. I mean, generally the press on this has been pretty bad, and something needs to happen. And so generally, when the users of these systems, meaning you and me and the listeners, find some of the stuff that's happening, they really don't like it. And so it seems like we have to do something, and auditing does seem like a reasonable approach. Although it's really only a first step, right? 'Cause if we audited something, it's not clear that it would be easy to prove in some legal way that something was happening. It's more like a warning sign, or a siren that goes off like a fire alarm. If you found a pattern that was worrying, you would then want to address it by some other means. And that would mean you would have to have the will to address it. So for auditing to work, you would need to have a community of developers or the industry ready to change on the basis of finding something out, or you would need regulation that was ready to act, or you would need some legal course of action to allow you to file a lawsuit.
Because auditing in itself isn't going to tell you anything. Like, if Consumer Reports comes out and it says that a particular brand of washing machine is terrible, at least you need someone to read that, and to have a legitimate other option, and to choose to not buy that washing machine, otherwise it's kind of pointless.

- Are there systems that, at some point when they get so big, we should require some overarching auditing on them? You were mentioning Sabre, which as far as I know, whether it does today or not, was a monopoly at some point. You could make a case that we have a monopoly in search engines, just looking at the distribution of market share. At some point do algorithms become so important and so ubiquitous that it's reasonable we should put requirements on them, at maybe a governmental level?

- I don't think Sabre was a monopoly. There was a competing system that I can't remember the name of, but that time was very different, because then we had a regulated airline industry and there was a strong sense that the way the airlines worked is important to the country. And so actually the first algorithm transparency rule that I can find came from something called the Civil Aeronautics Board, which is what used to be the FAA in the US. And it was a funny, it's a funny rule. Like, it basically says that you have the right as a customer to write a letter to an airline reservation system demanding, it called it, their sort criteria. So basically that's an algorithm transparency rule. You write the letter and they have to tell you how they're deciding what flights to show you. I have been unable to determine if anyone ever sent such a letter, or what the result would be. So I don't know if there are any aviation historians listening to your podcast, but I would love to know if anyone ever requested this. This rule was in effect for a while, and then the airline industry was deregulated, basically getting rid of a huge number of rules, including that one. So I don't know if that ever worked, but it is the logic you were just talking about. People said, well, this is too important to just leave to chance, we'll have to do something. That's also the position of this recent report, Big Data and Discrimination. I mean, one way that the people in this report have put it is that we have a number of gains against inequality and injustice that we've made in the past, I don't know, 50 years, and that these are now threatened because we're re-implementing the processes of society with computers and we're not taking into account these problems. And so the concern that they have is, as you stated, that things like civil rights are too important to society to have them slip backwards suddenly just because we're using computers to implement them. And other authors have seen that as a real threat.

- Yeah, I sure hope that by chance we do have that historian listening, because that's a fascinating case I'd like to get to the bottom of. If they or anyone else want to find you online, where are the best places to look?

- My webpage is at niftyc.org.

- Great, I'll be sure to put that in the show notes. Are you on Twitter at all?

- I am. I'm @niftyc, and I also blog most regularly at the Social Media Collective.

- Very cool, I definitely look forward to seeing a future post there as well. Anything on the horizon for you that you want to throw in, any new projects or anything?
- One of the projects I'm most excited about that's new and isn't out yet is a project about how people understand algorithms. We know that people act on the basis of what they think the algorithm is doing, right? So you might post something on Facebook in a different way if you think that that way will get it seen by more people, if you think that Facebook will show it more. And so we have a new study coming out, led by Motahhare Eslami at the University of Illinois, and it's about folk theories of algorithms. So I'm really excited about that. So what do people think the algorithm is doing? Because we know that will shape their behavior, and we found some really interesting ones. One that I like the best: we found a widespread feeling that if you post something, you should like it really fast, 'cause that gives it a sort of a start. And then we found interesting theories that we never thought of. Like, one was that if you like something, you should also hide it. And the reason you should do that is you want to send Facebook a signal that you don't actually want to see more like that. So you might like something 'cause you want your friend to know that you're their friend, but then you hide it because you want to sort of balance out the like with some negative feedback, because in fact the post is boring and you don't really want to see more. And you know that if you just click like, you'll be sending the message, show me more things. So we found a variety of really interesting things. I think that's another frontier of learning about algorithms: how people think about them and how they come to conclusions about what's happening when they use these systems like Twitter and LinkedIn and Google and Facebook.

- That's fascinating. I'm looking forward to reading that, and I'll be sure to give it a mention on the show when it comes out. Well, Christian, this has been a great conversation. Thank you so much for coming on and sharing your perspective.

- Oh, it's a pleasure. And I'm a fan, and I'm looking forward to future podcasts on Data Skeptic.

- Wonderful, yeah. And until next time, I just want to remind everyone to keep thinking skeptically of and with data. (upbeat music)

- For more on this episode, visit DataSkeptic.com. If you enjoyed the show, please give us a review on iTunes or Stitcher. (upbeat music)