Today's episode discusses the accuracy paradox. There are cases when one might prefer a less accurate model because it yields more predictive power, or better captures the underlying causal factors driving the outcome variable you are interested in. This is especially relevant in machine learning when trying to predict rare events. We discuss how the accuracy paradox might apply if you were trying to predict the likelihood that a person is a bird owner.
Data Skeptic
[MINI] The Accuracy Paradox
(upbeat music) - Data Skeptic mini episodes provide high-level descriptions of key concepts related to data science and skepticism, usually in 10 minutes or less. - Our topic for today is the accuracy paradox. - I'd like to start this out, Linda, by asking if you know what machine learning is. - No, but I would like to start by asking, what is machine learning? - Well, thank you for that. Any guesses? - Well, you said you do AI, artificial intelligence, so I assume it means you're training a machine how to learn. - Yeah, that's actually very good. I think a more formal definition, like a dictionary definition, would say something to the effect that it's the study of algorithms that learn from data in order to make predictions about the future. And even that's not necessarily perfect, 'cause you might learn from data just to have a descriptive model for some reason, like if it was about old historical stuff. But let's just go with making predictions about the future. If you're gonna make a prediction, the one thing most people would say you need is what? Accuracy. You need to know how accurate the predictions are. If I came to you and said, Linda, I can predict with 50% accuracy whether or not you should bet on a particular team in a sports event, would that be very good? - Well, I'm not gonna bet at 50%. - Oh, at what percent do you start to be interested in betting? - 75. - 75%. Have you ever bet on a sporting event? - I have. - Yeah, what did you bet on? Let me guess. - Horse racing, maybe. (laughing) - Really? - They have those racetracks. - Tell me about this, I didn't know that about you. - Or, you know, with March Madness, the football... not football, sorry, basketball. - Oh, that was big at your school, right? - Yes. We would bet by entering pools. I don't know if that's technically betting. - I think that not only is that betting, but it violates your student ethical code if I'm not mistaken. - Everyone did it. - You might get your degree revoked, too. - Well, I don't know if we always did money. I think we just entered online in a group, and then, you know, you can measure who's winning or losing, but some people would collect money and then whoever won would get that money. - Well, anyway, getting back to it, we would look at any system that makes predictions and ask, well, how accurate is it? What does accuracy mean to you? - Correct. - Yeah, it's synonymous with correctness, but how do you measure correctness? - If you had three questions and you asked them, and the person answered one out of three correct, then they have an accuracy rate of 33%. - Oh, how very frequentist of you. Indeed, they do. So you're hitting on a key thing here. You look at the historical data you know, and you ask, in the past, how many predictions were accurate and how many were inaccurate? And accuracy is the true predictions, or I should say the true positives, when you said something would be true and it was, plus the true negatives, when you said something would be false and it was. That's your numerator, and you divide those cases by all the cases, of which there are four, right? True positive, true negative, false positive, and false negative. The last two are also known as type one and type two errors, which we talked about in, I think, our very first episode. - Well, you're gonna have to refresh my memory. - Okay, well, type one is a false positive. A false positive is where you say something is true and it's not. And a false negative is where you say something is not true and it is. - Oh, okay.
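For readers following along, here's a minimal sketch in Python of the accuracy calculation just described, using hypothetical counts that mirror the one-out-of-three quiz:

```python
# Accuracy from the four confusion-matrix counts.
def accuracy(tp, tn, fp, fn):
    """Correct predictions (true positives + true negatives) over all cases."""
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical counts: one question answered correctly out of three.
print(accuracy(tp=1, tn=0, fp=2, fn=0))  # 0.333..., a 33% accuracy rate
```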
- So, let's get to an example: we want to predict whether or not a new person we just met is a bird owner. What would be a false positive? - A false positive is if we said, oh, I think they are, but they're not. - Yep, that's right. And then a false negative would be? - If we said they are not a bird owner, but they are, in fact. - Exactly, yeah, okay. So you got it. And then of course, there are the two other cases where we guessed correctly, whether we guessed true or guessed false. So those are your four cases, and it's actually very important to look at more than just the one accuracy result, 'cause accuracy alone doesn't tell the whole story. So for help in this episode, I went to the 2012 US Pet Ownership and Demographics Sourcebook as published by the American Veterinary Medical Association. Do you find this study to perhaps be a work of high quality? - Questionable. How many people bring their birds to the vet? - Well, good point, but I don't know if they asked these veterinarians to report numbers or to estimate the ownership, but by some means they arrived at these ownership numbers. That's a good question. What do you think they estimated was the percentage of households with a bird? Oh, first, guess the order. Like, obviously it's dogs or cats at number one, and the other at number two. What is the rank of bird ownership, do you think? - Let's see now, I guess dogs are probably number one. - Yeah, you're right, yep. - Cats number two. - Obviously, yeah. - And what's next, bird? - That's number three, cockroaches. (laughing) - What about hamsters? - Good guess, but it's actually birds in third place. - Oh, so birds are number three. - Yeah. In fairness, they only list the top four, so I can't really confirm that they didn't miss anything, but I'll trust that these numbers seem good. Now, guess fourth place. - A fish? - No, though yeah, I would have thought fish. Maybe they're breaking fish down by genus or something, 'cause fish should be on here. - Come on. - So what is it? - Well, it's horses. (laughs) - Oh my gosh. - Guess the percentage of American households that have horses. - I have no idea, but you're gonna give me a clue, and if I guess that, then I'm gonna know already for birds, so I think we should start with the birds. - All right, yeah, okay, good point, guess birds. - Well, here's the thing. At my work, there's 160 people. - Oh, good data, good data. - And I don't know all 160, but I think I'm the only bird owner. - So let's say it's one out of 160, so that's less than a percent. - And this presumes that your office is a representative sample of the United States. - Well, it probably isn't, now that I think about it, 'cause, you know, we're in LA. - Which is so unrepresentative, too. - What represents the US? That's a good question. - Not your company. I mean, you have a nice company, but I don't think you guys represent the country in any meaningful way. - Yeah, I don't think so either. - All right, so let's divide this out. - One out of 160, yeah, that would be a little more than half a percent... 6.25? - Well, it has to be less than that. - 0.625%. - So I'll just guess 1%. - Not too far off: 3.1%. - Okay, three times as much as what I guessed. - Now guess horses. - I'll guess 1%, then, on horses. - 1.5. - 1.5? - I thought that was shockingly high. - Horses are very expensive to care for and groom, et cetera, so I'm shocked. - Yeah, I know, right? That is number four instead of, like, fish. But I guess when you think of fish, their lifespan is so short; you have it for a week, and then the next thing you know, you don't. - Especially goldfish. - Yeah. - But yeah, I think they must have excluded something. There have got to be more goldfish households than bird households. - Yeah, they die. Like I said, you have them for a week, and then after it dies, you just give up. - Oh, it's like that last term in the Drake equation, I guess.
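The back-of-the-envelope math in that exchange is easy to verify; a quick sketch, using the AVMA figures quoted above:

```python
# Linda's office estimate: one bird owner among 160 people.
office_estimate = 1 / 160
print(f"office estimate: {office_estimate:.3%}")  # 0.625%

# AVMA 2012 Sourcebook rates quoted in the episode.
bird_rate = 0.031   # ~3.1% of US households own a bird
horse_rate = 0.015  # ~1.5% of US households own a horse
print(f"birds: {bird_rate:.1%}, horses: {horse_rate:.1%}")
```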
- In any event, back to our predictor trying to predict if a household has a bird: suppose we had demographic data, economic data, employment data, all this kind of great stuff we could get. How predictive do you think that would be of a household's likelihood to have a bird? - How predictive is what? - Every demographic feature you could want, and a bunch of other information, like what their TV shows are, if they have other pets, what kind of car they drive. - You know, I don't think it is very predictive, but I do want to share this little anecdote. I told someone at work one time, I was like, you know, I actually have a pet bird, and they're like, what? I did not place you as a bird owner. And I was like, yeah, I have a bird. And then they go, well, you know, all the bird owners I know are super weird. And then they said, now that I'm thinking about it, you fall into that category. - Wow, that's a little edgy. So I don't know, he, it was a he, was definitely building, from his limited data, a predictive model. - What made me so weird? I don't know what he's referring to. - I don't know, but it's interesting that he even thought to generalize that bird owners are weird. - I wonder if the listeners think we're weird. - Well, how many people, maybe 1% of the population, have a podcast about data skepticism? In any event, if I said I was going to build a model that predicted households that have birds, I could immediately, with no effort, build a model that is 96.9% accurate. You know how? - You just assume it's always no. - Yeah, always no, that's the model. Always say no, and you'll be accurate 96.9% of the time, if you trust the AVMA's survey data. Like the old expression: a broken clock is correct twice a day. Well, I mean, they never really say in what way it's broken, you know; that always bothered me. - Well, unless it's completely in shambles, and then it still has an hour hand and a minute hand pointing to a time. - Well, it could also be running slow. Like, it could be on a twenty-six-hour period. You could actually control the rate at which it's right per day. It's a bounded thing, though; it caps out. - Yeah, you know, if it's slow, then there's a rate of how slow it is, and then the time passing determines how many times it's correct a day. - How many periods it will overlap, yeah. So that 96.9% accurate model: would you use that if you were trying to find other bird owners that perhaps we could get Yoshi some play dates with? - No. - Why not? It's very accurate. - It's not accurate enough. - 96.9% accurate. - Well, I need to identify the bird owners, right? - Well, you can. You give me a person, and my model will predict yes or no, whether they're a bird owner. - Oh, well, I'd be very suspicious of the one person, like, "This is a bird owner," and then I reach out and they're like, "You are weird." - Yeah, so there's something interesting there. We already figured out what the model is, right? It always says no. So we know that's intuitively a garbage model.
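A minimal simulation of that "always no" baseline, assuming the AVMA's 3.1% base rate (the population here is synthetic, just to show that class imbalance alone produces the 96.9% figure):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic population: ~3.1% of households own a bird.
owns_bird = rng.random(100_000) < 0.031

# The "always no" model predicts that nobody owns a bird.
predictions = np.zeros_like(owns_bird)

print(f"accuracy: {(predictions == owns_bird).mean():.1%}")
# ~96.9% accurate, yet it identifies zero bird owners.
```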
But let's say I incorporated, got a bunch of funding, set up my big company, and hid my algorithm behind an API, and you had to pay an expensive amount of money to call my API and get a result. And I said it was 96.9% accurate, and you didn't know that it was all a scam behind the scenes. My accuracy statement is technically correct, but you don't trust it. Why? - Because you're always saying no. - Yeah, eventually you'll be like, something's a little fishy here, if you have enough sample data. And I think what it really drives at is that accuracy is not enough; you also want to look at metrics like precision and recall. Have you heard of those? - I have, but maybe you could tell us what they mean in your world. - Well, precision is the true positives, how many times you say something is in the class and you're correct, divided by every time you said it was in the class. So true positives, divided by true positives plus false positives as the denominator. And recall is, of the cases that actually were in the class, how many you successfully identified. So true positives divided by true positives plus false negatives. You want those in conjunction with accuracy; accuracy is still important, but alone it's not helpful, especially when you have class-imbalanced data. Now, I also want to give a shout-out to my more novice listeners, people getting into machine learning: this is the number one mistake I see people making. They come and they'll be like, "Oh, my model is high-90-something percent correct at predicting fraud," or predicting, you know, things like that. And I know immediately: oh, your model's wrong. You've either overfit the data or you haven't managed your class imbalance; you have the accuracy paradox affecting your model. And people sometimes get mad at me, because they're like, "You didn't even look at my work, how can you know?" And I'm like, well, because you can't just look at a couple of random features and then predict something very difficult to predict with high accuracy. So, like bird ownership, as we were saying earlier, not easy to predict, is it? - No. - Yeah. If I said, okay, it's a man in his 40s, he lives alone, he's an American but of Filipino descent, and he drives a Hyundai; predict whether he's a bird owner or not. How accurate can you be? What's the ceiling on that? - Well, I don't know, I don't really have a guess. - I think it's 96.9. - I don't have any stereotypes about bird owners. - Well, you should; we have some commonalities. I guess, actually, for me, the number one feature I would maybe look at is the other pets they own. If someone owns a cat, that might make them less likely to own a bird. Or it could actually make them more likely, if they're really big pet people or something. But it would certainly have some predictive power, unlike what brand of orange juice they drink. The paradox here is in the fact that you can have a less accurate model, but it's a better model, because it's going to capture the actual data a little better. So say someone came up with a model that was only like 50% accurate at predicting whether you're a bird owner by looking at the way you answered a survey. If you only look at accuracy, 50% is not nearly as good as 96.9. But if you look at it more practically, and you look at precision and recall, you'll recognize that the fine-tuned model is actually going to contribute useful information. It has some predictive power, and its construction might even tell you a little bit about what the important features are, versus the higher-accuracy model, which is not as good. So it's a paradox that you might prefer a less accurate model.
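Here's a sketch of those two formulas, with made-up counts contrasting the "always no" model against a hypothetical lower-accuracy survey model of the kind described:

```python
def precision_recall(tp, fp, fn):
    # Precision: of the cases predicted positive, how many were right?
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # no positive predictions
    # Recall: of the cases that actually were positive, how many did we find?
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# "Always no" on 1,000 households at a 3.1% base rate: tp=0, fp=0, fn=31.
print(precision_recall(tp=0, fp=0, fn=31))    # (0.0, 0.0) -- it finds nobody

# A hypothetical survey-based model with only ~49% accuracy (tn=469):
print(precision_recall(tp=25, fp=500, fn=6))  # low precision, ~81% recall
```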
- Do you think Google uses this in their AdWords? - In what way? - Because if they're trying to narrow down... let's say there's a company that is selling bird pellets, and they want to target bird owners. - Oh, so in AdWords you can't really target bird owners; you target by keyword. But that's a pretty good way to do it, because very few non-bird owners will look up bird pellets. At least, I think only our kind of weirdo looks up bird pellets. - You could be feeding the wild birds. Like, what is that, Mary Poppins, "Feed the Birds"? - I suppose, but that's still what they're interested in, right? So I think Google has more success just serving up pages based on what you've asked for than predicting. - Right, but they're not actually predicting. Let's say I search for a vacuum, and I already bought a vacuum; they don't actually know that. - Do you want them to know that? - Well, they just keep serving their ads for, like, vacuums, even though I already bought one. - Oh, I think you're mainly talking about the major flaw in retargeted advertising, but then we're off on another subject. Maybe we'll do retargeting sometime as a topic. - Okay. - We mostly do methodological stuff, not business stuff, so maybe next time we'll do ad retargeting, or if not, sometime early next year. - Okay. - So, at the end of the accuracy paradox, what's your takeaway from this? - I guess, in summary, when there are such low numbers and not much data to look at, it's better not to just be accurate. - Broadly speaking, that's right. Yeah, especially when there's a big class imbalance, a less accurate model might be more helpful and informative. - When do people use this in the real world? - I think you need to use it when you're being skeptical of claims made with data, just like the show is all about. So if someone says, "Oh, we've modeled out who's a credit risk," or "who's an employee that's going to quit in the next few months," or something like that, and they say, "Oh, this model is 90-something percent accurate" (people love the 90-somethings), first of all, I always ask myself, is that intuitively plausible? Do I believe that you could do that without using magic? Let's assume you've done the best possible job you can; what's the ceiling here? That's the beginning of being skeptical of a model. But then the accuracy paradox is to think, okay, am I interested in a very unusual case? People that have a rare illness; rare illnesses are below a few percent, right? Most illnesses, except maybe common stuff like halitosis, are rare, so it's probably hard to predict them. - Halitosis is bad breath; it's not an illness, it's a bacteria. You're not ill from it. - Well, the people around you might be. Or a small class like bird owners, which is only three percent. It's in these minority cases that the accuracy paradox can often be encountered, so you need to be a little bit skeptical of accuracy in a vacuum. I guess that's my takeaway. Well, thanks as always for joining me, Linda. - Thank you. - And until next time, I want to remind everyone to keep thinking skeptically of and with data. Goodnight, Linda. - Goodnight. (music)