Archive FM

Data Skeptic

[MINI] The Bonferroni Correction

Duration:
14m
Broadcast on:
22 Jan 2016
Audio Format:
other

Today's episode begins by asking how many left-handed employees we should expect at a company before anyone should claim left-handedness discrimination. If not lefties, consider eye color, hair color, favorite ska band, most recent grocery store used, or any number of other characteristics that could be studied to look for deviations from the norm at a company.

When multiple comparisons are to be made simultaneously, one must account for this, and a common method for doing so is the Bonferroni Correction. It is not, however, a surefire procedure, and this episode wraps up with a bit of skepticism about it.

[music] Data Skeptic mini-episodes provide high-level descriptions of key concepts related to data science and skepticism, usually in ten minutes or less. Today's topic is the Bonferroni Correction. So here's the way I thought I could explain this. Imagine you were hired by a company to investigate whether or not they have discriminatory hiring practices. And to do this, you're going to look at all of the people they've hired and all of the people who've applied over the last five years. And you're going to try and figure out, like, how often that company hires certain types of applicants and not others. Sure, so we're assuming whatever means they're using to get these resumes and applicants is non-discriminating too. Yeah, that's a good point. Like, if you go to, like, no ugly people resumes.com, and that's your only source, then, yeah, you're going to introduce a bias. And there's also something to be said about universities and training programs; they can have a bias, right? Like, you could have fewer graduates from some particular category, and then, you know, do you expect the same absence in the job market? So I think maybe for this conversation, we should avoid the sort of real world fact that there is gender discrimination, racial discrimination, age discrimination, because those are complicated issues that maybe we should talk about later on Data Skeptic. I just want to talk about simpler things, like maybe if you discriminate against someone's eye color, stuff like that. Well, we'll see after you prove your point. So let's pick maybe a trivial distinction, since we're going to over-trivialize this topic for demonstration purposes. Let's consider handedness. Do you consider yourself to be left or right handed? I don't have to consider; I am left handed. As am I. Isn't that nice how that worked out? We just can't figure out if Yoshi the bird is left or right footed. She usually sits on my left hand.
So let's think about a company. Do you happen to know how many other left handed people are at your company, where you work? Isn't there a statistic that says like 10% of the population? About 10% of people are left handed. We have like 160 people, so I think we're big enough to be representative. So you should have about 16 left handed employees then. What if you found out you only had 15 left handed employees? Should we protest? Get all our southpaw friends together and have a picket sign thing? No, that's probably fine. All right. What about only 10 lefties? That would seem odd to me. Yeah. How about only one leftie? Oh, yeah, incredibly odd. And even worse, what if you worked at a really, really big company and there were 999 righties and one left handed employee? Oh, yeah. Yeah, actually that one is suspicious to me too. Yeah, so there's probably a pretty large number of businesses with around 160 or fewer employees. Yeah, probably like 10% of the world's businesses. Sure. If I told you we searched all those businesses and yours was the only one that had a single left handed employee, would that seem equally suspicious as it did earlier? Yes. Why? So wait, you searched all of them and none of them had left handed employees? No, no. They were all good. Most of them had an average of like 15 or 16, and yours was the only company that had just one. Oh, if it's the only outlier, then probably not. There you go. So what changed from earlier? Well, I mean, I still think it might be suspicious. Like, 10,000 companies that have 160 people each; it is weird to only have one with such a low number. Oh, now you're going to go the other way and say that it should be more frequent with that many businesses of that size. Yeah. Yeah, you might be right. Actually, we could do that calculation offline. But so yeah, if you look at all of the companies of that size, you should actually expect a distribution over those values.
Like, the mean value: most of them should probably be around 16 left handed employees, but then some at 15 and 14, and so on, so forth. Sort of binomially distributed over all the possible values, right? What is binomial? Binomial is a type of statistical distribution that aggregates zero-or-one outcomes, like the toss of a coin. Yeah, I don't know what that really means. So I'll just have to agree. Well, so like, if you flip a coin 20 times and you want to say, what are the odds I get at least two heads, you can use the binomial distribution to answer something like that. Along similar lines, then, you can ask the question, well, what's the probability that only one company would have one left handed employee? That's how you get a statistical answer to that. It's like a hypothesis test. So you might remember, a long time ago, we talked about p-values, right? A little bit, yeah. That is the likelihood that something is due to chance. So just like we agreed earlier, if you expect 16 left handed employees and you only get 15, that's probably just luck of the draw, right? And what about 14? Probably luck of the draw. How about 99? 99 out of a company of 160. Yeah, yeah. Oh, that's crazy. Yeah, that's very suspicious. On the other end, that'd be like discriminatory to right handed people, perhaps. Well, yeah, plus that's harder because left handed people are only 10% of the population, so that's also impressive. Yeah. When you want to ask these sorts of questions, you need to get a p-value. And generally you're going to compare that, sorry for getting technical, but you compare that to some baseline called alpha. And a lot of people pick 0.05, which means that as long as the odds that your result was just total random chance are less than 1 in 20, we'll consider it probably a real result. 1 in 20 is 0.05. If someone tells you, aha, we proved something with statistical significance at an alpha of 0.05, that means that their result could be wrong.
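The left-handedness numbers from the conversation can be checked directly with the binomial distribution. Here is a minimal sketch in Python (standard library only; the helper name `binom_pmf` is my own, and the 10% and 160-employee figures come from the discussion above):

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent zero-or-one trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 160, 0.10               # 160 employees, ~10% of people are left handed
expected = n * p               # mean of the binomial: about 16 lefties
# P(at most 1 left handed employee) -- the "suspicious" scenario
p_at_most_one = binom_pmf(0, n, p) + binom_pmf(1, n, p)
print(expected)                # 16.0
print(p_at_most_one)           # just under one in a million
```

So an honest hypothesis test here would flag a single lefty among 160 people as wildly unlikely under the 10% baseline, which matches the intuition in the dialogue.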
And out of 20 results like theirs, one of them is likely to be wrong. So that's, you know, you can be pretty comfortable with the result, right? Maybe. If out of 20, only one of them is likely to be wrong? Oh, yeah, I guess. What's the percentage of 1 out of 20? 5%. So 5% risk of error. Yeah. Yeah, fine. But now that risk of error gets worse and worse if you check for many things. So a moment ago, we were checking, like, if it was weird that one company in the whole nation had only one lefty. But let's not stick to just your one company, okay? So let's say you did this, you surveyed everybody. And in fact, you have, like, 16 left-handed employees. Sure. No discrimination. Well, we're right on the average. What about then you're like, okay, I didn't catch them on left-handedness. Let me check everybody's eye color and hair color and what brand of muffins they like to eat. Eventually, do you think you might find something that would show that discrimination you're searching for? Well, it could be, for example, Southern California has a lot of Asians. So maybe we don't reflect the number of Asians in the state or in the USA. Which I think is probably between 5% and 10%, maybe even less than 5%, maybe like 3%. That could be. So yeah, we have like 20% Asians. They could be thinking we're just hiring too many Asians. Could be. That's actually making the conversation a little bit too complex in a different direction, because now you have to talk about, like, well, the local population might be composed differently than the national one, and the fields you hire from might produce more graduates of a certain race than another. And so the topic's really complex. So that's why I wanted to stay away from the real social issues that do exist and should be analyzed, like age, race, gender, and sexual orientation discrimination, those kinds of things. But just talk about goofy ones like your eye color and your hair color.
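The "risk of error gets worse the more things you check" point can be made concrete. If each test independently has a 5% false-positive rate, the chance that at least one of m tests fires by pure luck is 1 - (1 - 0.05)^m. A quick sketch (the independence assumption is mine, for illustration):

```python
# Family-wise error rate: chance of at least one spurious "discovery"
# across m independent tests, each run at alpha = 0.05.
alpha = 0.05
for m in (1, 5, 20, 100):
    fwer = 1 - (1 - alpha) ** m
    print(m, round(fwer, 3))   # at m = 20 there is already about a 64% chance of a fluke
```

By 100 goofy comparisons (muffin brands, Spielberg fandom, and so on), some "significant" finding is all but guaranteed.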
So do you think after enough goofy lines of inquiry, you'd find something your company might discriminate on? I don't know if you'd call it discrimination, but you might find some weird statistics. Yeah, 90% of the company loves strawberries. Yeah. Or they've never hired a candidate who said their favorite movie was a Spielberg movie. Yeah, something like that. And we'll be like, wow, we have a lot of non-Spielberg fans or something, you know, or whatever. So you're certainly, like, guaranteed to find something goofy that is unusual at your company if you just look hard and long enough. But you've made too many comparisons in that case. So you have to control for multiple comparisons. And the way we're talking about today, which actually we haven't really talked about yet, is the Bonferroni correction. And the Bonferroni correction basically says, remember how earlier we agreed 5% is a reasonable, like, chance for an error? Yeah. So that was when you looked at one test. What kind of probability would you expect if you looked at two tests? Probability? So what standard of evidence would you hold yourself to? Like, how rare would the result have to be before you think it's suspicious? I'm sorry, you lost me about which test you're running. So when we talked about left handedness, you would calculate the probability that you had however many lefties by chance. So if I said, oh yeah, there's about a 1 in 20 chance that you might only end up with a single left handed person, that wouldn't be too surprising, right? But a 1 in 21 chance, that's kind of where we'll draw the line. Or sorry, a 1 in 19 chance, that's where we'll kind of draw the line. We'll be like, all right, now it's just past the threshold we're comfortable with. So you just pulled that number out of nowhere? No, I was just talking about p-values in more of, like, one-out-of terms. 5% likelihood is equal to... have you ever seen a 20 sided die? Not sure, because I don't count the faces when there are more than 6.
Do you want me to show you a 20 sided die? Have you shown me before? I don't think I have. Give me one second, I'll give you one. I think you have. Thanks, though. There it is. What's it look like to you? Well, indeed, you have shown me. So now, if you rolled that one time and I tell you, go roll a 6, and you do... Try and roll a 6, actually. 4. All right. 2 away. Oh, I meant to say roll a 4, sorry. So now... 2. Okay, 2. So... Hi! If I actually called it and you did roll a 6, how impressive would that have been? Well, I know you, so I'm not impressed. What does that mean? You think I cheated? No, I love you. I love you, too. But what does it mean when you know me? You wouldn't be impressed. You can't be impressed with me, because you know me? Aye, statistically, it's whatever. Okay. And how about if, yeah, you got the 4, you didn't get the 6 I said. And I told you, keep rolling. And you're going to get a 6, keep going. How impressive would it be when you finally got that 6? Depends how many times I had to roll. We already had to roll like 4 times, and you still haven't hit a 6. Yeah. Eventually, you'll get one, wouldn't you say? Keep going. You want me to say? Yeah, yeah, keep going. Well, then I hope I get a 6, so I can stop. You're going to get a 6. Oh, it's a 9. Keep going. What is it, 10? Uh-huh. Well, it's going to be a 6, this will be a 6. 6! Oh, see? Look how good I am at predicting. [laughs] So, that's essentially what's happening if you're willing to test many hypotheses all at the same time. One of them is spuriously going to be right eventually. The Bonferroni correction is very simple. All you do is take the alpha you wanted at the onset, usually, you know, maybe 0.05, and divide it by the number of comparisons. So, the first extra comparison cuts it in half, and so on and so forth. So, that means for any of these results to be impressive, or outrageous, they have to be more and more unlikely. Otherwise, we'll consider them, you know, just due to chance.
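The division Kyle describes fits in a couple of lines. A sketch, using 10 comparisons as an arbitrary example:

```python
alpha = 0.05
m = 10                          # number of hypotheses tested (arbitrary example)
bonferroni_alpha = alpha / m    # each individual test must now beat 1-in-200
print(bonferroni_alpha)         # 0.005

# Under an independence assumption, the corrected per-test threshold keeps
# the overall chance of any false positive below the original alpha:
fwer_bound = 1 - (1 - bonferroni_alpha) ** m
print(round(fwer_bound, 4))
```

This is why the correction is so popular: it is trivial to apply and it guarantees the family-wise error rate stays under the original threshold.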
So, that's the equivalent, in this case, of if you had to roll a 100-sided die or a 1000-sided die. If I said you're going to get a 6, and you got it in that few tries on, like, a 10,000-sided die, that still would have been impressive, right? Still, that would have been impressive. Yeah. This one is not. Right, it's not, because we didn't control for multiple comparisons. And the Bonferroni correction is a very, very simple adjustment to help you do that. Now, it also is something I strongly dislike, because I don't think it's a good statistical process to use, but it's something you should be aware of. And the reason I don't like it is because I think it's too strict. If you make a ton of comparisons and you apply the Bonferroni correction, it means you're going to limit all your false positives. Like, you're not going to find any effect that isn't actually there. But you're also going to create a bunch of false negatives, because you set your standard of evidence way too high, because the Bonferroni correction is quite conservative. So what should people use? Well, good question. So, a couple of options here. One of my guests actually, Nikki (you know Nikki, we were at Nikki's wedding), when I was like, "You used the Bonferroni correction?" she was like, "Well, we just increased our sample size." Which is a fine statistical answer. If you have a very hard to achieve threshold of evidence, then having more observations, which is an option in her field, works out for her. For me, I often don't have the option of going and getting more observations. So I would look at something called false discovery rates. That's my kind of favorite way of dealing with the situation. But there's two other ways. There's something like Bonferroni; it's called the Holm-Bonferroni. And it actually works a little bit better than Bonferroni. But I wanted to start there as our mini topic, just to set the stage for this concept of multiple comparisons.
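Since Holm-Bonferroni only gets a passing mention here, a sketch of how it differs from plain Bonferroni: instead of holding every p-value to alpha/m, it sorts them and relaxes the threshold one step at a time. The function and the sample p-values below are made up for illustration:

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Step-down Holm procedure: compare the i-th smallest p-value to
    alpha / (m - i) and stop rejecting at the first one that fails."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break            # every larger p-value also fails
    return reject

# Plain Bonferroni (0.05 / 4 = 0.0125) would reject only the 0.001 result;
# Holm also rejects 0.013, because its threshold has relaxed to 0.05 / 3.
print(holm_bonferroni([0.001, 0.013, 0.03, 0.2]))   # [True, True, False, False]
```

Holm controls the same family-wise error rate as Bonferroni but rejects at least as many hypotheses, which is why it is described above as working a little bit better.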
And then there's this thing called the Šidák correction, which could be an episode in the future. And until next time, I want to remind everybody, in addition to controlling for multiple comparisons, don't forget to be skeptical of and with data. Goodnight, Kyle. For more on this episode, visit dataskeptic.com. If you enjoyed the show, please give us a review on iTunes or Stitcher. (upbeat music)