In this quick holiday episode, we touch on how one would approach modeling the statistical distribution over the probability of belief in Santa Claus given age.
Data Skeptic
[MINI] Belief in Santa
(upbeat music) - Welcome to another mini episode of the Data Skeptic Podcast. Joining me as always is my wife and co-host Linda. - Linda, hi, that's me. - So what's today, Linda? - Today is Christmas. - Uh-huh, did we do anything fun? - We went and ate dim sum in Chinatown. - Yeah, that was nice. So many people celebrate, well, depending on your perspective, there are many fictional characters related to Christmas. We're gonna talk about one in particular, so I guess maybe spoiler alert to any parents with kids around: we're gonna talk about Santa Claus. So did you believe in Santa Claus growing up, Linda? - Nope. - Not even a little bit? - Nope. - Did any of your friends? - Yes, I had a neighbor named Heather, and she believed completely and wholly. - Didn't she try and convince you? - I don't know about try, but she would be like, "No, Santa's real." And she said it with, like, such sheer belief. And we were like, "Well, how do you know?" And she's like, "Well, last year I left cookies out and he ate the cookies and drank my milk." - That's pretty good evidence. - "And he wrote me a note." - That's good. - "And I made a Christmas list and he got me a lot of the presents on it." - All those things are consistent with the Santa hypothesis. - Yep. - Do you think she still believes in Santa? - Nope, because my sister told her Santa wasn't real. I forgot why, but somehow Kim got mad at Heather. So to get back at her, she told her. - Well, why did she believe Kim? - I don't know, I think it was 'cause my sister said it in a mean way. Like, "Yeah, well, you know what? Santa's not real." Like that. I don't know, Heather just started crying, 'cause I feel like when you're in the heat of the moment, you're probably gonna say something truthful. - Fair enough. Do you think there are other ways that people stop believing in Santa that don't require crying? - Probably. I'd like to hear the stories, but we don't sit around and tell each other.
- Yeah, that's true. But you would say that the rate at which children believe in Santa decays with age, is that fair? - Assuming you believe at all, sure. - Well, that's what I mean. Some percentage of children do and some don't, and the percentage that do will be less and less the older they get. I think that's a fair assumption, right? - Sure. - So something we often might do as data scientists is try and model that process out. So you have a statistical model that tells you, given an age, what is the probability that the child will believe in Santa, independent of all other variables like gender and things like that. Looking just at age, you would model it. So this is an open question. I actually tried to do some research on this. I didn't find anything. There were no data sets, anyway, that I was able to locate that talk about the age at which people stop believing. There was some very anecdotal stuff that didn't really seem all that relevant, but I don't know. I guess you'd have to go out and survey for this if you wanted the data. But once you have the data, an important step is deciding how to model it, because you'd like to describe that with some sort of function or fit line or regression or something like that, wouldn't you say? - Well, I would just agree with you, whatever you say. - Well, a priori, before we have any data, what do you think that distribution would look like? - I would guess around 12. That would probably be the height of people stopping believing in Santa, if they believed at all. - So what does that mean? That the maximum believers are at age 12? - The maximum number of people who discover Santa is not real. - Oh, got it. So the greatest year of change is 12, is what you're saying. - It's the year in which people stop believing in Santa, if they believed at all. The age that they are at. - The maximum age? As in, no 13-year-olds believe in Santa? - No, by maximum I mean the most.
- The most defectors happen at age 12. Okay, so actually maybe we should back up. Do you know what I mean when I say model it, or find the distribution and what it looks like? - I assume you're gonna chart it. - Yeah, so I wanna know, given the age, what is the probability of belief for the average child? And then I wanna know how that looks over time. So let's just, for the sake of argument, say that at age four, it's an 80% belief rate. And you're saying by 12, it would probably be down very low, right, to like maybe a 5% belief rate. Something like that. - Maybe like 20%. - 20%. So it could kind of go down linearly, like it could lose an even number of percentage points every year from, what did I say, four to 12, eight years, a 60-point spread. Or it could go down, like, exponentially, where actually it goes from 80 to 60 to 40 and then by smaller increments every year. We don't know a priori what that distribution looks like. So the best way to do it is to gather empirical evidence, then hypothesize a function that would describe it, fit that to your data, and look at how good of a fit it is. Do you know much about looking at goodness of fit? - I think you mentioned this before. - There's lots of ways to do it. Generally, it's based on some measure of error, such as root mean square error. In other words, the square root of the average squared distance between the prediction and the observed value. And that will get you a pretty tight curve that works generally well for everything. There's lots of other sort of nuances and reasons you might pick slightly different methods, but that's the gist of it. So we talked about modeling belief in Santa. How do you think kids update their beliefs given new evidence? - Well, they either believe or they don't believe, or maybe there's a gradient. - Aha, tell me more about this gradient. - I don't know, I don't know what it would be. I didn't really sit down and talk with six-year-olds. Where do you fall?
This could go one to 10 on your belief in Santa. I never asked. - Well, you mentioned that the cookie and note-writing evidence was convincing for your friend, but then your sister's simple assertion was even more convincing. - Well, I'm sure Heather had her doubts. - Yeah. So how would she have expressed her percent probability that Santa is real, do you think? - I don't know if kids know percentages. - Good point. - I don't know. - So kids, do you think they're updating their beliefs in a rational way, or is it just kind of, yeah, how do you think that children take in new evidence? - I don't know. - Good question. - We don't have kids and I don't want any. So I don't really sit around thinking about how their beliefs are rational or not rational. - Well, you heard that NORAD has a tracker, right? - Yeah. - You don't find that convincing evidence that Santa might be real? - No, I think it's convincing evidence that a lot of people think Santa's important. - So what do you think would be the maximum age a child would believe in Santa? - I mean, it could be like any age, really. - I mean, not really. It can't be like 30. - Well, if someone has a mental problem, sure. - Oh, fair point. I mean, maybe if we take that out, it must go to zero at some rate. - You mean the average is zero? - Yeah. - Man, definitely by 16, they probably don't. - I guess, fair. So do you think it's a linear drop-off or an exponential drop-off or something in between? - Well, my random guess is 12 years old, so I feel like as they get older, they're more likely to not believe in Santa. - So the rate increases as they get older? - Yeah, probably. But that doesn't mean the rate at which they stop believing keeps increasing. I mean, I just feel like if most kids probably figure out by 12 that Santa's not real, then, you know, let's say the drop-off between 11 years old and 12 years old is 50%. I mean, that's a pretty big drop-off, but it doesn't mean that by 13 none of them still believe in Santa.
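As a rough sketch of the linear-versus-exponential comparison discussed above, the snippet below pins both candidate curves to the two figures mentioned in the conversation (80% belief at age 4, 20% at age 12) and scores each with root mean square error. The function names and the `hypothetical_survey` numbers are invented for illustration; no real survey data was located for this episode.

```python
import math

# Anchor points taken from the conversation: 80% belief at age 4, 20% at age 12.
AGE_START, P_START = 4, 0.80
AGE_END, P_END = 12, 0.20


def linear_belief(age):
    """Belief rate that loses the same number of percentage points each year."""
    slope = (P_END - P_START) / (AGE_END - AGE_START)
    return P_START + slope * (age - AGE_START)


def exponential_belief(age):
    """Belief rate that loses the same fraction of itself each year."""
    rate = math.log(P_START / P_END) / (AGE_END - AGE_START)
    return P_START * math.exp(-rate * (age - AGE_START))


def rmse(model, observations):
    """Root mean square error: square root of the mean squared residual."""
    squared_errors = [(model(age) - observed) ** 2 for age, observed in observations]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# HYPOTHETICAL (age, fraction believing) pairs, made up purely for illustration.
hypothetical_survey = [(4, 0.80), (6, 0.70), (8, 0.55), (10, 0.40), (12, 0.20)]

for name, model in [("linear", linear_belief), ("exponential", exponential_belief)]:
    print(f"{name:12s} RMSE = {rmse(model, hypothetical_survey):.3f}")
```

Whichever curve yields the lower RMSE describes these particular placeholder numbers better; with real survey responses the ranking could easily come out differently.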
- So you're thinking, I think we both agree it's not a linear drop-off. And it sounds like you're saying maybe it's more sigmoidal in nature, that it would have a low decay rate at young ages and then speed up as it gets to 12, which is sort of the greatest rate of change, and then slow back down as it approaches zero. - I don't know. - Well, interesting question. If anyone wants to post on our Facebook page what caused them to update their posterior beliefs in Santa Claus, maybe we could share some of those. So I guess the moral here is that this discussion sort of illustrates an interesting point about one of the data science exercises you have to do when you're modeling something new. You should start out with some expectation of what your data looks like, but then go collect empirical observations and see if your expected fit can adequately describe that data. If it can't, you may need to abandon your hypothesis and find some other underlying distribution which describes your data. So, final announcement: coming up January 3rd, 2015, in Seaside, California, I'll be at SkeptiCamp Monterey. Linda will be there too, but I'll be giving a talk called "How to Lie with Data and How a Skeptic Can Recognize Deception." So if anyone's local in the area, we'd love to meet you. So come on out. There's lots of other great speakers as well. If you want more details, go to dataskeptic.com, and there's a new events tab in the menu, and it's the only event in there. So you can go check that out if you're listening in real time, and hopefully we'll see somebody there. Merry Christmas, baby. - Happy holidays. (upbeat music)
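The S-shaped drop-off described in the episode (slow decline at young ages, fastest around some midpoint age, then flattening toward zero) can be sketched with a logistic curve. This is one possible model under stated assumptions, not a result from the episode: the parameter names `ceiling`, `midpoint`, and `steepness`, the coarse grid search, and the `TRUE_PARAMS` values are all hypothetical, and the data is synthetic, generated from a known curve only so that the fit can be sanity-checked.

```python
import math


def logistic_belief(age, ceiling, midpoint, steepness):
    """S-shaped decline: near `ceiling` at young ages, near zero at old ages.

    The steepest drop happens at `midpoint`.
    """
    return ceiling / (1.0 + math.exp(steepness * (age - midpoint)))


def rmse(params, observations):
    """Root mean square error of the logistic curve against (age, rate) pairs."""
    ceiling, midpoint, steepness = params
    squared = [
        (logistic_belief(age, ceiling, midpoint, steepness) - rate) ** 2
        for age, rate in observations
    ]
    return math.sqrt(sum(squared) / len(squared))


def grid_search_fit(observations):
    """Brute-force search over a coarse grid (a stand-in for a real optimizer)."""
    best_params, best_error = None, float("inf")
    for ceiling in [c / 100 for c in range(50, 101, 5)]:          # 0.50 .. 1.00
        for midpoint in [m / 2 for m in range(8, 33)]:            # ages 4.0 .. 16.0
            for steepness in [s / 10 for s in range(2, 31, 2)]:   # 0.2 .. 3.0
                error = rmse((ceiling, midpoint, steepness), observations)
                if error < best_error:
                    best_params, best_error = (ceiling, midpoint, steepness), error
    return best_params, best_error


# HYPOTHETICAL parameters and synthetic data; not measurements of real children.
TRUE_PARAMS = (0.80, 12.0, 1.0)
synthetic = [(age, logistic_belief(age, *TRUE_PARAMS)) for age in range(4, 17)]

params, error = grid_search_fit(synthetic)
print("recovered (ceiling, midpoint, steepness):", params)
print(f"RMSE: {error:.4f}")
```

With real survey data, a library optimizer such as SciPy's `curve_fit` would typically replace the grid search, and a high RMSE would be the signal to abandon the logistic hypothesis and try a different shape.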