[ Music ]
>> Data Skeptic features interviews with experts on topics related to data science, all through the eye of scientific skepticism.
[ Music ]
>> Lachlan Gunn is a PhD student working on a degree in electrical and electronic engineering at the University of Adelaide. He is the lead author on a recent paper titled "Too Good to Be True: When Overwhelming Evidence Fails to Convince." It is this paper that particularly captured my attention, and it's the work I invited him to discuss here today. Lachlan, welcome to Data Skeptic.
>> Oh, thanks for having me.
>> So I'll definitely want to go into deeper detail, but to begin with, can you perhaps provide a high-level summary of the analysis you shared in "Too Good to Be True"?
>> So essentially the idea is that whenever you make a measurement, or whenever you have a measurement process, whether it be something electronic or something more social, there's a probability that the measurement process will fail. And as it turns out, even when there's a relatively small chance of failure, the probability that your result is correct doesn't always increase as your data becomes more consistent.
>> Yeah, it's a surprising finding. I was really appreciative that you stressed measurement a lot in this paper, because I think that's a key way to look at things. I'm not necessarily an expert here, but I imagine one could break down a taxonomy of the types of measurement errors, whether they be due to instrumentation or physical limits. Do you think that's relevant to the conversation?
>> Yes, I think it is important, because there are a lot of different types of failure that turn up in real systems. The basic distinction that every first-year student gets told about is whether you have a systematic or a random error. Often we tend to just ignore systematic errors, or we think about them but deal with them in an ad hoc way, unlike random errors, where there's a well-developed statistical approach and you can often assume that the error is independent and identically distributed. But when there's a potential for a systematic error, what this means is that you have the potential for unexpected correlations, and so your measurement process can behave quite differently. You have to think about what sort of failure you're going to have with your particular measurement system, a physical system, like you mentioned, that has the potential to fail catastrophically, and then stay failed for the rest of your measurements. For example, if a connection comes unplugged, then that's going to stay unplugged. That kind of fault is the kind of thing that we're really talking about here: you think you've made good measurements, but it turns out that all the way through, you've been making an error that you had hoped would only happen by chance.
>> Yeah, just in case anyone listening isn't familiar with the distinction, could you go a little bit more into the difference between a systematic and a random error?
>> A random error is one where you might make a measurement, and you'd expect it's going to be slightly off. So if I take a measuring tape or a ruler and then measure the distance between two points on a piece of paper, it won't be exact because, for example, the ruler might not be placed exactly with zero at the first point. What we can do is pick up the ruler, put it back again, measure it again, and the ruler will be at a slightly different place.
And so we'll get a slightly different measurement result. We can keep doing this again and again, and hopefully those errors in exactly where you've placed the ruler will cancel out, and when you average all the measurements, you get something that's quite reasonable. A systematic error might be, for example, that the ruler is miscalibrated, so the numbers on the ruler aren't quite in the right place, and each centimeter might really be, say, 1.05 centimeters. If that happens, you get a systematic error that is present in all of your measurements. And so if you plot all of your measurements on a graph, you'll see a nice consistent result, but there'll be an error there that's in all of those measurements.
>> Yeah, it's interesting. At the heart of the Too Good to Be True paper is, for me, sort of a paradox. It's this idea that there are situations in which I can get more and more confirmatory evidence, you know, multiple sources of observation, they're all telling me the same thing, and that as I get greater unanimity of outcome, I actually get less confident. That seems super counterintuitive to me. How do you convince a non-statistician of this result?
>> Suppose you run an experiment 1,000 times, so we've run it on 1,000 different people. We're running some test, say, for some gene, whether that gene is present. And you might say, well, half of people have the gene and half of people don't. So you run this on 1,000 different people, and you might have to do a number of different tests for a number of different genetic markers. What you'll find is that with people who have the gene, most or all of those markers should match, and with people who don't have the gene, very few or none should match. You'd expect, though, that not all of them would match, because no test is perfect. So you'd expect there to be a bit of strangeness: most people who have the gene will have, say, a 70% or 80% match. Now, there will be a few people where every single marker will match. But the problem is, well, what if there's an error in the machinery? It turns out that if you had, for example, 10 markers that you test for, you'd expect by chance that maybe one out of 1,000 people would match on all 10, even if they definitely have the gene; there's only a very small chance that every single one will match. But if you run 1,000 tests and there's a 1% chance of failure, you'd expect 10 people to show all 10 markers because of that failure. And so this is the approach that I use to visualize this, and how I try to resolve the counterintuitive problem in my head: you have to visualize, well, if you actually run this test lots and lots of times, how many times would you expect to get this failure?
>> Yeah, and there's a plot. This being an audio podcast, we can't include it; I'll link to the paper in the show notes and encourage everyone to check it out. Well, we'll get into the pottery example in a moment, but there's this really effective way that you've plotted it out. Maybe we could start there, talk about the pottery example and what that plot tells us.
>> Yes, of course. A nice little example that I've used to try to explain this problem without getting into anything too domain-specific is this: you've dug up a pot from the ground, you've looked at it, and it looks like it's Roman. You want to work out exactly where it came from.
So you want to work out whether it was made in, say, Britain, or whether it was made in Italy, on the continent. Perhaps one way you could do this, I suppose, is a chemical test. We've got some element or mineral or whatever, and let's say it's present in the Italian soil but not in the British soil. We can send a piece of the pot away to the lab, and they'll test whether that chemical is present. And so that gives you a way of saying, well, the chemical is present, so it must be Italian, or the chemical is not present, so it must be British. The problem is that the test isn't necessarily going to be perfect. So what you do is send away a few different chunks, they'll do the test several times, and you'll get your results back for each piece. If you've got enough that come back positive, you can say, well, it must be Italian. And so this is good. The question then is what happens if there are some pots, not all of them, but a few, where this breaks down. So suppose 1% of the potters in the Roman Empire, some of them in Britain, some of them in Italy, have a special production process, and they use this mineral, which will be picked up by the test, and which isn't normally used. If you happen to pick one of those pots, and only 1% of them will be of this kind, the test will always come back positive, and it doesn't matter whether it came from Britain or from Italy. That being the case, all those positive results you've gotten don't tell you anything about where the pot came from. If there's a lot of this chemical, you're not going to get much error, because if there's a huge amount of it, then the laboratory is going to be able to see that. You send a few pieces away, you send more and more, and the results keep coming back from the laboratory saying, "Yes, it's there. Yes, it's there. Yes, it's there." Now, you wouldn't necessarily expect that to happen if it were an ordinary Italian pot, not one of these strange ones, where there's normally only a small amount of the chemical. You might expect the first one to come back positive, the second one negative, the third one positive, positive, negative, some sort of sequence like that. The question then comes: you send your first piece away, it comes back positive. You send the second one away, it comes back positive. And you keep doing this again and again, and it keeps coming back positive. If there's no error at all, that has to make you suspect, well, maybe we've got one of these strange pots that has lots of the chemical and is always going to come back positive. But there aren't very many of them. So given that there aren't very many, how many of these tests do we need to do, and how many of them need to come back positive, before we can conclude that it's one of these strange pots? The more tests that are done, and the more tests that come back positive, the more certain we are of that, and therefore the less certain we are of where the pot actually came from. Because if you have one or two that come back positive, you can say, well, it's probably from Italy, because it's probably got the chemical. But once you get five, 10, 20 tests that have all come back positive, then that seems too good to be true, to use the catchphrase. And so you have to think, well, okay, maybe there is something strange going on, and maybe we've got one of these odd pots.
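The pottery story can be put into rough numbers with a small Bayesian sketch. The figures below are illustrative assumptions rather than anything from the paper or the episode, apart from the 1% rate of "strange" pots: assume a genuine Italian piece tests positive 90% of the time, a British piece gives a false positive 10% of the time, and a strange pot always tests positive. The sketch computes the probability the pot is an ordinary Italian one after k unanimous positive results.

```python
# Prior: half the ordinary pots are Italian, half British, and 1% of all
# pots are the "strange" kind that always tests positive (the 1% comes from
# the episode; the other numbers are made up for illustration).
PRIOR = {"italian": 0.495, "british": 0.495, "strange": 0.01}

# Probability that a single piece tests positive under each hypothesis.
P_POSITIVE = {"italian": 0.90, "british": 0.10, "strange": 1.00}


def posterior_italian(k: int) -> float:
    """P(ordinary Italian pot | all of the first k pieces tested positive)."""
    unnormalised = {h: PRIOR[h] * P_POSITIVE[h] ** k for h in PRIOR}
    return unnormalised["italian"] / sum(unnormalised.values())


if __name__ == "__main__":
    for k in (1, 2, 3, 5, 10, 20, 40):
        print(f"{k:2d} unanimous positives -> P(Italian) = {posterior_italian(k):.3f}")
```

With these made-up numbers the posterior peaks at around 97% after three or four unanimous positives and then falls away, dropping below 50% by around 40 unanimous positives, which is exactly the rising-then-falling confidence being described.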
>> Would you say this is an intrinsically Bayesian result?
>> No, I wouldn't say that this is intrinsically Bayesian. The original motivation for this question was, well, how do you combine sensors that give binary results? So you might have a camera that has a motion detector, for example, and you want to say, well, is there anyone in the room? And you've got a bunch of these cameras. Suppose 70% of them say that there's motion. What is the probability that there is someone in the room? Because you might expect them to give some false positives. You can imagine that as you get more cameras, you'd expect to be more sure of the result, and that just tends to be the nature of things, the central limit theorem and all of that. The question was, well, how do you amalgamate all of these, and how do you interpret the result? If you want to look at it from a frequentist point of view, you can draw the likelihood curves for the different numbers of positive and negative results for the two cases where you've got something present and not present. What you'll see is that you've got these two curves; they might look like bell curves, though they're binomial in the analysis we've done. If you've only got a few measurements, they start off quite broad, but as you take more and more measurements, they squeeze in further towards their means because of the central limit theorem. This tells you that you should expect to get close to the average expected number of positive results. But the interesting thing when you actually draw that picture is that when you're near the peak of the likelihood curve, so when you're near the average value for one of those two cases, the likelihood is very large, and so the situation is quite clear. But as you go towards the extremes, so as all of the data starts to be in agreement, as you add more measurements, more samples, the likelihood goes down as well. Sure enough, it goes down faster for the hypothesis that's not in agreement. So if all of your tests come back positive, then yes, it's more likely that the true result is positive than negative. But the likelihood falls to very tiny values, and then imagine you've got some third case, some failure case, which might be a constant line going all the way across. It might only be a very small probability, maybe 10 to the negative 2 or 10 to the negative 3. But when your other likelihoods are falling away exponentially, or faster than exponentially, it doesn't take very many samples before that failure case has greater likelihood than the measurement process you were hoping you had. And so this is the essence of the problem. When you have a binomial distribution, when you take a large number of independent binary measurements and just count how many positives and negatives you've got, the probability of getting a particular result falls off very quickly as you get away from the expected value. It doesn't take very long before it falls below just about any failure probability that you can imagine.
>> Yeah, it's quite striking, the curves in that plot I was mentioning; I really encourage everyone to go take a look at that. And I think the very title itself, Too Good to Be True, is really the most eloquent way one can put this phenomenon.
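For readers who want a rough numerical version of the likelihood picture just described, here is a sketch with placeholder numbers (none of them come from the paper): a sensor that reads positive 80% of the time when the effect is present, 20% of the time when it is absent, and a failure mode with probability one in a thousand that forces every reading to come back positive. It shows how quickly the "honest" likelihoods fall below that flat failure line as the results become unanimous.

```python
import math

P_PRESENT = 0.80   # P(single positive reading | effect really present), assumed
P_ABSENT = 0.20    # P(single positive reading | effect really absent), assumed
P_FAILURE = 1e-3   # probability of a failure mode that always reads positive, assumed

print(f"{'n':>3} {'L(all + | present)':>20} {'L(all + | absent)':>20} {'failure':>10}")
for n in (1, 5, 10, 20, 30, 40, 50):
    # Likelihood of n-out-of-n positives under each honest hypothesis.
    l_present = P_PRESENT ** n
    l_absent = P_ABSENT ** n
    print(f"{n:3d} {l_present:20.3e} {l_absent:20.3e} {P_FAILURE:10.0e}")

# How many unanimous positives before the always-positive failure mode
# becomes the more likely explanation than an honest run of luck?
crossover = math.ceil(math.log(P_FAILURE) / math.log(P_PRESENT))
print(f"\nAfter about {crossover} unanimous positives, the failure mode "
      f"overtakes the honest explanation.")
```

With these assumed numbers, the likelihood of an honest all-positive run drops below the constant failure probability after roughly 30 unanimous results.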
And maybe I'm alone in this, but it does seem so strikingly counterintuitive. It's something I want to remind myself of as I'm looking at measurements in the future. But I also wanted to ask whether you think my surprise, my labeling of this as a paradox, is related to the fact that we're talking about small measurement errors, that I'm sort of heuristically thinking I have a device that predicts very well, and therefore I trust it implicitly. Or do we see these sorts of too-good-to-be-true situations occur in less precise instrumentation as well?
>> It's certainly true that the paradoxical part of it is that it happens with instruments that you expect to be very precise. The reason for that being, if you expect the instrument to be reliable, then the probability of failure is very, very low. But because of the speed at which the exponential falloff occurs, like I was saying before, when you have something like a Gaussian curve, or a binomial curve, which approximates a Gaussian, it falls off as e to the negative x squared. So it falls off very, very quickly as you move away from the mean, and even exceedingly reliable devices can be subject to this. If you are running an experiment in reality, and you plug in your voltmeter or whatever, make a measurement, see one thing, do it again, see the same thing, and get the same data again and again and again, then intuitively you can say, okay, well, probably it's broken, and I should work out whether there's something wrong with it or not. That's certainly how we all tend to do things in practice when we're measuring things. But when we then go to do the data analysis, we don't really take this kind of approach. We'll set a threshold, and above it is positive and below it is negative, no matter how reliable the system was. It's this disconnect, between how we see things when we're doing them in the lab and how they appear when we do the statistics, that produces the apparent paradox in my mind. Certainly with less reliable instruments it has a stronger effect, in a way, depending on what you mean by reliability and precision. Because if you expect your measurements to be unreliable, then when they're perfectly consistent, you should start to be suspicious. The expression I think I used in my talk was that if you have a really bad measurement system, you shouldn't really settle for anything less than bad results.
>> Yeah, I'm going to make sure to link to that talk in the show notes. If anyone wants to continue this conversation and see some of the visuals, the YouTube page that has it will be linked there, so go check it out. One of the things I really enjoyed in the paper is that you've got a good collection of diverse examples. We talked about the pottery one, and I absolutely want to get to the cryptography one in a moment, but I thought we could, for a second, dwell on the eyewitness testimony example. Could you share the particulars of how this paradox creeps up there?
>> I thought it would be interesting to look at a non-technical problem, and I think this one is quite interesting because it obviously has quite substantial societal implications. One thing about the legal system is that it's been going continuously for many, many years, for thousands of years.
We look at individual cases, and we might say there's a certain probability that the outcome is going to be right and a certain probability that it's going to be wrong, but much like in the case of big data, you're doing lots and lots and lots of comparisons, and it's inevitable that you're going to get some wrong, and that some edge cases will appear. They might not happen very often, but if you are doing it lots of times, you need to take that into account. Now, the thing I'm talking about here is witness identification. The idea is that the police have picked someone up, and they wonder, well, did they really commit the crime? The witness has seen someone, say, robbing a shop. The police have found someone who matches the description. They bring them into the police station, they show them to the witness, and they want to find out, well, is that the person you saw or isn't it? Now, the way that they do this, in order to try to do it in a reliable way, is to put the suspect in amongst a bunch of other people who aren't under suspicion. So they'll take volunteers from the community, and I'm sure we've all seen this on television: the witness goes through, looks at the people with their various numbers, and says, that's the guy who robbed me. When they do this, you can look at it as an experiment. The police have made a hypothesis: this suspect has committed the crime. What we're doing is a measurement, an experiment that will try to confirm or reject that. There are a few possible things that can happen. The first is that the witness picks the suspect correctly, because the suspect is truly guilty, and the witness picks that up. That will happen roughly half the time if the suspect is really guilty. Another possibility, the opposite side of the coin, is that the suspect is guilty but the witness doesn't pick them correctly. This happens sometimes because they have to pick from a large group of people, they might have only seen the perpetrator for a short period of time, and in reality our memories aren't that good. So they're going to make a lot of errors; they'll usually pick someone, but they'll pick the wrong person. The other possibility is that the suspect isn't guilty. The suspect didn't really commit the crime, and the person who did rob the shop is not in the line-up. The witness then, ideally, can only really pick someone at random. So if you have 10 people in the line-up, which is the case in the UK from memory, and I believe it's 8 in the US, then there'll be a 1 in 10 or 1 in 8 probability that the suspect will be chosen, that the experiment will say they're guilty, even if they're not. Now, the problem in this case is that perhaps the experiment hasn't been carried out correctly. Sometimes it can be very obvious, where they'll have a suspect who matches the description the witness has given and no one else matches. And this does happen in reality, sometimes in fairly horrific ways. For example, there have been instances where the suspect was black and all of the others were white, and the witness had said the perpetrator was black. Obviously, the witness didn't really have any choice, and so the experiment hadn't really said anything about whether or not the suspect was guilty. That's an extreme example, but the problems can be less extreme.
For example, perhaps the suspect is dressed slightly differently, or if they're using photographs, sometimes one of the photographs will be newer than the others, and that will tell the witness, even if only subconsciously, to look at that photograph; if there's something that stands out a bit, you're more likely to pay attention to it, and you're more likely to choose it. So you get subtle biases that aren't always obvious. They can happen completely by accident, and maybe they only happen a small portion of the time; the probabilities we've talked about range from 1% down to 1 in 10,000. This is where more witnesses agreeing comes into effect. You set up the suspect and the other people, you bring the first witness in, they try to identify someone, they move out. You bring the next witness in, they try to identify someone, they move out, and you keep doing this until you've gone through all the witnesses. Sometimes it might just be one or two witnesses, but there have been cases with 50 witnesses. In the cases that I'm aware of where they had 50 witnesses, the witnesses all agreed, which is obviously a bit suspicious when you consider that, if the suspect is really guilty and the line-up has been done correctly, each witness will normally only get it right about 50% of the time. So when everyone has agreed, it tends to indicate, well, maybe there's been a mistake in the process.
>> You know, there's this expression that in a courtroom we want to be beyond a reasonable doubt. And I agree with that, although I guess on some level that's my sort of personal sense of social justice; there's no rigorous proof that we should be beyond a reasonable doubt. But now I'm reading through the paper and thinking, well, won't I always have a reasonable doubt? I'm wondering if you've thought about this and have a way of reconciling these two ideas.
>> Certainly. I mean, my understanding of the legal system, and I'm not a legal expert, is that the courts have generally been relatively resistant to providing an actual percentage figure as to what constitutes reasonable doubt. And the key word there is reasonable, because it might be that if it's relatively easy to become 99% certain when someone really is guilty, provided the police do enough work, then if you only get to 70 or 80% certainty, you really have to become suspicious: there's quite a big chance that they're not really guilty, and there are huge consequences if you get that wrong. The other side of the coin is that the purpose of it all is to try to reduce the amount of crime, and so if you let too many people get away, then crime goes up and you're not meeting the purpose of the system. And so reasonable isn't, at least to me, an absolute sense of, well, we can accept a 5% chance of false imprisonment, or 1%, or 0.1%, because some crimes are easier to prove than others, and to say that all of the difficult-to-prove crimes shouldn't be prosecuted would make the system a bit pointless. So you have to try to strike a balance, taking into account, well, what is the cost if we get it wrong, and what is the cost if we let people get away by mistake? In my mind, I can't say there should be no doubt whatsoever, but you have to weigh the costs and benefits in the process and try not to be unreasonable about it.
>> Yeah, I think that's an astute way to look at it.
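Coming back to the line-up numbers for a moment, here is a deliberately simplified sketch of why unanimity among many witnesses becomes suspicious. The assumptions are mine, chosen to match the rough figures mentioned above: an honest witness picks a guilty suspect about 50% of the time in a fair line-up, and roughly 1% of line-ups are biased in a way that steers essentially every witness to the suspect. The paper's model is richer than this; the point is only the shape of the result.

```python
P_PICK_FAIR = 0.5      # P(witness picks the suspect | fair line-up, guilty suspect), assumed
P_BIASED = 0.01        # prior probability that the line-up procedure is biased, assumed
P_PICK_BIASED = 0.99   # P(witness picks the suspect | biased line-up), assumed


def p_biased_given_unanimous(k: int) -> float:
    """P(line-up was biased | all k witnesses picked the suspect)."""
    fair = (1 - P_BIASED) * P_PICK_FAIR ** k
    biased = P_BIASED * P_PICK_BIASED ** k
    return biased / (fair + biased)


if __name__ == "__main__":
    for k in (1, 3, 5, 10, 20, 50):
        print(f"{k:2d} unanimous witnesses -> "
              f"P(biased line-up) = {p_biased_given_unanimous(k):.3f}")
```

With these assumed numbers, unanimity from three or five witnesses is unremarkable, but once ten or more agree, a flawed line-up becomes by far the more likely explanation, which is why the 50-witness cases mentioned earlier look so suspicious.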
And I really agree with that perspective. I was enjoying the paper tremendously as I was reading it, and then it took a turn, at least for me, that really amped it up when we got to the discussion of how this plays into cryptography. Could you share that particular example?
>> What you often do when you're trying to design a cryptographic system is pick some probability of failure, the probability that an attack will be able to break the system. The problem is then, well, what if the system has failed in a way that makes it fail insecurely? One of the examples that I've given, which is in the talk rather than the paper, is the idea that we have a function on a computer that we've written that just tries to encrypt some data. It has a loop that goes from zero to the number of bytes, and we go through each byte, encrypt, encrypt, encrypt, encrypt, and then send it off over the network. So that's good. And if you do the analysis, you can say, well, the encryption function we're using is secure; it will fail one time in two to the power of 128. So you can say, oh, well, that's really secure. The question then is, well, what happens if the computer fails, and what if all of that data doesn't get encrypted? Then you might, in fact, get data that's just sent off unencrypted. Well, the question is, what is the probability of that? And that turns out to be a lot higher than you would expect. For example, cosmic rays. You get radiation that comes from space, that comes from the atmosphere, that comes from small bits of radioactive material inside the plastic in the computer. And every now and then, a high-speed particle will hit a memory bit and flip it, which means you get some change in your code if you don't have error correction. What is the probability that this will happen? We don't get that many errors, obviously, otherwise computers would be crashing constantly. So they're relatively rare, even if they do happen from time to time. But there are so many bits of memory in a computer; if you've got, say, 16 gigabytes of memory, then you've got billions and billions and billions of bits. So even if a cosmic ray comes in and hits something several times a month, there's only a one in many billions probability that any individual bit is going to be hit. Intuitively, you tend to think, well, the probability that the system is going to fail, that we'll get some insecurity because of that, is small, and so it's relatively easy to simply neglect it. But after a certain amount of time, the probability that one of these bit flips has caused the system to fail can end up quite a bit greater than the cryptographic figure. And what we've found, and this is based on some numbers from Google, who have produced a paper about the memory errors they've found in their data centers, which is quite interesting because they've obviously got a very large number of computers running all the time and so a lot of data, is the probability that a bit is going to flip in any given length of time. As it turns out, it's roughly 3 times 10 to the negative 13 each month that any one particular bit is going to flip. And this doesn't seem like much; it's less than one in a trillion. But it's many, many trillions of times more likely to fail than the underlying cryptography. I can't really describe the difference without using scientific notation.
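The back-of-the-envelope arithmetic behind that remark: the per-bit figure quoted from Google's data (about 3 times 10 to the negative 13 per bit per month) against the nominal 2 to the power of negative 128 security level. This ignores how many bits are actually security-critical and how long the system runs, so treat it as an indication of scale rather than the paper's exact calculation.

```python
import math

p_bit_flip = 3e-13        # per bit, per month (figure quoted in the episode)
p_crypto = 2.0 ** -128    # nominal probability of the cipher being broken

# Compare even a single bit flip against the cryptographic failure probability.
ratio = p_bit_flip / p_crypto
print(f"single-bit flip / cryptographic break = {ratio:.1e} "
      f"(about 2^{math.log2(ratio):.0f})")
```

Even this crude single-bit comparison gives a gap of roughly 2 to the power of 86, the same ballpark as the factor Lachlan quotes next.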
It makes that big a difference. So when you take this into account, suddenly your probability of failure has jumped up by something like two to the power of 80, which is a big problem if you're trying to, say, sell a system based on this kind of analysis.
>> Yeah, I think it's especially relevant as we're in this era now of so-called big data, and I was glad to see your reference to Google's error metrics in there. It seems that, increasingly, we're going to find the moral of Too Good to Be True appearing in a lot of our analyses. So, is anything certain anymore?
>> Things are certain if they match your models. So it comes down to essentially a question of model validation. You can try to model your data, you can try to fit model parameters, and you can run this over your giant database and try to estimate how people behave, how a system behaves. But it's important, once you've made your model, once you've done that fitting, to go back and see, well, does the data actually fit it? That's the problem we have here, which is essentially: we've got a model that includes a lot of random error, but when we make the measurements, they don't show any of this error. By the choice of model we've used, by the way that we've designed our statistical tests, you can still get a clear result one way or the other. But the fact that, according to our model, the measurement is extremely unlikely means that perhaps our model isn't really valid. So I think it shows that even when there's only a small chance that the model is invalid, only a small chance that the model has failed, you need to go back and check every time whether that model is really relevant.
>> Yeah, I think that's sound advice. Validation is a step that's all too often skipped and I think is critical to the whole process. I also want to ask, as long as I have you here, what's the nature of your research? What are you pursuing in your PhD?
>> So the basic question that I'm trying to answer is essentially, when you have a system that processes data, when is error or noise a good thing? Because there are some cases, like this one, where if there's no noise in the data, it tends to indicate that something has probably failed and the data doesn't mean anything. But there are other situations where you can actually add noise and the performance of the system will increase. One example of this is called stochastic resonance, where you have some kind of non-linear system that has information going through it, and you can add noise and the signal-to-noise ratio goes up. One example of this kind of thing is where you have a thresholding system and you can use dithering: you add a bit of noise, and a signal that was below the threshold will suddenly go above it because of the noise. This lets you increase your signal-to-noise ratio despite the fact that you just added a whole lot of noise. This is an interesting type of question because we always tend to assume noise is bad: avoid noise, and if you see noise, it's a bad thing. My belief is that this isn't really the case. The reality is that the real world contains a lot of noise, some of which you can control and some of which you can't. Noise is important to how real systems function, and often if it's not present, those systems can break down. In this example, if you don't see experimental noise, it can mean the experimental system is broken.
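A minimal sketch of the dithering effect just described, with arbitrary numbers chosen for the demo (a hard threshold at 1.0, two sub-threshold signal levels, and Gaussian noise): without noise, a 1-bit threshold detector reports nothing at all about a sub-threshold signal, while with noise the average of many 1-bit readings tracks the signal level.

```python
import random

random.seed(0)

THRESHOLD = 1.0      # hard decision threshold of the 1-bit detector (assumed)
N_READINGS = 100_000 # number of 1-bit readings to average


def detector_average(signal: float, noise_std: float) -> float:
    """Fraction of 1-bit threshold readings that fire for a constant signal."""
    hits = sum(
        1 for _ in range(N_READINGS)
        if signal + random.gauss(0.0, noise_std) > THRESHOLD
    )
    return hits / N_READINGS


if __name__ == "__main__":
    for signal in (0.6, 0.8):  # both levels sit below the threshold
        without = detector_average(signal, noise_std=0.0)
        with_noise = detector_average(signal, noise_std=0.5)
        print(f"signal={signal}: no noise -> {without:.3f}, "
              f"with noise -> {with_noise:.3f}")
```

Without noise, both signals read as a flat zero and are indistinguishable; with noise, the averaged output differs for the two levels, so information gets through the threshold, which is the sense in which adding noise can help a thresholding system.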
I've talked about stochastic resonance, and that's been shown to be important in neuroscience, for example, because the brain uses this kind of approach: there's a certain amount of noise in the way that the neurons fire, and this noise tends to be quite important to the way the brain functions.
>> Yeah, it's interesting. I've seen some work in the deep learning community along those lines, and I'm starting to think that machine learning in general, which is some of my background, really needs to start looking at how noise should play into its models. I think there's something very interesting going on here, and I'm glad to hear smart people are doing research on it.
>> Yeah, I guess I didn't really touch on this in the paper, but machine learning is a very good application of this idea, because you have the problem of overfitting. If you know that your data is noisy, then if you fit really well, and your fit shows residuals that are smaller than the noise that's in the data, that tends to indicate that you've probably just fitted to the noise. So you need to do that kind of test, and I think, intuitively, we accept that: when we do our testing of the system, we'll ask, well, does that look reasonable? Are we overfitting? Are we just fitting to noise in the system? We might do that when we design the system, but I think what we've shown here is that it's important to also do it in operation. If you have a bad measurement system, you shouldn't accept anything that's not bad coming out of it. This may even extend to, for example, when you're trying to decide on a positive or negative result: you might have a threshold, positive if it's greater than this, negative if it's less than that, but probably what you should have as well is another threshold that says, if everything is too much in agreement, if all of the readings are positive or all of them are negative, you return some kind of inconclusive result. So I think the important thing to note is that this isn't a completely bizarre, out-of-left-field thing. While it's rare that it happens in practice, and it's not completely revolutionary, it's underappreciated that it does occur with real systems, and that's what we've tried to show: even when you have small probabilities of failure, this can be quite important. After we published our preprint, we heard from someone who had written a navigation system for spacecraft. It makes various measurements to try to work out position, direction, that kind of thing, it tries to estimate various parameters related to what the spacecraft is doing, and it has some measure of how sure it is of that estimate. But what can happen sometimes is that it makes an error and ends up being very, very sure of it. And so in this system, when it became too sure, they had to, I can't remember whether it was throw out the data or just reduce the certainty, but they had to limit how certain the system could become, because otherwise, no matter how much contradictory data came in, it would never change its mind. And another example that I didn't touch on before, a judicial one, is that around that same era, about 2,000 years ago, the ancient Hebrew court, the Sanhedrin, actually had a system like this. They had a panel of judges, and for a capital crime, they needed more than a bare majority of judges; it was a majority plus one or two.
The interesting feature, though, was that they said that if all of the judges agreed, so if every judge voted to convict, then the defendant would be acquitted, because if there's no one to disagree, if no one is taking the defendant's side, then surely that must mean there's some undiscovered evidence. And so it's interesting that this was understood intuitively, and built into systems, even thousands of years ago. It seems strange now, when we've got this nice clean picture where lots of evidence means positive and very little means negative, that adding complications like that might seem a strange novelty, almost a bit of a luxury, if you like. But it shows that even in the long-distant past this kind of effect was known and accounted for, and maybe when we design systems that have to perform this kind of hypothesis testing, we should consider taking this approach now.
>> Yeah, absolutely. I certainly will be going forward. So I want to really thank you for writing the paper. It meant a lot to me, and I'm glad you came on to share it with the listeners.
>> Oh, thank you very much. I'm really glad to talk about it with you.
>> Excellent. Well, until next time, I want to remind everyone to keep thinking skeptically of and with data.
>> So March 11th is the last day for the Relatively Prime Kickstarter.
>> Please, please give me more money. I'm probably still short. I need money. I need money. Oh, by the way, Samuel Hansen, host of Relatively Prime, relprime.com/kickstarter. I need your money to keep doing shows about mathematics.
>> Yeah, time's running out, so head over there now. It's in the show notes, as we've been talking about the last two weeks. Go support Relatively Prime season three.
>> Yeah, if you don't give me money, I'm gonna have to go back and do, like, data analytics as a job or something like that. Who would want to do that? Who would want to get paid real money to do a fun job when I could get paid almost no money to make these wonderful podcasts? I'm starting to talk myself out of this. Yes, Relatively Prime, relprime.com/kickstarter. I'm going to do amazing episodes if you give me money, including an hour about the four-dimensional number system that broke algebra free from the shackles of arithmetic. It's going to be an amazing show. You want to hear that; that alone is worth the few bucks that you're going to give me. And you can get great rewards, like, say, a notebook that says "this is for math" on the front of it. Come on, you know that you want that. So just, you know, go give me a few bucks. Please help me out. This is my livelihood and I need your help. Thank you.
>> And thank you all so much. Thank you so much, Samuel. Yeah, please go over there and support the show, because I certainly want to listen and I need your help. And yeah, so hopefully we'll hear from you again, Samuel.
>> Oh, I certainly hope so. If not, you're looking for, like, you know, an intern or something. We'll find out.
>> For more on this episode, visit dataskeptic.com. If you enjoyed the show, please give us a review on iTunes or Stitcher.