Archive FM

Data Skeptic

Models of Mental Simulation

Duration:
39m
Broadcast on:
26 Feb 2016
Audio Format:
other

   

Jessica Hamrick joins us this week to discuss her work studying mental simulation. Her research combines machine learning approaches with behavioral methods from cognitive science to help explain how people reason and predict outcomes. Her recent paper, "Think again? The amount of mental simulation tracks uncertainty in the outcome," is the focus of our conversation in this episode.

Lastly, Kyle invited Samuel Hansen from the Relatively Prime podcast to mention the Relatively Prime Season 3 Kickstarter, which needs your support now through Friday, March 11th, 2016.

[ Music ] >> Data Skeptic features interviews with experts on topics related to data science, all through the eye of scientific skepticism. [ Music ] >> Jessica Hamrick is a PhD student at UC Berkeley researching mental simulation from a computational perspective. She's a member of the Project Jupyter Steering Council and is a contributor to a number of noteworthy Python packages. I recently read her paper, "Think again? The amount of mental simulation tracks uncertainty in the outcome," and it's that paper that will start our conversation today. Jess, welcome to Data Skeptic. >> Hi. Thanks for having me. >> Oh, it's my pleasure. Before we get into the paper, I was wondering if you could describe what your research interests are. >> I'm broadly interested in mental simulation, which is this ability that we have to imagine things and solve problems based on that. So, like, one classic example of using mental simulation is if I ask you to tell me how many windows you have in your house. Many people will solve this problem by actually imagining what their house looks like and like imagining walking through all the rooms in their house and counting all the windows. And so, by using this type of mental simulation, you're able to learn something new that you didn't explicitly know before. And so, I study mental simulation by combining methods from machine learning with behavioral methods from cognitive science in order to model that. And I'm interested in sort of these like higher level questions about mental simulation. Like, when do people use it? What specific types of simulations do they run? So, like, would you imagine walking through your house in one particular way or another way? How many simulations do they run? And in general, how do people learn new things from mental simulation? >> There's a picture that's on your website that I'll be sure to put in the show notes. I think this is one of those cases where a picture is truly worth a thousand words, but this being an audio podcast, maybe you could give us a few hundred, or a few dozen. Maybe describe that and why, in my opinion, it so eloquently sort of describes the type of work you do. >> Are you talking about the tower of blocks picture? >> Yes, indeed. >> Yes, okay. In some of my previous work, we've looked at how people reason about physical objects. So, if you're familiar with the game of Jenga, in which you have a stack of blocks and you have to pull them out and put them on top, people are able to play this game and reason about the stability of that tower. And that sort of inspired a set of experiments that we have in which we show people these Jenga towers of blocks and we ask them to predict whether they'll fall or what direction they'll fall in and you can have the blocks have different masses and then say, okay, if the mass is this way, then it'll fall in one direction. But if the masses are flipped, then they'll fall in another direction. And you can change all these sorts of different variables and it seems like people are pretty good at still being able to run these mental simulations. And really, when you look at the photo, it really feels like you can imagine seeing it fall, even though it's just a static image. >> Yeah, so in the paper, you mentioned the so-called Noisy Newton Hypothesis. Can you define it and summarize the empirical support for it? >> The Noisy Newton Hypothesis is essentially the idea that people do have accurate knowledge of Newtonian physics that they can use to reason about the world.
But that knowledge is a little bit noisy and the noise comes from things like perceptual uncertainty. So maybe you're not entirely sure where the positions of the objects are in the world or what their orientations are, or maybe if they have unobservable properties like mass or friction, you have some uncertainty about what those values are. This is sort of in contrast to previous work, which has suggested that people are really bad at reasoning about the physical world and that they just rely on a set of heuristics and cues. If you're in the scenario where you see two objects collide and you have to infer which object is heavier, there's one heuristic that's been proposed that says that, if they collide at roughly the same speed, the object that's moving faster after the collision is the lighter object. And so the Noisy Newton Hypothesis says, well, we think we can actually explain those types of heuristics in terms of having accurate knowledge of physics that is somewhat noisy. And so that was first proposed by Sanborn, Mansinghka, and Griffiths. And they looked at these types of inferences about mass. It's since been explored further by other people, including myself. So I already mentioned making predictions about towers of blocks. There's also been work by Kevin Smith and Ed Vul, who are co-authors on this Think Again paper. And they've looked at what are the sources of uncertainty that come into play when you're looking at balls bouncing around on a table. There's two components to this, which is a little bit of a subtle distinction. But there's one idea, which is just that you have this knowledge of Newtonian physics, but it's noisy, but it's not said explicitly how that knowledge comes into play. And then the sort of more specific hypothesis is that people are actually using simulations as a way to actually access this knowledge. Interesting. When I began reading your work, specifically with respect to this idea of people running a bunch of different noisy simulations, it reminded me, as someone who does a lot of machine learning work, it reminded me of Bootstrap Aggregation, or as it's more commonly known, Bagging, which is a popular concept in ML. I'm wondering if you're familiar with this and if you see them as similar ideas, or am I reaching too far in my analogy? I'm familiar with Bagging. I mean, they're kind of, they're related, but they're not really the same thing. So probabilistic physical simulation is really just a type of Monte Carlo simulation, where, as I mentioned, the noise comes from either perceptual noise or perhaps process noise. Bagging is an ensemble learning technique where the noise comes from errors from incorrect model fits because you have different data sets that you're fitting to. In both cases, you do end up with a distribution over perhaps what the correct prediction is, but the sources of noise are different in those two cases. Makes sense. Do you think people are running their simulations in serial, like one by one, or do these run kind of in parallel in the mind? Yeah, that's a great question. This one comes up a lot. It's actually really hard to say, so if you're just looking at accuracy, for example, one hypothesis is that people might run coarse simulations in serial, or maybe they run detailed simulations in parallel. So you might end up with, on average, the same accuracy. You might be able to take a look at the variance, and that might tell you something.
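To make that distinction concrete, here is a minimal, hypothetical sketch in Python (nothing from the paper itself; the toy model and parameters are invented for illustration): in a probabilistic physical simulation the noise enters through the perceived initial conditions fed into a deterministic model, whereas in bagging the noise enters through refitting a model to bootstrap resamples of a dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Probabilistic (Monte Carlo) physical simulation: the noise is perceptual.
# Toy 1D "physics": a ball at perceived position x0 moves at velocity v for time t.
# We are unsure about x0, so we sample noisy percepts and push each one through
# the same deterministic model.
def simulated_positions(x0_perceived, v=1.0, t=2.0, perceptual_sd=0.1, n_sims=100):
    x0_samples = rng.normal(x0_perceived, perceptual_sd, size=n_sims)
    return x0_samples + v * t

# Bagging: the noise comes from refitting a model to bootstrap resamples of data.
def bagged_predictions(x, y, x_new, n_models=100):
    preds = []
    for _ in range(n_models):
        idx = rng.integers(0, len(x), size=len(x))        # bootstrap resample
        slope, intercept = np.polyfit(x[idx], y[idx], 1)  # refit a simple model
        preds.append(slope * x_new + intercept)
    return np.array(preds)

# Both yield a distribution over predictions, but the noise has different origins.
sim_dist = simulated_positions(x0_perceived=0.5)
x = rng.uniform(0, 1, 50)
y = 2 * x + rng.normal(0, 0.2, 50)
bag_dist = bagged_predictions(x, y, x_new=0.5)
print(sim_dist.mean(), sim_dist.std())
print(bag_dist.mean(), bag_dist.std())
```

Both produce a distribution over predictions, which is the resemblance Kyle is pointing at, but only the first reflects uncertainty about the state of the world rather than about the fitted model.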
So if you're running detailed simulations in parallel, you would expect there to be lower variance because each individual simulation is going to be more accurate, but it's kind of hard to disentangle those for sure. The one thing that we do know from this paper and from a lot of other empirical work is that people's response time changes for different stimuli, depending on the difficulty of the stimulus. So there certainly is something going on there that's causing people to change the amount of time that they take. And so that could be either due to the fact that they're running multiple simulations in serial, or maybe spending more time on a single simulation. Finding out the difference between that, though, is quite tough. So maybe we could take a moment and describe the experiment that you ran in Think Again, what the setup was and what the participants experienced. The hypothesis that we had was basically that people should run different numbers of mental simulations when it's going to be more informative to do so. And so we set up a task that was going to have different trials with different levels of informativeness. So we showed people a 2D box from a bird's eye view. So you can sort of think of if you're looking down from above at a billiards table and you see the balls moving around on the table, we had just one ball and the box has four walls, just like a billiard table. But we also put an additional wall in it with a hole in the middle of it. And the hole could vary in size. And so people saw the ball moving in a particular direction. The ball would then be hidden from view and they would have to answer whether they thought the ball was going to go into the hole or not, the hole that's in that additional wall. We varied the trials by actually having different margins by which the ball went into the hole or not. So the ball could either go in the hole by a very small margin. So just go in right at the side. Or it could go in the hole by a wide margin, so it would go right in the middle of the hole. Or it could not go in by a wide margin. It would hit the wall somewhere far away from the hole. Or it could not go in very close to the hole. So it would just bounce off the side just barely. Varying these margins varied also the difficulty of the task because if we're operating under this assumption that people have noisy perception and they're not entirely sure exactly which direction the ball is traveling, if it only goes in by a small margin then it's possible that the noise would cause them to think that it actually wouldn't go in. Whereas if it goes in by a wide margin, even if there's a little noise, it shouldn't change the answer all that much. And what did you find? So we found that people did in fact take a longer time to respond on the trials that were more difficult. So the first thing that we looked at is sort of this qualitative relationship between people's accuracy and their response time. And what you find is that when people's answers are more accurate, so they're either very certain it's not going to go in the hole or they're very certain it will go in the hole, then they're much faster to respond and when they're very uncertain, so like when half the people said it would go in and half of them said it wouldn't go in, they take much longer to respond on average. We compared these response times and responses to a model that's based on SPRT or the sequential probability ratio test.
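As a rough illustration of why the margin manipulation controls difficulty, here is a simplified sketch of a noisy simulation of the task; it is not the model or the parameters from the paper, and the geometry (a box with a hole in the far wall) and the noise level are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def ball_goes_in(x0, angle, box_width=1.0, wall_y=2.0, hole_center=0.5, hole_width=0.3):
    """Propagate the ball to the far wall, reflecting off the two side walls."""
    dx = wall_y * np.tan(angle)          # horizontal travel by the time y reaches the far wall
    x = (x0 + dx) % (2 * box_width)      # "unfold" reflections off x = 0 and x = box_width
    x = 2 * box_width - x if x > box_width else x
    return abs(x - hole_center) < hole_width / 2

def prob_in(x0, angle, perceptual_sd=0.03, n_sims=1000):
    """Fraction of noisy simulations in which the ball ends up in the hole."""
    noisy_angles = rng.normal(angle, perceptual_sd, size=n_sims)
    return np.mean([ball_goes_in(x0, a) for a in noisy_angles])

x0 = 0.2
wide_margin_angle = np.arctan2(0.50 - x0, 2.0)   # aimed at the hole's center
small_margin_angle = np.arctan2(0.64 - x0, 2.0)  # aimed just inside the hole's edge
print("wide margin:", prob_in(x0, wide_margin_angle))    # close to 1: simulations agree
print("small margin:", prob_in(x0, small_margin_angle))  # near 0.5: simulations disagree
```

With a wide margin almost every noisy simulation agrees, so the judgment is easy; with a small margin the simulations split closer to 50/50, which is exactly the kind of trial where people slow down.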
We find that that does a fairly good job at capturing what people are doing. And so, you know, initially it qualitatively captures the same relationship that we find where people take longer to respond when they're more uncertain. That's something that's predicted by SPRT. And then when we actually like look at SPRT predictions in comparison to people's responses and their response times, we also find pretty good fits there. Yeah, I like SPRT a lot. I think it's a novel metric. Could you maybe unpack it a little bit more so that the listener has a good sense of what it is? So it's an optimal strategy for making 2AFC, or two-alternative forced choice, decisions in the sense that for a fixed level of accuracy, it's going to be the fastest thing to use. It'll give you the answer in the fastest amount of time. The way that it works is it's essentially a random walk. So you set a threshold, so for example, t equals 2, and you have a counter that you start at 0, and then you take a sample and every time you get a positive sample so the ball goes in the hole, you increment the counter, and every time you get a negative sample, so the ball doesn't go in the hole, you decrement the counter. And you keep taking samples and incrementing and decrementing until you reach either the threshold or the negative threshold. To give an intuition for this and why it predicts the same relationship that we found in people's responses: if the probability that the ball goes in the hole is 1, then it will always go in the hole, and so you'll take exactly t samples, so if t is 2, you'll take exactly 2 samples. If the ball, or if the probability is 0, then the ball never goes in the hole, and again you'll take exactly 2 samples, you'll just go in the opposite direction with this counter. But if the probability is somewhere in between as it gets closer to 0.5, you will end up sort of oscillating around 0 a bit and walking up and down until you finally reach one of the thresholds. So SPRT predicts that you would take more than 2 samples in that case, and the closer p gets to 0.5, the more samples it predicts you'll take because it's more likely that you'll go back and forth more often. So I'm kind of picturing myself as a participant in the experiment, watching a little bit of the ball move along and then it's hidden for me and I have to guess in this sort of Pong, Arkanoid kind of game. Although I guess your analogy is much more universal, the billiards example, and I'm kind of convincing myself that the way I would do it is to sort of trace the bouncing in my mind and then see whether it'll go in, kind of projecting the trajectory mentally. So in thinking through that, I'm thinking of myself as doing only one simulation. Do you think that means I'm just a poor performer, that I'm not running multiple simulations, or am I actually deceiving myself, and maybe subconsciously I'm actively considering multiple simulations? So I think subjective experiences can give us hints as to what might be going on, but in general, I always take them with a grain of salt because they can be unreliable. Even if it's your subjective experience that you feel like you're only taking one sample, it certainly could be the case that you're taking more under the hood. I feel like when I'm doing the experiment, there are certain trials that are easier than others and certain trials that are harder than others, and I definitely feel like on the harder trials I take a longer time to do it than on the easier trials.
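Here is a minimal sketch of the SPRT-style random walk Jessica describes, just to reproduce the qualitative prediction. The threshold of 2 matches the value mentioned later in the conversation, but the sampling scheme here is an illustration rather than the fitted model from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

def sprt_decision(p, threshold=2):
    """Walk a counter up on a 'ball goes in' sample and down on a 'misses' sample
    until it hits +threshold or -threshold; report the decision and sample count."""
    counter, n_samples = 0, 0
    while abs(counter) < threshold:
        counter += 1 if rng.random() < p else -1
        n_samples += 1
    return counter > 0, n_samples

for p in [0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0]:
    runs = [sprt_decision(p) for _ in range(2000)]
    mean_samples = np.mean([n for _, n in runs])
    prop_in = np.mean([decision for decision, _ in runs])
    print(f"p = {p:.1f}   mean samples = {mean_samples:5.2f}   proportion 'in' = {prop_in:.2f}")
```

Running this gives exactly two samples when p is 0 or 1 and a growing average number of samples as p approaches 0.5, mirroring the response-time pattern described above.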
I guess sometimes I feel like I would reason about it again. I actually get the experience that I take longer simulating each part of the trajectory. So if the ball bounces once, then I would first simulate the part from the beginning to the first bounce and then from the first bounce to the end, which is a little bit different from how our model actually works. And that's something that we might explore more in the future, but yeah, it's certainly possible that the specific mechanics of how the simulations work are a little bit different, either from person to person or just different from how we're modeling them. We're sort of looking at it here at a more broad level where we can say, "Okay, here is one way that we can characterize this behavior in general, and then we can start drilling down and asking more specific questions about that." In the case of your experiment, I think if we were going to train an algorithm to do this, or we wouldn't even have to really train it, I think what's required to know if the ball goes through the slot is just basic vector calculus. And that's the most advanced math you'd need, but in a more physical system, like maybe the Jenga example, I think you really probably need differential equations to solve precisely center of mass and moving bodies and these sorts of things. So there's this strange contradiction for me because children can play Jenga, and they're generally pretty good at it, but I don't know very many children that can do differential equations. I could make the same analogy about catching a baseball, a pop fly baseball. So there's something that people are doing without doing the math. How is it possible people are so good at physical systems or predicting them when they maybe don't know the math or can't do it in a pen and paper kind of fashion? That's a great question, and I have sort of two answers to it. So the first one is that just because something is hard doesn't mean our brains can't do it. So, I mean, there's actually a lot of examples of stuff that people are really good at automatically and intuitively, that even our best researchers and our most advanced technology can't do. For example, scene understanding, where you walk into a room and you can immediately say what type of room it is, whether it's a kitchen or bedroom or an office, and you can immediately say what objects are where. Where would you expect to find a stapler? You know that automatically. People are really good at doing that, but that's something that's at the cutting edge of artificial intelligence and machine learning. And same thing with natural language, you know, people begin being able to speak natural language when they're just a few years old, and yet, you know, having really true natural language understanding by computers is something that's still a long ways off. So, you know, there can be very difficult things that we do automatically without knowing how to do it explicitly. So, the second answer to the question is, the type of modeling that I do is usually at kind of a higher level of abstraction and analysis, where the idea is we want to describe what the behavior is. And it's not necessarily a hypothesis of like how people are doing the computation, or a claim that people are explicitly, like, solving differential equations. Like, if you take a look at the behavior of a flame, that's described by differential equations, but you wouldn't say that the flame is computing differential equations.
The differential equation describes the behavior of it. And similarly, these simulation models that we have of people's behavior, they are descriptions of what people are doing, but it's not necessarily a hypothesis that people are literally solving the analytical equations and doing that in an explicit manner. So, I think earlier, if I heard correctly, you said that the sequential probability ratio test is an optimal strategy. So, first, am I correct in what I heard, and secondly, can you talk a little bit about why it's the optimal strategy and how people arrive at it? So, the SPRT is optimal based on the way that we've defined the problem, essentially. If you define the problem as: you need to make a decision, say either yes or no to some question, and you can take samples that give you some evidence about that decision. And you say, okay, I want to take samples until I reach, in expectation across, like, all the decisions I might possibly make, a level of accuracy of a certain amount, so say, like, 95% accurate. So, with that fixed accuracy, you can derive, like, what is the smallest number of samples you need to take in order to reach that level of accuracy in expectation, and that's what SPRT gives you. So, it really just falls out of doing this optimality analysis of how you would optimally solve this problem. In terms of how SPRT relates to what people are doing in practice, you know, as I mentioned, it certainly seems that people are doing something different depending on the difficulty of the task. It's up for debate whether SPRT is specifically the thing that they're doing, and it's certainly possible that it's not specifically SPRT. I mean, the fits that we got were pretty good, but not perfect. There's still some unexplained variance in people's responses, but it's possible that they could be doing something similar, like a similar type of strategy that's optimal, but under, like, different assumptions. So, maybe instead of, like, optimizing for a fixed accuracy, people are optimizing for something else. These are, like, alternate hypotheses that we could explore for sure. The reason why we picked SPRT as a model here is that the sort of strategy that we take for modeling things, in my lab and in Kevin and Ed's lab as well, is to say, okay, what is the problem that people are trying to solve in this context? Like, what do we think that problem is? And then let's take a look at what the best way to solve that problem would be, which in this case would be SPRT. We can compare that to what people are doing and see how closely they match it. And surprisingly, in a lot of cases, people do seem to get fairly close to the optimal thing. There's also cases where they don't. But we think it's like a good place to start looking because it gives us kind of, like, a rational basis for why people might be behaving the way that they're behaving. And if they don't match that, then we can start asking questions like, well, why aren't they doing it this way? It might be that our assumptions are wrong, or it might be that there's, like, process-level assumptions that we need to take into account that we haven't been exploring. Interesting. So one way in which I'm kind of looking at it: I have some background in multi-agent systems. I'm familiar with this idea of being observationally equivalent, that I might not know another person's strategy. But if I know a similar strategy that more or less elicits the same behavior, then there's a certain equivalence there for me. Yeah.
Is that something that plays into your work, or are you more deeply interested in the sort of ground truth strategy that people are employing? Yeah, I mean, that is certainly something that plays a role in our work. So the strategy that we take in our research follows two things. One is Marr's levels of analysis. The other is rational analysis. Marr's levels of analysis sort of break up the scientific problem into three different ways of looking at it. So there's the implementation level which is saying, like, how is this system actually implemented? So that might be, like, you know, how are the neurons in your brain wired up? You could be analyzing the behavior of a computer, you could say, like, you know, what is the actual circuitry inside the computer? The next level of analysis is at a higher level of abstraction, and that's the algorithmic level of analysis, which says, what is the actual, like, algorithm or procedure that the system is using? And then the highest level is the computational level, and that is asking, like, what is the goal of the system? What is it actually trying to do in theory? One good example for having an intuition about the difference between these levels is that of a cash register. So the purpose of a cash register is basically to do arithmetic, right? You want to add numbers, subtract numbers, make change, and so that's, you know, analyzing it at the computational level, its purpose is to do arithmetic. So the algorithmic level is about the specific algorithm that's being used to actually solve the problem, and the way that it's, like, transforming the data. So that might be, like, doing your calculations in base 10, or doing your calculations in base two, or in hexadecimal. Depending on what your representation is, those require actual, like, different algorithms. If you were to, like, write down the code for it, that would be different for those different representations, but they're still ultimately computing the same thing, which is arithmetic. And then finally, the implementation level is how you actually instantiate that in a physical system. So that might be something like, you know, having a digital computer that's doing it, or having a system of gears that are somehow computing the answer. And so at each level of analysis here, like, you can have an equivalence class of models, right? So there might be many different models at the implementation level that are all implementing the same algorithm. And there might be many different algorithms at the algorithmic level that are all computing the same solution to the computational level problem. And so many different people across cognitive science and neuroscience approach problems at different levels of analysis. In our lab, we approach it at the computational level of analysis, typically. And we use something which is called rational analysis to say, okay, what is, like, the optimal solution to the problem? Not just what is the goal of the system, but given the environment that it's in and constraints that it has, what is the optimal solution there. And then using rational analysis, you can also start to work downwards towards the other levels. And you can say, okay, well, I know that there's going to be some process constraints, like people have limited working memory, so they should try to make, you know, decisions as fast as possible. So what would be the optimal way to make a decision as fast as possible, given these types of constraints?
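A toy rendering of the cash register example (hypothetical code, not anything discussed in the interview beyond the analogy): the two functions below solve the same computational-level problem, making change, with two different algorithmic-level solutions, one leaning on the machine's native binary arithmetic and one subtracting digit by digit in base 10.

```python
def change_binary(price_cents: int, paid_cents: int) -> int:
    """Change owed, using the machine's native (binary) integer arithmetic."""
    return paid_cents - price_cents

def change_base10(price_cents: int, paid_cents: int) -> int:
    """The same computation done digit by digit in base 10, the way a person
    (or an old mechanical register) might: subtraction with borrowing."""
    price = [int(d) for d in f"{price_cents:010d}"]
    paid = [int(d) for d in f"{paid_cents:010d}"]
    result, borrow = [], 0
    for paid_digit, price_digit in zip(reversed(paid), reversed(price)):
        d = paid_digit - price_digit - borrow
        borrow = 1 if d < 0 else 0
        result.append(d + 10 if d < 0 else d)
    return int("".join(str(d) for d in reversed(result)))

# Same computational-level problem, two algorithmic-level solutions, same answer.
assert change_binary(1375, 2000) == change_base10(1375, 2000) == 625
```

Different algorithms, and different plausible implementations underneath them, give the same computational-level answer, which is the sense in which each level has its own equivalence class of models.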
We talked a little bit earlier about some of your results, but I don't think we delved too deeply into some of the controls you had, like the variable number of bounces and, as you'd mentioned, whether it was a close call or not. I'm curious to hear a bit more about which of the independent factors you tested had a significant impact on the accuracy and response time. We varied the hole size and the trial type, which is the margin by which the ball went in or missed the hole. We varied those things because we expected them to have an effect on difficulty. And we did find that those two things have an effect on difficulty and on response time. The number of bounces was something really interesting. We varied that also to vary the difficulty, because according to our probabilistic simulation model, the longer you run a simulation for, basically, the more noisy it's going to get because you're uncertain about exactly, you know, there might be a little bit of noise in the actual process. So if you run it for longer, then it's going to be more noisy. And there's similarly noise in the way that the ball bounces, there's a little bit of noise in the angle that it bounces off from the wall. So we would expect the number of bounces to increase the uncertainty and therefore increase the difficulty. What was interesting is that we did find that that was the case, but we found a much stronger effect there than we expected to find. Just based on the simulation-based model, that does predict that the number of bounces will increase uncertainty, but the amount by which it seemed that the number of bounces increased people's response time and had an effect on their accuracy as well, was much, much more than what is predicted by the simulation model. We don't really have, like, a super good hypothesis at this point for why that is. I mean, one possibility is that when you're actually running a simulation and you come to a bounce, you actually, like, take a little bit of time to think about where exactly, like, it is bouncing off the wall. So maybe you take more time to resolve a bounce. That would maybe be consistent with this idea of running a little bit of a more detailed simulation. That's something that we need to investigate more in the future. Interesting. Yeah. So I'm curious to know what your thoughts are on the strategies people might adopt in experiments that have more than two outcomes. I mean, you described yours as four, but essentially it's a binary proposition: the ball goes in or it does not. Yeah. Right. Do you think your findings and hypotheses will scale to three-plus type outcomes? I would expect that they would. So there's a paper by Ed Vul, who's one of the other authors on this paper, and others, we call this the "One and Done" paper. That's the beginning of the title. What they did is they analyzed the problem where if you have a fixed amount of decision time, so you can either make decisions or take samples, then how many samples should you take? So it's a little bit related to the type of stuff that we did in this paper. The setup of the problem is slightly different. And they did a whole bunch of analyses on a lot of different things. So they didn't just do two AFC judgments. They also did n-AFC judgments and continuous judgments where they looked at, okay, if people have to choose between, you know, six options, like how many samples should they take.
And, you know, they've derived these optimal numbers of samples for like a bunch of different things that seem to be consistent with empirical data that's been collected in other contexts. Based on that, I would expect to find similar results to what we found in our paper when you generalize it to decisions that have multiple possibilities or even continuous decisions. The way that I would do it is basically by starting, you know, starting at the same place and taking the same strategy of rational analysis that we did in this one. So say, okay, now you have to make a decision and there's like four possible ways you can respond instead of two. Again, like how would you derive the optimal strategy for doing that so that you can be as accurate as possible and also as fast as possible. So my next question is maybe a little bit of a stretch, but in my reading I see certain parallels between your work and some of the things that are going on in deep neural networks. I'm curious if people in those areas have shown an interest in what you're doing. There hasn't been too much crosstalk there, just a little bit. I was at NIPS this past December and I talked to a bunch of people there and there's a lot of really exciting stuff going on. There are, like, some relationships between deep learning and the stuff that I'm doing. I'm not specifically talking to people about it yet, but there's other people who are doing stuff. I mean, the first point is that neural networks and deep learning actually came out of cognitive science. And there's been connectionist models in cognitive science for as long as neural networks have been around, you know, there's certainly a possibility of connecting the stuff that I've been doing to those types of connectionist models, which are very related to the deep learning that's going on in machine learning. I know that there are people who are increasingly interested in taking a look at deep learning models and asking questions like, what are these deep learning models actually doing? Like what are their representations that they're learning? What are the concepts that they're learning? I'm really excited about that research because I think it's really important. A lot of deep learning models, you know, they work really well and it's like, okay, this model does super well in object recognition. But I don't know if you saw the, like, Google Inception paper where they, yeah, or I guess it wasn't a paper, it was a blog post. But I think that kind of stuff, you know, doing those types of analyses, but a little bit more like doing psychology on neural nets, is something that's really exciting and I think there's going to be some more stuff coming out of that soon. There's some people in my lab working on that, some people in other labs. More closely related to this, you know, Think Again paper, there's some people who are in Josh Tenenbaum's lab who are exploring the relationship between physical reasoning and deep networks. So there was some stuff at NIPS about that. There was one paper where they used a deep network to help to do visual feature recognition in combination with just a normal physics engine in order to do inference about physical properties, which was very cool. And there's also some other people who are looking at actually training deep networks to be physics engines themselves. So there's some people at Berkeley working on that. There's also, which is sort of related, training deep networks to be graphics engines.
So, you know, saying here's a model, can you render it under like different orientations and lighting conditions? Oh, very interesting. That's something else from Josh's lab. I'll have to look into that. I haven't seen that yet. Lastly, I wanted to ask what's next for your work. We are continuing to work on this paper a bit more. This was a paper at the CogSci conference. In our field, a lot of people will, like, publish something at CogSci, which is the first step of the research. And then later, they'll come out with like a longer paper that does more in depth analyses and maybe more experiments. So we are working on finishing up that paper, and we're looking at things like what it takes to get people to adjust their threshold. So in this paper, in the Think Again paper, we assumed that people had a threshold of two for the SPRT. This is something that we fit to the data, but it would be interesting to take a look at whether people actually adjust this threshold depending on the demands of the task. So if they have very little time to respond, then perhaps they lower their threshold. Whereas if they have a lot of time to respond and we really like incentivize them by paying them a lot for correct answers, maybe they would increase their threshold in order to increase accuracy. So we're looking at what types of things can we do to get people to adjust their threshold. I mentioned at the beginning that I'm interested in these other types of questions about how mental simulation works, like when people use it and how they construct them. And so I have a few other projects going on along those lines. I have one project that I'm working on, which is looking at how people use mental simulation and physical simulation to infer hidden properties of objects. So if you take a look at those block towers that I had, if you see the tower fall in one direction, then you can infer which block is heavier rather than just predicting that it's going to fall because you know a block is heavier. I'm also looking at things like what simulations people run specifically. So there's this really classic experiment by Shepard and Metzler in which they show people these like, it's a little hard to describe on a podcast, but these 3D objects that are like cubes that are stacked up. And so they kind of look like these like blocky, snakey things. They show people images of these objects and they ask them to say whether they're the same object or different objects. And they find that as the angle of rotation between the two objects increases, people's response time also increases. And so this was taken as evidence that people were doing a type of mental rotation in order to rotate the objects and then compare them. Some of, like, the groundbreaking work on mental simulation was from this paper. So their conclusion was that people are doing these mental rotations, but I find that explanation a little unsatisfying because it doesn't say how people know to rotate it in one direction or the other. So if you were to just pick a direction of rotation at random, then on average you would have constant response time. The fact that you don't find constant response time implies that people are doing something else. So that's the question of like how they actually decide what direction to rotate it in. But then I'm also interested in looking at like how people might decide between using a mental simulation as opposed to maybe a more rule-based strategy for solving problems.
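To spell out the point about random rotation directions: if the angular difference between the objects is theta, picking a direction at random means rotating through either theta or 360 minus theta degrees, so the expected rotation is about 180 degrees no matter what theta is. A tiny, purely illustrative check:

```python
import numpy as np

rng = np.random.default_rng(3)

for theta in [20, 60, 100, 140, 180]:
    # Pick clockwise or counterclockwise at random on every trial, then rotate
    # until the objects line up: either theta or 360 - theta degrees of rotation.
    rotations = np.where(rng.random(100_000) < 0.5, theta, 360 - theta)
    print(f"angular difference {theta:3d} deg -> mean rotation {rotations.mean():6.1f} deg")

# Every condition averages about 180 degrees, so a random-direction strategy predicts
# flat response times, the opposite of the increasing pattern Shepard and Metzler found.
```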
Where is the best place for people to keep up with you if they wanted to follow some of your future publications? Well, I try to post stuff on my website as soon as it comes out. I'm not always the most up to date on that, but I try to be. I also try to keep my Google Scholar up to date so there's stuff there. I also usually try to post slides and stuff when I give presentations so people can see the stuff that I'm working on that hasn't necessarily been published yet. Very cool. I'll link to those places in the show notes for anyone who wants to follow up. While I've got you here, since I am an avid Project Jupyter user, I often do the show notes in it and I know many listeners are as well. Could you tell us a bit about your role on the steering council? I've been involved with Jupyter for a few years now, even before it was called Jupyter when it was still just IPython. I got really heavily involved because I was a teaching assistant for my advisor's class at Berkeley. This is an undergraduate class called Computational Cognitive Science. It talks about a lot of the same things that I've been discussing today. It's basically a class on the research methods that we use in our lab and also the history of computational modeling and cognitive science. For this class, we had the assignments. They used to be in MATLAB and they used to be distributed in a way that we would give students zip files of the assignments. Then they had to fill out the MATLAB scripts and then zip them back up and send them to us. This isn't so bad. This type of strategy isn't so bad when you have a small class. This class has 200 students in it. This was a lot of work. I was really excited about the notebook. The notebook is the perfect format for having assignments because you can have code and text in the same place. You can have your instructions. Immediately below the instructions, have students write code. Below that, have them generate a figure and have them write a free response, interpreting the results all in the notebook. It really is perfect for that. We converted the assignments to notebooks. This also required coming up with a solution for how to actually do auto-grading for the notebooks. I've been developing a tool called nbgrader that handles that, and also finding a way to make sure that all the students have access to the notebook. Because we had 200 students, we didn't want to deal with the headache of getting all 200 students to install the same version of the notebook. We set up a deployment of JupyterHub, which is basically a way to deploy the notebook on a server. Anyone who has an account on the server can come and log in and have their own notebook server ready for them through the web browser. They never have to install it themselves. We set up this JupyterHub deployment for our 200 students and they use that to do the assignments. All of this work on developing nbgrader and deploying JupyterHub got me really involved in the project. Very neat. I've been eager to experiment with nbgrader for a project. I may hit you up with a question or two. Sure. Definitely. I'm really excited whenever I hear that people are using it. Yeah, in any event, thank you so much for coming on the show. I really enjoyed the paper. I'm glad you took the time to share some details about it. Thanks for having me and I'm really glad to be here. Excellent. And until next time, I'll remind everyone to keep thinking skeptically of and with data. A few quick announcements before we end the show.
In Berkeley, California, on Thursday, March 10th, 2016, I'll be giving a talk at the La Peña Cultural Center. It's titled A Skeptic's Perspective on Artificial Intelligence. I'm also finalizing a second talk in Mountain View, California, the very next day, March 11th, that's a Friday. I'll announce the exact details on that talk at dataskeptic.com very soon and again here next week, so tune in. If anyone is in the area, I would love to have you come out to meet the listeners and just enjoy the talks. I'd also love it if you would consider supporting one of my favorite podcasts currently in its second season. So I've got a special guest here with me now to tell us a little bit about it. Hello, I am Samuel Hansen and I am the host and producer of Relatively Prime: Stories from the Mathematical Domain. One of my favorite podcasts; I hope many of my listeners are already on it. If not, it comes highly recommended. We're right now enjoying season two. Samuel, can you tell us a little bit about what's on season two? Season two has been great. I worked on it for like the last year and a bit. It's been a crazy ride, but I came out of it with stories about mathematical language and which president has a proof of the Pythagorean theorem that's in Loomis' big book of proofs of the Pythagorean theorem. There's a president, a US president. That's crazy. Stories about gerrymandering, about cities, why cities are like fractals and also some stuff about my own personal online dating profile and weird anti-math messages that I received from it. Yeah, I was oddly amused by those myself. I would also recommend the episode Chinook from season one about the checkers-playing computer. I think that'd be of particular interest to Data Skeptic listeners. If anyone wants to hear about the around 80-hour bus trip that I took from Baltimore to Edmonton, Alberta, Canada to report that story, just send me an email. My email is very easy to find. There's all kinds of crazy stories about that trip that led up to covering that story. Is there going to be a behind the scenes? Maybe a way I could get a hold of a special edition of the show? Well, you know what? If you back the Kickstarter for season three, I might actually put out on the bonus feed, which is one of the rewards you can get, a behind the scenes story of what it took to get me to Edmonton to report that story. So the season three Kickstarter is on now; you've got till the beginning of March, right? It looks like March 11th, if my own press release is to be believed, as it should be, to fund what's going to be a bit of a different season for us, because instead of me doing eight episodes and working on them all at one time, which takes like a year to do, we're going to try to take Relatively Prime monthly. And that means that, since it ends on March 11th, the first episode would be released on March 31st, and then one more episode every month for a whole year. So that's 12 episodes of the best stories from mathematics around. Yeah, I'm friends with all those other mathematical podcasters. Mine, it's the best. This is an empirical study here. I've listened to all of them. Mine is the best: Relatively Prime. I'm just so proud of the show. Well, you've got my backing and my vote on that as well. I'm a huge fan. I hope everyone checks it out. Head over to kickstarter.com and search, or is there a slash? Where should they go? So you can just search for Relatively Prime on Kickstarter, and it's the season three one.
The other two have already been finished, so it should be easy to figure out. Or you can just go to relprime.com/kickstarter. I have a redirect set up on that URL, or you know, you can just listen to the show at relprime.com or subscribe on iTunes, Stitcher, you know, all the places where you would, say, subscribe to Data Skeptic. And you should also go rate and review Data Skeptic on iTunes so that it can stay above Relatively Prime in the rankings. We'll vote for both and we'll see what happens. Yeah, that would be great. Well, yeah, like I said, you've got my vote. I'll also have that in the show notes for anyone who has their phone open, or is at their computer, and wants to swipe right, as the kids do. That link should be right there on most podcast players, and you'll be able to go straight through and support Relatively Prime. Let me just say, best of luck with it. Oh, thank you so much for letting me do this. My pleasure. For more on this episode, visit dataskeptic.com. If you enjoyed the show, please give us a review on iTunes or Stitcher. (upbeat music) (beeping)