(upbeat music) - The Data Skeptic Podcast is a weekly show featuring conversations about skepticism, critical thinking, and data science. - Well, welcome to another episode of the Data Skeptic Podcast. I'm joined this week by Nicole Gobol. Thanks for joining me, Nicole. - Thanks for having me, Kyle. - So I first learned about who you were through the introductory emails we send around at Thinkful. Thinkful is a company where we both do data science mentoring. It's sort of an online course with an interactive curriculum, and then you have one-on-one mentor sessions. So maybe we could just start there. Tell me a little bit about how you got involved with Thinkful. - Sure, sure. Thinkful's geared towards people who are interested in becoming data scientists, and that would include people with programming skills and analytical skills, combining all those skills in order to use data to tackle whatever questions or problems you have at hand. I've been doing that as an oceanographer for the past 15 years or so, but I'm in the process of applying these skills in a different domain. So I'm interested in transitioning from oceanography to tech, and becoming more formally what they call a data scientist. And so, you know, this is a transition I've been making for a while, because there are different tools that people use in tech versus academia. And at one point, Thinkful contacted me and said, "Hey, are you interested in kind of sharing your knowledge with other people making the same transition?" And so I became a mentor. - Very cool, yeah. I was really taken by that. When they send around the introductory emails, everyone says kind of what they're doing and what they're up to. And it's such a fun and diverse group of people trying different aspects of applied data science. And oceanography was one I was new to. So it really caught my interest, and I thought we'd have a fun discussion about the overlap of data in that field.
- So also, you were mentioning making the transition. I know you've recently accepted a new position. So congratulations, first of all. - Thank you. Yeah, I'm excited. - Would you like to share anything about your new endeavor? - Sure, sure. You know, I was intimidated to make this transition, because academia is this very comfortable little bubble world that we all live in. And I decided, right, enough of academia, I want to tap into all these amazing opportunities in Silicon Valley. And once I started looking around, I mean, wow. There are some phenomenal things going on. Companies are just doing the craziest things. You know, I've interviewed with companies who are creating new types of food from plant proteins only. They're trying to tap into this resource of plants that are being underutilized on our planet. But I was kind of looking for something more environmental, or perhaps solving some social good type of problem. And I ended up joining a company, I start next week, that is using sensor networks to monitor pollutants in the environment, with an interest in how they impact society and people's health, you know, just making people more aware of the environments that they live in. And so we'll be looking at large amounts of time series data being collected on various pollutants in the atmosphere, kind of climate related. - Oh, how neat. Is that something that's tracked very well today? - Well, the thing is, it's tracked in very remote places. You know, they measure, for example, carbon dioxide in the atmosphere, and they see these upward trends. But nobody's making these measurements in the spaces in which we live. And so that's kind of the angle that this company is taking. - Interesting. Yeah, I'm really interested to see where that goes. - Yeah, me too.
- So when we were speaking earlier, you mentioned you started out in biology and got into oceanography. Maybe, if you wouldn't mind, share a little bit about your background and your journey towards seeing data as a major interest for yourself. - Sure, sure. So I started off as a biologist and went into the marine sciences as a field biologist, where I was on boats collecting water samples and measuring what was in those water samples. And these laboriously collected data sets are relatively small. They fit inside an Excel spreadsheet. And over time, I started using more complicated equipment to collect measurements. You know, it collected measurements on the scale of seconds rather than, I don't know, days. And all of a sudden, my data sets no longer fit into an Excel spreadsheet. So I had to acquire the programming skills in order to collect and analyze the data. And from there, I kind of took a leap into ecosystem modeling, where we would formulate predictive models to better understand where, when, how, and what types of organisms grow in the environment. So the same way, say, the weatherman would predict if it's gonna rain tomorrow, we would predict if there is going to be a phytoplankton bloom in a certain area, or predict what types of organisms we're gonna find in that area. And the models that we formulated would generate enough data that we would need to run them on distributed systems, analyzing on the order of terabytes of data. So that definitely really upped my programming skill level. - With regard to those models, the ecosystem modeling you were mentioning, I'm curious a little bit about, I guess the right word would be precision. Like when I think of physicists, you know, let's say we had some N-body problem in space.
If you look at just Newton's equations, they're incredibly complicated, but if we assume there are no relativistic effects, we can program that, and maybe we need a lot of computational power to solve it, but there's a lot of confidence in the model itself. And then, you know, in my daily life, I often work on models where there's a large variance just in the nature of the problem. The model can't perfectly describe the system we're trying to measure. What's the state of the art in oceanographic modeling? - That's a good question. You know, it really depends on whether you're talking about the physics or the biology. I was more concerned with the biology coming out of the model, but with the physicists that I work with, we basically coupled a biological model to a very well-established physical model. It's still in Fortran, and it's been in existence for like 30-some years. And they have done an incredible job of representing the physics of the ocean very well with this model, partially because they've been working on it for so long, but also because they have the data to check the consistency of the model with. Biology is a whole other game, because it's so variable. And to check the consistency of your model with actual data is a very difficult thing to do. The phytoplankton, which are the small cells that thrive in the ocean, are basically microscopic plants, and they exist within the model. They themselves come and go very quickly. They grow and die very rapidly. So number one, they're hard to measure and capture. And number two, it's very expensive to go out and actually measure them. And so being able to get quality data at high frequency in order to check model consistency is near impossible. So if we were within plus or minus 50% of these measurements of phytoplankton in the ocean, we were kind of happy. But that of course would not be acceptable in terms of the physics.
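Editor's sketch: the "within plus or minus 50%" criterion mentioned above can be written down as a simple relative-error check. This is only a minimal illustration, not the method used in the research being discussed. The function names, the zero-observation rule, and the chlorophyll numbers are all hypothetical and invented for the example.

```python
from typing import Sequence


def within_tolerance(modeled: float, observed: float, tol: float = 0.5) -> bool:
    """True if the modeled value lies within +/- tol (a fraction) of the observation."""
    if observed == 0:
        # Relative error is undefined against a zero observation,
        # so (as an assumption) only exact agreement counts.
        return modeled == 0
    return abs(modeled - observed) / abs(observed) <= tol


def fraction_within(modeled: Sequence[float],
                    observed: Sequence[float],
                    tol: float = 0.5) -> float:
    """Share of paired model/observation values that agree within the tolerance."""
    if len(modeled) != len(observed):
        raise ValueError("modeled and observed must have the same length")
    if not observed:
        raise ValueError("need at least one observation")
    hits = sum(within_tolerance(m, o, tol) for m, o in zip(modeled, observed))
    return hits / len(observed)


# Hypothetical chlorophyll values (mg per cubic meter), made up for illustration.
observed_chl = [0.20, 0.50, 1.00, 2.00]
modeled_chl = [0.25, 0.80, 0.60, 2.90]

print(fraction_within(modeled_chl, observed_chl))  # 0.75: three of four pairs agree
```

A tolerance this loose would be tightened considerably for physical variables such as temperature, which is the contrast drawn in the conversation.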
- Sure, sure. Well, the ocean is an enormous place. As everyone knows, we have more water than land on our planet. How do you even embark upon trying to measure a space so vast? - That's a great question. You know, in traditional oceanography, oceanographers will go out on boats and they'll try and collect water samples as frequently as they can, or use oceanographic instrumentation to collect data at a higher resolution. That's the most traditional form of collecting data. But more recently, well, it depends on what you're measuring. In terms of the work that I was doing, satellites have been an invaluable source of data, because they can take large snapshots of the ocean, you know, large expanses of the ocean, at a relatively frequent temporal scale, and they get this very large spatial scale. The problem with satellites is that sometimes there are clouds between the satellite and the Earth. And so you don't always get the complete picture. The satellites orbit the Earth and they get basically one snapshot every day. So they'll make composites of these snapshots. They'll say, here's a weekly snapshot of what we see, or a monthly snapshot, and you can kind of start to fill in the gaps as you take these photos. And so these satellites capture variables such as temperature, and I was concerned mainly with phytoplankton biomass, which is measured by essentially taking a picture of the color that's reflected by the pigment within phytoplankton cells, chlorophyll. The same way you get these images of forests from the chlorophyll pigment, the same approach is used for capturing the phytoplankton at the surface of the ocean. Well, the other problem with the data that satellites are collecting is that they get this vast extent of the ocean, but they don't get these fine-scale measurements.
You know, I would say the finest resolution they're going to get is kilometer-based measurements, and sometimes we need finer-scale measurements than that. And there are other approaches, such as the one taken by Liquid Robotics, a company over here in Sunnyvale that has these autonomous vehicles, essentially drones, that can swim along the surface of the ocean and collect data with onboard sensors. And they're pretty cool. They're driven basically by wave action in the ocean, and they have solar panels onboard that provide energy for the sensors. They're very robust instruments that can just cruise on the ocean and collect data at very high resolution, on the order of meters. - Neat, so they're kind of autonomous robot sensor units. - Yep, yep. It collects tons of data and it's a heck of a lot cheaper than sending a boat out, given what a boat costs per day. And it's pretty eco-friendly in terms of the energy required. Yeah, those are kind of three different ways that people collect oceanographic data. - Neat. So Liquid Robotics, am I correct in saying they launched the PAX Challenge or PACX Challenge? - PACX, so Pacific Crossing. So they thought, well, how far can these, they call them wave gliders, how far can these wave gliders go? And so they dropped four of them in the ocean close to California and sent them across the sea. And all four went to Hawaii, and then they split, and two went towards Australia and two tried to make it to Japan. And it took about a year for the wave gliders to reach their destinations, and they thought, well, what do we do with this data? And so they put a call out to the scientific community and said, you know, put in your proposal. What would you be interested in? What would you do with this data? So I thought, well, this is what I would do, and I submitted a proposal and I was chosen as a finalist. - Very cool, so tell me a little bit more about your proposal.
- I'm interested in phytoplankton in the ocean. The simplest way to explain them is as kind of microscopic plants in the ocean. They're very hard to see, you need a microscope to see them, but they take up carbon dioxide and they breathe out oxygen. Well, "breathe" is anthropomorphism, but they supply the majority of the oxygen that we breathe, and they're the basis of the food chain. So they're an essential component of the marine food web. But they're very hard to measure because they're tiny, they grow and die very quickly, and they're very patchy throughout the ocean. So we don't have a very good sense of how much phytoplankton there is and the rates at which phytoplankton grow in the ocean. And, you know, as I mentioned, we have the satellite data, which gets very large-scale coverage, but not at a very good temporal scale. And then we have these oceanographic instruments that are collecting data just in certain regions, so they might be more highly resolved within local regions, but not across the Earth. And then we have this new technology that's collecting at high resolution, and you can essentially send a whole array of them out there to get your spatial coverage. So I wanted to see whether we could use these instruments to improve our estimates of how much phytoplankton is in the ocean. That was kind of my main focus: can this new way of measuring phytoplankton in the ocean help improve our measurements? You know, I looked at this data and I compared it to satellite data to see how they compared, and it was encouraging, because the wave gliders could capture the patterns that were also captured by satellites. And they complemented the satellites because they also collected extra information.
They could pick up not only these fine-scale spatial measurements, because they're collecting data every couple of meters, but they were also moving slowly enough that they would capture temporal variation in the water column. So if you imagine a patch of phytoplankton in the ocean, on the order of maybe a kilometer long, it takes them a while to move through this patch of phytoplankton, and phytoplankton vary in their concentrations throughout the day. So it started to pick up on daily, or diel, changes in the biology and the physics going on in the water. And so it was almost like having a stationary mooring, but something moving as well at the same time. And it was great at capturing both the spatial and temporal variability. - Yeah, it's one of my favorite aspects of science when you have two completely different ways of measuring things that both converge on the same answer. - Yeah, they're complementary to one another, right? Like, the wave gliders aren't capable of getting that spatial coverage at the moment, but they can say, hey, we know these tiny little variations and you know the large-scale patterns, and the two of us can work together to capture the whole picture. - So I have to imagine there were some big challenges there, between the satellite data that was vast, but I assume low precision, we can probably say maybe in kilometers how much each pixel's worth, and as you were mentioning, because it's a composite, there's this time stretch, versus the wave gliders that are very precise, but in a single location, and might even have some difficulties in capturing and transmitting back data. I would guess that you don't have that perfect scenario where there are two tables that have a common primary key you can just link on. So did you run into any issues in, I guess, convoluting the two data sets together? - Oh, absolutely.
I mean, where do I even start? To begin with, this chlorophyll content, this pigment in phytoplankton that we use, is really a proxy for how much phytoplankton is there, because each phytoplankton has a different amount of chlorophyll, it varies how close the chlorophyll is to the outside of the cell, and their physiology changes throughout the day. So chlorophyll is really a proxy measure to start off with, and now suddenly you're trying to measure this from a satellite. I mean, it's very far away. And they're using various algorithms to try and improve the relationship between the color changes that the satellite sees and what's actually going on in the water, so that in itself is a pretty loose relationship. You know, same with the fluorometers, that's the instrumentation that's on the wave gliders, or on the oceanographic instrumentation, that's also used to measure these changes in chlorophyll in the water. I mean, that has its own issues, you know, trying to calibrate those machines. So now you've got a satellite that you're calibrating with various algorithms, and instrumentation that you're trying to calibrate, along with the drift that comes with sticking something in the ocean for months at a time and having all sorts of growth and things affecting how accurate it is. Yeah, it's a challenge for sure, and you could sometimes say you're comparing apples and oranges. It was certainly a challenge to compare two very disparate sources of data, but luckily, because they were both picking up on these large spatial-scale trends, we could connect the two in some way. It was a tough one to get past the reviewers, but we finally found a way to make it palatable. - Very interesting. So, I guess the calibration and the variation of observations make a lot of intuitive sense to me.
You'd mentioned something I'd never even considered. Tell me if I have this correct, that something might actually grow on one of the wave gliders, that you could have some organic life attach itself and block a sensor. Did I understand correctly? - Absolutely. That's a huge issue, and, you know, it can cause drift in the baseline of the instrumentation. So if you've ever jumped in the ocean and looked at the underside of a wharf or a boat or anything that's been sitting in the ocean for an extended period of time, you're gonna see all sorts of growth, algal growth, on it. You know, these kinds of plants that attach themselves and say, oh, this is a great spot to hang out and grow. Some people get the same thing, I don't wanna say it, but in their bathtub, (laughs) you know, maybe after a month or two. - Right. - But yeah, no, you see growth wherever there's moisture and nutrients and light, and those are the conditions under which plants like to grow. And so, you know, you have something that's, yeah, readily easy to land on, and yeah. - Makes sense. So tell me a little bit about what the data is there for. Like, you know, we have these great data sets that are being collected from these multiple lines of independent observation. Are these helping us build new models, or are they helping us test the ones we have, or perhaps helping us to observe, you know, what change is going on? What does the data ultimately allow scientists to conclude? - Ecosystem models, in terms of the biology, certainly require better measurements of, for example, phytoplankton and higher trophic levels in the ocean. And that is something that is very much lacking in oceanography. So for example, the physical models are very representative of the physical variables measured in the ocean, because that data is so readily available. And how that data is used to improve models is through what they call data assimilation.
And so, while running a hindcast of a model, the model kind of pauses every once in a while, aligns itself with observations and says, how far off am I at the moment? And it adjusts itself to the observations, which it considers truth, and then carries on. And so with this method, the model can, in a way, tune itself so that it becomes more accurate. That's a very, very high-level description of data assimilation and how our models can improve themselves by utilizing observations. And so that's easily done with physics, but it's a whole other game when you're trying to do that with biology, not only because of the lack of data, but because of the quality of the data. As I described, trying to measure phytoplankton is challenging. And then, you know, using that to check your model consistency is challenging as well. - Makes sense, yeah. So that data assimilation process you're mentioning, is that algorithmically driven, or does someone kind of just have to check in and retune a model, or is it like autoregressive? How does that take place? - I was associated with the group that was doing that type of work, but for data assimilation, it's algorithmically driven. And I would not do it justice trying to describe the mathematics behind it, because it's not something I have done direct research on. - No worries. So I would imagine that some of that helps fine-tune. I wanna make reference to a video of yours, and I'll put it in the show notes, but if I understand correctly, it shows the modeling of different phytoplankton, you know, rises and falls in the changing oceanographic system. Is that kind of a fair description of the video? - Yeah, exactly. Is it the one where it's kind of a topographical contour map? - Yes. - Yeah, so I mean, I think there are two different ones. One shows the change in biomass, which is just a measure of how much phytoplankton is there.
And there is another one that shows changes in diversity, which is basically how many types of phytoplankton are found in the ocean. And that is something that's unique. Ecosystem models often contain only one or two different types of phytoplankton, but we were using a model developed by Mick Follows and his group at MIT that enabled a model to represent on the order of hundreds of different types of phytoplankton. It doesn't come close to how many phytoplankton actually live in the ocean, but these analogs do a very good job of representing the diversity of phytoplankton in the ocean and how that changes with time and space. - So I would imagine in running simulations like that, you've got a mountain of different challenges, because you're trying to describe this, you know, massively complex system on presumably one high-powered workstation or maybe a cluster, but still something much, much smaller than the ocean itself. So I'm curious, are the limiting factors how expressive the model is, or how advanced our technology has become? Or perhaps it's, you know, the absence of observations? What limits you from making the most accurate model your heart might desire? - You know, I think computationally we're pretty good. I mean, you can get a bigger cluster to run a model faster, and that definitely wasn't the limiting factor. I think the limiting thing is the observations to which to compare our model. So here we have this model where we're able to say, these are the types of phytoplankton that we're finding at the coast. These are the types of phytoplankton that we're finding offshore. But the daily measurements or monthly measurements that align with those model constituents are not there. And so, you know, it resulted in kind of saying, okay, what other uses are there for this model? How else can we use this model to represent our ideas? Because you can only take it so far by comparing it to observations.
So I took it in another direction and said, well, why don't we test some ecological theories and see if our model aligns with them? And one of those ideas was to say, well, terrestrial studies are far ahead of oceanographic studies in a lot of ways, and they've made a lot of observations where they can say, right, the productivity of plants increases as you have an increase in diversity. And well, can we test that in the ocean? The way they test that on land is they set up these experiments where they have a plot of land and they put in one plant and they see how productive it is. And then they have another plot where they add another plant and they see if that increases production, and so on, and they create these experimental designs where they can demonstrate how productivity changes with diversity. You can't quite do that in the ocean unless you have these small mesocosms, but then these mesocosms, or basically small ponds, don't represent the advection and transport that have a large effect on phytoplankton production in the ocean. So the best way to go about it is with an ecosystem model. And so I ran multiple experiments using this model, where I would run the model with one phytoplankton type, and then I would run the model with two different phytoplankton types, and three types, and four types, and then also different combinations within those types, to look at how the productivity of the ocean changes with the number and types of phytoplankton in the ocean. And this gave us great insight into, firstly, whether or not the model represented what we expected to see in nature, but it also helped us understand how the model worked and whether or not it agreed with what was going on in nature. And so it was kind of an interesting way to use an ecosystem model and show that, hey, this actually represents what's going on in nature. And I'm giving you the very brief impression, a high-level overview.
From there you can springboard and say, okay, well, if this is what's going on and this is what's driving it, what's going to happen if these changes occur in the ocean? And you know, you can start looking at climate change and making predictions about how the ocean might change in response to the change in climate. This is the very long-winded version of a study that took me years. But yeah, it was really cool to see how we could model changes in phytoplankton, and we could show the relationship between productivity and function of an ecosystem using a numerical model that represented nature. - Yeah, and the ability to predict and kind of explore what-if scenarios, I think, is especially crucial in the era of climate change that we seem to be going into. - Yeah. - So I have to say, the video, I'll put it in the show notes. I'd recommend people check it out. This might not be the feedback you want, but my first impression was that it's absolutely beautiful just to look at. And I definitely want to go back and watch it a few more times to appreciate the information being conveyed, but just the color and, I guess, the stochasticity of it is fascinating to watch. So it's a great entry point for anyone who just wants to casually look and then go a little bit deeper and understand what the model's telling us. But it does raise some interesting questions about climate change. And I know this is a big topic, but what can the ocean teach us about the nature of climate change? - Well, using these models, you can kind of test, it's funny. The predictions being made about how ocean physics might affect, say, the biology in the ocean can swing two ways. You know, there are some people that say there's gonna be an increase in wind. This increase in wind is going to increase the effects of upwelling. So upwelling is basically when nutrients from deep down in the ocean are brought to the surface, and it's driven by the wind at the surface.
So basically, along the coast of California, we have these strong northwest winds that come down the coast. And Coriolis force causes these winds to blow offshore. And when it does this, it pushes the surface of the water out to sea. Now, if it kept doing that, we wouldn't have any water left at the surface. So what happens is it pulls water up from down deep. Well, this deep water, if you've jumped in the waters off the California coast, it's freezing cold, but it's also nutrient rich. And that's what phytoplankton love. They're plants, they love nutrients, they love light. And so when they get this pulse of nutrients from below, they grow like crazy, and this causes very productive waters. And you would think, oh, well, more productivity is better, right? Well, in some cases, it's not. If you go to places like Long Island Sound, where there's too much phytoplankton growth because of, say, unnatural inputs of nutrients into the ocean, this can be detrimental. You know, you can get too much phytoplankton growth, which causes anoxia. Others are predicting a decrease in wind. And this would decrease this upwelling effect of bringing nutrients to the surface. And then that would cause phytoplankton growth to decline. So it's interesting, I guess it's a sign of the variability and the predictability of these models, and I guess that's a sign of climate change. You know, there's models, I guess, you know, who said that models are never right? - Oh yeah, I'm drawing a blank on who that quote is from, but it's a wonderful quote. - Yes, I don't know it verbatim off the top of my head, but I think models definitely have their use in order to demonstrate what could happen. And, you know, as we collect more data and refine our models and improve the accuracy of our models, we can further improve our predictions, but they're always kind of these approximate measures, approximations.
- Well, so I kind of like to wind up all my episodes by asking my guests for two recommendations. I call the first the benevolent reference, something that you're not necessarily affiliated with, but you think is an interesting, worthwhile thing to share. And secondly, the self-serving recommendation, something that ideally you get some direct benefit from by appearing here on the show. - Well, I think I have to refer back to this course that we're mentoring on. I don't necessarily want it to be a plug, but I do think it's a worthwhile program. You know, having made this kind of transition from oceanography to data science myself, I think it's a great program in order to really learn the scope of the skills necessary to become a data scientist. And the beauty of it is having somebody to bounce your ideas off and learn from, by having that mentor. I've done online courses and done a lot of self-teaching, and having that connection with real people, there's no replacement for it. - Totally, yeah. - I think it's a great program. - Yeah, we're not having an official advertisement or anything, but, you know, there are many great options to learn, and Thinkful is certainly one of them. But I also want to put in a plug for your blog, datasciencegirl.com. So people can find that in the show notes, but it's a great place to get a few deeper details than what we covered here today on some of your work. And I don't know if you did this on purpose, but there's a nice correspondence between your blog articles, which are accessible to the general public, and your research papers. They line up kind of one-to-one. So there's a great opportunity for someone to take a first stab and then go deeper if they're interested in learning more by starting there. - Cool, thanks for that, Kyle. - Yeah. And I look forward to hearing in the future about what goes on with your new endeavors. - Oh, thanks very much.
- Hey, thanks so much for joining me today. - No worries, thank you. - Enjoy the rest of your day. (upbeat music)