Jeff Stanton joins me in this episode to discuss his book An Introduction to Data Science and some of the unique challenges and issues faced by someone doing applied data science. A challenge for any data scientist is making sure they have a good input data set and have applied any necessary data munging steps before their analysis. We cover some good advice for how to approach such problems.
Data Skeptic
Practicing and Communicating Data Science with Jeff Stanton
(upbeat music)

- The Data Skeptic Podcast is a weekly show featuring conversations about skepticism, critical thinking, and data science.

- Well, welcome back to another episode of the Data Skeptic Podcast. I'm here this week with my guest, Jeff Stanton, who's the author of An Introduction to Data Science. Thanks for joining me, Jeff.

- Sure thing. Happy to do it.

- Before we get started, maybe you could give me some of your background academically, what led you into the field, and how you ended up writing this textbook.

- Sure, well, my graduate training is in industrial and organizational psychology, which is a field that is known for its quantitative methods. So I had a lot of training when I was in graduate school, and I brought that to my research. I do research on surveys and statistics and building psychometrically sound measures of, for instance, job satisfaction and job stress. I ended up coming to an interdisciplinary school, which is where I'm at now at Syracuse University in the School of Information Studies, where I found that those skills translate into this kind of emerging field of data science. So it turns out that being a psychologist is actually pretty useful for something.

- Yeah, it's funny: a lot of people think, oh, you're a computer science guy or a statistics person, but it's such a multi-disciplinary thing, as you say, that I see a lot of leaders in the field coming from sometimes unexpected places like psychology and what have you.

- Yep, very true.

- And one of the first things I noticed when I read your book, and that attracted me to it, was that it is very accessible. You don't have to be that CS or stats degree person to get value out of it. In fact, one of the first people I was looking around to try and help was a friend of mine whose degree is in finance and who has spent 10 years in that field, so they're a whiz in Excel, but have no real programming or anything beyond that. They were having trouble finding a good getting-started resource. I recommended your book and it's worked out well for that person. So was that a deliberate choice, to make the book, you know, accessible to a wider audience?

- It was. I've been teaching statistics for, geez, almost 20 years now. When I first started, I was really hardcore and I put up a lot of formulas and proofs and just totally freaked the students out. I realized after doing it for a few years that there's a way to teach statistics and quantitative methods which really focuses more on visual thinking and analogical reasoning. So I tried to build that kind of thinking into the book rather than providing the typical kinds of formulas that you might find in a statistics textbook. It sort of explains things empirically and with diagrams. I don't know how far you've gotten in the book, but by the time the book starts to talk about high-level statistical concepts, it really is focusing on working with actual data rather than trying to understand, you know, the more proof-oriented or mathematical aspects of it.

- Yeah, I think that keeps it really engaging. In fact, I was really impressed that, you know, readers who start at page one and work page by page are effectively up and running in R and playing around with census data at only about a third of the way through.
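(For readers following along, here's a minimal sketch of that kind of early R session. The file name and column names below are hypothetical, not the book's actual census exercise, which uses its own data files.)

```r
# Hypothetical CSV of census-style data; the book's own exercises use
# their own files and column names.
census <- read.csv("census_sample.csv", stringsAsFactors = FALSE)

head(census)               # peek at the first few rows
summary(census)            # quick summaries of each column
hist(census$population,    # distribution of a hypothetical 'population' column
     main = "Population by county", xlab = "Population")
```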
Can you maybe talk a little bit about some of the strengths of R, and how users who have maybe some Excel experience, but not necessarily programming experience, are gonna be able to get started with the help of your text in both R and with the use of RStudio?

- When I first said that I was going to write a textbook in this area and I was going to use R for it, people thought I was an idiot, because R, being command-line oriented, is pretty difficult for beginners to get used to, especially if you're used to a point-and-click interface like you find in a spreadsheet. One of the things that I argued, though, is that by reducing things to kind of an atomic level, it forces you to get real with your data. It forces you to really get in there and understand what you're working with. One of the first commands that I teach to my students when they're using R is str(), and that reveals the structure of a data object. And that's really useful, because you have to understand what's happening at that kind of atomic level before you can build a foundation of understanding larger data structures like matrices and so forth. So R is actually really good for exposing the insides, if you will, of the data management and statistical processes. It is a bit of a learning curve, though, for people who've never used a command-line interface. And I think that's almost everybody these days, because there are very few people like me who go back to the days of DOS and are used to typing things at a command line. So it is definitely a bit of a learning curve, but I think there's an advantage in there as well.

- Yeah, absolutely. Now, the book is free to download, and I think there are also some MOOCs via Course Sites attached to it as well. Can you talk a little bit about the different courses that are available that are using the book?

- We have the Course Sites MOOC, which is actually a freestanding, self-paced version of a MOOC that we offered a couple of years ago as an interactive course. That was right at the height of the MOOC movement, if you will, and we wanted to do something in order to really get a sense of what was going on in that space. Data science was also hot at that time, and it was right when we were trying to start our data science program here at Syracuse University. So everything came together around building that original interactive MOOC. And then when we were done with it, we found that people had enjoyed it so much that they didn't want us to close it. So we converted it gradually into something that people could take as a self-paced activity, and it's worked out pretty well. I think the book and the MOOC actually work very well together; the MOOC contains supplemental materials that people find very helpful. There's only one little hitch, and that's that I've heard that Blackboard, which owns Course Sites, is going to change the way that it runs things, and Course Sites as a freely available product is, as I heard, going to go away. I don't know any of the details of that, but we're going to have to look for a different way of presenting our self-paced MOOC going into the future.

- Sure, so what's the best place for a listener to go to not only download the book, but to keep up with news on where the course may be evolving to?

- Probably just to my website. It's a really easy URL to remember: it's JS, like my initials, JSResearch.net.

- Cool, and I'll be sure to put that in the show notes as well.

- Yeah, people can download the book there.
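(Going back to the str() command Jeff mentions above, here's a minimal sketch of what it shows, using a small made-up data frame.)

```r
# str() reveals the structure of a data object: its class, its dimensions,
# and the type of each column. A tiny made-up data frame for illustration:
grades <- data.frame(
  student = c("A", "B", "C"),
  score   = c(88, 92, 79),
  passed  = c(TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

str(grades)
# 'data.frame':	3 obs. of  3 variables:
#  $ student: chr  "A" "B" "C"
#  $ score  : num  88 92 79
#  $ passed : logi  TRUE TRUE FALSE
```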
There's also some supplemental slides that I put up at SlideShare, which people find pretty helpful. And as things change on the MOOC offerings, we will link to them from there as well.

- So, there's this common platitude that you see a lot in the discussion about big data: the four V's, volume, variety, velocity, and veracity. And I've never, myself, found that all that useful a description. But in your book I found the four A's (I don't know if you coined it, but your book is at least the first place I encountered it), which are data architecture, data acquisition, data analysis, and data archiving. And I thought these were a much more meaningful way of expressing the power of data science. Can you give a quick definition of these guiding principles?

- I don't know if I can give a good definition, but I can tell you the origins, which might illuminate it a little bit. Our school is pretty interesting in that our roots are in library science. It goes back to the 1800s; that's when our library science program first opened. And if you think about what librarians are really good at, they are really good at keeping information organized, curating it, making information available, and helping information users make good choices about the information they can use in whatever domain they're applying it to. That's kind of the roots and the core of our school, and that gets projected into our data science program such that we really look at the whole data lifecycle, from the point where you're designing some kind of information system so that it can handle data, so that it can collect data, so that it can manage, store, archive, and analyze data, all the way to the point where data are ready to go into some more permanent warehouse-type storage. That whole chain of handling data is really important in data science, in my opinion. And the interesting thing about it is, if you go on the street and you talk to somebody who is interested in data science, the thing that they really wanna do, the thing that they consider the most interesting and cool, is just the analytics part. And so we had this kind of tension, if you will, between what data scientists really need to know, which is the whole data lifecycle from design all the way to when data are retired, and yet the thing that people think of primarily is the analytics piece. And so what I wanted to do by thinking of the four A's instead of the four V's was to sensitize people to the idea that you need to start from foundations, right? You need to start by architecting a system that makes sense and will produce data that are usable over the long run. Then there's the whole data collection process that determines the provenance of your data, which is really important; you can only make certain kinds of reasoned decisions if your data came to you in a certain way. And then of course the analysis, which most people consider to be the core of it: that is certainly key and a big emphasis these days. But then archiving is the kind of closing act, if you will, that makes data reusable for the future, worrying about things like metadata that are often kind of a second or third thought for analysts. But they're really important, because some of the value in data won't even show up until months or years later, and if you don't have a great way of understanding what you had, what that old data contained, you won't be able to reuse it. And reusability is really a key part of this.
So that gives you the whole spectrum rather than just the focus on the analytics piece. And that was important for us as we were developing our data science program, and it was important for me, as you see in the book, to try to represent all of those steps.

- Yeah, absolutely. I was appreciative that that was expressed there. Another point I appreciated a great deal was that you touch on, in the book, the relationships that data scientists will have with their subject matter experts, or SMEs. And I think that's often overlooked as well. Could you talk a little bit about how a data scientist should nurture that collaboration?

- Sure. One of the things that's really hard for people who study statistics is to think about the application areas that they will be working in, and to imagine and then ultimately do the kinds of communication that are necessary in order to make their skills worthwhile. I teach my students to perform some procedure, like running a linear regression, and then they're really happy about that 'cause they've mastered a new tool and they feel like the world is their oyster and they're ready to hit things with the proverbial hammer. But there's kind of a second step, in the sense that if you don't know how to talk to the people in the domain where your statistical tools are being applied, if you don't know how to elicit from them what their problem is, and then you don't know how to communicate back the results of the analysis in a language that they understand, you're not gonna make good use of that tool. And I think it's a really important relationship to develop, because the data scientist or the statistician or the computer programmer, whoever is building things with those tools, needs to be able to understand certain key aspects of the application domain in order to be able to work effectively. And I'll give you an example: a long time ago, in a previous life, I was a computer programmer, and I was asked to build an application that would be used in a music studio. The only problem was I didn't have the slightest clue what people did in music studios and what their needs were. The first thing that I really had to do was go hang out in a studio and talk to engineers and watch them at work and see how they get their job done. And then I was able to understand what I needed to build into the tool in order to make it effective for them. And it's the same for data scientists. You need to be able to understand some aspects of the application domain in order to be effective, and that's facilitated by having a good communicative relationship with a subject matter expert.

- Yeah, I had somewhat of a mentor of mine say it in a less eloquent way, but his advice was, well, you gotta go in and find the guy who knows where the bodies are buried.

- That's right, absolutely.

- So another great point you made that really sold the book for me, because I think it's all too often misunderstood, and I'll paraphrase a little bit from the text: when data sets get to be a certain size, conventional tests of statistical significance are meaningless, because even the most tiny and trivial of results are statistically significant. And in this era where every five minutes there's a new big data headline, there needs to be this cautionary tale in the data science world: a petabyte of data will always yield some correlation, but that doesn't mean it's a meaningful correlation. So I was really appreciative that you took some time to cover that.
Can you maybe talk a little bit about the responsibility a data scientist has to reconcile working with those big data sets and knowing when and how to apply the right statistical tools?

- This is, I think, part of an ongoing debate which we're seeing in the statistical area between the frequentists, who were kind of the dominant force in the 20th century, and folks who have begun to question that approach to things. Rewind all the way back to Ronald Fisher at the beginning of the 20th century: he wanted to create a simple procedure that people could follow in order to apply statistics, and it was, I think, kind of a good-hearted idea. He wanted to make, kind of, a foolproof system, and that's where we got our null hypothesis significance test, which served us well for many years in lots of different ways, particularly in the time of small data sets, because during that time sampling error was a critical problem. So the tools that we had and that we applied 20, 30, 50 years ago were appropriate to the data at the time. Now, though, with a huge data set there really isn't much point in performing a null hypothesis significance test, because it's always going to be significant. What matters much more is the effect size, the amount of certainty you have around a result, and, very apropos of your previous question, being able to translate that into a practical result in the application domain. One of the examples I always give my students is: if you could improve some sort of commercial process by 1% and you have the statistics to back that up, would that be worthwhile? And they struggle with it for a while, and some say yes and some say no. And then I say, you know, you need to be able to translate that 1% into dollars, right? How much investment would it take in order to realize that 1% improvement? And then how quickly will that 1% improvement yield a return? If you don't do that, if you don't translate it into the practical domain, then the statistics are worthless regardless of whether the test was significant or not. And so turning it into some sort of practical implication is a key part of the process, and that is something that is not well served by our older methods. And I won't say anything about Bayes unless you ask me.

- Oh, please do. Bayes is one of my favorite topics to get to on the podcast.

- So just this semester, for the first time, I've started trying to teach Bayesian methods alongside the frequentist methods. It was a bit of a struggle at first to get it all to work together, but you know, I think I've got a message now that works pretty well for the students. And I think the interesting thing about some of the output from the Bayesian statistical tools that people are writing in R is that they give you some direct information both about the effect size and the certainty around that effect size. Those things, I think, have the potential in the future to improve communication between people who manage and analyze data and people who need to make decisions based on that data. I think the Bayesian output gives people more of a direct sense of what they're working with, and you don't ever talk about whether you've rejected the null.

- Yeah, I've long been a Bayesian guy myself. And I think you hit on one of the key points that I try and hammer home when I talk to people: at the end of the day, the data is there to inform some decision.
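(A minimal sketch of Jeff's point about huge samples, using made-up numbers: the p-value becomes vanishingly small even though the effect itself is trivial, which is why effect sizes and their practical translation matter more.)

```r
# With a couple of million observations per group, even a trivially small
# difference in means is "statistically significant". Made-up data:
set.seed(42)
n <- 2e6
a <- rnorm(n, mean = 100.0, sd = 15)   # baseline process
b <- rnorm(n, mean = 100.1, sd = 15)   # "improved" process, tiny true effect

t.test(a, b)$p.value                   # essentially zero
(mean(b) - mean(a)) / 15               # standardized effect size of roughly 0.007
```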
So it has to be communicated in a clear and concise way to be able to help someone make whatever choice they need to make.

- Yeah, I totally agree.

- That leads pretty well into another section I really liked in the book, where you put forward these three cautions that I thought we could explore one by one. The first is that the more complex our data are, the more difficult it will be to ensure that the data are clean. When I think about my school days, if a data set had to be printed in a textbook, it obviously needed to fit on one or two pages. Or even when you had a course example, data sets tended to be small and very well groomed. And then I had this big surprise when I got out into the real world and found that data was not so clean. So maybe you could share your definition of what data cleanliness is, and perhaps where the responsibility falls between a data scientist cleaning it up and an engineer, who's working on the data collection phase, having the responsibility to get it into a system in a more groomed way?

- Yeah, I have a little story around that. I wrote a chapter a couple of years ago on data mining, and I was looking around for a data set that I could use that would be of suitable size and also would have some interesting information in it that I could explore and could use to elucidate some of the data mining techniques for beginners. I ended up settling on the American Time Use Survey, which is a really huge data set that's been collected over many, many years. It's in a form that's not sort of a traditional rectangular data set, and so I spent quite a while struggling with just trying to reshape the data so that I would be able to do any kind of initial screening analysis on it to get the data mining going. And finally, after searching around on Google for a while, I found this tool that some people had written to organize the American Time Use data so that you could request a certain time period, you could request a certain set of variables, and this tool would actually create a nice rectangular data set for you. And then I thought, all my problems are solved. And in some senses, some of the most difficult problems were solved, because that tool, which was extremely useful, and I'm sure has been useful to many, many researchers, really did save a huge amount of time. So I downloaded the data from that tool, and of course it's in a nice rectangular form that you can recognize, with variable names at the top and so forth. And I thought, oh, I'm all set now; this is gonna be a piece of cake. And I started to do some of the initial screening that you would do in order to set up data mining, or to figure out what things you wanna include in the set of predictors and so forth. And I found that there were still literally days and days of work just to figure out things like coding. This is a typical kind of survey data set where, for any variable, there are half a dozen ways that missing data are coded depending upon why they're missing. And so you had to go through and do all of that, and then you had to go through and bin everything, and then go through it again, over and over. And by the time I was done and actually ready to run a data mining analysis on these data, my chapter was overdue by at least a month, which is typical for me.
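(A minimal sketch of the kind of recoding and binning chore Jeff describes. The data frame name, column name, and missing-data codes here are all hypothetical; real survey codebooks, including the American Time Use Survey's, use their own codes, so check the documentation first.)

```r
# Hypothetical survey data frame `svy` with numeric columns in which values
# like -1, -2, -3, 97, 98, 99 mark different reasons the data are missing.
missing_codes <- c(-1, -2, -3, 97, 98, 99)

recode_missing <- function(x, codes = missing_codes) {
  x[x %in% codes] <- NA     # collapse all the special codes to NA
  x
}

num_cols <- sapply(svy, is.numeric)
svy[num_cols] <- lapply(svy[num_cols], recode_missing)

# Bin a continuous variable (hypothetical minutes of sleep) into categories.
svy$sleep_bin <- cut(svy$sleep_minutes,
                     breaks = c(0, 360, 480, 600, Inf),
                     labels = c("short", "typical", "long", "very long"))
```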
So I realized, and I wrote this into the chapter, that really 80% or more of the process of making some data useful in the analytical sense is really just shaping it up, getting it to the point of being able to run a sensible analysis on it where you don't have a bunch of anomalies caused by things that you haven't even yet encountered or thought about. That was a great object lesson for me, 'cause it had been a long time since I had analyzed someone else's data. When I collect my own data, I'm careful at every stage to make sure I keep things exactly the way I want. But that's not a luxury that most data scientists have. You have stuff that's handed to you. You have systems, old archival systems, that are handed to you. You don't get much choice, and you need to be able to condition the data, and that is a time-consuming and long process. And we're thankful to the engineers for any tools that they can give us that make the job a little bit easier.

- Yeah, I've sometimes heard it described as the dirty secret of data science that we spend 90% of our time cleaning our data and 10% doing the really fun and interesting stuff.

- I think that's a really astute observation. And really, in some sense, the frontier in terms of a tool that will really transform the field would be something that has enough intelligence built into it that it can do some of those jobs automatically. The thing I was thinking about was that problem I had with all of the different codes that had gotten entered into the American Time Use Survey for different types of missing data. Boy, it would have been brilliant if there was a tool that would go through and automatically recode those things according to some set of values or guidelines that I would give it. That would have been such a huge savings.

- Yeah, I see the tool sets getting closer and closer, and I'm still waiting for this sort of master thing where I can just give it dirty data and it kind of has a few heuristics that know how to help me. Maybe we'll see something come out one day.

- It's called the junior analyst. (laughing)

- Yeah, I think you're exactly right. So going back to the cautions I was mentioning from your book, I want to paraphrase the second one. It's that rare and unusual events or patterns are, almost always by their nature, highly unpredictable. Even with the best data we can imagine and plenty of variables, we will almost always have a lot of trouble accurately enumerating all of the causes of an event. And I think that's incredibly astute about a lot of the data sets I've worked with. I've noticed that really optimistic people are willing to kind of see data science as magic, which is both nice, in that they have confidence, but also terrifying, in that the expectations can kind of be unreasonably high. You'd like to predict those rare and outlier events, right? We'd all like to be able to predict which stock is gonna go up by 10x tomorrow. But those fleeting events are the hardest data points to really capture and model. In your opinion, what can a good data scientist do to set the right expectations for people?

- Yeah, that's definitely a hard one. The assumption of the magic is, I think, a very difficult set of beliefs to overcome. What data scientists can do in order to communicate this issue is to talk about some of the really hard problems that we have historically tried to deal with. And the one that I always pull out for my students is trying to predict earthquakes.
And there have been, you know, crackpots and cranks over hundreds of years who have said, "Oh, I've got the formula. I know how to predict an earthquake." And over and over again, of course, those have been proven false and, you know, have had no merit. We still don't really understand how to predict an earthquake; there are so many factors at work that determine when a fault, you know, slips. The hope of being able to find some simple method of deploying sensors, running an algorithm, and being able to anticipate an earthquake is still very far away from us, if we could ever achieve it. And so that's a kind of analogy that a lot of people can relate to. It's an everyday thing, it's very physical, and people have, you know, often lived that kind of history. Certainly anybody that lives in California has probably experienced an earthquake, and they know how hard it is and the amount of expertise that's been thrown at it to no avail. There's some really nice thinking about the base rate problem in a book from Gigerenzer. You know Gigerenzer?

- I don't.

- Calculated Risks is the one that I'm reading through right now, or I should say rereading, in order to explain these issues to my students. Calculated Risks talks a lot about the base rate problem and uses a kind of lead-off example of mammograms for detecting breast cancer. It is very elegant in terms of showing how few people really understand the nature of a test and the accuracy of a test, and particularly how well a test functions in the face of a low base rate. So if you can get a look at that book, take it out of the library or, you know, get a copy on your Kindle, it's really quite, quite amazing. I put some of these problems in front of my students, and some of them, who are well-versed in Bayesian thinking, were able to get the idea pretty quickly, because they're used to thinking about prior probabilities and posterior probabilities. But 70% of my students were, like, totally blown away by the idea that, you know, the great majority of people who would undergo a test like a mammogram and get a positive result in fact do not have the disease. And this is a common issue with Bayesian reasoning; it's a common base rate problem. That's the kind of thing that you can also illustrate pretty easily to a manager, or to someone to whom you're trying to communicate how a statistical process will work, because they will be sensitized then to how accurate any sort of prediction needs to be in order to be useful.

- Yeah, I think that's a great suggestion. The third point I want to touch on, and I'll paraphrase again: with every linkage to a new data set, we are also increasing the complexity of the data and the likelihood of dirty data.

- Yeah, data warehouses are great, because you have your friends in IT who are madly saving up every snapshot of every system that they can, you know, possibly get their hands on. And so you build up this massive data warehouse with great old archival data in it and all kinds of recent stuff. And then you begin to try to link things together, you build a history of personnel files, for instance, and you find that there are so many problems in making it work, because there are inaccuracies, there are anomalies, things are coded differently. You know, the further back you go in history, the more of a mess you get. And yet those historical data are really valuable, if you had a way of bringing them in.
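(Stepping back to the mammogram example above for a moment: a minimal worked version of that base rate calculation, with illustrative numbers rather than real clinical figures.)

```r
# Illustrative numbers only: prevalence 1%, test sensitivity 90%,
# false positive rate 9%. Bayes' rule gives the probability of disease
# given a positive result.
prevalence  <- 0.01
sensitivity <- 0.90
false_pos   <- 0.09

p_positive <- sensitivity * prevalence + false_pos * (1 - prevalence)
sensitivity * prevalence / p_positive
# ~0.09: even after a positive test, roughly nine out of ten people
# do not have the disease, because the base rate is so low.
```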
And so there's a constant tension between wanting to be able to get access to the widest range of pieces of information that could be brought together to inform a decision, and yet the wider you cast your net, the more glop you're gonna get when you pull that net back in. That is a harder problem for most people to conceptualize. Until they've lived it, until they have gone in and tried to create a complex data table that has linkages in it and brings in a lot of historical data that might have anomalies, until they've done that and struggled through it, it's really hard for people to believe how hard it is.

- Yeah, absolutely. You know, I was glad you brought up earthquake prediction. I think that's a really great example. It reminds me of another study I saw; I'm actually hoping to get someone from that team to come on the podcast. A team from the US Geological Survey looked at Twitter data. Now, they're certainly not predicting earthquakes, but sort of post-predicting, I guess: after an earthquake has occurred, can we find out via Twitter whether or not it has occurred? That system is nowhere near as precise as a widely cast, expensive seismographic installation. But it's kind of novel in that two guys with a little enthusiasm and laptops can come up with a system that does that, with some reasonably good accuracy for detecting whether an earthquake has just happened.

- Not so much predicting an earthquake, but localizing where the epicenter is. I think that's what I've seen them try to do with the geotagged tweets. And I think, more generally, there is a possibility in there for emergency response and emergency management, to the extent that people can tweet out that they are experiencing something, whether it's a flood or a fire or whatever, and emergency response personnel and other involved folks can use those data in order to zero in on something that's happening. It is, in some sense, crowdsourcing the idea of having people on the scene who can inform a response to an event. I have a doctoral student who's writing about this right now, who's trying to create a dissertation based on using information and communication technology tools to help with emergency response in a sort of crowdsourcing mode. So I think that's a pretty powerful idea that is certainly enabled by the use of Twitter. It's funny, though, that you mentioned this, because the inclusion of the Twitter chapters is actually one of my great regrets about this book. The thing that I was not really sensitized to at the time that I was writing it is how delicate the APIs for all of the contemporary services are; it's not just Twitter, so I won't single them out for harsh treatment, but basically everyone who's producing an API with a data stream that can come from it is constantly tinkering and tweaking it. And also, I think they're fighting off people who are abusing the APIs, and so they're changing things in order to throttle the flow and so forth. Putting this Twitter API in the book has cost me more support time than you could possibly imagine. I have people writing to me from 100 countries all over the world saying, you know, I can't get this piece of code to work, can you help me out with this? And of course I didn't even write the API interface; that was written by another fellow, who I think I gave credit to in the book. But it really is a delicate thing.
And while we're in sort of this handicraft mode of having to connect R, or for that matter any other analytical tool, directly to an API with little squirrely bits of code, I think we're gonna keep running into this over and over again. Somebody who could create a master API interface, you know, something that would just plug in like a plug plugs into the wall, would be able to make a million bucks from it, I think. It's harder than one might think to really get a consistent and reliable connection into a data service like Twitter. Although, as you've pointed out, the rewards can be pretty worthwhile, 'cause the data can be quite interesting to play with. And the emergency response application that you mentioned is really one that I think has enormous potential.

- Yeah, I'm very eager to see where that goes over the next couple of years. So I think the great advice that I have given and will continue to give to people interested in getting their feet wet in data science is to check out your book, An Introduction to Data Science, and to see about that MOOC, wherever it evolves to in the future. Do you have any other advice for someone a little bit green looking to jump right in?

- We touched on the MOOC that pairs up with the book that I produced, and I think that that's a great idea more generally. There are so many open access resources available now for people who wanna learn in this domain. Our friends at edX and some of the other companies have made wonderful courses available that sometimes you can get access to for free, and I think all of those things are something to be taken advantage of. There's the potential, as we mentioned at the very beginning of this, to learn a lot on the job. So if you work in a place where people are trying to solve these problems and you can sort of get an apprenticeship, that's, I think, a pretty neat way to learn. I think some people have said that the majority of people who work as software engineers never really studied computer science as such, and I think in some ways the same is true for data science: at least in the present day, the majority of people that are doing data science weren't necessarily trained as statisticians. And so there's a lot that you can contribute even if you don't have that hardcore statistical background. There's lots of opportunities, I think, to use the emerging tools to do some quite reasonable things. And it's a great way to learn on the job.

- Yeah, absolutely. Lastly, I like to ask my guests to provide two recommendations. These can be anything, you know, a book, an R package, a paper, or whatever the case. First, I like to ask for the benevolent recommendation, something you have no affiliation with, but appreciate and think listeners would benefit from learning about. And secondly, the self-serving recommendation, which should be something that directly benefits you, hopefully, through exposure on the show.

- One of the Bayes packages that I've been playing with, and I've been playing with two a lot lately and trying to teach them to my students, is BEST. It's called B-E-S-T, all caps, and it's a replacement for the t-test. It produces some output which I think really helps people to understand the nature of the Bayesian approach to analysis. So if you can find that: BEST stands for Bayesian Estimation Supersedes the t-Test, and it's by a guy named John Kruschke, K-R-U-S-C-H-K-E.
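(A minimal sketch of trying out the BEST package described here, assuming the CRAN package of that name and its BESTmcmc() interface; it also relies on a separate JAGS installation, and the exact workflow may differ from what the book or MOOC presents.)

```r
# install.packages("BEST")   # requires JAGS to be installed on the system
library(BEST)

set.seed(1)
y1 <- rnorm(40, mean = 102, sd = 15)   # made-up "treatment" group scores
y2 <- rnorm(40, mean = 100, sd = 15)   # made-up "control" group scores

fit <- BESTmcmc(y1, y2)   # posterior for the means, SDs, and normality
summary(fit)              # effect size and credible intervals, no p-value
plot(fit)                 # posterior distribution of the difference in means
```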
The output's really neat, and I think it can help people understand what the whole point of the Bayes approach is. And of course the t-test is a very basic thing; anyway, that's where a lot of people start in terms of trying to understand statistics, so I recommend the BEST package for that.

- I'll have to check that one out myself. I'm not familiar with it, but it sounds really exciting.

- So my self-serving recommendation is that I hope people will check out the iSchool movement. It's an international group of schools, there's, I think, about 60 schools right now, calling themselves information schools or iSchools. Syracuse is one of them, and one of the original ones, but there are iSchools all over the US and in many different countries now. We take an interdisciplinary approach to the study of information, and that includes a lot of elements of IT, but it also includes areas like data science, and it's a really exciting field right now. It's a great place to be; there's a lot of good stuff happening there. So if you go to iSchools.org, you can find lots of neat information about the iSchools and where they're at and what they do, and check it out, I think it's pretty neat. For anyone who's interested in the area of data science, an iSchool might be a great place to do some studying.

- Excellent, yeah, great recommendation. And can people follow you on Twitter?

- They can, although I don't tweet a lot, you know, I'm not a prolific tweeter. I tend to send things out when I think that they're really important and no one else is paying attention to them, so that might be once every month or so. You're much more likely to find me on LinkedIn. You know, send me a LinkedIn invitation. I blog on LinkedIn from time to time, and that's probably the best way to get in touch with me.

- Sounds great. Well, thank you so much for joining me, Jeff. This has been a great conversation. I'm sure my listeners are gonna enjoy it.

- Good, thank you, Kyle, appreciate it.

- Take care.

(upbeat music)