Archive FM

Data Skeptic

Proposing Annoyance Mining

Duration:
30m
Broadcast on:
09 Jun 2015
Audio Format:
other

A recent episode of the Skeptics' Guide to the Universe included a slight rant by Dr. Novella and the rogues about a shortcoming in operating systems. This episode explores why such a (seemingly obvious) flaw might make sense from an engineering perspective, and how data science might be the solution.

In this solo episode, Kyle proposes the concept of "annoyance mining" - the idea that with proper logging and enough user feedback, data scientists could be given the right dataset from which to detect flaws, annoyances, and potential improvements in software and other systems, and help make those systems better.

As system complexity grows, it seems that an abstraction like this might be required in order to maintain an effective development cycle. This episode is a bit of a soapbox for Kyle as he explores why and how we might track the right amount of data to build better software and systems better suited to their users.

[music] The Data Skeptic Podcast is a weekly show featuring conversations about skepticism, critical thinking and data science. Welcome to a mid-week episode of the Data Skeptic Podcast. I'm going to break from my traditional format of interviews in most episodes and do something that's more or less an opinion piece, or a response, if you will. If you found the Data Skeptic Podcast because you enjoy scientific skepticism, you must already know about the Skeptics' Guide to the Universe. If you found it because you like data science, you should check out the SGU if you're not already familiar with them. It's part of my regular Saturday morning routine. On a recent episode there was something I wanted to respond to - it wasn't really a topic, more of a footnote and some banter. Let's play a quick clip. "You guys have had this experience where you tell the computer to do something like copy files or whatever. You know it's going to take an hour or two. It runs into a problem - do you really want to move this file, or whatever - and it stops the whole thing. It stops the whole thing. It's not smart enough - they haven't figured out - to just put that aside, complete the task that you're doing, and at the end of it ask me, 'Hey, did you want to do this one file? I put it aside for you.' That doesn't seem so amazing that a computer shouldn't be able to do it. I just don't understand." So here, Dr. Novella is expressing his frustration with a seemingly obvious improvement that should have been made to operating systems a long time ago. First and foremost, I overwhelmingly agree with his sentiments. Things like this are frustrating and super annoying, and I hope that maybe some software people out there or some product designers will hear this and think a little bit harder in the future about user interfaces and getting feedback. And that's ultimately where I want to bring this back around to: a data topic. But if you'll indulge me, I'm going to go into a bit of a software commentary, because more or less that's what this is on the surface, and I want to share my perspective on this particular topic. So this is going to start off seeming off topic, but I promise you I'm driving towards a point, and I hope you'll humor me as I go down the rabbit hole and come back up. I want to first give you a crash course in computer science and what it is to program and to write software. At the most fundamental level, coding starts with logic gates. An electrical engineer or materials scientist might want to take this a little deeper, but I'm going to start where I think computer science starts, which is at the logic gate. Now, generally - and this is a little hand-wavy - there are three basic logic gates: the AND, the OR, and the NOT. Everything in a computer runs on a binary system, which is expressed as high and low voltages. So the AND operation: if two inputs are both high, or true, it will output true. In other words, true and true is true; true and false is false. The OR comes up true if one or both of the inputs is true, so the only time the OR is false is if you get false, false. And the NOT is simply the inversion. Now, believe it or not, those simple little operations - actually, the NAND (not-and) operation alone is fundamental enough - allow you to do anything mathematical you would want to do. Anything algorithmic or procedural comes from these very simple operations. But, of course, we can't really think about data or programs or anything like that in such simple terms.
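
To make that concrete, here is a minimal Python sketch (an illustration added for this transcript, not something from the episode) of the three basic gates on boolean inputs, plus a check that NAND alone can rebuild all of them - the universality point being alluded to above.

    # The three basic logic gates, plus NAND, on boolean inputs.
    def AND(a, b):
        return a and b

    def OR(a, b):
        return a or b

    def NOT(a):
        return not a

    def NAND(a, b):
        return not (a and b)

    # Rebuilding the basic gates out of nothing but NAND:
    def NOT_from_nand(a):
        return NAND(a, a)

    def AND_from_nand(a, b):
        return NOT_from_nand(NAND(a, b))

    def OR_from_nand(a, b):
        return NAND(NOT_from_nand(a), NOT_from_nand(b))

    # Truth-table check: every reconstructed gate matches the original.
    for a in (False, True):
        for b in (False, True):
            assert AND(a, b) == AND_from_nand(a, b)
            assert OR(a, b) == OR_from_nand(a, b)
            assert NOT(a) == NOT_from_nand(a)
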
It would be mind-numbing to build everything up that way. It would be like explaining World War II in terms of, you know, the positions of electrons or something like that. It's just too fundamental. I'm actually reminded of a short story by Isaac Asimov called The Feeling of Power. You should definitely go read it if you haven't. It's a futuristic tale in which a mathematician starts reinventing basic arithmetic procedures like long division - things that have long been forgotten because we just have computers to do them for us. And I kind of want to go down a similar diversion and talk about computer science from the ground up. So we have logic gates. What do you do with those? Well, the next thing you want to do is build operations around them. You want to solve problems with logic gates. And the first major advancement in computer science was saying, wait a minute, we don't want to build specialized machines. Let's have a programmable computer, something that captures the ideas but can run many programs. As trivial as that sounds to us today, that was an incredible innovation. So you start by building a CPU, a central processing unit, which is built out of logic gates and has other features like memory. Actually, I skipped a step: logic gates can use a feedback mechanism to build something called a flip-flop, and that's the core unit of memory. That's what allows you to capture and store a value, to set it and reset it, so that your system can have state. Now, a CPU made up of all these logic gates has to give you some way to program it, and at its most raw level that programming is done in something called assembly language. Assembly language, once you get past the learning curve, is super fun. You all should really try it, it's awesome. But it will numb your brain when you start on it, because writing something in assembly sounds a bit like this: get the value stored in memory address X, move it to register Y; get the value stored in memory address A, move it to register Z; add the values of registers Y and Z and store them back in memory address C. And that, ad infinitum, is what a program looks like on a central processing unit. Naturally, we would never be able to write useful programs if we had to write code like that. Most people write very little assembly these days - you maybe just go in and optimize very specific use cases, especially for graphics routines. We quickly realized - we being, you know, engineers, human beings - that there was no way this would scale, and that's when we invented programming languages and compilers. A compiler is something that takes a piece of software written in somewhat more readable code and converts it into that assembly language. That way you don't have to write assembly; you can write in this slightly higher-level language, which will then be converted to assembly for you, which the CPU understands. The very first programming languages were - and this isn't a history of programming, so I'm sure I'm going to miss some important stuff - but I think of languages like Fortran and C. C is really one of the earliest and most important milestone languages, and it was a huge innovation. It allowed people to be programmers, to write cool code, to share their code, to be reproducible. But it also was pretty complex.
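
The load/add/store sequence described above, rendered as a toy Python "register machine" - purely illustrative; real assembly targets actual registers and opcodes, not dictionaries:

    # A toy illustration of the sequence: memory addresses, registers, explicit moves.
    memory = {"X": 2, "A": 3, "C": 0}   # hypothetical memory addresses
    registers = {"Y": 0, "Z": 0}

    registers["Y"] = memory["X"]        # get the value at address X, move it to register Y
    registers["Z"] = memory["A"]        # get the value at address A, move it to register Z
    memory["C"] = registers["Y"] + registers["Z"]   # add Y and Z, store the result at address C

    print(memory["C"])  # 5 -- roughly what a single line like `c = x + a` compiles down to
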
And one of the major problems with C was around memory management. Programmers had a hard time doing a good job of dynamically allocating memory. It's hard to explain this to a non-programmer, but basically you forget what you've allocated, or forget to free memory, and memory fills up with junk you haven't released, and eventually that turns into a memory leak. So I'm hand-waving a lot here, but to get around the difficulties of a very raw and native programming language like C, we started to invent higher-level programming languages - things like Java or Python or even, I guess, in certain ways, BASIC. There are dozens if not hundreds of programming languages today, so I won't get into why they all exist, but each one of them is, at least in theory, created by its designer to facilitate some process, to make programming easier, and then to reduce down to these simpler layers. So I'd like you to start thinking of this like an onion. At the very core, we've got the logic gates. On top of that, we've got the CPU and its assembly language. On top of that, we've got the lower-level languages, and on top of that, these higher-level languages. We could even say certain languages like Java sit on top of that. I was talking a moment ago about the problems with memory management. A lot of modern languages, their way around that is just to say: let's not trust the programmer to do a good job with that. Let's manage it for them. Let's watch their variables, try to notice when they've stopped using them, and then throw that memory back into the heap so we can use it for other things. And in fact, Java as a language might one day die, but something underneath it - the Java virtual machine, on which other languages like Scala have been developed - can live on. Java compiles to bytecode, and bytecode compiles down to machine code, more or less, so you have yet another layer of the onion as we virtualize and emulate the machine the code is going to be running on. And these layers, much like in Isaac Asimov's vision, are probably going to continue to build on top of one another. We even have languages that sit on top of other languages, like JavaScript, which runs on the JavaScript engine inside your browser and is itself compiled down to something else. All of programming is these layers that try to encapsulate a lot of the more difficult, time-consuming, or hard-to-write code so that programmers can write more high-level things and be quicker, more versatile, and more robust in what they build. Part of that is building libraries and frameworks, things that offer you facilities you can use. An important layer here is the operating system. A computer, on the inside, has all these complex parts like the I/O bus and the hard drive and the memory and the cache, and you really don't want to have to know about those if you're a programmer. If you're making a website, you have no idea how many registers are on the CPU; you just want to trust that all of that is handled for you. One of the important layers of encapsulation is the operating system. It exposes certain functionality to you - like, if you want to create a file, you don't actually create that file yourself as a programmer. You ask the operating system to create a file on your behalf. In other words: give me some sort of object, a handle, that represents where the file is going to be.
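
A small Python illustration of that kind of encapsulation (a sketch, not from the episode): the programmer asks the operating system for a handle and never touches the disk hardware directly.

    # The programmer never writes to the disk directly; the OS hands back a
    # handle (a file object) and deals with buffers, cylinders, and the I/O bus.
    with open("example.txt", "w") as handle:   # ask the OS to create/open the file
        handle.write("pushed to the handle; the OS decides how it lands on disk\n")
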
And then, as you push things to that handle, it all gets written to disk, and the operating system handles all the messiness - whether it goes on a different cylinder of the hard drive, all those sorts of things. All you have to know is how the operating system exposes that functionality. And this is getting to the heart of why I think problems like the one Dr. Novella is complaining about exist. At every step, engineers are trying to build these layers to make it easy and robust for the people who come after them to use the services they're exposing. But of course, that's never going to be perfect. There's always some corner case they're going to overlook, or maybe they just did a bad job; that's always a possibility. So while I don't know the specifics of the problem - and maybe I should recap, since I only played a short clip - the real problem is this: you want to copy a large number of files, which might take a couple of hours. You set them to copy, you walk away from your computer, hoping to come back later and have it done. You come back a couple of hours later to find that there was some condition, some conflict on maybe the third file, where the operating system says, "Oh, you already have that file in this location. Do you want to overwrite it, skip it," or whatever. And rather than finishing up the rest of the operation and then coming back and saying, "By the way, here are a few outliers, can you tell me what to do with these?", the operating system tends to lock up, wait, and delay the entire process, totally stalling things rather than doing some nice pipelining-style operation for you. So why would it do that? Let's first explore whether that's actually the proper way to do this, and for that I want to talk for a moment about how databases work. One of the important features, at least of traditional databases, is that they're transactional, meaning you can execute a series of events or commands or changes that are all in concert with one another, and if something doesn't quite work out, you can roll the whole thing back to the original starting point. You can also ensure that no one else comes in and does anything while you're doing your operation. So, think of an ATM. If someone goes to an ATM and withdraws money, you don't want their credit card - or I guess in this case their debit card - to be used by a merchant online at the same exact millisecond to take out an equivalent amount of money and put that account into the negative. Let's assume, anyway, that the bank will deny any transaction that puts you into the negative. You want to transactionally ensure that any operation you're doing is completely isolated, and that no one else gets to take money out of that account at the same time. So, once you're satisfied that, yes, there's at least a $20 balance in the account, the withdrawal of $20 is approved and done, then you commit your transaction and the next events against that account can begin - but nothing simultaneous. Now, file systems are not transactional, so there really is no direct excuse for this situation, except that one could speculate that an operating system product designer or engineer decided this was the right thing - that maybe that confusing duplicate file was good cause for pause, that you would want to wait there and say, "Hey, maybe everything after this should be considered in light of this file."
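
A minimal sketch of that transactional, all-or-nothing behavior, using Python's built-in sqlite3 as a stand-in for a real banking database (the table, account, and amounts are invented for illustration):

    # Either the balance check and the debit both happen, or neither does.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
    conn.execute("INSERT INTO accounts (id, balance) VALUES (1, 20)")
    conn.commit()

    def withdraw(conn, account_id, amount):
        try:
            row = conn.execute("SELECT balance FROM accounts WHERE id = ?",
                               (account_id,)).fetchone()
            if row[0] < amount:
                raise ValueError("insufficient funds")
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                         (amount, account_id))
            conn.commit()      # the whole operation succeeds together...
        except Exception:
            conn.rollback()    # ...or none of it happens at all
            raise

    withdraw(conn, 1, 20)      # approved: the $20 balance covers it
    # A second $20 withdrawal attempted against this account would now fail the
    # balance check; concurrent writers are serialized by the database, which is
    # the isolation being described.
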
Now, I actually disagree with that - I think this is strictly an oversight - but it's worth noting it could be a deliberate decision on someone's part, and that gets me to another soapbox-type argument. There are a lot of cases where a product manager - someone who's not an engineer but is overseeing the user experience, user interface, features and functionality, things like that - will come in and demand something of an engineer without asking about the difficulty, without scoping it, without understanding what it will take to re-engineer a system to do it. This sounds like a simple task, a simple request: why can't you just defer these outlier cases to the end? And as simple as it sounds on paper, that might be blocked by restrictions in the APIs made available by the operating system or other libraries or functions. It may not be possible for the system to do that, or it could be that the operating system itself offers this batch copy operation and the writer of the application or file manager is just taking advantage of code written by someone else, so they can't necessarily go in and fix it directly without reinventing the wheel. You have this peculiar division-of-labor problem that can exist. Now, that's not an excuse, it's just an observation. So I would assert that what we have to entertain is whether this is laziness on the part of a company or a product engineer deciding what's worth the time and investment to fix, re-architect, or build more robust interfaces for, or whether these are setbacks rooted in some limitation of earlier design or some deliberate functional choice. Tough choices, but by and large I expect that if you really delved into why this is the way it is, you'd get into a finger-pointing game. I'm reminded of a quote - perhaps not a true quote, but a good one nonetheless - from when von Braun was asked about the Apollo moon mission. Some reporter, I guess, was interviewing him and asked, "How many people did you have to put together all the planning and all the engineering feats to put men on the moon?" And he said, "Well, we added it up, tabulated all the different roles - I guess we had about 100 people," to which the reporter inquired, "Man, I wonder what you could have accomplished if you had 200 people." And von Braun said, "Oh, if we'd had 200 people, we could not have put a man on the moon." I think that's an incredibly astute point. Bigger is not always better. You need the right number of people on the right problem, but you also have to trust that things are encapsulated well and that the functionality you need is provided to you from elsewhere. And that's not always true. You know, while we're on the topic of space: to my understanding, most space vehicles, satellites, and whatnot actually run on significantly out-of-date hardware. I think I heard someone say that maybe the James Webb telescope is going to have Pentium 4 processors in it - don't quote me on that, I could be totally wrong. But I am correct that we send up really old hardware on a lot of these things, which sounds crazy, right? Why would we do that? Well, it's actually very logical. The latest and greatest processors haven't really been fully vetted. We don't know what bugs they could have, or what weird corner cases might break down after a few years. The older technology is proven. We know its faults, where it has them, and how to work around them. We know its strengths and weaknesses.
So there's a reliability aspect to them, and if we're going to send up hardware in a hundred-million or billion-dollar operation, we have to be sure the technology is not going to fail us; simpler is often better. Now, there's another piece to what Dr. Novella was talking about. I don't know the details, but he said the specialized piece of medical office software that they use is horrendously bad - and "horrendous" is my paraphrasing. This unfortunately makes sense to me. I think this is really a problem of economics. You know, if there were something wrong with a browser - I like Chrome a lot, but if Chrome started to get buggy and they added features I didn't like, which by the way they sometimes do - I have other browsers I can try, and realistically, I could maybe make my own browser if I needed to, or certainly a small startup could come into the market and say, "All browsers suck, we're going to make a better one." It's much more difficult for someone to come in and say, "We're going to develop new, highly specialized medical software, and we're going to develop the relationships with hospital staff and decision makers to get these things rolled out." That's an expensive company to start, a difficult and uphill one, and I think the return on investment often simply isn't there. So you're in this unfortunate position where there may not be a lot of competition - and competition is good - and that can cause software to atrophy and fall short of what it should be, if you're making high profit margins and you have no competition. But in their defense, it could also be that there's a fear of change, and fear of change is sometimes good, going back to why we send old CPUs up into space. Reliability is important. And as they said on the show, Skype made some interface changes that really worsened the software. I find myself really annoyed a lot when unnecessary change for change's sake is made to software that I use and I have no way to roll back to old versions. I don't know who originally said this, but there's another great quote: "If you improve a product long enough, you'll eventually break it," and that's so true in software. So without a lot of competition, there's no pressure to innovate, and that definitely can cause things to stagnate - but there are two sides to that coin. So let's start to make this a data discussion, this being the Data Skeptic Podcast. My first point, which I hope came across, is that the whole structure, the whole technology stack of software, is built as this big onion where people rely on all the layers below them, because you can't possibly make everything from scratch. You know, Carl Sagan said it best: if you want to make an apple pie from scratch, you must first invent the universe. No one can really invent the universe, let alone invent, top to bottom, how a computer works. So you have to rely on the facilities that are there, and sometimes good APIs exist and good design choices were made, and they facilitate the production of good software. Other times you're stuck. But how do we fix that? How do we get it resolved? How do we let the designers know there are problems and improvements that need to be made? First of all, people need to be proactive, and this is entirely the domain of A/B testing.
If you're developing software or libraries or interfaces of any kind - well, it's easier if you're at the top of the stack: if you're building a website and you're not doing A/B testing, you're doing something wrong, because it's super easy and there are plenty of tools to help you do it. If you're developing a library that exposes, say, functionality on a GPU, your users are other developers who may or may not want to participate in your beta test, so you've got to go out and evangelize that product, and A/B testing is harder. But simulations are available: you can run existing software and see how its performance improves or degrades under the changes you're going to make. In that case, the lower-level people - the operating system designers and library writers - need to collaborate with the people who most consume their services and understand their needs and feedback, and I don't have any great suggestions on how you build that system aside from establishing those relationships. But for application writers, I want to talk for a moment about application logging, especially in the context of data science. Logging is critically important because it's how you find things out. If you're developing an application that has tens of thousands or maybe millions of users, you need analytics on how they're using it and how it's performing, and basic analytics like how much CPU is being used and what the response times of pages are simply aren't enough. You need to find out things like: can people find what they're looking for on your site? You know, my bank recently revamped their site, and honestly, I wanted to change banks. The redesign was so awful I couldn't find anything. All I want to do is transfer money between accounts. I'm trying to give you my business - why are you making it harder for me? Needless change for change's sake. Application logging can give a developer the opportunity to see where people are struggling. If the operation of transferring money from checking to savings used to take 30 seconds - and it should probably be shorter, I would hope - and you find that after a major release it's now taking three minutes, ask yourself: did you intend to make that take longer, or did you break something? Or is this a trade-off you have to accept? But that's tricky, too, because application logging tends to track more atomic operations. How long did a page load take? How long did a particular POST request take? That's not exactly descriptive of how users experience a web application. Like I just described, I wanted to transfer money from checking to savings. That involves a couple of steps: going to look at my balance, going to the transfer page, selecting accounts, initiating the transfer, and then approving it and all that. And this is where data scientists can start to come in. The software engineer isn't necessarily responsible for thinking like that; they're going to think about the atomic operations, the API requests, and the RESTful calls that are made. But a data scientist can define the higher-level operations a user is doing. So, yes, you logged in and sent an email or a message to someone within a system. But is that a message to a friend, or a new request, or some sort of sales solicitation? These are abstractions that can be defined probabilistically and are useful for studying a system.
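
Here is a rough Python sketch of what that kind of higher-level abstraction might look like: stitching atomic page events into a "transfer money" operation and timing it per release. The event records and field names are invented for illustration.

    from collections import defaultdict

    # Hypothetical atomic log records: (user, seconds, page, release)
    events = [
        ("u1", 0,  "balance",  "v1"), ("u1", 12, "transfer", "v1"), ("u1", 31,  "confirm", "v1"),
        ("u2", 0,  "balance",  "v2"), ("u2", 95, "transfer", "v2"), ("u2", 178, "confirm", "v2"),
    ]

    # Group the atomic events into per-user, per-release sessions.
    sessions = defaultdict(list)
    for user, t, page, release in events:
        sessions[(user, release)].append((t, page))

    # Define the higher-level operation "transfer money" as the span from the
    # first balance view to the confirm click, and time it per release.
    durations = defaultdict(list)
    for (user, release), steps in sessions.items():
        steps.sort()
        start = next(t for t, page in steps if page == "balance")
        end = next(t for t, page in steps if page == "confirm")
        durations[release].append(end - start)

    for release, times in sorted(durations.items()):
        print(release, sum(times) / len(times), "seconds on average to complete a transfer")
    # If the newer release jumped from ~30 seconds to ~3 minutes, that's the kind
    # of regression a unit test won't catch but this sort of mining will.
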
Now, I saw a great talk at the last Hadoop Summit in San Jose where someone from LinkedIn was talking about how they prevent and detect fraud - a great talk, and I think it's online if anyone wants to go watch it. One of the things he talked about was how we know whether an account has been compromised, and what's actually detrimental. Going into an account and maybe adding new contacts is not necessarily the worst thing someone's going to do. But sending out way more requests than usual and spamming a network - that could be the sign of a compromised account. It could also be the sign of someone who just lost their job and is hitting up their network for contacts and leads. So you need to look a step deeper. It's not just how many messages someone is sending on LinkedIn that's a sign of spam; it's the nature of those messages and what's contained in them that's really the signal for a data scientist. Now, application logging is tricky. Generally - and I like to think about Log4j, which I have the most experience with - you set logging levels. In most production systems, the version of the software that you and I as consumers are interacting with, you want to set this at the highest level, like error level only. You only want to log actual exceptions and major problems with the software, because if you logged every little change or operation, you could potentially produce a gigabyte of logs a second if you have a lot of traffic. But there are other levels of logging, like the debug level, which is probably the deepest. You would run it while developing software; it reports every little nuance of what the code is doing and can be used for analyzing where things went wrong. And there are levels in between, like info. So there are all these levels of logging, and there's a trade-off: the more you log, the bigger your logs get and the harder they are to search, store, and archive. But when I talk to companies, I encourage them to establish good archival policies for how they do logging. Logging is far too often thought of as an afterthought - an engineer just does it during their debugging steps to help themselves out, and they'll casually turn it on or off or change it with releases. That's so detrimental, because logs are a wealth of information for a data scientist to come along and mine, aggregating information that no one ever thought to capture in an analytics platform. So I would recommend to everyone: pick one web server, or maybe one hour of the day, or some other way you can flip on really deep logging in quick bursts and capture detailed information. Do that longitudinally, and commit to saving that data and archiving it over time, because it's a gold mine for a data scientist to come back later and ask questions that let them look back in time six months. The worst thing that can happen to a data scientist is to come in, get a really cool problem they're excited to tackle, and then be told, "Oh, that information you need - yeah, we've never tracked that. You're going to need to wait until we slate it into development, build the tracking for that feature, and then release it, and in two months you can have the information for this problem we were hoping you'd kick off now." So I would encourage CTOs and CIOs and decision makers of that kind: ask your data scientists to get you a list of everything you might want to store.
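
A short sketch of the level idea, using Python's standard logging module (its levels map roughly onto Log4j's ERROR/WARN/INFO/DEBUG; the logger name and messages are invented):

    import logging

    logging.basicConfig(format="%(asctime)s %(levelname)s %(message)s")
    logger = logging.getLogger("checkout")

    logger.setLevel(logging.ERROR)             # typical production setting: errors only
    logger.debug("cart rendered in 120 ms")    # suppressed
    logger.error("payment gateway timed out")  # recorded

    # A "deep logging burst": flip one server (or one hour) to DEBUG, capture
    # every nuance, then flip back and archive what you collected.
    logger.setLevel(logging.DEBUG)
    logger.debug("cart rendered in 120 ms")    # now recorded
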
And this is potentially the secret weapon for fixing problems like the one they were talking about on the SGU. If those things are available, we can go and do data mining, we can find those problems, we can see them, we can determine which releases cause annoyances. And that gets me to the topic I really want to define here, where I'd like to attempt to coin a phrase. I don't know that I'm the first to coin it - I did some brief googling for all of two or three seconds and didn't quite find what I'm about to describe - but the phrase I want to define and coin is annoyance mining, and I hope it can become a standard process in software development. What I mean by annoyance mining is giving data scientists access to logs that track lots of user interactions with systems, and tasking them with looking for things that are making users displeased and uncomfortable with the software. Find operations that users are trying to do that are running too slowly or slowing down over time. There's a huge opportunity for data scientists to provide a feedback loop to product developers, to tell them how things could be improved or where things got screwed up. That's the worst, right? Improvements are great, but when something works and then gets worse, that's when all of us as users want to flee that system. And you don't always have visibility into that, because there's no easy way to write a unit test for how long it takes someone to transfer between two accounts at a bank. I mean, I guess you could use tools like Selenium to test stuff like that, but there just isn't the same formalism the way there is for requirements checking and unit testing in software. I think there should be a formal practice here, and at a big enough company, with an important enough web application or even a regular, plain old software application, someone should be tasked with doing annoyance mining - looking for the changes people didn't know were there, trying to detect them and report on them. Now, I want to take annoyance mining to a second level and reach out to the developers of operating systems - Android, iOS, Windows - and say: can we find some way to put a button on the keyboard, a big red button with an unhappy face, that users can press when they're supremely annoyed? When something's going wrong with my machine, I want a button I can press that just tells people I'm not happy with what's going on. Now, yeah, it would be great if we all went to websites, filled out contact forms, and explained exactly what we were doing and the reproducible conditions that brought us there, but that's super unrealistic, and it's not going to scale. No one's going to do that except for a couple of squeaky wheels. But if we all had a button whose presses were logged somehow, that just let the developers and designers know we're in a situation we don't like - at large enough scale, that's a perfect feedback loop for data scientists to go and do some mining and say, okay, we're detecting all these annoyance reports. When are they consistently happening? What are they correlated with? What are the situations and features that tell us someone's getting annoyed? This could be massively informative for closing the gap on why people are fleeing sites. You know what I do when I'm working with a system that I'm annoyed with? I look for an alternative and I abandon the first solution. I would love the opportunity to give people feedback.
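
A hypothetical Python sketch of what mining those button presses could look like - the field names, releases, and counts are all invented; the point is just the aggregation step:

    from collections import Counter

    # Imagined annoyance-button log records.
    annoyance_events = [
        {"release": "2.4.1", "screen": "file-copy", "seconds_into_task": 4210},
        {"release": "2.4.1", "screen": "file-copy", "seconds_into_task": 3980},
        {"release": "2.4.1", "screen": "settings",  "seconds_into_task": 35},
        {"release": "2.4.0", "screen": "file-copy", "seconds_into_task": 60},
    ]

    # Which release/screen combinations are drawing the most complaints?
    by_context = Counter((e["release"], e["screen"]) for e in annoyance_events)
    for (release, screen), count in by_context.most_common():
        print(f"{release:7s} {screen:10s} {count} annoyance reports")
    # A sudden spike for one release on one screen is a strong hint about what
    # to investigate -- or, within an hour of a release, what to roll back.
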
Plus, let's admit it, there's something psychological here. If there were a big red button with an unhappy face, I would at least feel like I was accomplishing something when my computer's not working the way I want it to. And, you know, we could be cute about it and say we're giving mild electric shocks to the developers responsible whenever you hit that button or something. But this is a potentially great way to solicit highly decentralized feedback and to provide the right datasets for data scientists to come in and find application problems - to detect them early, you know, within an hour of a software release recommend a rollback, because certain key operations people are annoyed with got broken in fundamental ways you didn't even think to account for. So, to sum things up, I really believe that the problem Dr. Novella is describing is probably unknown to most operating system developers, because how would they get that feedback? I mean, we all use their software, but how do we reach out and complain about things like that? If we want those people to do a better job of giving us better software and systems, there needs to be a feedback mechanism, and I hope annoyance mining can take off as the way we do that. You know, I'm reminded every so often of something my father likes to say. He doesn't know to call it quality assurance or QA, but his basic premise is that people and companies who have a product or service they're going to release - a piece of software, a device, whatever - should send it to him and let him fiddle with it and break it, and he'll show them all the ways it doesn't work, because he's highly specialized at breaking things. And, well, I think he actually does have a lot of potential to take that on as a career. It doesn't scale to pay him, or anyone else, a salary to do that, but I think people would freely give their feedback through an annoyance button. And with the right amount of logging to track not only that, but also, in bursts, very deep application-level details about what users are doing atomically, what kind of response times they're getting, and what sorts of operations they're executing - those could be massive datasets for mining, giving us (us being, you know, software developers and system developers) the right feedback to improve the tools we make. This is truly a big data problem, and one that's worthy of our time. So that's my soapbox for today. I very much agree with the complaints Dr. Novella was opining about. I sympathize, because I've been in exactly that same position a lot, so I wanted to share a few thoughts. Maybe, just maybe, there's an engineer listening who's responsible for that little sliver of the operating system, and they can go in there and fix it for us, whether we're Windows, iOS, Android, or whatever users. Or if not, maybe my suggestion about annoyance mining can take off and provide that feedback loop to that person. That's it for today. Thanks for tuning in to the Data Skeptic Podcast.