(upbeat music) - So welcome back to another episode of the Data Skeptic Podcast. I'm here today with my guest, Merrick Sine. How are you, Merrick? - Doing well, thank you. - Thanks so much for joining me. So you're currently at Zest Finance, and we're gonna get into a little bit about what's going on over there, but I thought we could start with a brief introduction. Tell me about your background, academically and professionally, and how you got interested in what you're working on today. - Sure, definitely. Academically: PhD in statistics, M.A. in stats, M.A. in econ, so my academic training is in data science and predictive analytics. Professionally, I've been in industry for five-plus years within the financial services industry, with a stint at Bank of America and a stint at Western Asset Management, and then Zest found me, luckily, and it was just a perfect match. I found out what they were up to, and it was exactly my skill set. It's just a really fun project to be working on. - Yeah, it seems like the sectors you've been in have been undergoing a lot of change since you started. What's your perspective on that? - Yeah, absolutely. With the financial crisis, I think there has been a continued resilience in data science and predictive analytics. Regulators have put a lot of pressure on the financial industry, especially the larger institutions, to invest heavily in data scientists and in really high-caliber quantitative talent. We're seeing a lot of that, and I think it persists to this day. - So let's jump in. Tell me a little bit about some of the services that Zest Finance offers. - Our core product is a small installment loan, and really, as Douglas Merrill says, we're going after payday loans. That's our target.
They've got a huge target on their back. They're really predatory, and we're here to disrupt the payday loan space. If you start drilling down into the fixed-income math behind these products, they're really exorbitant in terms of the APRs and fees they charge borrowers, and these borrowers, granted, are at the lower end of the credit spectrum, so they're really susceptible to these types of products. So our mantra here is to be fair and transparent; we're on the customer's side. Our product is very upfront. We explain the math behind it without boring the customer with the details: exactly what they're going to be paying, and over what time period. Everything is laid out up front for them, and we see in our customer feedback and customer reviews that they're really quite happy with our product. - Very cool. So in any situation where you're going to grant someone a loan, you've got to assess creditworthiness, and that's nothing new; it's been going on for quite some time. Could you tell me a little bit about how people historically did that, and what Zest Finance is doing differently? - Yeah, definitely. Douglas Merrill gives the best rendition of this, if you can catch one of his talks, and I'm going to blatantly lift from that and emulate him.
Historically, if you go back 50 or 60 years, you would go in for a home loan (credit cards probably weren't even available to most people 50 or 60 years ago), or maybe a business loan. You'd sit across a table from somebody, and they would assess you. They'd look at whatever documentation you had and take you at face value, and a lot of times that led to really unfair lending practices, which is why there's now a whole regulatory body surrounding fair lending. So that's historically where things were, way back when. Then Fair Isaac got in the game in the '70s and said, hey, we can provide a credit score, and it kind of leveled the playing field. But if we fast-forward to today, where FICO fails is at the lower end of the credit spectrum. There are very thin-file and no-hit people whose creditworthiness they're trying to assess, and to be honest with you, Fair Isaac just kind of throws up its hands, because these people are under-banked. They maybe haven't had credit cards in the past, or the normal financial products that a lot of people further up the credit spectrum have had, so assessing their credit is much more difficult precisely because they're thin-file or no-hit. That's where Zest steps in. We're leveraging machine learning techniques to better assess creditworthiness at that lower end of the spectrum, where we have no-hit and thin-file applicants, and we develop a score for exactly those people. There is good risk down-spectrum. FICO throws up its arms and can't give you a credit score for these under-banked people.
That doesn't mean there isn't good risk down-spectrum. It really is an exercise in separating the chaff from the wheat, or the wheat from the chaff, if you want. There is good risk down there, but it's a matter of finding it. Our approach here is to use ML and data science techniques: the same math, the same statistics, the same algorithms behind Google's PageRank, behind how Netflix knows which movie to recommend to you, Amazon, right on down the line. - Sure. It seems to me that if we look back a couple of decades, as you were describing, we might meet with a banker, and I presume there was a tremendous qualitative component: the person behind the desk making their best judgment about you. It's not necessarily data-driven, and that was in part due to the times, I guess. There have been so many innovations in technology and in the tools available since then. From your perspective, what are some of the most influential innovations that allow us to look at creditworthiness a bit differently? - Right, yeah. I'm going to go right to the heart of the technology surrounding what we do here. Even if we step out of the credit space and our particular application, let's go back 20 or 30 years. If you wanted to start talking about random forests or support vector machines, you could probably put the theory up on a whiteboard or a chalkboard. Leo Breiman, who is credited with developing random forests, could have put the mathematics and the theory up on the board long before it was actually computationally feasible. So the biggest advance nowadays is that these techniques are now computationally feasible, and that's actually true across applications.
As I said, if we abstract away from credit risk and underwriting, that's true across the board. But specifically, we now have the ability to leverage these techniques in all sorts of different applications. - So I read in a press release or article, and I'll quote here: "Zest Finance offers a 40% improvement over current best-in-class industry score." Can you tell me a little bit about that metric and what's being measured? - Yeah, sure, absolutely. What we're measuring there is the improvement of our score over a widely used industry-standard score. We're not talking about prime or super-prime; in the near-prime and subprime space, there's a pretty standard score out there that's provided by a vendor, and we have benchmarked our score against it. At our threshold, we see a 40% improvement in default rates using our score, over and above what is widely considered the off-the-shelf, general solution for underwriting in the subprime and near-prime space. So we're really bullish on our score. We feel really confident, we love what we see in it, and our performance bears it out. At the end of the day, the proof is in the pudding. We use gold-standard techniques in terms of validation sets, test sets, if you will: held-out, untouched data that we validate all of our models on, and that's where that 40% lift is coming from. - So when I'm thinking about issuing someone credit or granting someone a loan, it seems to me you can make type II errors, where you deny credit to someone who probably was creditworthy, and type I errors, where you give credit to someone who really shouldn't be getting it, whether that's the loan decision, yes or no, or the amount of the loan. From your perspective, which of these two types of errors is Zest Finance best equipped to improve upon? - You know what, I've got to say it's actually both. We're cognizant of both, obviously.
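The conversation doesn't spell out exactly how the 40% figure is computed. One common reading of "improvement in default rates at our threshold" is to approve the same share of applicants under each score on held-out data and compare the default rates among those approved. The sketch below assumes that reading; the function names, the 50% approval rate, and the toy data are all hypothetical, not Zest's actual benchmark.

```python
from typing import Sequence


def default_rate_at_approval_rate(
    scores: Sequence[float], defaulted: Sequence[bool], approval_rate: float
) -> float:
    """Approve the top `approval_rate` share of applicants (higher score = safer)
    and return the observed default rate among the approved applicants."""
    n_approve = max(1, round(len(scores) * approval_rate))
    # Rank applicant indices from safest to riskiest according to this score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    approved = ranked[:n_approve]
    return sum(1 for i in approved if defaulted[i]) / n_approve


def relative_improvement(baseline_rate: float, challenger_rate: float) -> float:
    """Fractional reduction in default rate achieved by the challenger score."""
    return (baseline_rate - challenger_rate) / baseline_rate


# Hypothetical held-out data: outcomes that neither score saw during training.
defaulted = [False, True, False, False, True, False, True, False, False, False]
vendor_score = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
new_score = [0.9, 0.2, 0.8, 0.7, 0.6, 0.5, 0.1, 0.4, 0.3, 0.05]

base = default_rate_at_approval_rate(vendor_score, defaulted, 0.5)
challenger = default_rate_at_approval_rate(new_score, defaulted, 0.5)
print(f"vendor: {base:.0%}, new: {challenger:.0%}, "
      f"improvement: {relative_improvement(base, challenger):.0%}")
```

On this toy data, the new score cuts the default rate among approved applicants from 40% to 20%, a 50% relative improvement. Holding the approval rate fixed matters: otherwise a score could "improve" default rates simply by approving fewer people.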
The two risks there are that you're extending credit to those who shouldn't get it, as you mentioned, versus forgone business: you've missed giving somebody a loan that, in reality, you should have given them. Our techniques tackle the problem from both sides. At the end of the day, it comes down to the particular business's risk tolerance, and that's a higher-level corporate decision. - Sure. - What is the corporation's tolerance for the particular types of risk we're facing? Are you looking to grow volume, or are you more in a startup phase, where volume is not as big a concern and it's more about proof of concept? So you really have those two axes you can move on, and again, that's more of a top-of-the-house type of decision. - Yeah. I'm going to ask you what's kind of an impossible question next, but maybe it'll make for an interesting discussion. When I look at it from an economics perspective, I would say that the existing system has a certain amount of inefficiency in it, and there's money left on the table. With improvements, the marketplace can become more efficient, and there's probably some dollar amount attached to that: at the right scale, the economy could produce a certain amount more through higher-quality investment choices. I know it's pretty much impossible to give me a number, but if you had to make a best guess or a rough calculation, what would your guess be, or how would you approach coming up with such a number? - Right, it's a fair question.
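Kyle's two error types, and Merrick's point that the balance between them is a business decision, both come down to where the approval threshold sits. This minimal sketch, using hypothetical scores and outcomes (higher score = safer), counts both kinds of error at a few thresholds. It illustrates the trade-off only; it is not Zest's procedure.

```python
from typing import Sequence, Tuple


def error_counts(
    scores: Sequence[float], defaulted: Sequence[bool], threshold: float
) -> Tuple[int, int, int]:
    """Approve everyone whose score is at or above `threshold`.

    Returns (approved, bad_approvals, good_rejections): bad_approvals are loans
    made to borrowers who defaulted (type I in Kyle's framing), good_rejections
    are creditworthy applicants turned away (type II, i.e. forgone business)."""
    approved = bad_approvals = good_rejections = 0
    for score, did_default in zip(scores, defaulted):
        if score >= threshold:
            approved += 1
            if did_default:
                bad_approvals += 1
        elif not did_default:
            good_rejections += 1
    return approved, bad_approvals, good_rejections


# Hypothetical applicants: model scores and whether each one actually defaulted.
scores = [0.9, 0.2, 0.8, 0.7, 0.6, 0.5, 0.1, 0.4, 0.3, 0.05]
defaulted = [False, True, False, False, True, False, True, False, False, False]

for threshold in (0.65, 0.35, 0.0):
    approved, bad, forgone = error_counts(scores, defaulted, threshold)
    print(f"threshold={threshold:.2f}: approved={approved}, "
          f"bad approvals={bad}, good applicants rejected={forgone}")
```

Lowering the threshold grows volume at the cost of more bad approvals; raising it does the opposite. Choosing a point on that curve is the risk-tolerance call Merrick places at the top of the house.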
And Kyle, I'm going to blatantly balk at this, but here's what I will say: you and your listeners can go out and look at how large the payday loan space, the alternative lending space, is. Take a look at the market size there. I think people will be astonished at how large it is, and I think that will help point us toward what you're driving at. - Yeah. I mean, you can look around. And again, these are the people we're really going after, and we're going after them aggressively. As for the payday loan space, I think you live here in Los Angeles, and if you drive around the city on the surface streets, you can't go more than a few blocks without seeing one of these places. - Yeah, definitely. So when I think about the methodologies you're using, which we'll just call machine learning approaches in general, one of the challenges anybody working in this space has is dealing with data that can be sparse. For some consumer segments you have a wealth of data, the features you're looking for, and in other segments, for whatever reason, there's a lot more sparsity. Perhaps that's a consumer who hasn't established much history, or someone for whom it's hard to create referential integrity across the data sources you're looking at. I think you referred to it as the thin file; it's a similar challenge. Do you guys face these data sparsity challenges, and if so, how do you approach working within the data you have available? - Right, that's a great question. Data imputation techniques are a wide-open research area, both academically and in industry. Without getting into the details, I'm not going to speak to our particular techniques and how we approach that problem.
I will say that there are a lot of great R packages and other statistical software packages out there that deal with imputation and with these types of problems. Our particular flavor of it and our particular implementation, we keep pretty close to the vest. - Sure. If I were to start approaching a new problem in a very naive way, I'd generally want to look at historical data and throw some basic techniques at it. Let's say I just try a generalized linear model and see if I can extrapolate some trend across the data I'm looking at. But there's an interesting challenge if, let's say, I look at payment history. I know Zest is getting into the collections space as well, doing what I guess you'd call advisory work: creating algorithms that help people determine where the collection risks are in their portfolio and their book of business. So if you were to take a company's data and look at who's been paying and who hasn't, the historical data doesn't necessarily tell the whole story, I'm guessing, because someone keeps paying their invoices until the day they decide to stop, right? You have a difficult challenge in forecasting an event that wouldn't show up in the history, or at least that's my perspective. Do you guys see challenges like that? - Yeah, definitely. One thing is that the actual term and structure of our product somewhat safeguard against that. Let's say we saw a second financial crisis hit very precipitously. We're not talking about a 30-year mortgage here, where you face substantial risk from another financial crisis or from interest-rate movements; just by the nature and structure of our product, it's not really as susceptible to that type of phenomenon.
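Merrick deliberately doesn't describe Zest's imputation method, so the following is only a generic baseline: fill each missing value with the median of the observed values, and keep a separate 0/1 flag marking what was missing so a model can still learn from the fact that a field was absent, as it often is in a thin file. The feature name `months_of_history` is hypothetical.

```python
from statistics import median
from typing import List, Optional, Tuple


def impute_with_indicator(
    column: List[Optional[float]],
) -> Tuple[List[float], List[int]]:
    """Median-impute missing values (None) and return a missingness flag.

    The flag preserves the information that a value was absent, which can be
    predictive in its own right for sparse, thin-file records."""
    observed = [v for v in column if v is not None]
    fill = median(observed) if observed else 0.0  # fallback if nothing was observed
    filled = [fill if v is None else v for v in column]
    missing_flag = [1 if v is None else 0 for v in column]
    return filled, missing_flag


# Hypothetical feature: months of credit history, unknown for some applicants.
months_of_history = [24.0, None, 6.0, None, 12.0]
filled, flag = impute_with_indicator(months_of_history)
print(filled)  # [24.0, 12.0, 6.0, 12.0, 12.0]
print(flag)    # [0, 1, 0, 1, 0]
```

The median is used because it is robust to outliers; model-based and multiple-imputation methods, like those in the R packages Merrick alludes to, are more sophisticated alternatives.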
- So would you say the product is... I could see it going two ways, and maybe it's both. On one hand, maybe you're doing a more precise job of assessing creditworthiness, and therefore bringing opportunities to the table that less sophisticated lenders passed on. Or it could be that you're hedging a bit, saying: we know that in aggregate there's a certain amount of risk in a pool of people, but enough of them are unlikely to default that we'll see the return we're looking for in aggregate. Which of those do you think is the primary innovation that allows you to make a better return on the loans you choose to make? - Well, again, I'm going to say that if you're in the underwriting space, you're actually trying to tackle that problem from both sides. To your first point, you're definitely trying to find the good risk within the pool, which for us is a near-prime and subprime clientele. But we're also tackling the problem from the other side. We're not a sage, we're not looking into a crystal ball; we will have some customers who have trouble paying us back. At the end of the day, we have to look at the total profitability of the portfolio. - Right. So you mentioned the financial crisis earlier, which I would call an unprecedented event. With any machine learning approach trained on historical data, when there are no major crises in that history, it's hard to forecast them. Anyone in any part of the economy has that same risk, I guess, but do you think it hits harder or softer in the subprime space you guys focus on? - Yeah, so let's try to focus that a little bit.
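Merrick's point about total portfolio profitability can be made concrete with a deliberately simple two-outcome model: each loan either repays with its interest and fees, or defaults and loses some share of the principal. All loan figures below are hypothetical, as is the assumption that the full principal is lost on default.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Loan:
    principal: float
    income_if_repaid: float  # total interest and fees collected on full repayment
    p_default: float  # model-estimated probability of default
    loss_given_default: float = 1.0  # share of principal lost on default (assumed)


def expected_profit(loan: Loan) -> float:
    """Expected profit of a single loan under the two-outcome model."""
    gain = (1.0 - loan.p_default) * loan.income_if_repaid
    loss = loan.p_default * loan.loss_given_default * loan.principal
    return gain - loss


def portfolio_expected_profit(loans: List[Loan]) -> float:
    """Sum of expected profits: the portfolio is judged as a whole."""
    return sum(expected_profit(loan) for loan in loans)


# Hypothetical book of three identical loans with different default estimates.
book = [
    Loan(principal=500, income_if_repaid=150, p_default=0.05),
    Loan(principal=500, income_if_repaid=150, p_default=0.10),
    Loan(principal=500, income_if_repaid=150, p_default=0.30),
]
for loan in book:
    print(f"p_default={loan.p_default:.2f}: expected profit {expected_profit(loan):+.2f}")
print(f"portfolio: {portfolio_expected_profit(book):+.2f}")
```

The first two loans are expected to be profitable even though each carries some default risk. The third has negative expected value (105 - 150 = -45), so a lender using this model would likely decline it, which is where a sharper estimate of `p_default` pays off.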
For one thing, it again goes back to the product itself: we're not talking about a 30-year mortgage, where the duration risk is substantially larger for such a long-term product. The other component is this: the underwriting and many other aspects of our business are done through machine learning, whether that's marketing, or collections, as you mentioned, or obviously the core product, underwriting. But if you go back to the DNA of Zest, we also bring in a really strong business analyst component, and these guys have their heads on a swivel. They're looking around the entire environment and asking, okay, where do we want to set the threshold? They sniff out whether there are going to be potential headwinds. People did see this stuff coming before the crisis; whether or not anyone listened to them is a different story, right? But you can, if you keep your head on a swivel, and our business analyst team is extremely bright. They leverage a lot of the same analytics we're producing to help guide their decision making. - That makes a lot of sense. So I once heard, and this could be an old wives' tale or superstition, but someone who I assume knew what they were talking about told me that in your traditional credit score, one major feature that goes into the calculation is whether or not you're currently using more than a third of the credit available to you. Let's say I have three credit cards with a $1,000 limit each; I want to have no more than a $1,000 balance across them, otherwise I might be penalized for not using my credit correctly. I have no idea if that's true, but I heard it, so I decided I was going to act on it.
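The rule of thumb Kyle describes is simply a ratio of total balances to total credit limits. The sketch below computes it for his three-card example. The one-third ceiling is his hearsay, not a documented FICO threshold, and the function names and individual card balances are hypothetical.

```python
from typing import Sequence


def utilization(balances: Sequence[float], limits: Sequence[float]) -> float:
    """Overall revolving utilization: total balance divided by total credit limit."""
    total_limit = sum(limits)
    if total_limit <= 0:
        raise ValueError("total credit limit must be positive")
    return sum(balances) / total_limit


def max_balance_under(limits: Sequence[float], ceiling: float = 1 / 3) -> float:
    """Largest combined balance that keeps utilization at or below `ceiling`.

    The default one-third ceiling is the rule of thumb from the conversation."""
    return sum(limits) * ceiling


limits = [1000.0, 1000.0, 1000.0]  # three cards with $1,000 limits each
balances = [400.0, 350.0, 250.0]  # hypothetical balances across the cards
print(f"utilization: {utilization(balances, limits):.1%}")
print(f"balance ceiling: ${max_balance_under(limits):,.2f}")
```

Because the ratio pools all cards, it is the combined balance that matters under this rule, which is why Kyle's target is $1,000 across all three cards rather than per card.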
So when it gets around Christmas time, or I'm taking a vacation and my balances get high, I'll go and pay my credit down early, just so I don't cross this invisible line I'm worried about. Maybe that's just superstition on my part, or maybe I'm really having a positive effect. But in essence, by doing that, I'm trying to manipulate my credit score by altering my behavior. Do you guys see or worry about similar effects, where if someone knows one of the components you look for, they might find a way to effectively hack the algorithm to look more favorable? - That would be extremely difficult, just given the vast quantity of variables that our models consume. I could envision somebody, like you just laid out, trying to move one or more of them, but on the margin it's not going to be meaningful in terms of business impact. Furthermore, I'd almost question the premise. It sounds to me as though you have a suspicion about how FICO's algorithm works, and as a consumer, you make an effort to steer the score in one direction, obviously upward in this case, by paying down some of your outstanding balance. To me, that's indicative of someone who is managing their credit well, and that's probably somebody I would want to take a risk on. So I'm finding it hard to wrap my head around how that could somehow be used maliciously in that setting. - Yeah, that's a good point. So you mentioned that one of the resilient features of your approach is how many variables you look at. Can you talk a little bit about how many variables are involved in FICO scores versus the vector of variables you guys are looking at? - Yeah. Well, we definitely have some ideas about the quantity of variables that FICO and the big three bureaus are consuming.
I'm not going to put a hard number on it, because it would be speculation on my part. I don't work with them, obviously, so I'm not going to speak to their modeling techniques. But I know what we do, and I know we're consuming vast amounts of variables that go into the models, along with the ability that machine learning brings to sift through them, find the strong signal, and get rid of the noise. Granted, fair enough, some of them are noise, and some of them are off limits for legal reasons. Yeah, I feel extremely confident that we are consuming orders of magnitude more data and more variables than, say, the larger institutions in this space. - I read a quote in some of your materials that I thought was really astute: "All data is credit data." That makes a lot of sense to me. Would you be comfortable sharing an example of a surprising type of data that had an unexpected influence on financial behavior? - Yeah, I don't want to go too far into that, but we definitely look at a lot of things. That quote is Douglas's, all data is credit data, and I love it; I totally agree. The one he bounces around is how much time people spend on the website looking at the terms. Stuff like this, we definitely look at, and we think about whether it's indicative of somebody with better creditworthiness or not. - Yeah, that's fascinating. Are there any data points that you exclude by design? For example, perhaps ethnicity or sexual orientation could have predictive power on future creditworthiness. I'll pick on my own demographic: I'm a white male. Maybe white males such as myself are much less creditworthy than most of the rest of the public.
Yet one could raise a moral concern about that, and even if you think it's okay, there's certainly a social backlash to looking at particular demographic features. Do you guys have these concerns, and is there anything you leave out specifically to avoid an unintended prejudice that could arise from the underlying data set? - Yeah, absolutely, Kyle. First of all, there are regulatory and legal fair lending requirements that we abide by. Obviously we have to from a legal standpoint, but it's also in our guts. That's what we founded Zest on, right? Douglas founded Zest in order to help people, so we're very cognizant of that. From a legal standpoint, it's a huge no-no. We take pains to sift through our data for anything that even remotely sniffs of being a proxy for, say, somebody's age. Never somebody's age in and of itself, of course, but could this variable somehow be a proxy for age? - So like zip code, maybe. - Potentially. - Yeah. - Something along those lines. Anything that could even remotely be considered off limits from a fair lending standpoint is definitely removed from the models. - Yeah, that makes a lot of sense. So how could an issuer of credit, let's start there, best utilize your services if they were looking to? - To utilize Zest's services, you mean? - Correct, yeah. If I ran a business that gave out loans of some kind, and I was dissatisfied with the means I had of assessing creditworthiness, is there an option to partner with Zest and see whether you could help me assess the creditworthiness of my potential customers? - It's a great question, I love it, and I would refer you to our business development team. It's a little over my head; that's more at the top of the house, on the business side.
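Merrick describes screening variables for anything that could act as a proxy for a protected attribute such as age, but not how Zest does it. One simple first pass, sketched here under that caveat, is to compute each candidate feature's correlation with the protected attribute on a review sample and flag anything above a cutoff for human review. The feature names, data, and 0.3 cutoff are hypothetical, and correlation only catches linear relationships, so this is a starting point rather than a complete fair-lending review.

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def flag_possible_proxies(
    features: Dict[str, List[float]], protected: List[float], max_abs_corr: float = 0.3
) -> List[str]:
    """Names of features whose |correlation| with the protected attribute exceeds the cutoff."""
    return sorted(
        name
        for name, values in features.items()
        if abs(pearson(values, protected)) > max_abs_corr
    )


# Hypothetical review sample. The protected attribute is used only for this
# screening step, never as a model input.
age = [22.0, 35.0, 47.0, 58.0, 63.0]
candidates = {
    "years_at_address": [1.0, 4.0, 9.0, 15.0, 20.0],
    "loan_amount_requested": [300.0, 500.0, 500.0, 300.0, 400.0],
}
print(flag_possible_proxies(candidates, age))
```

Here `years_at_address` tracks age closely and gets flagged, while `loan_amount_requested` is nearly uncorrelated with it. A zip-code style proxy would need a different check, since it is categorical rather than numeric.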
But I'd definitely be very interested. - Yeah, just in case we have any listeners who fit that demographic, is there an email or a best place to reach BizDev? - Yeah, it would be... obviously Douglas, and Michelle Sankster is heading up BizDev. I don't have her email in front of me right now, but if your listeners get in touch with you, we can get that info along to them. - Sure. And on the consumer side, if someone was in that subprime category, was having trouble getting credit issued to them, and wanted to see whether Zest might be more willing to consider them, what would be the best way to try that out? - Right. You can look us up as ZestCash or ZestFinance; we have both domains. We'd encourage listeners to check us out and see if our product is right for them. - So I'll ask you a maybe tricky question. Feel free to be dodgy with the answer, because I certainly want to respect the proprietary nature of your products. Looking at the wide spectrum of things you could be using in machine learning, everything from random forests to support vector machines and all that fun stuff, what's been most powerful and most useful within Zest's arsenal of techniques and tools? - Yeah, I'm going to be a little vague. You mentioned a lot of the catchphrases that come to mind. I think maybe the only one I'd add is various ensembling techniques. - Are there any in particular that, say, an aspiring data scientist, grad student, or whomever might benefit from looking into? Not necessarily the ones you're applying, but ones you think are novel and less well known. - So here's what I'll do.
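Merrick names "various ensembling techniques" without saying which. The simplest member of that family is model averaging: combine several models' predicted default probabilities with a (possibly weighted) mean. The two toy models and the weights below are hypothetical stand-ins for trained models such as a logistic regression and a random forest.

```python
from typing import Callable, List, Optional, Sequence

# A model maps an applicant's feature vector to a probability of default.
Model = Callable[[Sequence[float]], float]


def average_ensemble(models: List[Model], weights: Optional[List[float]] = None) -> Model:
    """Return a model whose prediction is the weighted mean of the members' predictions."""
    if weights is None:
        weights = [1.0] * len(models)
    if len(weights) != len(models) or sum(weights) <= 0:
        raise ValueError("need one weight per model, with a positive total")
    total = sum(weights)

    def predict(x: Sequence[float]) -> float:
        return sum(w * m(x) for w, m in zip(weights, models)) / total

    return predict


# Hypothetical members: each uses a single feature, x[0] = utilization in [0, 1].
def model_a(x: Sequence[float]) -> float:
    return min(1.0, 0.05 + 0.30 * x[0])


def model_b(x: Sequence[float]) -> float:
    return min(1.0, 0.10 + 0.40 * x[0])


equal = average_ensemble([model_a, model_b])
weighted = average_ensemble([model_a, model_b], weights=[3.0, 1.0])
applicant = [0.5]
print(f"equal weights: {equal(applicant):.3f}, 3:1 weights: {weighted(applicant):.3f}")
```

Averaging tends to reduce variance when the members make different errors; stacking and boosting are more elaborate ensembling approaches built on the same idea of combining models.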
If you're an aspiring grad student or a practitioner listening to this, and you're at all interested in predictive analytics, algorithms, statistics, or machine learning, I recommend two texts, both freely available for download: The Elements of Statistical Learning and An Introduction to Statistical Learning. The intro is a little more accessible, probably at an upper-division undergraduate level, and the Elements text is probably at a graduate level. Those are great; they're basically the two quintessential texts, and both are freely available online as PDF downloads. - Yeah, great resources. So one challenge I've occasionally faced in my data science career, and I'm wondering if you've faced it at Zest or elsewhere, is that certain approaches are more, let's call them user-friendly, than others. I like decision tree learning in this respect, because pretty much anybody can look at a tree, and there's an intuitive appeal to it. At the polar opposite, you have a neural network, which you can really only trust through its cross-validation or its R-squared against actuals. It's very hard to look at it and say, "Well, the weight on this neuron is trustworthy." So there can be a challenge with someone less familiar with data science and machine learning techniques saying, "I don't get this, so I don't believe it." Have you faced challenges like that, at Zest or elsewhere? Or if not, what would be your approach? - Yeah, to someone like that, I guess I would say: do that at your own risk, especially depending on the application.
If you're using predictive analytics in any capacity to turn a profit, and you're turning a blind eye to the more modern, cutting-edge techniques simply because they're less interpretable, then again, I'd say do that at your own risk. It really depends on your tolerance for opaqueness, which comes with a trade-off in predictive accuracy. We know that the more opaque models oftentimes lead to more accurate predictions of future outcomes, and that can be back-tested, versus some of the more traditional techniques. Take a logistic regression, right? It's very easy. I can look at the summary and read off the coefficients. I can see both the directional impact of a variable and its magnitude just by looking at the sign and size of the estimated regression coefficient. Those are quite interpretable, and regulators love that, but I think regulators are a step behind; they want to see something that is very easy to interpret. However, I think those models are going to pay a price in predictive accuracy. - Yeah. - So you've really got to balance that: weigh those two competing risks and choose the modeling technique and approach that fits the particular application. - Definitely. You mentioned back-testing, and it's always a good sign when I hear someone talking about that, especially when it's done as a complement to cross-validation or something like that. Would you mind giving a quick definition, and maybe an example of how you guys apply it?
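To make Merrick's point about reading a logistic regression concrete: the sign of each coefficient gives the direction of a variable's effect on the log-odds of default, and exp(coefficient) gives the multiplicative change in the odds per one-unit increase. This sketch fits a logistic regression by plain batch gradient ascent on synthetic data generated from known, hypothetical coefficients, so the fitted signs can be checked against the truth. In practice you would use a statistics library's fitting routine and its summary table instead.

```python
import math
import random
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(
    X: List[Sequence[float]], y: List[int], lr: float = 1.0, iters: int = 1000
) -> List[float]:
    """Fit coefficients by batch gradient ascent on the mean log-likelihood.

    Each row of X must start with a constant 1.0 for the intercept."""
    n, k = len(X), len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        for row, target in zip(X, y):
            residual = target - sigmoid(sum(b * v for b, v in zip(beta, row)))
            for j in range(k):
                grad[j] += residual * row[j]
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


# Synthetic data from hypothetical "true" coefficients: higher utilization raises
# default risk, a longer credit history (scaled to 0-1) lowers it.
TRUE_BETA = [-1.0, 3.0, -2.0]  # intercept, utilization, history
rng = random.Random(0)
X, y = [], []
for _ in range(1000):
    row = [1.0, rng.random(), rng.random()]
    p = sigmoid(sum(b * v for b, v in zip(TRUE_BETA, row)))
    X.append(row)
    y.append(1 if rng.random() < p else 0)

beta = fit_logistic(X, y)
for name, b in zip(["intercept", "utilization", "history"], beta):
    print(f"{name:>11}: coef {b:+.2f}, odds ratio {math.exp(b):.2f}")
```

A positive utilization coefficient (odds ratio above 1) reads directly as "higher utilization, higher odds of default", which is exactly the kind of summary regulators find easy to audit.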
- I'll speak more broadly about back-testing and out-of-time validation. You can train up a model on data through, let's just say, the end of December 2013, and then make forward projections up to today. That would provide you with an out-of-sample, out-of-time back-test. - Yeah, it's a nice complement to something I've always felt is missing in some approaches: any good model or theory should make verifiable predictions. So it's really promising to hear you guys look at it that way. - Absolutely. And the sky's the limit there: you can walk forward through time, rolling that window forward and continuing to make more and more future projections. So yeah, I definitely agree. - So what's next for Zest Finance? - That is a great question. We're obviously continuing to stay focused on our core product; that's really our mainstay. In parallel, we're pursuing a lot of other options. We've generated a ton of interest from various verticals and industries, at different points in the customer lifecycle and the sales lifecycle: interest in our techniques, our approach, our technology, and our platform. So I think a lot of stuff is in play, and it's not my place to speculate or to share at this point what's next for Zest; that's something for Douglas to say. - Lastly, I like to ask all my guests to provide two references to close out the show. The first I call the benevolent reference, which is a mention of something, whether a book, an R package, or whatever, that you think is worthy of some exposure but that you don't necessarily have a connection to.
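The out-of-time back-test Merrick describes, training on data through the end of December 2013 and then scoring later periods, generalizes to a walk-forward scheme: move the cutoff forward and repeat. The sketch below only generates the date windows. The start and end dates, the three-month test and step sizes, and the expanding (rather than rolling) training window are hypothetical choices.

```python
from datetime import date
from typing import Iterator, Tuple


def add_months(d: date, months: int) -> date:
    """First day of the month that is `months` months after d's month."""
    m = d.month - 1 + months
    return date(d.year + m // 12, m % 12 + 1, 1)


def walk_forward_windows(
    train_start: date,
    first_cutoff: date,
    end: date,
    test_months: int = 3,
    step_months: int = 3,
) -> Iterator[Tuple[date, date, date]]:
    """Yield (train_start, cutoff, test_end) triples.

    Train on records dated in [train_start, cutoff), evaluate on records dated
    in [cutoff, test_end), then move the cutoff forward by `step_months`. The
    training window expands; to roll it instead, advance train_start as well."""
    cutoff = first_cutoff
    while cutoff < end:
        test_end = min(add_months(cutoff, test_months), end)
        yield train_start, cutoff, test_end
        cutoff = add_months(cutoff, step_months)


# Train through the end of December 2013, then test on the following quarters.
for start, cutoff, test_end in walk_forward_windows(
    date(2011, 1, 1), date(2014, 1, 1), date(2014, 10, 1)
):
    print(f"train {start} .. {cutoff}  ->  test {cutoff} .. {test_end}")
```

Each test window contains only data dated after its training cutoff, so every score is a genuine forward projection rather than an in-sample fit.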
And the second is what I call the self-serving reference, which is something that hopefully brings you some direct benefit from exposure on the show. - Right. For the first one, I've probably already revealed it a little bit. I'd point people to the Hastie and Tibshirani texts, The Elements of Statistical Learning; I can't laud it enough. I obviously have no skin in that game, but I'm happy to share that one. For the second one, the one we may or may not have a stake in, you kind of got me, Kyle. I guess it would be this: if there are consumers and customers out there listening, do your homework. Really look at the fees and the terms of these competitor products, and consider us, because we're much more advantageous for them. I just want to put that out there. It's obviously a pitch. - Yeah. Are you on Twitter, in case people want to follow you? - I'm not on Twitter, but you can find me on LinkedIn. - Sounds good. Well, thank you so much for your time. This has been a really fun conversation. - Thank you, Kyle, I appreciate it. - Take care. (upbeat music)