Data Skeptic

Measuring the Influence of Fashion Designers

Duration: 24m
Broadcast on: 14 Aug 2015
Audio Format: other

In this episode, Yusan Lin shares her research on using data science to explore the fashion industry. She has applied techniques from data mining, natural language processing, and social network analysis to explore who the innovators in the fashion world are and how their influence affects other designers.

If you found this episode interesting and would like to read more, Yusan's papers "Text-Generated Fashion Influence Model: An Empirical Study on Style.com" and "The Hidden Influence Network in the Fashion Industry" are worth reading.

[ Music ] Data Skeptic is a weekly show about data science and skepticism that alternates between interviews and mini-episodes. [ Music ]

>> Yusan Lin holds a bachelor's degree in Computer Science and Information Engineering from National Central University. Currently, she is a PhD candidate in the Department of Computer Science and Engineering at Pennsylvania State University, working in the IPDA lab, where her research interests include machine learning, natural language processing, and social network analysis, specifically in how these ideas apply to the field of fashion. Yusan, welcome to Data Skeptic.

>> Hello everyone, it's a pleasure to be here.

>> Excellent, I'm glad you agreed to come on. I was really excited, especially because I've never actually encountered a researcher before who is applying data science techniques to the fashion industry. So, I was curious how you got interested in this domain.

>> I think it's actually a pretty interesting story. At least people around me think it's kind of interesting. Before coming to the States for my PhD, I was studying in Taiwan, and I was actually a model there. I was doing fashion modeling. After coming here, the professor I was working with asked me to come up with a research idea that I would be passionate about. That was my first time trying to conduct research. So I was thinking, to keep the passion for fashion going, I would come up with something that studies fashion, but from a computer scientist's approach. So I proposed the idea of looking into the fashion industry in a quantitative manner. At the same time, I was taking a course called Social Network Computing, so I also applied the idea of social network analysis to the research. That's how it all started.

>> Very interesting. I've been reading your papers and seeing some of your talk presentations.
Like, it's cool work and it seems sort of obvious, but I'd never encountered it before. Do you know if you're the first researcher to look at the intersection of these two areas?

>> I went back to look into other work, and most of the literature I encountered is rather qualitative. I think that research was done a few decades ago; there are papers from around 40 years ago. They would conduct surveys to see how fashion trends or fashion ideas move on campus, like among the girls. Others are people talking about the possible phenomena behind the fashion industry, mostly economists. But there really isn't any computer scientist doing this with the data that is out there. There is so much online data about fashion right now, and people aren't really leveraging it, at least as far as I know.

>> I don't imagine my audience will be experts on the fashion industry. Can you talk a little bit about how that industry operates and how trends get started and propagate?

>> To put it roughly, the fashion industry can be divided into one part called high-end fashion and another part called high street fashion. High-end fashion is the fashion we have in mind: the designer brands such as Chanel, Versace, and Dior, the brands you imagine to be very expensive. They are the ones presenting their newest designs every fashion season on the runway, so they are usually the ones bringing new design ideas and new design elements to the fashion trends. The high street fashion brands are more like H&M, Zara, Topshop, or ASOS. Their prices are a lot lower than high-end fashion. But unlike the high-end brands, they are more likely to borrow and follow the ideas proposed by the high-end brands, maybe in previous seasons.
So those are the ones that most consumers will buy. The big picture is that the high-end fashion brands come up with their ideas. We are not sure whether they are influenced by the overall trends in society, but mostly we assume they come up with new ideas and decide to lead the fashion industry with them, and high street fashion follows in their footsteps. That's how I think of it.

>> Have you ever seen the movie The Devil Wears Prada?

>> Yeah, of course.

>> I'm reminded of this scene where I think it's Anne Hathaway's character who has on this blue sweater. And I think it's Glenn Close, who's sort of the center point of the movie, who says something like, that sweater you got at Target for $30 started five years ago when this designer presented some innovation. Is that kind of the idea?

>> Yeah, definitely.

>> Very cool. So, what does it really mean for a fashion designer to have influence?

>> That's really a good question. At first, I was thinking about research done on influential people in other domains. I can't name it right now, but especially in academia, like which authors are more influential.

>> Yeah.

>> These works are mostly done by plotting the authors and the connections between them on a social network graph. The authors are the nodes, and how they influence each other, however you define it, gives the edges. So I was thinking of borrowing that idea, with the fashion designers as the nodes and how they influence each other as the edges. In the end, if you want to see which designers have more influence, you can take whatever centrality metric you like, such as in-degree or out-degree, depending on how you define the directions. I chose in-degree: designers with higher in-degree, I would say, have more influence. That's how I did it.
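
The in-degree idea described above can be illustrated with a minimal Python sketch. The designer names and edges are made up for illustration, and the sketch assumes each edge points from the borrowing designer to the designer whose idea was borrowed; it is not taken from the published work.

```python
from collections import Counter

# Hypothetical borrowing edges as (borrower, source) pairs.
# Assumption: the arrow points toward the designer the idea came from.
edges = [
    ("Designer B", "Designer A"),
    ("Designer C", "Designer A"),
    ("Designer C", "Designer B"),
]

def in_degrees(edge_list):
    """Count incoming edges per designer; a higher count means borrowed from more often."""
    counts = Counter()
    for borrower, source in edge_list:
        counts[source] += 1
        counts[borrower] += 0  # make sure pure borrowers still appear, with 0
    return dict(counts)

# Rank designers by in-degree, breaking ties alphabetically.
ranking = sorted(in_degrees(edges).items(), key=lambda kv: (-kv[1], kv[0]))
for designer, degree in ranking:
    print(designer, degree)
```

With these toy edges, Designer A has the highest in-degree because two other designers borrowed from it.
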
>> And could you elaborate on what you mean by in-degree?

>> In-degree really depends on how you define the edges. For example, say we have two nodes, node A and node B, for designer A and designer B. Maybe designer A proposed an idea last season and designer B really liked it, so in the next season designer B borrows it and includes it in their design. If we identify that, we create a directed edge with the arrow pointing from B to A. We say B borrowed the idea from A. This is a bit questionable, because some people will say the edge should be the other way around: we should define it by how the idea flows, so A should point to B. But I stick with B pointing to A because I want the direction to point to where the idea comes from.

>> So how do you go about deciding which arrows should exist and which way they point? How are you able to measure it?

>> To measure it, we have to let the data, which is the facts, tell us. For our data source we used runway reviews from a website called style.com. They started publishing runway reviews in 2000, so it has been 16 years, and that's quite a lot of fashion data that we can get. The reviews are written very descriptively. They will say that in this collection there is, maybe, a black lace dress paired with studded high heels, things like this. So they use really descriptive sentences or terms in their reviews. We started out by identifying that each designer's collection in each season has a set of terms describing the fashion items. We call them fashion symbols, because some previous fashion literature likes to call these kinds of ideas fashion symbols.
So we call them fashion symbols, and we compare designers' collections. Say in the fall 2000 season designer A proposed an idea, and we find that in the spring 2003 season designer B also used it. If the fashion symbol is the same or similar, then we can say there is a potential that designer B borrowed the idea from designer A, since designer A proposed it first.

>> Yeah, that makes sense. We were talking earlier about the authorship papers. That actually seems like an easier problem, because who you cite in your paper is very clear; it's deterministic.

>> Exactly.

>> But you have some additional challenges. Could you talk about some of the tricky parts of extracting the time data, as far as who proposed an idea first and where the idea shows up later?

>> Actually, the good thing about fashion is that the timestamps are very regular, because the runway fashion shows are held very regularly. We just assume all the other designers only see each other's work after it's on the runway; that's the day they announce all the ideas. So the timestamps are very regular and we don't really have to define the time itself. I think the trickier part is how we extract the terms that are actually talking about fashion. That is way trickier than the timeline itself.

>> We should come back and get into that a little more, but I'm curious: once you've got your nodes set up, that sounds kind of straightforward. There's a certain number of designers, and now you've got a method for determining the edge directions and maybe weights. Can you tell me how you then calculate the influence of a particular designer once you have your social network set up?

>> I have two approaches. One would be that whoever proposed it first, all the later designers link back to that designer.
But we realized that's way too many edges. For example, in 2015 a designer may have an idea that's exactly the same as one from 2000. But if we go way back to connect to that, maybe throughout the years everyone else has had it too. It's too far; 2015 might be influenced by something a lot newer, like 2010 or 2012. So we changed it so that we only link to the nearest one. For example, take an idea called the pencil skirt. Maybe it shows up in 2012 and is adopted again in 2015, but if we trace way back, it was also adopted in 2000. Where do we connect the designers in 2015 to? We think that as long as we point to 2012 and use some mechanism to take care of how it propagates, that should be fine, instead of pointing to all the possible influencers. So we leverage PageRank for that.

>> Oh, interesting.

>> Yeah, we use PageRank to save on complexity, because instead of us creating so many edges, PageRank will trace all the way back. So it takes care of that.

>> Oh, so it won't get lost.

>> Yeah, yeah.

>> Talking about that propagation, from what I understand, fashion starts at the runway, then propagates out into high fashion, and eventually ends up in department stores and lower-end things. Is that a correct model, that things start at the runway and decay and propagate out, or are there nuances to it?

>> I definitely agree with that idea, even though, unfortunately, we haven't covered it in our work, because right now we are focusing on the runway reviews, which are all high-end. So we are talking more about the influence between high-end designers. But I think it's definitely true. Zara is a really good example: once they see what is released on the runway, they quickly make their own designs by adopting ideas from the runway. I think we should definitely look into that.
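
The nearest-earlier-adopter linking and the PageRank step discussed in this part of the conversation can be sketched roughly as follows. Everything here is an illustrative assumption: the observations are invented, seasons are reduced to integer indices, symbols must match exactly, and the PageRank is a plain textbook power iteration (damping 0.85) rather than whatever variant the papers use.

```python
from collections import defaultdict

# Hypothetical observations: (season_index, designer, fashion_symbol).
observations = [
    (0, "A", "pencil skirt"),
    (5, "B", "pencil skirt"),
    (6, "C", "pencil skirt"),
    (6, "D", "pencil skirt"),
]

def nearest_source_edges(obs):
    """Link each adopter only to designers in the most recent earlier season
    that showed the same symbol. Edges are (borrower, source)."""
    by_symbol = defaultdict(list)
    for season, designer, symbol in obs:
        by_symbol[symbol].append((season, designer))
    edges = set()
    for uses in by_symbol.values():
        for season, designer in uses:
            earlier = [s for s, d in uses if s < season and d != designer]
            if not earlier:
                continue  # first observed use: nothing to point back to
            nearest = max(earlier)
            for s, d in uses:
                if s == nearest and d != designer:
                    edges.add((designer, d))
    return sorted(edges)

def pagerank(edges, damping=0.85, iterations=100):
    """Textbook PageRank by power iteration on a directed edge list."""
    nodes = sorted({n for edge in edges for n in edge})
    out = defaultdict(list)
    for src, dst in edges:
        out[src].append(dst)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            targets = out[n] or nodes  # a node with no out-links spreads evenly
            share = damping * rank[n] / len(targets)
            for t in targets:
                new[t] += share
        rank = new
    return rank

edges = nearest_source_edges(observations)
ranks = pagerank(edges)
```

In this toy graph only B links directly to A, while C and D link to B. A still ends up with the highest score, because the credit B receives is passed along its edge to A, which is the propagation effect that makes linking only to the nearest earlier use workable.
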
Right now I'm working with other PhD students on something closely related to this, but it's not really concrete yet.

>> You've talked about how you leverage fashion reviews to do the bulk of your analysis. I imagine there are some challenges around crawling, parsing, and assembling that corpus. Can you talk a little about the work that went in there?

>> The interesting thing is that I get the feeling fashion sites don't really provide researchers with an API; normally they aren't interested in that. So unlike Twitter, Instagram, or Pinterest, which provide a really developer-friendly API, for fashion data sources we just need to identify what we need. I never really crawl things, though. I identify the rules for the URLs and let the program visit each page and scrape it. The language I love the most is Python; that's my go-to language. So I always use Python to visit page after page and scrape them, and I use a package called Beautiful Soup to identify the parts of the content that I need and save them.

>> Awesome, yeah, that's a great library.

>> Yeah, I really like it.

>> I know you've explored algorithmically extracting some of the fashion terminology from the text. Can you talk about the taxonomy you used and why you wanted to do that sort of entity extraction?

>> Actually, I built the taxonomy on my own. I was really hesitant to do that at first because I knew it wouldn't be easy work, but I decided to after surveying the existing taxonomies. There are actually a couple of taxonomies out there. There's one provided by Ralph Lauren, one of the designer brands, another by a pretty famous apparel website called Apparel Search, and another called the Fairchild Dictionary of Fashion. However, all of them suffer from the same drawback: they tend to list fashion design terms that are way too old.
They are very standard, more like what's in textbooks, so they are not modern enough. Therefore I decided I would look into the data, read it, and see which terms are more likely to be used in the reviews to describe modern fashion designs. I read through reviews batch after batch, identified the terms that I think are related to fashion design, put them into my taxonomy, and kept checking the coverage. That's what I did.

>> Could you talk, at least at a high level, about the structure and scope of the taxonomy you built?

>> The taxonomy ended up at around 3,000 words. It's not that much, but at that point, when I randomly selected reviews and tagged them, both precision and recall were around 95%. So I think that was enough for the study.

>> So am I correct in saying that your taxonomy defines the tokens you look for in reviews and how you classify them?

>> Exactly. Although in the taxonomy I do classify terms, this is a top, this is a dress, this is a shoe, in our work we haven't really used that classification to say this term is about a dress or anything like that. But when we tag using the taxonomy, instead of directly tagging the match, we extract the noun phrases that include words from the taxonomy, because we think that gives a much more complete entity. For example, I have dress and I have lace in the taxonomy, but I won't include lace dress because that's too specific. So I identify dress, and the noun phrase around it will be lace dress, and then we have a more meaningful design entity to extract.
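
The tagging step described above, where a taxonomy match is widened to the phrase around it, might look roughly like this sketch. The taxonomy and stopword list are tiny hypothetical stand-ins, and splitting the text on stopwords is only a crude substitute for a real noun-phrase chunker, which the actual work presumably relied on.

```python
import re

# Tiny hypothetical taxonomy; the one described in the interview has about 3,000 terms.
TAXONOMY = {"dress", "lace", "heels", "skirt"}

# Hypothetical stopword list used to cut the text into candidate phrases.
STOPWORDS = {
    "a", "an", "the", "and", "with", "of", "in",
    "is", "are", "was", "were", "they", "have", "paired",
}

def fashion_symbols(text):
    """Return runs of consecutive non-stopwords that contain a taxonomy word."""
    tokens = re.findall(r"[a-z]+", text.lower())
    symbols = []
    i = 0
    while i < len(tokens):
        if tokens[i] in STOPWORDS:
            i += 1
            continue
        j = i
        while j < len(tokens) and tokens[j] not in STOPWORDS:
            j += 1
        chunk = tokens[i:j]
        if any(tok in TAXONOMY for tok in chunk):
            symbols.append(" ".join(chunk))
        i = j
    return symbols

review = "They have a black lace dress paired with studded high heels."
print(fashion_symbols(review))
```

The point of the design is that the taxonomy only needs to contain short head terms such as dress or heels; the surrounding modifiers are picked up from the review text itself.
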
>> Interesting. It seems to me that's maybe one of the most novel pieces, that you can capture innovation that way, because who knows what will become fashionable next season. This is probably ridiculous, but metal lace could become a thing.

>> Yeah, who knows?

>> You wouldn't know to look for it, but because you have lace in your taxonomy and you're capturing the noun phrase, you would then, in a sort of unsupervised fashion, discover that metal lace is the new trend. Am I looking at it right?

>> Yeah, that's definitely correct.

>> Oh, it's very novel. Could you discuss some of the distance metrics you used and maybe how you applied term frequency and inverse document frequency?

>> The way I use them isn't that tricky, but it makes a big difference. The first metric is the Jaccard similarity score and the other is TF-IDF. For Jaccard similarity, as we just discussed with lace dress, some people will use lace only and some will use dress, but any term that includes lace or dress will have some overlap with the fashion symbol lace dress. So we assume each fashion symbol consists of a set of words, and when we think of sets, it's very tempting to use Jaccard. So we applied Jaccard to see how much two fashion symbols overlap, and that's a signal of how similar they are. For TF-IDF, we made a little tweak. TF-IDF normally gives a sense of how unique a word or term is compared to all the other documents in the corpus. But let's still use lace dress as our example. Maybe lace dress in 2015 isn't something really new anymore, but lace dress in 2005 is very interesting, very unique, or very innovative. So what we did was change the coverage of the corpus based on the time.
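
As a rough illustration of the two measures just mentioned, the sketch below computes Jaccard similarity over the word sets of two fashion symbols, and a TF-IDF score whose corpus is restricted to one year's reviews. The review data is invented, and the plain `tf * log(N / df)` weighting is an assumed textbook form; the exact formula used in the papers is not given in the conversation.

```python
import math
from collections import Counter

def jaccard(symbol_a, symbol_b):
    """Jaccard similarity between the word sets of two fashion symbols."""
    a, b = set(symbol_a.split()), set(symbol_b.split())
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical reviews grouped by year; each review is a list of fashion symbols.
reviews_by_year = {
    2005: [["lace dress", "pencil skirt"], ["pencil skirt"], ["wool coat"]],
    2015: [["lace dress"], ["lace dress", "wool coat"], ["lace dress", "pencil skirt"]],
}

def tfidf_in_year(symbol, review, year, corpus=reviews_by_year):
    """TF-IDF of a symbol in one review, with only that year's reviews as the corpus."""
    docs = corpus[year]
    tf = Counter(review)[symbol] / len(review)
    df = sum(1 for doc in docs if symbol in doc)
    if df == 0:
        return 0.0
    return tf * math.log(len(docs) / df)

print(jaccard("lace dress", "lace"))  # 0.5: one shared word out of two distinct words

score_2005 = tfidf_in_year("lace dress", reviews_by_year[2005][0], 2005)
score_2015 = tfidf_in_year("lace dress", reviews_by_year[2015][0], 2015)
```

In this toy data, lace dress appears in one of three 2005 reviews but in all three 2015 reviews, so its score is positive in 2005 and zero in 2015, matching the intuition that the same symbol is more distinctive in the year when few designers use it.
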
So for the TF-IDF calculated at 2005, our corpus only includes reviews from 2005, and for the TF-IDF calculated at 2015, we use reviews only from 2015. We think how unique a term or word is should change along the timeline.

>> As your analysis came to a point where you could draw some conclusions, could you describe who you found to be the most influential designers?

>> Oh yeah, in the end I found Calvin Klein, actually. I think it's worth doing qualitative research based on this, so we can really look into why this happens. When did they introduce the ideas that make them influential? We haven't done that yet; we just put it out there saying we found Calvin Klein is the most influential based on our model.

>> Sure, it's a good first step. Was that surprising, or were there any other surprises in your influence graph?

>> For influence, we found it interesting and surprising when comparing smaller brands with much more well-known brands. We found an example where a brand, I forget the name, introduced the pencil skirt very near 2000. Even though in our data collection it was the first one proposing it, no one else adopted the same idea for many seasons. But after five years, Louis Vuitton, the very famous designer brand, decided to go ahead and use the pencil skirt as well. And right after that season, everyone else joined the party and started using the pencil skirt.

>> Oh, how interesting. So it's not just that the idea itself is innovative. It has to be the right idea from the right designer, maybe.

>> Yeah, so we were also really noting the difference between being innovative and being influential.
A lot of times being innovative doesn't mean you have the power to influence other people to use the same ideas as you.

>> Yeah, that's very intuitive, but I wouldn't have thought of it on my own. Very cool. So I guess all of the influence calculations come out of the fashion text. Could you summarize your conclusions regarding the predictive power of the text alone in calculating influence?

>> We used some published rank lists of designers as our ground truth, and we do see that the performance is great, but I still think there are limitations to text itself, even though I personally really like text mining. The domain of fashion is, very intuitively, about how you visually perceive designs. So as the next step, if we can leverage image data to extract more of the features that are missing in the text, I think that will significantly improve the model. And you could use the same model: after you extract the features, you transform them into the same fashion symbol entities. Then I think the model can capture way more.

>> Along those lines, can you talk about some of your future areas of interest?

>> Currently I am working on a different angle for looking at the fashion industry. The papers I published previously are on quantitatively studying high-end fashion. Right now I have switched to looking at people, from a user perspective, a consumer perspective. I'm looking into a website, a fashion social network called lookbook.nu. I think it's the largest fashion social network at this time. They have almost 2 million users, who post outfits and link to where they bought them, mostly from high street instead of high-end brands. And I'm looking into how the users influence each other. So it's similar, but from a different angle.
At the same time, I'm working with two PhD students in economics. One is CC from Harvard and the other is Jorge from Hebrew University. I think it's a deeper way of looking into my previous work; they are trying to study it in a more econometric way. We are also making the taxonomy more complete. I think it had good coverage, but because we are now including more runway review sources, we think the taxonomy should be even richer.

>> Very neat. That's exciting stuff. I'm looking forward to reading your future papers.

>> Thank you.

>> I know I will, and I'll encourage the listeners to go check out yusanlin.com, your website, and I'll have a link in the show notes. They can see your publications and things like that there. Is there anywhere else people should follow you online?

>> Yeah, people can follow me on Twitter. I don't use it that often, but it's [inaudible], so that's where people can follow me.

>> Excellent.

>> Yeah, I'll try to tweet more.

>> All right. Awesome. Yusan, this has been really great. I want to thank you so much for coming on the show.

>> Thank you so much for having me.

>> Until next time, this is Kyle Polich for DataSkeptic.com, reminding everyone to keep thinking skeptically of and with data.

(upbeat music)