In the second full length episode of the podcast, we discuss the current state of personalized medicine and the advancements in genetics that have made it possible.
Data Skeptic
Personalized Medicine with Niki Athanasiadou
(upbeat music) - Welcome to the Data Skeptic Podcast. I'm here with my guest, Niki Athanasiadou. - Hi. - Hey, thanks for joining me. - Nice to be here. - So we're gonna talk a little bit about personalized medicine today, but I thought we could start with you just giving me a bit on your background, academically, and what you're working on now. - Yes, I'm a researcher, currently a postdoc at New York University. And I'm a specialist in next-generation sequencing, which is a technology that is bridging our research possibilities with personalized medicine. - Very exciting. What got you into it? - Chance, I guess. I was working on something else, typical biological research on the bench, wet stuff, and then the experiment required that I do some genomic sequencing, and that's how I got in. - Awesome. So maybe we could start, a lot of my listeners, I think, are more computer science-y types. Maybe you could give your high-level definition of what personalized medicine actually is. - High level, I was thinking of actually giving you an example to explain what personalized medicine is and where it can get us. Let's think of one of the most well-studied diseases, which is currently cancer. Cancer actually comes in many different types, and everyone probably is familiar. We talk about pancreatic cancer, lung cancer, brain cancer, so many people are aware that there are different types of cancer, although cancer is an umbrella term to mean a growth in your body that interferes with the functions of your body. So let's keep in mind that there are different types of cancer, but also we now know after all this research that every individual has their own version of a particular type. So we know there are at least five different types of breast cancer, and what determines what type this cancer will be is really the genetics, the underlying genetics of the person.
So that's where personalized medicine comes in, and based on the genetic background of the affected individual, we're trying to zero in on the specific mechanism that has been altered or that is not working very well, and that drives the problem. And specifically target this mechanism that in the individual is not working well, and treat it. - So just to see if I kind of have a good understanding, when you say there's five types of breast cancer, could it be that for someone of type one, treatment A is very good, and for someone of type five, treatment E is very good, but giving treatment E to person one would not be successful? - Exactly, exactly. - Can you do like a biopsy on the tumor to know which type they have, or is it purely genetic? - Yes, that's what we're trying to do. I have to say early on in the discussion that we're still in the very early stages of personalized medicine, and we're even in the very early stages of understanding how our genome is organized. The genome project might have been completed. This happened in 2001. However, we're still trying to understand what it all means. So let's keep that in mind. We don't have all the answers. We actually have very few answers, but it's a very exciting field, and I think there are going to be very great results in the next decade or so. - Yeah, I'm trying to understand a lot of what it means myself, and not having much of a medical or biology background, sometimes it's a struggle. So I thought that'll probably be true of a lot of my listeners. So maybe I could kind of give you the way a data scientist looks at genetics, and you tell me where my analogy works and where it breaks down. - All right, yep. - So computers, we all know it's all binary code, zeros and ones. We have something sort of similar in the genetic code, all the base pairs, but there are four combinations, right? It's A with T and G with C. - The base pairs? - Yeah.
So if we just look at one side, once it's kind of cut in half, it's like a base four number, whereas binary is base two. And then if I understand right, every kind of three base pairs forms, is it a codon? - Oh yeah. (laughing) - And this was actually something really cool, is that there are something like 56 unique codons, and so-- - 64. - Oh, 64 exactly. All right, because yeah, that it would take four bits of a four digit-- - The combination of four entries, right? - Yeah, to be able to, so it's like nature found the most efficient number to make a codon. If it had been one extra base pair, it would have been wasteful. - Absolutely. Of all the possible codons, they only encode 20 amino acids. Actually nature is even more clever than that, so it allows for some degeneracy if something goes wrong. - Ah, interesting. So if I were to kind of then take the analogy a little bit further, each codon then is like a single instruction in a computer program. And those instructions, the output of those programs would be proteins. Do I have that right? - A string of amino acids gives you one protein. I would tend to consider that the bit of information is the protein; the protein can go wrong because some amino acid within it-- - Gotcha. - Is wrong. - And then what tells the proteins where to go? Where's that information encoded? - Okay, so you are absolutely right about the codons and how proteins are encoded. But when we're talking about genome-wide information, we need to take into consideration that of the whole genome, which is three billion bases, only one to two percent encodes for proteins; the rest is non-coding. So the rest doesn't fall into the category that every three nucleotides will give you one specific bit of information. So these non-coding regions of the genome are there as regulatory regions, which is exactly the answer to your question.
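The codon arithmetic in the exchange above can be checked with a short sketch. It is an illustration added to the transcript, not something from the episode: the function names are hypothetical, and the count of 21 "symbols" assumes the 20 amino acids mentioned by the guest plus one stop signal.

```python
from itertools import product

BASES = "ACGT"  # one strand of DNA, read as base-4 "digits"


def all_codons(length=3):
    """Enumerate every possible string of `length` bases."""
    return ["".join(p) for p in product(BASES, repeat=length)]


def smallest_codon_length(n_symbols, alphabet_size=4):
    """Smallest k such that alphabet_size**k >= n_symbols."""
    k = 0
    while alphabet_size ** k < n_symbols:
        k += 1
    return k


codons = all_codons()
print(len(codons))                # 64, i.e. 4 ** 3
# Assumption: 20 amino acids plus one stop signal = 21 symbols to encode.
print(smallest_codon_length(21))  # 3, because 4 ** 2 = 16 is too few
```

Two bases would give only 16 combinations, so three is the shortest codon that can cover 20 amino acids, and the surplus (64 codons for about 21 meanings) is what makes the degeneracy the guest mentions possible.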
So there are these regulatory regions that give the instruction of whether the specific gene needs to be expressed or not, but also give you all this other information about the regulation, the localization, et cetera, et cetera. - And one thing though that's very different, where the analogy certainly breaks down, is that genetic codes are very resilient. If I go into a computer program and I change one little bit, probably it won't even run or certainly some aspect of it won't work at all. Whereas you can flip a whole bunch of bits in a genetic sequence and you still pretty much get a functioning species, if I understand right? - Yes, within limits. You can go beyond the point of no return. - So then we know from generation to generation there's certain mutations. So I look a bit like my parents with some random variation, but one thing I'm kind of confused about is how do we say, so if we describe genetics as a really, really long number made up in base four, how do we know which numbers are humans, which numbers are mice, which numbers are nothing? - I'm not sure I understand the analogy. (laughs) - No, no problem, it's probably a weak analogy. Anyone's own personal genome you can describe as one really, really long number, I presume, right? Say I take their whole, the sequence of-- - Okay, yes, it's like in a, yes, I understand. - So I have a unique number, you have a unique number and our numbers are different, but they're more similar, my number and your number are more similar than my number and my pet Amazon parrot's number. - That's correct. - How do we define it? Or can we define what a species is just based on those numbers? Like in this range, it's, or am I thinking about it all wrong? - No, actually your question is very insightful about the definition of a species.
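The host's "genome as a base four number" picture from the exchange above can be made concrete with a small sketch. This is an added illustration under stated assumptions: the digit assignment (A=0, C=1, G=2, T=3) is arbitrary, the function names are hypothetical, and the toy sequences are made up. Real genomes differ in length and have to be aligned before they can be compared position by position.

```python
DIGIT = {"A": 0, "C": 1, "G": 2, "T": 3}  # arbitrary digit assignment


def to_base4_int(seq):
    """Read a DNA string as a single base-4 integer."""
    value = 0
    for base in seq:
        value = value * 4 + DIGIT[base]
    return value


def identity(a, b):
    """Fraction of positions at which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


reference = "ACGT"
late_change = "ACGA"   # differs at the last position
early_change = "TCGT"  # differs at the first position

print(to_base4_int(reference))            # 27
print(identity(reference, late_change))   # 0.75
print(identity(reference, early_change))  # 0.75
```

Both variants are equally similar to the reference position by position, yet as integers one sits next to it (24 versus 27) and the other far away (219). That is one reason a "range of numbers" does not map onto a species, and why comparisons are made region by region instead.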
I'm afraid we don't really, most of the species are defined based on how they look and their anatomical characteristics, is the truth. Now, in the molecular age, we have entered the era where we are defining species according to the sequence, but we're not looking at the whole genome. We are actually looking at particular regions of the genome. Depending on how closely related the species are, we might want to look into regions of the genome that are very different in the different species, that mutate very frequently. As we say, they have a high mutation rate. Or if we're talking about more distant species, we're looking into parts of the genome that are more stable and that evolution has taken care of. So the way this works is that there are regions of the genome that encode proteins that are very basic for all types of life, like the ribosome, which is the machine in the cell that translates the genetic code into the functional protein, the end result. This is so specialized and so universally needed that evolution cannot afford to change it too much; it's already optimized, if you look at it like that. On the other hand, if there is a protein that makes some specific feature of the species, or if it's a protein characteristic of primates, which is us and all our ape-like relatives, then this will be a good region of the genome to study to identify the different primate species. - Ah, so we can say that per species, there's sort of areas of interest where interesting things are going on in the genome and maybe the rest of it's all kind of boring in terms of mutation and variation for that species. - For comparison, yes. - Right, right. - Depends on the group of the species and how closely related they are. - So I may want to come back and talk a little more about genetics later, but I thought I'll try and keep on course a little bit and ask a little bit about personalized medicine.
So I've heard it said that personalized medicine enables more efficient and effective clinical trials. And I'm wondering if you could help me understand why that's true. - Yes. Before I answer the question, if I can, I would like to explain how we currently develop drugs. - Yes, that'd be helpful. - Some researchers like me in the lab identify the molecular mechanism, a cellular pathway that, when it's impaired, causes the disease. So that's step one, identify what happens in the cell to cause the disease. Second step is to find the chemical compound or the chemical compounds that can actually stop this aberrant pathway from occurring. Then this is the drug. Then having identified the drug that affects the specific molecular mechanism, we're going into clinical trials. And this is multiple rounds of clinical trials that are testing for two things. One thing is that the actual drug doesn't cause harm. We don't want the drug that will kill cancer but will kill the liver. And the second is that the drug needs to be actually curing the disease. So these two things are tested in clinical trials. So personalized medicine, let's go back to the cancer. - Sure. - Let's think that we have a drug that really is very good at targeting a very specific type of cancer. And it's really a very good possibility for cancer patients worldwide that have this specific type of cancer. Now, if this type of cancer is very rare, and only, let's say, 1% or one in 1,000 of the population carry the specific genetic underlying mechanism, you want to actually tailor your clinical trials so that you include, not as the control, but as the affected group, people that actually carry the mutation. Otherwise, the drug might be dropped, although it would be a good drug for this very small percentage of the people.
And this is good both for the patients, of course, and for the companies that have spent all this time and effort developing the drug; they don't want the money to go down the drain. It's a difficult calculation. Having said that, there are, of course, potential problems with that, because as I said earlier, clinical trials also test for potential adverse effects of the drug. So someone could think that if you cherry pick your patients, they might not show some adverse effect in response to the drug that might be widespread in the population. - Right. - And this has actually been shown. I'm not sure about which drug it was, but there is one case where this has actually happened. - So how do you-- - Can I say? Yes, sorry. - Oh, just gonna ask about how they end up detecting that. So presumably, let's assume it wasn't terribly malicious, then the group picked people afflicted with some very specific illness, and maybe also of those picked those that had some like correlated genetic property, which would make them resilient to whatever side effect there would be. And then you might only see that later when you try it on people who don't have that correlated genetic strength. So, I guess what I'm asking is, how would you then establish that later? Would it be a secondary trial where someone's trying to repeat the results, and they had a better randomization? - That's definitely one way to test it. And the problem with that is usually when a company is developing the drug it is proprietary, and the rules of open science for all don't really apply. So I feel, especially as I said, personalized medicine and this type of approach to target disease is very new. I do believe there's regulation that needs to be put in place and protocols that need to be developed specifically for this type of trial. But as a whole, it's not bad to use targeted trials.
Especially when we're talking about personalized medicine, which usually is drugs for a very, very specific type, subtype of the disease. - Is there any required transparency on how they pick that particular test group? - I'm not sure about this. - Yeah, 'cause I guess that, again, having no medical or biological background really, I come at it from a purely statistical point of view, and I'm thinking, if I can pick my test cases, whether I mean to do it or not, how do I control for all these other hidden variables I might not know about? - Oh, yes, there are definitely established protocols for that. So you try to match for all the usual factors, your test and control group; age, ethnicity, sex are definitely things you want to match them for. - Sure. - Yes, and these protocols do exist. This happens. - But if I wanted to be very pragmatic, I would say, well, age, checking on gender and ethnicity, these are very nice, but if I'm going to do a genetically based test, then any one random base pair in the whole genome is fair game to say, you know, your test group was all the people who had A, and this test group was all the people who had G, which is kind of silly, right, in a way, but who's to say that there isn't some segment of the genetic code you have to control on? And at that point, you could control on, you know, a billion different base pairs. - Okay, I wish we actually understood the genome well enough to do it. To my knowledge, it's not possible right now. - I see. So it's unlikely anyone could be doing funny business with their results, but perhaps for lack of understanding, there might be some flag out there we could one day be looking for. - Yes, yes, for sure. - So tell me a little bit more about the cellular pathways. I'm not too familiar with what that would be. - What do you mean?
- So you said you identify the cellular pathways that are correlated with the disease, and then you go about finding a drug that will help prevent that cellular pathway. Is that-- - This happens with basic research. So I'm also a basic researcher. So, for example, I'm studying the genomic response to changes in the environment. And by doing my nicely controlled experiments on the bench, I'm aiming to figure out exactly, for example, in my work, how a specific change in the environment, and specifically I'm studying the switch in nitrogen sources from actually only proline to only glutamine, how this affects the response of the genome. This is relevant to certain types of cancer because we know that the microenvironment in cancer itself is different. And you proceed through that. So you make associations of basic pathways that, when perturbed, give you a phenotype, as we call it, that exhibits itself in the whole organism in a way that's similar to the disease. And from there on, you go on to what we talked about. And of course, this is the traditional approach. - Gotcha. - The new approach, the genomic approach, is to actually go straight to the disease, sequence the genome of a thousand or so patients and find some association, some correlation between mutations and the disease. But of course, as you obviously know, this correlation doesn't mean that the specific mutation causes the disease. And this is very different. This is actually a different branch of personalized medicine, which is preventive medicine. We can know that an individual has increased probability of showing the disease, exhibiting the disease. We don't know how it happens or why it happens, but we have this marker that allows us to make this prognosis. And this is the case that was very, I think last year, was it, with the actress Angelina Jolie; she tested for a specific mutation that we know is associated with breast cancer and she went on to prevent the disease, undergoing a double mastectomy.
- This is actually something that, what you're mentioning about taking a thousand patients and looking for genetic correlations, it's kind of always fascinated me 'cause I know we don't, as you've been saying, perfectly understand genetics yet. So everyone is this really big data set of base pairs. And if you look at a thousand people, let's say you've confirmed they all have some particular disease, it seems likely that you could get a lot of false positives here. So maybe you'll look at that group and because you're looking at the whole genetic code, you're looking at everything. So what might first jump out is that, in a coincidence, all of these people have really bushy eyebrows and you discover the gene for bushy eyebrows. Or if it's not bushy eyebrows, it's toenails that grow really fast. There's so many characteristics here. How can it be that you look at a large data set and find something that's correlated with high likelihood? - Yes, we are very aware of this problem and we do try and correct or approach the problem in a way that minimizes this type I error. Typically, the more conservative approach is to do a Bonferroni correction to your data. Also very widely used is a permutation approach, in which both control and test samples, the genomes, are getting randomized and we're trying to see how significant our findings are based on that. - Interesting. - And there is a third approach, which is really working the best when you have specific genes affected, because as I said, only 1 to 2% of the genome codes for proteins, is genes, which is called principal component analysis, and I suppose you're familiar, okay. - Yeah, but not necessarily every listener will be. So if you have a great summary, that might be helpful. - Again, principal component analysis is a type of machine learning approach, I think. - Yeah. - I would classify it as that.
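The first two corrections the guest names above, Bonferroni and permutation, can be sketched in a few lines. This is an added illustration, not the guest's actual pipeline: the function names, the toy allele counts, and the choice of a one-sided difference in means as the test statistic are all assumptions made for the example.

```python
import random


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate <= alpha."""
    return alpha / n_tests


def permutation_p_value(cases, controls, n_permutations=2000, seed=0):
    """One-sided p-value for mean(cases) - mean(controls), by shuffling group labels."""
    rng = random.Random(seed)
    n_cases = len(cases)
    pooled = list(cases) + list(controls)
    observed = sum(cases) / n_cases - sum(controls) / len(controls)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomize who counts as a case
        perm_cases = pooled[:n_cases]
        perm_controls = pooled[n_cases:]
        diff = sum(perm_cases) / n_cases - sum(perm_controls) / len(perm_controls)
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical data: 1 = carries the variant, 0 = does not.
cases = [1] * 30 + [0] * 20
controls = [1] * 10 + [0] * 40
print(permutation_p_value(cases, controls))

# Testing a million variants at a family-wise alpha of 0.05:
print(bonferroni_threshold(0.05, 1_000_000))
```

The Bonferroni threshold shrinks in proportion to the number of variants tested, which is why it is so strict on genome-wide data; the permutation approach instead asks how often a difference as large as the observed one appears when the case and control labels are assigned at random.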
So what you do is you try to find the grouping that minimizes the differences, which type of grouping of the found mutations really makes the most sense. Let's put it in very, very simple terms because I'm not a statistician, I don't know. I cannot really go into too much. - Yeah, we could do a whole show on PCA easily. I'm surprised to hear you guys would apply the Bonferroni correction because it's so very strict. It seems like you'd never get results at all. - Oh, but we do. Just increase your sample size, you know. Also, having said that about the Bonferroni correction, it is actually now a little out of fashion to do it plain as it is. Actually, it's usually combined with a linkage disequilibrium study, which is more biological. It has to do more with how the genome is organized. So what we know, and I'm not going to go into the why and the mechanism, but what we know is that mutations that are close together in the genome, physically close together in the strand of DNA, actually tend to coincide more frequently. So we use this disequilibrium approach in which we divide the genome in blocks of linkage disequilibrium. And from that, we can get much better data. - So there's this movement now, I don't know if it's a movement, but it's becoming less and less expensive to do some personal genomic work. - I think the current price for sequencing your whole genome, I think it just hit the mark of $1,000 per genome. It's with some company tricks, I have to say, but it's reality, you can do it. - Oh, that's not bad at all. - Yeah. - So what advantage does that give me in personalized medicine? You know, I'm not a rich person per se, but I have $1,000 I could certainly spend. What benefit would I derive from having that done? - There are very few diseases, as I said, it's the very early stages of this discovery, for which we do have some associations of a specific mutation with the disease. So you could potentially prevent that disease, know your risk, for prevention.
You could potentially know your risk for a small group of diseases that we're currently understanding better. The truth, however, is that I don't really think you have much to gain as we speak. Perhaps it's a good investment for the future, if you can spare the money now. - Sure. - It might all be good, but I don't think there's much to gain right now. - So I know the, as you were mentioning, the Human Genome Project wrapped up. And so what I hear in the press releases is, oh, we've sequenced the whole human genome, but they didn't sequence mine. So what am I missing there? Who did they sequence or how do I have it mistaken? - So the goal of the genome project was to actually read the DNA. We had no idea how the DNA is organized, in position one, what you are usually expecting to find. So the genome project gave us, if you want, the control on which we can base all the other findings. Now following the genome project, there is another big collaborative project that has not got so much attention in the mainstream media, which is the HapMap project, which actually was aiming specifically to identify the most commonly occurring mutations. The genome project just gave us a template. On that, we start to understand what the frequencies of common mutations are, not disease-associated ones, because you also want to know that in order to know the frequency of disease. And now finally, as this mapping of mutations is slowly getting completed, we start to study more in depth the disease-causing mutations, for example. - Ah, that makes sense. So I have, maybe this is a bit of a conspiracy theory, but it seems like the very wealthy or maybe even just the first world countries can do a lot of sequencing work, which I assume a project like the Human Genome Project, they take some broad sample as best they can and use that for the baseline.
But if it's fair to say that someone's underrepresented, whether that be people who can't afford participation or people who are living in third world countries, does that mean that people of their genetic lineage will be at a disadvantage in the future for medicine? - So the HapMap project, which I just mentioned, which aims to identify the common mutations, actually did try and did include different ethnic groups. But what you're saying is absolutely correct. There is an issue of who gets sequenced in order to get this baseline on which we can base the future findings about disease frequency. We already know that different ethnic groups have different frequencies in genes, for example, the blood types. The frequency of the different blood types varies between Asian, African and Caucasian. We know that. And this is the same for many diseases. And also something that's not very frequently mentioned, what is really understudied is the differences between male and female SNPs, and there's some discussion about that now, because although we try to study all the different ethnic groups, still, as far as I'm aware, they're mainly focusing on men. For a very good scientific reason: men are the only people who have the Y chromosome, and we need to study that too. But it has very little genetic material, of which the majority is actually the same as in the X chromosome, and we're talking about a handful of genes that make the Y unique. - So in preparing, I found this article in New Scientist I want to pull a quote from. They say that a 2012 Harris poll of 2,760 US patients and physicians found that doctors had recommended personalized genetic tests for only 4% of patients. Is this just a case of slow adoption? Or is it early adopters that can afford a high cost? Or is this a genuine criticism of efficacy? - I think it has to do with slow adoption.
You know, I can imagine that the doctor in the little fishing village in Greece I grew up in will not be very familiar with the technology. But also, as I said, our knowledge is still very limited. And also, even if we are getting better and better at understanding the underlying mutations in the genome of afflicted people, we're not so good at understanding what it means. And there is usually not much we can do. For example, we know right now that there are many different underlying causes for autism, genetic causes for autism. But in the absence of appropriate treatment, doing the test will not really help the patient. - Yeah, so all it is is detection at this point. Interesting, yeah. So one of the strengths I've heard of personalized medicine is that it can be predictive and therefore proactive, compared to traditional medicine which tends to be reactive. But in personalized medicine, something can be detected and you have a warning to maybe adjust behavior or take some preventative measure. So if someone were to go down that path, and personalized medicine opens up that opportunity, out of the large universe of possible diseases and problems a person can be afflicted with, how do we isolate the ones that a person needs to genuinely worry about? In other words, if I go to the doctor and the doctor says, we ran your genome, you're susceptible to 100 different ailments, make these 500 adjustments to your behavior, I'll never be able to do it. But if the doctor says you're at risk for one thing, maybe I can make one adjustment. - Yes, all these numbers and all these correlations come with a probability. So I think that's what you should be looking into. What's the probability, for a person with A or B mutation, of actually showing the disease in the future? If the probability is 90%, then you probably do something even if it's two or three mutations of this type.
If the probability is very low, perhaps you want to consider your alternatives. I think this is definitely a thing for the patient to discuss with the doctor. Again, I will bring up the example of Angelina Jolie and the double mastectomy. This was the doctor's recommendation. She's got the history of breast cancer. She did the tests. She found the mutated BRCA2 gene, which has a very good probability of actually exhibiting the disease in the future. And still there is controversy over whether she was too drastic or not. - Yeah. - So these are all ethical dilemmas, moral dilemmas that we should be discussing, but there's no answer really. At the end of the day, what I think is, it's the individual and the individual's quality of life that is at stake here. You cannot force treatment on someone who does not want it. In the same way, you cannot force preventative medicine on someone who does not want it. These are not contagious diseases. It is different with transmitted agents that actually can cause a problem for a bigger group, for the society, for all of us. - Yeah, a Typhoid Mary type situation. - Yeah. I really think there is a grey area there. And at the end of the day, if the patient considers that the individual's quality of life is going to be so adversely affected by the treatment, perhaps palliative care will be the best course of action to take. And as I said, I'm not a doctor. - Right, right. It's the decision to be made between the doctor and the patient. - Absolutely. - So maybe we could jump a little bit and talk about how this concept of big data can help advance personalized medicine. - Yes, personalized medicine is big data. Because, as I said, the genome is three billion bases, double that, because we have one copy from our father and one copy from our mother. And when we actually sequence, we need to sequence in great depth because we have a sampling issue there. You want to make sure that you have covered the whole genome.
So usually, we sequence at a coverage of 50 or 100x. So that already is 100 times six billion. And that's for one individual. That's true. The control group will typically consist of a thousand individuals. And then you have the test group. Do the math. This is all big data. - Absolutely, yeah. I often argue with people about what big data means, but I lost the arithmetic there. So certainly we can say genomics falls into this realm. Maybe it's the best example of it. So do you have any, are you at all close to, in your work, the tools and technologies that are helping support these things? - Yes, that's what I'm doing. Yeah, I actually sequence. I'm not working with humans. I'm working with a humble yeast, which is, however, a great model. It's one of the most primitive model systems we have for humans, for us. It's a eukaryote, and this is big, but it's also simple enough and allows us to understand some basic fundamental pathways and fundamental functions of the genome. We have talked a lot about genome sequencing. There are different ways that personalized medicine can be addressed. It's not only sequencing the genome for mutations. There is also understanding the epigenome, which is information layered on top of the genome to modulate or modify how the actual gene is going to be expressed. To give you an example, if it was strictly genomics, you would expect twin siblings from the same oocyte, how do you call them, monozygotic? - I can't define it, but I do know the term monozygotic, yeah. - Yes, you would expect them to have exactly the same diseases and to have a very similar life in terms of physiology and health. However, that's not the case. We now know that the genetics is influenced by the environment outside, which modifies the information in the genome. So there is another level of information that is utilized in order to understand the causes of disease and by extension develop personalized medicine drugs.
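The "do the math" arithmetic from the start of this answer can be written out as a sketch. It follows the guest's back-of-envelope figures (three billion bases, two copies, 100x coverage, a thousand controls). The size of the test group and the two-bits-per-base storage floor are assumptions added here, and the function names are hypothetical.

```python
HAPLOID_GENOME_BASES = 3_000_000_000  # "three billion bases"
COPIES_PER_PERSON = 2                 # one copy from each parent


def sequenced_bases(coverage, individuals=1):
    """Total bases read, following the guest's back-of-envelope arithmetic."""
    return HAPLOID_GENOME_BASES * COPIES_PER_PERSON * coverage * individuals


def min_storage_bytes(bases):
    """Lower bound on storage at 2 bits per base (4 letters), ignoring quality scores."""
    return bases * 2 // 8


one_person = sequenced_bases(coverage=100)
print(one_person)                     # 600000000000 bases for one individual
print(min_storage_bytes(one_person))  # 150000000000 bytes, about 150 GB

# Assumption: a test group the same size as the 1,000-person control group.
study = sequenced_bases(coverage=100, individuals=2_000)
print(study)                          # 1200000000000000 bases
```

Real sequencing files are considerably larger than this floor, since they store read names and per-base quality scores alongside the bases themselves.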
And I have worked on that. Another thing is, as I said, there is regulation in every gene. So one gene might be mutated, but if, for example, there is another gene that has a function very, very similar to the gene that is mutated, perhaps it can take over. And the function can actually make amends for the bad gene. This is usually understood through RNA sequencing, not DNA sequencing. And RNA is the molecule that is intermediate between the encoded information in the DNA and the actual protein that is going to have a function in the cell. That's why it's also called messenger RNA. It takes information from the chromosomes to the ribosomes, where protein synthesis happens. So we know that in disease it is not only that the gene might be affected, but also the expression levels of the gene might be affected. The RNA levels of the gene might be affected. - Help me understand that a little more. Like I know when we say a gene is expressed, that means how it goes, how it sort of manifests itself in the person's biology. But what will regulate that up or down? - As I said, only one to two percent of the genome is genes. The rest is either stuff we don't understand, or stuff we understand but we have no clue what it's doing there, or they're regulatory sequences. These are sequences in front of the gene. It's called the promoter. And this is what tells the cell now the gene is going to be expressed, and it's going to be expressed high. Because you have to think that every cell only has two copies of every gene, one from the mother and one from the father. But every single gene has the possibility of making thousands or millions of RNA molecules. How much of the RNA it is going to make, or what flavor of the RNA is going to be made, depends on the regulatory sequences. - Got you. So would it be fair to say that then the environment kind of influences how the promoters are going to make the genes? - Oh, yes. This is absolutely correct.
- So help me understand that a little bit. Because from a naive perspective, I think of it as, well, my genes are pretty much fixed at birth. What does it matter if I live in New York or LA? You know, there's more smog here in LA, but the smog doesn't go into my genes, I don't think. What is it environmentally that really has the effect? - In biology, by environment, we don't mean the environment in the sense of the air we breathe or the water we drink. Which also is very important. We are now starting to be able to approach these effects. But you should think at a very, very microscopic level: environment is what's around the cells. - Got it. - And of course, what's around the cells is affected by what we eat, what we breathe and what we drink. But this correlation is a little too distant for us to fully comprehend now. All we have found is certain correlations between specific foods and disease and mechanisms. In one of the cases, like folic acid, with pregnant women now all taking folic acid, this we kind of understand what it does, and it has to do with the genetics. Although we're not 100% sure; it can be acting at many different levels, and we understand one of those. But transferring from the big scale, the macrocosm, our environment, to the biologically relevant environment is hard right now. - Yeah, it makes sense. So to my ears, it sounds like we're very close to the pioneer stages in genetics and epigenetics. Is that fair? - Yes, absolutely. - If you had to, and I know this is a big question, where do you see things being in 10 years, 50 years, 100 years? - Oh, 50 years. We thought that the Genome Project would not be completed. I think that, if I'm not mistaken, it started sometime in the 80s. - That sounds right. Yeah. - And we thought that we would not be able to complete it until the mid-2010s or 2020. That's what they said, the estimate. And that was absolutely true for the technology back then. But the actual project ignited a lot of innovation.
And now we ended up having the $1,000 genome. So it's very tough to predict what's going to happen over a long period of time. I think what seems to be happening for the next 10 years is a lot of genome-wide studies, a lot of personal studies, studies aiming to address personalized medicine questions. We also have metagenomics, which is sequencing the microbes that live within us and are very useful. We kind of knew that, but we never realized how useful they are. And we never really realized how they can actually cause disease if the balance of that ecosystem inside us is perturbed. So that's another field that is now booming. Of course, I already talked about the genetics. I think there is one genetic drug on the market now. Don't ask me names. Yes, I think these fields are definitely going somewhere. We're going to be very surprised by the findings in these fields. - Yeah, so we talked a little earlier about how some of the analysis that's done is just about looking at a large sample population, looking at people afflicted with some disease and hoping you find some correlation. If and when we end up with a better mechanical understanding of what the genes do, if we could read it more like we read source code or read a book and really understand what it's doing, perhaps one day we'd come to a state where we can build a nice simulation of it, and we actually might not need as many clinical trials, or we could simulate a clinical trial, if we really had a solid model that describes how genes will manifest in a person. Do you think that's too science fiction, or is that a possibility? - I think it sounds amazing. I'm also a fan of science fiction. I have to say here, we have talked a lot about mutations, and what we have discussed as a mutation is in science called a SNP, a single nucleotide polymorphism, which is exactly what you said, in the codon that encodes a specific amino acid.
One nucleotide went wrong and something wrong happened: the wrong protein. This is the case of a single mutation, but there are more things that can happen. We can have micro-duplications or micro-deletions or transpositions. Transposition is when one bit of DNA from one region goes to a totally different region, with who knows what effects. So the system is much more complicated than systematically permuting every single base in every possible combination. Because of that, it's a little hard to predict how soon we will reach the level where we will be able to really simulate and not need to go through the science. - That makes sense. One other point I wanted to ask. I suspect more people on the technology, big data, statistics, computer science end will listen to my program than people on the genetics end. These are the people that are often building the tools that will enable the big data type of work that is going on in personalized medicine and other areas of biology. I was wondering if you could name some of the open challenges or problems you're having that perhaps some of those more advanced listeners might one day be able to help solve. If any; maybe that's too open of a question. - One big challenge right now is the region of the genome that does not encode information, which is the vast majority. And there is a lot of effort, both from the biological side and from the bioinformatics side, to try and understand what happens there. People typically look for recurring motifs. A motif is a sequence, a block of nucleotides, where each nucleotide comes with a certain probability at its position. And this is not very well understood. I mean, in bioinformatics, it would be great if someone could come and scan the genome and identify recurring motifs, which would then allow us to go in and look in more detail at functional correlations, so functional similarities or even regulatory similarities of these regions. But unfortunately, these tools are really not foolproof right now.
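The idea of scanning a sequence for recurring motifs can be illustrated with a deliberately simple sketch: count every exact substring of length k and keep the ones that appear more than once. This is an assumption-laden toy, not the method used by any tool discussed here. Real motif finders model each position probabilistically and tolerate mismatches, which exact counting does not; the function name and the toy sequence are hypothetical.

```python
from collections import Counter


def recurring_kmers(sequence, k, min_count=2):
    """Return exact k-length substrings that occur at least min_count times."""
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    return {kmer: n for kmer, n in counts.items() if n >= min_count}


# Hypothetical toy sequence with one block repeated three times.
toy_sequence = "TATAAAGGCTATAAACCGTATAAA"

motifs = recurring_kmers(toy_sequence, k=6)
print(motifs)
```

On a real genome the hard part is exactly what the guest describes: the recurring blocks are rarely identical copies, so exact matching misses most of them and the probabilistic tools that replace it are not foolproof.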
And we do need to go to the bench and do it physically and try to fish out specific regions that bind the specific proteins that we're interested in. So yes, I guess one big challenge is that, as I said, the regulatory regions are influenced by the environment. The mediator of this influence is another protein. It's a little bit like the chicken and the egg paradox. So the genome makes proteins and then the proteins regulate the genome. And in all of that, you have the environment kind of as an umbrella affecting what really happens. So we really don't quite understand how a protein with a specific shape will go and prefer to bind specific promoter regions, specific regulatory regions, that will affect this set of genes and not the other. Also, we have all these proteins that help other proteins and bind to each other. This is really a mathematical problem of how the different shapes, with the different charges and amino acids with different properties, will manifest in their activity. If we manage to figure out exactly what happens there, and I know people are working a lot with simulations to do that, but it's not perfect, then you could really understand how this interaction happens and how, with small molecules, you can target specific interactions. Such a small molecule is a great candidate for a drug in the future. And then you can imagine you have two proteins that are normally partners, that are acting together, that are found together, and they are encoded by two different genes. Now in disease, one of these two genes might have a single mutation that will change one single amino acid in the interface. The proteins will still come as a dimer, will still bind to each other, but it will be a sick dimer, something that is going to cause a systemic problem. If we manage to predict exactly what it is that is going to disrupt this bad interaction and make it good again, that would be a great, great drug, a positive drug.
- Yeah, the step before that, because I think that's a separate thing, but were you talking about essentially what they're trying to solve with the Folding@home project? - Oh yeah, this is a citizen science project, yes, exactly that, yes. - Yeah, I don't fully get the problem yet, but I was talking to another biologist, a chemist actually. And as close as my understanding could get was that we understand the various states a protein can be in, but we don't well understand the transitional model of how it gets from where it is to where it's going. Is that an accurate, if naive, way to put it? - Yes, yes, I would say. By the way, I'm a biologist, not a chemist, so my perspective perhaps might be a little different. And since we're talking about genes and genetics: DNA with the codons, with each codon encoding a single amino acid of the protein, is a single string. Everything is linear, so by correspondence, one can imagine that the amino acids that make up the protein are also linear, but this is not the case. The 20 different amino acids combine in different ways for each protein, and all these proteins are encoded by the genome. What they really do is, because they have different chemical properties, they don't stay in this random, relaxed linear state, but they fold. Sometimes on their own, spontaneously; sometimes they need other proteins to help them fold into the same shape, the same proper conformation. And this is what makes the protein functional. Now having said that, proteins are not static, even in that form. Okay, so one cause for disease could be a misfolding of the protein, and this misfolding could be because of a mutation, or because of some other environmental or other factors.
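The linear correspondence described above, one codon per amino acid, and the effect of changing a single nucleotide can be sketched as follows. The codon table is the standard genetic code; the short reference sequence, the function names and the two substitutions are hypothetical examples chosen for illustration, not data from the conversation.

```python
from itertools import product

# Standard genetic code, with codons ordered by base T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate coding DNA codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # "*" marks a stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


def substitute(dna, position, base):
    """Return a copy of dna with a single nucleotide replaced (0-based position)."""
    return dna[:position] + base + dna[position + 1:]


# Hypothetical coding sequence: ATG GAA GGT TGA
reference = "ATGGAAGGTTGA"
silent = substitute(reference, 8, "C")    # GGT -> GGC, same amino acid
missense = substitute(reference, 4, "T")  # GAA -> GTA, different amino acid

print(translate(reference), translate(silent), translate(missense))
print(len(CODON_TABLE), "codons for", len(set(AMINO_ACIDS) - {"*"}), "amino acids")
```

The first substitution leaves the protein unchanged because several codons map to the same amino acid, while the second swaps one amino acid for another, which is the kind of single change that can alter how a protein folds or binds its partners.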
Now having said that, most of the proteins, most of the enzymes in an organism, don't have only one conformation, but they transition between conformation A and conformation B, and this transition is what makes them active. For example, take an enzyme, a protein specialized in destroying DNA. Imagine there's a Pac-Man here: unless the mouth can open and close, it can never destroy that bit of DNA, which could be a virus, or the bit of RNA that could be a virus. So, yeah, the system is complex, going from the linear DNA information and all the permutations that can happen and be healthy, and then what types of permutations in the genome are not healthy, to how this affects the protein, which in the end is what the cell cares for. Very, very few diseases are confined only to the DNA; it's really the proteins that are affected. As I said, there are only 20 amino acids encoded by 64 codons. Nature is wise like that, to allow for a few mistakes, because they can be too costly. - To wrap up the shows, I like to ask my guests to give two references or links of some kind. The first I ask for is what I call a benevolent reference, which is something that's of no particular benefit to you, but highlights something you find valuable or interesting, and the second is something completely self-serving. - Yeah, sure. So, I'm actually very excited about the work done at the Simons Foundation here in New York. These people have really been spearheading the personalized medicine effort for quite some time now. And especially, there is a lot of interesting research coming through about autism, and how autism is actually a genetic disease that we're now trying to identify. So, my first link would be to the Simons Foundation, which is www.simonsfoundation.org, and they have great writers there.
They explain things very clearly, or more technically if you're into that sort of thing, and I think it's a great resource to understand better where we're at right now and where we're going. As for the self-serving link, I'm actually writing about the thing we discussed. Blogging would be more accurate: I'm blogging about the technology that allows personalized medicine to be a reality. The technology is called next-generation sequencing, and you will find my blog hosted by Bitesize Bio. So the address would be bitesizebio.com forward slash profile forward slash Niki, N-I-K-I, dash Athanasiadou. I could spell that for you. - Please do. And I will also put both links in the show notes. So anyone who wants a quick link can go to dataskeptic.com and find it there. - Great. - Well, this has been wonderful. Thank you so much for doing this. I really enjoyed our conversation. I think my listeners will too. - Thank you so much for having me. It was a pleasure. - Yeah, same here. [Music]