Archive FM

Data Skeptic

Preserving History at CyArk

Duration:
23m
Broadcast on:
05 Jun 2015
Audio Format:
other

Elizabeth Lee from CyArk joins us in this episode to share stories of the work done capturing important historical sites digitally. CyArk is a non-profit focused on using technology to preserve the world's important historic and cultural locations digitally. Ben Kacyra, a pioneer in 3D capture technology, and his wife founded CyArk after seeing the need to preserve important artifacts and locations digitally before they are lost to natural disasters, human destruction, or the passage of time. We discuss their technology, data, and site selection, including the upcoming themes of locations and the CyArk 500.

Elizabeth puts out the call to all listeners to share their opinions on what important sites should be included in the CyArk 500 Challenge, an effort to digitally preserve 500 of the most culturally important heritage sites within the next five years. You can nominate a site by submitting a short form at CyArk.org.

Visit http://www.cyark.org/projects/ to view an immersive, interactive experience of many of the sites preserved.

(upbeat music)

- The Data Skeptic Podcast is a weekly show featuring conversations about skepticism, critical thinking, and data science.

- Welcome to another episode of the Data Skeptic Podcast. I'm joined this week by Elizabeth Lee from CyArk. How you doing, Elizabeth?

- I'm doing great, thanks Kyle.

- Thanks so much for joining me. I thought maybe we could start with a quick overview of your background, your academic interests, and how you started getting involved with CyArk and its mission.

- Sure, so my original background is in archeology, specifically Neolithic archeology, looking at that transition when people went from hunter-gatherers to actually settling into small settlements or even cities. But during that time I got very interested in how to best record and share information from these archaeological excavations, and that eventually led me to CyArk, whose mission is to use new 3D capture technologies to accurately and rapidly record our built environment, our historic built environment.

- Awesome, so tell me a little bit more about CyArk, such as when the organization was founded and how many people are involved today.

- We were originally founded by a gentleman named Ben Kacyra. Ben is credited actually, he's often referred to as the father of 3D laser scanning. So Ben was originally from Iraq, immigrated to the United States to pursue a degree in engineering, and was a civil structural engineer. And during that time, actually, he did a lot of work on nuclear power plants. Being able to record the plants very accurately and quickly was a huge challenge. He had to send people in in the monkey suits, with the dosimeters, watching the radiation, going in with sort of tape measures and notebooks to get an accurate record of these complex structures. So he thought, boy, there's gotta be a better way to do this.
And so in the early 90s, he developed this 3D scanning technology and integrated hardware and software to be able to very accurately and quickly record industry. And so that company was acquired in 2001 by a company called Leica Geosystems. That same year was when these massive 1,600-year-old Buddha structures in Afghanistan were destroyed; they were gone in an instant. And Ben thought, boy, we have this incredible technology that allows us to record the built environment. Somebody should be doing this for our historic sites. And so he and his wife, Barbara, went on to start CyArk a couple years later with the mission of using 3D reality capture technology to record and then also archive and make available cultural heritage sites or historic monuments and sites around the world.

- That's very awesome. The one that I caught onto that first got me introduced to the whole project was the Mount Rushmore scan you guys did. But I see there's a huge wealth of other sites as well. Could you give me maybe an overview of some of the highlights of the places you guys have scanned?

- Yeah, so we really are global in nature. So we've done about 150 projects on seven continents. Things like Mount Rushmore, Pompeii in Italy, Chichen Itza in Mexico, the Tower of London. And then even one of Shackleton's exploration huts in Antarctica.

- Oh, very cool. Where would be the best place to go if people wanted to read a lot more and see a full list, just to the website?

- Yep, so go to the website. It's just cyark.org, C-Y-A-R-K, like cyber arc.

- And if you wouldn't mind, tell me a little bit about some of the technology you guys use to do the scanning and also what's involved. Like how big of a team has to be on site?

- So we use a variety of technologies. I mentioned that we originally got started around this 3D laser scanning technology, which is actually LiDAR-based technology, the light detection and ranging.
So the way that works is it sends out this pulsed beam of light and it measures the time it takes for the beam to hit the object and return. And the machines nowadays do that 100,000 times a second. And what it gathers is this individual cloud of points. And we call that a point cloud. So you have millions and millions of individual sort of coordinate data, XYZs. And then it also captures color and intensity, in terms of how intensely that beam is reflected back. So that's one of the primary technologies we use. We use other sort of general survey technology, things like GPS and total stations. And then we're doing a lot as well with a technique called photogrammetry, which is deriving measurable information from photographs. So you take lots of different photographs at different angles that have common points. And you're able to get three-dimensional information from that, which is quite exciting.

- Yeah, very cool. I plan to ask you about some of the textures I see in these 3D models. This being an audio podcast, people can't really appreciate the visual aspect, but I hope everyone goes online to your website and checks out some of the renderings and images. 'Cause they're not just wireframes, you're actually capturing a lot of really detailed texture. And you'd mentioned earlier that that was some of the intensity of color and light reflecting from the laser scan. But I noticed on one site in particular, I think I'm probably gonna butcher the name. It's Tutuveni, it's a Native American site. There's actual artwork on some of the rocks. So I'm curious, is that captured from the laser scan or is that part of a later texturing phase?

- Both, it's both. So the intensity is captured as part of the laser data. So the machine records sort of that strength of the return of each individual point. And then it can give it a false color based on that. So you can distinguish different colors on the monument or the stone. And that's based on a variety of factors.
So the color or even the material type or things like moisture on the surface will all give different intensity readings. But then we also do photography at the same time. So some of the scanners that we use have really great cameras integrated that can do high dynamic range imaging. Other times we use an external rig to capture that information. And then we go about matching up the photographic texture to the metric information from the scans.

- You talked a little bit about the rate at which the machine samples. Can you tell me about sort of the frequency in space? Like how many, maybe how many measurements per centimeter you guys do?

- The resolution that we capture. So like I said, we use a number of different technologies. A lot of our work, the bulk of our work, usually tends to be on these architectural type sites. And for that, we like to capture a resolution of about two to three millimeters. So capturing a point every two to three millimeters. And then those points can be joined together to actually form a solid surface by way of creating what's called a triangulated mesh. And so that gives you very, very good resolution. And particularly when you combine it with the textures from photographs, you can have that really detailed sort of surface model. But then when it comes to, you know, maybe other details on the site, carvings or artifacts, we go down to about 10 microns or even less than that with some of the scanners. But those are more handheld scanners that are designed to capture detail.

- And are you guys kind of satisfied with the forefront? I mean, that sounds awfully small in scale of measurements to me. Are there advancements in technology you're hoping for or do you have the measurement tool that you'd like to have?

- Oh, it's always evolving. I think that's one of the fun things about being in a technical field is that the technology continues to change and improve. I mentioned photogrammetry.
We've seen huge advances in that in the last few years. You know, the cameras, you know, just even a DSLR allow you to capture unbelievable resolution. And so the software has gotten to a point that you can really kind of maximize some of that photographic capability. We've been really excited to just see this evolve. And I think that we're going to continue to see advances, whether it's in something like the resolution or if it's in something like the speed or even the cost to do this type of work.

- So with some of these really impressive levels of resolution, I imagine you must have challenges around how you capture and store the data. Can you talk a little bit about the size of a capture and how you keep that all in the database or where it's stored?

- Yeah, so that's a huge part of our mission and a big challenge for the field in general. You know, not just sort of heritage or preservation fields, but really anyone that's dealing with big data. For us, a site, you know, if we spend about a week in the site with a couple of teams operating, we come back with a few hundred gigabytes of data and then that needs to be further developed and processed, like I mentioned, the texturing and the meshes. So, you know, we have projects that range from say 500 gigabytes up to a terabyte and a half for a single project. So we are dealing with big amounts of information and we have a couple of ways that we manage that. You know, in terms of the day-to-day, we've got different systems that we use, but then for us, our mandate is to make this information not only available today, but available to future generations. And so we've spent a long time looking at different ways that this information can be archived.
And we actually have a great partnership with a company called Iron Mountain, who's a data management, records management kind of company, and they helped us develop a solution to be able to take our data here, back it up, and then also store it in their sort of premier facility in Pennsylvania in an old mine.

- Can you tell me a little bit about how your data might be available? Like, is there perhaps a Creative Commons license or an academic use, if someone wanted to kind of do some research related to it?

- Yeah, so the way our licensing works is that all of the data is actually controlled on a site-by-site basis. So we work very closely with the governments and the research teams that we work with to determine what information can be shared publicly. In most cases, we're able to share a lot of the derivatives or lower resolution models just via our website, oftentimes under a Creative Commons license or just under a non-commercial use. And then in some of the cases, we have permissions from the site to actually share it a bit more broadly with researchers, again, for non-commercial purposes. And in that case, if people write in to us, we have different agreements in place that actually allow us to share that data back for research.

- So I imagine you guys have a long list of sites and artifacts you would love to capture. And of course, there must be prioritization and stuff. So how do you decide on which artifacts you want to preserve?

- Well, so we launched an initiative called the CyArk 500 just about a year and a half ago. And the idea behind the 500 was that we really wanted to accelerate our efforts. Like I said, we have about 150 sites. When we launched it, I think we had just over 100 sites in the archive. And so we wanted to create a way for us to accelerate our efforts and put a real challenge out there to the people that are using this technology to get involved with us.
So through the 500, we've actually created a nomination process where people can nominate sites that they'd like to see included. And then what we've done is we've convened this International Council of Advisors that has helped us develop criteria for how those sites are reviewed and then also helps us review those applications and determine which sites are eligible for inclusion in the 500. So in terms of the criteria, we look at the risks facing those sites, because our whole mission is to capture and have a record before these sites are lost. So we look at some of the risks facing sites, both natural factors, things like earthquakes. Your listeners may be aware of the recent earthquake in Nepal that really has taken a big toll there, not only in terms of the community and the loss of life, but also of their culture. So we look at things like that. And then we also look at some of the human-caused factors. We've also, unfortunately, seen a huge uptick in that in the last few months with some of the activities in Northern Iraq, the intentional destruction of cultural heritage. And so we look at those factors as well, or the potential risks in the future. And then we also look at how this type of technology can really benefit the sites, whether it's ongoing sort of preservation challenges or things like education and virtual access.

- As you were mentioning, physical things do degrade. And I would imagine your captures are essentially a snapshot in time, but do you ever see issues where during the time you're on a site it could actually be changing while you're doing your capture, or is the process of change much slower than that?

- It's usually much slower than that. We have had instances where sites that we've captured, stuff has happened afterwards. And then we've been able to share that data to help in the restoration efforts. But it really is that point-in-time record.
So we know when the data was captured and we also, with traditional survey methods, record other information about the site in terms of not only when, but things like weather factors, if that's gonna have an impact on, you know, even the sort of expansion and contraction of the building.

- Are there any sites in particular that have changed significantly since the time you scanned them?

- We had a case in Uganda where just about a year after we had archived a site called the Royal Kasubi Tombs, the site was lost in a fire. And it was this beautiful, you know, thatch structure. And so it just went up. We were able to share that data back with our contacts on site. And they're in the process of rebuilding now.

- Oh, wow. I mean, really unfortunate story, but at the same time, like a bittersweet success that you guys still had that available.

- Yeah, no, we never like to see any of these sites lost, but the hope is that through our archive we do have a good record of them that can be used in the future.

- So I'm curious about how CyArk thinks about permanence. Knowing that you have the support of Iron Mountain, I would say it's almost certain that historians in, let's say, 300 years will have access to your models. But do you think about 3,000 or, you know, something crazy, like 30,000 years in the future?

- Yeah, those are big questions. I think that one of the big challenges with digital data is the need for migration. So we've got a great partner in storage and we're storing them on LTO tapes that have a longer shelf life than just sort of a standard hard drive. But that doesn't remove that need to go in and actually refresh that data and to migrate it often to newer formats. And so I don't think we've solved the 30,000 year question, but it is something that, you know, we're working on and sort of thinking about all the time.
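
The refresh-and-migrate step Elizabeth describes is commonly paired with a fixity check: hash the file before the copy, hash it again on the new media, and only trust the copy if the two match. The episode does not say how CyArk or Iron Mountain actually do this, so the sketch below is only an illustration under that assumption; the function names `sha256_of` and `migrate_with_fixity_check` and the choice of SHA-256 are hypothetical, not taken from the interview.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so multi-gigabyte scans never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate_with_fixity_check(source: Path, destination: Path) -> str:
    """Copy one file to new storage and verify the copy is bit-identical.

    Hypothetical helper: returns the checksum so it can be kept alongside
    the archive record and re-checked at the next migration.
    """
    before = sha256_of(source)
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, destination)
    after = sha256_of(destination)
    if before != after:
        raise IOError(f"checksum mismatch migrating {source} -> {destination}")
    return after
```

Storing the returned checksum with each file means a later refresh can detect silent corruption on the old media as well as a bad copy.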
- I'm also curious if there's any interest in some of the 3D kind of horizons we're seeing now, things like either, you know, like the Oculus Rift, going in and doing some immersive stuff, or perhaps like 3D printing scale models of sites.

- Yeah, so we've actually done both of those. One of the neat things, and this sort of again speaks to the evolution of the technology, is that we can revisit sites that we collected, you know, 10 years ago and expose it to either new ways of viewing it, in the case of something like the Oculus Rift, or new ways of experiencing it by doing something like a 3D print. We've also even been able to go back and sort of re-render older sites because the computing power has improved and some of the rendering technologies. So it's really fun as, you know, as we sort of grow as an organization to be able to go back into the archive and say, "Oh, you know, this dataset, boy, we processed that 10 years ago and we have a better method now. Let's go back and see what we can get out of it now."

- And can you tell me a little bit about the team it takes to support the project, like how many of you are there and what's the breakdown in skill sets?

- We've got a small core with sort of a big network of partners. We rely quite heavily on people all over the world that have been trained in this capture technology to help us accomplish the work. So CyArk really kind of acts as the core repository and kind of a program manager, if you want to think about it that way, in terms of making sure this work gets done. So we've got about a dozen staff and it's really a mix between people that are on the side of working with the data and managing teams in the field. We've got people with architecture backgrounds or computer science backgrounds, some archaeologists and sort of museum studies people that work with the teams that are capturing the data and then also work with how we present that on the website and then managing and migrating the archive.
And then the other part of the organization really focuses on some of the advocacy work, some of the fundraising to support the mission and also the government relations and partnerships that actually allow us to get access to these sites and to make all of this possible.

- Yeah, along those lines, is there anything off limits or that you guys have had a hard time accessing that you'd love to capture?

- Yeah, you know, we just haven't gotten everywhere yet, I would say. Part of the hope with the 500 is that we can create some interest in people suggesting sites, whether it's governments or enthusiasts that are able to get access to these places. It's just a challenge to get everywhere. It's a very big world. There are thousands of sites I think that we could do. So part of it is just getting to know some of the places that are out there and then also being able to make our organization known and sort of the benefits of doing this type of work for the sites.

- Do you have a particular favorite site that you've seen captured?

- Oh boy, that's like asking to pick a favorite kid.

- I suppose.

- I love the diversity in the sites that we have. In my time here, I've been able to work on, you know, really interesting sort of Mesoamerican sites like Chichen Itza or Xochicalco with these beautiful glyphs, all the way from, you know, something like that to, you know, some of the monumental architecture in China at the Eastern Qing tombs and then even, you know, items within the British Museum. And so I think that the diversity is quite exciting in terms of what's possible.

- Yeah, absolutely. And are there any opportunities for volunteers to contribute to the project?

- We do take advantage of volunteers when we can. So we often have people that intern with us during summers, you know, students. Our headquarters are in Oakland, California. And then we also have an office in Edinburgh, Scotland. So we draw on some of the university students.
And then we also have a core of people that will help us translate things. And we're looking into whether we can do more of kind of the remote type work, whether, you know, we could take advantage of some of the people that love just doing 3D modeling and share with them some of our data and see if they can create content that we could share online.

- And if someone, you know, perhaps had the right background, skill set, and time, how would they get in touch to explore an opportunity?

- I think we have a section on the website, but the best way is to probably just email us at info@cyark.org with what they're interested in and we can kind of connect them in from there.

- Are there any upcoming sites you'd like to maybe preview or announce, interesting projects coming in the near future?

- We've got a couple of exciting initiatives that we've been working on. In the 500, we've started to look at projects in terms of themes. So we've got a couple of themes that are really interesting right now. We've got a team in the field this week actually at an industrial heritage site in Japan. So looking at some of the sites that maybe people wouldn't normally think of as historic, but are very important to the, you know, sort of the modern world that we live in, you know, these big cranes and some of the sites that were significant as part of the Industrial Revolution in various countries, and they're at very high risk because they're not valued in the same way as a, you know, beautiful neoclassical building. We're kicking that theme off and we've got a team in Japan at a hydroelectric plant, which will be available soon. And then we've also got a theme around looking at sites involved in the transatlantic slave trade, you know, looking at sites in West Africa as well as sites in the Americas and kind of connecting that narrative. And so that's quite an interesting one.
And then hopefully we'll have some stuff happening shortly in Greece as well on a theme looking at some of the significant sites and monuments there.

- Oh, very neat. So I'd like to wind up every show by asking my guests for two recommendations. That can be anything, you know, a book, a film, a software package. The first being the benevolent recommendation, something you think the listeners would appreciate knowing about, but you're not directly affiliated with. And the second, the self-serving recommendation, something that ideally you get direct benefit from by appearing here.

- Okay, for the benevolent one, I would say that for people that are interested in, you know, sort of 3D, either capture or manipulation, there's a lot of great software packages out there that are free or free for educational users. A couple that come to mind are, you know, if you like design and sort of building stuff, there's Trimble SketchUp, which is a really fun program where you can build your own 3D models very quickly. And then Autodesk also has a whole suite of things around this sort of reality capture, reality computing, including apps, so they've got one called 123D Catch, and you can use your phone to actually capture photos that will create a 3D model. So those are really fun to play with, and I would say anyone that's interested in this stuff should just look into some of those, because you can start to get a sense of how some of this stuff works and they're a kick to play around with. And then, I guess, the self-serving one is that we'd love to hear from your listeners on sites that they might like to see included in the 500, or if people are interested in volunteering or getting involved or donating, please visit us online. We'd love to have your support.

- Excellent. So we talked a little bit earlier about the ways some people could volunteer. Knowing my audience, is there anything specifically a data scientist might be able to contribute?
- We have a lot of data. So I think that we would be really interested to just see, I could see a whole thesis, actually, just looking at the different types of data within our archive and even making comparisons across sites, looking at some very technical things, from the reflectivity or the intensity across stones in different environments, all the way to just analyzing how the people that come to our website are interacting with things and what we might try to provide more of in the future.

- Yeah, I think there's some great projects there, especially knowing that you're up at UC Berkeley, or affiliated with the school, are you not?

- Not anymore, but we've got sort of an ongoing relationship there because they're down the street from us.

- Awesome, and one more time, give me the website and the email address, just so people have it.

- Sure, so it's cyark.org, C-Y-A-R-K dot O-R-G, and you can get in touch at info@cyark.org.

- Wonderful, well, thank you so much, Elizabeth, this has been really interesting. I really appreciate you taking the time to come on the show.

- Great, thanks so much, it's been fun, and always fun to talk with somebody on that technical side of things too.

(upbeat music)
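
For listeners who want to play with the numbers from the scanning discussion, the time-of-flight ranging Elizabeth describes (send a pulse, time the return) and the "point every two to three millimeters" figure can be sketched in a few lines. This is a simplified illustration, not CyArk's or any scanner vendor's processing: it assumes an ideal pulse in a vacuum, a beam direction given as azimuth and elevation angles, and points laid out on a regular grid. All function names are hypothetical.

```python
import math

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0  # exact, by definition of the metre


def range_from_round_trip(seconds: float) -> float:
    """Distance to the surface: the pulse travels out and back, so halve it."""
    return SPEED_OF_LIGHT_M_PER_S * seconds / 2.0


def to_xyz(distance_m: float, azimuth_rad: float, elevation_rad: float):
    """Turn one range reading plus the beam direction into an XYZ point."""
    horizontal = distance_m * math.cos(elevation_rad)
    return (
        horizontal * math.cos(azimuth_rad),
        horizontal * math.sin(azimuth_rad),
        distance_m * math.sin(elevation_rad),
    )


def points_per_square_metre(spacing_mm: float) -> float:
    """Approximate density if points sat on a regular grid at this spacing."""
    per_metre = 1000.0 / spacing_mm
    return per_metre ** 2


if __name__ == "__main__":
    # A 200-nanosecond round trip corresponds to a surface roughly 30 m away.
    print(range_from_round_trip(200e-9))
    # Rough point densities at the 2 mm and 3 mm spacings mentioned above.
    print(points_per_square_metre(2.0), points_per_square_metre(3.0))
```

Under these assumptions a 2 mm spacing works out to about a quarter of a million points per square metre of surface, which gives a feel for why a week on site yields hundreds of gigabytes.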