Thank you very much, Monique, for that very kind introduction. So hopefully everything is working here. Thank you to all of the organizers of Picnic; it's really an honor to be invited. I would like to show a couple of things today, and hopefully we will have a bit of time at the end to talk about some of the bigger story that I think this is really a part of.

We will start with Seadragon, which is the core technology that we developed and brought to Microsoft when we were acquired about a year and a half ago. This is the Seadragon engine. It's a client- and server-based engine, meaning that these are operations you could do either on a local machine or with information that's stored remotely somewhere else on the web. What we are looking at right now is a collection of some hundreds of mostly ordinary eight-megapixel digital photos, and as you can see we are able to navigate through all of these in a fairly smooth way, not quite as smooth as I would have hoped. Anyway, that's what happens when you demo live code. Some of these objects that we are looking at are not ordinary digital images but very large ones, like this one, which is from the Library of Congress maps collection. This is a 300-megapixel image, so it's very high resolution, as you can see by zooming in on some detail over here.

The interesting thing to realize, and this is something that's been known in the graphics community for a long time and is very well understood, so we haven't invented anything groundbreaking in the academic sense here, is that this is really all about taking the idea of multi-resolution and generalizing it to all sorts of content and to the web. When you are looking at a very high resolution image, but you are looking at it from far away or at low resolution like this, the amount of information that needs to get to your computer is very limited. And in the same sense, if we dive in somewhere and are looking very close up, yes, this is very high resolution, but we only need this one part of it; we don't need the entire image. If we were to take the entire image at this resolution and try to download it to the computer, it would take a very long time, but we can't see all of it at once, so that would be very wasteful. The big idea here is just that instead of the conventional paradigm, in which you have an application that opens a document, reads it all into memory, and then, if it's too big to show on the screen at once, gives you scrollbars and other affordances that let you move to different parts of it, there is a dialogue happening continuously between the source of the information and the viewer, and the viewer is telling the source what it needs as it needs it, instead of all at once. So it's a very simple idea, but it means that arbitrary kinds of objects, regardless of complexity or size, can be interacted with over a very modest bandwidth connection.

This is "Bleak House" by Dickens; every column is a chapter. It's just an example of some other kind of content; it's not necessarily image content. Or over here, this is just for geek value I suppose, but this is a fractal. And so here, of course, the amount of information is in fact close to zero, but it's all being computationally generated as we zoom in.
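[Editor's note: to make that viewer-source dialogue concrete, here is a minimal sketch in Python, with hypothetical names, of how a viewer might choose which tiles of a Deep-Zoom-style image pyramid to request for a given viewport. The real Seadragon engine is native client/server code; this only illustrates the tile-selection idea.]

```python
import math

TILE_SIZE = 256  # assumed tile edge length, in pixels

def visible_tiles(img_w, img_h, view_x, view_y, view_w, view_h, screen_w):
    """Choose the pyramid level and tile indices needed to draw one viewport.

    (view_x, view_y, view_w, view_h) is the visible region in full-resolution
    image coordinates; screen_w is how wide it appears on screen, in pixels.
    In a Deep-Zoom-style pyramid, level L is the image downsampled by
    2 ** (max_level - L) and cut into TILE_SIZE squares.
    """
    max_level = math.ceil(math.log2(max(img_w, img_h)))
    scale = screen_w / view_w              # 1.0 means full resolution
    level = max(0, min(max_level, max_level + math.ceil(math.log2(scale))))
    factor = 2 ** (max_level - level)      # downsampling factor at this level
    # The viewport rectangle, expressed in this level's pixel coordinates.
    lx0, ly0 = view_x / factor, view_y / factor
    lx1, ly1 = (view_x + view_w) / factor, (view_y + view_h) / factor
    return [(level, col, row)
            for row in range(int(ly0 // TILE_SIZE), math.ceil(ly1 / TILE_SIZE))
            for col in range(int(lx0 // TILE_SIZE), math.ceil(lx1 / TILE_SIZE))]
```

Only the handful of tiles covering the visible region at the matching resolution are ever fetched, which is why a 300-megapixel map costs about the same to pan and zoom as an ordinary snapshot.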
So it's a very general kind of framework for interacting with large visual objects, whether they are texts or images or computations, as this one is; it doesn't really matter, as long as they have that property. And I assume I have my email client open, which might have something to do with my performance. Now, there are a number of applications for this kind of thing, and I am not going to go this time into a lot of the particulars of different kinds of applications; many of them are quite intuitive and obvious. This one is a newspaper: we have laid out every section as a large version of the front page with smaller images of the subsequent pages, and we have also added some multi-resolution advertising in here. It's a fake ad that lets you look more and more closely, all the way down to the technical specifications of this car, which I think don't correspond to the car in the image, but in any case.

This is a little prototype that we made about two years ago of a mapping application in Seadragon. And of course this is probably the most familiar kind of example that people have seen of zooming interfaces; we have been seeing zoomable maps on the web for some time. Seadragon has an approach to doing this that's a little bit more visually fluid than the sort of thing you can normally do in the web browser, and that's largely because it's not just web browser code, it's not JavaScript or AJAX or whatever; it's real code. And as such it's able to take advantage of some of the things that your computer has that the web browser isn't traditionally able to take advantage of, like hardware graphics acceleration. This is something we really have the gaming industry to thank for, and at the moment, outside of games, the inherent graphics acceleration of modern computers is really not used all that much.

One of the interesting things about this mapping application, by the way, is not so obvious when you first look. It looks like we are zooming in and out of a physical image, but notice that when we are down here, at about 12 meters per pixel or below, things are more or less at the correct physical sizes, and yet every street remains visible as we zoom out. This isn't something that conventional mapping software does. Normally you have discrete levels of zooming, and as you zoom out, different streets start disappearing: first the small streets, then the bigger ones, and so on. Here we actually keep all streets visible at every scale, which relies on some interesting mathematical tricks, which we won't get into.

All I have been doing is rearranging this series of tiles, this series of objects, and just to further show off how this is using the graphics hardware, I am doing something a little bit frivolous now: I am putting them on the surface of a sphere, just randomly placed on the surface of a sphere. The interesting thing about this is that it shows you that arbitrary 3D placement of those objects is possible in the system, while you still have all of this multi-resolution capability. So this suggests some interesting sorts of applications that are not just regular flat presentations of content, like you generally see.
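[Editor's note: the sphere placement itself is simple geometry; here is a small illustrative sketch in Python, my own example rather than Seadragon code, of sampling uniformly random positions on a sphere. A renderer would then orient each tile outward along its surface normal.]

```python
import math
import random

def random_point_on_sphere(radius=1.0):
    """Sample a point uniformly on a sphere's surface.

    Drawing z = cos(theta) uniformly in [-1, 1], rather than drawing the
    polar angle itself, is what keeps tiles from bunching at the poles.
    """
    z = random.uniform(-1.0, 1.0)
    phi = random.uniform(0.0, 2.0 * math.pi)
    r = math.sqrt(1.0 - z * z)             # radius of the circle at height z
    return (radius * r * math.cos(phi),
            radius * r * math.sin(phi),
            radius * z)

# Scatter a few hundred tile centers, as in the demo.
tile_centers = [random_point_on_sphere(radius=10.0) for _ in range(300)]
```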
But this was something that we really didn't have any application for when we came to Microsoft; it was just a demo, this 3D aspect. And that changed within really a couple of weeks, I think, of joining Microsoft, which was a very pleasant surprise, because, you know, for a small company, getting acquired by a big company is in many ways a very nerve-wracking sort of experience. I am sure some of you have had it. And the delightful surprise for me was to find that there is a large and very active and very creative community of researchers at Microsoft, at Microsoft Research, and they have all sorts of wonderful toys and things that they have been developing for many years. Microsoft Research is now 15 years old, and especially in the field of graphics, which is where a lot of Seadragon's activity was, it's sort of like the Bell Labs of the field. It's clearly quite remarkable: at the last SIGGRAPH conference, which is the big graphics conference that happens in August, I think something like a third of the papers had a Microsoft Research co-author, which is really a staggering number.

So anyway, one of the projects, and I think one of the most exciting projects, that I saw early on at Microsoft was a collaboration between Microsoft Research and the University of Washington, which is very close by, called Photo Tourism. It was the work of Noah Snavely, a graduate student co-advised by Steve Seitz at the University of Washington and Rich Szeliski at Microsoft Research. What Photo Tourism is all about is using computer vision techniques to infer the relationships among photos and do a three-dimensional reconstruction of the spaces those photos were taken in. This is something that is very difficult to understand in the abstract, so I will pull up a web browser. This is now a live demo, which is a very dangerous thing to do live off the web from the stage, but we will try. This is something that we have had up now for some time, so you can try it at home if you have a reasonable machine, the sort your children like to play video games on, with the good kind of graphics acceleration.

This is a set of photos that one of our guys took in St. Mark's Square in Venice, and as you can see, when I put it up here in two dimensions, it has a lot of the same properties that Seadragon does with local content. We are looking at many megabytes, I think a couple of hundred megabytes at least, of images, and we are doing it all on the web; this is all remote content, of course. But the relationships among all of these photos have been inferred, figured out, from the photographs alone, and that's what lets us do this sort of thing. What you see in the background, this cloud of points, is a cloud of features that have been discovered in common among this set of photographs. Whenever some feature is discovered to match in two or more photos, you can solve for the position of that feature in three-dimensional space, by a process very similar to the one we use to do depth perception with the two images of the world that we have from our two eyes.
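[Editor's note: as a concrete sketch of this, here is a minimal two-view reconstruction in Python, using OpenCV as a stand-in. It first solves for where the second camera sits relative to the first, the problem the speaker turns to next, and then triangulates each matched feature into a 3D point. This is an illustrative example under simplified assumptions, not the Photo Tourism code, which used SIFT feature matching and large-scale bundle adjustment across many photos.]

```python
import cv2
import numpy as np

def reconstruct_pair(pts1, pts2, K):
    """From features matched across two photos (Nx2 float pixel coordinates)
    and the 3x3 camera intrinsics K, recover where camera 2 sits relative
    to camera 1, then triangulate the matches into a 3D point cloud."""
    # Step 1: solve for the relative pose. RANSAC rejects bad matches; only
    # the direction of t is recoverable from two views, not its scale.
    E, _ = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K)
    # Step 2: with both camera matrices known, each matched feature pins
    # down one 3D point (camera 1 sits at the origin by convention).
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t])
    pts4d = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
    return R, t, (pts4d[:3] / pts4d[3]).T   # homogeneous -> Nx3 points
```

Run over every overlapping pair in a photo collection and jointly refined, this is the kind of machinery that yields both the point cloud and the camera positions seen in the demo.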
Of course, the problem that Photosynth solves, or Photo Tourism I should say, is a little bit more complex, because in our own case we have two eyes, we know exactly where they are, and we don't know the three-dimensional structure of the world. In the case of Photo Tourism, we have many cameras, and we don't know anything about where they are in space, so part of the problem is to solve for where the cameras are. In fact, if I turn on cameras and do an orbit, you can see that all of these little diamonds show where the people were standing who took all of these photos, and I can click on any one of these and dive down to that camera and see what that photo was.

I will show you one other demonstration of this that I think shows off some of the Seadragon roots of this product as well. It's one that hasn't been looked at as much as that St. Mark's Square example, and it's dear to my heart, since I am very fond of this artist, Gary Faigin. We have a couple of his pieces, and his studio is very close to the loft where our company started up. Gary Faigin is a local art commentator on National Public Radio in Seattle, and an artist, a rather good one. And here he is in his studio. What we did with this shoot is that we took a bunch of photos of him in his studio, and these of course registered together into a three-dimensional version of the space. But we also added into the mix some very high resolution scans of the artwork, and the high resolution scans registered together with that space as well. So this is, I think, an 80-megapixel image, and we can dive in until we are looking at the individual stitches in the canvas. And you can see that this masking tape is painted; in fact it's not real masking tape. So we are integrating together images that are just regular digital camera photos, and for almost all of these artworks we have a high resolution scan as well. This lets you see the artwork in much more detail than you could normally, even if it were just on a regular webpage, as well as see it in context. So this is one of my favorite environments.

All right, now zooming out a little bit, as it were: one of the things that I think is most exciting about Photosynth is not just the idea that we can take staged environments like this, take many photographs ourselves in some space, and do a very high fidelity reconstruction of some environment based only on images, although that is exciting and has interesting applications in itself. It's also the collective aspect of this that I find really exciting. We are working now on a release that we think will really bring out this aspect of it, community Photosynth, and that should be forthcoming before very long. I am going to give you a small demo. I have given this one before, and it's not of our new code; in fact it's of code that's older than what we released to the web, a kind of development version. But it's one that has a data set we have shown before, so there is no difficulty in showing this one. These are photos of Notre Dame Cathedral that were culled from Flickr, and all of those orange diamonds are where all of these people stood with their cameras. The user interface at this point, when we were building it, was terrible.
You see this profusion of white squares, but it does show you exactly where all of the photographs were taken. And what's so exciting about this, of course, is that this is many people independently taking photos of Notre Dame Cathedral and tagging them that way on this photo-sharing site, and what we are able to build up from that set of photos is a three-dimensional reconstruction of the Cathedral, as well as metadata about all of those photographers that goes far beyond just GPS coordinates: not only where they were standing, but which way they were aiming, what their focal length was, and everything else. So if we click on one of these, we go down to that photo, and some of these photos are really quite detailed. There are people standing down here in the archway taking close-up shots. We have photos all the way from the tiny details in the arches, like this, to, let's see how far out we go, I think all the way across the river.

If you start to think about this kind of collective photography, you begin to realize that this is, I think, one of the first really credible approaches that I have seen, at least, to bringing the idea of the spatial or geospatial web to life: cyberspace, which is what I always thought of as cyberspace when I was reading science fiction as a kid. It's a term that I think has come to mean just the web, but that's really a kind of watering down of the original sense of the word. We originally imagined that when we grew up, we were going to be living in a world that was physical and was also virtual, where the two would have all sorts of interesting interactions with each other. One of the interesting surprises for me, when the web was invented and as we saw things evolve in the 90's and the early 2000's, was just how much mileage you can get out of this seemingly very simple idea of being able to share text documents, in effect, and to interact collectively with those text documents. We have seen Web 2.0 kinds of ideas create some rather wonderful effects with those kinds of collective documents that we can share, that we can have computers generate, and so on. But we haven't really seen the visual aspect of this, or the spatial aspect of it, reach, or even approach, those kinds of visions that a lot of us had back in the 80's. And I think this has the promise to get us part of the way there.

Of course, it's working one end of the problem. We have been seeing the other end of the problem getting worked on for some time now: the kind of top-down geospatial architectures that Virtual Earth at Microsoft and several other projects are attempting to turn into virtual worlds, using large-scale photography, satellite imagery and aerial imagery, and to some degree street-level imagery as well. Those approaches are extremely important, in that you want to be able to have some basic lattice, some structure that covers entire cities; you want to be able to understand on a computer what the structure of an entire city is. And there are not all that many cities on earth where enough photographs have yet been taken by regular people to do anything like the sort of reconstruction that you can get from the top down. So that lattice is very important. On the other hand, the top-down approach only goes so far.
It can't go inside all of the buildings; it can't scale up to the sheer volume of visual information that we find out there in the world. It's not organic; it's a trellis. And this is an approach that does promise to scale, to become organic, to go into the interiors of everything, and to really knit together public spaces and canonical views of the world with personal spaces and personal views of the world, combining the two in a very interesting way. So that's, I think, really the greatest promise of Photosynth: this combination of top-down and bottom-up together, making for a very, very interesting collective effect, a sort of Web 2.0 of images and environments.

Well now, we still have almost 10 minutes, and Monique promised me some questions, so maybe this would be a good moment to be a little bit more interactive. I could show a couple of other things, like a commercial environment that I think is very interesting; maybe that would be a good place to start.