The History of 3D User Interfaces

I’ve been reading a fair amount about 3D user interfaces (aka “Spatial Input” aka “Natural Interfaces”) lately. Recent gaming technologies such as the Nintendo Wii, Playstation Move and Microsoft Kinect have brought 3D interfaces into the limelight, sparking plenty of innovation and cool videos, but after reading these papers I’ve actually realized that researchers have been thinking about 3D interfaces for a really long time.

There’s quite a bit of research already out there about 3D interfaces: a lot of existing interaction techniques explore navigation, menu selection, and object manipulation. What especially surprised me is that the bulk of this research was done over 15 years ago, when I was barely old enough to ride a bike. The chart above shows the number of publications per year relating to 3D interaction techniques (Bowman, D., Frohlich, B., Kitamura, Y., & Stuerzlinger, W. (2006). New Directions in 3D User Interfaces; I recommend this paper and the book 3D User Interfaces for a good overview of existing work). I was very surprised to find that the research peaked in 1995 and seems to have quieted down since then.

Judging by the problems plaguing 3D interfaces such as Kinect and Wii (and their limitations), there’s clearly a lot more research to be done in the area of making 3D interfaces usable. For example, most of the papers presenting new 3D interaction techniques in the 90s performed studies that evaluated a single gesture (or small gesture set) to solve one specific task, rather than an entire gesture language for executing a collection of tasks. Perhaps now that 3D interfaces are emerging in consumer products, we will see a resurgence of research focused on developing entire gesture languages that enable a rich set of actions, are robust, and are easy to learn. I just hope future researchers realize that there’s a huge mountain of work to build upon; we really are standing on the shoulders of giants here.

Drastically Improved Version of Headshot Released!

Photo by Chloe Fan

Headshot is an application that helps you take pictures of yourself by telling you how to move your camera until your head is in the right spot. This was not an easy task. Writing the face detection was the easy part; much more challenging was getting the user interface right. How do you tell people how to move their phones? How often do you tell them? How do you teach people to use the app? The first version of Headshot was a good start and got people excited, but it still needed a lot of work.

Over the last few months I’ve been polishing Headshot, getting feedback from users and trying to develop the right audio prompts and timing to help you get that perfect shot. Here is a summary of all of the improvements in version 3 of Headshot:

  • Added multiple person mode: Now you can get the perfect shot with two or more people! In multiple person mode, Headshot gives instructions based on the average location of all faces in the image.
  • Auto snapshot when perfect: Physically snapping a photo can cause blurry images. Now Headshot will automatically take a photo when your head is in the right place, so all you need to do is smile!
  • Greatly improved audio feedback: Headshot now has a bit more personality: it tells you when it can’t see you, and immediately tells you once your head is in the right place.
  • Fixed bug that caused the app to crash on international phones: This bug made many international users unhappy. It should be fixed now!
  • New logo! Thanks to my friend Chloe Fan for making the avatar used in this logo. Update: The new logo hasn’t propagated to the Marketplace yet, but it should be up soon.
Excited? Then please update your app, or download Headshot by following the link below. If you like it, please review the app, and send all feedback to


The Story of Headshot

I’m releasing an improved version of Headshot today (see this post), and in light of this I wanted to tell everybody how Headshot came to be and what I’ve learned in developing the app.

I came up with the idea for Headshot somewhere in between my cubicle and the microkitchen at Microsoft Research. The idea was to write an application that helped people take pictures of themselves by using face detection to help them adjust their camera until their head was in the right place. I half expected the idea to work, and thought it was novel enough to get me into the finals of a Microsoft intern mobile app contest that summer. Several of my closest friends told me the app was completely useless because of front-facing cameras, but I continued on, spurred by faith and a desire to complete what I had started. The end result was a decent app and accompanying video (below) which, surprisingly enough, won me a brief glimpse of the limelight and a trip to Hawaii.

My contest win and a Computational Photography class spurred me to further develop the app from a prototype into a product. I released Headshot (a paid version and an ad-supported free version) on the marketplace in early 2012.

The app tanked. Twice. It got decent reviews in U.S. markets but was doing very poorly internationally, and nobody was buying a full version. It makes about 9 cents a day right now. Not something I’d consider a success given the novelty of the idea. Why is it doing so badly?

Is it because people don’t find it useful? I don’t think so. My friend told me that a necessary (but not sufficient) condition for having a successful app is how much time you spend perfecting it, getting the polish just right. That’s what I’ve tried to do this third time around: I’ve spent quite a lot of time perfecting the app, getting feedback from lots of different users about what works and what doesn’t, testing the app on different devices, and trying my best to release a perfect product.

I don’t know if this story has a happy ending; we will see how well the app does. I do know that I created something I’m proud of, and that I’ve learned many lessons from the different phases of the app. Here they are, in chronological order.

  1. Finish what you start. I never would have won the contest if I had listened to my friends who told me I had a bad idea.
  2. Sometimes offshoots of your project are more rewarding than the project itself. I’m very glad I wrote the face detection library to go with this project; in some ways I’m more proud of it than of Headshot itself.
  3. Always make a free version of your app.
  4. It is essential to give your app to many people (at least ten) and see how they use it. Have them use all versions (free, paid, trial). Also, make sure you actually listen to their suggestions.
  5. Spend 3 times more time on something than you think you should. This is at least how long it takes to get an app polished. Though I admit I’ve yet to see whether this actually makes a difference.

That’s my story. I hope you enjoyed it. If you could download and review Headshot, I’d really appreciate it:

Face Detection for Windows Phone Gets a Face Lift

It’s been almost four months since I released my Windows Phone face detection library, and I’m surprised by two things:

  1. Microsoft still hasn’t released a face detection library for Windows Phone.
  2. People actually downloaded my library! I’ve gotten almost 270 downloads in 4 months which isn’t stellar but it’s better than I expected.

I’ve released a new version of my face detection library. There are two big changes:

  1. Major fix so that the library works in other countries. I had never had to worry about parsing strings in different languages, but apparently ignoring this caused my library to not work at all in other countries (this also broke Headshot entirely in other countries, which is too bad and explains my poor ratings there).
  2. User control for managing camera. I find interfacing with the camera annoying, so I created a user control that you can use to show you a preview of what the camera sees and also lets you programmatically take photos and save them to the camera roll. It’s a useful little gem in the library.
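To illustrate the class of bug behind fix (1): a parser that assumes `.` as the decimal separator silently breaks on strings written by a locale that uses `,` instead. Here is a minimal Python sketch of the idea (my own illustration, not the library's actual C# code):

```python
def parse_decimal(s):
    """Parse a simple decimal string regardless of which locale formatted it.

    Locales such as German write 0.25 as "0,25"; a parser that assumes the
    invariant '.' separator fails on such input. Normalizing the separator
    before parsing sidesteps the problem for plain numeric strings.
    """
    return float(s.replace(",", "."))

parse_decimal("0.25")  # invariant-culture string
parse_decimal("0,25")  # comma-decimal locale string, same value
```

Note that this naive swap would still break on thousands separators like "1,234.5"; the robust fix on .NET is to always format and parse with an invariant culture rather than the device's current one.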

I was really surprised by how long it took me to debug the localization problem. I was even more surprised by how I solved it. I posted a request on Twitter for help with my problem, not at all expecting to get a reply, and lo and behold, I got one! I was pleasantly surprised by the competence of my social network. I guess it helps having both Russian hackers like Vadim, Alexei, and Misha, and amazing researchers like Andy, Ken, and Scott in my social circles. Being able to ask these people anything makes me so much smarter; I always forget how useful just asking people for help is.

So, moral of the story: don’t be afraid to ask questions!

In other news, my blog name is actually relevant now that I do actually climb. Beta is super helpful. I was able to get a V4 in just a day, one that I think would have taken me weeks, because somebody showed me how to do it. Like in climbing, so in coding, I guess.


So, What Am I Doing at CMU?

This post is for all my friends and acquaintances who might be wondering what on earth I’m doing at Carnegie Mellon for my PhD. The truth is, I’m doing a variety of things here (I always have a variety of side projects to keep me motivated, and you usually hear about them in some form or another); however, my main research is about probabilistic input. DISCLAIMER: This is NOT a thesis proposal, and there is a chance I might do something entirely different for my PhD thesis; however, right now probabilistic input seems like the most likely candidate.

what is probabilistic input?

In a nutshell, the goal of my work is to make user interfaces account for more information when deciding what it is you’re trying to do. I am designing, building, and evaluating a new method for modeling and dispatching input that treats user input as uncertain. Modern input systems always assume input is certain, that is, that it occurred exactly as the sensors saw it. When a developer handles an event (say, a mouse event), that event has one x and one y coordinate. This works well for keyboards and mice, but less well for touch, and even more poorly for free-space interactions such as those enabled by the Kinect. After all, your finger is not a point! The stuff I’m working on will allow our input systems to be far more intelligent about interpreting user actions, especially for new input techniques such as touch, voice, and free-space interactions enabled by the Kinect.

In addition to enabling computers to better understand users, I’m interested in evaluating how we can use this probabilistic approach to design feedback that allows users to better understand how computers are interpreting their actions. For example, what’s the best way for a computer to tell you that it is not sure whether you’re doing a horizontal swipe or a panning gesture for the Kinect? If you think about it, a lot of the interactions you do can be interpreted in multiple ways. The challenge of how to communicate this to users, so that they understand their input is ambiguous without becoming confused or working too hard, is a problem I’m trying to solve. Finally, I would like to evaluate how easily developers can adopt this probabilistic approach into real applications, as the ultimate goal of this work is to eventually be adopted into all input handling systems.

what have I done so far?

Most of my work so far has been in designing (and validating through implementation) an architecture for actually dispatching uncertain input. In other words, assume that mouse events are no longer at a single location, but instead carry a probability distribution over possible locations. I designed a system that figures out which buttons these new mouse events should go to. This system was published at UIST 2010; you can see the paper here. I then published a refinement of this system (with a few extra bits) that made it much easier for developers to write user controls (buttons, sliders, etc.) for my system. This was published at UIST 2011; you can see the paper here.
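To give a flavor of the idea, here is a toy sketch under my own assumptions (not the architecture from the papers): if a touch is modeled as a Gaussian distribution over screen locations rather than a single point, a dispatcher can estimate how likely each on-screen target is to be the intended one, instead of committing to whichever target the center pixel happens to land on.

```python
import random

def hit_probability(rect, mean, sigma, n=10000, seed=0):
    """Estimate P(touch lands inside rect) by sampling the touch distribution.

    rect is (x, y, width, height); the touch is modeled as an isotropic
    Gaussian with the given mean location and standard deviation sigma.
    """
    rng = random.Random(seed)
    x0, y0, w, h = rect
    hits = 0
    for _ in range(n):
        x = rng.gauss(mean[0], sigma)
        y = rng.gauss(mean[1], sigma)
        if x0 <= x <= x0 + w and y0 <= y <= y0 + h:
            hits += 1
    return hits / n

# Two adjacent buttons, and a fat-finger touch centered near their boundary.
buttons = {"Save": (100, 100, 80, 30), "Delete": (185, 100, 80, 30)}
probs = {name: hit_probability(r, mean=(183, 115), sigma=8)
         for name, r in buttons.items()}
# Both interpretations stay alive with substantial probability, so the
# system can defer or ask, rather than silently picking one.
```

The point of the sketch is the output: a touch near the boundary yields meaningful probability mass on *both* buttons, which is exactly the information a certain-input system throws away.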

what is left?

Right now I’m working on designing better feedback techniques for when input is uncertain. After that, I’m going to try to tackle mediation. What’s mediation? It’s basically what should happen when you do something (like a gesture) and the computer can’t decide between two interpretations. So, it asks you what you wanted to do. If it just asked you and had you pick from a list, that would feel unnatural (because it’s a break in your workflow). So I’m trying to see if there are better ways to mediate between alternative actions. The last piece of my thesis is perhaps the most important and the most difficult: evaluating my work on real developers. This is still an unsolved and mostly unexplored area for me, though I know I should be working on it.
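A hypothetical sketch of the simplest possible mediation policy (the names and the margin threshold are my own illustration, not a technique from my papers): commit to the most likely interpretation only when it clearly dominates, and otherwise signal that the alternatives need to be disambiguated.

```python
def mediate(interpretations, margin=0.2):
    """Decide whether the system can commit to one interpretation.

    interpretations: dict mapping an action name to its probability.
    Returns the winning action if its lead over the runner-up is at
    least `margin`; returns None when the top two are too close to
    call, signaling that mediation (e.g. asking the user) is needed.
    """
    ranked = sorted(interpretations.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) == 1 or ranked[0][1] - ranked[1][1] >= margin:
        return ranked[0][0]
    return None

mediate({"horizontal swipe": 0.9, "pan": 0.1})    # clear winner: commit
mediate({"horizontal swipe": 0.55, "pan": 0.45})  # too close: mediate
```

The interesting research question is what happens in the `None` case: a modal "which did you mean?" list is the obvious answer, but it breaks the user's flow, which is exactly why I'm looking for subtler mediation techniques.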

what is the best possible outcome for my thesis?

I would be thrilled if at some point in my life I saw mainstream input systems such as those in Microsoft and Apple products turning probabilistic, and if those systems used some of the ideas outlined in previous papers, or papers to come. Given the popularity of natural user interfaces, I think this is a very real possibility, which is quite exciting.

Some Interesting Image Blending Results

Here are some interesting images I generated for my Computational Photography class this semester. The images are generated using Poisson blending and mixed-gradient blending. The idea is to copy in the change in pixel values (the gradients), not the pixels themselves. You can learn more about the project here (here is my project submission with more detail).
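The "copy the change, not the pixels" idea can be shown in one dimension. This is a toy sketch under my own simplifications (not the course code, which works on 2D images): solve for pixel values whose gradients match the source's, while pinning the target's values just outside the pasted region as boundary conditions.

```python
import numpy as np

def poisson_blend_1d(target, source, a, b):
    """Blend source into target over indices [a, b) by matching gradients.

    Sets up one least-squares equation v[i+1] - v[i] = g[i] per gradient of
    the source, with target values fixed at the region boundary (Dirichlet
    conditions), and solves for the interior values v[a..b-1].
    """
    t = np.asarray(target, dtype=float)
    s = np.asarray(source, dtype=float)
    n = b - a                    # number of unknown pixels
    g = np.diff(s)               # desired gradients, taken from the source
    A = np.zeros((n + 1, n))
    rhs = np.zeros(n + 1)
    for k, i in enumerate(range(a - 1, b)):
        # Constraint: v[i+1] - v[i] = g[i]
        rhs[k] = g[i]
        if i >= a:
            A[k, i - a] = -1.0
        else:
            rhs[k] += t[i]       # v[a-1] is known from the target
        if i + 1 < b:
            A[k, i + 1 - a] = 1.0
        else:
            rhs[k] -= t[i + 1]   # v[b] is known from the target
    v, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    out = t.copy()
    out[a:b] = v
    return out

# Because only gradients are copied, a source that is a constant offset of
# the target blends back invisibly: the +100 brightness difference vanishes.
target = np.arange(6.0)
blended = poisson_blend_1d(target, target + 100.0, 2, 4)
```

Mixed-gradient blending is the same solve, except that at each pixel it keeps whichever gradient (source's or target's) has the larger magnitude, which preserves strong target texture such as the wall and water in the images below.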

Elephant in New York City.

Name on a Wall.

Walking on Water.