Remixing voice recognition and face detection with white rabbits (Intuity)
Language-based interfaces are hip right now. Both the iPhone and Android OS have recently been equipped with the ability to listen to what you say. So the technology seems to have reached a level where it’s ready to hit the consumer, making it easier to communicate with certain devices and services. But the technology alone won’t do the job. It needs an appropriate interface; otherwise we won’t even realise that a specific device is able to understand our commands.
For instance, the Nokia N Series (N73, N80, N95 etc.) has included “voice recognition” for several years, but only very few people know about that particular feature (or even use it). So we asked ourselves some questions: What are the drawbacks of current voice interfaces? How can we bring a human touch to language-based technology? And how do we prototype these kinds of interactions in order to come up with new solutions?
How can we augment language-based interfaces?
You talkin’ to me? Using a visual recipient for voice commands
One problem we identified is the “recipient problem”: Who am I talking to? In which direction do I need to talk? Talking to a screen feels weird to us. The solution we came up with is a metaphor: in our case, a white rabbit.
Here’s looking at you, kid! Activating devices by looking at them
“The system should always keep users informed about what is going on, through appropriate feedback within reasonable time.” (Jakob Nielsen, 10 Usability Heuristics)
Another problem we identified: feedback! Is the device actually listening to me right now? Will the device listen to my commands all the time? Is it active? So we came up with two modes (quiet and listening) and a simple way of switching between them: using your eyes.
- Quiet mode: The device is “sleeping”. Nobody is looking at it, so it won’t accept voice commands.
- Listening mode: Somebody is looking at it. It has that person’s attention, so it will actually listen to voice commands.
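The two modes above boil down to a tiny state machine. Here is a Python sketch of that logic; the class and method names are our invention for illustration, not the actual Processing code:

```python
# Sketch of the two-mode logic: a face-detection event flips the
# device into listening mode, losing the face flips it back to quiet.

class VoiceDevice:
    QUIET = "quiet"
    LISTENING = "listening"

    def __init__(self):
        self.mode = self.QUIET  # start asleep: nobody is looking yet

    def update(self, face_detected):
        # Eye contact is the switch between the two modes.
        self.mode = self.LISTENING if face_detected else self.QUIET

    def handle_command(self, command):
        # Voice commands are only accepted while somebody is looking.
        if self.mode == self.LISTENING:
            return "executing: " + command
        return None  # quiet mode: the utterance is ignored
```

In a prototype like ours, `update` would be fed by the face detector on every camera frame, and `handle_command` by the speech recognizer.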
How can we prototype language-based interactions?
Our setup: a microphone, a webcam, MacSpeech Dictate, a Nabaztag and a bit of hacking mindset. The quick solution for our rough prototype: use a rabbit that knows your face! We “augmented” our Nabaztag bunny with a webcam and used a face detection algorithm in Processing to find out when somebody is looking straight into the eyes of our little bunny.
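The “looking straight into the eyes” check comes down to simple geometry on the bounding box a face detector returns. A Python sketch of that idea, with an invented function name and purely illustrative thresholds (our actual check lived in the Processing sketch):

```python
def is_looking_at_camera(face, frame_w, frame_h,
                         center_tol=0.2, min_size=0.15):
    """face = (x, y, w, h) bounding box from any face detector.

    Heuristic: the person counts as looking at the device when the
    face is roughly centered horizontally in the frame and large
    enough (i.e. close to the camera). Thresholds are guesses.
    """
    x, y, w, h = face
    face_cx = x + w / 2.0
    centered = abs(face_cx - frame_w / 2.0) <= center_tol * frame_w
    close_enough = w >= min_size * frame_w
    return centered and close_enough
```

A frontal-face detector only fires on faces turned toward the camera anyway, so combined with this centering check it gives a rough but workable “eye contact” signal.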
We captured a video in order to document the setup and two simple example implementations for voice commands.
What did we learn?
- Constant feedback. Once the conversation has started, constant feedback from the machine is needed: “listening”, “understanding”, “misunderstanding”, “command executed” etc.
- Language is a fascinating additional layer for interaction design, but it’s certainly not the ultimate tool for interacting with devices. In a lot of contexts, language might not be the appropriate weapon of choice.
- On the other hand, some things can be expressed in language very easily (see our example in the video).
- The technology is still not as smart as the computers in Star Trek, damn. But there’s definitely a big space for innovation here.
- Do we need communication patterns and standards?
- Is there a need for a standard vocabulary for interacting with devices? For example, some kind of standard “ping”, “help” or “hey, I’m talking to you” command that we could use to wake up specific devices around us?
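Such a shared vocabulary could itself be prototyped as a tiny dispatcher: every device understands the same small set of “meta” commands plus its own device-specific ones. A Python sketch with entirely hypothetical command names:

```python
# A minimal shared wake vocabulary that all devices would understand;
# the concrete commands here are invented for illustration.
STANDARD_COMMANDS = {"ping", "help", "wake up"}

def dispatch(utterance, device_commands):
    """Route an utterance: standard vocabulary first, then the
    device-specific commands, otherwise report it as unknown."""
    text = utterance.strip().lower()
    if text in STANDARD_COMMANDS:
        return ("standard", text)
    if text in device_commands:
        return ("device", text)
    return ("unknown", text)
```

The interesting design question is exactly the one raised above: which commands belong in `STANDARD_COMMANDS`, and who would standardise them across manufacturers.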
All these findings and questions are certainly worth exploring. We’ll keep our ears to the ground and watch for signals and innovations that support these kinds of interactions.