I'm Aza Raskin @azaaza. I make shiny things. I simplify.

I'm the Creative Lead for Firefox.

 

Solving the “It” Problem

One of the neater features of Ubiquity is its ability to understand pronouns in a natural and contextual way. For example, you can write the command

email That's brilliant! to him

and if you’ve got the name “Jono” selected, Ubiquity understands that “him” means Jono, and knows to look up his email address. Thus we get “email That’s brilliant! to jono@mozilla.com”. Although this makes for a great bit of magic, and is wonderfully useful, Ubiquity isn’t perfect in guessing what to interpolate. Say you’ve got “Lolz” selected and then use the command:

twitter I've seen links to this a lot

The question is: do you mean to twitter “I’ve seen links to this a lot” or “I’ve seen links to Lolz a lot”? Both make sense, and it is unclear which one is better. The method that Jono DiCarlo pioneered was to embrace the ambiguity: we can make a smart guess about what the user means, but always give them the final choice in disambiguation.

The problem we’ve seen is that as we add more pronouns, the number of possibilities for interpolation goes up exponentially. For two pronouns, there are 4 disambiguation choices; for three pronouns, there are 8 disambiguation choices; etc. And for every extra command that matches, you double the number of disambiguation choices. It’s not a scalable solution.
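To make the doubling concrete, here is a minimal Python sketch that enumerates every reading of a command. It is not Ubiquity's actual parser: the function name `interpretations`, the pronoun set, and the simplification that every pronoun maps to the same single selection are all assumptions made for illustration.

```python
from itertools import product


def interpretations(words, pronouns, selection):
    """Return every reading of a command (hypothetical helper).

    Each pronoun occurrence is either kept as literal text or replaced
    by the current selection, so n pronouns yield 2 ** n readings.
    """
    # Non-pronouns have one option; pronouns have two (literal, substituted).
    slots = [(w, selection) if w.lower() in pronouns else (w,) for w in words]
    return [" ".join(choice) for choice in product(*slots)]


readings = interpretations(
    "email this to him about it".split(),
    pronouns={"this", "him", "it"},
    selection="Lolz",
)
print(len(readings))  # three pronouns, so 2 ** 3 readings
```

Every additional pronoun adds another two-way slot, which is why a flat list of choices stops being usable so quickly.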

Also bad is that often what Ubiquity thinks it should be interpolating is not your locus of attention, so as we use Ubiquity we often make mode errors. And we see this happening fairly often in the wild, where tweets have random words from an errant selection where the word “it” should have been.

Bring the Fight To Them

There are a couple of potential solutions to the pronoun substitution problem, but many of them don’t fly. For instance, we don’t want any special markup or computery quotation marks for indicating the to-be-interpolated pronouns. It should feel natural.

The solution I like best so far flattens the decision tree by having you choose as you type. It takes inspiration from cell-phone autocomplete/auto-correct interfaces.

As you type a magic word, you can click or up-arrow to select the substitution. If you ignore it and keep typing, you get the plain text you typed. Think of it as on-the-fly text augmentation and structuring: we can greatly reduce the complexity of natural language processing by stealing a cycle from the user in a humane way. This scales nicely to other magic words. For instance, “url” can be replaced with the current page’s URL.
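Here is a rough Python sketch of that choose-as-you-type flow, under stated assumptions. The names `magic_words`, `suggest`, and `build_command`, the `context` dictionary keys, and the idea of modelling input as (word, accepted) pairs are all hypothetical; none of this is Ubiquity's real API.

```python
def magic_words(context):
    """Map each magic word to its proposed substitution (assumed word list)."""
    return {
        "this": context.get("selection"),
        "it": context.get("selection"),
        "url": context.get("page_url"),
    }


def suggest(word, context):
    """Return the substitution to offer for the word just typed, or None."""
    return magic_words(context).get(word.lower())


def build_command(typed, context):
    """Assemble the final command from (word, accepted) pairs.

    `accepted` is True when the user clicked or up-arrowed the suggestion,
    and False when they ignored it and kept typing.
    """
    out = []
    for word, accepted in typed:
        replacement = suggest(word, context) if accepted else None
        # Ignored or unavailable suggestions fall back to the plain text.
        out.append(replacement if replacement is not None else word)
    return " ".join(out)


context = {"selection": "Lolz", "page_url": "http://example.com/"}
typed = [("twitter", False), ("links", False), ("to", False), ("this", True)]
print(build_command(typed, context))
```

Because each decision is made at the moment the word is typed, the result is a single command string rather than a list of 2^n candidates to pick from afterwards.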

As we focus on uplifting Ubiquity into Firefox, smart pronoun substitution may become especially important.

Question: What other solutions are there? We’re interested in radically different ideas to explore the solution space before we commit to one.


It’s not great from a natural language processing perspective, but you could resort to special keywords like “thisselection” instead of common English words like “this”.

Maybe use Tab completion instead of Up arrow?

If you type “this” and then press Tab, it replaces “this” with your selection? And if that’s not what you intended, you can backspace to revert back to “this”?

(Backspace to undo Tab completion should work for verbs and everything else, too, by the way.)



ChrisJF

One thing to discuss is what the user most often means by ‘this’. Knowing the stats (i.e. 65% of the time the user uses ‘this’ to mean what they selected) would help.

In my experience, I only use the ‘this’ keyword in commands that have multiple arguments such as the translate command. But for single word commands like google I don’t use the ‘this’ keyword at all because Ubiquity automatically starts googling for the selected text when no parameters are specified. We could interpret the ‘this’ keyword differently for the various commands but this might be confusing for users.

Maybe we should interpret the ‘this’ keyword to mean the selected text all the time. And then we could auto-replace ‘this’ with the selected text, akin to the iPhone’s auto-correct when mistyping. However, the iPhone fails when the user wants to undo the suggestion, because the user has to manually delete the replacement. Preferably, if the user does not want the auto-correction, they could simply click an undo-badge (similar to mouse-based Ubiquity) beside the word.

To me, this way is a lot less intrusive than the solution outlined in the post. This discussion is really interesting. Thanks Aza for letting us participate!


“you can click or _up-arrow_ to select the substitution”
up-arrow, really? Why not use Tab, just like we auto-complete verbs? Up and down arrows are used to select a command so far…
Moreover, if we ever merge Ubiquity with the awesome bar, it won’t be possible to have this suggestion above “this”.

I’m also thinking about what will happen if the user selects a loooong text that he wants, for example, to put in a mail. When replacing “this”, the selected text should be shortened and maybe put in italics, like: “mail //A paragraph of a Wikipedia article reduced…// to aza”
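A small Python sketch of that shortening step, purely illustrative: the function name `shorten`, the 40-character limit, and cutting at a word boundary are assumptions, and the `//` markers simply mirror the italic notation used in the example above.

```python
def shorten(selection, limit=40):
    """Collapse whitespace and cut a long selection for display (hypothetical helper)."""
    text = " ".join(selection.split())
    if len(text) <= limit:
        return text
    # Cut at the last space inside the limit so a word is not split in half.
    cut = text[:limit].rsplit(" ", 1)[0]
    return cut + "…"


selection = "A paragraph of a Wikipedia article that goes on for quite a while"
print(f"mail //{shorten(selection)}// to aza")
```

Only the displayed text is shortened here; the command itself would still need to carry the full selection.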



Sander D

The autocomplete-like solution might not work well with speech. It seems more natural to say “twitter I’ve seen links to this a lot” and then choose the correct option from a list, than to say “twitter I’ve seen links to this”, press a key and say “a lot”.

How about always telling the user what action has been performed with a big transparent message, and allowing for an ‘undo’ command while delaying the action for at least one minute?



sep332

As I was imagining a UI for my pie-in-the-sky, ultra-futuristic computer system (back in grade school), this is exactly the disambiguation system I had in mind. Of course the dialog was spoken instead of typed, and the interface was as big as a wall, but the “mechanic” was the same.

For a simple disambiguation, the computer would show all options, with the most likely one selected by default. As the user continued speaking, a small nod would indicate that the default was correct, or a vague wave of the hand could select one of the other options. This maximizes flow. For a case where the computer couldn’t figure it out, it would highlight the confusing word and the user would either provide more data or a more specific pronoun to help the computer along.

I think one improvement to your current idea is to use the best guess automatically, if it meets some standard of “probably correct” in the computer’s opinion, offer the disambiguation if the probability is not quite as high, and (politely) demand explicit disambiguation if the probability is not very high.
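That three-tier rule could be sketched in Python roughly as follows. The threshold values and the names `AUTO_THRESHOLD`, `OFFER_THRESHOLD`, and `disambiguation_mode` are assumptions; how the probability itself would be estimated is left open.

```python
AUTO_THRESHOLD = 0.9   # assumed value: at or above this, substitute without asking
OFFER_THRESHOLD = 0.5  # assumed value: at or above this, guess but offer alternatives


def disambiguation_mode(probability):
    """Pick how intrusive the disambiguation should be for one guess."""
    if probability >= AUTO_THRESHOLD:
        return "auto"   # use the best guess automatically
    if probability >= OFFER_THRESHOLD:
        return "offer"  # use the guess, but show the other options
    return "ask"        # politely demand an explicit choice


for p in (0.95, 0.7, 0.2):
    print(p, disambiguation_mode(p))
```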


(140 Twitter chars wasn’t enough for this!)

1. Assume all “magic pronouns” are always magic and color them differently in the console. (or maybe use “best guess”, but I’m leaning toward consistency)

2. Position the cursor (keyboard or mouse) over the magic word, and get an option like what happens to link-text in Gmail, offering you the ability to remove or disambiguate it (using either mouse or keyboard of course). Perhaps like auto-correct in Word, you could even control-z undo the magic blessing.


Disclaimer: up until I read this post, I had no idea that the magic pronouns even existed in Ubiquity, so this is a noob perspective.

I think the solution you have is an elegant one, with one caveat. The pronouns themselves can mean way too many different things in my mind (regardless of what limitations you impose on the meaning in Ubiquity). I feel like that’s the reason I’ve never used them. (hopefully, I need to go back and check my tweets now to see if something got unintentionally replaced!)

I think about the things that ‘this’ could refer to and what comes to mind are: ‘this page’ (that I’m currently on, I may be interested in the URL or the entire contents of the page), and ‘this selection’ (which may be text or an image, and may or may not be hyperlinked).

I feel like a two-word solution feels pretty natural.

“Download these links”
“edit this image”
“Bookmark this page”
“email this page to him”

But that’s two words, and probably has its own drawbacks. Another solution would be to say pronouns are too vague and have magic nouns instead.

“Download links”
“edit image”
“Bookmark url”
“email url to him”

Granted, that language sounds a little stunted, but the meaning is clear from each phrase. “This” would be understood, or could possibly be typed and ignored.

But as for the phone-like predictive text UI you mocked up, it does seem like it would be a pretty natural-feeling interaction, as I sit here and mime it out on my keyboard a few times.



Sushu

The auto-complete is already part of it, no? Like, the preview section tells me that “this” means whatever I selected. In any case, I’d like to have it integrated with the existing preview pane, instead of having one down at the bottom and one right by where I’m typing.

Also, I was frustrated by the map command and I was told to take it up with you guys. ;) (Basically, I wanted to get directions w/ map, but since there was no directions choice on the preview pane, I tried to go to the Gmaps site from the preview pane, but clicking on the Google logo didn’t work, and I didn’t know I had to hit enter b/c all the rest of the map stuff was clicking. The randomness of click-based commands and type-y commands is sometimes disorienting! :(



alex.r.

A suggestion:

You let the user choose as they type, but the default changes as you collect the responses.

It could be done based on the probability of choice given context p(c|magic word, preceding words).

An HMM could be used to estimate that.

Obviously this would only help when users repeat the same actions. I don’t know if it’s generally the case.
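As a rough illustration, here is a Python sketch that estimates p(c | magic word, preceding words) from plain frequency counts. Note that this is a much simpler stand-in, not an HMM, and the class name `ChoiceModel`, its methods, and the choice labels are all hypothetical.

```python
from collections import Counter, defaultdict


class ChoiceModel:
    """Count-based estimate of p(choice | magic word, preceding words)."""

    def __init__(self):
        # (magic_word, preceding) -> Counter of the choices users made
        self.counts = defaultdict(Counter)

    def record(self, magic_word, preceding, choice):
        """Store one observed user decision."""
        self.counts[(magic_word, preceding)][choice] += 1

    def probability(self, magic_word, preceding, choice):
        """Relative frequency of `choice` in this context, or None if unseen."""
        seen = self.counts.get((magic_word, preceding))
        if not seen:
            return None
        return seen[choice] / sum(seen.values())

    def default(self, magic_word, preceding):
        """Most frequent past choice in this context, or None if unseen."""
        seen = self.counts.get((magic_word, preceding))
        if not seen:
            return None
        return seen.most_common(1)[0][0]


model = ChoiceModel()
for choice in ("substitute", "substitute", "substitute", "literal"):
    model.record("this", ("twitter",), choice)
print(model.default("this", ("twitter",)))
```

With sparse data the counts say very little, which matches the caveat above: this only helps when users repeat the same actions.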


I agree with Robby. Having more specific nouns makes more sense. As I understood it, Ubiquity was to feel natural and flowing. Taking the time to take your fingers away from the typing position isn’t very natural, nor would it save time. While having to type the whole thing over again might be slightly annoying or slower, I think it would flow well and would give many more meanings to each general keyword.

As said above, this could relate to url, selection, image, etc.



David

Whatever you do, don’t make people take their hands off of home row to make Ubiquity selections. That defeats the whole purpose. (So, obviously, clicking or up-arrow-ing the ‘this’ wouldn’t cut it. Clicking should be an option, of course, but the keyboard equivalent should leave your hands be.)


When it’s possible to press a key to activate some functionality, the interface should show an image of the key. The more the user uses the key, the fainter the image gets, until it doesn’t appear at all.
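One way to sketch that fading hint in Python, assuming a simple linear fade: the function name `hint_opacity` and the `fade_after` count of ten uses are made up for illustration.

```python
def hint_opacity(times_used, fade_after=10):
    """Opacity of the key hint, from 1.0 (fully shown) down to 0.0 (hidden)."""
    remaining = 1.0 - times_used / fade_after
    # Never go below zero once the user has passed the fade-out count.
    return max(0.0, remaining)


for uses in (0, 5, 10, 25):
    print(uses, hint_opacity(uses))
```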


Looking at this mockup: http://www.flickr.com/photos/azaraskin/3272673947/

That makes the modifiers discoverable, but having to press the right arrow key to see the suggestions is not discoverable. Why not show an image of the right arrow key above the first modifier? Then the more the user uses the right arrow key, the fainter it gets, until it doesn’t appear at all.

Could the right arrow key be pressed again to select the “from” modifier instead of “to”, swapping them over? What happens if there are three or more modifiers? How do you know which will be brought across next? What if you select past the modifier you want: can you select the previous one? I’m assuming the left-arrow key would just move the cursor, so maybe ctrl+z, or would they cycle so you have to go full circle?

How about this: if the cursor is at the far right but not after whitespace, pressing the right arrow selects the first modifier that starts with the preceding word fragment. So if, in your example, the user typed “f” and hit the right arrow, it would bring “from” across. As the user types, if the start of a modifier matches the word fragment before the cursor, that word fragment could be highlighted in bold so the user knows which modifier pressing the right arrow will select.
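The prefix-matching rule described above might look like this in Python. The function name `matching_modifier` and the example modifier list are assumptions; the cursor is assumed to sit at the end of the text.

```python
def matching_modifier(text, modifiers):
    """Return the first modifier that the trailing word fragment starts, or None."""
    if not text or text[-1].isspace():
        return None  # cursor is after whitespace, so there is no fragment
    fragment = text.split()[-1].lower()
    for modifier in modifiers:
        if modifier.startswith(fragment):
            return modifier
    return None


modifiers = ["to", "from"]
print(matching_modifier("translate hello f", modifiers))
```

The same function can drive both behaviours: a non-None result means the fragment should be bolded, and pressing the right arrow would bring that modifier across.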


Why not simply use quotation marks if there is ambiguity? That’s a fairly common method of solving this problem. So, highlight Ubiquity’s keywords in a different way to show they’re ‘special’, or do best guesses if possible, but ultimately let the user override it themselves. It’s something that is easily learned.

email “I think this website alot” to john

That’s quite a simple solution.
Also, natural language interpretation is always going to be difficult, so maybe you need to set a grammar for Ubiquity, that keywords aren’t allowed in the middle of random strings. You can add rules to the language to be used. If someone tries one form and it doesn’t work, then they’re smart enough to rephrase it.


Here’s an idea: embrace the txt speak. It’s like the “thisselection” idea but much less clunky. Make it so the vowelless form of the word triggers the substitution. So “ths” is “this”, substituted; “t” is “it”; “hm” is “him”; and so on. I think people would get the hang of this fairly quickly, and the rule for converting a pronoun to the substitutable form is simple.
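A quick Python sketch of that conversion rule, for illustration only: the names `trigger_for`, `TRIGGERS`, and `expand` are hypothetical, and the pronoun list is limited to the three words mentioned above.

```python
VOWELS = set("aeiou")


def trigger_for(pronoun):
    """Drop the vowels from a pronoun to get its substitutable form."""
    return "".join(ch for ch in pronoun if ch.lower() not in VOWELS)


PRONOUNS = ["this", "it", "him"]
TRIGGERS = {trigger_for(p): p for p in PRONOUNS}


def expand(word, selection):
    """Replace a vowelless trigger with the selection; leave other words alone."""
    return selection if word in TRIGGERS else word


command = "email ths to hm"
print(" ".join(expand(w, "Lolz") for w in command.split()))
```

In this sketch the ordinary spellings “this”, “it”, and “him” are never substituted, so the literal reading stays available without any disambiguation step.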



Drew F

I don’t really have anything new to offer (although I am trying to think of something). I have opinions about the already mentioned concepts.
I am against the quotation system. I agree with Aza it doesn’t feel natural. I think a main point of the natural language command architecture is to impose far less semantics.

Having specific keywords has both positive and negative effects. The positive is that “this” seems like a more frequently used word than “image” or “url”, so errors are less likely to occur. But with more keywords, if problems do occur, conflicts can now arise from each of those keywords.

I like Gerv’s idea of “ths” and “hm”, but I lean more toward “thiss” and “himm”. I have gotten used to this on the iPhone, and although not very discoverable, it is very memorable (as I use it for getting “we’ll” instead of “well”). This seems like a solution for a more technically savvy user, though, as I also use it in things like TextExpander. Although I feel that this should be the demographic Ubiquity targets, if these features will eventually be added to the awesome bar then I feel that Aza’s recommendation fits a more general audience.

