Scaling Ubiquity to 60+ Languages: We Need Your Help
The year is 1953. Robert Floyd, the man most cited in Donald Knuth’s The Art of Computer Programming, takes the stage. His topic is using English as a programming language:
“Each word must be implemented by a procedure which somehow contains its meaning and consistently interlocks with other procedures for other words. I can’t think of any task, intellectual or not, which has ever been carried out which approaches this magnitude.”
Over half a century later, those words ring prophetically true. Instructing a computer to do what you want with natural language is astronomically hard. That’s why Ubiquity cheats left and right to do it. It embraces ambiguity, uses tight feedback loops, and relies on a restricted vocabulary and grammar to give the appearance of something human. And it can do a lot better.
As we think about uplifting Ubiquity into Firefox, we aren’t just bound to English. We need an interface that can potentially work in the 60+ languages that Firefox has been localized to. For that we need your help.
I’m pleased to announce that Mitcho Erlewine, a linguist-coder, will be leading the charge in helping us understand how to bring conversational computing to the Firefox scale. His first blog post is on How natural should Ubiquity really be? Although he speaks four languages fluently (English, Japanese, French, Chinese) and is a gifted linguist, he can’t do it alone. Especially if your native tongue is not English, we need you to get involved in blogging, thinking, and mocking up ideas for how Ubiquity in Firefox’s Awesome Bar can work in your language. Put your mockups on Flickr and tag them with ubiquity and mozconcept.
Question: What are the greatest difficulties in bringing Ubiquity to your language?
Tags: Firefox, mozconcept, ubiquity
mitcho
Thanks for the introduction Aza.
Letting the browser understand our native tongues is a bold and daunting task, but also a tremendously exciting and rewarding one. ^^
I look forward to working with all of you on this bold initiative!
Edgar Gonçalves
Welcome, Mitcho, and thanks for the community call-to-arms Aza!
My native language is Portuguese. I enjoy using Ubiquity (as it is now) because the English grammar makes it practical. I’d like to present an issue that will most likely hold for other languages: web application and site names are often verbs, or can be treated like verbs. This works with Google, Twitter, Flickr, Gmail and many others. And this is why it makes a lot of sense to me to use a phrase in Ubiquity like “TWITTER i think you’re right @ friendname”, or “GOOGLE something cute”. In Portuguese, however, these verbs have no easy translation. For Google to be a verb, for instance, it would probably sound a bit like “Googlar” (pronounced like googlaahr), and its conjugation would be something like “Googla uma coisa gira”. But this is not only a false phonetic approximation of the English name (it isn’t correct in Portugal or anywhere else), it’s also not practical to repeat for every site name and every language. I won’t even go into Twitter, which can only be translated as “chilreador” (he/it who chirps like birds); never mind that this is a noun rather than a verb, no one in their right mind would associate Twitter, the webapp, with “chilreador”.
This is why helper verbs are needed. So instead of “Google searchterms”, we need to have something like “search Google for searchterms”, or “search searchterms in Google”, to be able to translate to “procurar searchterm no google” or “procurar no google por searchterm”. This approach has the advantage of being scalable, as most webapps have a purpose that can be described with a verb (send, post, publish, lookup, translate, map), but has the huge disadvantage of requiring more words to describe in a correct syntax (both in English and Portuguese). That is why I would never go to the effort of typing “PROCURAR NO GOOGLE POR searchterm” if I can type the English equivalent “GOOGLE searchterm”. Even if autocomplete helps – and I know it would – the screen clutter is too much.
With that settled, I trust most Portuguese users would be content to type “WEBAPP-NAME argument”, like “GMAIL #url PARA contact-name” or “TWITTER message”. This goes in the opposite direction of natural language, but it would make Ubiquity practical to use. The illusion required from the user is that he/she must be ready to mentally replace the webapp name with the proper verb (e.g., replace “GOOGLE” by “PROCURAR NO GOOGLE POR”, which translates to “lookup on google for”). This type of automatic mental word transformation isn’t quite what you would call natural language. But then again, we live in a world where children grow up with SMS on their mobiles, and learn to talk and write using acronyms and shorteners of every macabre (often painful) kind, all to save a few characters and some time. And they all get used to it really fast. This happens in Portuguese too. So I guess it wouldn’t be too much to ask for some concessions in the natural language Ubiquity intends to support.
This is the greatest problem of being in a country with a language designed to be rich, artistically oriented, and full of syntactic-sugar redundancies. It just isn’t as practical to use as English for these situations. This is also why I intend to keep using the English syntax whenever I’m able to.
Anonymous
@Edgar: I don’t really see why Portuguese makes verbing any less natural than English does. Strictly speaking, the rules of English don’t allow writing something like “flickr a screenshot and twitter #badwebdesign”. English rules would require you to write that as something like “post a screenshot to flickr and send a message via twitter with tag #badwebdesign”. However, as you described, nobody would want to write that way for Ubiquity. The “verbing” convention in English doesn’t follow English grammar rules any more than a similar convention would in Portuguese.
You said that mentally replacing a site name with a verb phrase doesn’t seem like natural language, and I agree entirely. Don’t do that. :) I don’t see “Google for foo” and think “search Google for foo”; I think “Google for foo”. It *becomes* natural when you stop the inner grammarian in you from trying to “correct” it.
Anonymous
Now, that said, I think it *does* make sense to use natural-language verbs, but not together with site names. I’d rather tell Ubiquity what I want rather than how to get it. For example, I could explicitly type “flickr screenshot and twitter to #badwebdesign”, but I’d rather type “send screenshot to #badwebdesign”. Ubiquity could figure out that #badwebdesign represents a hashtag, that such a hashtag likely means a micro-blogging service like twitter or identi.ca, that I have an account on twitter, that sending a file to twitter requires turning it into an URL, that images can get uploaded to an image site like flickr or numerous others, and that I have an account on flickr.
As a side effect, Ubiquity could offer me options that I hadn’t considered. For instance, it might offer me the option of uploading the image to a different image site, or of micro-blogging it to *both* twitter and identi.ca, or of uploading the image as a full blog post and linking to that post via twitter. Ubiquity can’t easily do any of those things if I tell it I want flickr and twitter specifically.
As a side note, consider gstreamer-style pipelines for a moment. You start on one end with a .mp4 file containing H.264 video and MP3 audio, and you end up on the other end with decoded video playing to your screen and decoded audio playing through your speakers. In between, the system constructs a pipeline based on processing blocks which can handle certain inputs and certain outputs. Now consider the scenario I described above, “sending a file to twitter requires turning it into an URL”, and think in terms of processing blocks: screenshot produces an image, flickr takes an image and provides an URL (which works as text), twitter takes text.
Don’t get me wrong, I want the ability to tell Ubiquity exactly what it should do, but I’d like to start out by telling it what I want and seeing what it comes up with. Ubiquity can choose amongst options based on frecency, existence of accounts, and compatibility of input and output types.
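The processing-block idea in this comment can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the block names, the type labels ("nothing", "image", "url", "text", "posted"), the "url-as-text" conversion step, and the `find_pipelines` and `rank` helpers are all hypothetical and are not part of Ubiquity.

```python
from collections import deque

# Hypothetical processing blocks: (name, input type, output type).
# Names and type labels are illustrative, not real Ubiquity commands.
BLOCKS = [
    ("screenshot", "nothing", "image"),
    ("flickr", "image", "url"),
    ("url-as-text", "url", "text"),   # assumed conversion: a URL "works as text"
    ("twitter", "text", "posted"),
    ("identi.ca", "text", "posted"),
]

# Blocks that only work if the user has an account (an assumption).
NEEDS_ACCOUNT = {"flickr", "twitter", "identi.ca"}


def find_pipelines(source, target, blocks=BLOCKS, max_len=6):
    """Breadth-first search for every chain of blocks from `source` to `target`."""
    results = []
    queue = deque([(source, [])])
    while queue:
        current, path = queue.popleft()
        if current == target and path:
            results.append(path)
            continue
        if len(path) >= max_len:
            continue
        for name, in_type, out_type in blocks:
            # A block can follow only if it accepts the current output type.
            if in_type == current and name not in path:
                queue.append((out_type, path + [name]))
    return results


def rank(pipelines, accounts, frecency):
    """Drop pipelines that need a missing account, then sort by total frecency."""
    usable = [
        p for p in pipelines
        if all(b in accounts for b in p if b in NEEDS_ACCOUNT)
    ]
    return sorted(usable, key=lambda p: -sum(frecency.get(b, 0) for b in p))


if __name__ == "__main__":
    candidates = find_pipelines("nothing", "posted")
    for pipeline in rank(candidates, {"flickr", "twitter"}, {"twitter": 3}):
        print(" -> ".join(pipeline))
```

Because the search only looks at input and output types, adding another image host or micro-blogging service means adding one tuple to `BLOCKS`; the alternatives then show up as extra candidate pipelines without changing the search.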
Anonymous
One other note about natural language: given the option, I’d rather not even type “google searchterm”; I’d rather type “g searchterm”, or just “searchterm”. Similarly, I find “send screenshot to #badwebdesign” amusing, but I’d rather type “shot to #badw” and let autocompletion turn #badw into #badwebdesign and shot into screenshot. I’ll gladly sacrifice as much of the “natural” part of language as I can get away with not typing, as long as Ubiquity can figure out what I mean.
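As a rough sketch of the kind of abbreviation matching described here, the Python below expands a typed fragment against a known vocabulary, preferring prefix matches and falling back to substring matches. The `expand` function and the `VOCAB` list are hypothetical; Ubiquity's actual autocompletion may work quite differently.

```python
def expand(token, vocabulary):
    """Return vocabulary entries matching `token`: prefix matches first, then substring matches."""
    token = token.lower()
    prefix = [w for w in vocabulary if w.lower().startswith(token)]
    inside = [w for w in vocabulary if token in w.lower() and w not in prefix]
    return prefix + inside


# Hypothetical vocabulary of known nouns, verbs and hashtags.
VOCAB = ["screenshot", "search", "send", "#badwebdesign", "#firefox"]

if __name__ == "__main__":
    print(expand("shot", VOCAB))   # substring match inside "screenshot"
    print(expand("#badw", VOCAB))  # prefix match on the hashtag
```

A real implementation would presumably also weigh candidates by how often and how recently they were chosen, which this sketch leaves out.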
Edgar Gonçalves
Hmm, let me comment on some discussed items:
1. “I don’t really see why Portuguese makes verbing any less natural than English does.”
– My point here is that most web applications have names that can resemble English verbs. “Flickr screenshot” can be seen as “flickr a screenshot”, and understood as “send the screenshot to flickr”. The same goes for “Twitter #world”: it can be mentally processed in English as “Twitter (the current text selection) to the hashtag #world”. (Sorry for using the same webapps as examples, but this way we avoid extra confusion.) In Portuguese, the phonetic similarity of those names to grammatically viable verbs isn’t that easy to establish. For instance, all verbs (in their infinitive form) end in “ar”, “er” or “ir”. Other endings just seem odd and foreign, and don’t integrate well with the language without starting to sound too technical. Apart from that, I know the sentence structure isn’t correct in English, either. However, somehow it “sounds” right, and that’s what my mind is looking for. (Note that my examples only compare English and Portuguese. I’m sure there may be far more difficult comparisons to make with other languages.)
2. “It *becomes* natural when you stop the inner grammarian in you from trying to “correct” it.”
– I strongly agree. This is what I mean with the teen shortened language. If it is practical, we become used to it, it becomes natural. I’m ok with it, I even expect it!
3. “I’d rather tell Ubiquity what I want rather than how to get it”
– That’s also OK for me. In fact, I already use Ubiquity that way with the “map” command, for instance. But for this to happen I think we have to cut back on some command freedom. I have a command called buxfer-add that makes a transaction on Buxfer. Its syntax is “BUXFER-ADD amount IN description”. I realize now this is a bad syntax, and it’s on my to-do list to change it. A better way to achieve this is to make a command for the webapp “buxfer”, registered to the action “spend” (e.g.), with a numeric direct (i.e., unnamed) complement and a “described as” complement. This way Ubiquity could offer me the Buxfer option (or make it the default, or offer some other choices I can’t envision now) when I start typing “spend”. This could work well in Portuguese, too: “gasta 65€ em hotel”. As a side effect of removing syntax freedom, the translation work becomes easy for everyone. We only have to translate the “doing” action words and the prepositions that introduce complements (e.g. “to”, “in”, “as”, …). Do you have other plans, or does this idea fall down miserably on something I’m missing now?
4. “I’ll gladly sacrifice as much of the “natural” part of language as I can get away with not typing, as long as Ubiquity can figure out what I mean.”
– Ahh, that for me is vital in Ubiquity. Take away its autocomplete, and Ubiquity loses a fan. But I’m not sacrificing anything. One of the most interesting things about the (current version of) Ubiquity for me is that it builds a description of the command we’re going to perform, so that we can read it and know we’re not making a mistake. And this description comes up in natural language. So if I type “shot #badw” I can always read the sentence “send the current screenshot link to the hashtag #badwebdesign via Twitter”. Long live preview messages for the bridge between shortcut madness and intelligible thinking flow!
That said, thank you for commenting on my comment. One side question: are you Mitcho, Aza, or a Ubiquity user/fan/developer? I’m talking simply as a user/fan, just giving my 2 cents, and hoping to help move things forward. And just to make it clear (my first comment may have lacked that), I believe translating Ubiquity and bringing it closer to more and more people is not only noble and an interesting problem, but also a public service: everyone should be able to be more productive!
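The “spend” idea from point 3 could look roughly like the Python sketch below: one action with a numeric complement and a described-as complement, where only the verb and the introducing preposition are translated per language. Everything here is an assumption made for illustration: the `LEXICON` table, the choice of “on” as the English preposition, and the `parse_spend` helper are hypothetical and do not reflect Ubiquity's real command API.

```python
import re

# Hypothetical per-language words for a single action. Only the verb and the
# preposition introducing the description need translating.
LEXICON = {
    "en": {"verb": "spend", "prep": "on"},   # "on" is an assumed preposition
    "pt": {"verb": "gasta", "prep": "em"},
}


def parse_spend(text, lang):
    """Parse '<verb> <amount>[currency] <prep> <description>' for one language."""
    words = LEXICON[lang]
    pattern = (
        rf"^{re.escape(words['verb'])}\s+"      # localized action word
        r"(\d+(?:[.,]\d+)?)\s*([^\d\s]*)\s+"    # amount, optional currency symbol
        rf"{re.escape(words['prep'])}\s+(.+)$"  # localized preposition, description
    )
    match = re.match(pattern, text.strip(), re.IGNORECASE)
    if match is None:
        return None
    amount, currency, description = match.groups()
    return {
        "action": "spend",
        "amount": float(amount.replace(",", ".")),
        "currency": currency,
        "description": description,
    }


if __name__ == "__main__":
    print(parse_spend("gasta 65€ em hotel", "pt"))
    print(parse_spend("spend 65 on hotel", "en"))
```

Both inputs reduce to the same structured action, so a service such as Buxfer would only need to register for “spend” once, independent of the user's language.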
Jason
UBIQUITY IS FREAKING AWESOME! Just felt I needed to tell someone that.
Robert Kaiser
The largest challenge I see in bringing such an interface to my native language, German (which is probably among the simpler languages for that), is that the command form of a sentence has a different word order than the normal sentence, and I expect that users will naturally try either of them when typing in Ubiquity.
While in English “search Google for ‘foo’” is both the command and a normal sentence (an infinitive construction, actually), in German “Suche ‘foo’ mit Google” is the command form and “‘foo’ mit Google suchen” is the infinitive construction that probably just as many people would enter naturally, not to mention the different prepositions they may choose instead of “mit”, or a passive construct that may be just as natural (“Google nach ‘foo’ durchsuchen”).
The major challenge for German is that there are very often many ways to say the same thing, and people will (try to) use any of them.
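One way to picture the word-order problem is a parser that accepts the verb at either end of the input and reduces both forms to the same structure. The Python below is a minimal sketch under stated assumptions: the `VERB_FORMS` and `PREPOSITIONS` tables and the `parse_german` helper are hypothetical, and the third variant (“Google nach ‘foo’ durchsuchen”), where the service is the direct object, is deliberately not handled.

```python
# Hypothetical lookup tables: surface verb forms mapped to one action, plus a
# few prepositions a user might choose to introduce the service.
VERB_FORMS = {"suche": "search", "suchen": "search"}
PREPOSITIONS = {"mit", "bei", "in"}
QUOTES = "'\"‘’“”"


def parse_german(text):
    """Accept verb-first (command) and verb-last (infinitive) word order."""
    tokens = text.split()
    if not tokens:
        return None
    if tokens[0].lower() in VERB_FORMS:        # "Suche 'foo' mit Google"
        verb, rest = VERB_FORMS[tokens[0].lower()], tokens[1:]
    elif tokens[-1].lower() in VERB_FORMS:     # "'foo' mit Google suchen"
        verb, rest = VERB_FORMS[tokens[-1].lower()], tokens[:-1]
    else:
        return None
    # Split the remaining words at the last known preposition.
    for i in range(len(rest) - 1, -1, -1):
        if rest[i].lower() in PREPOSITIONS:
            return {
                "verb": verb,
                "object": " ".join(rest[:i]).strip(QUOTES),
                "service": " ".join(rest[i + 1:]),
            }
    return {"verb": verb, "object": " ".join(rest).strip(QUOTES), "service": None}


if __name__ == "__main__":
    print(parse_german("Suche 'foo' mit Google"))
    print(parse_german("'foo' mit Google suchen"))
```

Supporting more phrasings in this style means growing the tables and adding patterns, which is exactly the combinatorial cost this comment warns about.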