I'm Aza Raskin @aza. I make shiny things. I simplify.

I'm VP at Jawbone, focusing on health.

 

Does Google Censor Tiananmen Square? How To Create an Internet Hoax

Update: Since putting this post up, it looks like Bing has fixed the issue and there are now some results that appear in Google only because of the popularity of this article.

Let me start by saying that, at least in the US, Google does not censor Tiananmen Square. Nor does Bing. Nor Yahoo. But we can make it look like they do. If you don’t believe me, click here, here, and here.

As you can see, I’m linking to the real Google domain, and the page looks and acts legitimate. The URL looks normal. You can even change the search, say by removing “massacre”, and Google still doesn’t find anything. Try it with quotes. Remove “square”. Still no results. The “censorship” certainly feels fairly real, and the hoax would be even harder to detect if I had said that they were only censoring links from some third-party sites.

Now try copying and pasting the search into a different search engine. Nothing again. It’s a conspiracy! And look, Digg censors searches for their rival Reddit!




Or how about the government forcing videos from Afghanistan to be removed from YouTube from within the US?

What’s going on?

I’m using a search query that looks like one thing but is, in fact, another. This particular search query uses Unicode characters that look identical or similar to normal characters. In this case, I’ve replaced one of the “a”s in Tiananmen with a look-a-like character from the Cyrillic alphabet. Nobody uses such a tampered string when writing about Tiananmen Square, so Google naturally doesn’t find any results. Hence, it looks like they are censoring. As long as you don’t delete the entire query and start again, modifying the query in place will keep returning results that appear “censored”.
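To make the substitution concrete, here is a minimal Python sketch. The helper name `tamper` and the choice of which “a” to replace are assumptions made for illustration; the only fact relied on is that U+0430 (Cyrillic small letter a) renders like the Latin “a”.

```python
import unicodedata
from urllib.parse import quote_plus

CYRILLIC_A = "\u0430"  # CYRILLIC SMALL LETTER A, renders like Latin "a"


def tamper(query: str, position: int) -> str:
    """Replace the Latin 'a' at the given index with its Cyrillic look-alike.

    Hypothetical helper, written only for this illustration.
    """
    if query[position] != "a":
        raise ValueError("expected a Latin 'a' at that position")
    return query[:position] + CYRILLIC_A + query[position + 1:]


original = "tiananmen square massacre"
tampered = tamper(original, 4)  # index 4 is the second "a" in "tiananmen"

print(tampered == original)            # False
print(unicodedata.name(tampered[4]))   # CYRILLIC SMALL LETTER A
print(quote_plus(tampered))            # tian%D0%B0nmen+square+massacre
```

On screen the two strings look the same, but they compare unequal, and the percent-encoded form shows the two extra bytes of the Cyrillic letter.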

In effect, this is taking the old phishing trick of homoglyph attacks—an attack consisting of using confusing look-a-like URLs like paypa1.com with the numeral one replacing the letter ell—and adding a dash of cross-site scripting but where you become the agent of infection: the supposed “censorship” may be shocking enough to cause you to forward the link. You can find a list of look-a-like characters here.

Using this technique you can create viral links showing that Bing censors BP oil spill images, or that Techcrunch has never used the word perfect. With a mischievous eye, these kinds of searches might well cause damage. Someone will figure it out eventually, but probably not before the PR damage is already done.

Being #1 In Google. The Easy Way

Another way to take advantage of this hack is to easily appear to be the first hit for any term in Google.

Step 1: Decide on the term you want to own. Say “Used Cars”. Now perform the homoglyph substitution. When you search for the tampered phrase, you should get no results.

Step 2: Use the tampered phrase on the site you want to appear as #1.

Step 3: Make a set of throw-away web sites that mirror your competitors’ sites, all of which use the tampered phrase, and have them link to the site you want to be #1.

Step 4: To prove that you are the most reputable used car site on the web, just link people to the Google search for the tampered phrase. The page will list you at the top for the phrase “used cars” with all of your “competitors” ranked below you.

Step 5: Congrats. You can now control any search engine’s search page, as long as you provide the link.

Mirror Character

I stumbled on this technique through an exploration of the Unicode “mirror” character (formally the right-to-left override, U+202E), which reverses the display direction of all text after it. Searching for that character on its own seemingly breaks Google. Going a step further, you can write your query backwards with the mirror character at its front, so that it looks normal yet yields no results. When I tried this technique on my Twitter followers, most of them figured out that something strange was going on from the strange interaction experience and odd search results. The Unicode homoglyph method does not suffer from these issues.
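A rough Python sketch of the backwards-query trick, assuming the character in question is the right-to-left override (U+202E). The function name `disguise` is made up for this example, and how the result is actually drawn depends on the renderer.

```python
import unicodedata

RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE


def disguise(query: str) -> str:
    """Store the query reversed, prefixed with the override character.

    A bidi-aware renderer draws the stored text right to left, so it can
    look like the original query even though the characters differ.
    """
    return RLO + query[::-1]


shown = "used cars"
sent = disguise(shown)

print(unicodedata.name(sent[0]))  # RIGHT-TO-LEFT OVERRIDE
print(sent[1:])                   # srac desu
print(sent == shown)              # False
```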

Search engines could nullify this attack vector by watching for such strange homoglyph characters in the middle of normal words and quietly swapping them back.
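As a sketch of what such a swap might look like, and not a description of what any search engine actually does, the Python below replaces look-a-likes only in words that mix Latin letters with another script. The `LOOKALIKES` table is a tiny hand-picked assumption, and `script_of` and `swap_back` are hypothetical names.

```python
import unicodedata

# Hand-picked Cyrillic letters that resemble Latin ones (an illustrative
# assumption; a production table would be much larger).
LOOKALIKES = {
    "\u0430": "a",
    "\u0435": "e",
    "\u043e": "o",
    "\u0440": "p",
    "\u0441": "c",
    "\u0456": "i",
}


def script_of(ch: str) -> str:
    """First word of the Unicode character name, e.g. 'LATIN' or 'CYRILLIC'."""
    return unicodedata.name(ch, "UNKNOWN").split(" ")[0]


def swap_back(query: str) -> str:
    """Replace look-alikes, but only inside words that also contain Latin letters."""
    cleaned = []
    for word in query.split():
        scripts = {script_of(ch) for ch in word if ch.isalpha()}
        if "LATIN" in scripts and len(scripts) > 1:
            word = "".join(LOOKALIKES.get(ch, ch) for ch in word)
        cleaned.append(word)
    return " ".join(cleaned)


print(swap_back("tian\u0430nmen square"))  # tiananmen square
```

Words written entirely in one script are left alone, so a genuine Cyrillic query passes through unchanged.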



Well I guess Yahoo actually shows some results


This reminds me of being back in the anti-spam business, and how there are something like several million legible unicode permutations of the word “viagra”.


˙uıds pɐǝɥ ʎɯ buıʞɐɯ sı sıɥʇ ˙ǝpoɔıun ɥʇıʍ sbuıɥʇ ʎzɐɹɔ ʎuɐɯ os op pןnoɔ noʎ ʍouʞ ʇ,upıp ı ‘ʍoʍ



    Aza Raskin

    My signature in all of my emails reads “– aza | ɐzɐ –”


One way to defend against this is to use fonts that somehow display these characters differently, though unfortunately, that requires turning off the ability for web developers to use their own fonts, which can ruin layouts.


Way cool!


Actually, Google does return relevant results for me. It seems it is transcribing the Cyrillic words to their English alphabet equivalent.

For example, I tried searching for “Суrilliс”, in which the two “c”s and the “y” were written with Cyrillic glyphs, and all others were in English. The result was a search for “surillis”. Which is not without logic, might I say. The Cyrillic “c” is read similarly to the English “s”, and the Cyrillic “y” is read similarly to the English “u”.

Maybe someone else will confirm this, because it’s obvious screenshots won’t provide any evidence to my words.



    Aza Raskin

    From what locale are you trying this?


      Bulgaria.
      Using google.com in English.



        tom

        same here, from serbia (using google.com in english)



          Debayan Gupta

          This is because you are using the link from a location where the Cyrillic script is common.

          The same thing will happen if you use a trick link with Armenian characters ( հ,ց or something ) from Armenia.





fuska

I think web browsers should watch for this kind of attack too, e.g. by highlighting homoglyph characters in the middle of normal words in the address bar.



Neal

I would recommend the search engines use something similar to their misspelling logic and then show the original search string below the search box with the offending homoglyphs rendered bolded, colored, and with a background color.

For example:
Did you mean moo? You searched for: mоo (middle o bold, blue, pale blue background)
If the user is interested you can put a tooltip or link to why the letter is colored.

There are a couple advantages to this.
- The presentation logic and flow can be similar to what the search engine normally uses for misspellings. For misspellings, Google asks “Did you mean moo?”, Bing says “We’re including the results for moo. Do you want only mоo?”
- You can still search for mixed character words if you need to. Maybe you actually want to find M0O!

Doing something like what Bing does (including results from the closest real match) may also have a benefit of discouraging this homoglyph attack, since you can’t be at the top of the search results anyway.
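One way to sketch the highlighting idea above in Python: square brackets stand in for the bold and colored styling, and the function name `annotate` is an assumption for illustration. It flags any non-Latin letter found in a word that also contains Latin letters.

```python
import unicodedata


def is_latin(ch: str) -> bool:
    """True when the Unicode character name starts with 'LATIN'."""
    return unicodedata.name(ch, "").startswith("LATIN")


def annotate(query: str) -> str:
    """Wrap suspicious letters in [brackets] so they can be styled on the results page."""
    words = []
    for word in query.split():
        has_latin = any(is_latin(ch) for ch in word)
        marked = ""
        for ch in word:
            if has_latin and ch.isalpha() and not is_latin(ch):
                marked += "[" + ch + "]"
            else:
                marked += ch
        words.append(marked)
    return " ".join(words)


# "m" + Cyrillic small letter o + "o"
print("You searched for:", annotate("m\u043eo"))
```

The original query is kept intact, so a user who really does want the mixed-character word can still search for it.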


    Google’s misspelling logic is not based on a huge dictionary but on something much simpler – if you search for “Obma” (assume here it’s the first time ever) you meant “Obama” so you’re very likely to change the query yourself. Google records that and if there are other people like you making the same mistake, it will start to offer suggestion automatically.

    Nobody is going to train Google is suggesting words without homoglyphs. :P


      Of course I meant “in suggesting”. Sorry.



      Trevor

      Their spelling suggestions may be auto-generated (or partially so), but that doesn’t mean a homoglyph protection feature would have to be. It shouldn’t surprise anyone to learn that security features are often programmed. A homoglyph dictionary would obviously be much smaller and more finite than a spelling dictionary. Good spam filters (even Google’s) do it, why couldn’t Google?

      It’s important that vendors (whether they’re sites, browsers, or other web-connected software) protect users from this sort of attack, if for no other reason than to avoid being viewed by their users as vulnerable to this sort of attack.


Oh! to add, I almost forgot I did this a few years ago. It’s the Unicode direction override glyphs. If I recall correctly, Firefox (and surprisingly IE) was smart about it but Safari wasn’t.



    Aza Raskin

    Very cool! I played around with that too, but was unable to confuse Firefox that way. Interesting that Safari is susceptible.



      Trevor

      It’s been mitigated somewhat in newer versions. Safari 4 Beta (and probably going back to some point release of 3.x) will strip the ‮ character and display the address in its normal (ltr in this case) order. After entry of the URL, of course. And while it will display text in a page with the direction reversed (as expected), it will display the non-misleading address in the status bar before clicking such a link.



        Trevor

        LOL but apparently your site’s comment section doesn’t filter entities.




          Trevor

          Er, but it does filter the ampersand entity. Okay, I give up!



Riccars

Yea i noticed that, I wanted to find the image of the man standing in front of the tanks and it was very difficult to find.



jim

..actually, just exploring and testing this out.

for more info.http://www.guaranteeddirect.com


This site converts text upside down using the described unicode hacks:

http://www.fileformat.info/convert/text/upside-down.htm



Anonymous

Funny that Aza told Google about this a while ago and they were complacent enough not to fix it. Just calls BS on the whole “we care about security”.



Pedro

Just to point out that in Opera the unicode characters are not shown and are instead replaced by special codes, in the status bar and in the URL bar.


There is no such thing as a Unicode Mirror character. You’ve linked to the RLO character, which forces the direction of the text to be RTL. In fact, I use the same technique to write email addresses on the web so that web aggregators can’t capture them!

You can also place hidden characters inside your query instead of characters that look similar to others. For example, ZWJ is used by Arabic and some other languages as well, and doesn’t have any visible change to the human eye.
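The hidden-character variant can be sketched in Python as follows. It assumes ZWJ means U+200D (zero width joiner), and `strip_invisible` is a hypothetical helper that drops every character in the Unicode “Cf” (format) category. That is a blunt filter: it would also remove joiners that Arabic and other scripts legitimately need.

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER, category Cf, normally invisible


def strip_invisible(text: str) -> str:
    """Drop all format-category characters (ZWJ, RLO and friends)."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


hidden = "tian" + ZWJ + "anmen"

print(hidden == "tiananmen")                    # False
print(len(hidden))                              # 10
print(strip_invisible(hidden) == "tiananmen")   # True
```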



Daniel

Great article!



Debayan Gupta

It seems that google does substitute characters in, say, Cyrillic, but only if you perform the search from a place where the script is used (say, Bulgaria).

So, if you search the Bulgarian site in English : http://www.google.bg/search?hl=en&q=tian%D0%B0nmen+square+massacre

You will get the correct results.

I tried using servers in other countries to try out some more languages, and it seems that this is google’s general strategy – while this makes _some_ sense, google (and others) should certainly put some kind of check in place for this sort of thing.

As far as unicode characters are concerned, most search engines are utterly unprepared for things like the LRO.
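For anyone who wants to check a suspicious link by hand, this short Python sketch decodes the google.bg URL quoted above and lists every non-ASCII character in its query. Only standard-library calls are used; the variable names are arbitrary.

```python
import unicodedata
from urllib.parse import parse_qs, urlsplit

url = "http://www.google.bg/search?hl=en&q=tian%D0%B0nmen+square+massacre"

# parse_qs percent-decodes the value and turns "+" into a space.
query = parse_qs(urlsplit(url).query)["q"][0]

# Report the position and Unicode name of anything outside ASCII.
suspects = [
    (index, unicodedata.name(ch, "UNKNOWN"))
    for index, ch in enumerate(query)
    if not ch.isascii()
]

print(suspects)  # [(4, 'CYRILLIC SMALL LETTER A')]
```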


Excellent article! I never thought there could be such simple exploits in Google’s search algorithms. Hopefully they’ll fix this, as it certainly creates a number of possibilities for phishing attacks.


That’s one of the problems with writing articles like this: everyone goes for the ironic but-your-site-fails-too :)



Dave Nihil

Well surprise surprise but clicking on the Bing link (3rd) gives me

“We did not find any results for tiаnаnmеn square.
Were you looking for: tiаnаnmеn squared”

And I live in Eastern Europe.


    Bing doesn’t check for similar characters (based on location) – in fact, “http://www.bing.com/search?q=bіng” will turn up stuff from YouTube..


well, this sounds as if the whole world is turning into a global little china, where information is limited. the documentary about the tank man (see youtube) is the best example of what will happen to us if we let this go on…


wohoo that is amazing! what a hacking skill!
and I like the five steps for being #1 xD



Silvan

Search for live and you’ll see: google is evil! http://is.gd/elCsF ;-)



Carl

Hmmm. The Google search provides relevant results for me and I’m here in the US. I wonder if they’ve changed something since you posted this.


Doesn’t mean a damn thing, because you use Safari, and safari is for huge tools.
tl;dr.



c.j

you’re an idiot….it’s google.cn that they censor


surprised a lot of features here




how exactly can I get this effect? I don’t know how I can insert this character.




If two characters are alike, I don’t see the need for another character. Wouldn’t it be easier to fix the standard?


Silvan here gave me an idea – Combine the above attack with any url shortener, and you’ll be sure no one changes your spoofed url.



Paul

I enjoyed this!


i’ll try it thanks for sharing!


Interesting findings. There are so many ways that people can fool search engines. But the recent Google algo change really fools lots of people. I believe that’s more for Google’s benefit than the visitor’s. Our site was ranked down a lot, and even the bloggers and reviewers now rank higher than our main site!




Look forward to exciting innovations with a nice article


good share. thank you.


aza finds an interesting hole in URL perception vs reality


Google censors many Hebrew words as well, thanks for this hoax, now I can bypass it.


I’m a bit scared after reading all this..




well articles.. thanks sharing


thnks
goooooooooooood
min:)ااا


I like such topics




thank for sharing..


hi helo good information dearrrrrrrrrr


thank you very much

