I'm Aza Raskin @azaaza. I make shiny things. I simplify.

I'm the Creative Lead for Firefox.

 

How To Phish, Protect Your Email, and Defeat Copy-And-Paste with CSS


It’s not often that you learn something from spam, besides that there are an extraordinary number of generous Nigerians (replete with theme song) and an amazing number of variations in the spelling of Viagra. Yet, I recently got spam where the offer was written in pristine English: no numbers replacing letters, no images, and no misspellings. How had such a brazen piece of spam gotten through my filters? The answer, it turns out, was some clever CSS that caused the HTML markup to be garbled but its visual rendering to be readable. I’ll show you how to use this for both good and evil.

The Good

CSS-based obfuscation of HTML can protect your email address in a quick-as-you-type way. No need for unwieldy inline images or arcane JavaScript functions. For example, here’s my email address:

azSPAMa@moREMOVEzilla.com (This email address cannot be copied and pasted. Try it.)

Notice that it appears to be in normal, selectable text. Don’t be fooled by appearances. Copy and paste it. You’ll get the following: azSPAMa@moREMOVEzilla.com, which is easy for a human to fix but difficult for a bot. How does it work? Here’s the sample code:


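<style>
/* Anything with class "z" stays in the markup but is squeezed out of view. */
.z{
  float:right;            /* pull the span out of the line of text */
  font-size:.001px;       /* shrink the glyphs to nothing */
  color:transparent;      /* and make them invisible */
  display:inline-block;
  width:0px;              /* take up no horizontal space */
  }
</style>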

...

<!-- class "y" has no style rule: it wraps a real piece of the address as a decoy -->
az<span class="z">SPAM</span>a@mo<span class="z">REMOVETHIS</span>zill<span class="y">a.c</span>om

This takes anything marked with the class z and visually hides it, leaving only the email address rendered as if nothing were amiss. When you go to copy my email address, however, the browser doesn’t treat the garbage text as hidden, so it gets selected along with my email address. This is the same way that your DNA protects itself, by hiding the real information in a huge amount of noise. It would be difficult for a scraper to even know that there was an email address there, let alone implement a program that understands CSS well enough to parse it out: there are a semi-infinite number of ways to use CSS to make text visually disappear. If you want to get really tricky, you can add multiple classes to each little span; because CSS cascades and inherits, it becomes very hard to know, without just looking at the results, what the text will end up saying.
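To make the mechanics concrete, here is a rough sketch in Python (not something from the original page) that builds markup like the sample above and shows what a naive tag-stripping scraper would recover. The function names `obfuscate` and `strip_tags`, and the idea of passing decoys as a position-to-text mapping, are assumptions made up for illustration.

```python
import re


def obfuscate(address, decoys, hidden_class="z"):
    """Insert hidden decoy spans into an address.

    `decoys` maps a character offset in `address` to the junk text that
    is inserted just before that offset. All names here are hypothetical.
    """
    pieces = []
    for i, ch in enumerate(address):
        if i in decoys:
            pieces.append(f'<span class="{hidden_class}">{decoys[i]}</span>')
        pieces.append(ch)
    return "".join(pieces)


def strip_tags(markup):
    """What a naive scraper sees: every tag removed, all text kept."""
    return re.sub(r"<[^>]+>", "", markup)


markup = obfuscate("aza@mozilla.com", {2: "SPAM", 6: "REMOVETHIS"})
print(markup)              # the spans a browser would hide via the .z rule
print(strip_tags(markup))  # azSPAMa@moREMOVETHISzilla.com
```

The scraper ends up with the same garbage a human gets from copy and paste, only without a human's ability to spot and fix it.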

Foiling spammers with a bit of their own medicine brings a smile to my lips.

The Bad & Evil

Besides spammers using this trick to get around your Bayesian spam filters, there are other bad things for which this can be used. The first is that misguided holy grail of publishers: copy-protection for their words. A publisher could generate, on the server side, a new random mess of HTML and CSS that would render their text uncopyable. This also has the side effect of making your pages impossible for search engines to index sensibly; it’s an easy way to keep your information human-readable but cloaked from Google’s all-seeing Sauronic eye.

Here’s a simplified example of how a publisher might use this:

Try copying and pasting this. It will only give you garbage. Then look at the source. It’s pretty unintelligible. Note that this is a quickly-coded example and could be made much harder to reverse engineer.

mTheIismM trpexsmt aEiscy uexnc nopiNyanxblaie.vk KnAeiLAthev Roaup zerant dwMutnrde.ocmh eswaaIs svbo LrnsB inyn .MehMlbilou Crnnae wGineo 1gx93 z1.oC Advt vothlae pBtiavmefI, rGhi rs fkfaizthsherld, nGKetxitarh MMuourdlyoc rh,es wdkasle avE rryegdlioblna ml iFnelzwsTEpaaqpe ar eomaezgnrvathbe robaBfseasd cmou yt aIof P MizelrwbopIursOneih a vndhO ayNs rIa Hrel.suoPltun tiphegt ftnameGilrLy kvwaozs lMwedfalevthtMy. n Reeupeuerost fwa hs oAgrmdooomme Ed npbyul hisis F fhwattFhecHr uzfrdAombM a Jn eneaiqrleOy eMag ze,yp apwndlx w.venest msofrof sptoaL scatueudy b piGhinploaasovqphnFy,Ld peNolotit Micads dkant d etecLoneHomazicsas natsp Onyxf .orhrd ieUn civnPerwksieytygF i vn owEndnglvcanlcd.p
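For the curious, a server-side generator along these lines might look roughly like the following Python sketch. It is only a guess at the general approach, not the code behind the demo above: the `scramble` function, its `noise_ratio` parameter, and the randomly named hidden class are all hypothetical.

```python
import html
import random
import re
import string


def scramble(text, seed=None, noise_ratio=0.5):
    """Return (style, body): the text with random hidden junk letters mixed in.

    A fresh class name is invented on every call so the hiding rule
    cannot be recognised by name alone.
    """
    rng = random.Random(seed)
    hidden = "c" + "".join(rng.choices(string.ascii_lowercase, k=6))
    css = (
        f".{hidden}{{float:right;font-size:.001px;color:transparent;"
        "display:inline-block;width:0px;}"
    )
    parts = []
    for ch in text:
        if rng.random() < noise_ratio:
            junk = rng.choice(string.ascii_letters)
            parts.append(f'<span class="{hidden}">{junk}</span>')
        parts.append(html.escape(ch))  # the real character stays visible
    return f"<style>{css}</style>", "".join(parts)


style, body = scramble("This text is uncopyable.", seed=1)
# Roughly what copy-and-paste (or a tag stripper) would yield:
print(re.sub(r"<[^>]+>", "", body))
```

As with the demo, this is the easy-to-reverse version; mixing several hiding rules and decoy classes would make it much harder to undo.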

The other evil thing this could be used for is phishing attacks. Sometimes you run across a URL unlinked on a web page. To go there, you copy and paste it into your location bar. In fact, we are often told that the only way to really trust a website’s location is to put it there yourself. Try copying and pasting this URL into your location bar:

http://facebook.com.evil.com (This URL will phish you if copied and pasted into the location bar.)

While you think you copied one URL, what you paste directs you to an entirely different, evil URL. This particular example is easy to detect for pedagogical reasons, but you can use all of the standard phishing methods of disguising a URL to make it look more legitimate.
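Here is a small Python sketch of the gap between what you see and what you paste. The markup is an assumption about how such a URL could be built (the source of the demo above is not shown here), and the `TextViews` class is a hypothetical helper that simply treats a given set of class names as hidden instead of evaluating real CSS.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class TextViews(HTMLParser):
    """Collect the text a reader sees versus the text that gets copied."""

    def __init__(self, hidden_classes):
        super().__init__()
        self.hidden_classes = set(hidden_classes)
        self.stack = []    # one flag per open element: is it hidden?
        self.visible = []
        self.copied = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        self.stack.append(any(c in self.hidden_classes for c in classes))

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        self.copied.append(data)         # selection grabs everything
        if not any(self.stack):
            self.visible.append(data)    # only unhidden text is displayed


# Assumed markup: the ".evil.com" suffix sits in a visually hidden span.
markup = 'http://facebook.com<span class="z">.evil.com</span>'
views = TextViews(hidden_classes={"z"})
views.feed(markup)
views.close()
seen, pasted = "".join(views.visible), "".join(views.copied)
print(urlparse(seen).hostname)    # facebook.com
print(urlparse(pasted).hostname)  # facebook.com.evil.com
```

The sketch ignores void elements such as br and anything a real stylesheet could do, so treat it as an illustration of the mismatch rather than a detector.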

Brainstorm

I’m sure there are other good and bad things you can do with this technique—these are the ones I came up with in a couple minutes. What are your ideas?



When the CSS is not used for rendering HTML (feed readers like Google Reader ignore the site CSS), your original email is never shown correctly.

One simple solution is to embed your email address in a TABLE. LiveJournal uses this for user profile pages. No CSS dependency. For example, view the HTML source of this page to see how the email address is embedded in a TABLE – http://lustymonk.livejournal.com/profile



Simon

Same remark here. I was reading your post in Google Reader and missed the point, you need the CSS. Does an RSS reader strip inline CSS?


There are two other CSS methods mentioned at

http://en.wikipedia.org/wiki/Address_munging#Alternatives

The text mini-logo one would still be readable if the inline CSS wasn’t applied. It’d just turn into a big-arse logo :)



George

I have seen script such as the following work well.

<!–
x5h=’‘,l6m,’@',g8x,
”); // –>



Blair McBride

I was about to post about the possibility of a bot simply stripping anything enclosed in additional tags.

But I now see you’ve already done what I was about to suggest: add an additional tag for a valid piece of the address (in the example the “a.c”).

I’d like to see more of these types of techniques on the net, rather than some of the horrible methods some sites use.



George

You are placing a lot of faith in people actually noticing that the email has changed when they paste it into their mail. I know I wouldn’t.


Hi,

I’m using Safari 4 Beta and when I copy and paste your e-mail, I actually get the real one.

Would a WebKit-using spam robot get the same?


    I have the exact same query Kevin! :)



    Aza Raskin

    Good catch—I’ve changed the CSS so that it works in Safari, Chrome, Opera, and Firefox.



Dave

Works too well… When this post shows up on planet.mozilla.org the example shows as azspama@moremovethiszilla.com both times. I was confused on first read. ;)



Kuno

Same remark here. Did you try to read your post at planet.mozilla.org?


The good thing is that it degrades fairly gracefully. Yes, if you don’t have CSS it will show the obfuscated one, but (if you don’t mangle it quite as much as Aza) it’s not really terribly different from what hundreds of people do already.

You could do something interesting like:

.z{ position:absolute; visibility: hidden; }
#username::after { content: "@"; }
#domain::after { content: "."; }

andrewjanuary at gmail dot com

Though that doesn’t work in most older browsers.

As someone already said, it does rely on people realising it’s changed, so I think it’s probably safer to just list the email with the obfuscation in place.


Ack, when will people realise eating tags is bad!

Hopefully this will work:

<style>
.z{ position:absolute; visibility: hidden; }
#username::after { content: "@"; }
#domain::after { content: "."; }
</style>

<span id="username">andrewjanuary</span><span class="z"> at </span><span id="domain">gmail</span><span class="z"> dot </span>com



James Heaver

How well will screen readers and accessibility devices handle the above?

Also, could the above technique be used to produce captchas? Mechanical Turk techniques have made captchas easy to beat, but perhaps a technique like this could make it more difficult.

There isn’t a single image to pick up and pass to a human; you would either have to identify the bit of text to display, or pass the whole page. With typing captchas being such a quick process, perhaps this could increase the cost by a factor of four or five, as the human has to break the repetitive routine to search the page for the captcha.

I don’t know CSS at all, but could a similar technique be used with traditional captcha images as well, displaying a real image along with a number of coded, but hidden fake captchas?


These guys are implementing “DRM” using a slightly different technique:

http://www.misaustralia.com/viewer.aspx?EDP://20080708000020876768&magsection=news-headlines-list&portal=_misnews&section=news&title=Marriage+made+in+customer+heaven&source=/_xmlfeeds/mis/news/feed.xml

If you try to copy and paste, you’ll only get every second letter. However, they have to use fixed-width fonts, and it’s easier to defeat.



Dao

Not particularly user friendly …

I prefer this:
<a>foo at bar.com</a>
respectively:
<a>contact me (foo at bar.com)</a>

… and then fix it up with JS:
http://phpfi.com/330174



Dao

Btw, when using CSS, I think you want display:none rather than position:absolute;left:-100px;.



Aza Raskin

@ Sridhar et al: Bleck. I forgot entirely about feed readers. It would certainly be nice if they didn’t strip tags. I like the TABLE trick, but it seems to be attackable simply by stripping the tags, which is I think one of the more common ways of creating a scraper.

@Simon, Dan: I normally put the garbage characters in all caps, which helps readability. Or I’ll put other delimiter letters. That way it looks like azSPAMa@moREMOVETHISzilla.com. I think I’ll update the article to put it back this way — I just didn’t want to get comments saying that a spammer could remove any all caps letters…

@IRC: That’s a wonderful use of the CSS!

@Dao: The problem with foo at bar.com style email address obfuscation is that it is trivial to write a regexp to find and scrape the email address. In fact, I would assume that because that method is so popular, scrapers have long since gotten wise. And although the user interaction of the CSS trick isn’t great when copying, visually it looks perfect on-page.

@James: Good question. I think the reader would probably read it out, so it won’t sound perfect, but a human should be able to figure it out. It’s at least better than an image.

@Kevin: That’s great from a user perspective. I was trying to find some magic CSS to make that happen in Firefox, but couldn’t manage it. It wouldn’t be possible to write a WebKit scraper because it would be very difficult for a script to know visually where the email address was.
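To illustrate the point made to Dao above about regexps, here is a minimal Python sketch of what a scraper might use against "foo at bar.com" style munging. The pattern and the `unmunge` helper are assumptions for illustration, not taken from any real scraper, and the pattern will happily produce false positives on ordinary sentences such as "look at example.com".

```python
import re

# Matches "user at host.tld" and "user AT host DOT tld" style munging.
MUNGED = re.compile(
    r"\b([\w.+-]+)\s+at\s+([\w-]+(?:(?:\s+dot\s+|\.)[\w-]+)+)\b",
    re.IGNORECASE,
)


def unmunge(text):
    """Return every address that can be rebuilt from munged text."""
    found = []
    for user, domain in MUNGED.findall(text):
        # Turn any spelled-out " dot " back into a real dot.
        domain = re.sub(r"\s+dot\s+", ".", domain, flags=re.IGNORECASE)
        found.append(f"{user}@{domain}")
    return found


print(unmunge("contact me (foo at bar.com)"))  # ['foo@bar.com']
print(unmunge("kourge AT gmail DOT com"))      # ['kourge@gmail.com']
```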


My normal approach would be something like:
kourge AT gmail DOT com
And then use JavaScript to replace the AT and DOT with the correct characters. Not only are users able to copy and paste correctly, but a screen reader would also read it out loud correctly.


Oops, I guess those tags got swallowed up. I meant to wrap the text with a span tag whose class is “email”.


I just don’t even bother.

Hey, spammers! Here’s my e-mail address!

azarask.in-spam@elliottcable.com

I haven’t gotten more than one spam mail to any given domain in 6 years – instead of removing/moving to a folder I ignore, my scripts flag mail they think is spam, so I can decide whether or not to blacklist a particular address (and notify the owner of the relevant domain that their site is somehow leaking e-mail addresses).

Wildcards are the shit d-;


The table solution would cripple a screen reader. Also, what about making it clickable? I guess you could add some js to do that using your classes as selectors, but it seems like overkill when you could just use the js to begin with. I also throw in a vote to @Mossop’s comment regarding cut and paste – I see a lot of bounced emails with this solution.

All of this said, the value here is in the different approach. It will hopefully lead to more thought in using css for obfuscation or machine blurring.



Vijay Chakravarthy

Thinking about this from the flip side –
This could be an interesting way for spammers to build messages that are human readable, but get past the Bayesian filters.


Viva La Evolucion ;)



Jack

Hi,
I am using
http://www.mobilefish.com/services/hideemail/hideemail.php to protect my email address against spam bots.
This site also contains other useful tools.



DR

Stuart Langridge (of LugRadio fame) mentioned something similar: see http://www.kryogenix.org/days/2008/08/21/readable-non-harvestable-email-addresses-with-css .


CSS obfuscation must be the most inconvenient way to do it. Obfuscation based on HTML entities is slightly better, but not the best available option, as you can also confirm from this chart.

A Mac OS X Dashboard widget called obfuscatr provides JavaScript encoding, which is more convenient (for both ends) than CSS. The other possible option is plain hexadecimal encoding of your email addy, using the above-mentioned HTML entities. So there are two alternatives available from obfuscatr. See the details at flash tekkie.

obfuscatr was also featured in MacWorld Italy of March 2008.


Never thought of that, heh. Good idea.


Using hidden text to make text uncopyable was already commonly used on Chinese forums.

This feature was included in Discuz, a popular forum system, five years ago.



sep332

> Yet, I recently got spam where offers was written in pristine English:

Irony!



    Aza Raskin

    Very ironic. And fixed now :)



Mike Smith

It’s probably not going to show up correctly in this comment, but I like the idea of using the unicode mirroring character (U+202E) in front of your email, reversed.

‮moc.allizom@aza



    Aza Raskin

    That is very cool. I didn’t even know unicode had a mirroring character.


    Very cool indeed. Too bad this has problems with copy & paste and needs to be enclosed with specific HTML tags.


      *Problems with copying and pasting in specific applications: EditPlus, for instance, will paste the reversed email, while Notepad and Firefox will paste the email correctly but also include the control char, which makes navigating the text with the keyboard arrows very difficult.
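The U+202E idea from this thread can be sketched in a few lines of Python. This is only an illustration: the `mirror` and `scraped` helpers are hypothetical, and the closing U+202C character is an addition not mentioned above, included on the assumption that you want the override to stop affecting whatever text follows.

```python
RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE, the "mirroring" character
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING, ends the override (an addition)


def mirror(address):
    """Store the address reversed; a bidi-aware renderer flips it back visually."""
    return RLO + address[::-1] + PDF


def scraped(text):
    """What a scraper that drops the control characters would read."""
    return text.replace(RLO, "").replace(PDF, "")


stored = mirror("aza@mozilla.com")
print(repr(stored))
print(scraped(stored))  # moc.allizom@aza
```

Whether a given application pastes the reversed or the readable form depends on how it handles the control characters, which is exactly the copy-and-paste problem described above.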



    Abdulla

    yeah but I don’t think it will work in most cases because of the naming issue..

    I have read about this last week from trend micro when they talked about SASFIS Malware

    http://blog.trendmicro.com/sasfis-malware-uses-a-new-trick/



Mike Smith

Oh, sweet. It worked. Try copying the above email address.



Matthijs

When posting essays online, use the CSS to prevent plagiarism.


Without the need to use the U+202E control character, wouldn’t

<span dir="ltr">moc.allizom@aza</span>

do the same?



    Aza Raskin

    It would, but it would be easy for spammers to modify their scripts to nab your email address. The CSS trick relies on the difficulty of processing CSS rules and their visual effects.


    LTR? Don’t you mean RTL? Anyway, even RTL doesn’t seem to work for me.



cc

Great, so the technology to prevent web pages from being machine-readable is already available. I guess firefoxen’s copy&paste will resort to on-screen OCR by the time such schemes become common as ‘copyright-protection’.


Wow, so simple and useful!


Actually, I wondered a while ago why Gecko would copy invisible symbols in a selection. It seems to do more harm than good; ignoring any symbols that don’t have any dimensions sounds like a viable idea, and Kevin’s comment suggests that WebKit already does exactly that.

Btw, any scraper using a real rendering engine (with DOM access and everything) can easily get around your trick – exactly by ignoring the DOM nodes that have no dimensions and only extracting the text from the other nodes.

PS: Is it intentional that your blog displays no dates whatsoever? When I saw 40 comments I tried to figure out whether this is really an old post that only bubbled up due to a minor modification – no luck…


PPS: Ok, apparently this is intentional – I found the dates commented out in the source code. And this post is ancient as I suspected…



    Aza Raskin

    It is intentional—I feel that a lot of these blog entries are more akin to articles than time-limited pieces. This post, for example, was based on a thought from 2008 which I then entirely rewrote and extended dramatically.



kl

Opera 10.6 and Safari 5 are immune to this, so this clipboard hack seems more like a Firefox bug.



    Aza Raskin

    I’ve tested with Safari 4, Chrome, and Firefox 3.6. I haven’t tested with Opera or Safari 5. I’m sure slight tweaks to the CSS would fix the problem.



Adam A

I faintly remember reading a blog post by someone whose filters let through a spam mail that used CSS tricks. He had a clever idea of using it against spam harvesters. Hmm.. déjà vu?


I’ve done something similar, but found that using JS to print your email is just as effective at deterring spam bots, and as a bonus can leave the email easy to copy paste (I don’t want to make people’s lives hard).

Even more interesting is to just list your email on pages that use SSL. I’ve done that as well now. Since I forced my contact page over SSL, I’ve yet to see spam come through my contact form, and I’m pretty sure no bots scrape it either. I still use JS to print my email address though.



Dan

Hey Aza, on a completely unrelated note, are you involved with the Enzo zenPad (http://www.enso-now.com/), or did they steal your logo?



Wo0T

Hello,
I’ve made a little Python script to generate the code.
You can view it here:
http://dpaste.com/210601/





Debayan Gupta

I’ve been using a combination of js and css to hide my email – I usually use a variation of what you’ve mentioned here. Say,

john dot smith @ xyz dot com

Here, I put the “dot”s inside spans, resize them, and give them black backgrounds to make them look like dots – that way, a human who copies my address ends up with understandable text.

You can even reorder the letters using css, instead of hiding them – just use letter-spacing with a normal span and a floated one (so that they overlap – you could use multiple z-indexes or something if you’re feeling particularly vindictive).

I’ve also experimented with using different fonts and languages (“Foiling spammers with a bit of their own medicine brings a smile to my lips.” – absolutely!). The unicode mirror character is also particularly useful.


Bot evasion, copy prevention and gray-hat SEO of varying shades of gray: those are the main applications of this broad family of CSS tricks.

ps: You forgot pixel.gif, styled with width, height and background.jpg



sildur

Any CSS-based DRM can be easily defeated by exporting the web page to PDF and copying the resulting text.




thank you bro very nice post

