Hiding contact details from email scrapers

2022-08-19

I wanted to add my email address to the about page on this website without it getting scraped by web-crawling robots and then spammed.

I managed to display the email address and get it to look and behave like an email link (i.e. hover state, text selection) while making it hard for scrapers to detect that there is an email address worth harvesting. I thought it might be worth posting about how I did it in case anyone else is interested.

The email address should only exist in memory on the client

If the email address is sent with HTML from the server or added to the DOM with JavaScript on the client, then it will be fairly easy for a scraper to find. I got around this by encoding the email with a simple Caesar cipher, using a random offset for each page load, and decoding it into a JavaScript variable when a visitor loads the page.
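
As a rough sketch of the idea (not the exact code used on this site), the cipher could look something like this. The function names and the choice to shift only printable ASCII characters are assumptions made for illustration.

```javascript
// Printable ASCII runs from 32 (space) to 126 (~). Characters outside that
// range are left untouched. This range is an assumption of the sketch.
const FIRST = 32;
const RANGE = 95;

function shiftText(text, offset) {
  let out = "";
  for (const ch of text) {
    const code = ch.codePointAt(0);
    if (code < FIRST || code >= FIRST + RANGE) {
      out += ch;
      continue;
    }
    // The double modulo keeps the result non-negative for negative offsets.
    const shifted = (((code - FIRST + offset) % RANGE) + RANGE) % RANGE;
    out += String.fromCharCode(FIRST + shifted);
  }
  return out;
}

// Wherever the page is generated: pick a fresh offset for each page load.
function encodeEmail(email) {
  const offset = 1 + Math.floor(Math.random() * (RANGE - 1)); // 1..94
  return { offset, encoded: shiftText(email, offset) };
}

// On the client: decode into a variable only, never into the DOM.
function decodeEmail(encoded, offset) {
  return shiftText(encoded, -offset);
}
```

The page would then carry only the encoded string and the offset, and the decoded address would live in a variable inside a script.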

I'm not saying it's a perfect solution. If you specifically wanted to scrape this website, it would be fairly easy to get the email address by decoding the cipher. But there would be no point: if you're only interested in this website, the email address is plainly visible. I'm more interested in stopping generic scrapers that crawl the web looking for anything that looks like an email address. The idea is that most crawlers will not be looking hard enough to notice one on this page.

Making the link clickable

This part is easy. I couldn't add the email address to the href attribute with a "mailto:" as that would mean it's in the DOM and discoverable by a scraper running a headless browser. Instead I added a click event handler that opened the link using the email variable. I also encoded the "mailto:" part so as not to give anything away.
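
A minimal sketch of such a handler, assuming the same kind of Caesar shift over printable ASCII. The names `decode` and `makeClickable` and the optional `navigate` parameter are hypothetical, introduced only for this example.

```javascript
// Hypothetical decoder: undo a Caesar shift over printable ASCII (32..126).
function decode(text, offset) {
  return Array.from(text, (ch) => {
    const code = ch.charCodeAt(0);
    if (code < 32 || code > 126) return ch;
    return String.fromCharCode(32 + ((code - 32 - offset) % 95 + 95) % 95);
  }).join("");
}

// `link` is the <a> element. `navigate` is optional and defaults to
// changing the page location, which hands the URL to the mail client.
function makeClickable(link, encodedScheme, encodedAddress, offset, navigate) {
  const go = navigate || ((url) => { window.location.href = url; });
  link.addEventListener("click", (event) => {
    event.preventDefault();
    // The full URL is assembled only inside this handler, so it never
    // appears in an href attribute or anywhere else in the DOM.
    go(decode(encodedScheme, offset) + decode(encodedAddress, offset));
  });
}
```

Both the scheme and the address are passed in encoded form, so neither "mailto:" nor the address appears as a literal string in the page source.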

Making the email address visible

It's fairly simple to render the text to a <canvas> element on the page to make it visible. The code looks at the parent element's CSS styles and applies them to the rendered text, so it uses the right font at the right size with the right line height.
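
Roughly, that could be sketched as below. The helper names are made up for illustration, `style` is assumed to be the object returned by `getComputedStyle(parent)`, and the 1.2 fallback for a line height of "normal" is a guess rather than anything the browser guarantees.

```javascript
// Build a canvas `font` string from computed style values. Canvas ignores
// any line-height in the font shorthand, so line height is handled separately.
function canvasFontFrom(style) {
  return `${style.fontStyle} ${style.fontWeight} ${style.fontSize} ${style.fontFamily}`;
}

// Draw `text` on `canvas` so that it matches the parent's font and colour.
// Only the height is set here; the width is assumed to be sized elsewhere.
function drawText(canvas, text, style) {
  // A computed line-height can be "normal", in which case parseFloat gives
  // NaN and the assumed fallback of 1.2 x font size is used instead.
  const lineHeight = parseFloat(style.lineHeight) || parseFloat(style.fontSize) * 1.2;
  // Resizing a canvas resets its context state, so do it before styling.
  canvas.height = Math.ceil(lineHeight);
  const ctx = canvas.getContext("2d");
  ctx.font = canvasFontFrom(style);
  ctx.fillStyle = style.color;
  ctx.textBaseline = "middle";
  // Centre the text vertically within one line of the parent element.
  ctx.fillText(text, 0, lineHeight / 2);
  return lineHeight;
}
```

In the browser this might be called as `drawText(canvas, email, getComputedStyle(canvas.parentElement))`.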

I wanted the rendered canvas to be displayed at the right width so that it could fit in between normal text. To do this, I first rendered the text on a canvas spanning the full width of the viewport, then looked at the rendered pixels to find out where the text ends and adjusted the canvas width accordingly.
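
The pixel scan can be sketched like this, assuming the canvas background is fully transparent. The function name `measureInkWidth` is hypothetical; its argument has the same shape as the `ImageData` returned by `ctx.getImageData()`.

```javascript
// `image` is { data, width, height } with four bytes (RGBA) per pixel.
// Returns the width in pixels needed to contain every non-transparent
// pixel, or 0 if the image is completely blank.
function measureInkWidth(image) {
  const { data, width, height } = image;
  // Walk columns from the right edge until one contains a visible pixel.
  for (let x = width - 1; x >= 0; x--) {
    for (let y = 0; y < height; y++) {
      const alpha = data[(y * width + x) * 4 + 3];
      if (alpha !== 0) return x + 1;
    }
  }
  return 0;
}
```

In the browser the input would come from something like `ctx.getImageData(0, 0, canvas.width, canvas.height)`. Note that changing `canvas.width` clears the canvas, so the text has to be drawn again after resizing.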

Making it behave like a link

The fillText method for <canvas> doesn't include rendering text with an underline, but that's fairly easy to do by just drawing a line 2 pixels below the text.
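
For example, a sketch along these lines would do it. Here "2 pixels below the text" is interpreted as 2 pixels below the baseline, which is an assumption, and `drawUnderlinedText` is a made-up name.

```javascript
// Draw `text` with its baseline at `baselineY`, then underline it with a
// 1px-high rectangle that starts 2px below the baseline.
function drawUnderlinedText(ctx, text, x, baselineY) {
  ctx.textBaseline = "alphabetic";
  ctx.fillText(text, x, baselineY);
  const width = ctx.measureText(text).width;
  // fillRect uses the current fillStyle, so the line matches the text colour.
  ctx.fillRect(x, baselineY + 2, width, 1);
}
```

Using `fillRect` rather than stroking a path keeps the underline on whole pixel rows, which avoids a blurry line.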

Then there was the hover state to implement for when you mouse over the link. Because I'm using the parent element's style to draw the text, I could place the <canvas> inside an <a> element and just redraw the canvas when a visitor hovers over the link. The text colour changes just like a real link.

The most challenging (and probably unnecessary) part was making the text appear to be selectable. It was fairly easy to show a selection highlight using the x-coordinate of the mouse pointer, but that gives a smooth intermediate selection behind the characters so it looks like you can select a portion of a text character. A real text selection "jumps" to the end of the character as the mouse passes the mid-point in that character. To mimic this, I pre-rendered each character in sequence offscreen and measured the width of each. As the mouse selection moves over the text I then calculate which character is selected using a threshold value.
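
The threshold logic might look something like the sketch below. It assumes the selection always starts at the left edge of the text and that `widths` holds the measured width of each character in order. Both function names are hypothetical.

```javascript
// Returns how many leading characters are selected when the pointer is at
// horizontal position `mouseX` (in the same units as `widths`).
function selectedCharCount(widths, mouseX) {
  let left = 0;
  for (let i = 0; i < widths.length; i++) {
    // A character joins the selection once the pointer passes its mid-point.
    if (mouseX < left + widths[i] / 2) return i;
    left += widths[i];
  }
  return widths.length;
}

// The highlight spans from 0 to the right edge of the last selected
// character, so it "jumps" in whole-character steps.
function highlightWidth(widths, mouseX) {
  const count = selectedCharCount(widths, mouseX);
  return widths.slice(0, count).reduce((sum, w) => sum + w, 0);
}
```

With three characters of width 10 each, a pointer at x = 14 selects one character and the highlight is 10 wide. At x = 15 the selection jumps to two characters and the highlight becomes 20 wide.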

Drawbacks

The main drawback I can see is that this will not work well at all for visitors who are nearly completely vision-impaired. If I added ARIA attributes then these would be easily scrape-able and defeat the purpose. Visitors with milder vision impairment could zoom the text—not ideal but it's something.

Another minor drawback is that if you zoom in on the text you can tell that it's rasterised as it becomes pixellated. You could mitigate this by drawing the text on a higher resolution canvas and shrinking it down if you wanted to. I do render the canvas text at 2x scale so it's compatible with high-res screens.
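
Rendering at 2x scale can be sketched as follows. This is the common pattern for high-resolution canvases rather than the exact code on this site, and `setupHiResCanvas` is a hypothetical name.

```javascript
// Give the canvas `scale` times as many backing pixels as CSS pixels, then
// scale the context so drawing code can keep using CSS pixel coordinates.
function setupHiResCanvas(canvas, cssWidth, cssHeight, scale = 2) {
  canvas.width = Math.round(cssWidth * scale);
  canvas.height = Math.round(cssHeight * scale);
  // The displayed size stays the same; only the pixel density changes.
  canvas.style.width = `${cssWidth}px`;
  canvas.style.height = `${cssHeight}px`;
  const ctx = canvas.getContext("2d");
  ctx.scale(scale, scale);
  return ctx;
}
```

Passing a larger `scale` (or `window.devicePixelRatio` multiplied by some factor) would trade memory for sharper text when zoomed in.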