The comments on this blog used to get quite a bit of spam. I tried to clean it out manually, but it eventually became enough of a hassle that I gave in and added Google reCAPTCHA to the comment form and contact page.
The spam stopped. I was forced to admit that, for all my dislike of reCAPTCHA and the extra hassle it required, it did its job pretty well.
I was wrong.
When you are asked to solve a reCAPTCHA challenge, Buster adds an icon at the bottom near the “Verify” button:
After you press this icon, Buster will switch the reCAPTCHA to an audio challenge, use an API to solve it, and fill in the answer for you, all within a couple of seconds. In my experience, it works nearly all of the time and has saved me a great deal of trouble.
The Case for reCAPTCHA
Despite this obvious shortcoming, reCAPTCHA successfully stopped nearly all spam, which initially surprised me. If solutions like Buster are free and open-source, can’t bots just use them to solve reCAPTCHAs when necessary?
Yes, they can. But Google knows that’s the wrong question. We should be asking whether solving reCAPTCHAs at scale is financially sustainable for spammers.
You see, reCAPTCHA doesn’t stop bots; it slows them down. By forcing bots to run slow and computationally difficult procedures, Google effectively raises the cost to keep these bots running. At a critical point, operating costs exceed any profit gained, and spammers turn their attention elsewhere.
In this fight between Google and spammers, Google wins.
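The break-even argument above can be written down as a back-of-the-envelope calculation. This is only a sketch: the function name and every number in it are made-up placeholders rather than measured costs, and real spam economics are certainly messier.

```python
def expected_profit_per_attempt(
    solve_cost: float, success_rate: float, revenue_per_success: float
) -> float:
    """Expected profit of one spam attempt: expected revenue minus the solving cost."""
    return success_rate * revenue_per_success - solve_cost


# All figures below are hypothetical and exist only to show the shape of the argument.
with_captcha = expected_profit_per_attempt(
    solve_cost=0.002, success_rate=0.9, revenue_per_success=0.001
)
without_captcha = expected_profit_per_attempt(
    solve_cost=0.0, success_rate=0.9, revenue_per_success=0.001
)

print("worth it with a CAPTCHA?   ", with_captcha > 0)
print("worth it without a CAPTCHA?", without_captcha > 0)
```

As long as the cost of solving one challenge is larger than what one successful spam comment earns on average, every attempt loses money, no matter how reliable the solver is.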
Every Silver Lining Has Its Cloud
However, Google’s approach to this problem is flawed in numerous ways.
- Sustainability. As time goes on, computing power gets cheaper, making cracking reCAPTCHA profitable again. To counteract this, Google has to raise the difficulty of the challenges, which hurts human users.
- Scale. You’ve heard of economies of scale, right? The idea is that bigger businesses can afford to buy raw materials in bulk, lowering the price they pay. Well, a similar principle works against reCAPTCHA. The more sites reCAPTCHA is on, the more profitable a working solution is to spammers. Even though battling reCAPTCHA is not currently profitable, spammers are incentivized to keep developing solutions in the hope of finding one that is. In response, Google, once again, has to raise the difficulty, hurting human users.
- Privacy. As we’ve established, Google has a conflict of interest when it comes to your privacy. Nearly all of Google’s revenue comes from exploiting user data for its own purposes. In the case of reCAPTCHA, Google uses the data to train its self-driving cars and to collect more data about the sites you visit. (Here’s why web privacy matters.) Even if reCAPTCHA is no longer a successful approach, Google will never admit that, since a loss of users means a loss of revenue.
A Possible Solution
You may have noticed that this site’s comments no longer use reCAPTCHA. Instead, I’ve implemented what I’ve seen called a honeytrap.
Look at the comment form. Here’s what you should see:
Pretty straightforward. This, however, isn’t what bots see! They detect an extra input:
While any person whose browser or screenreader decides to show the field can clearly see that they should not fill it out, bots are told to add an email address. Once the form is submitted, I automatically reject any comment with an email address provided in that field.
In one sense, this changes the challenge from “I’m not a robot” to “I am a robot,” tricking bots into taking action when none is required.
I’ve done my best to ensure this won’t hinder human readers. The field is only visually hidden and should still appear to screenreaders and bots (as verified with Lynx). To users of screenreaders, it should be clear to leave the field blank, and if they add something anyway, they’ll get an error message that lets them resubmit (without erasing their message).
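For readers who want to try something similar, here is a minimal sketch of the server-side half of a honeytrap in Python. It is not the code this site runs: the field names `confirm_email` and `message`, the `Result` type, and the wording of the error are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical field names; the article does not say what the real form uses.
HONEYTRAP_FIELD = "confirm_email"
MESSAGE_FIELD = "message"


@dataclass
class Result:
    accepted: bool
    error: str = ""
    message: str = ""  # echoed back so the form can be re-rendered with the text intact


def check_comment(form: dict) -> Result:
    """Reject any submission whose hidden honeytrap field has been filled in."""
    message = form.get(MESSAGE_FIELD, "")
    trap_value = form.get(HONEYTRAP_FIELD, "").strip()
    if trap_value:
        # A human who filled the field by mistake sees this error and can resubmit.
        return Result(False, "Please leave the extra email field blank and resubmit.", message)
    return Result(True, message=message)


print(check_comment({"message": "Nice post!", "confirm_email": ""}).accepted)
print(check_comment({"message": "Buy now", "confirm_email": "bot@example.com"}).accepted)
```

Returning the original message alongside the error is what would let a person who filled in the field by accident fix the mistake without retyping their comment.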
The form field is still in the tab order, so keyboard users will still land on it even though it is not visible. I don’t expect this to be a problem, for the following reasons:
- Just pressing “Enter,” as if tab had done the expected thing and highlighted the submit button, will still submit the form.
- If they do enter anything, the browser should not let the form submit without a valid email address.
If you know of any other potential accessibility concerns, please contact me, and I’ll attempt to resolve them.
It’s too early to make conclusive judgments, but I do think the honeytrap is working. Before I added reCAPTCHA, I got one or two spam messages per week. I have not gotten any in the two or three weeks since I launched the new system, though I believe my traffic has increased since the early days.
I’ll consider adding some sort of counter to see how many messages are rejected because of the honeytrap; that would help me draw better conclusions.
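Such a counter does not need to be elaborate. The sketch below keeps a simple in-memory tally; the names `outcomes` and `record_submission` are hypothetical, and a real site would probably write to a log or database instead so the numbers survive a restart.

```python
from collections import Counter

# Hypothetical in-memory tally of how submissions were handled.
outcomes = Counter()


def record_submission(trap_value: str) -> bool:
    """Tally the outcome and return True if the comment should be accepted."""
    rejected = bool(trap_value.strip())
    outcomes["rejected" if rejected else "accepted"] += 1
    return not rejected


record_submission("")                 # a human left the hidden field empty
record_submission("bot@example.com")  # a bot filled it in
print(dict(outcomes))
```

Comparing the two tallies over a few weeks would show whether the silence is because bots are being caught or because they never showed up.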
My honeytrap is not the only solution. Remember, the way reCAPTCHA stops bots is by raising the cost. There are many ways to do that, and the best is—unexpectedly—to lower standardization.
This might surprise some technology enthusiasts. Usually, we work toward greater standardization and compatibility between platforms. In this situation, though, having a standard means that programmers of bots have less work to do to be able to pass all challenges. Forcing them to come up with something new for every site they want to spam creates a prohibitive amount of work, making it actually more profitable to spam by hand.
reCAPTCHA is a target because it is so widely used. My honeytrap is not; even if someone comes up with a script for it, a minor tweak is all it takes to break their code.
So, here are some other ideas for CAPTCHAs:
- Simple math problems. Ideally, these should be phrased as word problems: “What is thirteen plus five?” Having multiple ways to ask the question will make it even harder for bots: “What number is five more than thirteen?” You’ll need to create a generator to make these, but it’s easier to generate them than for bots to solve them.
- Asking the user to perform an action, chosen randomly: “Type the word ‘blue’ below” or “Select the types of fruit that are yellow.”
- Asking users to create something verifiable: “Enter a five-letter word.”
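As an illustration of the first idea, here is a rough sketch of a word-problem generator in Python. The templates, the 0 to 20 number range, and the function names are arbitrary choices for this example, not a recommendation.

```python
import random

# Number words for 0-20; the range is an arbitrary choice for this sketch.
WORDS = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen", "twenty",
]

# Several phrasings of the same question, so answers cannot be keyed to one pattern.
TEMPLATES = [
    "What is {a} plus {b}?",
    "What number is {b} more than {a}?",
    "Add {b} to {a}. What do you get?",
]


def make_challenge(rng: random.Random) -> tuple:
    """Return a (question, expected_answer) pair with the numbers spelled out."""
    a = rng.randint(0, 20)
    b = rng.randint(0, 20)
    question = rng.choice(TEMPLATES).format(a=WORDS[a], b=WORDS[b])
    return question, a + b


def check_answer(expected: int, submitted: str) -> bool:
    """Accept the answer only if it is typed as digits and matches."""
    try:
        return int(submitted.strip()) == expected
    except ValueError:
        return False


question, expected = make_challenge(random.Random())
print(question)
```

The expected answer would need to stay on the server (in the visitor's session, for example) rather than in the page itself; otherwise a bot could simply read it back.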
reCAPTCHA should not be used:
- It doesn’t stop bots.
- Its real strategy, raising the cost of spamming, forces Google to constantly increase reCAPTCHA difficulty, frustrating humans.
- Having a Google service on a site allows Google to track users in a way they can’t opt out of without abandoning the site (or never leaving a comment).
- There are better strategies that are much easier for humans without being easy for bots.
- Having reCAPTCHA used so widely raises the incentive for spammers to learn to circumvent it. Destandardizing the CAPTCHA industry would, in contrast, enormously increase the cost of spam.
If you’re a website owner, remove reCAPTCHA from your site. Code something yourself instead; it’s not very hard. Just look at the list of examples earlier in this article for inspiration.
See you in my next article, where we’ll talk about Machine Learning (our current form of AI) and how it affects our digital experiences.
This article is part of a series on digital citizenship, the way we live in a technology-saturated world, which you might enjoy reading in order from the beginning.