Has reCaptcha been cracked / hacked / OCR'd / defeated / broken?

Security, Captcha, Ocr, Recaptcha

Security Problem Overview


Have any programmatic methods been used to defeat reCAPTCHA?

I'm interested in seeing evidence and potentially demonstrations that reCAPTCHA in particular has been made obsolete by completely automated, humanless methods.

To clarify, I'm not looking for reCAPTCHA-cheating solutions that involve humans in any way, whether teams tasked with filling out CAPTCHAs, porn-seekers, or Mechanical Turk.

I'm also not looking for alternatives to reCAPTCHA, like picking the type of animal, background fields, or JavaScript trickery.

Security Solutions


Solution 1 - Security

I notice that almost all the answers here relate to the ineffectiveness of the CAPTCHA concept in principle. I very much agree with them (in fact, I gave a talk at OWASP a few months ago explaining just that), but the question is very specific, so I will provide a demonstration.
But first, I will reiterate: demonstration aside, re-read the other comments, since it is true that CAPTCHA is pointless and unhelpful, irrespective of implementation.

But really, check out CAPTCHA Killer. You can upload a CAPTCHA image, and it will automatically, if not immediately, provide the OCR'd answer. It also provides an API (REST, I think, but maybe also SOAP). I personally tried numerous reCAPTCHA images, and they were actually some of the easiest (or at least quickest) to be broken.

UPDATE: CAPTCHA Killer's website is now taken down, apparently under legal pressure. See http://captcha.org/ for a complete overview of the topic.

And yeah, OCR is not the best way to break a CAPTCHA-protected site; there are many better ways.

Solution 2 - Security

You might be interested in this detailed report on how 4chan defeated reCAPTCHA, and used it to manipulate Time.com's annual TIME 100 Poll results.

> Hacking Recaptcha (aka ‘The Penis Flood’)
>
> The next tactic used was to see if they could find a flaw in the reCAPTCHA implementation. One thing they discovered about reCAPTCHA was that it always presents two words to a user for decoding - one word is a control word known by the reCAPTCHA system, while the other is an unknown word (reCAPTCHA uses the humans to help correct OCR errors). Wikipedia describes the process: “Scanned text is subjected to analysis by two different optical character recognition programs; in cases where the programs disagree, the questionable word is converted into a CAPTCHA. The word is displayed along with a control word already known and is labeled by the human. Those words that are consistently given a single label by human judges are recycled as control words”.
>
> What Anonymous realized was that if they always labeled the unknown scanned text with the same word - and if they did this thousands and thousands of times eventually a large percentage of the unknown words would be mislabeled with their word. All they had to do was look at the two words in the captcha, enter the proper label for the ‘easy’ one (presumably that would be the one that the two optical scanners would agree upon) and enter the word “penis” for the hard one. If they did this often enough, then soon a significant percentage of the images would be labeled as ‘penis’ and the ability to autovote would be restored (one side effect, that was not lost on Anonymous, was the notion that for years to come there would be a number of digital books with the word ‘penis’ randomly inserted throughout the text).
>
> Update: I asked Ben Maurer, chief engineer of reCAPTCHA about this ‘penis flood‘ attack, Ben says that they’ve anticipated this type of attack and they have numerous protections that will keep the penises from penetrating the reCAPTCHA barrier.
>
> Optimizing reCAPTCHA
>
> As appealing as the notion of sprinkling the word ‘penis’ into texts, the Anonymous team knew that the clock was ticking, and if they were going to restore the Message they didn’t have time to wait for the autovoters to come back online - they were going to have to vote manually, many, many times. And so they needed to be able to enter captcha’s as fast as they could. They developed a set of guidelines that allowed them to quickly decide which reCAPTCHA words they could skip. For example:
>
>> You will be given 2 words: 1 real, 1 fake.
>>
>> For [REAL FAKE] or [FAKE REAL], you can just type in REAL and it should be accepted.
>>
>> If it’s [LOOKSREAL LOOKSREAL] or [LOOKSFAKE LOOKSFAKE], it’s usually just quicker to just type in both words. Don’t waste precious time deciding which one of them is real.
>>
>> Use both the appearance and the type of word to identify a fake word. Don’t rely on just one of them.
>
> The whole ruleset is here: fake captcha.

Solution 3 - Security

The weakness of CAPTCHA systems is that people set up rooms full of people in China whose only job is to look at a CAPTCHA image and type in the result, which plugs into the automated system that's actually doing the spamming.

Not much you can do about that really.

It's also far cheaper than trying to do image recognition, OCR, etc. on the actual image (with human solvers you may get a response for under $0.01).

Solution 4 - Security

Before giving in to the pressure to use a CAPTCHA, consider creative workarounds such as a field labeled "Your Comments" that is hidden by CSS. If the field is filled in, the server drops the request. Most bots will fall for it. There is still no good way to defeat the room full of underpaid laborers, but CAPTCHA does not help with that anyway.
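The hidden-field check described above could look roughly like this on the server side. This is only a minimal sketch: the field name `your_comments` and the function names are made up for illustration, and a real form handler would get the submitted fields from whatever web framework is in use.

```python
from typing import Mapping

# Hypothetical name of the CSS-hidden field. Humans never see it,
# so a genuine submission should leave it empty.
HONEYPOT_FIELD = "your_comments"


def is_probable_bot(form: Mapping[str, str], honeypot_field: str = HONEYPOT_FIELD) -> bool:
    """Return True when the hidden honeypot field was filled in."""
    value = form.get(honeypot_field, "")
    return value.strip() != ""


def handle_submission(form: Mapping[str, str]) -> str:
    """Drop requests that trip the honeypot; accept everything else."""
    if is_probable_bot(form):
        return "dropped"
    return "accepted"
```

A bot that blindly fills every input ends up in the "dropped" branch, while a browser user who never sees the field passes through untouched.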

UPDATE: Just read a case study where removing CAPTCHA increased conversion rates by almost 10%. That would indicate to me that it is rather broken if you are losing 10% of your leads just to filter out bots. Imagine what 10% means to most businesses.

Solution 5 - Security

My favorite captcha is from Microsoft: http://research.microsoft.com/en-us/um/redmond/projects/asirra/

> Asirra (Animal Species Image Recognition for Restricting Access) is a HIP that works by asking users to identify photographs of cats and dogs. This task is difficult for computers, but our user studies have shown that people can accomplish it quickly and accurately. Many even think it's fun!

It is a free service and they have example code to get you started.

I wonder how long it will be before it is cracked.

Solution 6 - Security

reCAPTCHA isn't broken and it won't be for a very long time. The thing is, if you implement your own CAPTCHA and it gets broken, it will probably take a long time to fix it.

This is taken from the page about reCAPTCHA security:

> reCAPTCHA is a Web service. That means that all the images are generated and graded by our servers. (…) this also provides an extra level of protection: our CAPTCHAs can be automatically updated whenever a security vulnerability is found.
>
> For example, if somebody writes a program that can read our distorted images, we can add more distortions in very little time, and without Web masters having to change anything on their side.

I believe that, as they specialize in CAPTCHAs, they have improved versions stored, ready to be deployed at short notice if needed. (Why should they create stronger security when the weaker isn't broken yet?)

Solution 7 - Security

Not only has it been defeated, but a useful application has also been built on top of that, becoming the most amazing tool for defeating all kinds of free-account protections on a long list of direct download sites (not only megaupload and rapidshare).

JDownloader is open source and written in Java, so a peek at the source code can answer not only whether it is broken but also how.

Edit: Most direct download sites do not use reCAPTCHA, but a simpler CAPTCHA method (3 capital letters in different colors). Nonetheless, JDownloader and Cryptload (a program similar to JDownloader) are the only working implementations I know of that have effectively broken a CAPTCHA method. I have not heard of any implementation that cracks reCAPTCHA.

Update: It seems that at least one implementation of reCAPTCHA (not reCAPTCHA as a whole) has been cracked too.

Update Dec 2010: JDownloader seems at last to be defeating reCAPTCHA. The plugin is still experimental and works only on Windows versions of JDownloader but, as I have been told by a mate who tried it, it does work.

Solution 8 - Security

There was a talk at Defcon last year that went into the problems with CAPTCHAs in general. One of the things the speakers did was run multiple free OCR engines and have them vote on the best words. Doing this, they achieved a somewhat decent success rate. For one kind of CAPTCHA it was 40% or so, though I don't think it was reCAPTCHA.
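The voting idea can be sketched as a plain majority vote over the strings each engine returns. This is not the code from the talk, just an illustration under my own assumptions: the function name, the `min_votes` threshold and the case-insensitive comparison are all invented here, and the OCR engines themselves are left out (the function only sees their text output).

```python
from collections import Counter
from typing import Optional, Sequence


def vote_on_word(candidates: Sequence[str], min_votes: int = 2) -> Optional[str]:
    """Return the most common reading if at least min_votes engines agree, else None."""
    # Normalize so that "Morning" and "morning " count as the same reading.
    normalized = [c.strip().lower() for c in candidates if c.strip()]
    if not normalized:
        return None
    word, votes = Counter(normalized).most_common(1)[0]
    return word if votes >= min_votes else None
```

With three engines and `min_votes=2`, a word is only accepted when two of them agree; when all three disagree the function returns `None`, which is the case where a solver would have to give up or request a new image.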

Solution 9 - Security

  • "In fact, it [reCAPTCHA] became pretty useless on 4 January [2011] when spammers apparently got their collective hands on a piece of software that circumvents reCAPTCHA and allows for a fully automated registration process. The bots have been busy, very busy indeed, ever since" 1

Two or three years ago the text-typing approach to CAPTCHAs crossed the line at which it lost its battle: further complications only make the tests relatively easier for machines (since computing power is increasing while human ability is not) and more repugnant and repelling, if not completely impossible, for humans. This contradicts the original paradigm of CAPTCHA as a test to ensure that the response is not generated by a computer.

Update:
Note that reCAPTCHA is owned by Google Inc., but Google Inc. does not use it for its own services.
Here is the CAPTCHA used by Google itself, for example for Gmail registration:

[image: CAPTCHA used by Google for Gmail registration]

Note that Google's reCAPTCHA always has 2 words.
Here is the reCAPTCHA that Google offers for use by others:

[image: reCAPTCHA screenshot]

I leave it to the reader to draw the obvious conclusions.

Cited:
1. Davey Winder, "vBulletin forums hit by reCAPTCHA cracking spam bot", PC Pro blog, posted on January 12th, 2011.

Solution 10 - Security

I'm seeing blog comments on a system protected by reCAPTCHA where the page loads and the post is made successfully 1 second later. The User-Agent was nonsense (in this particular case it claimed to be running Ubuntu 9.25/Firefox 3.8), and the referrer was a completely unrelated site with no link to us.

This is clearly automated.
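For what it's worth, the two signals I used to reach that conclusion (time between page load and post, and the referrer) are easy to check in code. The sketch below is only an illustration: the function name, the 3-second threshold and the reason strings are assumptions of mine, not anything taken from the affected blog software.

```python
from typing import List
from urllib.parse import urlparse


def suspicion_reasons(
    page_served_at: float,
    form_posted_at: float,
    referrer: str,
    own_host: str,
    min_seconds: float = 3.0,  # assumed threshold; tune for the real form
) -> List[str]:
    """Collect reasons why a submission looks automated (empty list = nothing found)."""
    reasons = []
    # A human needs some time to read the page, type a comment and solve the CAPTCHA.
    if form_posted_at - page_served_at < min_seconds:
        reasons.append("submitted too quickly")
    # A present referrer that points at another host is suspicious; a missing one is ignored.
    referrer_host = urlparse(referrer).hostname
    if referrer_host is not None and referrer_host != own_host:
        reasons.append("referrer from another site")
    return reasons
```

Neither check proves anything on its own (both values can be faked by a careful bot), but together they flag exactly the kind of post described above.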

Solution 11 - Security

reCAPTCHA has not been defeated. If it had been, then why did Google just buy it and announce they will be applying the technology within Google to increase fraud and spam protection for Google products?

From "Google Acquires reCAPTCHA", posted to the Google Blog on 9/16/09:

> In this way, reCAPTCHA’s unique technology improves the process that converts scanned images into plain text, known as Optical Character Recognition (OCR). This technology also powers large scale text scanning projects like Google Books and Google News Archive Search. Having the text version of documents is important because plain text can be searched, easily rendered on mobile devices and displayed to visually impaired users. So we'll be applying the technology within Google not only to increase fraud and spam protection for Google products but also to improve our books and newspaper scanning process.

Solution 12 - Security

The easiest way to defeat Captchas is Amazon Mechanical Turk. There's a guy named Kermit Welda who pays people a nickel each to register Hotmail, AOL and Gmail accounts. That's 6,000 fake email accounts at 5 cents = $300 a day. The cost of doing business is pretty cheap when you have other people do the dirty work for you. No wonder our server's spam filters want to reject anything from Hotmail.

Solution 13 - Security

AFAIK, in practice there is no tool that cracks the reCAPTCHA implementation; however, I assume someone will eventually manage it.

Funnily enough, if someone does manage it, the whole reCAPTCHA project becomes pointless, because reCAPTCHA is designed to digitize books, which can't be done in an automated way.

BTW :

> The weakness of CAPTCHA systems is that people set up rooms full of people in China whose only job is to look at a CAPTCHA image and type in the result, which plugs into the automated system that's actually doing the spamming.

You can't secure a system by thinking like that. This is like saying "your web application is not secure enough if your host is not in an old military bunker, because now people can steal your machine".

Solution 14 - Security

There are lots of methods used to crack reCAPTCHA. While it's hard to use neural-network-enabled programs to solve them automatically, it's possible to grab the image and have Amazon's Mechanical Turk or some equivalent service solve them.

http://codemagician.wordpress.com/2010/01/22/solving-recaptcha/

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

| Content Type | Original Author | Original Content on Stackoverflow |
| --- | --- | --- |
| Question | Dave Rutledge | View Question on Stackoverflow |
| Solution 1 - Security | AviD | View Answer on Stackoverflow |
| Solution 2 - Security | Mathias Bynens | View Answer on Stackoverflow |
| Solution 3 - Security | cletus | View Answer on Stackoverflow |
| Solution 4 - Security | DavGarcia | View Answer on Stackoverflow |
| Solution 5 - Security | BoltBait | View Answer on Stackoverflow |
| Solution 6 - Security | Georg Schölly | View Answer on Stackoverflow |
| Solution 7 - Security | Fernando Miguélez | View Answer on Stackoverflow |
| Solution 8 - Security | FryGuy | View Answer on Stackoverflow |
| Solution 9 - Security | Gennady Vanin Геннадий Ванин | View Answer on Stackoverflow |
| Solution 10 - Security | Benjamin Franz | View Answer on Stackoverflow |
| Solution 11 - Security | Mike | View Answer on Stackoverflow |
| Solution 12 - Security | Dr. Klahn | View Answer on Stackoverflow |
| Solution 13 - Security | dr. evil | View Answer on Stackoverflow |
| Solution 14 - Security | redstick | View Answer on Stackoverflow |