How to Solve Simple Captchas using Python Tesseract


CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart. As the acronym suggests, it is a test used to determine whether the user is human or not. A typical captcha consists of distorted text, which a computer program cannot interpret but a human can (hopefully) still read.

This tutorial will show you how to bypass simple captchas using OCR in Python.

What is OCR?

Optical Character Recognition, or OCR, is the recognition of printed or written characters by a computer. It enables you to convert different types of documents, such as scanned paper documents, PDF files or images captured by a digital camera into editable and searchable data.

Popular open source OCR tools are Tesseract, GOCR and Ocrad. We will use Tesseract for this tutorial.


Tesseract is an open source OCR engine for various operating systems. It is considered one of the most accurate OCR engines currently available, although its accuracy depends on the clarity of the input image. Google has sponsored its development since 2006.


Python-Tesseract (PyTesseract) is a Python wrapper around the Tesseract-OCR engine that lets you extract the text in an image from Python. It can read the image types supported by the Python imaging library, such as png, jpeg, gif, tiff and bmp.

Using Tesseract to solve simple Captchas

Tesseract is designed to read regular printed text. If we want to use Tesseract effectively, we will need to modify the captcha images to remove the background noise, isolate the text and then pass it over to Tesseract to recognize the captcha.
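To make the idea of removing background noise concrete, here is a minimal sketch of one common cleanup step, thresholding, written in plain Python on a small grid of grayscale values. The function name `binarize`, the sample grid and the cutoff of 128 are illustrative assumptions, not part of the tutorial's script, which relies on resampling with ImageMagick instead.

```python
def binarize(pixels, threshold=128):
    """Map each grayscale value (0-255) to pure black (0) or pure white (255).

    The threshold of 128 is an assumed midpoint; real captchas usually
    need a value tuned to the image.
    """
    return [[255 if value >= threshold else 0 for value in row] for row in pixels]


# A tiny 3x4 "image": dark text pixels (low values) on a noisy light background.
noisy = [
    [230, 40, 210, 190],
    [35, 200, 50, 245],
    [220, 180, 30, 205],
]

clean = binarize(noisy)
for row in clean:
    print(row)
```

On a real image, the same effect can be had with the imaging library by converting to grayscale with `Image.convert("L")` and applying a cutoff with `Image.point(...)` before handing the result to Tesseract.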

Below are the requirements for this tutorial.

  • Python 3 ( https://www.python.org/downloads/ )
  • PIP, to install the Python packages below ( https://pip.pypa.io/en/stable/installing/ )
  • Tesseract-OCR. Installation instructions are available at https://github.com/tesseract-ocr/tesseract/wiki
  • PyTesseract, which requires the Python Imaging Library (PIL). To install it, use pip install pytesseract
  • Python Imaging Library (PIL), which adds image processing capabilities and support for many image file formats to your Python interpreter. It is installed through its maintained fork, Pillow, using pip install Pillow
  • ImageMagick tools, for processing and resampling images. Find the install instructions at https://www.imagemagick.org/script/download.php

    To know more about how to install PyTesseract with Tesseract, read here.
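Before running anything, it can help to confirm that the command-line tools from the list above are actually on your PATH. The following is a small sketch using only the standard library. The helper name `find_tools` is hypothetical, and the tool names checked are assumptions: `tesseract` for Tesseract-OCR, and both `magick` (the command shipped with ImageMagick 7) and `convert` (the older ImageMagick command name).

```python
import shutil


def find_tools(names=("tesseract", "magick", "convert"), which=shutil.which):
    """Return a dict mapping each tool name to its full path, or None if not on PATH."""
    return {name: which(name) for name in names}


if __name__ == "__main__":
    # Print one line per tool so missing installs are easy to spot.
    for name, path in find_tools().items():
        print(f"{name}: {path or 'not found on PATH'}")
```

Note that on Windows a `convert` command exists as a system utility unrelated to ImageMagick, so finding `convert` on PATH there does not by itself prove ImageMagick is installed.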

    PyTesseract works best when the captchas we are trying to interpret are not too distorted or noisy. For such simple captchas, we can use it to read the text and bypass the captcha.

    As an example we will use the following captcha image

    After resampling, the image will look like this:



    The script below resamples the captcha image and then reads the text in it.


    If the embed above does not work you can download the script from this link.
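For readers who cannot load the script, here is a minimal sketch of what a script of this kind might look like. It is not the original captcha_resolver.py; it is pieced together from the command line, the printed messages and the `check_output(['convert', path, '-resample', '600', path])` call that appear elsewhere on this page. The helper names `build_resample_command`, `parse_args` and `main`, and the `binary` and `dpi` parameters, are assumptions added for illustration.

```python
import argparse
from subprocess import check_output


def build_resample_command(path, binary="convert", dpi="600"):
    """Build the ImageMagick command that resamples the image in place."""
    return [binary, path, "-resample", dpi, path]


def resolve(path):
    """Resample the captcha image, run Tesseract on it and return the text."""
    # Imported here so the helpers above work even without the OCR packages.
    import pytesseract
    from PIL import Image

    print("Resampling the Image")
    check_output(build_resample_command(path))
    return pytesseract.image_to_string(Image.open(path))


def parse_args(argv=None):
    """Parse the single positional argument: the path to the captcha image."""
    parser = argparse.ArgumentParser(description="Read the text in a simple captcha image")
    parser.add_argument("path", help="path to the captcha image file")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    print("Resolving Captcha")
    captcha_text = resolve(args.path)
    print("Extracted Text", captcha_text)
```

To use the sketch as a command-line script, call `main()` at the bottom of the file under the usual `if __name__ == "__main__":` guard. If your ImageMagick install provides `magick` rather than `convert`, pass `binary="magick"` to `build_resample_command`. Under Python 3 the last line prints the label and the text separated by a space, without tuple parentheses.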

    The script is named captcha_resolver.py. To run it from a command prompt or terminal, type the script name followed by the name of the captcha image, as shown below.

    python captcha_resolver.py cap.jpg

    This gives the following output:

    Resolving Captcha
    Resampling the Image
    ('Extracted Text', 'Viearer')

    This code can resolve simple captchas with sufficient clarity like the one we have just shown above. Let us know in the comments how this script worked for you or if you have a better solution.


    Disclaimer: Any code provided in our tutorials is for illustration and learning purposes only. We are not responsible for how it is used and assume no liability for any detrimental usage of the source code. The mere presence of this code on our site does not imply that we encourage scraping or scrape the websites referenced in the code and accompanying tutorial. The tutorials only help illustrate the technique of programming web scrapers for popular internet websites. We are not obligated to provide any support for the code, however, if you add your questions in the comments section, we may periodically address them.

    Posted in:   Scraping Tips, Web Scraping Tutorials


      Vasista Reddy October 24, 2018

      Since all the pixels in the pic are either white or black, you can use “OPENCV” to find the contours and draw them on the blank image with black pixels and then follow the above steps


    Seb Tota January 3, 2019

    Hey guys I was wondering if you could help me out with this. I get an error:

    usage: captcha_resolver.py [-h] path
    captcha_resolver.py: error: too few arguments


      Seb Tota January 3, 2019

      Never mind. I have to learn to read directions more carefully. Thank you so much!


    Anryonga Daniel June 11, 2019

    Excuse me, when i ran following command, i can’t see recognized text!

    python ocr.py capture.png

    Resolving Captcha
    Resampling the Image
    ('Extracted Text', u'')

    help me!!!


    Amit Seth June 16, 2019

    Resolving Captcha
    Resampling the Image
    Invalid Parameter - -resample
    Traceback (most recent call last):
    File "captcha_resolver.py", line 24, in <module>
    captcha_text = resolve(path)
    File "captcha_resolver.py", line 14, in resolve
    check_output(['convert', path, '-resample', '600', path])
    File "C:\ProgramData\Anaconda3\lib\subprocess.py", line 395, in check_output
    File "C:\ProgramData\Anaconda3\lib\subprocess.py", line 487, in run
    output=stdout, stderr=stderr)
    subprocess.CalledProcessError: Command '['convert', 'cap.jpg', '-resample', '600', 'cap.jpg']' returned non-zero exit status 4.

    I am getting the above error ! plz assist!


      NickCool June 20, 2019

      Try changing 'convert' to 'magick'.


    Bharath Reddy July 30, 2019

    Download/Create a .png file which contains a simple arithmetic expression like “2 + 2”. Write a python script which can interpret the expression, evaluate it and return the calculated value.
    The above is my task i have almost zero knowledge about that can any one help me that?


    sudhir S September 30, 2019

    how to extract captcha from a website to fill


    achikam mor December 3, 2019

    please help me!!!
    I’m trying to run your code after I downloaded and installed everything you said.
    but I keep getting this error : “usage: temp2.py [-h] path
    temp2.py: error: the following arguments are required: path”
    even when I change the “help” in line 17 to the location of your example photo in my computer or to its link by inspecting this page html code.
    what should I do to make it run?? what should I write in add_argument() in the ‘help’ part?
    thanks in advance.


      Cesar May 8, 2020

      Are you running this from your terminal?
      Not sure about windows but to run it on Mac would be the following:

      Open terminal.
      Go to the directory of your program; image should be in the same directory. (For example, I have my program and image on my desktop directory.)
      python3 program_name.py image_name.png (or whatever file extension for your image).


    Mat December 5, 2019

    Resolving Captcha
    Resampling the Image
    Traceback (most recent call last):
    File "captcha_solver.py", line 33, in <module>
    captcha_text = resolve(path)
    File "captcha_solver.py", line 24, in resolve
    check_output(['magick', path, '-resample', '600', path])
    File "C:\Users\tarkan\AppData\Local\Programs\Python\Python37\lib\subprocess.py", line 395, in check_output
    File "C:\Users\tarkan\AppData\Local\Programs\Python\Python37\lib\subprocess.py", line 472, in run
    with Popen(*popenargs, **kwargs) as process:
    File "C:\Users\tarkan\AppData\Local\Programs\Python\Python37\lib\subprocess.py", line 775, in __init__
    restore_signals, start_new_session)
    File "C:\Users\tarkan\AppData\Local\Programs\Python\Python37\lib\subprocess.py", line 1178, in _execute_child
    FileNotFoundError: [WinError 2] The system cannot find the file specified


    DJO December 19, 2019

    Hello, can you help please? I am trying to automate my test using selenium python to Login. After entering the email address, and clicking the Next button, rather than bring up the password field, the system encounters a captcha which makes the test fail. How can I automate the Login to bypass the captcha please?

