How To Scrape Amazon Product Details and Pricing using Python and SelectorLib

In this tutorial, we will build an Amazon scraper for extracting product details and pricing. We will build this simple web scraper using Python and SelectorLib and run it in a console. But before we start, let’s look at what you can use it for.

How to use Amazon Product Data

  1. Scrape Product Details that you can’t get with the Product Advertising API
    Amazon provides a Product Advertising API, but like most other APIs, it doesn’t provide all the information that Amazon shows on a product page. A scraper can help you extract all the details displayed on the product page.
  2. Monitor products for change in Price, Stock Count/Availability, Rating, etc.
    By using a web scraper, you can update your data feeds on a timely basis to monitor any product changes. These data feeds can help you form pricing strategies by looking at your competition – other sellers or brands.
  3. Analyze how a particular Brand sells on Amazon
    If you’re a retailer, you can monitor your competitor’s products and see how well they do in the market and make adjustments to reprice and sell your products. You could also use it to monitor your distribution channel to identify how your products are sold on Amazon by sellers, and if it is causing you any harm.
  4. Find Customer Opinions from Amazon Product Reviews
    Reviews offer abundant amounts of information. If you’re targeting an established set of sellers who have been selling reasonable volumes, you can extract the reviews of their products to find what you should avoid and what you could quickly improve on while trying to sell similar products on Amazon.

Or anything else – the possibilities are endless and only bound by your imagination.

What data are we extracting from Amazon?

This tutorial is limited to extracting the data points below from a product page:

  1. Product Name
  2. Category
  3. Original Price
  4. Sale Price
  5. Availability

We’ll build a scraper in Python that can extract details of any product URL from Amazon.

If you don't like or want to code, ScrapeHero Cloud is just right for you!

Skip the hassle of installing software, programming, and maintaining the code. Download this data using ScrapeHero Cloud within seconds.


Setting up your computer for web scraper development

We will use Python 3 for this tutorial. The code will not run if you are using Python 2.7. To start, you need a computer with Python 3 and pip installed.

Most UNIX-like operating systems, such as Linux and Mac OS, come with Python pre-installed. But not all Linux distributions ship with Python 3 by default.

Let’s check your Python version. Open a terminal (on Linux and Mac OS) or Command Prompt (on Windows), type

python --version

or

python -V

and press Enter. If the output looks something like Python 3.x.x, you have Python 3 installed. If it says Python 2.x.x, you have Python 2. If it prints an error, you probably don’t have Python installed.

If you don’t have Python 3, install it first.


Install Python 3 and Pip

Here is a guide to install Python 3 in Linux –

Mac users can follow this guide –

Windows Users go here –

Install Packages

We will need two Python packages: requests (to download the product pages) and selectorlib (to extract data from them). Install both with pip.
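The code in this tutorial imports only two third-party packages, requests and selectorlib, which can be installed in one command:

```shell
pip3 install requests selectorlib
```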

What is SelectorLib?

Selectorlib is a combination of two packages:

  • A Chrome extension that lets you mark up data on websites and export a YAML file with it.
  • A Python library that reads this YAML file and extracts the data you marked up on the page.

Instead of inspecting HTML elements and writing XPath for each data point, we are going to use the Python package SelectorLib and its companion chrome extension.

Creating YAML File

We will create the YAML file to extract the product details from a product page.

Downloading and Installing SelectorLib

After downloading the SelectorLib extension, open the Chrome browser and go to the product link you need to markup and extract data from. Right-click anywhere on the page, go to ‘Inspect’ and the Developer Tools Console will pop up. Click on ‘Selectorlib’.

Creating a Template and Adding Elements

Select ‘Create Template’ and give the template a name. We have named the template amazon.

Next, we will add the product details one by one. Select a type and enter the selector name for an element. On the selector input, click on ‘Select Element’ and hover over the page. Click when the element you need is highlighted in green and press ‘Save’. The GIF below shows how to add elements.

Highlight and Preview Data

Once you have created the template, click on ‘Highlight’ to highlight and preview all of your selectors. Finally, click on ‘Export’ and download the YAML file.

If you would like to learn more about the extension and how to markup the data –

This is how the YAML file (selectors.yml, which the code below reads) will look, with each selector named after the field it extracts:

    category:
        css: 'div.a-subheader ul.a-unordered-list'
        type: Text
    originalprice:
        css: span.priceBlockStrikePriceString
        type: Text
    name:
        css: span.a-size-large
        type: Text
    saleprice:
        css: 'td.a-span12 span.a-size-medium'
        type: Text
    availability:
        css: 'div.feature div.a-section div.feature div.a-section div.a-section span.a-size-medium'
        type: Text

The Code

from selectorlib import Extractor
import requests 
import json 
import argparse

argparser = argparse.ArgumentParser()
argparser.add_argument('url', help='Amazon Product Details URL')

# Create an Extractor by reading from the YAML file
e = Extractor.from_yaml_file('selectors.yml')

user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36 Edge/12.246'
headers = {'User-Agent': user_agent}

# Download the page using requests
args = argparser.parse_args()
r = requests.get(args.url, headers=headers)
# Pass the HTML of the page to the extractor and get back a dictionary of data
data = e.extract(r.text)
# Print the data 
print(json.dumps(data, indent=True))

Running the Scraper

Execute the full code by typing the script name followed by -h in the Command Prompt or terminal:

usage: [-h] url 

positional arguments:
 url Amazon Product Details URL 

optional arguments:
 -h, --help show this help message and exit

Here is the command to extract the product details of a URL (substitute the name you saved the script as and the product URL you want to scrape):

    python3 <script name>.py <Amazon product URL>
The script prints the extracted data as JSON to the console:

    {
        "category": "Clothing, Shoes & Jewelry › Girls › Accessories › Sunglasses",
        "originalprice": "$13.88",
        "name": "YAMAZI Kids Polarized Sunglasses Sports Fashion For Boys Girls Toddler Baby And Children",
        "saleprice": "$10.99",
        "availability": "In Stock."
    }

What to do if you run into captchas (Blocked) while scraping

We are adding this extra section to discuss some methods you can use to avoid getting blocked while scraping Amazon. Amazon is very likely to flag you as a “BOT” if you start scraping hundreds of pages using the code above. The easy answer is to NOT get flagged as a bot. Okay, how do we do that?

Mimic human behavior as much as possible.

While we cannot guarantee that you will not be blocked, we can share some tips and tricks on how to avoid Captchas.

1. Use proxies and rotate them

Let us say we are scraping hundreds of products from a laptop, which usually has just one IP address. Amazon would know we are a bot in no time, as no human would ever visit hundreds of product pages in a minute. To look more like a human, make your requests through a pool of IP addresses or proxies. The rule of thumb here is to have 1 proxy or IP address make no more than 5 requests to Amazon in a minute. If you are scraping about 100 pages per minute, you need about 100/5 = 20 proxies. You can read more about rotating proxies here
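The rotation described above can be sketched in a few lines with the requests library; the proxy addresses below are placeholders for your own working pool:

```python
import random

# Placeholder proxy addresses -- replace with your own working proxy pool
PROXY_POOL = [
    'http://192.0.2.10:3128',
    'http://192.0.2.11:3128',
    'http://192.0.2.12:3128',
]

def choose_proxy(pool):
    """Pick a random proxy from the pool and format it the way requests expects."""
    proxy = random.choice(pool)
    return {'http': proxy, 'https': proxy}

# Usage (a real network call, shown for illustration only):
# r = requests.get(url, headers=headers, proxies=choose_proxy(PROXY_POOL))
```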

2. Specify the User Agents of latest browsers and rotate them

If you look at the code above, you will see a line where we set the User-Agent string for the request we are making.

 'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36'

Just like proxies, it is always good to have a pool of User-Agent strings. Make sure you are using user-agent strings of the latest and most popular browsers, and rotate the strings for each request you make to Amazon. You can learn more about rotating user agent strings in Python here. It is also a good idea to pair each User-Agent with a particular IP address, so that your traffic looks more human than bot.
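Such a pool can be sketched as below, seeded with the two User-Agent strings that appear in this tutorial (in practice you would use many more, and keep them current):

```python
import random

# User-Agent strings taken from this tutorial -- keep a larger, up-to-date pool in practice
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36 Edge/12.246',
    'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36',
]

def random_headers():
    """Build request headers with a randomly chosen User-Agent string."""
    return {'User-Agent': random.choice(USER_AGENTS)}
```

Pass `random_headers()` as the `headers` argument of each `requests.get` call instead of the fixed dictionary used earlier.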

3. Reduce the number of ASINs you scrape per minute

You can try slowing down the scrape a bit to give Amazon fewer chances of flagging you as a bot. You don’t have to be too slow: about 5 requests per IP per minute isn’t much throttling. If you need to go faster, add more proxies. You can modify the speed by increasing or decreasing the delay in the sleep call in the retry snippet below.

        # Retrying for failed requests
        for i in range(20):
            # Generating random delays
            time.sleep(random.randint(1, 3))
            # Adding verify=False to avoid SSL related issues
            response = requests.get(url, headers=headers, verify=False)
            if response.status_code == 200:
                break

4. Retry, Retry, Retry

When you are blocked by Amazon, make sure you retry that request. If you look at the code block above, we have added 20 retries. Our code retries immediately after a scrape fails; you could do an even better job here by creating a retry queue using a list, and retrying those URLs after all the other products have been scraped from Amazon.
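The retry-queue idea can be sketched like this; `scrape` stands for any function of your own (for example, the extractor above wrapped around requests) that returns None on failure:

```python
def scrape_all(urls, scrape):
    """Scrape every URL once, queueing failures and retrying them
    only after the main pass is finished."""
    results = {}
    retry_queue = []
    for url in urls:
        data = scrape(url)
        if data is None:
            retry_queue.append(url)  # failed -- try again later
        else:
            results[url] = data
    # Retry the failed URLs after everything else has been scraped
    for url in retry_queue:
        data = scrape(url)
        if data is not None:
            results[url] = data
    return results
```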

How to scrape Amazon on a large scale

This code should work for small-scale scraping and hobby projects and get you started on your road to building bigger and better scrapers. However, if you do want to scrape Amazon for thousands of pages at short intervals here are some important things to keep in mind:

1. Use a Web Scraping Framework like PySpider or Scrapy

When you’re crawling a massive site like Amazon, you need to spend some time figuring out how to run your entire crawl smoothly. Choose an open-source framework for building your scraper, like Scrapy or PySpider, which are both written in Python. These frameworks have pretty active communities and can handle a lot of the errors that happen while scraping without disturbing the entire scraper. Most of them also let you use multiple threads to speed up scraping if you are using a single computer. Scrapy can be deployed to your own servers using ScrapyD.

2. If you need speed, Distribute and Scale-Up using a Cloud Provider

There is a limit to the number of pages you can scrape using a single computer. If you are going to scrape Amazon on a large scale (millions of product pages a day), you need a lot of servers to get the data within a reasonable time. Consider hosting your scraper in the cloud and using a scalable version of the framework, like Scrapy-Redis. For broader crawls, use a message broker like Redis, RabbitMQ, or Kafka, so that you can run multiple spider instances to speed up the crawl.

3. Use a scheduler if you need to run the scraper periodically

If you are using a scraper to get updated prices or stock counts of products, you need to update your data frequently to keep track of the changes. If you are using the script in this tutorial, use CRON (in UNIX) or Task Scheduler in Windows to schedule it. If you are using Scrapy, scrapyd+cron can help schedule your spiders so you can refresh the data promptly.
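For instance, a crontab entry like the one below would run the script every hour on the hour (the directory, script name, and output file are placeholders, not part of this tutorial):

```
0 * * * * cd /path/to/scraper && python3 <script name>.py <Amazon product URL> >> output.log
```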

4. Use a database to store the Scraped Data from Amazon

If you are scraping a large number of products from Amazon, writing data to a file would soon become inconvenient. Retrieving data becomes tough, and you might even end up with gibberish inside the file when multiple processes write to it at once. Using a database is recommended even if you are scraping from a single computer. MySQL will be just fine for moderate workloads, and you can run simple analytics on the scraped data with tools like Tableau, PowerBI, or Metabase by connecting them to your database. For larger write loads, you can look into NoSQL databases like MongoDB, Cassandra, etc.
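As an illustration of the idea, here is a sketch using Python’s built-in sqlite3 module in place of MySQL; the table and column names are assumptions that mirror the fields scraped in this tutorial:

```python
import sqlite3

def save_product(conn, item):
    """Insert one scraped product record into a products table."""
    conn.execute(
        'CREATE TABLE IF NOT EXISTS products ('
        'name TEXT, category TEXT, originalprice TEXT, '
        'saleprice TEXT, availability TEXT)')
    conn.execute(
        'INSERT INTO products VALUES (?, ?, ?, ?, ?)',
        (item.get('name'), item.get('category'), item.get('originalprice'),
         item.get('saleprice'), item.get('availability')))
    conn.commit()
```

Call it with `sqlite3.connect('products.db')`; moving to MySQL later mostly means swapping the connection object and placeholder style.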

5. Use Request Headers, Proxies, and IP Rotation to prevent getting Captchas from Amazon

Amazon has a lot of anti-scraping measures. If you scrape too aggressively, you’ll be blocked in no time and you’ll start seeing captchas instead of product pages. To prevent that to a certain extent, change your headers as you go through each Amazon product page by rotating the User-Agent value, so that requests look like they’re coming from a browser and not a script.
If you’re going to crawl Amazon at a very large scale, use proxies and IP rotation to reduce the number of captchas you get. You can learn more techniques to prevent getting blocked by Amazon and other sites here – How to prevent getting blacklisted while scraping. You can also use Python to solve some basic captchas with an OCR engine called Tesseract.

6. Write some simple data quality tests

Scraped data is always messy. An XPath that works for one page might not work for another variation of the same page on the same site. Amazon has LOTS of product page layouts. If you spend an hour writing some basic sanity checks for your data, like verifying that the price is a decimal and the title is a string shorter than, say, 250 characters, you’ll know when your scraper breaks, and you’ll be able to minimize its impact. This is a must if you feed the scraped Amazon data into a price optimization program.
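A sanity check of that sort might look like the sketch below; the field names follow the output of this tutorial’s scraper, and the 250-character threshold is the example limit mentioned above:

```python
def looks_valid(item):
    """Basic sanity checks on one scraped product record:
    the title must be a non-empty string under 250 characters,
    and the sale price must parse to a positive decimal."""
    name = item.get('name')
    if not isinstance(name, str) or not 0 < len(name) < 250:
        return False
    try:
        price = float(item.get('saleprice', '').replace('$', '').replace(',', ''))
    except ValueError:
        return False
    return price > 0
```

Run such checks on every record before loading it into your database, and alert yourself when the failure rate spikes: that is usually the first sign a page layout changed.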

We hope this tutorial gave you a better idea of how to scrape Amazon or similar e-commerce websites. As a company, we understand e-commerce data, having worked with it before. If you are interested in professional help with scraping complex websites, let us know, and we will be glad to help.

Need some help with scraping eCommerce data?

Turn the Internet into meaningful, structured and usable data

Please DO NOT contact us for any help with our tutorials and code using this form or by calling us; instead, please add a comment at the bottom of the tutorial page for help.

Disclaimer: Any code provided in our tutorials is for illustration and learning purposes only. We are not responsible for how it is used and assume no liability for any detrimental usage of the source code. The mere presence of this code on our site does not imply that we encourage scraping or scrape the websites referenced in the code and accompanying tutorial. The tutorials only help illustrate the technique of programming web scrapers for popular internet websites. We are not obligated to provide any support for the code, however, if you add your questions in the comments section, we may periodically address them.


shree May 14, 2019

How to scrape the feedback from consumer?
Thanks in advance


Bharat Bhushan June 25, 2019

@ ScrapeHero
Can you please give some idea like how to crawl data from amazon for a specific city ?


Tiana August 15, 2019

I am getting this error:
  File “…”, line 72, in <module>
    ReadAsin()
  File “…”, line 67, in ReadAsin
PermissionError: [Errno 13] Permission denied: ‘data.json’


    ScrapeHero August 16, 2019

    Looks like the output file cannot be written due to lack of permissions.
    Please google for such generic python errors.


jan August 19, 2019

Is there any way to scrape the Asin automatically? I mean, I want to scrapy over 1000+ products and I don’t want to make a list with that much Asin numbers.

