How To Scrape Amazon Product Details and Pricing using Python


In this tutorial, we will build an Amazon scraper for extracting product details and pricing. We will build this simple web scraper using Python and LXML and run it in a console. But before we start, let's look at what you can use it for.

What can you use an Amazon scraper for?

  1. Scrape Product Details that you can’t get with the Product Advertising API
    Amazon provides a Product Advertising API, but like most other APIs, it doesn't provide all the information that Amazon shows on a product page. A scraper can help you extract all the details displayed there.
  2. Monitor products for change in Price, Stock Count/Availability, Rating, etc.
    By using a web scraper, you can update your data feeds on a timely basis to monitor any product changes. These data feeds can help you form pricing strategies by looking at your competition – other sellers or brands.
  3. Analyze how a particular Brand sells on Amazon
    If you’re a retailer, you can monitor your competitor’s products and see how well they do in the market and make adjustments to reprice and sell your products. You could also use it to monitor your distribution channel to identify how your products are sold on Amazon by sellers, and if it is causing you any harm.
  4. Find Customer Opinions from Amazon Product Reviews
    Reviews offer abundant amounts of information. If you’re targeting an established set of sellers who have been selling reasonable volumes, you can extract the reviews of their products to find what you should avoid and what you could quickly improve on while trying to sell similar products on Amazon.

Or anything else – the possibilities are endless and only bound by your imagination.

What data are we extracting from Amazon?

This tutorial is limited to extracting the data points below from a product page:

  1. Product Name
  2. Category
  3. Original Price
  4. Sale Price
  5. Availability
  6. URL

We’ll build a scraper in Python that can go to any Amazon product page using an ASIN – a unique ID Amazon uses to keep track of products in its database.

First, let’s identify a product ASIN.

For example, in this product – Imploding Kittens – the ASIN is B01HSIIFQ2.

Gather the ASINs for the products you need data from.

The next step is to build a script that goes to each one of those product pages, downloads its HTML and extracts the fields you need – e.g., Product Title, Price, Description, etc.

XPaths are used to tell the script where each field we need is present in the HTML. An XPath is one of the few ways in which you can select content from a big blob of XML or HTML (properly structured HTML is organized much like an XML document). An XPath tells you the location of an element, just like a catalog card does for books. We'll find XPaths for each of the fields we need and put them into our scraper.
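Here is a minimal sketch of how XPath selection works with LXML (the HTML snippet is made up purely for illustration):

    from lxml import html

    # A tiny, made-up HTML snippet to demonstrate XPath selection
    snippet = '<html><body><h1 id="title">Imploding Kittens</h1></body></html>'
    doc = html.fromstring(snippet)

    # Select all text nodes under the element whose id is "title"
    print(doc.xpath('//h1[@id="title"]//text()'))  # ['Imploding Kittens']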

Once we extract this information, we’ll save it into a JSON file.

Since we already have the list of products, let’s get started.

Skip all this code and get this scraper as a ready-to-integrate API (500 API calls per month for free) by signing up for our preview.

What tools do we need?

For this tutorial, we will stick to using Python and a couple of Python packages for downloading and parsing the HTML. Below are the package requirements:

  • Python 2.7
  • Python PIP, to install the packages below
  • Python Requests – Requests lets you send HTTP requests without adding query strings to your URLs manually. It's an easy-to-use library with a lot of features, ranging from passing parameters in URLs to sending custom headers and SSL verification.
  • Python LXML – for parsing the downloaded HTML using XPaths

If you have PIP, installing Requests and LXML is as easy as running the line below in a Python-enabled terminal:
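    pip install requests lxml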

The Amazon Scraper

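Here is a minimal sketch of the full scraper, assuming Python 2.7 (it also runs under Python 3) with Requests and LXML installed. The XPaths match Amazon's product page layout at the time of writing and may need updating when the layout changes:

    from lxml import html
    import json
    import requests
    from time import sleep


    def AmzonParser(url):
        # Pretend to be a browser – Amazon blocks the default
        # python-requests UserAgent almost immediately.
        headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36'}
        page = requests.get(url, headers=headers)

        # A non-200 response usually means a captcha page
        if page.status_code != 200:
            raise ValueError('captcha')

        doc = html.fromstring(page.content)

        # XPaths locating each field on the product page
        XPATH_NAME = '//h1[@id="title"]//text()'
        XPATH_SALE_PRICE = '//span[contains(@id,"ourprice") or contains(@id,"saleprice")]/text()'
        XPATH_ORIGINAL_PRICE = '//td[contains(text(),"List Price") or contains(text(),"M.R.P") or contains(text(),"Price")]/following-sibling::td/text()'
        XPATH_CATEGORY = '//a[@class="a-link-normal a-color-tertiary"]//text()'
        XPATH_AVAILABILITY = '//div[@id="availability"]//text()'

        RAW_NAME = doc.xpath(XPATH_NAME)
        RAW_SALE_PRICE = doc.xpath(XPATH_SALE_PRICE)
        RAW_ORIGINAL_PRICE = doc.xpath(XPATH_ORIGINAL_PRICE)
        RAW_CATEGORY = doc.xpath(XPATH_CATEGORY)
        RAW_AVAILABILITY = doc.xpath(XPATH_AVAILABILITY)

        # Join the raw text nodes and normalize whitespace
        NAME = ' '.join(''.join(RAW_NAME).split()) if RAW_NAME else None
        SALE_PRICE = ' '.join(''.join(RAW_SALE_PRICE).split()).strip() if RAW_SALE_PRICE else None
        ORIGINAL_PRICE = ''.join(RAW_ORIGINAL_PRICE).strip() if RAW_ORIGINAL_PRICE else None
        CATEGORY = ' > '.join([i.strip() for i in RAW_CATEGORY]) if RAW_CATEGORY else None
        AVAILABILITY = ''.join(RAW_AVAILABILITY).strip() if RAW_AVAILABILITY else None

        # If there is no separate list price, fall back to the sale price
        if not ORIGINAL_PRICE:
            ORIGINAL_PRICE = SALE_PRICE

        return {
            'NAME': NAME,
            'SALE_PRICE': SALE_PRICE,
            'CATEGORY': CATEGORY,
            'ORIGINAL_PRICE': ORIGINAL_PRICE,
            'AVAILABILITY': AVAILABILITY,
            'URL': url,
        }


    def ReadAsin():
        # Replace this list with your own ASINs
        AsinList = ['B01HSIIFQ2']
        extracted_data = []
        for asin in AsinList:
            url = "http://www.amazon.com/dp/" + asin
            print("Processing: " + url)
            extracted_data.append(AmzonParser(url))
            sleep(5)  # pause between requests so we don't hammer the server
        with open('data.json', 'w') as f:
            json.dump(extracted_data, f, indent=4)


    if __name__ == "__main__":
        ReadAsin()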
If the embed above doesn’t work, you can download the code directly from here.

Modify the code shown below with a list of your own ASINs.

Assuming you saved the script as, say, amazon_scraper.py, type the script name in the command prompt or terminal like this:
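    python amazon_scraper.py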


This will create a JSON output file called data.json with the data collected for the list of ASINs present in the AsinList.

The JSON output for a couple of ASINs will look similar to this:
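Each ASIN becomes one object in a list; the placeholders below stand in for the real values:

    [
        {
            "NAME": "<product title>",
            "SALE_PRICE": "<sale price>",
            "CATEGORY": "<category path, e.g. Toys & Games > Games>",
            "ORIGINAL_PRICE": "<list price>",
            "AVAILABILITY": "<e.g. In Stock.>",
            "URL": "http://www.amazon.com/dp/<ASIN>"
        },
        ...
    ]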

You can also extract reviews from product pages. Head over to this new blog post to learn how.


6 things to keep in mind when scraping Amazon on a larger scale

Usually, there is a limit on large websites – Amazon, for example, only lets you go through 400 pages per category. This should work for small-scale scraping and hobby projects and get you started on your road to building bigger and better scrapers. However, if you do want to scrape Amazon for thousands of pages at short intervals, there are some important things you should be aware of:

1. Use a Web Scraping Framework like PySpider or Scrapy

When you're crawling a massive site like Amazon.com, you need to spend some time figuring out how to run your entire crawl smoothly. Choose an open-source framework for building your scraper, like Scrapy or PySpider, which are both written in Python. These frameworks have pretty active communities and can take care of handling a lot of the errors that happen while scraping without disturbing the entire scraper. Most of them also let you use multiple threads to speed up scraping if you are using a single computer. Scrapy can be deployed to your own servers using ScrapyD. A sketch of what the same product scrape looks like as a Scrapy spider is shown below.
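This is only a sketch, assuming a recent version of Scrapy is installed; the spider name and output fields are illustrative, and the XPath is the same title XPath used above:

    import scrapy

    class AmazonProductSpider(scrapy.Spider):
        # Illustrative spider name and ASIN list
        name = "amazon_products"
        asins = ["B01HSIIFQ2"]

        def start_requests(self):
            for asin in self.asins:
                yield scrapy.Request("http://www.amazon.com/dp/" + asin)

        def parse(self, response):
            # Extract the product title with the same XPath as before
            raw_name = response.xpath('//h1[@id="title"]//text()').getall()
            yield {
                "NAME": ' '.join(''.join(raw_name).split()) or None,
                "URL": response.url,
            }

You could run this with, for example, scrapy runspider amazon_spider.py -o products.json.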

2. If you need speed, Distribute and Scale Up using a Cloud Provider

There is a limit to the number of pages you can scrape when using a single computer. If you are going to scrape Amazon on a large scale (millions of product pages a day), you need a lot of servers to get the data within a reasonable time. You could consider hosting your scraper in the cloud and using a scalable version of the framework, like Scrapy-Redis. For a broader crawl, you can use a message broker like Redis, RabbitMQ, Kafka, etc., so that you can run multiple spider instances to speed up the crawl.

3. Use a scheduler if you need to run the scraper periodically

If you are using a scraper to get updated prices or stock counts of products, you need to update your data frequently to keep track of the changes. If you are using the script in this tutorial, use CRON (in UNIX) or Task Scheduler in Windows to schedule it. If you are using Scrapy, scrapyd+cron can help schedule your spiders so you can refresh the data promptly.
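For example, a crontab entry like the one below would run the script every six hours (the path is a placeholder for wherever you saved the script):

    0 */6 * * * python /path/to/amazon_scraper.py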

4. Use a database to store the Scraped Data from Amazon

If you are scraping a large number of products from Amazon, writing data to a file would soon become inconvenient. Retrieving data becomes tough, and you might even end up with gibberish inside the file when multiple processes write to it. Using a database is recommended even if you are scraping from a single computer. MySQL will be just fine for moderate workloads, and you can run simple analytics on the scraped data with tools like Tableau, PowerBI or Metabase by connecting them to your database. For larger write loads you can look into some of the NoSQL databases like MongoDB, Cassandra, etc.
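As a minimal sketch of the idea (using Python's built-in sqlite3 module for brevity – the same pattern applies to MySQL – with illustrative table and column names):

    import sqlite3

    conn = sqlite3.connect('amazon_products.db')
    conn.execute("""CREATE TABLE IF NOT EXISTS products (
                        asin TEXT PRIMARY KEY,
                        name TEXT,
                        sale_price TEXT,
                        scraped_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP)""")

    def save_product(asin, data):
        # Upsert so repeated scrapes refresh the row instead of duplicating it
        conn.execute("INSERT OR REPLACE INTO products (asin, name, sale_price) VALUES (?, ?, ?)",
                     (asin, data['NAME'], data['SALE_PRICE']))
        conn.commit()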

5. Use Request Headers, Proxies, and IP Rotation to prevent getting Captchas from Amazon

Amazon has a lot of anti-scraping measures. If you hit Amazon too aggressively, you'll be blocked in no time and you'll start seeing captchas instead of product pages. To prevent that to a certain extent, while going through each Amazon product page, it's better to change your headers by replacing your UserAgent value to make requests look like they're coming from a browser and not a script.
If you're going to crawl Amazon at a very large scale, use proxies and IP rotation to reduce the number of captchas you get. You can learn more techniques to prevent getting blocked by Amazon and other sites here – How to prevent getting blacklisted while scraping. You can also use Python to solve some basic captchas using an OCR engine called Tesseract. A sketch of rotating UserAgents per request follows below.
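A minimal sketch of UserAgent rotation with Requests (the UserAgent strings are just examples – use current, real browser strings in practice – and the proxy address is hypothetical):

    import random
    import requests

    # A small pool of browser UserAgent strings to rotate through
    USER_AGENTS = [
        'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36',
        'Mozilla/5.0 (Windows NT 6.3; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0',
    ]

    def get_page(url, proxies=None):
        # Pick a random UserAgent per request; optionally route through a proxy
        headers = {'User-Agent': random.choice(USER_AGENTS)}
        return requests.get(url, headers=headers, proxies=proxies)

    # e.g. get_page(url, proxies={'http': 'http://10.10.1.10:3128'})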

6. Write some simple data quality tests

Scraped data is always messy. An XPath that works for a page might not work for another variation of the same page on the same site. Amazon has LOTS of product page layouts. If you spend an hour writing some basic sanity checks for your data – like verifying that the price is a decimal and the title is a string of less than, say, 250 characters – you'll know when your scraper breaks, and you'll also be able to minimize its impact. This is a must if you feed the scraped Amazon data into some price optimisation program. A sketch of such checks follows below.
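A minimal sketch of such checks, written against the field names used in this tutorial's output:

    def sanity_check(item):
        # Flag records whose name or price look wrong
        errors = []
        if not item['NAME'] or len(item['NAME']) > 250:
            errors.append('suspicious NAME')
        try:
            # Strip currency symbols/commas, then require a decimal number
            float((item['SALE_PRICE'] or '').replace('$', '').replace(',', ''))
        except ValueError:
            errors.append('SALE_PRICE is not a number')
        return errors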

We hope this tutorial gave you a better idea of how to scrape Amazon or similar e-commerce websites. As a company, we understand e-commerce data, having worked with it before. If you are interested in professional help with scraping complex websites, let us know, and we will be glad to help.

Need some help with scraping eCommerce data?

Turn websites into meaningful and structured data through our web data extraction service

Disclaimer: Any code provided in our tutorials is for illustration and learning purposes only. We are not responsible for how it is used and assume no liability for any detrimental usage of the source code. The mere presence of this code on our site does not imply that we encourage scraping or scrape the websites referenced in the code and accompanying tutorial. The tutorials only help illustrate the technique of programming web scrapers for popular internet websites. We are not obligated to provide any support for the code, however, if you add your questions in the comments section, we may periodically address them.

44 comments on “How To Scrape Amazon Product Details and Pricing using Python”


I don't get the output, and no error either. In the JSON file all the values are null, e.g.:
"CATEGORY": null,
"NAME": null,
"URL": "",
"SALE_PRICE": null,


I found my mistake. We need to give our own headers={ }. The UserAgent is different for different users. This can easily be found using the link given below.


    Glad to hear you got it working!

    Subhasis Mukherjee

    Thanks for this. This solved the issue.


    Hi, the user agent trick didn't work for me. ScrapeHero, has something changed in the Amazon code, since I get these results:

    "CATEGORY": null,
    "ORIGINAL_PRICE": null,
    "NAME": null,
    "URL": "",
    "SALE_PRICE": null,
    "AVAILABILITY": null
    "CATEGORY": null,
    "ORIGINAL_PRICE": null,
    "NAME": null,
    "URL": "",
    "SALE_PRICE": null,
    "AVAILABILITY": null


      We will have a look at the code and see if it still works and get back with a comment as soon as our paying job allows 😉


      thanks bro..

Rakesh Pandey

Nice Python script. Great work for beginners.


Are there any cheap web hosting solutions that have Python installed? Hoping I could set up my required Amazon products, update prices daily, then point a website/app to the .json file on my new shared hosting.

Maybe even AWS, Azure etc or a Cloud IDE. Just looking for a simple solution to start off with.


    Most VPSs or shared hosting plans support Python. Just ask them before buying.

Konstantinos Bazakos

Nice implementation! Very well done! Just a question… What is the purpose of the sleep() functions? How come Amazon does not return a typical robot/spider message to use their API?


    sleep just pauses the execution for a bit so that we don't hammer the server.
    Can you clarify the second part of the question – not sure what that means.


        In the beginning I did not use headers in requests.get(), so in the HTML (html.fromstring()) content there was the following message from Amazon: "To discuss automated access to Amazon data please contact mail. For information about migrating to our APIs refer to our Marketplace APIs at link, or our Product Advertising API at link for advertising use cases."


        You should mimic the browser as much as possible, including headers, cookies and sessions – that, with IP rotation, will work for small-scale data gathering.


Any way to extract the reviews based on the ASIN number for a particular product?

Saul Bretado

Does this code work for extracting 1500 products?… Adding IP rotation of course. Please let me know.


    Hi Saul,
    The code should work but at those numbers (1500 products) the code is not the problem.
    Everything else related to web scraping that we have written about on our site starts to matter.
    Please try the code by modifying it and let us know.


      Saul Bretado

      I was trying to read a csv file as:

      AsinList = csv.DictReader(open(os.path.join(os.path.dirname(__file__),"asinnumbers.csv")))

      But I am getting the error below:

      Traceback (most recent call last):
      File "", line 66, in
      File "", line 57, in ReadAsin
      url = ""+i
      TypeError: cannot concatenate 'str' and 'dict' objects

      Any recommendations? I already googled about it, but could not find anything.


        Hi Saul,

        You are trying to concatenate a dictionary object with "".

        Can you try replacing

        url = ""+i

        with

        url = ""+i['asin']

        This is assuming that your CSV looks like this (a header row named asin, then one ASIN per line):
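        asin
        B0046UR4F4
        B01HSIIFQ2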


          Saul Bretado

          Thanks a lot for this amazing tutorial, but after using the script for a few days, it is now not working well; I am getting mostly output like below:

          "CATEGORY": null,
          "ORIGINAL_PRICE": null,
          "NAME": null,
          "URL": "",
          "SALE_PRICE": null,
          "AVAILABILITY": null

          And as I told you, everything was working amazingly well; I even added the code below to switch headers every time…

          navegador = randint(0, 2)
          if navegador == 0:
              headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36'}
              print 'Using Chrome'
          elif navegador == 1:
              headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0'}
              print 'Using Firefox'
          else:
              headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36 Edge/12.10240'}
              print 'Using Edge'

          And everything was perfect till today. Any ideas why?


Does anyone know of a commercial version of this process? I am looking to scrape Amazon data for an inventory system. We have the ASINs on incoming excel sheets, but need to pull product data and images to populate the inventory. We’d be happy to pay for a pre-existing version of this process rather than build it ourselves or hire a developer.

Jamen McGranahan

The main issue I see with this is that it only gets the offer from the Buy Box, not every offer available on Amazon. I'm trying to do this now to see if I can get it to work; I'm just not overly familiar with Python. But I know the URLs stay pretty much the same:{ASIN}/ref=olp_f_freeShipping?ie=UTF8&f_freeShipping=true&f_new=true&f_primeEligible=true


    Hi Jamen,
    You are correct, the tutorial only scrapes the Buy Box price.
    You will need to modify the code to get the 3rd party sellers.


Hello ScrapeHero,
What if I want to get other product details – how can I change the code? I assume it's the following parts:
"XPATH_NAME = '//h1[@id="title"]//text()'
XPATH_SALE_PRICE = '//span[contains(@id,"ourprice") or contains(@id,"saleprice")]/text()'
XPATH_ORIGINAL_PRICE = '//td[contains(text(),"List Price") or contains(text(),"M.R.P") or contains(text(),"Price")]/following-sibling::td/text()'
XPATH_CATEGORY = '//a[@class="a-link-normal a-color-tertiary"]//text()'
XPATH_AVAILABILITY = '//div[@id="availability"]//text()'"


    Hi Dan,
    Yes – you will need to add or update XPATHS to get additional data.


I tried to get the image URL using this XPath:

XPATH_IMG = '//div[@class="imgTagWrapper"]/img/@src//text()'

but the result is null. Can you give me a pointer to achieve this?

    syed mustafa

    Yes, I am also getting the same error. Did you find the solution? If yes, please help.


Hello from Sweden,

I am a total "dummy" regarding Python. I tried to use this code with Python 3 instead, where you have pip and requests included, as I understand it. Anyway, I do not get a data.json file, and the provided code is not running; if I check it through Python it mentions missing parentheses. I just wonder if the code should work for Python 3 as well, and if not, why? Is it a different language?

best regards,



    Hi Chris,
    Yes, it is almost a new language – v2 code will not work in v3 in most cases, especially with the libraries used.
    Try downloading and running it in v2.



    Hi Chris,

    I am running the following version of python:
    Python 3.5.2 |Anaconda custom (64-bit)| (default, Jul 2 2016, 17:53:06)
    [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
    Type “help”, “copyright”, “credits” or “license” for more information.

    I changed the code only a little to fit python 3. Pasted the code below. Let me know if you need any help.

    from lxml import html
    import csv, os, json
    import requests
    from time import sleep

    def AmzonParser(url):
        headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36'}
        page = requests.get(url, headers=headers)
        try:
            doc = html.fromstring(page.content)
            XPATH_NAME = '//h1[@id="title"]//text()'
            XPATH_SALE_PRICE = '//span[contains(@id,"ourprice") or contains(@id,"saleprice")]/text()'
            XPATH_ORIGINAL_PRICE = '//td[contains(text(),"List Price") or contains(text(),"M.R.P") or contains(text(),"Price")]/following-sibling::td/text()'
            XPATH_CATEGORY = '//a[@class="a-link-normal a-color-tertiary"]//text()'
            XPATH_AVAILABILITY = '//div[@id="availability"]//text()'

            RAW_NAME = doc.xpath(XPATH_NAME)
            RAW_SALE_PRICE = doc.xpath(XPATH_SALE_PRICE)
            RAW_ORIGINAL_PRICE = doc.xpath(XPATH_ORIGINAL_PRICE)
            RAW_CATEGORY = doc.xpath(XPATH_CATEGORY)
            RAW_AVAILABILITY = doc.xpath(XPATH_AVAILABILITY)

            NAME = ' '.join(''.join(RAW_NAME).split()) if RAW_NAME else None
            SALE_PRICE = ' '.join(''.join(RAW_SALE_PRICE).split()).strip() if RAW_SALE_PRICE else None
            ORIGINAL_PRICE = ''.join(RAW_ORIGINAL_PRICE).strip() if RAW_ORIGINAL_PRICE else None
            CATEGORY = ' > '.join([i.strip() for i in RAW_CATEGORY]) if RAW_CATEGORY else None
            AVAILABILITY = ''.join(RAW_AVAILABILITY).strip() if RAW_AVAILABILITY else None

            if not ORIGINAL_PRICE:
                ORIGINAL_PRICE = SALE_PRICE

            if page.status_code != 200:
                raise ValueError('captha')
            data = {
                'NAME': NAME,
                'SALE_PRICE': SALE_PRICE,
                'CATEGORY': CATEGORY,
                'ORIGINAL_PRICE': ORIGINAL_PRICE,
                'AVAILABILITY': AVAILABILITY,
                'URL': url,
            }
            return data
        except Exception as e:
            print(e)

    def ReadAsin():
        # AsinList = csv.DictReader(open(os.path.join(os.path.dirname(__file__),"Asinfeed.csv")))
        AsinList = ['B0046UR4F4']
        extracted_data = []
        for i in AsinList:
            url = "http://www.amazon.com/dp/" + i
            print("Processing: " + url)
            extracted_data.append(AmzonParser(url))
            sleep(5)
        with open('data.json', 'w') as f:
            json.dump(extracted_data, f, indent=4)

    if __name__ == "__main__":
        ReadAsin()

Hello there!

What if my item price changes according to its color?

Great script, love it 🙂


Thanks a lot for this very useful script. I'm going to the next step: Scalable do-it-yourself scraping – How to build and run scrapers on a large scale.

Neha Sharma

Hi, in the bulk extraction for product details, is it limited to 10? Would it be possible to extract more than 10 product details?


    Yes, it's possible for more than 10 IDs.


AVAILABILITY does not work on the .cn website.


Hi there!

What if I have a list of URLs in this form (ASIN + Merchant ID) and only want to scrape the actual quantity?
Quantity: 30

Harrison Kenning

I keep on getting this error: SSLError: HTTPSConnectionPool(host='', port=443): Max retries exceeded with url: /dp/B00YG0JV96 (Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'tls_process_server_certificate', 'certificate verify failed')],)",),))

What am I missing?


    Hi Harrison,
    It is most likely an old version of python.

      rijesh ck

      Use verify=False, like this: requests.get(url, headers=headers, verify=False)

Vivek Verma

Getting 'captha', i.e. the error – do let me know how to fix it.

ERROR: execution aborted


I am looking to modify this script to also scrape Walmart, GameStop, Target, etc. What resources can you point me to for modifying this script to include those?


Get this error:
File "", line 48
print e
SyntaxError: Missing parentheses in call to 'print'. Did you mean print(e)?
