How To Scrape Amazon Product Details and Pricing using Python


Amazon provides a Product Advertising API, but like most APIs, the API doesn’t provide all the information that Amazon has on a product page.

The only way to get the exact data that you see on a product page is by using a web scraper. Scraping ensures that you can get exactly what you see by visiting the site using a web browser.

Scraping Amazon for data is useful for a lot of things, such as:

  1. Scrape product details that you can’t get with the Product Advertising API
  2. Monitor an item for change in Price, Stock Count/Availability, Rating etc.
  3. Analyze how a particular Brand is being sold on Amazon
  4. Analyze Amazon marketplace Sellers
  5. Analyze Amazon Product Reviews
  6. Or anything else – the possibilities are endless and only bound by your imagination

An easy way to get started with scraping Amazon is to build a crawler in Python that can go to any Amazon product’s page using its ASIN (Amazon Standard Identification Number, the unique identifier Amazon uses to keep track of products in its database).

If you are looking for a service to collect this data for your business needs, we can help.


If not, let’s continue with the tutorial.

First, let’s collect a list of products identified by their ASINs.
For example, ASINs look like this:

B00JGTVU5A or B00GJYCIVK
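
As a rough sketch of this first step, the snippet below turns ASINs into product page URLs. The URL prefix is the one used in the tutorial code; the `product_url` helper and the 10-character format check are assumptions for illustration, based only on what the example ASINs look like.

```python
import re

# Assumption: an ASIN is 10 uppercase letters or digits, like the examples above.
ASIN_PATTERN = re.compile(r"[A-Z0-9]{10}\Z")


def product_url(asin):
    """Return the product page URL for an ASIN, or raise ValueError."""
    asin = asin.strip().upper()
    if not ASIN_PATTERN.match(asin):
        raise ValueError("not a valid-looking ASIN: %r" % asin)
    return "http://www.amazon.com/dp/" + asin


urls = [product_url(asin) for asin in ["B00JGTVU5A", "B00GJYCIVK"]]
for url in urls:
    print(url)
```

Checking the format up front means a typo in the list fails immediately instead of producing a request for a page that does not exist.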

Then we will download the HTML of each product’s page and start identifying the XPaths for the data elements you need, e.g. Product Title, Price, Description etc. Read more about XPaths here.
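
To make the XPath step concrete, here is a minimal sketch using lxml (one of the packages listed in the prerequisites). The sample HTML, the element ids, and the `extract_fields` helper are all hypothetical stand-ins: real Amazon pages are much larger and their markup changes, so you would need to inspect the downloaded page and write your own XPaths.

```python
from lxml import html

# Hypothetical, simplified markup standing in for a downloaded product page.
SAMPLE_PAGE = """
<html><body>
  <h1 id="title"> Example External Drive 8TB </h1>
  <span id="price">$949.95</span>
  <div id="availability"> Only 1 left in stock. </div>
</body></html>
"""

# Illustrative XPaths for the sample above, not Amazon's actual selectors.
XPATHS = {
    "NAME": '//h1[@id="title"]//text()',
    "SALE_PRICE": '//span[@id="price"]//text()',
    "AVAILABILITY": '//div[@id="availability"]//text()',
}


def extract_fields(page_source):
    """Apply each XPath and collapse the matched text nodes into one string."""
    tree = html.fromstring(page_source)
    data = {}
    for field, xpath in XPATHS.items():
        text_nodes = tree.xpath(xpath)
        # Join the text nodes and normalize runs of whitespace.
        value = " ".join(" ".join(text_nodes).split())
        data[field] = value or None  # None when nothing matched
    return data


print(extract_fields(SAMPLE_PAGE))
```

In a real run, the page source would be the text of the response you get when downloading the product URL with requests, rather than an inline string.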

The Code

Prerequisites:

For this tutorial, we will stick to basic Python and a couple of Python packages – requests and lxml. We will not use more complicated packages like Scrapy for something this simple.

You will need to install the following:

  • Python 2.7 available here ( https://www.python.org/downloads/ )
  • Python Requests available here ( http://docs.python-requests.org/en/master/user/install/ ). You might need Python pip to install it, available here ( https://pip.pypa.io/en/stable/installing/ )
  • Python LXML ( Learn how to install that here – http://lxml.de/installation.html )

We make this process a bit easier for you by providing the actual Python code. The code will help scrape a few important data elements such as Product Name, Price, Availability, Description etc.

Feel free to copy and modify it to your needs – that is the best way to learn! You can download the code directly from here.

 

 

Modify the code shown below with a list of your own ASINs.

import json
from time import sleep


def ReadAsin():
    # Change the list below to the ASINs you want to track.
    AsinList = [
        'B0046UR4F4',
        'B00JGTVU5A',
        'B00GJYCIVK',
        'B00EPGK7CQ',
        'B00EPGKA4G',
        'B00YW5DLB4',
        'B00KGD0628',
        'B00O9A48N2',
        'B00O9A4MEW',
        'B00UZKG8QU',
    ]
    extracted_data = []
    for asin in AsinList:
        url = "http://www.amazon.com/dp/" + asin
        # AmzonParser is the page-parsing function from the full script.
        extracted_data.append(AmzonParser(url))
        sleep(5)  # Pause between requests.
    # Save the collected data into a JSON file.
    with open('data.json', 'w') as f:
        json.dump(extracted_data, f, indent=4)

and run it from a terminal or command prompt like this (if you name the file amazon_scraper.py):

python amazon_scraper.py 

You’ll get a file called data.json with the data collected for the ASINs you had in AsinList in the code.

Here is what the JSON output for a couple of ASINs looks like:

    {
        "CATEGORY": "Electronics > Computers & Accessories > Data Storage > External Hard Drives", 
        "ORIGINAL_PRICE": "$1,899.99", 
        "NAME": "G-Technology G-SPEED eS PRO High-Performance Fail-Safe RAID Solution for HD/2K Production 8TB (0G01873)", 
        "URL": "http://www.amazon.com/dp/B0046UR4F4", 
        "SALE_PRICE": "$949.95", 
        "AVAILABILITY": "Only 1 left in stock."
    }, 
    {
        "CATEGORY": "Electronics > Computers & Accessories > Data Storage > USB Flash Drives", 
        "ORIGINAL_PRICE": "$599.95", 
        "NAME": "G-Technology G-RAID USB Removable Dual Drive Storage System 8TB (0G04069)", 
        "URL": "http://www.amazon.com/dp/B00UZKG8QU", 
        "SALE_PRICE": "$599.95", 
        "AVAILABILITY": "Only 2 left in stock."
    }
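
The prices in this output are strings, so they need converting before you can compare them, for example when monitoring an item for price changes. The sketch below assumes US-style prices like the ones in the sample output; `parse_price` and `discount_percent` are hypothetical helper names, not part of the tutorial code.

```python
from decimal import Decimal


def parse_price(text):
    """Convert a price string such as "$1,899.99" to a Decimal."""
    return Decimal(text.replace("$", "").replace(",", "").strip())


def discount_percent(record):
    """Return the percentage saved on a record, rounded to one decimal place."""
    original = parse_price(record["ORIGINAL_PRICE"])
    sale = parse_price(record["SALE_PRICE"])
    if original == 0:
        return Decimal("0.0")
    return ((original - sale) / original * 100).quantize(Decimal("0.1"))


# Records shaped like the sample output above (only the fields used here).
records = [
    {"URL": "http://www.amazon.com/dp/B0046UR4F4",
     "ORIGINAL_PRICE": "$1,899.99", "SALE_PRICE": "$949.95"},
    {"URL": "http://www.amazon.com/dp/B00UZKG8QU",
     "ORIGINAL_PRICE": "$599.95", "SALE_PRICE": "$599.95"},
]

for record in records:
    print(record["URL"], discount_percent(record))
```

To run this against real results, you could load `data.json` with `json.load` instead of using the inline list. Decimal is used rather than float so that currency values are not subject to binary rounding.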

This should work for small-scale scraping and hobby projects, and get you started on the road to building bigger and better scrapers.

However, if you want to scrape thousands of pages, there are some important things you should be aware of. You can read about them at Scalable do-it-yourself scraping – How to build and run scrapers on a large scale.
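
One concern that tends to show up as the number of pages grows is that individual requests fail now and then. The sketch below is one possible way to handle that, not something taken from the linked article: it retries a download a few times and waits longer after each failure. `fetch_with_retries` and its parameters are hypothetical names, and `fetch` stands for any function you supply that returns a page or raises an exception.

```python
import time


def fetch_with_retries(fetch, url, attempts=3, delay=5, sleep=time.sleep):
    """Call fetch(url) until it succeeds or the attempts run out.

    The wait doubles after each failed attempt (delay, 2*delay, ...).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch(url)
        except Exception as error:  # broad on purpose for this sketch
            last_error = error
            if attempt < attempts - 1:
                sleep(delay * 2 ** attempt)
    raise last_error


# Demonstration with a fake fetch that fails twice and then succeeds.
calls = []


def flaky_fetch(url):
    calls.append(url)
    if len(calls) < 3:
        raise IOError("temporary failure")
    return "<html>ok</html>"


waits = []  # record the waits instead of actually sleeping
page = fetch_with_retries(flaky_fetch, "http://www.amazon.com/dp/B00JGTVU5A",
                          sleep=waits.append)
print(page)
print(waits)
```

With requests, `fetch` could be a small wrapper that calls `requests.get(url)` and then `raise_for_status()` on the response, so that HTTP error codes are treated as failures too.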

Web scraping is a useful way to automate tasks like these, whether simple or complex, that computers can do easily.

Thanks for reading. If you need help with your complex scraping projects, let us know and we will be glad to help.

EDIT: Nov 25 2016 – If you want to also scrape Amazon reviews for a product, head over to this new blog post.


Disclaimer: Any code provided in our tutorials is for illustration and learning purposes only. We are not responsible for how it is used and assume no liability for any detrimental usage of the source code. The mere presence of this code on our site does not imply that we encourage scraping or scrape the websites referenced in the code and accompanying tutorial. The tutorials only help illustrate the technique of programming web scrapers for popular internet websites. We are not obligated to provide any support for the code; however, if you add your questions in the comments section, we may periodically address them.

27 thoughts on “How To Scrape Amazon Product Details and Pricing using Python”

  1. I tried to get the image URL using this XPath:

    XPATH_IMG = '//div[@class="imgTagWrapper"]/img/@src//text()'

    but the result is null. Can you point me to how to achieve this?
