How to scrape Amazon Reviews using Python

This tutorial is a follow-up to Tutorial: How To Scrape Amazon Product Details and Pricing using Python, extending it from price data to product reviews. Its scope is limited to scraping an Amazon product page to retrieve the review summary and the first page of customer reviews for a product.

If you need a quick and easy scraper to extract all reviews for a product from Amazon, follow this tutorial instead:

Extract all reviews from Amazon.com with just Google Chrome and Web Scraper Extension

Scraping customer reviews from Amazon can be useful for:

  1. Getting complete review details that you can’t get with the Amazon Product Advertising API.
  2. Monitoring customer opinion on products that you sell or manufacture, using data analysis.
  3. Creating Amazon review datasets for educational purposes and research.

A few years back, Amazon gave developers and sellers access to product reviews through its Product Advertising API. It discontinued that access on November 8, 2010, so customers can no longer embed Amazon reviews of their products in their own websites. As of now, the API only returns a link to the review.

Amazon Product Advertising API Review

Take a look at the screenshot below, from a StackOverflow thread on the same topic.

[Screenshot: StackOverflow thread on the discontinued Amazon customer review API]

We were able to find a few tutorials on doing this using Perl (http://archive.oreilly.com/pub/h/977). Being the Python enthusiasts we are (check out the other web scraping tutorials we have published before), we thought of making one using plain Python and the simple Python library LXML.

We’ll follow this post up with a tutorial on how to turn this code into a web API that you can use or integrate with your projects.

Requirements

For this web scraping tutorial using Python 3, we will need some packages for downloading and parsing the HTML. Below are the package requirements.

Install Python 3 and Pip

Here is a guide to install Python 3 in Linux – http://docs.python-guide.org/en/latest/starting/install3/linux/

Mac Users can follow this guide – http://docs.python-guide.org/en/latest/starting/install3/osx/

Windows Users go here – https://www.scrapehero.com/how-to-install-python3-in-windows-10/

Install Packages
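
Before installing anything, it can help to check which packages are already available. The sketch below assumes the script needs lxml for parsing (named earlier in this tutorial) and requests for downloading. The requests dependency and the helper name missing_packages are assumptions for illustration, not taken from the original code. Anything reported missing can be installed with pip.

```python
import importlib.util

# Assumed package list: lxml is named in this tutorial; requests is an assumption.
REQUIRED = ["requests", "lxml"]

def missing_packages(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]

if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
        print("Install them with: pip3 install " + " ".join(missing))
    else:
        print("All required packages are installed.")
```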

The Code

The full code is available as a gist: https://gist.github.com/scrapehero/900419a768c5fac9ebdef4cb246b25cb

If you would like the code in Python 2.7, you can check this link – https://gist.github.com/scrapehero/3d53ae193766bc51408ec6497fbd1016.

To use the script, modify the ReadAsin function shown below. Add your own ASINs to the line AsinList = ['B01ETPUQ6E','B017HW9DEW']. If Amazon starts blocking your requests, increase the delay between requests by editing the line sleep(5), for example to sleep(10) for a 10-second delay.

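import json
from time import sleep

def ReadAsin():
    # Add your own ASINs here
    AsinList = ['B01ETPUQ6E', 'B017HW9DEW']
    extracted_data = []
    for asin in AsinList:
        print("Downloading and processing page http://www.amazon.com/dp/" + asin)
        # ParseReviews is defined in the full script (see the gist linked above)
        extracted_data.append(ParseReviews(asin))
        # Pause between requests to reduce the chance of being blocked
        sleep(5)
    # Write the collected review data to data.json
    with open('data.json', 'w') as f:
        json.dump(extracted_data, f, indent=4)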
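
If a fixed sleep(5) still gets requests blocked, one option that may help is to add a random amount to each pause so the requests are not evenly spaced. This is a minimal sketch, not part of the original script: the names polite_delay and process_asins, the handler argument, and the jitter values are all hypothetical.

```python
import random
from time import sleep

def polite_delay(base_seconds=5, jitter_seconds=3):
    """Return a pause length between base_seconds and base_seconds + jitter_seconds."""
    return base_seconds + random.uniform(0, jitter_seconds)

def process_asins(asin_list, handler, base_seconds=5, jitter_seconds=3, sleep_fn=sleep):
    """Call handler(asin) for each ASIN, pausing between consecutive calls.

    handler stands in for a function such as ParseReviews (hypothetical wiring).
    sleep_fn is injectable so the loop can be exercised without real waiting.
    """
    results = []
    for index, asin in enumerate(asin_list):
        if index > 0:
            # No pause is needed before the very first request
            sleep_fn(polite_delay(base_seconds, jitter_seconds))
        results.append(handler(asin))
    return results
```

With this in place, the loop in ReadAsin could be replaced by a call such as process_asins(AsinList, ParseReviews, base_seconds=10).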

Once you are done modifying the script, run this script using Python 3 in a Terminal or Command Prompt. We named our file amazon_review_scraper.py.

 python amazon_review_scraper.py

Once the script finishes, you will find a file called data.json containing the review data in JSON format.
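
To check the result quickly, the file can be loaded back with the standard json module. This sketch assumes only that data.json holds a JSON list with one entry per ASIN, as written by json.dump(extracted_data, ...). The helper name summarize_output is hypothetical, and no particular field names are assumed.

```python
import json

def summarize_output(path="data.json"):
    """Load the scraper output and return (entry_count, sorted_keys_per_entry)."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    # Entries that are not JSON objects get an empty key list
    keys = [sorted(entry.keys()) if isinstance(entry, dict) else [] for entry in data]
    return len(data), keys
```

Calling summarize_output() after a run returns the number of products scraped and the field names found for each one.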

Below is the formatted output we received for the ASINs we supplied

[Image: formatted output of the Amazon review scraper]

Here is the full output attached in a GIST.

This code should work for a relatively small number of ASINs for your personal projects, but if you want to scrape thousands of pages, learn about the challenges in Scalable do-it-yourself scraping – How to build and run scrapers on a large scale.

Thanks for reading. If you need help with your complex scraping projects, let us know and we will be glad to help.

Turn the Internet into meaningful, structured and usable data


Disclaimer: Any code provided in our tutorials is for illustration and learning purposes only. We are not responsible for how it is used and assume no liability for any detrimental usage of the source code. The mere presence of this code on our site does not imply that we encourage scraping or scrape the websites referenced in the code and accompanying tutorial. The tutorials only help illustrate the technique of programming web scrapers for popular internet websites. We are not obligated to provide any support for the code, however, if you add your questions in the comments section, we may periodically address them.

Responses

Sarah November 25, 2018

how would we get like 100 reviews off the site?

    ScrapeHero November 25, 2018

    You would need to find the link to the next page of reviews and parse it the same way as in this tutorial.

clarosantiago January 18, 2019

Is there any way to get the product rank as well?

doomhouse May 23, 2019

How would you scrape 12000 products by search query only?

    ScrapeHero May 24, 2019

    Amazon restricts the number it shows and it is far below 12000
