How to Scrape Amazon Reviews using Python in 3 steps

In this web scraping tutorial, we will build an Amazon review scraper using Python in 3 steps. It extracts review data from Amazon products, such as the review title, review content, product name, rating, date, and author, and saves it to a spreadsheet. You can also check out our tutorial on how to build a Python scraper to scrape Amazon product details and pricing. We will build this simple Amazon review scraper using Python and Selectorlib and run it in a console.

Here are the steps to scrape Amazon reviews using Python:

  1. Markup the data fields to be scraped using Selectorlib
  2. Copy and run the code provided
  3. Download the data in Excel (CSV) format.

Further below, we also point to how you can scrape product details from the Amazon search results page, how to avoid getting blocked by Amazon, and how to scrape Amazon on a large scale.

If you do not want to code, we have made it simple to do all this for FREE and in a few clicks. ScrapeHero Cloud can scrape reviews of Amazon products within seconds!

Use Amazon Review Scraper from ScrapeHero Cloud

Here are some of the data fields that the Amazon product review scraper will extract into a spreadsheet from Amazon:

  1. Product Name
  2. Review Title
  3. Review Content/Review Text
  4. Rating
  5. Date of publishing review
  6. Verified Purchase
  7. Author Name
  8. URL

We will save the data as a CSV file, which you can open in Excel.

Amazon Review Scraper Data Sample

Installing the required packages for running Amazon Reviews Web Scraper

This tutorial scrapes Amazon product reviews using Python 3 and a few of its libraries. We will not be using Scrapy. The code runs easily and quickly on any computer (including a Raspberry Pi).
If you do not have Python 3 installed, you can follow this guide to install Python on Windows – How To Install Python Packages.

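We will use these libraries:

  1. Requests, to download the review pages
  2. lxml, for HTML parsing
  3. python-dateutil, to parse the review dates
  4. Selectorlib, to extract the data using the YAML file described below

Install them using pip3: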

pip3 install python-dateutil lxml requests selectorlib
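
To confirm that the install worked before running the scraper, a quick check along these lines can help. This is only a sketch: `missing_packages` is a hypothetical helper and not part of any of these libraries. It uses the import names (`dateutil`, `lxml`, `requests`, `selectorlib`), which in the case of python-dateutil differ from the pip name.

```python
import importlib.util

def missing_packages(module_names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]

if __name__ == "__main__":
    required = ["dateutil", "lxml", "requests", "selectorlib"]
    missing = missing_packages(required)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```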

The Code

You can get all the code used in this tutorial from GitHub – https://github.com/scrapehero-code/amazon-review-scraper

Let’s create a file called reviews.py and paste the Python code below into it.

Here is what the Amazon product review scraper does:

  1. Reads a list of product review page URLs from a file called urls.txt (this file contains the review page URLs for the Amazon products you care about)
  2. Uses a Selectorlib YAML file, saved as selectors.yml, that identifies the data on an Amazon page (more on how to generate this file later in this tutorial)
  3. Scrapes the data
  4. Saves the data as a CSV spreadsheet called data.csv

from selectorlib import Extractor
import requests 
import json 
from time import sleep
import csv
from dateutil import parser as dateparser

# Create an Extractor by reading from the YAML file
e = Extractor.from_yaml_file('selectors.yml')

def scrape(url):    
    headers = {
        'authority': 'www.amazon.com',
        'pragma': 'no-cache',
        'cache-control': 'no-cache',
        'dnt': '1',
        'upgrade-insecure-requests': '1',
        'user-agent': 'Mozilla/5.0 (X11; CrOS x86_64 8172.45.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.64 Safari/537.36',
        'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
        'sec-fetch-site': 'none',
        'sec-fetch-mode': 'navigate',
        'sec-fetch-dest': 'document',
        'accept-language': 'en-GB,en-US;q=0.9,en;q=0.8',
    }

    # Download the page using requests
    print("Downloading %s"%url)
    r = requests.get(url, headers=headers)
    # Simple check for a blocked page (usually a 503)
    if r.status_code >= 500:
        if "To discuss automated access to Amazon data please contact" in r.text:
            print("Page %s was blocked by Amazon. Please try using better proxies\n"%url)
        else:
            print("Page %s must have been blocked by Amazon as the status code was %d"%(url,r.status_code))
        return None
    # Pass the HTML of the page to the Extractor and return the extracted data
    return e.extract(r.text)

# newline='' lets the csv module control line endings (avoids blank rows on Windows)
with open("urls.txt",'r') as urllist, open('data.csv','w',newline='') as outfile:
    writer = csv.DictWriter(outfile, fieldnames=["title","content","date","variant","images","verified","author","rating","product","url"],quoting=csv.QUOTE_ALL)
    writer.writeheader()
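    for line in urllist:
        url = line.strip()  # remove the trailing newline
        if not url:
            continue  # skip blank lines
        data = scrape(url)
        if data and data.get('reviews'):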
            for r in data['reviews']:
                r["product"] = data["product_title"]
                r['url'] = url
                # 'verified' may be None when the review has no Verified Purchase badge
                if r.get('verified') and 'Verified Purchase' in r['verified']:
                    r['verified'] = 'Yes'
                else:
                    r['verified'] = 'No'
                r['rating'] = r['rating'].split(' out of')[0]
                date_posted = r['date'].split('on ')[-1]
                if r['images']:
                    r['images'] = "\n".join(r['images'])
                r['date'] = dateparser.parse(date_posted).strftime('%d %b %Y')
                writer.writerow(r)
            # sleep(5)
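
The per-review clean-up in the loop above can also be pulled into a small function, which makes it easier to test and to cope with fields the selectors did not find. The following is a minimal sketch and not the code from the repository: `clean_review` is a hypothetical name, and it uses the standard library's `datetime.strptime` in place of dateutil, on the assumption that dates look like "Reviewed in the United States on January 5, 2020".

```python
from datetime import datetime

def clean_review(raw):
    """Return a copy of one extracted review with normalized fields.

    Missing fields (None) are left as None instead of raising an error.
    """
    review = dict(raw)

    # 'Verified Purchase' badge text -> 'Yes'; anything else (or missing) -> 'No'
    badge = review.get("verified") or ""
    review["verified"] = "Yes" if "Verified Purchase" in badge else "No"

    # '4.0 out of 5 stars' -> '4.0'
    if review.get("rating"):
        review["rating"] = review["rating"].split(" out of")[0]

    # 'Reviewed in the United States on January 5, 2020' -> '05 Jan 2020'
    if review.get("date"):
        date_text = review["date"].split("on ")[-1]
        # Assumption: the date is written as 'Month D, YYYY' in English
        parsed = datetime.strptime(date_text, "%B %d, %Y")
        review["date"] = parsed.strftime("%d %b %Y")

    # A list of image URLs -> one newline-separated string
    if review.get("images"):
        review["images"] = "\n".join(review["images"])

    return review
```

Inside the loop, `writer.writerow(clean_review(r))` would then replace the individual field assignments, after `product` and `url` have been added to `r`.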
    

Creating the YAML file – selectors.yml

You will notice in the code above that we used a file called selectors.yml. This file is what makes this tutorial so easy to create and follow. The magic behind this file is a Web Scraper tool called Selectorlib.

Selectorlib is a tool that makes selecting, marking up, and extracting data from web pages visual and easy. The Selectorlib Web Scraper Chrome Extension lets you mark the data that you need to extract, creates the CSS Selectors or XPaths needed to extract that data, and then previews how the data will look. You can learn more about Selectorlib and how to use it here.

If you just need the data we have shown above, you do not need to use Selectorlib, since we have already done that for you and generated a simple “template” that you can use as is. However, if you want to add a new field, you can use Selectorlib to add that field to the template.

Here is how we marked up the fields for the data we need to scrape from the Product Reviews page using the Selectorlib Chrome Extension.

Selectorlib Amazon Reviews

Once you have created the template, click on ‘Highlight’ to highlight and preview all of your selectors. Finally, click on ‘Export’ to download the YAML file; that file is selectors.yml.

Here is what our template (selectors.yml) looks like:

product_title:
    css: 'h1 a[data-hook="product-link"]'
    type: Text
reviews:
    css: 'div.review div.a-section.celwidget'
    multiple: true
    type: Text
    children:
        title:
            css: a.review-title
            type: Text
        content:
            css: 'div.a-row.review-data span.review-text'
            type: Text
        date:
            css: span.a-size-base.a-color-secondary
            type: Text
        variant:
            css: 'a.a-size-mini'
            type: Text
        images:
            css: img.review-image-tile
            multiple: true
            type: Attribute
            attribute: src
        verified:
            css: 'span[data-hook="avp-badge"]'
            type: Text
        author:
            css: span.a-profile-name
            type: Text
        rating:
            css: 'div.a-row:nth-of-type(2) > a.a-link-normal:nth-of-type(1)'
            type: Attribute
            attribute: title
next_page:
    css: 'li.a-last a'
    type: Link
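
The template also captures a `next_page` link, which reviews.py does not use. One way to collect more than the first page of reviews is to keep following that link, as in the sketch below. It is an illustration under assumptions rather than part of the original scraper: `scrape_all_pages` and `max_pages` are hypothetical names, and `scrape_page` stands for any function that returns the extracted dict for a URL (for example, the `scrape` function from reviews.py).

```python
from urllib.parse import urljoin

def scrape_all_pages(start_url, scrape_page, max_pages=5):
    """Collect reviews by following 'next_page' links.

    scrape_page(url) must return a dict with 'reviews' and 'next_page'
    keys, or None if the page could not be fetched.
    """
    reviews = []
    url = start_url
    for _ in range(max_pages):
        data = scrape_page(url)
        if not data:
            break  # blocked or failed page: stop here
        reviews.extend(data.get("reviews") or [])
        next_link = data.get("next_page")
        if not next_link:
            break  # last page reached
        # The link may be relative, so resolve it against the current URL
        url = urljoin(url, next_link)
    return reviews
```

Keeping `max_pages` small and pausing between requests is advisable, since requesting many pages in a row makes a block more likely.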

Previous Versions of the Scraper

If you need a script that runs on older versions of Python, you can view the previous versions of this code to scrape Amazon reviews.

Python 3 (built in 2018) – https://gist.github.com/scrapehero/900419a768c5fac9ebdef4cb246b25cb
Python 2.7 (built in 2016) – https://gist.github.com/scrapehero/3d53ae193766bc51408ec6497fbd1016

Running the Amazon Review Scraper

All you need to do is add the URLs you need to scrape into a text file called urls.txt in the same folder and run the scraper using the command:

python3 reviews.py

Here is an example URL – https://www.amazon.com/HP-Business-Dual-core-Bluetooth-Legendary/product-reviews/B07VMDCLXV/ref=cm_cr_dp_d_show_all_btm?ie=UTF8&reviewerType=all_reviews

You can get this URL by clicking on “See all reviews” near the bottom of the product page.
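
Because the selectors target the reviews page, a product page URL pasted into urls.txt by mistake will not yield reviews. A small check like the one below can filter the list before a run. It is a sketch under an assumption taken from the example URL above: that the ASIN is the 10-character code following `/product-reviews/`. The function names are hypothetical.

```python
import re

# Assumption: an ASIN is the 10-character code after '/product-reviews/'
ASIN_PATTERN = re.compile(r"/product-reviews/([A-Z0-9]{10})")

def asin_from_review_url(url):
    """Return the ASIN found in a product-reviews URL, or None."""
    match = ASIN_PATTERN.search(url)
    return match.group(1) if match else None

def valid_review_urls(lines):
    """Keep only non-empty lines that look like product-reviews URLs."""
    urls = []
    for line in lines:
        url = line.strip()
        if url and asin_from_review_url(url):
            urls.append(url)
    return urls
```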

 

Here is what the scraped Amazon reviews look like:

Amazon Review Scraper Data Sample

This code can be used to scrape Amazon reviews of a relatively small number of ASINs for your personal projects. If you want to scrape thousands of pages, read about the challenges involved in How to build and run scrapers on a large scale.
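
Even for small runs, the scraper may hit the 5xx responses that reviews.py treats as a block. A common mitigation is to wait and retry with increasing delays. The sketch below illustrates the idea and is not a guarantee against blocking: `fetch_with_retries` is a hypothetical helper, and `fetch` is any function that returns an object with a `status_code` attribute, such as a call to `requests.get`.

```python
import time

def fetch_with_retries(url, fetch, retries=3, base_delay=5, sleep=time.sleep):
    """Call fetch(url) until it returns a status below 500 or retries run out.

    fetch(url) must return an object with a .status_code attribute,
    such as a requests.Response.
    """
    response = None
    for attempt in range(retries):
        response = fetch(url)
        if response.status_code < 500:
            return response
        if attempt < retries - 1:
            # Wait 5s, 10s, 20s, ... before trying again
            sleep(base_delay * 2 ** attempt)
    return response
```

In reviews.py this could be called as `fetch_with_retries(url, lambda u: requests.get(u, headers=headers))` in place of the direct `requests.get` call.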

What can you do with scraped Amazon reviews?

The data that you gather from this tutorial can help you:

  1. Get review details that are unavailable through the official Amazon Product Advertising API
  2. Monitor customer opinions on products that you sell or manufacture, using data analysis
  3. Create Amazon review datasets for educational purposes and research
  4. Monitor the quality of products sold by third-party sellers
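
As a simple example of the kind of analysis mentioned above, the sketch below reads the data.csv written by the scraper and computes the average rating and review count per product. It assumes the `product` and `rating` columns from reviews.py and a UTF-8 encoded file; `average_ratings` is a hypothetical helper.

```python
import csv
from collections import defaultdict

def average_ratings(csv_path):
    """Return {product: (average rating, review count)} from the scraper's CSV."""
    totals = defaultdict(lambda: [0.0, 0])
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            try:
                rating = float(row.get("rating"))
            except (TypeError, ValueError):
                continue  # skip rows without a usable rating
            entry = totals[row.get("product")]
            entry[0] += rating
            entry[1] += 1
    return {product: (round(total / count, 2), count)
            for product, (total, count) in totals.items()}

if __name__ == "__main__":
    for product, (avg, count) in average_ratings("data.csv").items():
        print(f"{product}: {avg} stars over {count} reviews")
```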

Amazon used to provide developers and sellers with access to product reviews through its Product Advertising API. It discontinued that on November 8th, 2010, which prevented customers from embedding Amazon reviews of their products in their websites. As of now, Amazon only returns a link to the review.

Building a Free Amazon Reviews API using Python, Flask & Selectorlib

If you want to get reviews through an API, similar to the Amazon Product Advertising API, you may find the tutorial below interesting.

Thanks for reading. If you need help with your complex scraping projects, let us know and we will be glad to help.

Turn the Internet into meaningful, structured and usable data



Disclaimer: Any code provided in our tutorials is for illustration and learning purposes only. We are not responsible for how it is used and assume no liability for any detrimental usage of the source code. The mere presence of this code on our site does not imply that we encourage scraping or scrape the websites referenced in the code and accompanying tutorial. The tutorials only help illustrate the technique of programming web scrapers for popular internet websites. We are not obligated to provide any support for the code, however, if you add your questions in the comments section, we may periodically address them.

Responses

R January 2, 2017

This script does not seem to work. The json written does not have any reviews in it.

    ScrapeHero January 4, 2017

    Please copy the detailed error or how you ran this so we can check.
    Thanks

ARJUN S January 28, 2017

how to increase the number of reviews obtained ??

    ScrapeHero January 28, 2017

    Hi Arjun – that’s what’s called “an exercise left to the reader”. You will have to look at the pagination – click that and then get the next page and so on. Most likely you will get blocked pretty soon.

E June 13, 2017

The ratings dictionary is very helpful for getting the percentage distributions of the reviews based on the number of stars, however is there an easy way to see the total number of reviews? For example, are those percentages based on 11 reviews or 3,000? Thanks!

    E June 13, 2017

    I’m not very familiar with lxml so I think that’s where I’m getting stuck

Amy Smith February 9, 2018

Hi,
I don’t think it’s working. Can you help me fix it? This is the output of the json file:
[
  {
    "error": "failed to process the page",
    "asin": "B01ETPUQ6E"
  },
  {
    "error": "failed to process the page",
    "asin": "B017HW9DEW"
  }
]

Thank you!

    ScrapeHero February 9, 2018

    Could be an IP block?

SkyChaos March 7, 2018

Not showing all reviews. Any ideas? My products have a lot of reviews and the total result after I used the script isn’t even close to that.

    ScrapeHero March 18, 2018

    This script doesn’t get you all reviews. It was written specifically to demonstrate scraping reviews using Python, and was never intended as a fully functional scraper for thousands of pages.

gargi May 21, 2018

I ran the code on Jupyter. The code ran without any error but I am not getting any output file.

    ScrapeHero May 21, 2018

    When using this in a Jupyter Notebook, you should call the function ParseReviews with your ASIN.

    For example,

    ParseReviews('B01ETPUQ6E') would return a dict similar to

    {'name': 'Samsung Galaxy J7 - No Contract Phone - White - (Boost Mobile)(Carrier locked phone )',
     'price': '$293.96',
     'ratings': {'1 star': '10%',
      '2 star': '4%',
      '3 star': '5%',
      '4 star': '17%',
      '5 star': '64%'},
     'reviews': [{'review_author': 'JLO',
       'review_comment_count': '',
       'review_header': 'Best phone I have owned with Boostmobile!',
       'review_posted_date': '19 Jul 2016',
       'review_rating': '5.0 ',
       'review_text': 'I love this Samsung J7, since it is bigger than the s6 and s7, and as good as my old S5 (as compared in the images). I had no issues charging it or switching it over from old phone to this new phone - via a 4 minute phone call to Boostmobile. Yes, the S6 and S7 have -much- faster processors, but I do not need that for what I do... and so far, after 1 month of use, I absolutely love this phone!. The phone feels great, responds quickly, and looks freaking awesome. Pros: -Great price -AFFORDABLE! -Bigger than most other phones available -Great quality screen Cons: -2 bottom buttons- on side of the main home button- DO NOT light up -Camera is not comparable to that of the S5, S6, and S7 -Overall quality does not feel as sturdy as the other models mentioned (Shell plastic is thinner). After one month of use, I rated this phone with 4 stars. I will be updating this review in about 6 months. REVIEW UPDATE: February 19, 2017 After eight (8) months of owning this phone, and 7 months since this original review, I am back today to continue my review as promised. I am doing 2 things for those of you who are reading this review for the first time: 1) I have added a star to make this a 5 star review! 2) I will explain why I have decided to come back to add this star to my review. Let me clarify that since reviewing this, I have purchased a second one for my wife. I will also list what items I purchased it along with: Mr Shield Tempered Glass Screen Protector for Samsung Galaxy J7 [Will Not Fit For 2016 Version] - 2-Pack Samsung EVO 64GB Micro SDXC Memory Card with Adapter up to 48/MB/s (MB-MP64DA/AM) Phonelicious SAMSUNG Galaxy J7 Case(Boost,Virgin,TMobile,Metro PCS)Slim Fit Heavy Duty Ultimate Drop Protection Rugged Cover with Screen Protector & Stylus (Navy Blue Matte) Of course, purchasing a case, the Screen protector, and a good quality memory card may also influence how this product has performed. 
This phone continues to work as expected, and has delivered so far. I feel my investment on this phone has paid off, and my money has been worth. Being a somewhat frugal shopper, I thought I would give this phone a try to since it\'s price was reasonable for what I was getting. My 1 year old (who is now 18 months) has used and abused this phone. He has thrown it on the floor dozens of times, and has scratched, as well chewed on it. What has happened to the phone so far? - It has gained a large crack, on the screen protector. The phone is still working as it did 8 months ago, and the cheap items I purchased to protected have taken quite the toll. I had originally mentioned that the phone felt "cheap", and the thin plastic that it is made out of, is certainly noticeable compared to the quality of the S5, and other S series for that matter. Given that this phone has performed well, continues to deliver, and has outlasted quite some abuse, I made the judgement to give this phone the extra star, since this is a great product to my standards. I plan to come back in June, and review this item again after a full year of use!'},
      {'review_author': 'Frankie',
       'review_comment_count': '',
       'review_header': 'Very good phone for the money',
       'review_posted_date': '13 May 2016',
       'review_rating': '4.0 ',
       'review_text': "I bought this phone from boost Mobile website for 200 bucks. Of course its not as good as the s7, but if you don't wanna spend 700$ for a phone, then you can't go wrong with j7. Very good phone for the money. I already bought a couple spare batteries. Pros- Good price, good call quality, fast internet, good camera, good selfie camera, great display, perfect size, great for texting, Great battery life (leave it on power save mode), good speaker, 6.0 marshmallow is great, Cons- No LED notification light, incoming ring tone couple be louder."},
      {'review_author': 'Colleen Marie',
       'review_comment_count': '',
       'review_header': "GORGEOUS phone .... I'm in love with it!",
       'review_posted_date': '06 May 2016',
       'review_rating': '5.0 ',
       'review_text': "I love this Samsung J7. I couldn't afford the sticker price for the S6 or S7 (sticker shock!). Therefore; I opted for this which is scaled down in terms of processing/memory BUT it's much better quality than the phone I was using for my Boost Mobile account. I had no issues charging it or switching it over from old phone to this new phone - via my online account (I didn't have to talk to anybody to do this and it worked out just great). My hubby has the Samsung S6 which he saved/paid for upfront (no contract at Boost Mobile). Comparing my new J7 to his S6 - well; apples to oranges. His has 2 processors (a quad-core and an octa-core). This J7 has just the octa-core BUT for me is proving to be plenty of processing power along with the 2GB ROM. This was pretty comparable to the ZTE Warp Elite I've been using since January 2016 but I'd have to say that the Samsung is much faster - also that the screen response is so much better in this J7 model (even with the glass protection I placed on the screen). My hubby bought me this for my upcoming 60th birthday! GREAT present for me.....all mine! Thank you 'shopcelldeals' for selling this at a reasonable price that was affordable. Recommended."},
      {'review_author': 'Leesaa',
       'review_comment_count': '',
       'review_header': 'Love the phone',
       'review_posted_date': '05 Jul 2016',
       'review_rating': '5.0 ',
       'review_text': 'Awesome phone for the price. Face it, these are not the S7 phones but a cheaper version. I have no problem with them at all. Plenty of memory and large screen. I love the phone. You are getting a Marshmallow system which should keep me going for a couple of years. Boost mobile is like every other cell company. Hard to deal with on the phone but once the service is set up, no problems. Fast 4gLTE.'},
      {'review_author': 'TG',
       'review_comment_count': '',
       'review_header': 'Worth it.',
       'review_posted_date': '19 Nov 2016',
       'review_rating': '5.0 ',
       'review_text': 'I love this phone! Coming from the galaxy S3 for Boost this is a big jump in the right direction. $20 cheaper on amazon than the boost mobile site another plus.The battery last all day. That is while talking/texting using apps. It is two times bigger than the S3 but def worth it. The operating system is much faster and more responsive as well.'},
      {'review_author': 'David Erickson',
       'review_comment_count': '',
       'review_header': 'This is a great phone for Boost Mobile',
       'review_posted_date': '06 Oct 2016',
       'review_rating': '5.0 ',
       'review_text': "This is a great phone for Boost Mobile. I have had it a few months and it's been great. Much nicer than my old one and the big screen has spoiled me. The iPhone 6 I use for work is tiny by comparison and I much prefer this phone over the iPhone."}],
     'url': 'https://www.amazon.com/dp/B01ETPUQ6E'}
    
Connor July 13, 2018

Any idea why I would be getting this warning: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
InsecureRequestWarning) ? I followed the instructions on the urllib3 page but am still getting the same warning. I am in Jupyter (python 2). Thank you!

pewds October 9, 2018

Love what you guys are doing, big fan of yours. I am currently collecting emails of Amazon reviewers and it’s a very time consuming process. If you could help me with a code for doing this it would be awesome and thank you for reading all of this.

    ScrapeHero October 12, 2018

    Sorry we can’t write code on demand but you can hire someone on upwork to do all this.

Katie October 23, 2018

I keep getting the error “unable to find reviews in page”, what could be the problem? [ I promise the product has reviews ]

    Nithu October 23, 2018

    The HTML parser seemed to have a depth limit. It won’t traverse further to parse the text if the depth exceeds 254. We have updated our code to handle this.

    rijesh November 21, 2018

    We found that Amazon sends null bytes along with the response in some cases, which caused the lxml parser to fail. Our code base is now updated.

Sarah November 25, 2018

how would we get like 100 reviews off the site?

    ScrapeHero November 25, 2018

    You would need to find the link to the next page of reviews and parse it the same way as in this tutorial.

clarosantiago January 18, 2019

Is there any way to get the product rank as well?

doomhouse May 23, 2019

How would you scrape 12000 products by search query only?

    ScrapeHero May 24, 2019

    Amazon restricts the number it shows and it is far below 12000

Siddharth Agrawal April 7, 2021

Hello, this is amazing. Can you please guide how to do similar process in BestBuy ? It would be really great for me and many others.
