The Web Scraper extension is a Chrome extension for extracting data from dynamic web pages. With it, you create a sitemap that describes how the website should be traversed and what data should be extracted. The scraper follows the sitemap to navigate the site, and the extracted data can later be exported as a CSV file. In this tutorial, we will show you how to extract product details using the Web Scraper Chrome extension, with the Amazon BestSeller List as our example.
Import the Amazon BestSeller Scraper
Right-click anywhere on a page and choose ‘Inspect’ to open the developer tools. Click the Web Scraper tab, then click the ‘Create new sitemap’ button and choose the ‘Import sitemap’ option. Now paste the JSON (given in the gist link below) into the Sitemap JSON box.
The GIF above shows you how to import the sitemap.
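Before pasting a sitemap into the Sitemap JSON box, it can help to confirm that the text is valid JSON and that every selector points at a parent that exists. The following is a minimal sketch, not part of the extension itself. The key names `_id`, `startUrl`, `selectors`, and `parentSelectors` are assumptions based on how exported sitemaps typically look, and the example sitemap name and URL are placeholders.

```python
import json


def check_sitemap(text):
    """Return a list of problems found in a sitemap JSON string (empty if none)."""
    try:
        sitemap = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(sitemap, dict):
        return ["top level is not a JSON object"]

    problems = []
    # Assumed top-level keys of a Web Scraper sitemap.
    for key in ("_id", "startUrl", "selectors"):
        if key not in sitemap:
            problems.append(f"missing key: {key}")

    selectors = sitemap.get("selectors", [])
    # "_root" is the implicit top of the selector tree.
    known = {"_root"} | {sel.get("id") for sel in selectors}
    for sel in selectors:
        for parent in sel.get("parentSelectors", []):
            if parent not in known:
                problems.append(f"{sel.get('id')}: unknown parent {parent}")
    return problems


# Placeholder sitemap: the name and URL are hypothetical.
example = json.dumps({
    "_id": "example-sitemap",
    "startUrl": ["https://example.com/list"],
    "selectors": [
        {"id": "product", "type": "SelectorElement", "parentSelectors": ["_root"]},
        {"id": "name", "type": "SelectorText", "parentSelectors": ["product"]},
    ],
})
print(check_sitemap(example))
```

An empty list means the text is at least structurally plausible; it does not guarantee that the CSS selectors still match the live page.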
We will show you how to create the scraper step by step:
1. Creating a Sitemap
After installing the extension, right-click anywhere on a page and choose ‘Inspect’ to open the developer tools. Click the Web Scraper tab, then click the ‘Create new sitemap’ button.
We will set the starting URL as the pet supplies category on Amazon bestsellers:
Understanding the pagination structure of the website enables you to scrape multiple pages. You can easily do that by clicking the ‘Next’ button a few times from the homepage.
2. Navigate from the root to listing pages
The Web Scraper tool is now open at the _root with an empty list of child selectors. You can create a selector that selects each product listing on the first page by clicking on ‘Add new selector’. Let’s give it the id product and set its type to Element.
The ‘Select’ button gives us a tool for visually selecting elements on the page to construct a CSS selector. Click on ‘Select’ and hover your mouse over the listing page. We are selecting the element that encloses all the product details. ‘Element Preview’ shows what the selector matches: when you click on it, all the matching elements on the page are highlighted in red.
Since we need to get all the product listings on the page, we have to check the ‘Multiple’ box. The GIF below shows how the ‘product’ selector is created.
You can see the sample of your data by clicking on ‘Data Preview’.
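In the sitemap JSON, the ‘product’ selector created above ends up as one entry in the selector list. The sketch below shows roughly what that entry might look like, written as a Python dictionary. The field names follow the layout commonly seen in exported sitemaps and should be treated as assumptions, and the CSS selector string is a hypothetical placeholder, since the real one comes from whatever you pick with the ‘Select’ tool.

```python
import json

# Assumed shape of the 'product' selector in an exported sitemap.
product_selector = {
    "id": "product",
    "type": "SelectorElement",               # the 'Element' type chosen in the UI
    "parentSelectors": ["_root"],            # created directly under the root
    "selector": "li.example-product-item",   # hypothetical CSS selector
    "multiple": True,                        # the 'Multiple' checkbox
}

print(json.dumps(product_selector, indent=2))
```

Leaving `multiple` off would correspond to an unchecked ‘Multiple’ box, in which case only the first matching listing on the page would be picked up.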
3. Scrape Elements
Here are the data fields we’ll be extracting within the product element:
- Product Name
- Number of Customer Reviews
- Rating (out of 5)
Let’s go back to the bestseller page and open the Web Scraper tab. Click on the ‘product’ selector we have created. Now we can create selectors for each data field. These selectors will be child selectors of the parent selector ‘product’.
We’ll create selectors just like we did with the selector ‘product’. The GIF below shows you how to add a child selector to a sitemap:
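The parent and child relationship described above can be sketched as data: each field selector lists ‘product’ as its parent, so it is evaluated inside each product element rather than against the whole page. The helper `child_selector`, the ids `name`, `reviews`, and `rating`, and all the CSS selector strings are hypothetical, and the field names are assumed from the usual exported sitemap layout.

```python
def child_selector(selector_id, css, parent="product"):
    """Build a text selector that lives under the given parent selector."""
    return {
        "id": selector_id,
        "type": "SelectorText",        # assumed name for the 'Text' type
        "parentSelectors": [parent],
        "selector": css,
        "multiple": False,             # one value per product element
    }


# Hypothetical CSS selectors for the three data fields.
children = [
    child_selector("name", ".example-title"),
    child_selector("reviews", ".example-review-count"),
    child_selector("rating", ".example-rating"),
]

for sel in children:
    print(sel["id"], "->", sel["parentSelectors"])
```

Because each child is scoped to one product element, `multiple` stays off here: the product selector already supplies one row per listing.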
So far we have created a scraper for a single page. Since the Amazon bestseller list has pagination, we have to create another selector to go to the ‘Next’ page. Let’s create a selector for the Next button within the root.
Note here that the parent selectors for the selector ‘next’ are ‘_root’ and ‘next’. Because ‘next’ is its own parent, the scraper can follow the Next link again from every page it reaches, so it keeps scraping product listings as long as there is a Next button.
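To see why ‘next’ needs to list itself as a parent, here is a small simulation of the traversal. It is only a sketch of the idea, not the extension's actual code. The three fake pages are made-up data, and it assumes that ‘product’ also lists ‘next’ as a parent so that listings on later pages are collected.

```python
def scrape(page, parent_id, selectors, pages, rows):
    """Apply every selector whose parents include parent_id to the given page."""
    for sel in selectors:
        if parent_id not in sel["parentSelectors"]:
            continue
        if sel["type"] == "SelectorElement":
            rows.extend(pages[page]["products"])
        elif sel["type"] == "SelectorLink":
            target = pages[page]["next"]
            if target is not None:  # no Next button: stop following links
                scrape(target, sel["id"], selectors, pages, rows)
    return rows


# Made-up pages: each has some products and an optional next page.
pages = {
    "p1": {"products": ["a", "b"], "next": "p2"},
    "p2": {"products": ["c", "d"], "next": "p3"},
    "p3": {"products": ["e"], "next": None},
}

selectors = [
    # Assumption: 'product' is a child of both '_root' and 'next'.
    {"id": "product", "type": "SelectorElement", "parentSelectors": ["_root", "next"]},
    {"id": "next", "type": "SelectorLink", "parentSelectors": ["_root", "next"]},
]

all_rows = scrape("p1", "_root", selectors, pages, [])

# Variant where 'next' is only a child of '_root': the link is followed once.
root_only = [
    selectors[0],
    {"id": "next", "type": "SelectorLink", "parentSelectors": ["_root"]},
]
short_rows = scrape("p1", "_root", root_only, pages, [])

print(all_rows)
print(short_rows)
```

In the second variant the scraper reaches the second page but never looks for a Next link there, so the third page is missed.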
4. Run the Scraper
Once you have made sure everything in the selector graph looks good, you can start scraping.
Open the Sitemap drop-down and click on ‘Scrape’. A new Chrome window will launch, in which the pages are loaded and the data is grabbed automatically. If you want to stop the scraping process partway through, just close this window and you will keep the data that was extracted up to that point.
Once the scraper run has finished, you’ll get a notification. Go to the sitemap tab to browse the extracted data or export it to a CSV file.
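Once you have the CSV file, the scraped text usually needs a little cleanup before it is useful as numbers. The sketch below assumes the selector ids were `name`, `reviews`, and `rating`, so that those are the column names, and that the export also carries bookkeeping columns such as `web-scraper-order` and `web-scraper-start-url`. The sample rows are invented for illustration and are not real scraped data.

```python
import csv
import io
import re

# Invented sample in the assumed export layout (not real scraped data).
SAMPLE = (
    'web-scraper-order,web-scraper-start-url,name,reviews,rating\n'
    '1-1,https://example.com/list,Example Dog Toy,"1,234",4.5 out of 5 stars\n'
    '1-2,https://example.com/list,Example Cat Bed,,\n'
)


def parse_row(row):
    """Convert the text fields of one exported row into typed values."""
    rating_match = re.search(r"\d+(?:\.\d+)?", row["rating"])
    review_digits = re.sub(r"[^\d]", "", row["reviews"])
    return {
        "name": row["name"].strip(),
        # Missing fields become None instead of raising an error.
        "reviews": int(review_digits) if review_digits else None,
        "rating": float(rating_match.group()) if rating_match else None,
    }


# For a real export, replace io.StringIO(SAMPLE) with
# open(path, newline="", encoding="utf-8").
products = [parse_row(row) for row in csv.DictReader(io.StringIO(SAMPLE))]

for product in products:
    print(product)
```

Listings without reviews or a rating are kept with `None` values, so you can decide later whether to drop them or treat them separately.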
Disclaimer: Any code provided in our tutorials is for illustration and learning purposes only. We are not responsible for how it is used and assume no liability for any detrimental usage of the source code. The mere presence of this code on our site does not imply that we encourage scraping or scrape the websites referenced in the code and accompanying tutorial. The tutorials only help illustrate the technique of programming web scrapers for popular internet websites. We are not obligated to provide any support for the code, however, if you add your questions in the comments section, we may periodically address them.