How to Create a Spider Using ParseHub

ParseHub is a point-and-click web scraping tool. It offers many features, which can be overwhelming for new users but become very useful once you know how to use them.

In this tutorial, we will use ParseHub to extract product details (product names, images, and prices) from scrapeme.live/shop, a mock e-commerce website that sells Pokémon toys.

To learn about other web scraping tools, read our article: Best Free and Paid Web Scraping Tools.

Prerequisites

To get started, download the ParseHub app. The ParseHub download page has links for Windows, Mac, and Linux (it also works as a Firefox extension). After installation, wait for the app to finish its first run and load fully. You will be greeted with a tutorial that covers the basics of ParseHub and how to use it. If you are completely new to scraping, it is worth completing it.

The ParseHub website also has tutorials covering each aspect of the software, which are very useful for new users getting to know ParseHub.

Step 1: Starting a Project

Click ‘New Project’ in the sidebar menu. On the page that opens, paste the start URL, https://scrapeme.live/shop/, into the box in the sidebar. The application will load the site in the main display area. Make sure you are on the main template, and click ‘Select page’ in the tree shown in the sidebar.

[Screenshot: starting a new project in ParseHub]

Step 2: Selecting Elements

  • With selection1 selected in the sidebar, click the name of the product you want to select. The name will be highlighted in green, and other similar elements will be highlighted in yellow.
  • Click the other highlighted elements until all the product names you want are selected. The number of elements in selection1 is shown in parentheses, so you can check that every product name is selected.
  • To rename the selection, click the ‘Select selection1’ command in the sidebar. For convenience, rename selection1 to productname.

[Screenshot: selecting product names on the page in ParseHub]

A preview of the data appears at the bottom of the screen, showing the product names and images along with their corresponding URLs.
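
ParseHub does this matching for you, but the idea behind the green and yellow highlighting can be sketched in a few lines of Python. The sketch assumes that "similar" simply means "same tag and same CSS class" (ParseHub's real matching logic is not described in this tutorial), and the HTML, class names, and prices are made-up sample values loosely modelled on a WooCommerce product grid, not copied from scrapeme.live.

```python
from html.parser import HTMLParser

# Hypothetical, simplified markup loosely modelled on a WooCommerce product grid;
# the real scrapeme.live HTML is more complex and the prices here are made up.
SAMPLE_HTML = """
<ul class="products">
  <li class="product"><a href="/shop/Bulbasaur/"><img src="/img/bulbasaur.png">
    <h2 class="woocommerce-loop-product__title">Bulbasaur</h2>
    <span class="price">£63.00</span></a></li>
  <li class="product"><a href="/shop/Ivysaur/"><img src="/img/ivysaur.png">
    <h2 class="woocommerce-loop-product__title">Ivysaur</h2>
    <span class="price">£87.00</span></a></li>
</ul>
"""


class SimilarElementFinder(HTMLParser):
    """Collects the text of every element sharing the clicked element's tag and class."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.matches = []
        self._inside = False  # True while reading text inside a matching element

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self._inside = True
            self.matches.append("")

    def handle_endtag(self, tag):
        if tag == self.tag:
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.matches[-1] += data.strip()


def select_similar(html, tag, css_class):
    """Return the text of all elements 'similar' to a clicked (tag, class) pair."""
    finder = SimilarElementFinder(tag, css_class)
    finder.feed(html)
    finder.close()
    return finder.matches


# Clicking one product title is modelled as choosing ("h2", "woocommerce-loop-product__title").
print(select_similar(SAMPLE_HTML, "h2", "woocommerce-loop-product__title"))
# ['Bulbasaur', 'Ivysaur']
```

Matching on tag plus class is the simplest possible rule; it is why one click can highlight a whole column of product names at once.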

Step 3: Selecting Relative Elements

Next, we need to select each product's price and image. To do this, we create a relative selector.

  • Click the “+” button beside productname in the selector tree to open a menu.
  • From the menu, select the ‘Relative Select’ tool. This tool lets you link data that is already selected on the page to any other data you want to attach to it.
  • Click a product name, then drag your mouse to its price. An arrow should appear between the product name and its corresponding price, which relates the price elements to the products. The prices will then appear in the data preview below.
  • Rename the relative selector to price.

Repeat the same technique with images.

[Screenshot: using Relative Select to link prices to product names]

The final selector tree is shown below:

[Screenshot: the final selector tree]
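
The relationship that Relative Select creates can be sketched in Python as "one record per product card". This minimal sketch assumes each product sits in its own `<li class="product">` element; the markup, class names, and the `image` field name are hypothetical (the real page may wrap prices in extra nested spans, which this simple parser does not handle).

```python
from html.parser import HTMLParser

# Hypothetical, simplified product-card markup; not copied from scrapeme.live.
SAMPLE_HTML = """
<ul class="products">
  <li class="product"><a href="/shop/Bulbasaur/"><img src="/img/bulbasaur.png">
    <h2 class="woocommerce-loop-product__title">Bulbasaur</h2>
    <span class="price">£63.00</span></a></li>
  <li class="product"><a href="/shop/Ivysaur/"><img src="/img/ivysaur.png">
    <h2 class="woocommerce-loop-product__title">Ivysaur</h2>
    <span class="price">£87.00</span></a></li>
</ul>
"""


class ProductParser(HTMLParser):
    """Relates each product name to the price and image inside the same card."""

    def __init__(self):
        super().__init__()
        self.products = []
        self._field = None  # which field the text currently being read belongs to

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "li" and "product" in classes:
            # A new card starts: everything found next is attached to it.
            self.products.append({"productname": "", "price": "", "image": None})
        elif not self.products:
            return  # ignore anything before the first product card
        elif tag == "img":
            self.products[-1]["image"] = attrs.get("src")
        elif tag == "h2":
            self._field = "productname"
        elif tag == "span" and "price" in classes:
            self._field = "price"

    def handle_endtag(self, tag):
        if tag in ("h2", "span"):
            self._field = None

    def handle_data(self, data):
        if self._field:
            self.products[-1][self._field] += data.strip()


def parse_products(html):
    parser = ProductParser()
    parser.feed(html)
    parser.close()
    return parser.products


for product in parse_products(SAMPLE_HTML):
    print(product)
```

Because every price and image is attached to the most recent product card, a price cannot end up paired with another product's name, which is the problem Relative Select solves in the ParseHub interface.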

Step 4: Pagination

Next, we need to handle pagination so that the scraper visits every page of products. This step can be quite tricky.

  • First, click ‘Select page’ in the main template tree, and choose ‘Select’.
  • Then select the next page link in the pagination bar at the bottom of the website's page.
  • In the sidebar, select the new selector and choose ‘Click’.
  • A popup will ask you to confirm that this is the next page button. Choose the first option, ‘Yes’.
  • When prompted for what to do next, choose to repeat the current template. This adds pagination support to your scraper.

[Screenshot: setting up the next page button in ParseHub]
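
Repeating the current template after each ‘next’ click amounts to a simple loop. In this sketch, `FAKE_SITE` and its page URLs are hypothetical stand-ins for real page downloads, and `crawl` is an illustrative helper, not part of ParseHub. It runs the same extraction on each page and stops when there is no next link. The `seen` set and `max_pages` cap are an added safeguard (not a ParseHub setting) for the case where the ‘next’ element is still found on the last page and points back to a page already visited.

```python
from typing import Callable, List, Optional, Tuple

# (items found on the page, URL of the next page or None on the last page)
Page = Tuple[List[str], Optional[str]]

# Hypothetical stand-in for the website: each URL maps to its products and 'next' link.
FAKE_SITE = {
    "https://scrapeme.live/shop/": (["Bulbasaur", "Ivysaur"], "https://scrapeme.live/shop/page/2/"),
    "https://scrapeme.live/shop/page/2/": (["Venusaur"], "https://scrapeme.live/shop/page/3/"),
    "https://scrapeme.live/shop/page/3/": (["Charmander"], None),
}


def crawl(start_url: str, fetch: Callable[[str], Page], max_pages: int = 100) -> List[str]:
    """Apply the same template to each page, then follow 'next' until there is none."""
    results: List[str] = []
    seen = set()
    url: Optional[str] = start_url
    while url is not None and url not in seen and len(seen) < max_pages:
        seen.add(url)
        items, next_url = fetch(url)
        results.extend(items)
        url = next_url
    return results


print(crawl("https://scrapeme.live/shop/", lambda url: FAKE_SITE[url]))
# ['Bulbasaur', 'Ivysaur', 'Venusaur', 'Charmander']
```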

If you cannot select the ‘next’ element, or it stays selected on the last page, consider using an XPath selection to select it instead.
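
Before typing an XPath expression into ParseHub, it can help to check it against a copy of the pagination markup. The sketch below uses Python's `xml.etree.ElementTree`, which supports only a subset of XPath: it handles the exact-match predicate `[@class='...']`, but not functions such as `contains()`, which a more forgiving expression like `//a[contains(@class, 'next')]` would need. The pagination markup is a hypothetical, well-formed simplification; real page HTML is often not valid XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified pagination markup for a first page and a last page.
PAGE_1 = """<nav class="woocommerce-pagination"><ul class="page-numbers">
<li><span class="page-numbers current">1</span></li>
<li><a class="page-numbers" href="/shop/page/2/">2</a></li>
<li><a class="next page-numbers" href="/shop/page/2/">Next</a></li>
</ul></nav>"""

LAST_PAGE = """<nav class="woocommerce-pagination"><ul class="page-numbers">
<li><a class="prev page-numbers" href="/shop/page/2/">Previous</a></li>
<li><a class="page-numbers" href="/shop/page/2/">2</a></li>
<li><span class="page-numbers current">3</span></li>
</ul></nav>"""

# ElementTree's limited XPath: the class attribute must match exactly.
NEXT_LINK_XPATH = ".//a[@class='next page-numbers']"


def next_page_href(fragment):
    """Return the href of the 'next' link, or None when there is no next page."""
    root = ET.fromstring(fragment)
    link = root.find(NEXT_LINK_XPATH)
    return None if link is None else link.get("href")


print(next_page_href(PAGE_1))     # /shop/page/2/
print(next_page_href(LAST_PAGE))  # None
```

A selector that returns nothing on the last page is exactly what you want: it gives the pagination loop a clean place to stop.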

Step 5: Running the Scraper

To run the scraper, click ‘Get Data’ at the bottom of the sidebar.

A test run scrapes the first few pages on your local machine. Choosing ‘Run’ runs the scraper on ParseHub’s servers. You can also schedule runs, but scheduling requires a premium account.
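
As far as we know, the ‘Run’ button can also be triggered through ParseHub's REST API. The sketch below assumes the v2 endpoint `POST https://www.parsehub.com/api/v2/projects/{PROJECT_TOKEN}/run` with an `api_key` form parameter; treat both as assumptions and check ParseHub's current API reference. `build_run_request` is a hypothetical helper, and the token and key are placeholders. The sketch only builds the request so it can be inspected offline without sending anything.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: base URL of ParseHub's REST API v2; verify against the official docs.
API_BASE = "https://www.parsehub.com/api/v2"


def build_run_request(project_token: str, api_key: str) -> Request:
    """Build (but do not send) a POST that would start a run on ParseHub's servers."""
    body = urlencode({"api_key": api_key}).encode("ascii")
    return Request(f"{API_BASE}/projects/{project_token}/run", data=body, method="POST")


# Placeholder credentials: replace them with values from your own ParseHub account.
run_request = build_run_request("YOUR_PROJECT_TOKEN", "YOUR_API_KEY")
print(run_request.get_method(), run_request.full_url)
# POST https://www.parsehub.com/api/v2/projects/YOUR_PROJECT_TOKEN/run
```

To actually start a run, you would pass the request to `urllib.request.urlopen`; keep your API key out of source control.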

Step 6: Download the Data

When the data is ready, you will see CSV and JSON download options. Click the button for the format you prefer. ParseHub will also email you a download link when the run is complete.
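
Once downloaded, the JSON file can be loaded with Python's standard library. The layout below, with one object per product under a `productname` key, is an assumption based on the selector names used in this tutorial; open your own export to confirm the key names. The product data and image URLs are made-up sample values. Prices arrive as display text, so a small hypothetical helper, `parse_price`, converts them to numbers.

```python
import json
from decimal import Decimal
from typing import Optional

# Hypothetical export; the key names depend on what you called your selectors.
SAMPLE_JSON = """{"productname": [
  {"name": "Bulbasaur", "price": "£63.00", "image": "https://example.com/bulbasaur.png"},
  {"name": "Ivysaur", "price": "£87.00", "image": "https://example.com/ivysaur.png"}
]}"""


def load_products(json_text: str, key: str = "productname") -> list:
    """Return the product records stored under the top-level selector key."""
    return json.loads(json_text).get(key, [])


def parse_price(text: str) -> Optional[Decimal]:
    """Turn a display price such as '£1,063.00' into a Decimal, or None if it has no digits."""
    cleaned = "".join(ch for ch in text if ch.isdigit() or ch == ".")
    return Decimal(cleaned) if any(ch.isdigit() for ch in cleaned) else None


for product in load_products(SAMPLE_JSON):
    print(product["name"], parse_price(product["price"]))
# Bulbasaur 63.00
# Ivysaur 87.00
```

`Decimal` is used instead of `float` so that prices such as 63.00 keep their exact value when you later sum or compare them.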

Pros

  • The point-and-click interface is simple to set up and use.
  • It has many advanced features, such as handling pagination, infinite scrolling pages, pop-ups, and navigation, for building complex scrapers.
  • The desktop application works on Windows, Mac, and Linux.
  • Paid accounts support IP rotation, scheduling, and more.
  • You can integrate data from ParseHub into Tableau.
  • It supports JavaScript-heavy websites.
  • It has well-written documentation and tutorials for beginners.

Cons

  • The software is complex to use.
  • You have to download and run the software on your local machine to create a scraper.
  • It has a steep learning curve.
  • It cannot write data directly to a database.

If the websites you want to scrape are complex, or you need large amounts of data from one or more sites, this tool may not scale well. In that case, consider building your own scraper with open-source web scraping tools to crawl the web and extract data. To create a custom web scraper for a particular website, check out our tutorial section: Web Scraping Tutorials.

If you are new to web scraping, start with our beginner's guide: What is web scraping – Part 1 – Beginner’s guide.


Disclaimer: Any code provided in our tutorials is for illustration and learning purposes only. We are not responsible for how it is used and assume no liability for any detrimental usage of the source code. The mere presence of this code on our site does not imply that we encourage scraping or scrape the websites referenced in the code and accompanying tutorial. The tutorials only help illustrate the technique of programming web scrapers for popular internet websites. We are not obligated to provide any support for the code, however, if you add your questions in the comments section, we may periodically address them.

Posted in:   Tools and Services, Web Scraping Tutorials
