Top Free and Paid Web Scraping Tools and Software

Web scraping tools automate web-based data collection. These tools generally fall into two categories: tools you install on your computer or in your browser (Chrome or Firefox), and web-based services designed to be self-service. Web scraping tools (free or paid) and self-service websites/applications can be a good choice if your data requirements are small and the source websites aren’t complicated.

However, if the websites you want to scrape are complicated, or you need a lot of data from one or more sites, these tools do not scale well. Their cost is small compared to the time and effort required to build scrapers with them and the complexity of maintaining and running them. For such cases, a full-service provider is a better and more economical option.

In this post, we will walk through how these tools work so that you can evaluate whether they fit your needs.

Here are the best web scraping tools:

Name         | Pricing                  | Type                                  | Handles Large Volumes?
Data Scraper | Free                     | Chrome Extension                      | No
Web Scraper  | Free                     | Chrome Extension                      | Yes
Scraper      | Free                     | Chrome Extension                      | No
ParseHub     | Paid                     | Firefox Extension/Desktop Application | Yes
OutwitHub    | Free                     | Firefox Extension/Desktop Application | No
FMiner       | Paid (15-day free trial) | Desktop Application                   | No
Dexi.io      | Paid                     | Web-Based Scraping Application        | Yes
Webhose.io   | Paid                     | Web-Based Scraping Application        | Yes
Octoparse    | Paid                     | Web-Based Scraping Application        | Yes

Browser Extensions and Desktop Applications

Data Scraper (Chrome Extension)

Data Scraper is a simple web scraping tool for extracting data from a single page into CSV and XLS files. It is a personal browser extension that helps you transform data into a clean table format. You will need to install the plugin in the Google Chrome browser. The free version lets you scrape 500 pages per month; if you want to scrape more pages, you have to upgrade to a paid plan.

Download the extension from the Chrome Web Store – https://chrome.google.com/webstore/detail/data-scraper-easy-web-scr/nndknepjnldbdbepjfgmncbggmopgden?hl=en. After you have added the extension to your browser, you can start extracting data from a website.

We’ll show you how to extract data from Amazon.com using the extension.

[Screenshot: downloading the Data Miner plugin]

Open the website that you need to extract data from. We’ll scrape the product details of air conditioners under the appliance category on Amazon.com. Right-click on the web page and click on the option ‘Get Similar (Data Miner)’. You’ll see a list of saved templates on the left side. You can choose any one of them, or create your own, and run it.

[Screenshot: scraping Amazon products with Data Miner]

To create your own template, click on the option ‘New Recipe’, or choose from the generic templates under the option ‘Public’.

[Screenshot: Data Scraper public recipes]

Data Scraper is user-friendly as it will show you how to create your own template step by step. You’ll get the output presented as a table:

[Screenshot: Data Miner results table]

Then click on ‘Download’ to export the data in CSV or XLS format.
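
For context, here is roughly what a tool like this automates: fetch one page, pull a few fields per item, and write them to CSV. The sketch below is a minimal Python equivalent; the URL and CSS selectors are illustrative placeholders, not Amazon’s actual markup, which changes frequently.

```python
# A minimal sketch of what the extension automates: fetch one page, pull
# a few fields per product, and write them to CSV. The URL and the CSS
# selectors are illustrative placeholders -- Amazon's real markup differs
# and changes frequently.
import csv

import requests
from bs4 import BeautifulSoup

url = "https://www.amazon.com/s?k=air+conditioner"  # hypothetical listing page
headers = {"User-Agent": "Mozilla/5.0"}  # many sites reject default clients

soup = BeautifulSoup(requests.get(url, headers=headers, timeout=30).text, "html.parser")

rows = []
for product in soup.select("div.s-result-item"):  # assumed product container
    title = product.select_one("h2")              # assumed title element
    price = product.select_one("span.a-price")    # assumed price element
    if title and price:
        rows.append([title.get_text(strip=True), price.get_text(strip=True)])

with open("products.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["title", "price"])
    writer.writerows(rows)
```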

Web Scraper (Chrome Extension)

Web Scraper, a standalone Chrome extension, is a great tool for extracting data from web pages. Using the extension you create a sitemap that defines how the website should be traversed and what data should be extracted. With sitemaps, you can navigate the site the way you want, and the data can later be exported as a CSV.

Download and add the extension to Chrome using the link – https://chrome.google.com/webstore/detail/web-scraper/jnhgnonknehpejjnehehllkliplmbmhn?hl=en

You’ll find it in Chrome’s developer tools as a new tab named ‘Web Scraper’. Activate the tab and click ‘Create new sitemap’, then ‘Create sitemap’. A sitemap is the Web Scraper extension’s term for a scraper: a sequence of rules for how to extract data by proceeding from one extraction to the next. We will set the start page to the cellphone category on Amazon.com – https://www.amazon.com/s/ref=sr_hi_1?fst=p90x%3A1&rh=n%3A2335752011%2Ck%3Acellphones&keywords=cellphones&ie=UTF8&qid=1523426607 – and click ‘Create Sitemap’. The GIF below illustrates how to create a sitemap:

[GIF: creating a sitemap in Web Scraper]

Navigating from root to category pages

Right now, we have the Web Scraper tool open at _root with an empty list of child selectors.

[Screenshot: adding a new selector in Web Scraper]

Click ‘Add new selector’. We will add the selector that takes us from the main page to each category page. Let’s give it the id ‘category’, with its type set to Link. We want to get multiple links from the root, so we check the ‘Multiple’ box below. The ‘Select’ button gives us a tool for visually selecting elements on the page to construct a CSS selector. ‘Element preview’ highlights the matched elements on the page, and ‘Data preview’ pops up a sample of the data the specified selector would extract.

[Screenshot: building a selector in Web Scraper]

Click ‘Select’ and then one of the category links; a CSS selector for it will be filled in on the left of the selection tool. Click one of the other (unselected) links and the CSS selector should adjust to include it. Keep clicking the remaining links until all of them are selected. The GIF below shows the whole process of adding a selector to a sitemap:

[GIF: creating selectors in Web Scraper]

A selector graph is a collection of selectors – content to extract, elements within the page, and links to follow to continue the scraping. Each selector has a root (parent selector) defining the context in which it is applied. This is the visual representation of the final scraper (selector graph) for our Amazon cellphone scraper:

[Screenshot: Web Scraper selector graph]

Here the root represents the starting URL – the main Amazon cellphone page. From there the scraper follows a link to each category page, and for each category it extracts a set of product elements. From each product element it extracts a single name, review, rating, and price. Since the results span multiple pages, we also need the ‘next’ selector so the scraper visits every available page.
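
If you later outgrow the extension, the same selector graph translates naturally to code. Below is a minimal Python sketch of the graph above, assuming hypothetical CSS selectors (a.category, div.product, .name, and so on) – Amazon’s real class names differ:

```python
# A minimal Python sketch of the selector graph above. The CSS selectors
# (a.category, div.product, .name, etc.) are hypothetical placeholders --
# Amazon's real markup differs and changes often.
import time
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

START_URL = "https://www.amazon.com/cellphones"  # placeholder start page
HEADERS = {"User-Agent": "Mozilla/5.0"}

def get_soup(url):
    return BeautifulSoup(requests.get(url, headers=HEADERS, timeout=30).text, "html.parser")

def text(element, selector):
    node = element.select_one(selector)
    return node.get_text(strip=True) if node else None

root = get_soup(START_URL)

# 'category' selector: multiple links leading from _root to each category page
for link in root.select("a.category"):
    url = urljoin(START_URL, link["href"])
    while url:  # 'next' selector: keep paginating while a next link exists
        page = get_soup(url)
        # 'product' element selector: each product element yields one record
        for product in page.select("div.product"):
            print({field: text(product, "." + field)
                   for field in ("name", "review", "rating", "price")})
        next_link = page.select_one("a.next")
        url = urljoin(url, next_link["href"]) if next_link else None
        time.sleep(2)  # polite delay, like Web Scraper's request-interval option
```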

Running the scraper

Click ‘Sitemap’ to open the drop-down menu and click ‘Scrape’, as shown below:

[Screenshot: the Scrape option in the Sitemap menu]

The scrape pane gives us options for how slowly Web Scraper should perform its scraping, to avoid overloading the web server with requests and to give the browser time to load pages. We are fine with the defaults, so click ‘Start scraping’. A window will pop up where the scraper does its browsing. After scraping, you can download the data by clicking ‘Export data as CSV’ or save it to a database.

Scraper (Chrome Extension)

Scraper is a Chrome extension for scraping simple web pages. It is easy to use and will help you scrape a website’s content and upload the results to Google Docs. It can extract data from tables and convert it into a structured format. You can download the extension from the link https://chrome.google.com/webstore/detail/scraper/mbigbapnjcgaffohmbkdlecaccepngjd/related?authuser=2

Open the website and highlight a part of the page similar to what you want to scrape. Right-click and you’ll see an option called ‘Scrape similar’. The scraper console will open in a new window showing the initial results, with the scraped content in a table format.

[Screenshot: Scraper console with extracted data]

The “Selector” section lets you change which page elements are scraped. You can specify the query as either a jQuery selector or an XPath expression.
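
The same kind of XPath query can be run in code. Here is a quick, hedged illustration using Python’s lxml; the URL and XPath below are generic placeholders, not something Scraper generates.

```python
# A quick illustration of the XPath option: the same query you would type
# into Scraper's "Selector" box can be run in code with lxml. The URL and
# XPath here are generic placeholders.
import requests
from lxml import html

page = html.fromstring(requests.get("https://example.com/some-page", timeout=30).text)

# grab the text of every cell in the first table on the page
cells = page.xpath("//table[1]//td/text()")
print(cells)
```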

You can export the table by clicking ‘Export to Google Docs’ to save the content as a Google Spreadsheet. You can also customize the table’s columns and name them as you like. After making customizations, press the ‘Scrape’ button to update the results in the table.

[Screenshot: Scraper extension results]

ParseHub

ParseHub is a scraping tool built to crawl single and multiple websites, with support for JavaScript, AJAX, cookies, sessions, and redirects. The application can analyze and grab data from websites and transform it into meaningful data. It uses machine learning technology to recognize even the most complicated documents, and it generates output in JSON, CSV, or Google Sheets format.

ParseHub is a desktop app available for Windows, Mac, and Linux, and it also works as a Firefox extension. The user-friendly interface provides step-by-step lessons for beginners on how to extract data from websites. It supports advanced features like pagination, infinite-scrolling pages, pop-ups, and navigation. You can even visualize data from ParseHub in Tableau.

The free version has a limit of 5 projects with 200 pages per run. With a paid subscription you get 20 private projects, 10,000 pages per crawl, and IP rotation.

All you need to do is enter the website you want to scrape and click ‘Start Project’. Then click the ‘+’ button to select a page element or title. After selecting and naming all the fields you need, you will get a CSV/Excel or JSON sample result.

[Screenshot: selecting fields to extract in ParseHub]

Click on ‘Get Data’ and ParseHub will scrape the website and fetch your data. When the data is ready you will see CSV and JSON options to download your results.
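
ParseHub also exposes a REST API for retrieving run results programmatically. The sketch below reflects the v2 endpoint as documented at the time of writing; treat the URL and parameter names as assumptions and verify them against ParseHub’s current docs.

```python
# Hedged sketch: fetching a project's most recent run data via ParseHub's
# REST API. The endpoint and parameter names reflect the v2 API docs at
# the time of writing -- verify against ParseHub's current documentation.
import requests

API_KEY = "your-api-key"              # found in your ParseHub account settings
PROJECT_TOKEN = "your-project-token"  # shown on the project's page

resp = requests.get(
    f"https://www.parsehub.com/api/v2/projects/{PROJECT_TOKEN}/last_ready_run/data",
    params={"api_key": API_KEY, "format": "json"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```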

FMiner

FMiner is a visual data extraction tool for web scraping and web screen scraping. Its intuitive user interface lets you quickly harness the software’s powerful data mining engine to extract data from websites. In addition to basic web scraping features, it also handles AJAX/JavaScript processing and CAPTCHA solving. It runs on both Windows and Mac OS and does its scraping using an internal browser. It offers a 15-day free trial, after which you can decide whether to move to a paid subscription.

We’ll show you how to extract a table from Wikipedia using FMiner, using the page https://en.wikipedia.org/wiki/List_of_National_Football_League_Olympians. First, download the application from http://www.fminer.com/download/

When you open the application, enter the URL and press the button ‘Record’ to record your actions. What we need to extract is the table of Olympic players.

[Screenshot: entering the URL in FMiner]

To create the table, click on the ‘+’ sign that says Table. Then select each row by clicking ‘Target Select’; you’ll see one whole row of the table selected. To expand the selection to the whole table, click ‘Multiple Targets’. Once the whole table is highlighted, you can add your new fields by clicking the ‘+’ sign (shown in the image below).

[Screenshot: adding a table and columns in FMiner]

After you have created the table, click ‘Scrape’. You’ll get a notification when the scrape has finished. Just click ‘Export’ to save the data as a CSV or XLS file.

[Screenshot: running the scraper in FMiner]
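
Incidentally, a simple Wikipedia table like this one is also a few lines of Python with pandas, which parses every table on a page into a DataFrame; the table index used below is an assumption you should verify against the page.

```python
# A hedged sketch: grabbing the same Wikipedia table with pandas, which
# parses every <table> on the page into a DataFrame. The index [0]
# assumes the Olympians table is the first one -- inspect the list if
# the page layout has changed. Requires pandas plus lxml installed.
import pandas as pd

url = "https://en.wikipedia.org/wiki/List_of_National_Football_League_Olympians"
tables = pd.read_html(url)

olympians = tables[0]  # assumption: the main table is first on the page
olympians.to_csv("nfl_olympians.csv", index=False)
print(olympians.head())
```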

OutwitHub (Firefox Extension)

OutwitHub is a data extractor that runs inside a web browser. To use it as an extension, download it from the Firefox add-ons store; to use the standalone application, just follow the instructions and run it. OutwitHub can help you extract data from the web with no programming skills at all. It’s great for harvesting data that might otherwise be hard to access.

OutwitHub is a free tool, which makes it a great option if you need to scrape some data from the web quickly. With its automation features, it browses automatically through a series of web pages and performs extraction tasks. You can export the data in numerous formats (JSON, Excel, SQL, HTML, CSV, etc.).

Web-Based Scraping Applications and Services

Dexi.io (formerly known as CloudScrape)

Dexi supports data collection from any website and requires no download. The application provides different types of robots to scrape data – Crawlers, Extractors, Autobots, and Pipes. Extractor robots are the most advanced, as they let you choose every action the robot should perform, such as clicking buttons and taking screenshots.

To start, you first have to sign up and create an account on dexi.io. It’ll then take you to the app at https://app.dexi.io/#/. Once there, you can start by clicking ‘Create New Robot’. It might take a while to get the hang of it, but there are tutorials on how to create your first robot, and if you need help you can check out their knowledge base.

[Screenshot: creating a new project in Dexi.io]

Dexi.io has a simple user interface. All you need to do is choose the type of robot you need, enter the website you would like to extract data from and start building your scraper.

[Screenshot: creating a robot in Dexi.io]

The application offers anonymous proxies to hide your identity. Dexi.io also offers a number of integrations with third-party services. You can save the data directly to Box.net and Google Drive, or export it in JSON or CSV format. Dexi.io stores your data on its servers for 2 weeks before archiving it. If you need to scrape at a larger scale, you can always get the paid version.

Webhose.io

Webhose converts web content from millions of websites into machine-readable data feeds. It’s great for identifying trending content. You can search for keywords and extract all sorts of data, such as news articles, reviews, e-commerce, dark web, and broadcast data. Using the Webhose API, you can filter and consume the data your application needs in multiple formats, including JSON, XML, RSS, and Excel. The free version allows you to make 1,000 requests per month, while paid subscriptions offer plans based on the number of requests you need.

You can get started by navigating to the webhose.io homepage and clicking ‘Use for free’. Once you enter your email address and set a password, you can open a free account where you can track your activity and monthly query quota.
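
For programmatic access, queries go through their HTTP API. Below is a hedged sketch using Python’s requests library; the endpoint, parameter names, and the response shape are assumptions based on webhose.io’s documentation at the time of writing, so verify them before relying on this.

```python
# Hedged sketch: querying the Webhose API with requests. The endpoint,
# parameters, and the "posts" response key are assumptions based on
# webhose.io's documentation at the time of writing.
import requests

TOKEN = "your-webhose-token"  # issued when you create your free account

resp = requests.get(
    "https://webhose.io/filterWebContent",
    params={"token": TOKEN, "format": "json", "q": "web scraping"},
    timeout=30,
)
resp.raise_for_status()
for post in resp.json().get("posts", []):
    print(post.get("title"))
```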

Octoparse

Octoparse is a visual scraping tool that is easy to understand. Its point-and-click interface lets you easily choose the fields you need to scrape from a website. The scraper can handle both static and dynamic websites, including those using AJAX, JavaScript, and cookies. The application also offers a cloud-based platform that allows you to extract large amounts of data. You can export the scraped data in TXT, CSV, HTML, or Excel formats.

The free version allows you to build up to 10 crawlers; the paid subscription plans add features such as an API and many anonymous IP proxies, which speed up extraction and let you fetch large volumes of data in real time.


Even though these web scraping tools extract data from web pages with ease, they have their limits. In the long run, programming is the best way to scrape data from the web, as it provides more flexibility and achieves better results.

If you aren’t proficient with programming, your needs are complex, or you need large volumes of data scraped, there are great web scraping services that will suit your requirements and make the job easier for you.

You can save time and get clean, structured data by trying us out instead – we are a full-service provider that doesn’t require the use of any tools; all you get is clean data without any hassles.

Need some professional help with scraping data? Let us know

Turn websites into meaningful and structured data through our web data extraction service

 

Note: All features, prices etc are current at the time of writing this article. Please check the individual websites for current features and pricing.
