Web Scraping Tutorials


LEARN HOW TO USE WEB SCRAPING TO ENHANCE PRODUCTIVITY AND AUTOMATION

We provide many step-by-step tutorials with source code for web scraping, web crawling, data extraction, headless browsers, etc.

Our web scraping tutorials are usually written in Python, using libraries such as lxml, Beautiful Soup, and Selectorlib, and occasionally in Node.js.

In most cases, the full source code is also available to download or to clone with Git.

We also publish in-depth articles on web scraping tips, techniques, and the latest technologies, including current anti-bot systems and methods for gathering publicly available data from the Internet safely and responsibly.

The community that has coalesced around these tutorials, through its comments, helps everyone from beginner hobbyists to advanced programmers solve the issues they face with web scraping.

These tutorials are frequently linked to in Stack Overflow answers and discussed on Reddit.

Please feel free to read and participate in the discussions with your comments.

All Tutorials

How to Scrape Real Estate Data without Coding

Scrape real estate listings from Realtor and Zillow, including data such as the address, pricing, broker information, and more, using ScrapeHero Cloud.

How to Scrape Jobs without Coding

Scrape jobs from Indeed and Glassdoor, including job data such as title, location, company, salary, and more, using ScrapeHero Cloud.

How to Scrape Google Without Coding

This tutorial will show you how to scrape Google data for free using ScrapeHero Cloud. Using its crawlers, we will scrape Google search results pages, Google Maps, and Google Reviews.

Scrape Glassdoor Job Data without Coding

This tutorial will help you scrape job data from any Glassdoor domain using the Glassdoor Job Listings Crawler in ScrapeHero Cloud. The crawler accepts multiple search URLs and filters. You can scrape job data such as job title, salary, company, address, industry, revenue, website, and more.

How to fake and rotate User Agents using Python 3

When scraping many pages from a website, sending the same user agent with every request makes the scraper easy to detect. One way to avoid that detection is to fake your user agent and change it with every request you make to the website. In this tutorial, we will show you how to fake user agents and randomize them to avoid getting blocked while scraping websites.
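As a rough illustration of the idea, the sketch below picks a random user agent for each request using only the Python 3 standard library. The `USER_AGENTS` pool and the helper names `build_request` and `fetch` are assumptions made for this example, not code from the tutorial itself.

```python
import random
import urllib.request

# Hypothetical pool of user-agent strings (assumption); keep a real list current.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]


def build_request(url, rng=random):
    """Return a Request whose User-Agent header is chosen at random."""
    user_agent = rng.choice(USER_AGENTS)
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, timeout=10):
    """Fetch a URL with a freshly chosen user agent (performs network I/O)."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return response.read()
```

Calling `fetch` once per page means each request may carry a different user agent. A fixed pool this small is only for demonstration; a longer, regularly refreshed list would likely be needed in practice.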

How To Rotate Proxies and change IP Addresses using Python 3

When scraping many pages from a website, sending every request from the same IP address will lead to getting blocked. One way to avoid this is to rotate IP addresses, which keeps your scrapers from being disrupted. In this tutorial, we will show you how to rotate IP addresses to avoid getting blocked while scraping.
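A minimal sketch of round-robin proxy rotation, again using only the Python 3 standard library, might look like the following. The proxy addresses are placeholders from a documentation-only IP range, and the helper names `proxy_rotation`, `build_opener_for`, and `fetch_all` are hypothetical; the tutorial itself may take a different approach.

```python
import itertools
import urllib.request

# Placeholder proxies from the documentation-only 203.0.113.0/24 range (assumption).
PROXIES = [
    "http://203.0.113.10:8080",
    "http://203.0.113.11:8080",
    "http://203.0.113.12:8080",
]


def proxy_rotation(proxies):
    """Yield proxy mappings in round-robin order, forever."""
    for proxy in itertools.cycle(proxies):
        yield {"http": proxy, "https": proxy}


def build_opener_for(proxy_map):
    """Return a urllib opener that routes requests through the given proxy."""
    return urllib.request.build_opener(urllib.request.ProxyHandler(proxy_map))


def fetch_all(urls, proxies=PROXIES, timeout=10):
    """Fetch each URL through the next proxy in the rotation (network I/O)."""
    rotation = proxy_rotation(proxies)
    pages = []
    for url in urls:
        opener = build_opener_for(next(rotation))
        with opener.open(url, timeout=timeout) as response:
            pages.append(response.read())
    return pages
```

Round-robin order is the simplest choice here; picking a proxy at random, or dropping proxies that fail, are reasonable variations depending on how reliable the proxy pool is.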
