Scraping Tips


Tips and articles about web scraping: how to use automation to gather data from websites successfully. Data extraction techniques and code are available in our tutorials.

How to fake and rotate User Agents using Python 3

When you scrape many pages from a website, sending the same user agent with every request makes the scraper easy to detect. One way to avoid that detection is to fake your user agent and change it with every request you make to the website. In this tutorial, we will show you how to fake user agents and randomize them to avoid getting blocked while scraping websites.
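
As a rough illustration of the idea (not the tutorial's own code), the sketch below picks a random user agent for each request using only the standard library. The strings in `USER_AGENTS`, the helper name `build_request`, and the example URL are assumptions made for this example; a real pool should hold current browser strings.

```python
import random
import urllib.request

# Hypothetical pool of user-agent strings (assumption for this sketch);
# replace these with current strings from real browsers.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]


def build_request(url):
    """Return a Request whose User-Agent header is chosen at random from the pool."""
    user_agent = random.choice(USER_AGENTS)
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


# Build (but do not send) a request; urllib.request.urlopen(req) would send it.
req = build_request("https://example.com/")
print(req.get_header("User-agent"))
```

Calling `build_request` once per page means consecutive requests are likely, though not guaranteed, to carry different user agents.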

How To Rotate Proxies and change IP Addresses using Python 3

When you scrape many pages from a website, using the same IP address for every request will get you blocked. One way to avoid this is to rotate IP addresses so that your scrapers are not disrupted. In this tutorial, we will show you how to rotate IP addresses to avoid getting blocked while scraping.
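
One simple way to rotate, sketched here under assumptions rather than taken from the tutorial, is to cycle through a list of proxies in round-robin order and build a separate opener for each. The addresses in `PROXIES` are placeholders from a reserved documentation range, and the helper names `opener_for` and `rotating_openers` are made up for this example.

```python
import itertools
import urllib.request

# Placeholder proxy addresses (assumption): 203.0.113.0/24 is reserved for
# documentation, so these will not work until replaced with real proxies.
PROXIES = [
    "http://203.0.113.10:8080",
    "http://203.0.113.11:8080",
    "http://203.0.113.12:8080",
]


def opener_for(proxy):
    """Build an opener that routes HTTP and HTTPS traffic through one proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    return urllib.request.build_opener(handler)


def rotating_openers(proxies):
    """Yield (proxy, opener) pairs forever, moving to the next proxy each time."""
    for proxy in itertools.cycle(proxies):
        yield proxy, opener_for(proxy)


pool = rotating_openers(PROXIES)
proxy, opener = next(pool)
# opener.open("https://example.com/", timeout=10) would send a request via `proxy`.
print("next request would go through", proxy)
```

Round-robin keeps the load even across proxies; picking a proxy at random is a common alternative when request order should be less predictable.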

How to Scrape Websites Without Getting Blocked

Anti-scraping tools can get your scrapers blocked. We provide web scraping best practices for getting past anti-scraping measures.

Get Sales Leads From Google

In this tutorial, we will show you how businesses can get sales leads from Google for free using the Google Maps Crawler and the Contact Detail Crawler available on ScrapeHero Cloud.

How do websites detect and block bots using Bot Mitigation Tools

An in-depth analysis of how most bot mitigation tools work and how they distinguish between bots and humans on the server side and the client side, going through the fundamentals of the web.

Scalable Large Scale Web Scraping – How to build, maintain and run scrapers

Here are the high-level steps involved in this process, each of which we will go through in detail: building scrapers, running web scrapers at scale, getting past anti-scraping techniques, data validation and quality control, and ongoing maintenance.
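
To make the data validation and quality control step a little more concrete, here is a minimal sketch of a per-record check. It is an illustration under assumptions, not the article's method: the field names `url`, `title`, and `price`, and the function name `validate_record`, are hypothetical.

```python
def validate_record(record, required_fields=("url", "title", "price")):
    """Return a list of problems in one scraped record; an empty list means it passed."""
    problems = []

    # Every required field must be present and non-blank.
    for field in required_fields:
        value = record.get(field)
        if value is None or str(value).strip() == "":
            problems.append(f"missing {field}")

    # If a price was scraped, it must parse as a non-negative number.
    price = record.get("price")
    if price not in (None, ""):
        try:
            if float(price) < 0:
                problems.append("negative price")
        except (TypeError, ValueError):
            problems.append("price is not a number")

    return problems


good = {"url": "https://example.com/item/1", "title": "Widget", "price": "9.99"}
bad = {"url": "", "title": "Widget", "price": "abc"}
print(validate_record(good))
print(validate_record(bad))
```

Returning a list of problems instead of raising on the first failure lets a pipeline log every issue with a record and track failure rates over time.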

Python Frameworks and Libraries for Web Scraping

Comparison and use cases of popular Python frameworks and libraries used for web scraping, such as Scrapy, urllib, Requests, Selenium, BeautifulSoup and lxml.

How To Make Anonymous Requests using TorRequests and Python

Tor is quite useful when you have to make requests without revealing your IP address, especially when you are web scraping. This tutorial uses a Python wrapper, TorRequests, that helps you do that.

How To Install Python Packages for Web Scraping in Windows 10

Web scraping using Python on Windows can be tough. In this tutorial, follow the steps to set up Python 3 and the Python packages you need for web scraping on your Windows 10 computer.

How to Solve Simple Captchas using Python Tesseract

CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart. As the acronym suggests, it is a test used to determine whether the user is human or not. A typical captcha consists of distorted text, which a computer program cannot interpret but a human can (hopefully) still read. This tutorial will […]

How to Parse Addresses using Python and Google GeoCoding API

Web scraping often leaves you with address data that is unstructured. If you have come across a large number of freeform addresses, each as a single string, for example “9 Downing St Westminster London SW1A, UK”, you know how hard it is to validate, compare and deduplicate them. To start […]

The best data and file formats for scraped data

The data we provide comes in various forms from the source and is largely text (barring rich media such as images and videos or proprietary file formats such as PDFs). Our customers need this data in various formats and the key to a successful and scalable solution that fits the best data formats for web […]
