Web Scraping Tutorials


Step-by-step tutorials for web scraping, web crawling, data extraction, headless browsers, and more. Our web scraping tutorials are usually written in Python using libraries such as LXML or Beautiful Soup, and occasionally in Node.js. The full source code is available to download or clone using Git.

All Tutorials

How To Scrape Amazon Product Details and Pricing using Python

In this tutorial, we will build an Amazon scraper for extracting product details and pricing. We will build this simple web scraper using Python and LXML and run it in a console. But before we start, let’s look at what you can use it for. What can you use an Amazon scraper for? Scrape […]

XPaths and their relevance in Web Scraping

XPath (XML Path Language) is a syntax for addressing parts of an XML document. It is a query language for identifying and selecting nodes or elements in an XML document using a tree-like representation of the document. XPath was defined by the World Wide Web Consortium (W3C). XPaths are one of the few ways […]
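
To make the idea of selecting nodes from a tree concrete, here is a minimal sketch in Python. It uses the standard library's `xml.etree.ElementTree`, which supports only a limited subset of XPath, rather than LXML. The sample document and the helper names `select_names` and `price_for` are made up for illustration and do not come from the tutorial.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented purely for illustration.
SAMPLE_XML = """
<catalog>
  <product id="p1"><name>Kettle</name><price>24.99</price></product>
  <product id="p2"><name>Toaster</name><price>39.50</price></product>
</catalog>
"""


def select_names(xml_text):
    """Return the text of every <name> element directly under a <product>."""
    root = ET.fromstring(xml_text)
    # "./product/name" walks the tree: root -> product children -> name children.
    return [node.text for node in root.findall("./product/name")]


def price_for(xml_text, product_id):
    """Return the price of the product with the given id, or None if absent."""
    root = ET.fromstring(xml_text)
    # The [@id='...'] predicate filters products by attribute value.
    node = root.find(f"./product[@id='{product_id}']/price")
    return None if node is None else float(node.text)


if __name__ == "__main__":
    print(select_names(SAMPLE_XML))
    print(price_for(SAMPLE_XML, "p2"))
```

The same path expressions should carry over to LXML's `xpath()` method, which implements the full XPath 1.0 language.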

Why *not* scrape yourself

Before you get all kinds of ideas about what the topic of this article means, please look at the context: we are talking about web scraping here! This post will talk about the reasons not to do this yourself and why to call in a professional (wink wink, use ScrapeHero). You […]

Webscraping using Python without using large frameworks like Scrapy

If you need publicly available data from the Internet, it is best to check, before creating a web scraper, whether this data is already available from public data sources or APIs. Check the site’s FAQ section or Google for their API endpoints and public data. Even if their API endpoints are available, you have […]

How to prevent getting blacklisted while scraping

Web scraping is a task that has to be performed responsibly so that it does not have a detrimental effect on the sites being scraped. Web crawlers can retrieve data much quicker and in greater depth than humans, so bad scraping practices can have some impact on the performance of the site. If a crawler performs […]

5 tips for scraping big websites

Scraping bigger websites can be a challenge if done the wrong way. Bigger websites have more data, more security, and more pages. We’ve learned a lot from our years of crawling such large, complex websites, and these tips could help solve some of your challenges. 1. Cache pages visited for scraping: When scraping big […]
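
The excerpt cuts off before the caching tip is explained, so the following is only one possible reading of "cache pages visited", not the article's own method. It is a minimal Python sketch of an on-disk page cache keyed by a hash of the URL. The names `cached_fetch` and `fetch_from_network` and the cache directory layout are assumptions made up for this example.

```python
import hashlib
from pathlib import Path
from urllib.request import urlopen


def fetch_from_network(url):
    """Download a page body as bytes (plain urllib, no retries or headers)."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def cached_fetch(url, cache_dir, fetch=fetch_from_network):
    """Return the page body for url, downloading it only on the first request.

    Pages are stored as files named after the SHA-256 hash of the URL, so a
    repeated or restarted crawl can reuse them instead of hitting the site again.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()
    path = cache_dir / f"{key}.html"
    if path.exists():
        # Cache hit: no network request is made.
        return path.read_bytes()
    body = fetch(url)
    path.write_bytes(body)
    return body
```

Passing the downloader in as the `fetch` argument keeps the cache logic independent of the HTTP client, so it can be swapped for another client or for a stub in tests. This sketch never expires entries, which may not suit pages that change often.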
