This tutorial will show you how to scrape hotel data and pricing from Booking.com using Python and Selectorlib. You can use this for scraping hotel data from Booking.com.
How to scrape Booking.com
- Search on Booking.com for Hotels with your conditions like Location, Check In Date, Check Out Date, Room Type, Number of People, etc.
- Copy the Search Result URL and pass it to hotel scraper.
- In the scraper, we will download this URL using Python Requests
- We will then parse this HTML using Selectorlib Template to extract the fields like Name, Location, Room Type etc.
- Scraper will then save the data to a CSV file
This hotel data scraper will extract the following data. You may add more fields
- Hotel Name
- Hotel Location
- Type of Room
- Price
- Price For (eg: 1 night, 2 Adults)
- Bed Type
- Overall Rating
- Rating Tile
- Number of Reviews
- Link
Read More – Scrape TripAdvisor for hotel data
Install the packages needed for running the Booking Scraper
Follow this guide to setup your computer and install packages:
How To Install Python Packages for Web Scraping in Windows 10
We will need the following Python 3 Packages
- Python Requests, to make requests and download the HTML content of the Search Result page from Booking
- SelectorLib python package to extract data using the YAML file we created from the webpages we download.
Install them using pip3
pip3 install requests selectorlib
The Code
All the code used in this tutorial is available for download from Github at Booking.com Web Scraper
Lets create our project folder called booking-hotel-scraper. In the folder, add a Python file called scrape.py
Paste the code below in scrape.py
from selectorlib import Extractor import requests from time import sleep import csv # Create an Extractor by reading from the YAML file e = Extractor.from_yaml_file('booking.yml') def scrape(url): headers = { 'Connection': 'keep-alive', 'Pragma': 'no-cache', 'Cache-Control': 'no-cache', 'DNT': '1', 'Upgrade-Insecure-Requests': '1', # You may want to change the user agent if you get blocked 'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.113 Safari/537.36', 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9', 'Referer': 'https://www.booking.com/index.en-gb.html', 'Accept-Language': 'en-GB,en-US;q=0.9,en;q=0.8', } # Download the page using requests print("Downloading %s"%url) r = requests.get(url, headers=headers) # Pass the HTML of the page and create return e.extract(r.text,base_url=url) with open("urls.txt",'r') as urllist, open('data.csv','w') as outfile: fieldnames = [ "name", "location", "price", "price_for", "room_type", "beds", "rating", "rating_title", "number_of_ratings", "url" ] writer = csv.DictWriter(outfile, fieldnames=fieldnames,quoting=csv.QUOTE_ALL) writer.writeheader() for url in urllist.readlines(): data = scrape(url) if data: for h in data['hotels']: writer.writerow(h) # sleep(5)
The code above will
- Open a file called urls.txt and download the HTML content for each link in it
- Parse the HTML using the Selectorlib Template called booking.yml
- Save the output to a CSV file called data.csv
Let’s create the file urls.txt and paste our search result URLs into it, then lets go ahead and create our Selectorlib Template.
Create Selectorlib Template to Scrape Hotel Data from Booking.com Search Results
You will notice that in the code above that we used a file called booking.yml
. This file is what makes the code in this tutorial so concise and easy. The magic behind creating this file is a Web Scraper tool called Selectorlib.
Selectorlib is a tool that makes selecting, marking up, and extracting data from web pages visual and easy. The Selectorlib Web Scraper Chrome Extension lets you mark data that you need to extract, and creates the CSS Selectors or XPaths needed to extract that data. Then previews how the data would look like. You can learn more about Selectorlib and how to use it here
If you just need the data we have shown above, you do not need to use Selectorlib. Since we have done that for you already and generated a simple “template” that you can just use. However, if you want to add a new field, you can use Selectorlib to add that field to the template.
Here is how we marked up the fields for the data we need to scrape using Selectorlib Chrome Extension.
Once you have created the template, click on ‘Highlight’ to highlight and preview all of your selectors. Finally, click on ‘Export’ and download the YAML file and that file is the booking.yml file.
Here is how our template – booking.yml looks like
hotels: css: div.sr_item multiple: true type: Text children: name: css: span.sr-hotel__name type: Text location: css: a.bui-link type: Text price: css: div.bui-price-display__value type: Text price_for: css: div.bui-price-display__label type: Text room_type: css: strong type: Text beds: css: div.c-beds-configuration type: Text rating: css: div.bui-review-score__badge type: Text rating_title: css: div.bui-review-score__title type: Text number_of_ratings: css: div.bui-review-score__text type: Text url: css: a.hotel_name_link type: Link
Running the Web Scraper
To run the scraper, from the project folder,
- Search in Booking.com for Hotels
- Copy and add the search result URLS to urls.txt
- Run
python3 scrape.py
- Get data from data.csv
Here is an example data from a search results page
You can parse the address scraped using this tutorial.
We can help with your data or automation needs
Turn the Internet into meaningful, structured and usable data
Disclaimer: Any code provided in our tutorials is for illustration and learning purposes only. We are not responsible for how it is used and assume no liability for any detrimental usage of the source code. The mere presence of this code on our site does not imply that we encourage scraping or scrape the websites referenced in the code and accompanying tutorial. The tutorials only help illustrate the technique of programming web scrapers for popular internet websites. We are not obligated to provide any support for the code, however, if you add your questions in the comments section, we may periodically address them.