How to Scrape Booking.com for Hotel Data

This tutorial will show you how to scrape hotel data and pricing from Booking.com using Python and Selectorlib. You can use this for scraping hotel data from Booking.com.

How to scrape Booking.com

  1. Search on Booking.com for Hotels with your conditions like Location, Check In Date, Check Out Date, Room Type, Number of People, etc.
  2. Copy the Search Result URL and pass it to hotel scraper.
  3. In the scraper, we will download this URL using Python Requests
  4. We will then parse this HTML using Selectorlib Template to extract the fields like Name, Location, Room Type etc.
  5. Scraper will then save the data to a CSV file

This hotel data scraper will extract the following data. You may add more fields

  1. Hotel Name
  2. Hotel Location
  3. Type of Room
  4. Price
  5. Price For (eg: 1 night, 2 Adults)
  6. Bed Type
  7. Overall Rating
  8. Rating Tile
  9. Number of Reviews
  10. Link

Install the packages needed for running the Booking Scraper

Follow this guide to setup your computer and install packages:

How To Install Python Packages for Web Scraping in Windows 10

We will need the following Python 3 Packages

  • Python Requests, to make requests and download the HTML content of the Search Result page from Booking
  • SelectorLib python package to extract data using the YAML file we created from the webpages we download.

Install them using pip3

pip3 install requests selectorlib

The Code

All the code used in this tutorial is available for download from Github at Booking.com Web Scraper

Lets create our project folder called booking-hotel-scraper. In the folder, add a Python file called scrape.py

Paste the code below in scrape.py

from selectorlib import Extractor
import requests 
from time import sleep
import csv
# Create an Extractor by reading from the YAML file
e = Extractor.from_yaml_file('booking.yml')
def scrape(url):    
headers = {
'Connection': 'keep-alive',
'Pragma': 'no-cache',
'Cache-Control': 'no-cache',
'DNT': '1',
'Upgrade-Insecure-Requests': '1',
# You may want to change the user agent if you get blocked
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.113 Safari/537.36',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
'Referer': 'https://www.booking.com/index.en-gb.html',
'Accept-Language': 'en-GB,en-US;q=0.9,en;q=0.8',
}
# Download the page using requests
print("Downloading %s"%url)
r = requests.get(url, headers=headers)
# Pass the HTML of the page and create 
return e.extract(r.text,base_url=url)
with open("urls.txt",'r') as urllist, open('data.csv','w') as outfile:
fieldnames = [
"name",
"location",
"price",
"price_for",
"room_type",
"beds",
"rating",
"rating_title",
"number_of_ratings",
"url"
]
writer = csv.DictWriter(outfile, fieldnames=fieldnames,quoting=csv.QUOTE_ALL)
writer.writeheader()
for url in urllist.readlines():
data = scrape(url) 
if data:
for h in data['hotels']:
writer.writerow(h)
# sleep(5)

The code above will

  1. Open a file called urls.txt and download the HTML content for each link in it
  2. Parse the HTML using the Selectorlib Template called booking.yml
  3. Save the output to a CSV file called data.csv

Let’s create the file urls.txt and paste our search result URLs into it, then lets go ahead and create our Selectorlib Template.

Create Selectorlib Template to Scrape Hotel Data from Booking.com Search Results

You will notice that in the code above that we used a file called booking.yml. This file is what makes the code in this tutorial so concise and easy. The magic behind creating this file is a Web Scraper tool called Selectorlib.

Selectorlib is a tool that makes selecting, marking up, and extracting data from web pages visual and easy. The Selectorlib Web Scraper Chrome Extension lets you mark data that you need to extract, and creates the CSS Selectors or XPaths needed to extract that data. Then previews how the data would look like. You can learn more about Selectorlib and how to use it here

If you just need the data we have shown above, you do not need to use Selectorlib. Since we have done that for you already and generated a simple “template” that you can just use. However, if you want to add a new field, you can use Selectorlib to add that field to the template.

Here is how we marked up the fields for the data we need to scrape using Selectorlib Chrome Extension.

hotel-data-scraper

Once you have created the template, click on ‘Highlight’ to highlight and preview all of your selectors. Finally, click on ‘Export’ and download the YAML file and that file is the booking.yml file.

hotel-scraper

Here is how our template – booking.yml looks like

hotels:
css: div.sr_item
multiple: true
type: Text
children:
name:
css: span.sr-hotel__name
type: Text
location:
css: a.bui-link
type: Text
price:
css: div.bui-price-display__value
type: Text
price_for:
css: div.bui-price-display__label
type: Text
room_type:
css: strong
type: Text
beds:
css: div.c-beds-configuration
type: Text
rating:
css: div.bui-review-score__badge
type: Text
rating_title:
css: div.bui-review-score__title
type: Text
number_of_ratings:
css: div.bui-review-score__text
type: Text
url:
css: a.hotel_name_link
type: Link

 

Running the Web Scraper

To run the scraper, from the project folder,

  1. Search in Booking.com for Hotels
  2. Copy and add the search result URLS to urls.txt
  3. Run python3 scrape.py
  4. Get data from data.csv

Here is an example data from a search results page 

You can parse the address scraped using this tutorial.

scraping-hotel

 

We can help with your data or automation needs

Turn the Internet into meaningful, structured and usable data



Please DO NOT contact us for any help with our Tutorials and Code using this form or by calling us, instead please add a comment to the bottom of the tutorial page for help

Disclaimer: Any code provided in our tutorials is for illustration and learning purposes only. We are not responsible for how it is used and assume no liability for any detrimental usage of the source code. The mere presence of this code on our site does not imply that we encourage scraping or scrape the websites referenced in the code and accompanying tutorial. The tutorials only help illustrate the technique of programming web scrapers for popular internet websites. We are not obligated to provide any support for the code, however, if you add your questions in the comments section, we may periodically address them.

Posted in:   Travel Data Gathering Tutorials, Web Scraping Tutorials

Responses

nicowarschauer June 24, 2019

Guys, amazing stuff. Would you help me out modifying the JSON code so that the script also extracts hotel’s addresses?
Thanks!

Reply

flaviaanoukiebanshee September 5, 2019

Hey that’s great! I would like to have the same help asked by nicowarschauer. Thank you guys!

Reply

Hassan Qayyum September 11, 2020

is it allowed by booking.com robots.txt

Reply

marco May 19, 2021

Is it possible to extract the hotel reviews? What is the field?

Reply

Ricky June 10, 2021

it doesnt work when i run the scrape file. lot of indent errors and this one:
SyntaxError: ‘return’ outside function
Any suggestions>

Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

Turn the Internet into meaningful, structured and usable data   

ScrapeHero Logo

Can we help you get some data?