How to Scrape Booking.com for Hotel Data

This tutorial shows you how to scrape hotel data and pricing from Booking.com search results using Python and Selectorlib.

How to scrape Booking.com

  1. Search Booking.com for hotels using your conditions, such as location, check-in date, check-out date, room type, and number of people.
  2. Copy the search result URL and pass it to the hotel scraper.
  3. The scraper downloads this URL using Python Requests.
  4. It then parses the HTML using a Selectorlib template to extract fields such as name, location, and room type.
  5. Finally, the scraper saves the data to a CSV file.
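
Before passing a copied URL to the scraper, it can help to check that the search conditions from step 1 actually made it into the URL. Here is a minimal sketch using only the standard library. The example URL and its parameter names (ss, checkin, checkout, group_adults) are illustrative assumptions, so inspect your own copied URL rather than relying on them.

```python
from urllib.parse import urlsplit, parse_qs


def search_conditions(url):
    """Return the query parameters of a search URL as a dict of single values."""
    query = parse_qs(urlsplit(url).query)
    # parse_qs returns a list per key; keep the first value of each
    return {key: values[0] for key, values in query.items()}


# Hypothetical search URL: the parameter names are assumptions for illustration
example_url = (
    "https://www.booking.com/searchresults.html"
    "?ss=New+York&checkin=2024-06-01&checkout=2024-06-03&group_adults=2"
)

for key, value in search_conditions(example_url).items():
    print(key, "=", value)
```

If a condition you set in the search form is missing from the printed parameters, copy the URL again after the results page has fully loaded.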

This hotel data scraper will extract the following data. You may add more fields.

  1. Hotel Name
  2. Hotel Location
  3. Type of Room
  4. Price
  5. Price For (e.g., 1 night, 2 adults)
  6. Bed Type
  7. Overall Rating
  8. Rating Title
  9. Number of Reviews
  10. Link

Install the packages needed for running the Booking Scraper

Follow this guide to set up your computer and install packages:

How To Install Python Packages for Web Scraping in Windows 10

We will need the following Python 3 packages:

  • Python Requests, to make requests and download the HTML content of the search result page from Booking.com
  • Selectorlib, to extract data from the downloaded pages using the YAML template file

Install them using pip3:

pip3 install requests selectorlib

The Code

All the code used in this tutorial is available for download from GitHub at Booking.com Web Scraper.

Let's create a project folder called booking-hotel-scraper. In the folder, add a Python file called scrape.py.

Paste the code below into scrape.py:

from selectorlib import Extractor
import requests 
from time import sleep
import csv

# Create an Extractor by reading from the YAML file
e = Extractor.from_yaml_file('booking.yml')

def scrape(url):    
    headers = {
        'Connection': 'keep-alive',
        'Pragma': 'no-cache',
        'Cache-Control': 'no-cache',
        'DNT': '1',
        'Upgrade-Insecure-Requests': '1',
        # You may want to change the user agent if you get blocked
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.113 Safari/537.36',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',

        'Referer': 'https://www.booking.com/index.en-gb.html',
        'Accept-Language': 'en-GB,en-US;q=0.9,en;q=0.8',
    }

    # Download the page using requests
    print("Downloading %s"%url)
    r = requests.get(url, headers=headers)
    # Pass the HTML of the page to the extractor and return the extracted data
    return e.extract(r.text,base_url=url)


with open("urls.txt",'r') as urllist, open('data.csv','w',newline='') as outfile:
    fieldnames = [
        "name",
        "location",
        "price",
        "price_for",
        "room_type",
        "beds",
        "rating",
        "rating_title",
        "number_of_ratings",
        "url"
    ]
    writer = csv.DictWriter(outfile, fieldnames=fieldnames,quoting=csv.QUOTE_ALL)
    writer.writeheader()
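    for url in urllist:
        # Remove the trailing newline and skip blank lines
        url = url.strip()
        if not url:
            continue
        data = scrape(url)
        # 'hotels' is empty when no listing matched the template
        if data and data.get('hotels'):
            for h in data['hotels']:
                writer.writerow(h)
            # Uncomment to pause between requests
            # sleep(5)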
    

The code above will

  1. Open a file called urls.txt and download the HTML content for each link in it
  2. Parse the HTML using the Selectorlib Template called booking.yml
  3. Save the output to a CSV file called data.csv
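
The code comments above mention changing the user agent if you get blocked, and a sleep(5) call is left commented out. Here is a minimal sketch of one way to retry a download with a growing pause. The function name fetch_with_retries, the retry counts, and the "anything other than HTTP 200 is a failure" rule are assumptions for illustration, not part of the original scraper. The fetch argument is any callable that returns an object with a status_code attribute, such as a wrapper around requests.get.

```python
import time
from types import SimpleNamespace


def fetch_with_retries(fetch, url, retries=3, delay=5.0, sleep=time.sleep):
    """Call fetch(url) until it returns HTTP 200 or the retries run out.

    Waits delay * attempt seconds between attempts. Returns the successful
    response, or None if every attempt failed.
    """
    for attempt in range(1, retries + 1):
        response = fetch(url)
        if response.status_code == 200:
            return response
        if attempt < retries:
            sleep(delay * attempt)
    return None


# Offline demonstration with fake responses instead of real HTTP requests
fake_status_codes = iter([503, 429, 200])
waits = []  # records the pauses instead of actually sleeping
result = fetch_with_retries(
    lambda url: SimpleNamespace(status_code=next(fake_status_codes)),
    "https://example.com/search",
    delay=1.0,
    sleep=waits.append,
)
print(result.status_code, waits)
```

In scrape.py you could pass lambda u: requests.get(u, headers=headers) as fetch. Note that this sketch does not catch network exceptions raised by Requests.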

Let's create the file urls.txt and paste our search result URLs into it, one per line. Then let's go ahead and create our Selectorlib template.
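
Copy-pasted URL lists often pick up blank lines, stray spaces, or duplicates. Here is a minimal sketch of tidying the list before scraping. The helper name load_urls is our own and is not part of the original code.

```python
def load_urls(lines):
    """Return unique, non-empty URLs from an iterable of text lines, in order."""
    seen = set()
    urls = []
    for line in lines:
        url = line.strip()  # drop the trailing newline and stray spaces
        if not url or url in seen:
            continue
        seen.add(url)
        urls.append(url)
    return urls


# Example with in-memory lines; a file object opened on urls.txt works the same way
sample_lines = [
    "https://www.booking.com/searchresults.html?ss=Paris\n",
    "\n",
    "  https://www.booking.com/searchresults.html?ss=Rome  \n",
    "https://www.booking.com/searchresults.html?ss=Paris\n",
]
print(load_urls(sample_lines))
```

To use it with the real file, open urls.txt and pass the file object to load_urls.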

Create Selectorlib Template to Scrape Hotel Data from Booking.com Search Results

You will notice that the code above uses a file called booking.yml. This file is what keeps the code in this tutorial so concise. We created it with a web scraper tool called Selectorlib.

Selectorlib is a tool that makes selecting, marking up, and extracting data from web pages visual and easy. The Selectorlib Web Scraper Chrome Extension lets you mark the data you need to extract, creates the CSS Selectors or XPaths needed to extract it, and then previews how the data will look. You can learn more about Selectorlib and how to use it here.

If you just need the data we have shown above, you do not need to use Selectorlib, since we have already generated a simple "template" that you can use as is. However, if you want to add a new field, you can use Selectorlib to add that field to the template.

Here is how we marked up the fields for the data we need to scrape using the Selectorlib Chrome Extension.

[Image: hotel-data-scraper]

Once you have created the template, click on 'Highlight' to highlight and preview all of your selectors. Finally, click on 'Export' to download the YAML file. That file is booking.yml.

[Image: hotel-scraper]

Here is what our template, booking.yml, looks like:

hotels:
    css: div.sr_item
    multiple: true
    type: Text
    children:
        name:
            css: span.sr-hotel__name
            type: Text
        location:
            css: a.bui-link
            type: Text
        price:
            css: div.bui-price-display__value
            type: Text
        price_for:
            css: div.bui-price-display__label
            type: Text
        room_type:
            css: strong
            type: Text
        beds:
            css: div.c-beds-configuration
            type: Text
        rating:
            css: div.bui-review-score__badge
            type: Text
        rating_title:
            css: div.bui-review-score__title
            type: Text
        number_of_ratings:
            css: div.bui-review-score__text
            type: Text
        url:
            css: a.hotel_name_link
            type: Link
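
If you add a field to the template, remember that scrape.py writes rows with csv.DictWriter, which by default raises a ValueError when a row contains a key that is not in fieldnames. Here is a minimal sketch of checking for that mismatch. The helper unexpected_fields and the "amenities" field are hypothetical and used only for illustration.

```python
import csv
import io

# The same field names that scrape.py passes to csv.DictWriter
FIELDNAMES = [
    "name", "location", "price", "price_for", "room_type",
    "beds", "rating", "rating_title", "number_of_ratings", "url",
]


def unexpected_fields(row, fieldnames=FIELDNAMES):
    """Return the keys in row that the CSV writer does not know about."""
    return sorted(set(row) - set(fieldnames))


# "amenities" stands in for a new field added to the template
row = {"name": "Example Hotel", "amenities": "Free WiFi"}
print(unexpected_fields(row))

writer = csv.DictWriter(io.StringIO(), fieldnames=FIELDNAMES, quoting=csv.QUOTE_ALL)
try:
    writer.writerow(row)
    rejected = False
except ValueError as error:
    rejected = True
    print("DictWriter rejected the row:", error)
```

The simplest fix is to add the new field's name to the fieldnames list in scrape.py whenever you add it to booking.yml.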

 

Running the Web Scraper

To run the scraper, follow these steps from the project folder:

  1. Search Booking.com for hotels
  2. Copy the search result URLs and add them to urls.txt
  3. Run python3 scrape.py
  4. Get data from data.csv
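
The values in data.csv are saved as text. Here is a minimal sketch of reading the file back and turning price, rating, and review count into numbers. The sample values (such as "US$1,234" and "1,204 reviews") are assumptions about how the text might look, and the helper parse_number is our own. It treats commas as thousands separators, so locales that use a comma as the decimal mark would need different handling.

```python
import csv
import io
import re

# First run of digits, allowing thousands separators and a decimal part
NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")


def parse_number(text):
    """Return the first number found in text as a float, or None if absent."""
    match = NUMBER.search(text or "")
    if match is None:
        return None
    return float(match.group().replace(",", ""))


# In-memory stand-in for data.csv with assumed sample values
sample_csv = io.StringIO(
    '"name","price","rating","number_of_ratings"\n'
    '"Example Hotel","US$1,234","8.7","1,204 reviews"\n'
)

rows = []
for row in csv.DictReader(sample_csv):
    rows.append({
        "name": row["name"],
        "price": parse_number(row["price"]),
        "rating": parse_number(row["rating"]),
        "number_of_ratings": parse_number(row["number_of_ratings"]),
    })
print(rows)
```

To run this against the real output, replace sample_csv with a file object opened on data.csv.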

Here is example data from a search results page:

You can parse the address scraped using this tutorial.

[Image: scraping-hotel]


Disclaimer: Any code provided in our tutorials is for illustration and learning purposes only. We are not responsible for how it is used and assume no liability for any detrimental usage of the source code. The mere presence of this code on our site does not imply that we encourage scraping or scrape the websites referenced in the code and accompanying tutorial. The tutorials only help illustrate the technique of programming web scrapers for popular internet websites. We are not obligated to provide any support for the code, however, if you add your questions in the comments section, we may periodically address them.