Tutorial: Web Scraping Hotel Prices using Selenium and Python

Everyone would like to pay the least amount of money for the best hotel room – simple, isn’t it?

In this tutorial we will show you how to build your own little price-tracking web scraper for Hotels.com so that you can snag the room you want at the lowest rate. All you need to do is change the city and the check-in and check-out dates, then run it on a schedule.

The idea and need being simple, let’s jump straight to the code.

Feel free to copy and modify it to your needs – that is the best way to learn! You can download the code directly from here.

Pre-Requisites

Below are the frameworks used:

  1. Selenium Web Driver – a framework that is widely used for automating routines in web browsers for scraping and testing purposes. The scraper sends Selenium commands such as loading a page or clicking a location or button, and it can also read what is being rendered in the browser.
    Learn how to install Selenium here – http://www.seleniumhq.org/download and install the Python bindings for Selenium here – http://selenium-python.readthedocs.io/installation.html
  2. LXML for extracting data from the page source HTML. LXML lets you parse an HTML / XML tree structure using XPaths. Read more on XPaths here: XPaths and their relevance in Web Scraping. Learn how to install LXML here – http://lxml.de/installation.html
  3. Python 2.7, available here (https://www.python.org/downloads/)
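
To give a feel for what XPath-based extraction looks like, here is a minimal sketch. To keep it runnable without installing anything, it uses Python's built-in xml.etree.ElementTree, which supports only a subset of XPath, rather than LXML itself; with LXML the rough equivalent would be lxml.html.fromstring(page_source).xpath(...). The markup, class names and the extract_hotels helper are made up for illustration and are not the real Hotels.com page structure.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed markup for illustration only;
# the real Hotels.com pages are structured differently.
SAMPLE_HTML = """
<html><body>
  <div class="hotel"><h3>Comet Motel</h3><span class="price">$124</span></div>
  <div class="hotel"><h3>Antonio Hotel</h3><span class="price">$241</span></div>
</body></html>
"""


def extract_hotels(markup):
    """Return one dictionary per <div class="hotel"> element in the markup."""
    root = ET.fromstring(markup.strip())
    hotels = []
    # ".//div[@class='hotel']" selects every matching div anywhere in the tree
    for node in root.findall(".//div[@class='hotel']"):
        hotels.append({
            "hotelName": node.findtext("h3"),
            "price": node.findtext("span[@class='price']"),
        })
    return hotels


print(extract_hotels(SAMPLE_HTML))
```

Real pages are rarely well-formed XML, which is one reason to use LXML's more forgiving HTML parser in the actual scraper.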

The Code

Open your favorite text editor and modify the lines below with the city name, check-in date and check-out date, and you’ll get the 5 cheapest hotels to stay in.

from selenium import webdriver

def parse(url):
    searchKey = "Las Vegas"  # Change this to your city
    checkInDate = '27/08/2016'  # Format %d/%m/%Y - replace date here
    checkOutDate = '29/08/2016'  # Format %d/%m/%Y - replace date here
    response = webdriver.Firefox()  # Launch a Firefox browser controlled by Selenium
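
Both dates must follow the %d/%m/%Y format noted in the comments. As a hedged sketch (validate_stay is a hypothetical helper, not part of the original script), you could check the two values with the standard library before any browser is started:

```python
from datetime import datetime

DATE_FORMAT = "%d/%m/%Y"  # same format as in the comments of parse()


def validate_stay(check_in, check_out):
    """Return the number of nights, or raise ValueError for bad input."""
    start = datetime.strptime(check_in, DATE_FORMAT)
    end = datetime.strptime(check_out, DATE_FORMAT)
    nights = (end - start).days
    if nights < 1:
        raise ValueError("check-out date must be after check-in date")
    return nights


print(validate_stay("27/08/2016", "29/08/2016"))  # 2
```

A date in another format, such as 2016-08-27, raises a ValueError straight away.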
  

Then run the script from the command prompt like this (assuming you named the file hotels_scraper.py):

python hotels_scraper.py

This should print the results in the command prompt as a list of Python dictionaries, one per hotel.

The output looks like this (judging by the locality fields, this sample run was for Los Angeles):

[
  {
    "countryName": "United States", 
    "rating": 2.0, 
    "hotelName": "Antonio Hotel", 
    "address": "229 N Soto Street", 
    "postalCode": "90033", 
    "price": "$241", 
    "locality": "Los Angeles", 
    "region": "CA"
  }, 
  {
    "countryName": "United States", 
    "rating": 2.0, 
    "hotelName": "Comet Motel", 
    "address": "10808 Avalon Boulevard", 
    "postalCode": "90061", 
    "price": "$124", 
    "locality": "Los Angeles", 
    "region": "CA"
  }, 
  {
    "countryName": "United States", 
    "rating": 2.5, 
    "hotelName": "Arthur Emery", 
    "address": "907 W 17th Street", 
    "postalCode": "90015", 
    "price": "$298", 
    "locality": "Los Angeles", 
    "region": "CA"
  }, 
  {
    "countryName": "United States", 
    "rating": 2.5, 
    "hotelName": "LA Ramona Motel", 
    "address": "3211 W. Jefferson Blvd", 
    "postalCode": "90018", 
    "price": "$125", 
    "locality": "Los Angeles", 
    "region": "CA"
  }, 
  {
    "countryName": "United States", 
    "rating": 2.0, 
    "hotelName": "Central Inn Motel", 
    "address": "954 E 88th St", 
    "postalCode": "90002", 
    "price": "$185", 
    "locality": "Los Angeles", 
    "region": "CA"
  }
]
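
The prices come back as strings such as "$241", and the list above is not ordered by price. A minimal sketch of turning those strings into numbers and picking the cheapest entries follows; price_value and cheapest are hypothetical helper names, the sample data is trimmed from the output above, and the parsing assumes US-dollar prices written like "$1,241".

```python
def price_value(hotel):
    """Convert a price string such as "$124" or "$1,241" to a float."""
    return float(hotel["price"].replace("$", "").replace(",", ""))


def cheapest(hotels, count=5):
    """Return the `count` lowest-priced hotels, cheapest first."""
    return sorted(hotels, key=price_value)[:count]


hotels = [
    {"hotelName": "Antonio Hotel", "price": "$241"},
    {"hotelName": "Comet Motel", "price": "$124"},
    {"hotelName": "Arthur Emery", "price": "$298"},
    {"hotelName": "LA Ramona Motel", "price": "$125"},
    {"hotelName": "Central Inn Motel", "price": "$185"},
]

for hotel in cheapest(hotels, count=3):
    print("{hotelName}: {price}".format(**hotel))
# Comet Motel: $124
# LA Ramona Motel: $125
# Central Inn Motel: $185
```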

With a few modifications, you can connect this code to a chatbot in Slack or Facebook, or to email, to find the cheapest room rates.
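
As a rough sketch of the first step towards such a notification, the snippet below builds a message listing the hotels at or under a budget. build_alert and the budget value are hypothetical, and the JSON body with a single "text" field is an assumption about the receiving service (Slack incoming webhooks accept that shape, but check the documentation of whichever service you use). Nothing is sent over the network here.

```python
import json


def price_value(hotel):
    """Convert a price string such as "$124" to a float."""
    return float(hotel["price"].replace("$", "").replace(",", ""))


def build_alert(hotels, max_price):
    """Return a JSON payload for hotels at or under max_price, or None."""
    matches = sorted(
        (h for h in hotels if price_value(h) <= max_price),
        key=price_value,
    )
    if not matches:
        return None
    lines = [
        "{0} ({1}): {2}".format(h["hotelName"], h["locality"], h["price"])
        for h in matches
    ]
    text = "Rooms at or under ${0}:\n{1}".format(max_price, "\n".join(lines))
    # Assumption: the receiving service expects a JSON body with a "text" field
    return json.dumps({"text": text})


sample = [
    {"hotelName": "Comet Motel", "locality": "Los Angeles", "price": "$124"},
    {"hotelName": "Antonio Hotel", "locality": "Los Angeles", "price": "$241"},
    {"hotelName": "LA Ramona Motel", "locality": "Los Angeles", "price": "$125"},
]

print(build_alert(sample, max_price=130))
```

Returning None when nothing matches lets a scheduled run stay silent unless there is something worth reporting.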

The code above is good for small-scale scraping for fun. If you want to scrape hotel pricing details from thousands of pages, you should read Scalable do-it-yourself scraping – How to build and run scrapers on a large scale.

If you need a faster option you can use Puppeteer, a Node.js library that controls headless Chrome or Chromium.

Web Scraping Tutorial using a Headless Browser: How to Build a Web Scraper using Puppeteer and Node.js

Need some help with scraping data?

Turn the Internet into meaningful, structured and usable data



Disclaimer: Any code provided in our tutorials is for illustration and learning purposes only. We are not responsible for how it is used and assume no liability for any detrimental usage of the source code. The mere presence of this code on our site does not imply that we encourage scraping or scrape the websites referenced in the code and accompanying tutorial. The tutorials only help illustrate the technique of programming web scrapers for popular internet websites. We are not obligated to provide any support for the code, however, if you add your questions in the comments section, we may periodically address them.

Responses

Stan L August 16, 2018

Hi SHero, thank you for the tutorial blog! Very clear and concise instructions for scraping data across various types of websites. One challenge I am facing is scraping data from a website such as Forbes (https://www.forbes.com/top-wealth-managers). It looks like some scripts run on the first visit to the website and pop up a Forbes quote window. I am very curious to know how we can bypass this window without using Selenium to click the “Continue to Site” button. Second question: is there an alternative to Selenium for clicking buttons on the page or typing words into a text box, while keeping the convenience of lxml (that is, without needing a browser or headless browser to run the scraping script)? Thanks!


    ScrapeHero August 16, 2018

    Thanks for the comment.
    You really have to poke through the whole request and responses and cookies as you navigate the site.
    Something in that exchange signals the site to show or not show the page.
    That value is most likely stored in a cookie, so setting it yourself might help.

