This tutorial is a follow-up to How to scrape TripAdvisor.com for Hotels in a City using Python. In this tutorial, we’ll look at scraping hotel details from a hotel URL. Let’s build a simple Python script to download a hotel detail page from Tripadvisor.com and extract details from it.
We’ll extract these data from a hotel’s page:
- Rating Summary
- Total Number of Reviews
- Additional Info
There is a lot more information that would be interesting to extract from the hotel detail page of TripAdvisor. For the sake of simplicity, we’ll just stick to the ones above.
- Download the HTML of the hotel detail page using Python Requests. This is quite easy once you have the URL: we use Python Requests to download the entire HTML of the page.
- Parse the page using LXML. LXML lets you navigate the HTML tree structure using XPaths. We have predefined the XPaths for the details we need in the code.
- Save the data as JSON to a file.
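The three steps above can be sketched roughly as follows. This is a minimal illustration, not the scraper's actual code: the function names, the XPaths, the User-Agent header, and the sample markup are all assumptions made up for this example, and the real page structure on TripAdvisor will differ.

```python
import json

from lxml import html


def fetch_page(url):
    """Step 1: download the raw HTML of a hotel page."""
    # Imported here so the parsing helpers below can be tried without Requests.
    import requests

    # Assumption: a browser-like User-Agent; the site may reject the default one.
    headers = {"User-Agent": "Mozilla/5.0"}
    response = requests.get(url, headers=headers, timeout=30)
    response.raise_for_status()
    return response.text


def first_text(tree, xpath):
    """Return the stripped text of the first XPath match, or None if nothing matched."""
    matches = tree.xpath(xpath)
    return matches[0].strip() if matches else None


def parse_hotel(page_source):
    """Step 2: extract a few fields. These XPaths are hypothetical placeholders."""
    tree = html.fromstring(page_source)
    return {
        "name": first_text(tree, '//h1[@id="HEADING"]/text()'),
        "rank": first_text(tree, '//div[@class="rank"]/text()'),
        "total_reviews": first_text(tree, '//span[@class="review-count"]/text()'),
    }


def save_json(data, path):
    """Step 3: write the extracted data to a JSON file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(data, handle, indent=4)


# Made-up markup standing in for a downloaded page, so no network is needed.
SAMPLE_PAGE = """
<html><body>
  <h1 id="HEADING"> Langham Place, New York, Fifth Avenue </h1>
  <div class="rank">#19 of 464 Hotels in New York City</div>
</body></html>
"""

hotel = parse_hotel(SAMPLE_PAGE)
print(hotel)
```

Returning None for a field that is not found, instead of raising an error, keeps the scraper running when a page is missing one of the details.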
You could connect this scraper to the previous scraper built in How to scrape TripAdvisor.com for Hotels in a City using Python. We’ll leave that to you to figure out.
For this web scraping tutorial using Python 3, we will need some packages for downloading and parsing the HTML. Below are the package requirements.
Install Python 3 and Pip
Here is a guide to install Python 3 in Linux – http://docs.python-guide.org/en/latest/starting/install3/linux/
Mac Users can follow this guide – http://docs.python-guide.org/en/latest/starting/install3/osx/
Windows Users go here – https://www.scrapehero.com/how-to-install-python3-in-windows-10/
- PIP, to install the following packages in Python (https://pip.pypa.io/en/stable/installing/)
- Python Requests, to make requests and download the HTML content of the pages (http://docs.python-requests.org/en/master/user/install/)
- Python LXML, for parsing the HTML tree structure using XPaths (learn how to install it here: http://lxml.de/installation.html)
The code is self-explanatory as before.
If you can’t see the embed above, or if you want to download the code, here is the link to it: https://gist.github.com/scrapehero/1b26ad7fd8db1023defa1f4afd49bdbb
If you would like the code in Python 2.7, check this link: https://gist.github.com/scrapehero/f8f7241e32ac0f21f97db3c2f8ecf576
Running the Scraper
Assume you named your scraper tripadvisor_scraper_hotel.py. If you type python, a space, and the script name followed by -h in a command prompt or terminal, you will see the usage message:
python tripadvisor_scraper_hotel.py -h
usage: tripadvisor_scraper_hotel.py [-h] url
url Tripadvisor hotel url
-h, --help show this help message and exit
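A help message like the one above is what Python's argparse module generates for a single positional argument. Here is a minimal sketch of how such a command-line interface could be declared. The build_parser function name is an assumption for illustration, and the actual script may set things up differently.

```python
import argparse


def build_parser():
    # prog is set explicitly so the usage line matches the script name used in this tutorial.
    parser = argparse.ArgumentParser(prog="tripadvisor_scraper_hotel.py")
    # A single required positional argument: the hotel page to scrape.
    parser.add_argument("url", help="Tripadvisor hotel url")
    return parser


parser = build_parser()
# Parsing an explicit list instead of sys.argv keeps the example self-contained.
args = parser.parse_args([
    "https://www.tripadvisor.com/Hotel_Review-g60763-d1776857-Reviews-"
    "Langham_Place_New_York_Fifth_Avenue-New_York_City_New_York.html"
])
print(args.url)
```

In a real script you would call parser.parse_args() with no arguments so that the URL is read from the command line.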
For example, take the hotel Langham Place, New York, Fifth Avenue, whose URL is https://www.tripadvisor.com/Hotel_Review-g60763-d1776857-Reviews-Langham_Place_New_York_Fifth_Avenue-New_York_City_New_York.html. To scrape it, run:
python tripadvisor_scraper_hotel.py https://www.tripadvisor.com/Hotel_Review-g60763-d1776857-Reviews-Langham_Place_New_York_Fifth_Avenue-New_York_City_New_York.html
The script will automatically create a file called tripadvisor_hotel_scraped_data.json with the scraped data from TripAdvisor, which would look similar to this:
"Very good": "191",
"highlights": "Free High Speed Internet ( WiFi ),Restaurant,Fitness Center with Gym / Workout Room,Room Service,Spa",
"About the property": "Wheelchair access ,Non-Smoking Hotel ,Pets Allowed ( Dog / Pet Friendly ) ",
"Things to do": "Restaurant ,Fitness Center with Gym / Workout Room ,Spa ",
"Room types": "Suites ,Non-Smoking Rooms ,Family Rooms ",
"In your room": "Air Conditioning ,Minibar ",
"Internet": "Free Internet ,Free High Speed Internet ( WiFi ) ,Paid Wifi ",
"Services": "Room Service ,Children Activities (Kid / Family Friendly) ,Airport Transportation ,Dry Cleaning ,Meeting Rooms ,Business Center with Internet Access ,Laundry Service ,Concierge ,Multilingual Staff ,Conference Facilities ,Breakfast Available ,Banquet Room ,Babysitting "
"name": "Langham Place, New York, Fifth Avenue",
"country": "United States",
"street_address": "400 Fifth Avenue",
"locality": "New York City"
"official_description": "Gracefully complementing Manhattan's luxurious Fifth Avenue and boldly accenting New York City's famous skyline, Langham Place, New York, Fifth Avenue is a renewed celebration of sophistication and style. Soaring more than sixty stories above the city, Langham Place, Fifth Avenue represents the new heart of the world's most inspired metropolis. Accommodations exude the very essence of luxury and elegance. Duxiana beds, Pratesi linens, and floor-to-ceiling rain showers leave guests breathing easy, if not breathless.",
"additional_info": "Address: 400 Fifth Avenue, New York City, NY 10018-2753 (Formerly The Setai Fifth Avenue) Location: United States > New York > New York City > Midtown , Garment District , Manhattan Price Range: $485 - $1,212 (Based on Average Rates for a Standard Room) Hotel Class:5 star \u2014 Langham Place, New York, Fifth Avenue 5* Number of rooms: 214 Reservation Options: TripAdvisor is proud to partner with Hotels.com, Booking.com, DerbySoft Ltd Shanghai HQ Supplier Direct, Prestigia, Expedia and Agoda so you can book your Langham Place, New York, Fifth Avenue reservations with confidence. We help millions of travelers each month to find the perfect hotel for both vacation and business trips, always with the best discounts and special offers. Hotel Style: #3 Spa Hotel in New York City #8 Family Hotel in New York City #9 Business Hotel in New York City #14 Romantic Hotel in New York City #16 Luxury Hotel in New York City",
"rank": "#19 of 464 Hotels in New York City"
You can extend this further by saving the data to a database like MongoDB or MySQL (this might need some flattening of the JSON).
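If you go the relational route, flattening could look something like the sketch below. The flatten helper and the dotted key names are our own illustration, and the nesting of the sample record (an "address" object) is an assumption about how the scraped JSON is structured.

```python
def flatten(record, parent_key="", separator="."):
    """Collapse nested dictionaries into one level, joining keys with a separator."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{separator}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects and merge their flattened keys.
            flat.update(flatten(value, full_key, separator))
        else:
            flat[full_key] = value
    return flat


# Hypothetical nested record using values from the sample output.
scraped = {
    "name": "Langham Place, New York, Fifth Avenue",
    "address": {
        "street_address": "400 Fifth Avenue",
        "locality": "New York City",
    },
    "rank": "#19 of 464 Hotels in New York City",
}

row = flatten(scraped)
print(row)
```

Each key of the flattened dictionary can then map to one column of a table. A document store like MongoDB can usually take the nested record as it is.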
This code should work for grabbing basic details from most hotel URLs. However, if you want to scrape thousands of pages, there are some important things you should be aware of. You can read about them at Scalable do-it-yourself scraping – How to build and run scrapers on a large scale and How to prevent getting blacklisted while scraping.
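One common precaution when fetching many pages is to pause for a random interval between requests and to keep going when a single page fails. The sketch below illustrates that idea only. The scrape_politely function, its parameters, and the delay values are assumptions for this example, not recommendations taken from the articles mentioned above.

```python
import random
import time


def scrape_politely(urls, fetch, min_delay=2.0, max_delay=5.0, sleep=time.sleep):
    """Fetch each URL in turn, pausing a random interval between requests."""
    results = {}
    for index, url in enumerate(urls):
        if index > 0:
            # Randomized pauses avoid hitting the site at a fixed, rapid rate.
            sleep(random.uniform(min_delay, max_delay))
        try:
            results[url] = fetch(url)
        except Exception as error:
            # Record the failure and continue with the remaining URLs.
            results[url] = None
            print(f"Failed to fetch {url}: {error}")
    return results


def fake_fetch(url):
    """Stand-in for a real download function, so the example needs no network."""
    return f"<html>{url}</html>"


pauses = []
# Passing pauses.append as the sleep function records the delays instead of waiting.
pages = scrape_politely(["hotel-1", "hotel-2", "hotel-3"], fake_fetch, sleep=pauses.append)
print(len(pages), "pages fetched with", len(pauses), "pauses")
```

In real use you would pass your own download function as fetch and leave sleep at its default.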
If you need some professional help with scraping complex websites like Tripadvisor, let us know by filling out the form below.
Disclaimer: Any code provided in our tutorials is for illustration and learning purposes only. We are not responsible for how it is used and assume no liability for any detrimental usage of the source code. The mere presence of this code on our site does not imply that we encourage scraping or scrape the websites referenced in the code and accompanying tutorial. The tutorials only help illustrate the technique of programming web scrapers for popular internet websites. We are not obligated to provide any support for the code; however, if you add your questions in the comments section, we may periodically address them.
Turn websites into meaningful and structured data through our web data extraction service