How to scrape Business Details using Python and LXML

This tutorial is a follow-up to How to scrape for Business Listings using Python. In this tutorial, we will show you how to extract data from the detail page of a business. You can use the URLs of businesses you are interested in, or the ones you got from part one of this tutorial. Let's create a Python script that downloads a restaurant page and extracts the details from it.

Here is the data that we are going to extract from the restaurant page:

  1. Website URL
  2. Ranking
  3. Working hours
  4. Category
  5. Phone Number
  6. Address
  7. Price Range
  8. Health Rating
  9. Claimed Status
  10. Ratings
  11. Additional Info

Below is a screenshot of the data that we will be extracting

You can scrape a lot more information from the business detail URL as you wish, but we’ll stick to these for now.

Scraping Logic

  1. Download the HTML of the business detail page using Python Requests – quite easy, once you have the URL. We use Python Requests to download the entire HTML of the page.
  2. Parse the page using LXML – LXML lets you navigate the HTML Tree Structure using Xpaths. We have predefined the XPaths for the details we need in the code.
  3. Save the data as JSON to a file. You might wonder why we’re using JSON here when we used CSV in the previous post. The data we scraped in part one of this tutorial has only rows and columns and fits well into a CSV format. This page has many more details, which are quite hard to fit into a CSV (unless you want to look at a CSV with more than 20 columns). You can read more about choosing a data format for your project, if you are new to this.
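The three steps above can be sketched as follows. This is a minimal illustration, not the full script from the tutorial: the XPaths and field names are placeholders, since the real selectors depend on the page's HTML structure.

```python
# Minimal sketch of the scraping logic: download -> parse with XPaths -> save JSON.
import json

import requests
from lxml import html


def scrape_business(url):
    # Step 1: download the HTML of the detail page.
    # A browser-like User-Agent helps, since plain requests are often blocked.
    headers = {"User-Agent": "Mozilla/5.0"}
    response = requests.get(url, headers=headers, timeout=30)
    response.raise_for_status()

    # Step 2: parse the HTML tree and extract details with XPaths.
    tree = html.fromstring(response.text)

    def first(xpath):
        # Return the first match for an XPath, or None if nothing matched.
        matches = tree.xpath(xpath)
        return matches[0].strip() if matches else None

    data = {
        "name": first("//h1//text()"),                      # placeholder XPath
        "phone": first("//span[@class='phone']//text()"),   # placeholder XPath
        "address": first("//address//text()"),              # placeholder XPath
    }

    # Step 3: save the extracted data as JSON.
    with open("scraped_data.json", "w") as f:
        json.dump(data, f, indent=4)
    return data
```

The `first` helper keeps the extraction code tolerant of missing fields: an XPath that matches nothing simply yields None instead of raising an IndexError.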

You could connect this scraper to the previous one, built in How to scrape business listings using Python, and set up an automated workflow that sends you emails or writes data to a database instead of a JSON file. We are not going to do that here, as it's beyond the scope of this simple tutorial.


The requirements are pretty much the same as before, as we won’t be using any other complex tools here.

  • Python 2.7
  • PIP, to install the following packages in Python
  • Python Requests, to make requests and download the HTML content of the pages
  • Python LXML, for parsing the HTML tree structure using XPaths
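Once Python and PIP are set up, the two third-party packages can be installed in one step:

```shell
# Install the two third-party packages used in this tutorial.
# (On systems with both Python 2 and 3, pip may be invoked as pip2.)
pip install requests lxml
```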

The Code


If you can’t see the embed above, or if you would like to download the code, here is the link to it on GitHub Gist:

Running the Scraper

Assuming you have named your scraper script, typing python <space> scriptname with an -h flag in a command prompt or terminal will print the script's usage instructions.
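The -h help output comes from standard argument parsing. Here is a sketch of what the script's argument handling might look like; the argument name "url" is an assumption, so check the actual script for its parameters.

```python
# Sketch of command-line argument handling with the standard argparse module.
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        description="Scrape details from a business detail page")
    parser.add_argument("url", help="URL of the business page to scrape")
    return parser


# Running `python scriptname.py -h` prints the usage text generated here.
print(build_parser().format_help())
```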

For example, take the restaurant The Bird, Washington DC, whose URL is

The script will automatically create a file called scraped_data-the-bird-washington?osq=Restaurants.json containing the scraped data.
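A filename like that can be derived directly from the business URL. This is a hypothetical sketch of the idea; the slug logic here is an assumption and the real script may also carry over the URL's query string, as in the example filename above.

```python
# Hypothetical: derive an output filename from the business URL's last path segment.
def output_filename(url):
    slug = url.rstrip("/").split("/")[-1]  # last path segment of the URL
    return "scraped_data-%s.json" % slug


name = output_filename("https://example.com/biz/the-bird-washington")
print(name)  # scraped_data-the-bird-washington.json
```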

The output file would look similar to this

You can extend this further to store the data in a database like MongoDB or MySQL.
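As a self-contained stand-in for MongoDB or MySQL (both of which need a running server), here is the same idea using Python's built-in sqlite3 module; the table layout and the sample record are illustrative assumptions.

```python
# Store a scraped record in a relational table using the stdlib sqlite3 module.
import sqlite3

record = {
    "name": "Example Restaurant",           # illustrative sample data
    "phone": "(202) 555-0100",
    "address": "123 Example St, Washington DC",
}

conn = sqlite3.connect(":memory:")  # use a file path instead for persistence
conn.execute("CREATE TABLE businesses (name TEXT, phone TEXT, address TEXT)")
# Named parameters (:name, :phone, :address) are filled from the dict's keys.
conn.execute("INSERT INTO businesses VALUES (:name, :phone, :address)", record)

row = conn.execute("SELECT name, phone FROM businesses").fetchone()
print(row)  # ('Example Restaurant', '(202) 555-0100')
```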

Known Limitations

This code should work for grabbing basic details from most business URLs. However, if you want to scrape thousands of pages, there are some important things you should be aware of, and you can read about them in Scalable do-it-yourself scraping – How to build and run scrapers on a large scale and How to prevent getting blacklisted while scraping.

If you are looking for professional help with scraping complex websites, let us know by filling out the form below.

Tell us about your complex web scraping projects

Turn websites into meaningful and structured data through our web data extraction service


Disclaimer: Any code provided in our tutorials is for illustration and learning purposes only. We are not responsible for how it is used and assume no liability for any detrimental usage of the source code. The mere presence of this code on our site does not imply that we encourage scraping or scrape the websites referenced in the code and accompanying tutorial. The tutorials only help illustrate the technique of programming web scrapers for popular internet websites. We are not obligated to provide any support for the code, however, if you add your questions in the comments section, we may periodically address them.
