How to scrape Yelp.com for Business Listings

Yelp.com is a reliable source of information about local businesses such as Restaurants, Shops, Home Services, Automotive Services, etc. You can use web scraping to extract details like phone numbers, reviews, and addresses.

In this tutorial, we’ll search Yelp.com for restaurants in a city and extract the following data from the first page of results.

  1. Business Name
  2. Search Rank
  3. Number of Reviews
  4. Category
  5. Rating
  6. Address
  7. Price Range
  8. Business Detail Page URL

Below is a screenshot of some of the data we will be extracting from Yelp.com as part of this tutorial.

[Screenshot: a Yelp search results page annotated with the data fields listed above]

Scraping Logic

  1. Construct the URL of the search results page from Yelp. For example, here is the one for Washington, DC – https://www.yelp.com/search?find_desc=Restaurants&find_loc=Washington%2C+DC&ns=1. We have to build this URL ourselves to scrape results from that page.
  2. Download the HTML of the search results page using Python Requests – quite easy once you have the URL.
  3. Parse the page using LXML – LXML lets you navigate the HTML tree structure using XPaths. We have predefined the XPaths for the details we need in the code.
  4. Save the data to a CSV file (a condensed sketch of all four steps follows this list).
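To make these steps concrete, here is a minimal sketch of the whole flow, assuming Python Requests and LXML are installed. The XPaths in it are illustrative placeholders – Yelp changes its markup frequently, and the full script linked below defines the XPaths for all eight fields – so treat this as an outline rather than the finished scraper.

    import csv
    import requests
    from lxml import html
    from urllib.parse import urlencode

    def scrape_yelp_search(keyword, place):
        # Step 1: construct the search results URL manually.
        url = "https://www.yelp.com/search?" + urlencode(
            {"find_desc": keyword, "find_loc": place})

        # Step 2: download the HTML of the search results page.
        headers = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36"}
        response = requests.get(url, headers=headers, timeout=30)
        response.raise_for_status()

        # Step 3: parse the page with LXML and pick out each listing with XPaths.
        # NOTE: these XPaths are placeholders - inspect the live page, adjust them,
        # and add the remaining fields (rating, address, price range, etc.).
        parser = html.fromstring(response.text)
        listings = parser.xpath('//li[.//a[contains(@href, "/biz/")]]')
        results = []
        for rank, listing in enumerate(listings, start=1):
            name = listing.xpath('.//a[contains(@href, "/biz/")]/text()')
            link = listing.xpath('.//a[contains(@href, "/biz/")]/@href')
            results.append({
                "rank": rank,
                "name": name[0].strip() if name else None,
                "url": "https://www.yelp.com" + link[0] if link else None,
            })
        return results

    def write_csv(results, filename):
        # Step 4: save the extracted rows to a CSV file.
        with open(filename, "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=["rank", "name", "url"])
            writer.writeheader()
            writer.writerows(results)

    if __name__ == "__main__":
        rows = scrape_yelp_search("Restaurants", "Washington, DC")
        write_csv(rows, "scraped_yelp_results_for_washington.csv")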

Requirements

Install Python 3 and Pip

Here is a guide to install Python 3 in Linux – http://docs.python-guide.org/en/latest/starting/install3/linux/

Mac Users can follow this guide http://docs.python-guide.org/en/latest/starting/install3/osx/

Windows Users go here – https://www.scrapehero.com/how-to-install-python3-in-windows-10/

Packages

For this web scraping tutorial using Python 3, we need two packages for downloading and parsing the HTML:

  1. Python Requests, to download the HTML of the search results page
  2. Python LXML, to parse the downloaded HTML using XPaths
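Both can be installed with pip:

    pip3 install requests lxml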

The Code

You can view and download the complete code from this GitHub Gist – https://gist.github.com/scrapehero/8c61789f3f0c9d1dbc6859b635de2e4f.

If you would like the code in Python 2.7, it is available at https://gist.github.com/scrapehero/bde7d6ec5f1cb62b8482f2b2b4ca1a94.

Running the Scraper

Assuming the script is named yelp_search.py, you can see the available options by running it with the -h flag in a command prompt or terminal.
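For example (the exact help text printed depends on how the script’s argument parser is written):

    python3 yelp_search.py -h

This prints a usage message describing the two arguments the script expects – a keyword and a place.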

The keyword can be any type of business that Yelp.com lists – for example Restaurants, Health, Home Services, Hotels, Education, etc.

Run the script using python with arguments for place and keyword. The argument for place can be provided as a location, address or zip code.

As an example, to find the top 10 restaurants in Washington D.C., we would pass 20001 as the place and Restaurants as the keyword:
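Assuming the script takes the place first and the keyword second (check the -h output of your copy to confirm the order), the command would look like:

    python3 yelp_search.py 20001 Restaurants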

This should create a CSV file called scraped_yelp_results_for_20001.csv that will be in the same folder as the script.

The resulting CSV contains one row per result with the fields listed at the top of this tutorial – business name, rank, review count, category, rating, address, price range and URL.

You can download the code at https://gist.github.com/scrapehero/8c61789f3f0c9d1dbc6859b635de2e4f

Let us know in the comments how this scraper worked for you.

Known Limitations

This code should be capable of scraping the details of most cities. If you want to scrape thousands of pages, you should read Scalable do-it-yourself scraping – How to build and run scrapers on a large scale and How to prevent getting blacklisted while scraping.

If you need some professional help with scraping websites, contact us by filling out the form below.

Tell us about your complex web scraping projects

Turn websites into meaningful and structured data through our web data extraction service

Disclaimer: Any code provided in our tutorials is for illustration and learning purposes only. We are not responsible for how it is used and assume no liability for any detrimental usage of the source code. The mere presence of this code on our site does not imply that we encourage scraping or scrape the websites referenced in the code and accompanying tutorial. The tutorials only help illustrate the technique of programming web scrapers for popular internet websites. We are not obligated to provide any support for the code; however, if you add your questions in the comments section, we may periodically address them.
