How to scrape Yelp.com for Business Listings

Yelp.com is a reliable source for extracting information about local businesses such as restaurants, shops, home services, and automotive services. You can use web scraping to extract Yelp data such as phone numbers, reviews, and addresses. In this tutorial, we will build a scraper to extract Yelp data for any keyword and location.

Here are the steps to scrape Yelp data:

  1. Construct the URL of the search results page from Yelp. Example- https://www.yelp.com/search?find_desc=Restaurants&find_loc=Washington%2C+DC&ns=1.
  2. Download HTML of the search result page using Python Requests.
  3. Parse the page using LXML – LXML lets you navigate the HTML tree structure using XPaths.
  4. Save the data to a CSV file.
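The four steps above can be sketched in Python as follows. This is a minimal illustration, assuming requests and lxml are installed; the XPath expressions (and the `biz-name` class) are placeholders, since Yelp's markup changes frequently and must be inspected on the live page before use.

```python
# A minimal sketch of the four steps above. The XPaths are illustrative
# placeholders -- inspect the live Yelp page and adjust them before use.
import csv

import requests
from lxml import html


def parse_listings(page_html):
    """Step 3: pull business names and links out of the HTML via XPath."""
    tree = html.fromstring(page_html)
    results = []
    for link in tree.xpath('//a[@class="biz-name"]'):
        results.append({"name": link.text_content().strip(),
                        "url": link.get("href", "")})
    return results


def scrape(keyword, location):
    # Step 1: construct the search results URL
    url = ("https://www.yelp.com/search"
           "?find_desc={}&find_loc={}&ns=1".format(keyword, location))
    # Step 2: download the HTML using Python Requests (a browser-like
    # User-Agent reduces the chance of being served a stripped-down page)
    response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"},
                            timeout=30)
    rows = parse_listings(response.text)
    # Step 4: save the data to a CSV file
    with open("scraped_yelp_results.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["name", "url"])
        writer.writeheader()
        writer.writerows(rows)
    return rows
```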

If you would rather not code, ScrapeHero Cloud is just right for you!

Skip the hassle of installing software, programming and maintaining the code. Download this data using ScrapeHero Cloud within seconds.

Get Started for Free
Deploy to ScrapeHero Cloud

We will be extracting the following data from the Yelp search results page:

  1. Business Name
  2. Search Rank
  3. Number of Reviews
  4. Category
  5. Rating
  6. Address
  7. Price Range
  8. Business Detail Page URL

Below is a screenshot of some of the data we will be extracting from Yelp.com as part of this tutorial.

[Screenshot: sample data extracted from Yelp search results]

Requirements

Install Python 3 and Pip

Here is a guide to install Python 3 in Linux – http://docs.python-guide.org/en/latest/starting/install3/linux/

Mac Users can follow this guide – http://docs.python-guide.org/en/latest/starting/install3/osx/

Windows Users go here – https://www.scrapehero.com/how-to-install-python3-in-windows-10/

Packages

For this web scraping tutorial using Python 3, we will need some packages for downloading and parsing the HTML – Python Requests for downloading the pages and LXML for parsing them. Both can be installed using pip:

 pip3 install requests lxml

Constructing Input URL

We will need to input a search result URL to the scraper. For example, here is the one for Washington, D.C.: https://www.yelp.com/search?find_desc=Restaurants&find_loc=Washington%2C+DC&ns=1. We’ll have to create this URL manually to scrape results from that page.
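Rather than assembling the query string by hand, it can be built with the standard library. The parameter names below are taken from the example URL above:

```python
# Build the Yelp search URL from a keyword and a location using the
# standard library; urlencode handles the percent-escaping for us.
from urllib.parse import urlencode


def build_search_url(keyword, location):
    params = {"find_desc": keyword, "find_loc": location, "ns": 1}
    return "https://www.yelp.com/search?" + urlencode(params)


print(build_search_url("Restaurants", "Washington, DC"))
# https://www.yelp.com/search?find_desc=Restaurants&find_loc=Washington%2C+DC&ns=1
```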

The Code

https://gist.github.com/scrapehero/8c61789f3f0c9d1dbc6859b635de2e4f

You can download the code from the link https://gist.github.com/scrapehero/8c61789f3f0c9d1dbc6859b635de2e4f if the embed above does not work.

If you would like the code in Python 2.7 check out the link at https://gist.github.com/scrapehero/bde7d6ec5f1cb62b8482f2b2b4ca1a94.


Running the Scraper

Assuming the script is named yelp_search.py, typing the script name into a command prompt or terminal with the -h flag prints its usage:

usage: yelp_search.py [-h] place keyword

positional arguments:
 place    Location/ Address/ zip code
 keyword  Any keyword

optional arguments:
 -h, --help show this help message and exit

The keyword can be any business type that Yelp.com supports – for example, Restaurants, Health, Home Services, Hotels, Education, etc.

Run the script using python with arguments for place and keyword. The argument for place can be provided as a location, address or zip code.

As an example, to find the top 10 restaurants in Washington D.C., we would pass 20001 as the place and Restaurants as the keyword:

 python3 yelp_search.py 20001 Restaurants

This should create a CSV file called scraped_yelp_results_for_20001.csv that will be in the same folder as the script.
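The CSV-writing step might look like the sketch below. The field names are taken from the list of extracted data earlier in the tutorial, but are illustrative – the actual script may name its columns differently.

```python
# Write the scraped rows to a CSV named after the place argument, e.g.
# scraped_yelp_results_for_20001.csv. Field names follow the data list
# earlier in the tutorial and are illustrative.
import csv


def save_results(place, rows):
    fieldnames = ["rank", "business_name", "reviews", "categories",
                  "rating", "address", "price_range", "url"]
    filename = "scraped_yelp_results_for_%s.csv" % place
    with open(filename, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return filename
```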

Here is some sample data extracted from Yelp.com for the command above.


Let us know in the comments how this scraper worked for you.

Known Limitations

This code should be capable of scraping the details of most cities. If you want to scrape the details of thousands of pages, you should read Scalable do-it-yourself scraping – How to build and run scrapers on a large scale and How to prevent getting blacklisted while scraping.

If you need some professional help with scraping websites, get in touch with us.


Disclaimer: Any code provided in our tutorials is for illustration and learning purposes only. We are not responsible for how it is used and assume no liability for any detrimental usage of the source code. The mere presence of this code on our site does not imply that we encourage scraping or scrape the websites referenced in the code and accompanying tutorial. The tutorials only help illustrate the technique of programming web scrapers for popular internet websites. We are not obligated to provide any support for the code, however, if you add your questions in the comments section, we may periodically address them.

Responses

Geena November 25, 2018

Do you have a more step by step guide? Where do you import the code?

Reply

Sparsh Garg November 30, 2018

This gives an empty file in the output

Reply

    rijesh December 4, 2018

    It seems yelp is A/B testing its UI. We’ve updated our code to handle both the cases

    Reply

      Christina February 13, 2019

      I still get an empty csv. Help.

      Reply

        rijesh February 15, 2019

        There was an issue with the parser failing due to the ads present in the listing page. We’ve handled this case and updated the code. It should work fine now, please try.

        Reply

Ven June 17, 2019

How can I add Longitude and Latitude columns to the code

Reply
