How to scrape Yelp for Business Listings

Yelp.com is a reliable source of information about local businesses such as restaurants, shops, home services, and automotive services. You can use web scraping to extract Yelp data such as phone numbers, reviews, and addresses. The scraper we build in this tutorial will scrape Yelp data for any keyword and location.

First, we will create a Python scraper to extract Yelp business listings from the Yelp search result page for a particular keyword and location (zip code, state, or city). This scraper will extract details such as business name, rank, rating, review count, and business URL.

Then we will create a Yelp business details scraper that extracts data from a Yelp business URL, such as one of the URLs collected by the first scraper. It scrapes Yelp business pages for details such as business name, contact information, working hours, and amenities.

How to Build a Scraper to Scrape Yelp Data

  1. Construct the URL of the Yelp search results page to extract the business listings data from (example: https://www.yelp.com/search?find_desc=Restaurants&find_loc=Washington%2C+DC&ns=1).
  2. Download the HTML of the search result page using Python Requests and parse the page using LXML.
  3. Save the business listings data to a CSV file and select a business URL from the scraped data.
  4. Download the HTML of the selected business URL using Requests and parse it using LXML.
  5. Scrape Yelp data and save it as a JSON file.

Scrape Yelp for Free

Learn to scrape Yelp on a large scale using the Yelp Business Listing Crawler available on ScrapeHero Cloud.

No coding required: all you have to do is input the list of business or search URLs you want to scrape, along with selected filters. You will get Yelp data in minutes.



Yelp Business Listings Scraper

We will be extracting the following details from a business listing page from Yelp:

  1. Business Name
  2. Rank
  3. Number of Reviews
  4. Category
  5. Rating
  6. Address
  7. Price Range
  8. Business URL

[Screenshot: the data fields we will be extracting from a Yelp search results page.]

Requirements

Install Python 3 and Pip:

Here are guides to installing Python in Linux, Mac, and Windows systems.

Linux – http://docs.python-guide.org/en/latest/starting/install3/linux/
Mac – http://docs.python-guide.org/en/latest/starting/install3/osx/
Windows – https://www.scrapehero.com/how-to-install-python3-in-windows-10/

Packages

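For this web scraping tutorial using Python 3, we will need two packages, one for downloading the HTML and one for parsing it:

  1. Python Requests, to download the HTML of the pages (install it with pip install requests).
  2. Python LXML, to parse the HTML tree structure using XPaths (install it with pip install lxml).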

Constructing Input URL

We will need to give the scraper a search result URL. For example, here is the one for restaurants in Washington, D.C.: https://www.yelp.com/search?find_desc=Restaurants&find_loc=Washington%2C+DC&ns=1

We’ll have to create this URL manually to scrape the business listings from that page.
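
The same URL can also be assembled in code rather than by hand. The following is a minimal sketch using only the Python standard library. The helper name build_search_url is hypothetical, and the query parameters (find_desc, find_loc, ns) are simply the ones visible in the example URL above; Yelp may accept or require others.

```python
from urllib.parse import urlencode

YELP_SEARCH_BASE = "https://www.yelp.com/search"


def build_search_url(keyword, place):
    """Build a Yelp search URL for a keyword and a location (hypothetical helper)."""
    # urlencode escapes the comma as %2C and the space as +,
    # which matches the hand-written example URL.
    query = urlencode({"find_desc": keyword, "find_loc": place, "ns": 1})
    return f"{YELP_SEARCH_BASE}?{query}"


print(build_search_url("Restaurants", "Washington, DC"))
```

Letting urlencode do the escaping avoids mistakes with commas, spaces, and ampersands in keywords or place names.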

The Code to Scrape Yelp Data

You can download the code from the GitHub link https://gist.github.com/scrapehero/8c61789f3f0c9d1dbc6859b635de2e4f 

If you would like the code in Python 2.7 check out the link at https://gist.github.com/scrapehero/bde7d6ec5f1cb62b8482f2b2b4ca1a94.

Running the Scraper

Save the script under any name; we saved it as yelp_search.py. If you run the script from a command prompt or terminal with the -h flag, it prints the following usage message:

usage: yelp_search.py [-h] place keyword
positional arguments:
 place    Location/ Address/ zip code
 keyword  Any keyword
optional arguments:
 -h, --help show this help message and exit

A keyword is any type of business. You can use any business type available in Yelp.com such as – Restaurants, Health, Home Services, Hotels, Education, etc.

Run the script using python with arguments for place and keyword. The argument for place can be provided as a location, address, or zip code.
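
A usage message like the one above is what Python's argparse module generates. The following is a minimal sketch of a parser that accepts the same two positional arguments; it is an illustration under that assumption and not necessarily how the downloadable script is written.

```python
import argparse


def build_parser():
    """Return a parser with the two positional arguments shown in the help text."""
    parser = argparse.ArgumentParser(prog="yelp_search.py")
    parser.add_argument("place", help="Location/ Address/ zip code")
    parser.add_argument("keyword", help="Any keyword")
    return parser


# An explicit list is parsed here so the sketch runs without real command-line
# input; a script would call build_parser().parse_args() with no arguments.
args = build_parser().parse_args(["20001", "Restaurants"])
print(args.place, args.keyword)
```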

To find the top 10 restaurants in Washington, D.C., pass 20001 as the place and Restaurants as the keyword:

python3 yelp_search.py 20001 Restaurants

This should create a CSV file called scraped_yelp_results_for_20001.csv that will be in the same folder as the script.
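
Writing the listings to a file named this way can be sketched with the standard csv module. The column names and the helper save_listings below are assumptions modeled on the field list in this tutorial; the downloadable script may name its columns differently. The sample row is a placeholder, not real Yelp data.

```python
import csv
import os
import tempfile

# Assumed column names, modeled on the fields listed earlier in the tutorial.
FIELDNAMES = ["business_name", "rank", "review_count", "categories",
              "rating", "address", "price_range", "url"]


def save_listings(listings, place, directory="."):
    """Write listing dicts to scraped_yelp_results_for_<place>.csv and return the path."""
    filename = os.path.join(directory, f"scraped_yelp_results_for_{place}.csv")
    # newline="" is what the csv module documentation recommends for output files.
    with open(filename, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(listings)
    return filename


# Placeholder row for illustration only.
sample = [{"business_name": "Example Diner", "rank": "1", "review_count": "10",
           "categories": "Diners", "rating": "4.0", "address": "1 Example St",
           "price_range": "$$", "url": "https://example.com/biz/example-diner"}]

# A temporary directory keeps the demonstration from leaving files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = save_listings(sample, "20001", directory=tmp_dir)
    print(os.path.basename(path))
```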

Here is some sample data extracted from Yelp.com for the command above.


Let us know in the comments how this scraper worked for you.

Yelp Business Details Scraper

For the Yelp business details scraper, we will create a Python script that downloads a business page from Yelp and extracts details from it. You can use the URLs of businesses you are interested in, or the ones you got from the data scraped with the previous scraper.

Below is a screenshot of the data fields we will be extracting from the business page:
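
If you take the URLs from the first scraper's CSV, a small helper can collect them. This is a sketch only: the column name url is an assumption (check the header row of your own CSV), the helper read_business_urls is hypothetical, and the rows are placeholders held in memory instead of a real file.

```python
import csv
import io


def read_business_urls(csv_file, column="url"):
    """Return the unique, non-empty values of one column, in file order."""
    seen = set()
    urls = []
    for row in csv.DictReader(csv_file):
        url = (row.get(column) or "").strip()
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


# In-memory stand-in for the CSV written by the first scraper (placeholder rows).
demo_csv = io.StringIO(
    "business_name,url\n"
    "Example Diner,https://example.com/biz/example-diner\n"
    "Example Cafe,https://example.com/biz/example-cafe\n"
    "Example Diner,https://example.com/biz/example-diner\n"
)
print(read_business_urls(demo_csv))
```

With a real file, you would pass an open file object, for example open("scraped_yelp_results_for_20001.csv", newline="", encoding="utf-8").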


The requirements are the same as the previous Yelp scraper. We won’t be using any extra packages or other complex tools here.

Download the HTML and Parse the Data

We will use Python Requests to download the entire HTML of the business detail page, and then parse the page using LXML. LXML lets you navigate the HTML tree structure using XPaths. We have predefined the XPaths for the details we need in the code.
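
To make the XPath step concrete, here is a minimal sketch that parses a toy HTML string with LXML. The markup, the class names, and the XPaths are invented for illustration; they are not the XPaths used in the downloadable script, and real Yelp pages are structured differently and change over time.

```python
from lxml import html

# Toy markup for illustration only; NOT the structure of a real Yelp page.
SAMPLE_HTML = """
<html><body>
  <h1 class="biz-name">The Bird</h1>
  <span class="biz-phone"> (202) 518-3609 </span>
</body></html>
"""


def first_text(tree, xpath):
    """Return the stripped text of the first XPath match, or None if nothing matches."""
    matches = tree.xpath(xpath)
    return matches[0].strip() if matches else None


def parse_business(page_source):
    """Parse an HTML string and pull out a couple of fields with XPath."""
    tree = html.fromstring(page_source)
    return {
        "name": first_text(tree, '//h1[@class="biz-name"]/text()'),
        "phone": first_text(tree, '//span[@class="biz-phone"]/text()'),
    }


print(parse_business(SAMPLE_HTML))
```

In a real scraper the HTML string would come from Requests, for example requests.get(url).text. Returning None for a missing field, rather than raising an error, keeps one changed element from breaking the whole record.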

The Code to Scrape Yelp Data

The code is available as a GitHub Gist: Yelp Business Details Code.

Running the Scraper

Save the script under any name; we named this scraper yelp_business_details.py. Then run python followed by the script name and -h in a command prompt or terminal to see the usage message:

usage: yelp_business_details.py [-h] url
positional arguments:
url         yelp_business_details.py url
optional arguments:
-h, --help show this help message and exit

Here is the command to scrape the details for the restaurant ‘The Bird’ in Washington D.C:

python yelp_business_details.py "https://www.yelp.com/biz/the-bird-washington?osq=Restaurants"

The script will automatically create a file called scraped_data-the-bird-washington?osq=Restaurants.json with the scraped data from Yelp.com.
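
Note that this file name contains a ? character, which is not allowed in Windows file names and is awkward to type in most shells. If you adapt the script, one option is to build the name from the URL path only. The helper output_filename below is a hypothetical sketch of that idea, not the script's actual behavior.

```python
from urllib.parse import urlsplit


def output_filename(business_url):
    """Derive a JSON file name from the last path segment of a business URL."""
    # urlsplit separates the path from the query string, so "?osq=..." is dropped.
    path = urlsplit(business_url).path
    slug = path.rstrip("/").split("/")[-1]
    return f"scraped_data-{slug}.json"


print(output_filename("https://www.yelp.com/biz/the-bird-washington?osq=Restaurants"))
```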

The output file would look similar to this:

{
"info": [
{
"Takes Reservations": "Yes"
}, 
{
"Delivery": "No"
}, 
{
"Take-out": "Yes"
}, 
{
"Accepts Credit Cards": "Yes"
}, 
{
"Accepts Android Pay": "No"
}, 
{
"Good For": "Dinner"
}, 
{
"Parking": "Street"
}, 
{
"Bike Parking": "Yes"
}, 
{
"Wheelchair Accessible": "Yes"
}, 
{
"Good for Kids": "No"
}, 
{
"Good for Groups": "Yes"
}, 
{
"Attire": "Casual"
}, 
{
"Ambience": "Trendy"
}, 
{
"Noise Level": "Average"
}, 
{
"Alcohol": "Full Bar"
}, 
{
"Outdoor Seating": "Yes"
}, 
{
"Wi-Fi": "Free"
}, 
{
"Has TV": "Yes"
}, 
{
"Waiter Service": "Yes"
}, 
{
"Caters": "Yes"
}
], 
"ratings": "4.5", 
"website": "http://www.thebirddc.com", 
"working_hours": [
{
"Mon": "4:00 pm - 10:30 pm\n        \n                Closed now"
}, 
{
"Tue": "4:00 pm - 10:30 pm"
}, 
{
"Wed": "4:00 pm - 10:30 pm"
}, 
{
"Thu": "4:00 pm - 10:30 pm"
}, 
{
"Fri": "4:00 pm - 11:30 pm"
}, 
{
"Sat": "10:00 am - 11:30 pm"
}, 
{
"Sun": "10:00 am - 10:30 pm"
}
], 
"name": "The Bird", 
"claimed_status": "Claimed", 
"url": "https://www.yelp.com/biz/the-bird-washington?osq=Restaurants", 
"longitude": "-77.026685", 
"reviews": "84 reviews", 
"phone": "(202) 518-3609", 
"address": "1337 11th St NW Washington, DC 20001 b/t N O St & N N St Shaw", 
"latitude": "38.908420", 
"ratings_histogram": [
{
"5 stars": "54"
}, 
{
"4 stars": "22"
}, 
{
"3 stars": "3"
}, 
{
"2 stars": "3"
}, 
{
"1 star": "2"
}
], 
"price_range": "$11-30", 
"health_rating": "", 
"category": "American (New),Breakfast & Brunch"
}

You can extend this further by storing the scraped data in a database such as MongoDB or MySQL.
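
As a rough sketch of what storing a record in a database could look like, the example below uses SQLite from the Python standard library as a stand-in, because it needs no server; MongoDB or MySQL would require their own client libraries and connection settings. The table layout and the helper store_business are assumptions made for illustration.

```python
import json
import sqlite3


def store_business(connection, record):
    """Insert one scraped record, keeping the nested data as a JSON string."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS businesses ("
        "url TEXT PRIMARY KEY, name TEXT, phone TEXT, details TEXT)"
    )
    # INSERT OR REPLACE means re-scraping the same URL overwrites the old row.
    connection.execute(
        "INSERT OR REPLACE INTO businesses VALUES (?, ?, ?, ?)",
        (record["url"], record["name"], record["phone"], json.dumps(record)),
    )
    connection.commit()


# A few fields copied from the sample output above.
record = {
    "name": "The Bird",
    "url": "https://www.yelp.com/biz/the-bird-washington?osq=Restaurants",
    "phone": "(202) 518-3609",
    "working_hours": [{"Tue": "4:00 pm - 10:30 pm"}],
}
conn = sqlite3.connect(":memory:")  # in-memory database, nothing is written to disk
store_business(conn, record)
print(conn.execute("SELECT name, phone FROM businesses").fetchall())
```

Keeping the full record as JSON text in one column preserves nested fields such as working hours without designing extra tables up front.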

You might wonder why we’re using JSON here when we used CSV in the previous scraper. The data we scraped in the first Yelp scraper has only rows and columns and fits well in a CSV format. This one has nested details and is quite hard to fit into a CSV (unless you want to look at a CSV which has more than 20 columns). If you are new to this, you can read more about choosing a data format for your project.
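
To see how wide a flat version of this data gets, the sketch below turns the nested lists into prefixed columns of a single row. The helper flatten_record and the dotted column naming are assumptions for illustration; the record is a shortened copy of the sample output. Applied to the full sample output, the same approach would produce more than 40 columns.

```python
def flatten_record(record):
    """Flatten lists of single-key dicts into prefixed columns of one flat row."""
    row = {}
    for key, value in record.items():
        if isinstance(value, list):
            for item in value:
                for sub_key, sub_value in item.items():
                    # " ".join(text.split()) collapses runs of spaces and newlines.
                    row[f"{key}.{sub_key}"] = " ".join(str(sub_value).split())
        else:
            row[key] = value
    return row


record = {
    "name": "The Bird",
    "ratings": "4.5",
    "info": [{"Takes Reservations": "Yes"}, {"Delivery": "No"}],
    "working_hours": [{"Mon": "4:00 pm - 10:30 pm\n        \n    Closed now"}],
}
for column, value in flatten_record(record).items():
    print(column, "=", value)
```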

This scraper can only scrape Yelp data for one business URL at a time. If you want to scrape hundreds of business URLs at once, you can use the Yelp Business Listing Crawler available on ScrapeHero Cloud.

If you don't like or want to code, ScrapeHero Cloud is just right for you!

Skip the hassle of installing software, programming, and maintaining the code. Download this data using ScrapeHero Cloud within seconds.


Scrape Yelp using ScrapeHero Cloud

You can copy the business URLs scraped by the first scraper, or search URLs, and paste them into the Yelp crawler along with selected filters. You will be able to extract details such as business name, ranking, working hours, amenities, contact information, rating history, and more. Using this crawler you will get updated data within minutes. You can also use the contact information to generate leads using the Contact Details Scraper.

Scrape Yelp without coding

ScrapeHero Cloud has pre-built scrapers that help businesses easily gather data from websites such as Yelp, Amazon, and Walmart. These scrapers are easy to use and cloud-based, so you need not worry about selecting the fields to be scraped or downloading any software. The scraper and the data can be accessed from any browser at any time, and the data can be delivered directly to Dropbox.

Known Limitations

This code should be capable of scraping the details of most cities. If you want to scrape the details of thousands of pages, you should read "Scalable do-it-yourself scraping – How to build and run scrapers on a large scale" and "How to prevent getting blacklisted while scraping".

If you need professional help with scraping websites, contact us by filling out the form below.

Tell us about your complex web scraping projects

Turn the Internet into meaningful, structured and usable data



Please DO NOT contact us for help with our tutorials and code using this form or by calling us. Instead, please add a comment at the bottom of the tutorial page.

Disclaimer: Any code provided in our tutorials is for illustration and learning purposes only. We are not responsible for how it is used and assume no liability for any detrimental usage of the source code. The mere presence of this code on our site does not imply that we encourage scraping or scrape the websites referenced in the code and accompanying tutorial. The tutorials only help illustrate the technique of programming web scrapers for popular internet websites. We are not obligated to provide any support for the code, however, if you add your questions in the comments section, we may periodically address them.


Responses

Geena November 25, 2018

Do you have a more step by step guide? Where do you import the code?


Sparsh Garg November 30, 2018

This gives an empty file in the output


    rijesh December 4, 2018

    It seems yelp is A/B testing its UI. We’ve updated our code to handle both the cases


      Christina February 13, 2019

      I still get an empty csv. Help.


        rijesh February 15, 2019

        There was an issue with the parser failing due to the ads present in the listing page. We’ve handled this case and updated the code. It should work fine now, please try.


          luis perez September 30, 2021

          this still giving me an empty csv file


Ven June 17, 2019

How can I add Longitude and Latitude columns to the code

