How to Scrape Real Estate Listings from Zillow.com using Python and LXML

Web scraping is a practical way for sellers and agents to keep track of available real estate listings. Data extracted from real estate sites such as Zillow.com can help you adjust the prices of listings on your own site or build a database for your business.

In this tutorial, we will scrape Zillow.com, an online real estate database, to extract available property listings. The scraper extracts the details of property listings for a given zip code.

We will extract the following details:

  1. Title
  2. Street Name
  3. City
  4. State
  5. Zip Code
  6. Price
  7. Facts and Features
  8. Real Estate Provider
  9. URL

[Screenshot: some of the data fields we will be extracting from Zillow]

Scraping Logic

  1. Construct the URL of the search results page on Zillow. For example, the URL for the Boston zip code 02126 is https://www.zillow.com/homes/02126_rb/. We have to construct this URL manually to scrape results from that page.
  2. Download the HTML of the search results page using Python Requests. This is straightforward once you have the URL.
  3. Parse the page using LXML. LXML lets you navigate the HTML tree structure using XPaths. The XPaths for the details we need are predefined in the code.
  4. Save the data to a CSV file.
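
The four steps above can be sketched as follows. This is a minimal illustration, not the tutorial's actual script: the function names, the User-Agent header, the sample markup, and the XPaths are all assumptions made for the example, and Zillow's real page structure is different and changes over time. Only the URL pattern comes from the example above.

```python
import csv
import io
from urllib.parse import urljoin

import requests
from lxml import html

BASE_URL = "https://www.zillow.com"
FIELDNAMES = ["address", "price", "url"]

# Hypothetical markup for illustration only; it is NOT Zillow's real HTML.
SAMPLE_HTML = """
<html><body>
  <article class="listing">
    <a class="listing-link" href="/homedetails/example-1/">
      <span class="address">12 Example St, Boston, MA 02126</span>
    </a>
    <span class="price">$450,000</span>
  </article>
  <article class="listing">
    <a class="listing-link" href="/homedetails/example-2/">
      <span class="address">34 Sample Ave, Boston, MA 02126</span>
    </a>
    <span class="price">$525,000</span>
  </article>
</body></html>
"""


def build_search_url(zipcode):
    """Step 1: build the search results URL for a zip code."""
    return "{}/homes/{}_rb/".format(BASE_URL, zipcode)


def fetch_html(url):
    """Step 2: download the page. The User-Agent value is an assumption."""
    headers = {"User-Agent": "Mozilla/5.0"}
    response = requests.get(url, headers=headers, timeout=30)
    response.raise_for_status()
    return response.text


def first_text(values):
    """Return the first XPath result stripped of whitespace, or None."""
    return values[0].strip() if values else None


def parse_listings(page_source):
    """Step 3: extract one dict per listing using (hypothetical) XPaths."""
    tree = html.fromstring(page_source)
    listings = []
    for node in tree.xpath("//article[@class='listing']"):
        href = first_text(node.xpath(".//a[@class='listing-link']/@href"))
        listings.append({
            "address": first_text(node.xpath(".//span[@class='address']/text()")),
            "price": first_text(node.xpath(".//span[@class='price']/text()")),
            "url": urljoin(BASE_URL, href) if href else None,
        })
    return listings


def save_to_csv(listings, file_obj):
    """Step 4: write the listings to an open text file as CSV."""
    writer = csv.DictWriter(file_obj, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(listings)


if __name__ == "__main__":
    # Demonstrate steps 3 and 4 on the sample markup, without network access.
    buffer = io.StringIO()
    save_to_csv(parse_listings(SAMPLE_HTML), buffer)
    print(buffer.getvalue())
```

Run as a script, the block parses the built-in sample markup and prints the resulting CSV. To try it against a live page, pass the result of fetch_html(build_search_url("02126")) to parse_listings and replace the XPaths with ones that match the page you actually receive.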

Required Tools

Install Python 3 and Pip

Linux users can follow this guide – http://docs.python-guide.org/en/latest/starting/install3/linux/

Mac users can follow this guide – http://docs.python-guide.org/en/latest/starting/install3/osx/

Windows users can follow this guide – https://www.scrapehero.com/how-to-install-python3-in-windows-10/

Packages

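For this web scraping tutorial using Python 3, we need two packages: Python Requests, to download the HTML of the search results page, and LXML, to parse the HTML tree structure using XPaths. Both can be installed with pip (pip install requests lxml).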

The Code

If the embed does not work, you can download the code from https://gist.github.com/scrapehero/5f51f344d68cf2c022eb2d23a2f1cf95

If you would like the code in Python 2.7, you can check out the link at https://gist.github.com/scrapehero/2dd61d0f1bd5222a4c9ae76465990cbd

Running the Scraper

Assume the script is named zillow.py. Typing the script name in a command prompt or terminal with the -h option prints its usage information.

You must run the script using Python with arguments for zip code and sort. The sort argument accepts ‘newest’ or ‘cheapest’. As an example, to find the newest properties up for sale in Boston, Massachusetts, we would run the script with the zip code 02126 and the sort option ‘newest’.

This will create a CSV file called properties-02126.csv in the same folder as the script. Here is some sample data extracted from Zillow.com for the command above.

[Screenshot: sample results extracted from Zillow.com]
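
The command-line behavior described above could be reproduced with Python's argparse module. The sketch below assumes two positional arguments named zipcode and sort; those names, the help strings, and the output_filename helper are hypothetical and are not taken from the original script. Only the two sort options and the properties-<zip code>.csv naming come from the description above.

```python
import argparse


def build_parser():
    """Build a parser with two required positional arguments (assumed names)."""
    parser = argparse.ArgumentParser(
        description="Scrape property listings for a zip code (illustrative sketch)."
    )
    # Keep the zip code as a string so a leading zero (as in 02126) survives.
    parser.add_argument("zipcode", help="zip code to search, for example 02126")
    parser.add_argument(
        "sort",
        choices=["newest", "cheapest"],
        help="sort order of the listings",
    )
    return parser


def output_filename(zipcode):
    """Name of the CSV file written for a given zip code."""
    return "properties-{}.csv".format(zipcode)


# Parse an explicit argument list here; a real script would call
# build_parser().parse_args() with no arguments to read sys.argv instead.
args = build_parser().parse_args(["02126", "newest"])
print(output_filename(args.zipcode))  # properties-02126.csv
```

Because argparse generates the -h help text from the argument definitions, a parser like this also covers the usage message mentioned earlier, and it rejects any sort value other than the two listed choices.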

You can download the code at https://gist.github.com/scrapehero/5f51f344d68cf2c022eb2d23a2f1cf95

Known Limitations

This script should be able to scrape real estate listings for most zip codes. If you would like to scrape the details of thousands of pages, you should read “Scalable do-it-yourself scraping – How to build and run scrapers on a large scale” and “How to prevent getting blacklisted while scraping”.

If you need professional help with scraping complex websites, you can fill out the form below.

You can also get data delivered to you as a service from us. Interested?

Turn websites into meaningful and structured data through our web data extraction service

Disclaimer: Any code provided in our tutorials is for illustration and learning purposes only. We are not responsible for how it is used and assume no liability for any detrimental usage of the source code. The mere presence of this code on our site does not imply that we encourage scraping or scrape the websites referenced in the code and accompanying tutorial. The tutorials only help illustrate the technique of programming web scrapers for popular internet websites. We are not obligated to provide any support for the code; however, if you add your questions in the comments section, we may periodically address them.

15 comments on “How to Scrape Real Estate Listings from Zillow.com using Python and LXML”

Alex Clark

Excellent tutorial! Is it possible to have the code search individual properties? For example if I had a list of 600 addresses in excel, could we perform the scrape in this article on each of these properties? Can it be modified to use the specific address as a dynamic user input into the website and then automatically return each property in order?

Thank you in advance for your time.

    ScrapeHero

    Hi Alex,
    Thanks for your feedback.
    The tutorials provide a starting point for your own specific use cases.
    Yes, you should be able to do what you need by using Excel macros and building a scraping API endpoint.
    Thanks

Howard

Great script! And very easy to use. However, it only reaches the first page of the Zillow results — in other words, 25 houses. Is there a way to get it to pull all houses available in the zipcode searched?

    ScrapeHero

    Hi Howard – that is an exercise for the reader – a key point of the tutorial. Have fun coding!

zj

Nice article. It seems Zillow will only show 500 results (25 per page × 20 pages). To fetch all the data in some very hot zip codes, we might need to add filters to the search; I think price range (0-10k, 10k-20k) might be a useful attribute for narrowing down the result size. The problem with price range is that it depends heavily on the area: if an area only has houses priced from 3m to 4m, we will go through a bunch of unnecessary requests, which increases the probability of getting blocked. What do you think is the best attribute for limiting results to within 500?

Dev0

Ran the script, but it gets detected and Zillow is passing back a captcha now. “Please verify you’re a human to continue”

Frank Garcia

Awesome article. When I attempt to run it on the agents page, it returns an empty string when using multiple classes. For example, this page: https://www.zillow.com/malibu-ca/real-estate-agent-reviews/?sortBy=None&page=3&showAdvancedItems=False&regionID=12520&locationText=Malibu%20CA

I see there is another head and body loading at the end of the page. Is this a trick to prevent it from being scraped? Do you have any workaround for this?

Rob – Land Buyers

Nice scraping logic. A few tweaks and you can use the code on other real estate sites. Thanks!

Douglas Hitchcock

When scraping a Zillow page, does the output CSV capture the property feature codes (e.g. a RESO-compliant set of listing features and description)?

pilistan

I just don’t get it. Why is my CSV empty except for the field names?

Cheng Peng

I’m also getting empty csv

kurt hansen

hi, same here, can’t get any csv output beyond the defined headers!

what to do?

thanks,
kurt
