How to Scrape Real Estate Listings from Zillow.com using Python and LXML

Web scraping is a practical way for sellers and agents to keep track of available real estate listings. Data extracted from real estate sites such as Zillow.com can help you adjust the prices of listings on your own site or build a database for your business.

In this tutorial, we will scrape Zillow.com, an online real estate database, to extract the real estate listings available there. The scraper extracts details of property listings for a given zip code.

We will extract the following details:

  1. Title
  2. Street Name
  3. City
  4. State
  5. Zip Code
  6. Price
  7. Facts and Features
  8. Real Estate Provider
  9. URL

Below is a screenshot of some of the data fields we will be extracting:

[Screenshot: details and features to scrape on Zillow]

Scraping Logic

  1. Construct the URL of the Zillow search results page. For example, the URL for the Boston zip code 02126 is https://www.zillow.com/homes/02126_rb/. We have to build this URL ourselves before we can scrape results from that page.
  2. Download the HTML of the search results page using Python Requests. This is straightforward once you have the URL: a single request returns the entire HTML of the page.
  3. Parse the page using LXML, which lets you navigate the HTML tree structure using XPaths. The XPaths for the details we need are predefined in the code.
  4. Save the data to a CSV file.
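
The first three steps can be sketched roughly as follows. This is a minimal illustration and not the script from the gist: the function names, the User-Agent header, and above all the XPath expressions are assumptions made for this example. Zillow's real markup is different and changes over time, so the XPaths must be taken from the live page.

```python
import requests
from lxml import html


def build_search_url(zipcode):
    """Step 1: build the search results URL for a zip code."""
    return "https://www.zillow.com/homes/{0}_rb/".format(zipcode)


def download_page(url):
    """Step 2: download the raw HTML of the search results page."""
    # A browser-like User-Agent is an assumption; the site may still
    # refuse or challenge the request.
    headers = {"User-Agent": "Mozilla/5.0"}
    response = requests.get(url, headers=headers, timeout=30)
    response.raise_for_status()
    return response.text


def clean(text_nodes):
    """Join text nodes and collapse runs of whitespace."""
    return " ".join(" ".join(text_nodes).split())


def parse_listings(page_source):
    """Step 3: parse the HTML with LXML and extract fields with XPaths."""
    tree = html.fromstring(page_source)
    listings = []
    # Hypothetical XPaths: inspect the live page to find the real ones.
    for card in tree.xpath("//article[@class='listing']"):
        link = card.xpath(".//a/@href")
        listings.append({
            "address": clean(card.xpath(".//span[@class='address']//text()")),
            "price": clean(card.xpath(".//span[@class='price']//text()")),
            "url": link[0] if link else "",
        })
    return listings
```

With these helpers, parse_listings(download_page(build_search_url("02126"))) would return a list of dictionaries, one per listing card, provided the XPaths match the page that is actually served.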

Required Tools

Install Python 3 and Pip

Here is a guide to install Python 3 in Linux – http://docs.python-guide.org/en/latest/starting/install3/linux/

Mac Users can follow this guide – http://docs.python-guide.org/en/latest/starting/install3/osx/

Windows Users go here – https://www.scrapehero.com/how-to-install-python3-in-windows-10/

Packages

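For this web scraping tutorial using Python 3, we will need some packages for downloading and parsing the HTML. The package requirements are Python Requests, which downloads the HTML of the page, and LXML, which parses the HTML tree structure using XPaths. Both can be installed with pip, for example: pip3 install requests lxml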
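
Before running the scraper, it can help to confirm that Python Requests and LXML, the two packages used in the scraping steps, are importable. The check below is only a convenience sketch; the names REQUIRED_PACKAGES and missing_packages are made up for this example.

```python
import importlib.util

# Import names of the packages the scraping steps rely on.
REQUIRED_PACKAGES = ["requests", "lxml"]


def missing_packages(names):
    """Return the names that cannot be found by the import system."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def report(names=REQUIRED_PACKAGES):
    """Print a short message saying which packages still need installing."""
    missing = missing_packages(names)
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("All required packages are installed.")
    return missing
```

Calling report() prints the missing package names, if any, so they can be installed with pip before the script is run.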

The Code

If the embed does not work, you can download the code from https://gist.github.com/scrapehero/5f51f344d68cf2c022eb2d23a2f1cf95

If you would like the code in Python 2.7, you can check out the link at https://gist.github.com/scrapehero/2dd61d0f1bd5222a4c9ae76465990cbd

Running the Scraper

Assume the script is named zillow.py. When you run it in a command prompt or terminal with the -h flag, it prints the following usage message:

usage: zillow.py [-h] zipcode sort

positional arguments:
  zipcode
  sort        available sort orders are :
              newest : Latest property details
              cheapest : Properties with cheapest price

optional arguments:
  -h, --help  show this help message and exit
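
A command-line interface that produces help text like the above could be defined with argparse along these lines. This is a sketch under assumptions, not the code from the gist: the function name build_parser is hypothetical, and restricting the sort argument with choices is an addition made here so that unknown sort orders are rejected early.

```python
import argparse


def build_parser():
    """Build a parser with two positional arguments: zipcode and sort."""
    parser = argparse.ArgumentParser(
        prog="zillow.py",
        # Keep the line breaks in the help text exactly as written.
        formatter_class=argparse.RawTextHelpFormatter,
    )
    # The zip code stays a string so a leading zero (02126) is preserved.
    parser.add_argument("zipcode", help="")
    sort_help = (
        "available sort orders are :\n"
        "newest : Latest property details\n"
        "cheapest : Properties with cheapest price"
    )
    parser.add_argument(
        "sort",
        metavar="sort",  # show "sort" in the usage line instead of the choices
        choices=["newest", "cheapest"],
        help=sort_help,
    )
    return parser
```

For example, build_parser().parse_args(["02126", "newest"]) yields an object whose zipcode and sort attributes hold the two values. On Python 3.10 and later, argparse titles the last help section "options:" rather than "optional arguments:".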

Run the script with Python, passing the zip code and the sort order as arguments. The sort argument accepts either ‘newest’ or ‘cheapest’. For example, to find the newest properties up for sale in Boston, Massachusetts, we would run the script as:

python3 zillow.py 02126 newest

This will create a CSV file called properties-02126.csv in the same folder as the script. Here is some sample data extracted from Zillow.com for the command above.

[Screenshot: extracted results from web scraping]
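
Writing the extracted rows to a file named after the zip code might look like the sketch below. The column names are derived from the list of details at the top of this tutorial; the exact header names used by the original script, and the helper names here, are assumptions.

```python
import csv
import os

# One column per detail listed in this tutorial (header names are assumed).
FIELDNAMES = [
    "title", "street_name", "city", "state", "zip_code",
    "price", "facts_and_features", "real_estate_provider", "url",
]


def output_filename(zipcode):
    """Name the output file after the zip code, e.g. properties-02126.csv."""
    return "properties-{0}.csv".format(zipcode)


def save_properties(rows, zipcode, directory="."):
    """Write a list of dictionaries to CSV and return the file path."""
    path = os.path.join(directory, output_filename(zipcode))
    # newline="" lets the csv module control line endings on every platform.
    with open(path, "w", newline="", encoding="utf-8") as csv_file:
        writer = csv.DictWriter(csv_file, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)  # keys missing from a row are written as empty cells
    return path
```

Using csv.DictWriter keeps the column order fixed even when a listing lacks some fields, which is common on search result pages.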

You can download the code at https://gist.github.com/scrapehero/5f51f344d68cf2c022eb2d23a2f1cf95

Known Limitations

This script should be able to scrape the real estate listings for most zip codes. If you would like to scrape the details of thousands of pages, you should read our articles “Scalable do-it-yourself scraping – How to build and run scrapers on a large scale” and “How to prevent getting blacklisted while scraping”.


Disclaimer: Any code provided in our tutorials is for illustration and learning purposes only. We are not responsible for how it is used and assume no liability for any detrimental usage of the source code. The mere presence of this code on our site does not imply that we encourage scraping or scrape the websites referenced in the code and accompanying tutorial. The tutorials only help illustrate the technique of programming web scrapers for popular internet websites. We are not obligated to provide any support for the code. However, if you add your questions in the comments section, we may periodically address them.

Responses

thefieldservicesofamerica October 7, 2018

Would anyone be able to assist me? I am experiencing a ‘invalid syntax’ error. I have the script saved in the script folder in Python. Using Windows 10, Python 3.7.0

    ScrapeHero October 12, 2018

    Unfortunately, if you get such basic errors you will need to read up on python programming to be able to use this code.

caepcomm October 17, 2018

You can just use Zillow APIs – they are free

Abhi reddy January 21, 2019

i tried this code … am not getting any error message but output is not get…
when am running in cmd Its showing only path..

C:\Users\Dell\Desktop\Python1>zillow.py 20005 newest

C:\Users\Dell\Desktop\Python1>

again am running:::

C:\Users\Dell\Desktop\Python1>zillow.py 20005 newest

C:\Users\Dell\Desktop\Python1>

    rijesh January 21, 2019

    Can you please run the script using the following command: python zillow.py 20005 newest . It works for me.

Anand Mahajan February 11, 2019

And how legal is it to scrape from Zillow that sources data from other MLSs? Also to what scale can one do this? Nationwide?

David Corrales April 10, 2019

Will i be able to use this code to gather First and Last name as well as phone numbers and Emails?

    ScrapeHero April 11, 2019

    Sure David, you can do anything with code. It is provided for free so that you can modify it to achieve your objective.

Dustin April 19, 2019

I am receiving this error:

File "C:\Users\smith\Zillow.py", line 2, in <module>
from lxml import html
File "C:\ProgramData\Anaconda3\lib\site-packages\lxml\html\__init__.py", line 54, in <module>
from .. import etree
ImportError: DLL load failed: The specified module could not be found.

I installed the lxml. Any ideas on why this is happening?

Thanks!

    ScrapeHero April 19, 2019

    Sounds like an installation error – sorry, we can’t help with that.

Brett Doyle April 21, 2019

Nice job putting this together.

I was able to modify your code to do a few more cool things. I used regex to extract the zillow property ID’s from the URL. Then I registered for a ZWSID Zillow API Key and used that in conjunction with the property ID to run API queries against properties. I am able to get things like the Zestimate and rent estimate.

    ScrapeHero April 21, 2019

    Brett,
    Glad to see you expanding and combining scraping and the API to create a hybrid solution.
    It would be great if you would publish the code (without your API key, please) so the community can benefit.
    We will gladly link to it.
