How to Scrape Zillow Real Estate Listings using Python and LXML

Scraping real estate data is a practical way for sellers and agents to keep track of available listings. Data extracted from real estate sites such as Zillow.com can help you adjust the prices of listings on your own site or build a database for your business. In this tutorial, we will scrape Zillow using Python and show you how to extract real estate listings for a given zip code.

Here are the steps to scrape Zillow:

  1. Construct the URL of the search results page from Zillow. Example – https://www.zillow.com/homes/02126_rb/
  2. Download the HTML of the search results page using Python Requests.
  3. Parse the page using LXML – LXML lets you navigate the HTML tree structure using XPaths.
  4. Save the data to a CSV file.
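As a rough illustration of step 3, here is a minimal sketch of XPath parsing with LXML. It runs offline on a small sample page, so the download step is left out. The markup, class names, and the helper names `first_text` and `parse_listings` are made up for this example; Zillow's real HTML is different and changes often, so the XPaths would need to be adapted.

```python
from lxml import html

# Made-up markup for illustration only; Zillow's real HTML differs.
SAMPLE_HTML = """
<html><body>
  <article class="listing">
    <a class="link" href="/homedetails/1/">House for sale</a>
    <span class="address">12 Example St, Boston, MA 02126</span>
    <span class="price">$450,000</span>
  </article>
  <article class="listing">
    <a class="link" href="/homedetails/2/">Condo for sale</a>
    <span class="address">34 Sample Ave, Boston, MA 02126</span>
    <span class="price">$300,000</span>
  </article>
</body></html>
"""


def first_text(node, xpath):
    """Return the first XPath result, stripped, or None if nothing matched."""
    results = node.xpath(xpath)
    return results[0].strip() if results else None


def parse_listings(page_source):
    """Extract one dict per listing card from the (hypothetical) markup above."""
    tree = html.fromstring(page_source)
    rows = []
    for card in tree.xpath('//article[@class="listing"]'):
        rows.append({
            "title": first_text(card, './/a[@class="link"]/text()'),
            "address": first_text(card, './/span[@class="address"]/text()'),
            "price": first_text(card, './/span[@class="price"]/text()'),
            "url": first_text(card, './/a[@class="link"]/@href'),
        })
    return rows


listings = parse_listings(SAMPLE_HTML)
for listing in listings:
    print(listing)
```

Returning None for a missing field, instead of indexing the XPath result directly, keeps one incomplete listing from stopping the whole run.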

We will be extracting the following data from Zillow:

  1. Title
  2. Street Name
  3. City
  4. State
  5. Zip Code
  6. Price
  7. Facts and Features
  8. Real Estate Provider
  9. URL

Below is a screenshot of some of the data fields we will be extracting from Zillow

[Screenshot: Zillow listing fields to be extracted]

Required Tools

Install Python 3 and Pip

Here is a guide to install Python 3 in Linux – http://docs.python-guide.org/en/latest/starting/install3/linux/

Mac Users can follow this guide – http://docs.python-guide.org/en/latest/starting/install3/osx/

Windows Users go here – https://www.scrapehero.com/how-to-install-python3-in-windows-10/

Packages

For this web scraping tutorial using Python 3, we will need two packages, both installable with pip: Python Requests to download the HTML and LXML to parse it.

The Code

We first have to construct the URL of the search results page. We’ll build this URL manually from the zip code. For example, here is the one for the Boston zip code 02126: https://www.zillow.com/homes/02126_rb/
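A minimal sketch of that URL construction, following the pattern of the example URL. The function name `build_search_url` and the 5-digit check are assumptions made for illustration, not part of the original script.

```python
import re

# URL pattern taken from the example above; the zip code is the only variable part.
BASE_URL = "https://www.zillow.com/homes/{zipcode}_rb/"


def build_search_url(zipcode):
    """Build the search results URL for a 5-digit US zip code (hypothetical helper)."""
    zipcode = str(zipcode).strip()
    # Keep the zip code as a string: converting "02126" to an int
    # would drop the leading zero.
    if not re.fullmatch(r"\d{5}", zipcode):
        raise ValueError(f"expected a 5-digit zip code, got {zipcode!r}")
    return BASE_URL.format(zipcode=zipcode)


print(build_search_url("02126"))  # https://www.zillow.com/homes/02126_rb/
```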

You can download the full code from https://gist.github.com/scrapehero/5f51f344d68cf2c022eb2d23a2f1cf95

If you would like the code in Python 2.7 to scrape Zillow listings, you can find it at https://gist.github.com/scrapehero/2dd61d0f1bd5222a4c9ae76465990cbd

Running the Zillow Scraper

Assume the script is named zillow.py. When you run it in a command prompt or terminal with the -h flag, it prints the following usage message:

usage: zillow.py [-h] zipcode sort

positional arguments:
  zipcode
  sort        available sort orders are :
              newest : Latest property details
              cheapest : Properties with cheapest price

optional arguments:
  -h, --help  show this help message and exit
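A command-line interface with this shape can be sketched with argparse as follows. This is an approximation, not the script's actual parser: the `build_parser` name and the use of `choices` to reject unknown sort orders are assumptions made for this example.

```python
import argparse

# Sort orders and descriptions as listed in the usage message above.
SORT_ORDERS = {
    "newest": "Latest property details",
    "cheapest": "Properties with cheapest price",
}


def build_parser():
    """Build a parser with two positional arguments: zipcode and sort."""
    parser = argparse.ArgumentParser(
        prog="zillow.py",
        # RawTextHelpFormatter keeps the line breaks in the help text below.
        formatter_class=argparse.RawTextHelpFormatter,
    )
    # Parsed as a string, so a leading zero (as in 02126) is preserved.
    parser.add_argument("zipcode")
    sort_help = "available sort orders are :\n" + "\n".join(
        f"{name} : {description}" for name, description in SORT_ORDERS.items()
    )
    parser.add_argument(
        "sort",
        choices=list(SORT_ORDERS),  # assumption: reject anything else early
        metavar="sort",
        help=sort_help,
    )
    return parser


# Parse an explicit argument list here; a real script would call parse_args()
# with no arguments to read sys.argv.
args = build_parser().parse_args(["02126", "newest"])
print(args.zipcode, args.sort)  # 02126 newest
```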

You must run the Zillow scraper using Python, passing arguments for the zip code and the sort order. The sort argument accepts ‘newest’ and ‘cheapest’. For example, to find the newest properties up for sale in Boston, Massachusetts, we would run the script as:

python3 zillow.py 02126 newest

This will create a CSV file called properties-02126.csv that will be in the same folder as the script. Here is some sample data extracted from Zillow.com for the command above.
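The CSV step might look roughly like the sketch below. The column names follow the list of fields at the top of this tutorial but their exact spelling is an assumption, as are the function name `write_properties` and the sample row, which contains made-up values.

```python
import csv
import os
import tempfile

# Assumed column names, based on the fields listed earlier in this tutorial.
FIELDNAMES = [
    "title", "street_name", "city", "state", "zip_code",
    "price", "facts_and_features", "real_estate_provider", "url",
]


def write_properties(zipcode, rows, directory="."):
    """Write rows to properties-<zipcode>.csv and return the file path."""
    path = os.path.join(directory, f"properties-{zipcode}.csv")
    # newline="" lets the csv module control line endings; an explicit utf-8
    # encoding avoids depending on the platform's default encoding.
    with open(path, "w", newline="", encoding="utf-8") as fp:
        writer = csv.DictWriter(fp, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)
    return path


# Made-up sample row for illustration only.
sample_rows = [{
    "title": "House for sale",
    "street_name": "12 Example St",
    "city": "Boston",
    "state": "MA",
    "zip_code": "02126",
    "price": "$450,000",
    "facts_and_features": "3 bds, 2 ba, 1,500 sqft",
    "real_estate_provider": "Example Realty",
    "url": "https://www.zillow.com/homedetails/example/",
}]

# Written to a temporary directory here; the script described above writes
# next to itself.
output_path = write_properties("02126", sample_rows, directory=tempfile.mkdtemp())
print(os.path.basename(output_path))  # properties-02126.csv
```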

[Screenshot: sample real estate listings scraped from Zillow]


Known Limitations

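This Zillow scraper should be able to scrape real estate listings for most zip codes. As readers report in the responses below, it reads only the first page of search results, and Zillow may respond with a captcha instead of listings. To learn more about real estate data management, you can go through this post – Real Estate and Quality Challenges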

If you would like to scrape the details of thousands of Zillow listing pages, you should read Scalable do-it-yourself scraping – How to build and run scrapers on a large scale and How to prevent getting blacklisted while scraping.
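Several readers in the responses below report a status code of 200 together with a captcha page, so the status code alone does not show that listings came back. A simple check on the page text could look like the sketch below. The marker strings are assumptions taken from the phrases readers quoted, and `looks_like_captcha` is a hypothetical helper, not part of the script.

```python
# Assumed markers, taken from phrases readers reported seeing in blocked responses.
CAPTCHA_MARKERS = ("captcha", "verify you're a human")


def looks_like_captcha(page_source):
    """Return True if the page text contains any of the assumed captcha markers."""
    # Normalize case and curly apostrophes so both spellings of "you're" match.
    text = page_source.lower().replace("\u2019", "'")
    return any(marker in text for marker in CAPTCHA_MARKERS)


# Made-up page fragments for illustration only.
blocked_page = "<html><script>function handleCaptcha(response) {}</script></html>"
listing_page = "<html><body><span class='price'>$450,000</span></body></html>"

print(looks_like_captcha(blocked_page))  # True
print(looks_like_captcha(listing_page))  # False
```

Checking for this before parsing lets a script stop with a clear message instead of writing an empty CSV file.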



Disclaimer: Any code provided in our tutorials is for illustration and learning purposes only. We are not responsible for how it is used and assume no liability for any detrimental usage of the source code. The mere presence of this code on our site does not imply that we encourage scraping or scrape the websites referenced in the code and accompanying tutorial. The tutorials only help illustrate the technique of programming web scrapers for popular internet websites. We are not obligated to provide any support for the code; however, if you add your questions in the comments section, we may periodically address them.

Responses

Alex Clark October 6, 2017

Excellent tutorial! Is it possible to have the code search individual properties? For example if I had a list of 600 addresses in excel, could we perform the scrape in this article on each of these properties? Can it be modified to use the specific address as a dynamic user input into the website and then automatically return each property in order?

Thank you in advance for your time.


    ScrapeHero October 6, 2017

    Hi Alex,
    Thanks for your feedback.
    The tutorials provide a starting point for your own specific use cases.
    Yes, you should be able to do what you need by using Excel macros and building a scraping API endpoint.
    Thanks


Howard October 21, 2017

Great script! And very easy to use. However, it only reaches the first page of the Zillow results — in other words, 25 houses. Is there a way to get it to pull all houses available in the zipcode searched?


    ScrapeHero October 21, 2017

    Hi Howard – that is an exercise for the reader – a key point of the tutorial. Have fun coding!


      David July 20, 2018

      Hey Howard, you should look into selenium. That will enable you to create a script that will allow you to press the button that will go to the next page and allow you to scrape more data.


zj November 2, 2017

Nice article. It seems Zillow will only show 500 results (25 per page × 20 pages), so to fetch all the data in some very hot zip codes we might need to add filters to the search. I think price range (0-10k, 10k-20k) might be a useful attribute to narrow down the result size. The problem with price range is that it depends hugely on the area: if an area only has houses priced from 3m to 4m, we will make a bunch of unnecessary requests, which increases the probability of getting blocked. What do you think is the best attribute to keep results under 500?


Dev0 January 3, 2018

Ran the script, but it gets detected and Zillow is passing back a captcha now. “Please verify you’re a human to continue”


Rob – Land Buyers January 27, 2018

Nice scraping logic. A few tweaks and you can use the code on other real estate sites. Thanks!


Douglas Hitchcock March 13, 2018

When scraping a Zillow page, does the output CSV capture the property feature codes (e.g. a RESO-compliant set of listing features and description)?


pilistanpilistan April 17, 2018

I just don’t get it. Why is my CSV empty except for the field names?


Cheng Peng April 25, 2018

I’m also getting empty csv


kurt hansen May 9, 2018

hi, same here, can’t get any csv output beyond the defined headers!

what to do?

thanks,
kurt


    kurt hansen May 9, 2018

    also, running on win 7, python 3.6.4


      scrapehero August 26, 2018

      You are most probably getting blocked by Zillow. Can you please check whether the zip code you searched for has results on Zillow?
      We just tested the script again and it worked for 20005.

      python3 zillow.py 20005 newest


thefieldservicesofamerica October 7, 2018

Would anyone be able to assist me? I am experiencing an ‘invalid syntax’ error. I have the script saved in the script folder in Python. Using Windows 10, Python 3.7.0


    ScrapeHero October 12, 2018

    Unfortunately, if you get such basic errors you will need to read up on python programming to be able to use this code.


caepcomm October 17, 2018

You can just use Zillow APIs – they are free


Abhi reddy January 21, 2019

I tried this code. I am not getting any error message, but I am not getting any output either.
When I run it in cmd, it only shows the path:

C:\Users\Dell\Desktop\Python1>zillow.py 20005 newest

C:\Users\Dell\Desktop\Python1>

Running it again:

C:\Users\Dell\Desktop\Python1>zillow.py 20005 newest

C:\Users\Dell\Desktop\Python1>


    rijesh January 21, 2019

    Can you please run the script using the following command: python zillow.py 20005 newest . It works for me.


Anand Mahajan February 11, 2019

And how legal is it to scrape from Zillow that sources data from other MLSs? Also to what scale can one do this? Nationwide?


David Corrales April 10, 2019

Will I be able to use this code to gather first and last names as well as phone numbers and emails?


    ScrapeHero April 11, 2019

    Sure David, you can do anything with code. It is provided for free so that you can modify it to achieve your objective.


Dustin April 19, 2019

I am receiving this error:

  File "C:\Users\smith\Zillow.py", line 2, in <module>
    from lxml import html
  File "C:\ProgramData\Anaconda3\lib\site-packages\lxml\html\__init__.py", line 54, in <module>
    from .. import etree
ImportError: DLL load failed: The specified module could not be found.

I installed the lxml. Any ideas on why this is happening?

Thanks!


    ScrapeHero April 19, 2019

    Sounds like an installation error – sorry, we can’t help with that.


Brett Doyle April 21, 2019

Nice job putting this together.

I was able to modify your code to do a few more cool things. I used regex to extract the Zillow property IDs from the URL. Then I registered for a ZWSID Zillow API key and used that in conjunction with the property ID to run API queries against properties. I am able to get things like the Zestimate and rent estimate.


    ScrapeHero April 21, 2019

    Brett,
    Glad to see you expanding and combining scraping and the API to create a hybrid solution.
    It would be great if you would publish the code (without your API key please) so the community can benefit.
    We will gladly link to it.


    Joshua Lutkemuller February 29, 2020

    Could you link the code?


Viswesh May 12, 2019

I’m running the program, but the csv file returns blank and the command line indicates that only 200 requests were made. Do you know of a solution to this issue?

Thanks,
Vish


    Jay Lopez May 14, 2019

    I’m getting the exact same


      rijesh May 15, 2019

      We have tried a few zip codes and it works fine. Could you please share the zip code you tried? We will look into this issue.


        rijesh May 16, 2019

        Looks like zillow.com is performing A/B tests on their site. We’ve updated the code accordingly.


Benyamin ma June 7, 2019

Excellent code.

I ran into an issue after running “python zillow.py 20850 newest” multiple times.

First try:

python zillow.py 20850 newest
Fetching data for 20850
https://www.zillow.com/homes/for_sale/20850/0_singlestory/days_sort
status code received: 200
Traceback (most recent call last):
  File "zillow.py", line 187, in <module>
    scraped_data = parse(zipcode, sort)
  File "zillow.py", line 118, in parse
    response = get_response(url)
  File "zillow.py", line 69, in get_response
    save_to_file(response)
  File "zillow.py", line 44, in save_to_file
    fp.write(response.text)
  File "C:\Program Files\Python37\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u0100' in position 7647: character maps to <undefined>

Second try:

Fetching data for 20850
https://www.zillow.com/homes/for_sale/20850/0_singlestory/days_sort
status code received: 200
parsing from html page
Writing data to output file

And it happens randomly!

I know it’s an encoding problem, but I don’t know how to solve it or where the problem is.


John Dolsen June 24, 2019

Is it normal to get sent to a Captcha when running this code?


    ScrapeHero July 19, 2019

    Yes John – that is likely


Matthew Hom July 20, 2019

Anyone else having this issue when running the code?

status code received: 200

Traceback (most recent call last):
  File "C:/Users/matto/PycharmProjects/Real_Estate_Scraping/zillow.py", line 185, in <module>
    scraped_data = parse(zipcode, sort)
  File "C:/Users/matto/PycharmProjects/Real_Estate_Scraping/zillow.py", line 129, in parse
    return get_data_from_json(raw_json_data)
  File "C:/Users/matto/PycharmProjects/Real_Estate_Scraping/zillow.py", line 74, in get_data_from_json
    cleaned_data = clean(raw_json_data).replace('<!--', "").replace("-->", "")
AttributeError: 'NoneType' object has no attribute 'replace'


    davidmakovoz July 30, 2019

    I just tried this script. It looks like zillow implemented a Captcha to prevent automated harvesting of their data. Here is a snippet from the response I got:

    response = get_response(url)

    ….function handleCaptcha(response)….


    Chris July 31, 2019

    Yes, I am receiving the same error message. It appears to stem from the variable “raw_json_data” being empty. Maybe a problem with the parser.xpath() call?


Erin October 2, 2019

I ended up installing Tesseract to handle captchas and reran zillow.py. Still no luck.


John March 26, 2020

Did anyone figure out how to do this?


diytechy May 9, 2020

Follow the advice of “Xiyu-1 commented on Mar 7” on the gist page “https://gist.github.com/scrapehero/5f51f344d68cf2c022eb2d23a2f1cf95”

There, Xiyu describes how the script needs to be modified to return the results and complete the creation of the CSV file.

Cheers,

-diytechy


Comments are closed.
