Web scraping is a practical way for sellers and agents to keep track of available real estate listings. Data extracted from real estate sites such as Zillow.com can help you adjust the prices of listings on your own site or build a database for your business.
In this tutorial, we will scrape Zillow.com, an online real estate database, to extract the available real estate listings. The scraper extracts the details of property listings for a given zip code.
We will be extracting the following details:
- Street Name
- Zip Code
- Facts and Features
- Real Estate Provider
The scraper works in the following steps:
- Construct the URL of the search results page on Zillow. For example, the URL for zip code 02126 (Boston) is https://www.zillow.com/homes/02126_rb/. We have to build this URL ourselves before we can scrape results from the page.
- Download the HTML of the search results page using Python Requests. This is easy once you have the URL: Requests downloads the entire HTML of the page.
- Parse the page using LXML. LXML lets you navigate the HTML tree structure using XPaths. The XPaths for the details we need are predefined in the code.
- Save the data to a CSV file.
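The steps above can be sketched roughly as follows. This is a minimal illustration, not the script from the gist: the sample HTML, the class names, the XPaths, and the function names are all hypothetical stand-ins, because Zillow's real markup is different and changes over time. Only the URL pattern comes from the example above. The sketch parses an inline sample page so it runs without network access.

```python
import csv
import io

from lxml import html

# Hypothetical markup standing in for a downloaded search results page.
# Zillow's real HTML differs, so the XPaths below are assumptions too.
SAMPLE_HTML = """
<html><body>
  <article class="listing">
    <span class="street">12 Example St</span>
    <span class="zip">02126</span>
    <span class="facts">3 bds, 2 ba, 1,500 sqft</span>
    <span class="provider">Example Realty</span>
  </article>
  <article class="listing">
    <span class="street">34 Sample Ave</span>
    <span class="zip">02126</span>
    <span class="facts">2 bds, 1 ba, 900 sqft</span>
    <span class="provider">Sample Homes</span>
  </article>
</body></html>
"""

FIELDS = ["street", "zip", "facts", "provider"]


def build_url(zipcode):
    """Step 1: build the search URL in the form shown in the example."""
    return "https://www.zillow.com/homes/{}_rb/".format(zipcode)


def fetch(url):
    """Step 2: download the page HTML (needs network access)."""
    import requests  # imported here so the offline steps run without it

    # The User-Agent value is an assumption; some sites reject the default one.
    response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=30)
    response.raise_for_status()
    return response.text


def parse_listings(page_html):
    """Step 3: pull one dict per listing out of the HTML using XPaths."""
    tree = html.fromstring(page_html)
    rows = []
    for node in tree.xpath('//article[@class="listing"]'):
        row = {}
        for field in FIELDS:
            texts = node.xpath('.//span[@class="{}"]/text()'.format(field))
            row[field] = texts[0].strip() if texts else None
        rows.append(row)
    return rows


def write_csv(rows, handle):
    """Step 4: write the rows as CSV to any open text file handle."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


# Offline demonstration: parse the sample page and write CSV to memory.
listings = parse_listings(SAMPLE_HTML)
buffer = io.StringIO()
write_csv(listings, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To run this against a live page, you would pass fetch(build_url("02126")) to parse_listings and replace the XPaths with ones that match the real page. To write to disk, open a file with newline="" and pass it to write_csv in place of the in-memory buffer.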
Install Python 3 and Pip
Here is a guide to installing Python 3 on Linux – http://docs.python-guide.org/en/latest/starting/install3/linux/
Mac users can follow this guide – http://docs.python-guide.org/en/latest/starting/install3/osx/
Windows users can go here – https://www.scrapehero.com/how-to-install-python3-in-windows-10/
For this web scraping tutorial using Python 3, we will need some packages for downloading and parsing the HTML. Below are the package requirements:
- PIP, to install the following packages in Python (https://pip.pypa.io/en/stable/installing/).
- Python Requests, to make requests and download the HTML content of the pages (http://docs.python-requests.org/en/master/user/install/).
- Python LXML, for parsing the HTML tree structure using XPaths (learn how to install it here – http://lxml.de/installation.html).
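Once the packages are installed (typically with pip install requests lxml), a quick check like the one below can confirm that Python can find them. This is only a convenience sketch; the helper name missing_packages is made up for illustration and is not part of the scraper.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that Python cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(["requests", "lxml"])
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```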
You can download the code from https://gist.github.com/scrapehero/5f51f344d68cf2c022eb2d23a2f1cf95
If you would like the code in Python 2.7, you can check out the link at https://gist.github.com/scrapehero/2dd61d0f1bd5222a4c9ae76465990cbd
Running the Scraper
Assume the script is named zillow.py. Running it in a command prompt or terminal with the -h flag prints the following usage message:
usage: zillow.py [-h] zipcode sort
available sort orders are :
newest : Latest property details
cheapest : Properties with cheapest price
-h, --help show this help message and exit
For example, to get the newest listings for zip code 02126, run:
python3 zillow.py 02126 newest
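A command-line interface like the one in the usage message can be built with Python's argparse module. The sketch below is one way to do it and is not necessarily how the gist does it: the build_parser function, the zipcode help text, and the restriction of sort to the two listed values are assumptions. Note that the zip code is kept as a string so that a leading zero, as in 02126, is not lost.

```python
import argparse

# Help text taken from the usage message; the layout is an assumption.
SORT_HELP = (
    "available sort orders are :\n"
    "newest : Latest property details\n"
    "cheapest : Properties with cheapest price"
)


def build_parser():
    """Build a parser that accepts a zip code and a sort order."""
    parser = argparse.ArgumentParser(
        prog="zillow.py",
        formatter_class=argparse.RawTextHelpFormatter,  # keep line breaks in help
    )
    # Zip codes stay strings: int("02126") would drop the leading zero.
    parser.add_argument("zipcode", help="zip code to search (hypothetical help text)")
    parser.add_argument(
        "sort",
        metavar="sort",
        choices=["newest", "cheapest"],  # assumption: reject any other value
        help=SORT_HELP,
    )
    return parser


# Parse an explicit argument list, equivalent to: python3 zillow.py 02126 newest
args = build_parser().parse_args(["02126", "newest"])
print(args.zipcode, args.sort)
```

In a real script you would call parse_args() with no arguments so that it reads the values typed on the command line.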
This script should be able to scrape real estate listings for most zip codes. If you would like to scrape the details of thousands of pages, you should read "Scalable do-it-yourself scraping – How to build and run scrapers on a large scale" and "How to prevent getting blacklisted while scraping".