How to Scrape Zillow: Code and No-Code Methods

Scraping Zillow is challenging because of aggressive anti-bot measures such as IP bans and CAPTCHAs, along with frequent layout updates. To scrape Zillow successfully, a scraper must mimic human behavior and avoid detection. For reliable data collection, it also has to be robust enough to keep working without constant manual fixes.

This guide provides two methods for Zillow property data scraping:

  1. Using Python and Playwright
  2. Using ScrapeHero Cloud’s Zillow Scraper, a no-code solution

Scraping Zillow Using Python and Playwright

To scrape Zillow data using Python and Playwright, follow these steps:

  1. Decide on the data you need to scrape
  2. Set up the environment
  3. Write the code

Data scraped from Zillow

The Zillow scraper collects six key data points from each Zillow property card: 

  • Price
  • Beds
  • Bath
  • Sqft
  • Address  
  • Property URL  

The code locates these fields using Zillow’s internal data-test attributes, visible text labels (e.g., ‘bd’, ‘ba’, ‘sqft’), and HTML tags like <address>.

Setting Up The Environment to Scrape Zillow

Before running the script, ensure you have the necessary libraries installed. 

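This Zillow data scraping project requires the Playwright package. Install it, along with the browsers Playwright drives, using the following commands:

# install the playwright library
pip install playwright
# install the playwright browsers
playwright install
# the script launches Microsoft Edge (channel="msedge"); install it if it is not already on your machine
playwright install msedge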

Writing the Code to Scrape Zillow

Flowchart showing the steps the Zillow scraper takes

Start with the imports.

import asyncio
import json
import logging
from pathlib import Path
from typing import Any
from playwright.async_api import Playwright, TimeoutError as PlaywrightTimeoutError
from playwright.async_api import async_playwright

These imports provide the async event loop, JSON writing, log messages, file path handling, and Playwright’s async browser API. PlaywrightTimeoutError is used later when a pop-up or listing card does not appear in time.

Next, set the main configuration values.

ZIPCODE = "90006"
OUTPUT_FILE = Path(__file__).with_name("zillow_data_1.json")
USER_DATA_DIR = Path(__file__).with_name("user_data")
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/120.0.0.0 Safari/537.36"
)

ZIPCODE controls the Zillow search. OUTPUT_FILE saves the JSON beside the Python file, and USER_DATA_DIR stores the persistent browser profile. The user agent makes the browser identify itself as a regular Chrome browser on Windows.

With the configuration set, the next step is to create a utility for saving the scraped data to a JSON file.

def save_as_json(data: list[dict[str, Any]]) -> None:
    """Save scraped listing data to a JSON file."""
    with OUTPUT_FILE.open("w", encoding="utf-8") as file:
        json.dump(data, file, indent=4, ensure_ascii=False)

This function receives the scraped listings and writes them to zillow_data_1.json. indent=4 makes the output readable, and ensure_ascii=False keeps normal Unicode text instead of escaping it.
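
If you also want the listings in a spreadsheet-friendly format, a CSV writer can sit next to the JSON one. This is a minimal sketch rather than part of the original script: save_as_csv() and FIELDNAMES are hypothetical names, and the column list assumes the six keys that the scraper returns for each listing.

```python
import csv
from pathlib import Path
from typing import Any

# Assumption: these match the keys of each scraped record.
FIELDNAMES = ["price", "beds", "bath", "sqft", "addr", "url"]


def save_as_csv(data: list[dict[str, Any]], path: Path) -> None:
    """Write listing records to a CSV file, leaving missing fields empty."""
    # newline="" lets the csv module control line endings on every platform.
    with path.open("w", encoding="utf-8", newline="") as file:
        writer = csv.DictWriter(file, fieldnames=FIELDNAMES, extrasaction="ignore")
        writer.writeheader()
        for row in data:
            # None (a field the scraper could not find) becomes an empty cell.
            writer.writerow({key: row.get(key) or "" for key in FIELDNAMES})
```

Calling save_as_csv(data, OUTPUT_FILE.with_suffix(".csv")) would place the CSV beside the JSON file.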

Use a helper for optional text fields.

async def text_or_none(element, selector: str) -> str | None:
    locator = element.locator(selector).first
    if await locator.count():
        text = await locator.inner_text()
        return text.strip() or None
    return None

This prevents the scraper from crashing when a listing is missing a field. It checks whether the selector exists, reads the first match, strips extra whitespace, and returns None when nothing is found.

Next, use another helper function for optional attributes.

async def attr_or_none(element, selector: str, attribute: str) -> str | None:
    locator = element.locator(selector).first
    if await locator.count():
        value = await locator.get_attribute(attribute)
        return value.strip() if value else None
    return None

This is used for values stored inside HTML attributes instead of visible text. In this script, it reads the href attribute from a listing link.

With helpers ready, extract the visible listing details.

async def extract_data(listing) -> dict[str, str | None]:
    """Extract data from one Zillow listing card."""
    price = await text_or_none(listing, "[data-test='property-card-price']")
    beds = await text_or_none(listing, "ul li:has-text('bd'), ul li:has-text('bds')")
    bath = await text_or_none(listing, "ul li:has-text('ba')")
    sqft = await text_or_none(listing, "ul li:has-text('sqft')")
    addr = await text_or_none(listing, "address")
    link = await attr_or_none(listing, "a[href*='/homedetails/']", "href")

Each line targets one part of a Zillow property card. Price uses Zillow’s data-test attribute, beds/baths/square feet are found by their visible labels, the address comes from the address tag, and the listing link comes from an anchor pointing to /homedetails/.

Then, normalize the listing URL and return the final record.

    if link and link.startswith("/"):
        link = f"https://www.zillow.com{link}"
    return {
        "price": price,
        "beds": beds,
        "bath": bath,
        "sqft": sqft,
        "addr": addr,
        "url": link,
    }

If Zillow returns a relative URL, such as /homedetails/…, this logic converts it to an absolute URL. 

The function then returns a dictionary containing the property’s scraped data.

Next, handle the optional pop-up separately.

async def close_optional_popup(page) -> None:
    popup = page.get_by_role("button", name="Skip this question")
    try:
        await popup.click(timeout=5_000)
    except PlaywrightTimeoutError:
        return

If Zillow shows the “Skip this question” button, the scraper clicks it. If the pop-up never appears within five seconds, the timeout is ignored, and scraping continues.

With the utility functions established, the next step is to define the run() function, which orchestrates the scraping workflow. 

Inside the function, initialize a persistent user data directory.

USER_DATA_DIR.mkdir(exist_ok=True)

This creates the user_data folder if it does not already exist. Because the browser is launched with a persistent context, cookies and session data can be reused across runs.

With the directory initialized, launch Edge through Playwright with a persistent Chromium context.

context = await playwright.chromium.launch_persistent_context(
    user_data_dir=str(USER_DATA_DIR),
    headless=False,
    user_agent=USER_AGENT,
    viewport={"width": 1920, "height": 1080},
    java_script_enabled=True,
    locale="en-US",
    timezone_id="America/New_York",
    permissions=["geolocation"],
    bypass_csp=True,
    ignore_https_errors=True,
    channel="msedge",
    args=[
        "--disable-blink-features=AutomationControlled",
        "--disable-automation",
        "--disable-infobars",
        "--disable-dev-shm-usage",
        "--no-sandbox",
        "--disable-gpu",
        "--disable-setuid-sandbox",
    ],
)

This opens a visible Edge browser using the user_data profile. The viewport, locale, timezone, and launch arguments make the session closer to a normal desktop browsing session.

Next, open a new tab and set request headers.

page = await context.new_page()

await page.set_extra_http_headers(
    {
        "Accept-Language": "en-US,en;q=0.9",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
        "Accept-Encoding": "gzip, deflate, br",
        "Connection": "keep-alive",
    }
)

The new page is where Zillow is loaded. The headers tell the site that the browser prefers US English and accepts normal HTML/image responses.

Now you can navigate to Zillow using the goto() method.

await page.goto("https://www.zillow.com/", wait_until="load", timeout=60_000)

This opens Zillow and waits for the full page load event. The timeout allows up to 60 seconds before Playwright raises an error.

Next, search the ZIP code.

search_box = page.get_by_placeholder("Enter an address, neighborhood, city, or ZIP code")
await search_box.fill(ZIPCODE)
await search_box.press("Enter")

The search input is located by its placeholder text, filled with 90006, and submitted with the Enter key.

Clear the pop-up only when it appears.

await close_optional_popup(page)

This code calls the pop-up helper from earlier. It keeps the main scraping flow clean and avoids stopping the script when Zillow does not show a pop-up.

Wait for property cards, with a no-results fallback.

no_results = page.get_by_text("No matching results")
try:
    await page.wait_for_selector("[data-test='property-card']", timeout=60_000)
except PlaywrightTimeoutError:
    if await no_results.count():
        logging.warning("No results for zipcode: %s", ZIPCODE)
        return
    raise

The scraper waits until at least one property card appears. If that does not happen, it checks whether Zillow displayed “No matching results”; if not, the original timeout error is raised.

Then, log the total result count.

total_results = page.locator(".result-count").first
if await total_results.count():
    logging.warning(
        "Total results found - %s for zipcode - %s",
        (await total_results.inner_text()).strip(),
        ZIPCODE,
    )

The above code reads Zillow’s result count label if the page includes it. The log message is useful for confirming that the ZIP search landed on the expected results page.

You can now collect listing cards from the current page.

await page.wait_for_load_state("domcontentloaded")
listings = page.locator("[data-test='property-card']")
listing_count = await listings.count()

The script waits for the DOM to be ready, selects all visible property cards, and counts them. There is no pagination here, so only the current page is scraped.

Loop through the collected cards and append each extracted record to data, the list that run() creates before navigation starts (see the full code below).

for index in range(listing_count):
    listing = listings.nth(index)
    try:
        await listing.scroll_into_view_if_needed(timeout=10_000)
        data.append(await extract_data(listing))
    except PlaywrightTimeoutError:
        logging.warning("Timed out extracting listing %s", index)

Each card is scrolled into view before extraction, which helps with lazy-loaded content. If one listing times out, the script logs that index and continues with the remaining cards.

Finally, save the data list.

save_as_json(data)
logging.warning("Saved %s listings to %s", len(data), OUTPUT_FILE)

The collected dictionaries are written to the JSON file, and the log prints how many listings were saved.

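Always close the browser context. In the full code, the steps from page.goto() onward sit inside a try block, and this finally clause completes it.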

finally:
    await context.close()

This cleanup runs even if navigation or extraction fails. Closing the context shuts down the browser session cleanly.

The next step is to define the script’s main asynchronous entry point.

async def main() -> None:
    logging.basicConfig(level=logging.WARNING, format="%(levelname)s: %(message)s")
    async with async_playwright() as playwright:
        await run(playwright)

This configures logging, starts Playwright, and passes the Playwright object into run().

Finally, run the script directly using asyncio.run().

if __name__ == "__main__":
    asyncio.run(main())

This code starts the async program only when zillow_scraper.py is executed as a script. It will not run automatically if the file is imported somewhere else.

Here’s the full code:

import asyncio
import json
import logging
from pathlib import Path
from typing import Any

from playwright.async_api import Playwright, TimeoutError as PlaywrightTimeoutError
from playwright.async_api import async_playwright


ZIPCODE = "90006"
OUTPUT_FILE = Path(__file__).with_name("zillow_data_1.json")
USER_DATA_DIR = Path(__file__).with_name("user_data")
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/120.0.0.0 Safari/537.36"
)

def save_as_json(data: list[dict[str, Any]]) -> None:
    """Save scraped listing data to a JSON file."""
    with OUTPUT_FILE.open("w", encoding="utf-8") as file:
        json.dump(data, file, indent=4, ensure_ascii=False)

async def text_or_none(element, selector: str) -> str | None:
    locator = element.locator(selector).first
    if await locator.count():
        text = await locator.inner_text()
        return text.strip() or None
    return None

async def attr_or_none(element, selector: str, attribute: str) -> str | None:
    locator = element.locator(selector).first
    if await locator.count():
        value = await locator.get_attribute(attribute)
        return value.strip() if value else None
    return None

async def extract_data(listing) -> dict[str, str | None]:
    """Extract data from one Zillow listing card."""
    price = await text_or_none(listing, "[data-test='property-card-price']")
    beds = await text_or_none(listing, "ul li:has-text('bd'), ul li:has-text('bds')")
    bath = await text_or_none(listing, "ul li:has-text('ba')")
    sqft = await text_or_none(listing, "ul li:has-text('sqft')")
    addr = await text_or_none(listing, "address")
    link = await attr_or_none(listing, "a[href*='/homedetails/']", "href")

    if link and link.startswith("/"):
        link = f"https://www.zillow.com{link}"

    return {
        "price": price,
        "beds": beds,
        "bath": bath,
        "sqft": sqft,
        "addr": addr,
        "url": link,
    }

async def close_optional_popup(page) -> None:
    popup = page.get_by_role("button", name="Skip this question")
    try:
        await popup.click(timeout=5_000)
    except PlaywrightTimeoutError:
        return

async def run(playwright: Playwright) -> None:
    """Open Zillow, search by ZIP code, and save listing data."""
    USER_DATA_DIR.mkdir(exist_ok=True)

    context = await playwright.chromium.launch_persistent_context(
        user_data_dir=str(USER_DATA_DIR),
        headless=False,
        user_agent=USER_AGENT,
        viewport={"width": 1920, "height": 1080},
        java_script_enabled=True,
        locale="en-US",
        timezone_id="America/New_York",
        permissions=["geolocation"],
        bypass_csp=True,
        ignore_https_errors=True,
        channel="msedge",
        args=[
            "--disable-blink-features=AutomationControlled",
            "--disable-automation",
            "--disable-infobars",
            "--disable-dev-shm-usage",
            "--no-sandbox",
            "--disable-gpu",
            "--disable-setuid-sandbox",
        ],
    )

    page = await context.new_page()
    await page.set_extra_http_headers(
        {
            "Accept-Language": "en-US,en;q=0.9",
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
            "Accept-Encoding": "gzip, deflate, br",
            "Connection": "keep-alive",
        }
    )

    data: list[dict[str, str | None]] = []
    try:
        await page.goto("https://www.zillow.com/", wait_until="load", timeout=60_000)
        search_box = page.get_by_placeholder("Enter an address, neighborhood, city, or ZIP code")
        await search_box.fill(ZIPCODE)
        await search_box.press("Enter")
        await close_optional_popup(page)

        no_results = page.get_by_text("No matching results")
        try:
            await page.wait_for_selector("[data-test='property-card']", timeout=60_000)
        except PlaywrightTimeoutError:
            if await no_results.count():
                logging.warning("No results for zipcode: %s", ZIPCODE)
                return
            raise

        total_results = page.locator(".result-count").first
        if await total_results.count():
            logging.warning(
                "Total results found - %s for zipcode - %s",
                (await total_results.inner_text()).strip(),
                ZIPCODE,
            )

        await page.wait_for_load_state("domcontentloaded")
        listings = page.locator("[data-test='property-card']")
        listing_count = await listings.count()

        for index in range(listing_count):
            listing = listings.nth(index)
            try:
                await listing.scroll_into_view_if_needed(timeout=10_000)
                data.append(await extract_data(listing))
            except PlaywrightTimeoutError:
                logging.warning("Timed out extracting listing %s", index)

        save_as_json(data)
        logging.warning("Saved %s listings to %s", len(data), OUTPUT_FILE)
    finally:
        await context.close()

async def main() -> None:
    logging.basicConfig(level=logging.WARNING, format="%(levelname)s: %(message)s")
    async with async_playwright() as playwright:
        await run(playwright)

if __name__ == "__main__":
    asyncio.run(main())

Code Limitations

Keep in mind a few limitations with this property listing scraper. 

  • Right now, the script can only grab the listings visible on the initial results page, so it doesn’t handle pagination or infinite scrolling. 
  • Also, since it relies on very specific Zillow selectors, like those ‘data-test’ attributes, it could easily break if Zillow changes how its site is structured.
  • It’s also worth noting that the script currently runs with a single user context, so if that context gets flagged or blocked, the whole scrape fails. 
  • The script doesn’t implement any advanced techniques for getting past anti-scraping measures, such as CAPTCHA solving, residential proxy rotation, or request throttling.
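
Of the gaps listed above, request throttling is the simplest to sketch in plain Python. The example below is an illustration under stated assumptions, not a tested anti-blocking solution: polite_pause() and retry_with_backoff() are hypothetical helpers, and the delay values are arbitrary. Note that Playwright’s timeout exception is separate from Python’s built-in TimeoutError, so in the scraper you would pass retry_on=(PlaywrightTimeoutError,).

```python
import asyncio
import random
from collections.abc import Awaitable, Callable
from typing import TypeVar

T = TypeVar("T")


async def polite_pause(min_seconds: float = 2.0, max_seconds: float = 5.0) -> float:
    """Sleep for a random interval and return the delay that was used."""
    delay = random.uniform(min_seconds, max_seconds)
    await asyncio.sleep(delay)
    return delay


async def retry_with_backoff(
    action: Callable[[], Awaitable[T]],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: tuple[type[Exception], ...] = (TimeoutError,),
) -> T:
    """Run action, doubling the wait after each failed attempt."""
    for attempt in range(attempts):
        try:
            return await action()
        except retry_on:
            # Give up and re-raise once the last attempt has failed.
            if attempt == attempts - 1:
                raise
            await asyncio.sleep(base_delay * 2**attempt)
    raise ValueError("attempts must be at least 1")
```

A possible use is to call await polite_pause() between listings or page loads, and to wrap a flaky step as await retry_with_backoff(lambda: page.goto(url), retry_on=(PlaywrightTimeoutError,)), where url stands for whichever page you are loading.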

Scraping Zillow Using the No-Code Scraper by ScrapeHero Cloud

ScrapeHero Cloud’s Zillow data scraping tool allows users to pull data quickly without writing any code. It provides an easy, no-code method for scraping data, making it accessible for individuals with limited technical skills.

This section will guide you through the steps to set up and use the Zillow scraper.

1. Sign up or log in to your ScrapeHero Cloud account.

2. Go to the Zillow Scraper by ScrapeHero Cloud.

3. Click the Create New Project button.

4. To scrape the details, you need to provide the Zillow search results URL for a specific search query.

   a. You can get the URL from the Zillow search results page.

5. In the fields provided, enter a project name, the Zillow URL, and the maximum number of records you want to gather. Then, click the Gather Data button to start the scraper.

6. The scraper will start fetching data for your queries, and you can track its progress under the Projects tab.

7. Once it is finished, you can view the data by clicking on the project name. A new page will appear, and under the Overview tab, you can see and download the data.

8. You can also pull Zillow data into a spreadsheet from here. Just click on Download Data, select Excel, and open the downloaded file using Microsoft Excel.

Wrapping Up: Why You Need a Web Scraping Service

While this DIY script to scrape Zillow provides a functional foundation for small-scale data extraction, its reliance on volatile CSS selectors and lack of advanced features like pagination and automatic proxy rotation make it difficult to maintain at scale. 

As Zillow continues to update its site architecture and anti-scraping measures, manual scripts often require constant troubleshooting and updates. 

For real estate teams that need Zillow data at scale and can’t afford downtime, a managed web scraping service like ScrapeHero is the faster, lower-risk path to production-ready data. 
