Scraping Zillow is challenging because of aggressive anti-bot measures such as IP bans and CAPTCHAs, as well as frequent layout updates. To scrape Zillow successfully, your scraper has to mimic human browsing behavior and avoid detection. It also needs to be robust enough to keep collecting data reliably without constant manual fixes.
This guide covers two methods for scraping Zillow property data: a DIY scraper built with Python and Playwright, and the no-code Zillow scraper by ScrapeHero Cloud.
Scraping Zillow Using Python and Playwright
To scrape Zillow data using Python and Playwright, you need to:
- Decide on the data you need to scrape
- Set up the environment
- Write the code
Data scraped from Zillow
The Zillow scraper collects six key data points from each Zillow property card:
- Price
- Beds
- Bath
- Sqft
- Address
- Property URL
The code locates these fields using Zillow's data-test attributes, visible text labels (e.g., 'bd', 'ba', 'sqft'), and HTML tags such as <address>.
Setting Up The Environment to Scrape Zillow
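Before running the script, make sure you are using Python 3.10 or later (the code uses type hints such as str | None) and have the necessary libraries installed.
This Zillow data scraping project requires the Playwright package. Install the package and its browsers using the following commands:
```bash
# install the playwright library
pip install playwright
# install the browsers that Playwright manages
playwright install
```
The script launches Microsoft Edge (channel="msedge") rather than Playwright's bundled Chromium, and playwright install does not download Edge by default. If Edge is not already installed on your machine, run playwright install msedge, or remove the channel argument to use the bundled Chromium instead.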
Writing the Code to Scrape Zillow

Start with the imports.
```python
import asyncio
import json
import logging
from pathlib import Path
from typing import Any

from playwright.async_api import Playwright, TimeoutError as PlaywrightTimeoutError
from playwright.async_api import async_playwright
```
These imports provide the async event loop, JSON writing, log messages, file path handling, and Playwright’s async browser API. PlaywrightTimeoutError is used later when a pop-up or listing card does not appear in time.
Next, set the main configuration values.
```python
ZIPCODE = "90006"
OUTPUT_FILE = Path(__file__).with_name("zillow_data_1.json")
USER_DATA_DIR = Path(__file__).with_name("user_data")
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/120.0.0.0 Safari/537.36"
)
```
ZIPCODE controls the Zillow search. OUTPUT_FILE saves the JSON beside the Python file, and USER_DATA_DIR stores the persistent browser profile. The user agent makes the browser identify itself as a regular Chrome browser on Windows.
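To see where these two paths end up, here is a small check that runs without the scraper. It uses PurePosixPath and a hypothetical script location (/home/user/scraper/zillow_scraper.py) so the result is the same on any operating system; the real script uses Path(__file__).

```python
from pathlib import PurePosixPath

# Hypothetical location of the scraper script, for illustration only.
script = PurePosixPath("/home/user/scraper/zillow_scraper.py")

# with_name() replaces only the last path component, so both paths
# stay in the same directory as the script.
output_file = script.with_name("zillow_data_1.json")
user_data_dir = script.with_name("user_data")

print(output_file)    # /home/user/scraper/zillow_data_1.json
print(user_data_dir)  # /home/user/scraper/user_data
```

Because Python 3.9 and later set __file__ to an absolute path for the script you run, the output file and profile folder land next to the script regardless of your current working directory.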
With the configuration set, the next step is to create a utility for saving the scraped data to a JSON file.
```python
def save_as_json(data: list[dict[str, Any]]) -> None:
    """Save scraped listing data to a JSON file."""
    with OUTPUT_FILE.open("w", encoding="utf-8") as file:
        json.dump(data, file, indent=4, ensure_ascii=False)
```
This function receives the scraped listings and writes them to zillow_data_1.json. indent=4 makes the output readable, and ensure_ascii=False writes non-ASCII characters (such as accented letters) as-is instead of escaping them as \uXXXX sequences.
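The effect of ensure_ascii is easy to see with json.dumps(), which accepts the same flag as json.dump(). The listing below is made-up sample data, not scraped output.

```python
import json

# Hypothetical listing with a non-ASCII character in the address.
listing = {"price": "$850,000", "addr": "12 Example Café St"}

escaped = json.dumps(listing, ensure_ascii=True)    # default behavior
readable = json.dumps(listing, ensure_ascii=False)  # what save_as_json() uses

print(escaped)   # {"price": "$850,000", "addr": "12 Example Caf\u00e9 St"}
print(readable)  # {"price": "$850,000", "addr": "12 Example Café St"}
```

Both strings decode to the same data; the flag only changes how the file looks to a human reader. Because the output contains raw non-ASCII characters, save_as_json() also opens the file with encoding="utf-8".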
Use a helper for optional text fields.
```python
async def text_or_none(element, selector: str) -> str | None:
    locator = element.locator(selector).first
    if await locator.count():
        text = await locator.inner_text()
        return text.strip() or None
    return None
```
This prevents the scraper from crashing when a listing is missing a field. It checks whether the selector exists, reads the first match, strips extra whitespace, and returns None when nothing is found.
Next, use another helper function for optional attributes.
```python
async def attr_or_none(element, selector: str, attribute: str) -> str | None:
    locator = element.locator(selector).first
    if await locator.count():
        value = await locator.get_attribute(attribute)
        return value.strip() if value else None
    return None
```
This is used for values stored inside HTML attributes instead of visible text. In this script, it reads the href attribute from a listing link.
With helpers ready, extract the visible listing details.
```python
async def extract_data(listing) -> dict[str, str | None]:
    """Extract data from one Zillow listing card."""
    price = await text_or_none(listing, "[data-test='property-card-price']")
    beds = await text_or_none(listing, "ul li:has-text('bd'), ul li:has-text('bds')")
    bath = await text_or_none(listing, "ul li:has-text('ba')")
    sqft = await text_or_none(listing, "ul li:has-text('sqft')")
    addr = await text_or_none(listing, "address")
    link = await attr_or_none(listing, "a[href*='/homedetails/']", "href")
```
Each line targets one part of a Zillow property card. Price uses Zillow’s data-test attribute, beds/baths/square feet are found by their visible labels, the address comes from the address tag, and the listing link comes from an anchor pointing to /homedetails/.
Then, normalize the listing URL and return the final record.
```python
if link and link.startswith("/"):
    link = f"https://www.zillow.com{link}"
return {
    "price": price,
    "beds": beds,
    "bath": bath,
    "sqft": sqft,
    "addr": addr,
    "url": link,
}
```
If Zillow returns a relative URL, such as /homedetails/…, this logic converts it to an absolute URL.
The function then returns a dictionary containing the property’s scraped data.
Next, handle the optional pop-up separately.
```python
async def close_optional_popup(page) -> None:
    popup = page.get_by_role("button", name="Skip this question")
    try:
        await popup.click(timeout=5_000)
    except PlaywrightTimeoutError:
        return
```
If Zillow shows the “Skip this question” button, the scraper clicks it. If the pop-up never appears within five seconds, the timeout is ignored, and scraping continues.
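The ignore-the-timeout pattern is not specific to Playwright. The sketch below reproduces it with plain asyncio: asyncio.wait_for() plays the role of the click timeout, and asyncio.TimeoutError stands in for PlaywrightTimeoutError. try_optional and the two fake pop-up coroutines are hypothetical names used only for illustration.

```python
import asyncio


async def try_optional(step, timeout: float) -> bool:
    """Run an optional step; return False instead of raising if it times out."""
    try:
        await asyncio.wait_for(step(), timeout=timeout)
        return True
    except asyncio.TimeoutError:
        # Same idea as catching PlaywrightTimeoutError in close_optional_popup().
        return False


async def popup_appears() -> None:
    await asyncio.sleep(0.01)  # the "button" shows up quickly


async def popup_never_appears() -> None:
    await asyncio.sleep(10)  # the "button" never shows up in time


async def demo() -> tuple[bool, bool]:
    return (
        await try_optional(popup_appears, timeout=0.5),
        await try_optional(popup_never_appears, timeout=0.05),
    )


result = asyncio.run(demo())
print(result)  # (True, False)
```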
With the utility functions established, the next step is to define the run() function, which orchestrates the scraping workflow.
Inside the function, initialize a persistent user data directory.
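```python
async def run(playwright: Playwright) -> None:
    """Open Zillow, search by ZIP code, and save listing data."""
    USER_DATA_DIR.mkdir(exist_ok=True)
```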
This creates the user_data folder if it does not already exist. Because the browser is launched with a persistent context, cookies and session data can be reused across runs.
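With the directory initialized, launch Microsoft Edge through Playwright's Chromium API with a persistent context; channel="msedge" selects the installed Edge browser instead of the bundled Chromium.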
```python
context = await playwright.chromium.launch_persistent_context(
    user_data_dir=str(USER_DATA_DIR),
    headless=False,
    user_agent=USER_AGENT,
    viewport={"width": 1920, "height": 1080},
    java_script_enabled=True,
    locale="en-US",
    timezone_id="America/New_York",
    permissions=["geolocation"],
    bypass_csp=True,
    ignore_https_errors=True,
    channel="msedge",
    args=[
        "--disable-blink-features=AutomationControlled",
        "--disable-automation",
        "--disable-infobars",
        "--disable-dev-shm-usage",
        "--no-sandbox",
        "--disable-gpu",
        "--disable-setuid-sandbox",
    ],
)
```
This opens a visible Edge browser using the user_data profile. The viewport, locale, timezone, and launch arguments make the session closer to a normal desktop browsing session.
Next, open a new tab and set request headers.
```python
page = await context.new_page()
await page.set_extra_http_headers(
    {
        "Accept-Language": "en-US,en;q=0.9",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
        "Accept-Encoding": "gzip, deflate, br",
        "Connection": "keep-alive",
    }
)
```
The new page is where Zillow is loaded. The headers tell the site that the browser prefers US English and accepts normal HTML/image responses.
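Next, create an empty data list for the scraped records and open a try block; its finally clause, covered at the end of this walkthrough, closes the browser. Inside the try block, navigate to Zillow using the goto() method.
```python
data: list[dict[str, str | None]] = []
try:
    await page.goto("https://www.zillow.com/", wait_until="load", timeout=60_000)
```
This opens Zillow and waits for the full page load event. The timeout allows up to 60 seconds before Playwright raises an error.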
Next, search the ZIP code.
```python
search_box = page.get_by_placeholder("Enter an address, neighborhood, city, or ZIP code")
await search_box.fill(ZIPCODE)
await search_box.press("Enter")
```
The search input is located by its placeholder text, filled with the ZIPCODE value (90006), and submitted with the Enter key.
Clear the pop-up only when it appears.
```python
await close_optional_popup(page)
```
This code calls the pop-up helper from earlier. It keeps the main scraping flow clean and avoids stopping the script when Zillow does not show a pop-up.
Wait for property cards, with a no-results fallback.
```python
no_results = page.get_by_text("No matching results")
try:
    await page.wait_for_selector("[data-test='property-card']", timeout=60_000)
except PlaywrightTimeoutError:
    if await no_results.count():
        logging.warning("No results for zipcode: %s", ZIPCODE)
        return
    raise
```
The scraper waits until at least one property card appears. If that does not happen, it checks whether Zillow displayed “No matching results”; if not, the original timeout error is raised.
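The three possible outcomes (cards found, an explained empty result, an unexplained timeout) can be checked without a browser. This sketch mimics the control flow with plain asyncio; wait_for_cards and its two boolean switches are hypothetical, and asyncio.TimeoutError stands in for PlaywrightTimeoutError.

```python
import asyncio
import logging


async def wait_for_cards(cards_appear: bool, shows_no_results: bool) -> str:
    """Mimic the wait-with-fallback logic using plain asyncio timeouts."""

    async def wait_for_selector() -> None:
        # Stand-in for page.wait_for_selector(): returns at once if cards
        # render, otherwise sleeps long enough to hit the timeout.
        await asyncio.sleep(0 if cards_appear else 10)

    try:
        await asyncio.wait_for(wait_for_selector(), timeout=0.05)
    except asyncio.TimeoutError:
        if shows_no_results:
            logging.warning("No results for this search")
            return "no-results"
        raise  # a timeout with no explanation is still treated as an error
    return "cards"


print(asyncio.run(wait_for_cards(cards_appear=True, shows_no_results=False)))  # cards
print(asyncio.run(wait_for_cards(cards_appear=False, shows_no_results=True)))  # no-results
```

Re-raising in the last case is the important design choice: a blocked page or a changed selector should fail loudly rather than be mistaken for a ZIP code with no listings.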
Then, log the total result count.
```python
total_results = page.locator(".result-count").first
if await total_results.count():
    logging.warning(
        "Total results found - %s for zipcode - %s",
        (await total_results.inner_text()).strip(),
        ZIPCODE,
    )
```
The above code reads Zillow’s result count label if the page includes it. The log message is useful for confirming that the ZIP search landed on the expected results page.
You can now collect listing cards from the current page.
```python
await page.wait_for_load_state("domcontentloaded")
listings = page.locator("[data-test='property-card']")
listing_count = await listings.count()
```
The script waits for the DOM to be ready, selects every property card currently on the page, and counts them. There is no pagination here, so only the first page of results is scraped.
Loop through the cards, extract one record from each, and append it to the data list.
```python
for index in range(listing_count):
    listing = listings.nth(index)
    try:
        await listing.scroll_into_view_if_needed(timeout=10_000)
        data.append(await extract_data(listing))
    except PlaywrightTimeoutError:
        logging.warning("Timed out extracting listing %s", index)
```
Each card is scrolled into view before extraction, which helps with lazy-loaded content. If one listing times out, the script logs that index and continues with the remaining cards.
Finally, save the data list.
```python
save_as_json(data)
logging.warning("Saved %s listings to %s", len(data), OUTPUT_FILE)
```
The collected dictionaries are written to the JSON file, and the log prints how many listings were saved.
Always close the browser context.
```python
finally:
    await context.close()
```
This cleanup runs even if navigation or extraction fails. Closing the context shuts down the browser session cleanly.
The next step is to define the script’s main asynchronous entry point.
```python
async def main() -> None:
    logging.basicConfig(level=logging.WARNING, format="%(levelname)s: %(message)s")
    async with async_playwright() as playwright:
        await run(playwright)
```
This configures logging, starts Playwright, and passes the Playwright object into run().
Finally, run the script directly using asyncio.run().
```python
if __name__ == "__main__":
    asyncio.run(main())
```
This code starts the async program only when zillow_scraper.py is executed as a script. It will not run automatically if the file is imported somewhere else.
Here’s the full code:
```python
import asyncio
import json
import logging
from pathlib import Path
from typing import Any

from playwright.async_api import Playwright, TimeoutError as PlaywrightTimeoutError
from playwright.async_api import async_playwright

ZIPCODE = "90006"
OUTPUT_FILE = Path(__file__).with_name("zillow_data_1.json")
USER_DATA_DIR = Path(__file__).with_name("user_data")
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/120.0.0.0 Safari/537.36"
)


def save_as_json(data: list[dict[str, Any]]) -> None:
    """Save scraped listing data to a JSON file."""
    with OUTPUT_FILE.open("w", encoding="utf-8") as file:
        json.dump(data, file, indent=4, ensure_ascii=False)


async def text_or_none(element, selector: str) -> str | None:
    locator = element.locator(selector).first
    if await locator.count():
        text = await locator.inner_text()
        return text.strip() or None
    return None


async def attr_or_none(element, selector: str, attribute: str) -> str | None:
    locator = element.locator(selector).first
    if await locator.count():
        value = await locator.get_attribute(attribute)
        return value.strip() if value else None
    return None


async def extract_data(listing) -> dict[str, str | None]:
    """Extract data from one Zillow listing card."""
    price = await text_or_none(listing, "[data-test='property-card-price']")
    beds = await text_or_none(listing, "ul li:has-text('bd'), ul li:has-text('bds')")
    bath = await text_or_none(listing, "ul li:has-text('ba')")
    sqft = await text_or_none(listing, "ul li:has-text('sqft')")
    addr = await text_or_none(listing, "address")
    link = await attr_or_none(listing, "a[href*='/homedetails/']", "href")
    if link and link.startswith("/"):
        link = f"https://www.zillow.com{link}"
    return {
        "price": price,
        "beds": beds,
        "bath": bath,
        "sqft": sqft,
        "addr": addr,
        "url": link,
    }


async def close_optional_popup(page) -> None:
    popup = page.get_by_role("button", name="Skip this question")
    try:
        await popup.click(timeout=5_000)
    except PlaywrightTimeoutError:
        return


async def run(playwright: Playwright) -> None:
    """Open Zillow, search by ZIP code, and save listing data."""
    USER_DATA_DIR.mkdir(exist_ok=True)
    context = await playwright.chromium.launch_persistent_context(
        user_data_dir=str(USER_DATA_DIR),
        headless=False,
        user_agent=USER_AGENT,
        viewport={"width": 1920, "height": 1080},
        java_script_enabled=True,
        locale="en-US",
        timezone_id="America/New_York",
        permissions=["geolocation"],
        bypass_csp=True,
        ignore_https_errors=True,
        channel="msedge",
        args=[
            "--disable-blink-features=AutomationControlled",
            "--disable-automation",
            "--disable-infobars",
            "--disable-dev-shm-usage",
            "--no-sandbox",
            "--disable-gpu",
            "--disable-setuid-sandbox",
        ],
    )
    page = await context.new_page()
    await page.set_extra_http_headers(
        {
            "Accept-Language": "en-US,en;q=0.9",
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
            "Accept-Encoding": "gzip, deflate, br",
            "Connection": "keep-alive",
        }
    )
    data: list[dict[str, str | None]] = []
    try:
        await page.goto("https://www.zillow.com/", wait_until="load", timeout=60_000)
        search_box = page.get_by_placeholder("Enter an address, neighborhood, city, or ZIP code")
        await search_box.fill(ZIPCODE)
        await search_box.press("Enter")
        await close_optional_popup(page)
        no_results = page.get_by_text("No matching results")
        try:
            await page.wait_for_selector("[data-test='property-card']", timeout=60_000)
        except PlaywrightTimeoutError:
            if await no_results.count():
                logging.warning("No results for zipcode: %s", ZIPCODE)
                return
            raise
        total_results = page.locator(".result-count").first
        if await total_results.count():
            logging.warning(
                "Total results found - %s for zipcode - %s",
                (await total_results.inner_text()).strip(),
                ZIPCODE,
            )
        await page.wait_for_load_state("domcontentloaded")
        listings = page.locator("[data-test='property-card']")
        listing_count = await listings.count()
        for index in range(listing_count):
            listing = listings.nth(index)
            try:
                await listing.scroll_into_view_if_needed(timeout=10_000)
                data.append(await extract_data(listing))
            except PlaywrightTimeoutError:
                logging.warning("Timed out extracting listing %s", index)
        save_as_json(data)
        logging.warning("Saved %s listings to %s", len(data), OUTPUT_FILE)
    finally:
        await context.close()


async def main() -> None:
    logging.basicConfig(level=logging.WARNING, format="%(levelname)s: %(message)s")
    async with async_playwright() as playwright:
        await run(playwright)


if __name__ == "__main__":
    asyncio.run(main())
```
Code Limitations
Keep in mind a few limitations with this property listing scraper.
- Right now, the script doesn't handle pagination or infinite scrolling, so it can only grab the listings on the initial results page.
- Also, since it relies on very specific Zillow selectors, like those ‘data-test’ attributes, it could easily break if Zillow changes how its site is structured.
- It’s also worth noting that the script currently runs with a single user context, so if that context gets flagged or blocked, the whole scrape fails.
- The script doesn't implement any advanced measures for getting past anti-scraping defenses, such as CAPTCHA solving, residential proxy rotation, or request throttling.
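Of the missing measures listed above, request throttling is the simplest to sketch. The helper below is hypothetical (polite_pause is not part of the original script or of Playwright) and only adds a randomized delay between actions; it does nothing about CAPTCHAs, proxies, or flagged sessions. The default bounds of 1 to 3 seconds are an arbitrary assumption, not values tuned for Zillow.

```python
import asyncio
import random


async def polite_pause(min_seconds: float = 1.0, max_seconds: float = 3.0) -> float:
    """Sleep for a random interval between actions and return the delay used."""
    # random.uniform() picks a float between the two bounds; the jitter
    # avoids a perfectly regular rhythm between page interactions.
    delay = random.uniform(min_seconds, max_seconds)
    await asyncio.sleep(delay)
    return delay


waited = asyncio.run(polite_pause(0.01, 0.05))
print(f"waited {waited:.3f} s")
```

In the scraper, one place to call it would be inside the listing loop, for example await polite_pause() before each scroll_into_view_if_needed() call.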
Scraping Zillow Using the No-Code Scraper by ScrapeHero Cloud
ScrapeHero Cloud's Zillow data scraping tool lets you pull data quickly without writing any code, which makes it accessible to users with limited technical skills.
This section will guide you through the steps to set up and use the Zillow scraper.
1. Sign up or log in to your ScrapeHero Cloud account.
2. Go to the Zillow Scraper by ScrapeHero Cloud.
3. Click the Create New Project button.
4. To scrape the details, you need to provide the Zillow search results URL for a specific search query.
a. You can get the URL from the Zillow search results page.
5. In the field provided, enter a project name, Zillow URL, and the maximum number of records you want to gather. Then, click the Gather Data button to start the scraper.
6. The scraper will start fetching data for your queries, and you can track its progress under the Projects tab.
7. Once it is finished, you can view the data by clicking on the project name. A new page will appear, and under the Overview tab, you can see and download the data.
8. You can also pull Zillow data into a spreadsheet from here. Just click on Download Data, select Excel, and open the downloaded file using Microsoft Excel.

Wrapping Up: Why You Need a Web Scraping Service
While this DIY script to scrape Zillow provides a functional foundation for small-scale data extraction, its reliance on volatile CSS selectors and lack of advanced features like pagination and automatic proxy rotation make it difficult to maintain at scale.
As Zillow continues to update its site architecture and anti-scraping measures, manual scripts often require constant troubleshooting and updates.
For real estate teams that need Zillow data at scale and can’t afford downtime, a managed web scraping service like ScrapeHero is the faster, lower-risk path to production-ready data.