Google Search Scraping Step-By-Step: A Code and No-Code Guide


Google no longer allows you to access their search results without enabling JavaScript. That means you cannot simply use HTTP request-based methods for Google search scraping.

However, there are a couple of solutions:

  • Ready-made scrapers such as the Google Search Results Scraper on ScrapeHero Cloud
  • Headless browsers such as Playwright

This tutorial discusses both of these methods. 

Don’t want to code? ScrapeHero Cloud is exactly what you need.

With ScrapeHero Cloud, you can download data in just two clicks!

Google Search Scraping: The No-Code Method

For hassle-free Google Search data extraction, a no-code solution is often the best choice. You can use ScrapeHero Cloud’s Google Search Results Scraper. It eliminates the need to maintain code, bypass anti-bot measures, and handle infrastructure.

Steps:

1. Log in to your ScrapeHero Cloud account

2. Navigate to the Google Search Results Scraper in the ScrapeHero App store

Finding the Google Search Results scraper in the ScrapeHero App store


3. Click on “Create New Project”

Creating a new Google search results scraping project


4. Enter a project name and your search queries

5. Click “Gather Data” to begin the data extraction process

Setting the Google Search Results scraper


Once complete, download your data in your preferred format (CSV, JSON, or Excel).

With ScrapeHero paid plans, you can also enjoy these features:

  • Cloud storage integration: Automatically send your scraped data to cloud services like Google Drive, S3, or Azure.
  • Scheduling: Run your scrapes daily, weekly, or monthly to monitor trends over time.
  • API integration: Access your data programmatically via an API for use in your own applications and dashboards.

Google Search Scraping: The Code-Based Method

If you want more flexibility and don’t mind handling the coding and maintenance of a scraper, this detailed tutorial covers scraping Google search results with Python.

Setting Up the Environment

Before starting to write the code, you need to install the necessary Python library. The code uses Playwright, a modern browser automation library that can handle dynamic, JavaScript-heavy websites.

Install the Playwright library using pip:

pip install playwright

Also install the Playwright browser binaries:

playwright install
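
The script in this tutorial launches the browser through Microsoft Edge (channel="msedge"). If Edge isn’t already installed on your machine, Playwright can install it for you:

playwright install msedge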

Data Scraped from Google Search Results

The provided code extracts specific data points from each result on the Google results page. Here’s what it collects and how:

  • Title: The blue, clickable headline of the search result—extracted from the h3 element with the class LC20lb.
  • URL: The full hyperlink address—retrieved from the href attribute of the a tag with the class zReHs.
  • Domain: The root website address (e.g., example.com)—parsed from the full URL.
  • Description: The snippet of text that summarizes the page content—taken from the div element with the class VwiC3b.
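
The code extracts the domain by splitting the URL string on slashes. If you want something more robust, Python’s standard urllib.parse module can pull out the host instead; here’s a minimal sketch:

from urllib.parse import urlparse

def extract_domain(url):
    # Return the host portion of an absolute URL, e.g., 'example.com'
    return urlparse(url).netloc if url else None

print(extract_domain('https://example.com/some/page'))  # prints: example.com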

Writing The Code

If you want to get to work right away, here’s the complete code to scrape a Google search results page.

from playwright.sync_api import sync_playwright

import os, json

# Create a persistent user data directory so cookies and cache
# survive across runs, making the session look more like a real user's.
user_data_dir = os.path.join(os.getcwd(), "user_data")
if not os.path.exists(user_data_dir):
    os.makedirs(user_data_dir)

with sync_playwright() as p:
    user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"

    # Launch a persistent context (Microsoft Edge via the "msedge" channel)
    # with settings that help mask automation.
    context = p.chromium.launch_persistent_context(
        headless=False,
        user_agent=user_agent,
        user_data_dir=user_data_dir,
        viewport={'width': 1920, 'height': 1080},
        java_script_enabled=True,
        locale='en-US',
        timezone_id='America/New_York',
        permissions=['geolocation'],
        # Mask automation
        bypass_csp=True,
        ignore_https_errors=True,
        channel="msedge",
        args=[
            '--disable-blink-features=AutomationControlled',
            '--disable-automation',
            '--disable-infobars',
            '--disable-dev-shm-usage',
            '--no-sandbox',
            '--disable-gpu',
            '--disable-setuid-sandbox'
        ]
    )

    page = context.new_page()

    # Extra headers to mimic a regular browser request.
    page.set_extra_http_headers({
        'Accept-Language': 'en-US,en;q=0.9',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
        'Accept-Encoding': 'gzip, deflate, br',
        'Connection': 'keep-alive'
    })

    # Navigate to Google and perform the search.
    search_term = 'how to teach my cat to use laptop'
    search_url = 'https://google.com'
    page.goto(search_url)
    page.wait_for_timeout(5000)

    search_box = page.get_by_role('combobox', name='Search')
    search_box.fill(search_term)

    page.wait_for_timeout(1000)

    search_box.press('Enter')
    page.wait_for_selector('div.N54PNb')

    # Scroll to trigger lazy-loading of more results.
    for _ in range(5):
        page.mouse.wheel(0, 1000)
        page.wait_for_timeout(1000)

    results = page.locator('div.N54PNb').all()
    print("results fetched")

    details = []

    # Extract the URL, domain, title, and description from each result.
    for result in results:
        url = result.locator('a.zReHs').get_attribute('href')
        domain = url.split('/')[2] if url else None
        title = result.locator('h3.LC20lb').inner_text()
        description = result.locator('div.VwiC3b').inner_text()

        details.append(
            {
                'url': url,
                'domain': domain,
                'title': title,
                'description': description
            }
        )

# Write the scraped data to a JSON file.
with open('search_results.json', 'w', encoding='utf-8') as f:
    json.dump(details, f, ensure_ascii=False, indent=4)
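
To try it out, save the script (for example, as google_search.py) and run it with python google_search.py. Each record in search_results.json will look roughly like this (the values are illustrative, not real output):

{
    "url": "https://example.com/cat-laptop-training",
    "domain": "example.com",
    "title": "An Example Result Title",
    "description": "An example snippet summarizing the page content..."
}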

Let’s break down this code step-by-step to understand how Google search scraping is implemented.

First, the script imports the necessary libraries:

  • playwright for browser automation
  • os for handling file paths
  • json for saving the data

from playwright.sync_api import sync_playwright
import os, json

To make the browser instance appear more like a real user and potentially avoid blocks, the script creates a persistent user data directory. This allows the browser to save cookies and cache, creating a consistent session across runs.

user_data_dir = os.path.join(os.getcwd(), "user_data")
if not os.path.exists(user_data_dir):
    os.makedirs(user_data_dir)

The sync_playwright context manager launches a Chromium-based browser (Microsoft Edge, via channel="msedge") with a configuration that helps evade detection.

For instance, it runs in headed mode (headless=False), sets a realistic user agent, and uses the persistent context. Arguments like --disable-blink-features=AutomationControlled help mask automation indicators.

with sync_playwright() as p:
    user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    context = p.chromium.launch_persistent_context(
        headless=False,
        user_agent=user_agent,
        user_data_dir=user_data_dir,
        viewport={'width': 1920, 'height': 1080},
        java_script_enabled=True,
        locale='en-US',
        timezone_id='America/New_York',
        permissions=['geolocation'],
        # Mask automation
        bypass_csp=True,
        ignore_https_errors=True,
        channel="msedge",
        args=[
            '--disable-blink-features=AutomationControlled',
            '--disable-automation',
            '--disable-infobars',
            '--disable-dev-shm-usage',
            '--no-sandbox',
            '--disable-gpu',
            '--disable-setuid-sandbox'
        ]
    )

Next, the code creates a new page (tab) within the browser context and sets extra HTTP headers to further mimic a browser request from an English-speaking user.

    page = context.new_page()
    page.set_extra_http_headers({
        'Accept-Language': 'en-US,en;q=0.9',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
        'Accept-Encoding': 'gzip, deflate, br',
        'Connection': 'keep-alive'
    })
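
If you want to verify that the masking works, one quick check (not part of the original script) is to evaluate navigator.webdriver on the open page; with --disable-blink-features=AutomationControlled, it should no longer report true:

    # Optional sanity check: should print False (or None) when masking is effective
    print(page.evaluate('navigator.webdriver'))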

The script now defines the search query and navigates to Google’s homepage. It uses Playwright’s get_by_role() to find the search box by its ARIA role and accessible name, fill() to enter the search term, and press() to submit with the Enter key.

    search_term = 'how to teach my cat to use laptop'
    search_url = 'https://google.com'
    page.goto(search_url)
    page.wait_for_timeout(5000)
    search_box = page.get_by_role('combobox', name='Search')
    search_box.fill(search_term)
    page.wait_for_timeout(1000)
    search_box.press('Enter')

After initiating the search, the code waits for the results container (div.N54PNb) to appear. Google lazy-loads results, so the script simulates five mouse-wheel scrolls to load more of them.

    page.wait_for_selector('div.N54PNb')
    for _ in range(5):
        page.mouse.wheel(0, 1000)
        page.wait_for_timeout(1000)
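
Five scrolls is a simple heuristic. If you would rather scroll until no new results load, one possible variation (an assumption, not part of the original script) is to compare result counts between scrolls:

# Sketch: keep scrolling until the result count stops growing
previous_count = 0
while True:
    page.mouse.wheel(0, 1000)
    page.wait_for_timeout(1000)
    current_count = page.locator('div.N54PNb').count()
    if current_count == previous_count:
        break  # no new results appeared after this scroll
    previous_count = current_count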

With the page fully scrolled, the script locates all individual search result elements. It then loops through each result and extracts these data points using the CSS selectors discussed earlier:

  • Title
  • URL
  • Domain
  • Description 

And then it appends this data to a list.

    results = page.locator('div.N54PNb').all()
    print("results fetched")
    details = []
    for result in results:
        url = result.locator('a.zReHs').get_attribute('href')
        domain = url.split('/')[2] if url else None
        title = result.locator('h3.LC20lb').inner_text()
        description = result.locator('div.VwiC3b').inner_text()
        details.append(
            {
                'url': url,
                'domain': domain,
                'title': title,
                'description': description
            }
        )

Finally, the script writes the scraped data, stored in the details list, to a JSON file named search_results.json.

with open('search_results.json', 'w', encoding='utf-8') as f:
    json.dump(details, f, ensure_ascii=False, indent=4)

Code Limitations

While this code is a functional example, consider these limitations for production use:

  1. Google frequently changes the CSS classes of its HTML elements (like N54PNb, LC20lb). A small change by Google will break the scraper.
  2. Despite the stealth measures, Google’s advanced anti-bot systems may still detect and block the automated browser, leading to CAPTCHAs or IP bans as the script doesn’t rotate IP addresses.
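
You can soften the first limitation a little by guarding each extraction, so a single missing element doesn’t crash the entire run. A minimal defensive sketch using the same selectors:

def safe_text(result, selector):
    # Return the element's text, or None if the selector matches nothing
    locator = result.locator(selector)
    return locator.inner_text() if locator.count() > 0 else None

for result in results:
    title = safe_text(result, 'h3.LC20lb')
    description = safe_text(result, 'div.VwiC3b')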

Wrapping Up: Why Use a Web Scraping Service

Building a custom scraper with Python and Playwright gives you a great learning experience and fine-grained control. 

However, if your business relies on accurate, consistent, and large-scale Google Search data extraction, the maintenance overhead and risk of blocks are significant. 

A professional web scraping service like ScrapeHero solves these problems by providing enterprise-grade data. We can handle proxy rotation, CAPTCHAs, and changes in Google’s layout, allowing you to focus on analyzing the data, not collecting it.

Connect with ScrapeHero to make data collection hassle free.

FAQs

Is it legal to scrape Google results?

This is a complex area and not legal advice. Generally, scraping publicly accessible data for fair use (e.g., analysis, research) may be permissible. However, always consult with a legal professional.

Why is my script not finding any results (empty list)?

The most common reason is that Google has updated its HTML, and the CSS selectors in the code (like div.N54PNb or h3.LC20lb) no longer target the correct elements. You then need to manually inspect the new Google search results page and update the selectors accordingly.

How can I make this scraper more robust and avoid being blocked?

To improve robustness, you should:

1. Implement random delays between actions
2. Rotate user agents and use a pool of residential proxies
3. Regularly monitor and update the CSS selectors
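
As a rough illustration of the first two points, here is a sketch (the user agent strings and timings are placeholders you should adapt):

import random

# Placeholder pool; maintain your own up-to-date user agent strings
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
]

user_agent = random.choice(USER_AGENTS)  # rotate the user agent per run

# Replace fixed waits in the script with randomized ones, e.g.:
page.wait_for_timeout(random.uniform(1000, 3000))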
