Google Search Scraping Step-By-Step: A Code and No-Code Guide


Google no longer allows you to access their search results without enabling JavaScript. That means you cannot simply use HTTP request-based methods for Google search scraping.

However, there are a couple of solutions:

  • Ready-made scrapers such as the Google Search Results Scraper on ScrapeHero Cloud
  • Headless browsers such as Playwright

This tutorial discusses both of these methods. 

Don’t want to code? ScrapeHero Cloud is exactly what you need.

With ScrapeHero Cloud, you can download data in just two clicks!

Google Search Scraping: The No-Code Method

For hassle-free Google Search data extraction, a no-code solution is often the best choice. You can use ScrapeHero Cloud’s Google Search Results Scraper. It eliminates the need to maintain code, bypass anti-bot measures, and handle infrastructure.

Steps:

1. Log in to your ScrapeHero Cloud account

2. Navigate to the Google Search Results Scraper in the ScrapeHero App store

Finding the Google Search Results scraper in the ScrapeHero App store


3. Click on “Create New Project”

Creating a new Google search results scraping project


4. Enter a project name and your search queries

5. Click “Gather Data” to begin the data extraction process

Setting the Google Search Results scraper


Once complete, download your data in your preferred format (CSV, JSON, or Excel).

With ScrapeHero paid plans, you can also enjoy these features:

  • Cloud storage integration: Automatically send your scraped data to cloud services like Google Drive, S3, or Azure.
  • Scheduling: Run your scrapes daily, weekly, or monthly to monitor trends over time.
  • API integration: Access your data programmatically via an API for use in your own applications and dashboards.

Google Search Scraping: The Code-Based Method

If you want more flexibility and don’t mind handling the coding and maintenance of a scraper, this detailed tutorial covers scraping Google search results with Python.

Setting Up the Environment

Before starting to write the code, you need to install the necessary Python library. The code uses Playwright, a modern browser automation library that can handle dynamic, JavaScript-heavy websites.

Install the Playwright library using pip:

pip install playwright

Also install the Playwright browser binaries:

playwright install
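
The script in this tutorial launches the browser through Microsoft Edge (channel="msedge"). If Edge isn’t already installed on your machine, Playwright can install it for you:

playwright install msedge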

Data Scraped from Google Search Results

The provided code extracts specific data points from each result on the Google results page. Here’s what it collects and how:

  • Title: The blue, clickable headline of the search result—extracted from the h3 element with the class LC20lb.
  • URL: The full hyperlink address—retrieved from the href attribute of the a tag with the class zReHs.
  • Domain: The root website address (e.g., example.com)—parsed from the full URL.
  • Description: The snippet of text that summarizes the page content—taken from the div element with the class VwiC3b.
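
The code extracts the domain by splitting the URL string on slashes. If you want something more robust, Python’s standard urllib.parse module can pull out the host instead; here’s a minimal sketch:

from urllib.parse import urlparse

def extract_domain(url):
    # Return the host portion of an absolute URL, e.g., 'example.com'
    return urlparse(url).netloc if url else None

print(extract_domain('https://example.com/some/page'))  # prints: example.com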

Writing The Code

If you want to get to work right away, here’s the complete code to scrape a Google search results page.

from playwright.sync_api import sync_playwright

import os, json

# Create a persistent user data directory so cookies and cache
# survive across runs, making the session look more like a real user's.
user_data_dir = os.path.join(os.getcwd(), "user_data")
if not os.path.exists(user_data_dir):
    os.makedirs(user_data_dir)

with sync_playwright() as p:
    user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"

    # Launch a persistent context (Microsoft Edge via the "msedge" channel)
    # with settings that help mask automation.
    context = p.chromium.launch_persistent_context(
        headless=False,
        user_agent=user_agent,
        user_data_dir=user_data_dir,
        viewport={'width': 1920, 'height': 1080},
        java_script_enabled=True,
        locale='en-US',
        timezone_id='America/New_York',
        permissions=['geolocation'],
        # Mask automation
        bypass_csp=True,
        ignore_https_errors=True,
        channel="msedge",
        args=[
            '--disable-blink-features=AutomationControlled',
            '--disable-automation',
            '--disable-infobars',
            '--disable-dev-shm-usage',
            '--no-sandbox',
            '--disable-gpu',
            '--disable-setuid-sandbox'
        ]
    )

    page = context.new_page()

    # Extra headers to mimic a regular browser request.
    page.set_extra_http_headers({
        'Accept-Language': 'en-US,en;q=0.9',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
        'Accept-Encoding': 'gzip, deflate, br',
        'Connection': 'keep-alive'
    })

    # Navigate to Google and perform the search.
    search_term = 'how to teach my cat to use laptop'
    search_url = 'https://google.com'
    page.goto(search_url)
    page.wait_for_timeout(5000)

    search_box = page.get_by_role('combobox', name='Search')
    search_box.fill(search_term)

    page.wait_for_timeout(1000)

    search_box.press('Enter')
    page.wait_for_selector('div.N54PNb')

    # Scroll to trigger lazy-loading of more results.
    for _ in range(5):
        page.mouse.wheel(0, 1000)
        page.wait_for_timeout(1000)

    results = page.locator('div.N54PNb').all()
    print("results fetched")

    details = []

    # Extract the URL, domain, title, and description from each result.
    for result in results:
        url = result.locator('a.zReHs').get_attribute('href')
        domain = url.split('/')[2] if url else None
        title = result.locator('h3.LC20lb').inner_text()
        description = result.locator('div.VwiC3b').inner_text()

        details.append(
            {
                'url': url,
                'domain': domain,
                'title': title,
                'description': description
            }
        )

# Write the scraped data to a JSON file.
with open('search_results.json', 'w', encoding='utf-8') as f:
    json.dump(details, f, ensure_ascii=False, indent=4)
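
To try it out, save the script (for example, as google_search.py) and run it with python google_search.py. Each record in search_results.json will look roughly like this (the values are illustrative, not real output):

{
    "url": "https://example.com/cat-laptop-training",
    "domain": "example.com",
    "title": "An Example Result Title",
    "description": "An example snippet summarizing the page content..."
}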

Let’s break down this code step-by-step to understand how Google search scraping is implemented.

First, the script imports the necessary libraries:

  • playwright for browser automation
  • os for handling file paths
  • json for saving the data

from playwright.sync_api import sync_playwright
import os, json

To make the browser instance appear more like a real user and potentially avoid blocks, the script creates a persistent user data directory. This allows the browser to save cookies and cache, creating a consistent session across runs.

user_data_dir = os.path.join(os.getcwd(), "user_data")
if not os.path.exists(user_data_dir):
    os.makedirs(user_data_dir)

The sync_playwright context manager launches a Chromium-based browser (Microsoft Edge, via channel="msedge") with a configuration that helps evade detection.

For instance, it runs in headed mode (headless=False), sets a realistic user agent, and uses the persistent context. Arguments like --disable-blink-features=AutomationControlled help mask automation indicators.

with sync_playwright() as p:
    user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    context = p.chromium.launch_persistent_context(
        headless=False,
        user_agent=user_agent,
        user_data_dir=user_data_dir,
        viewport={'width': 1920, 'height': 1080},
        java_script_enabled=True,
        locale='en-US',
        timezone_id='America/New_York',
        permissions=['geolocation'],
        # Mask automation
        bypass_csp=True,
        ignore_https_errors=True,
        channel="msedge",
        args=[
            '--disable-blink-features=AutomationControlled',
            '--disable-automation',
            '--disable-infobars',
            '--disable-dev-shm-usage',
            '--no-sandbox',
            '--disable-gpu',
            '--disable-setuid-sandbox'
        ]
    )

Next, the code creates a new page (tab) within the browser context and sets extra HTTP headers to further mimic a browser request from an English-speaking user.

    page = context.new_page()
    page.set_extra_http_headers({
        'Accept-Language': 'en-US,en;q=0.9',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
        'Accept-Encoding': 'gzip, deflate, br',
        'Connection': 'keep-alive'
    })
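
If you want to verify that the masking works, one quick check (not part of the original script) is to evaluate navigator.webdriver on the open page; with --disable-blink-features=AutomationControlled, it should no longer report true:

    # Optional sanity check: should print False (or None) when masking is effective
    print(page.evaluate('navigator.webdriver'))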

The script now defines the search query and navigates to Google’s homepage. It uses Playwright’s get_by_role() to find the search box by its ARIA role and accessible name, fill() to enter the search term, and press() to submit with the Enter key.

    search_term = 'how to teach my cat to use laptop'
    search_url = 'https://google.com'
    page.goto(search_url)
    page.wait_for_timeout(5000)
    search_box = page.get_by_role('combobox', name='Search')
    search_box.fill(search_term)
    page.wait_for_timeout(1000)
    search_box.press('Enter')

After initiating the search, the code waits for the results container (div.N54PNb) to appear. Google lazy-loads results, so the script simulates five mouse-wheel scrolls to load more of them.

    page.wait_for_selector('div.N54PNb')
    for _ in range(5):
        page.mouse.wheel(0, 1000)
        page.wait_for_timeout(1000)
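
Five scrolls is a simple heuristic. If you would rather scroll until no new results load, one possible variation (an assumption, not part of the original script) is to compare result counts between scrolls:

# Sketch: keep scrolling until the result count stops growing
previous_count = 0
while True:
    page.mouse.wheel(0, 1000)
    page.wait_for_timeout(1000)
    current_count = page.locator('div.N54PNb').count()
    if current_count == previous_count:
        break  # no new results appeared after this scroll
    previous_count = current_count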

With the page fully scrolled, the script locates all individual search result elements. It then loops through each result and extracts these data points using the CSS selectors discussed earlier:

  • Title
  • URL
  • Domain
  • Description 

And then it appends this data to a list.

    results = page.locator('div.N54PNb').all()
    print("results fetched")
    details = []
    for result in results:
        url = result.locator('a.zReHs').get_attribute('href')
        domain = url.split('/')[2] if url else None
        title = result.locator('h3.LC20lb').inner_text()
        description = result.locator('div.VwiC3b').inner_text()
        details.append(
            {
                'url': url,
                'domain': domain,
                'title': title,
                'description': description
            }
        )

Finally, the script writes the scraped data, stored in the details list, to a JSON file named search_results.json.

with open('search_results.json', 'w', encoding='utf-8') as f:
    json.dump(details, f, ensure_ascii=False, indent=4)

Code Limitations

While this code is a functional example, consider these limitations for production use:

  1. Google frequently changes the CSS classes of its HTML elements (like N54PNb, LC20lb). A small change by Google will break the scraper.
  2. Despite the stealth measures, Google’s advanced anti-bot systems may still detect and block the automated browser, leading to CAPTCHAs or IP bans as the script doesn’t rotate IP addresses.
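
You can soften the first limitation a little by guarding each extraction, so a single missing element doesn’t crash the entire run. A minimal defensive sketch using the same selectors:

def safe_text(result, selector):
    # Return the element's text, or None if the selector matches nothing
    locator = result.locator(selector)
    return locator.inner_text() if locator.count() > 0 else None

for result in results:
    title = safe_text(result, 'h3.LC20lb')
    description = safe_text(result, 'div.VwiC3b')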

Wrapping Up: Why Use a Web Scraping Service

Building a custom scraper with Python and Playwright gives you a great learning experience and fine-grained control. 

However, if your business relies on accurate, consistent, and large-scale Google Search data extraction, the maintenance overhead and risk of blocks are significant. 

A professional web scraping service like ScrapeHero solves these problems by providing enterprise-grade data. We can handle proxy rotation, CAPTCHAs, and changes in Google’s layout, allowing you to focus on analyzing the data, not collecting it.

Connect with ScrapeHero to make data collection hassle free.

FAQs

Is it legal to scrape Google results?

This is a complex area and not legal advice. Generally, scraping publicly accessible data for fair use (e.g., analysis, research) may be permissible. However, always consult with a legal professional.

Why is my script not finding any results (empty list)?

The most common reason is that Google has updated its HTML, and the CSS selectors in the code (like div.N54PNb or h3.LC20lb) no longer target the correct elements. You then need to manually inspect the new Google search results page and update the selectors accordingly.

How can I make this scraper more robust and avoid being blocked?

To improve robustness, you should:

1. Implement random delays between actions
2. Rotate user agents and use a pool of residential proxies
3. Regularly monitor and update the CSS selectors
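
As a rough illustration of the first two points, here is a sketch (the user agent strings and timings are placeholders you should adapt):

import random

# Placeholder pool; maintain your own up-to-date user agent strings
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
]

user_agent = random.choice(USER_AGENTS)  # rotate the user agent per run

# Replace fixed waits in the script with randomized ones, e.g.:
page.wait_for_timeout(random.uniform(1000, 3000))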
