Tired of Broken Scrapers? Discover Adaptive Web Scraping



Websites frequently change their HTML structure, and some structures are dynamically generated with different attributes each time—meaning you need to constantly update your selectors. Adaptive web scraping aims to reduce the frequency with which you have to do that.

For instance, an e-commerce site might change a product card’s class from “product-item” to “item-card” after a JavaScript re-render. Traditional scrapers, which rely on fixed selectors (e.g., CSS or XPath), fail when these changes occur, resulting in “No Element Found” errors. However, using adaptive web scraping techniques, you can build a scraper that accounts for these changes.
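To see why fixed selectors are brittle, here's a minimal sketch showing the same scraper before and after such a re-render; the class names and markup are hypothetical:

from bs4 import BeautifulSoup

# Hypothetical markup before and after a re-render renames the product card class
old_html = '<div class="product-item">Laptop - $999</div>'
new_html = '<div class="item-card">Laptop - $999</div>'

selector = '.product-item'  # fixed CSS selector baked into the scraper

for label, html_doc in [('before', old_html), ('after', new_html)]:
    soup = BeautifulSoup(html_doc, 'html.parser')
    element = soup.select_one(selector)
    print(label, element.get_text() if element else 'No element found')

# Output:
# before Laptop - $999
# after No element found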

This article discusses three ways you can implement adaptive web scraping:

  1. Using Flexible Selectors
  2. Using Fallback Selectors
  3. Using LLMs

Adaptive web scraping methods

Adaptive Web Scraping Using Flexible Selectors

Flexible selectors don’t match an attribute’s exact value; they match only part of it. For instance, an element holding a price may have a class like “current-price,” “buy-price,” or “price-100.” All of these contain the string “price,” so a flexible selector can target that substring.

How you use a flexible selector depends on your parser.

In lxml, when you use XPath, you can use the contains() function like this:

from lxml import html

parser = html.fromstring(html_string)  # html_string holds the page's HTML
price_elements = parser.xpath("//div[contains(@class, 'price')]")

However, you’ll need to use RegEx if you are using Python requests and BeautifulSoup.

pattern = re.compile('price')
soup.find('div', {'class': pattern})

Here’s a sample script for adaptive scraping using RegEx:

import requests
from bs4 import BeautifulSoup
import re

def scrape_adaptive(url):
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')

    class_pattern = re.compile(r'product')

    content = None
    element = soup.find('div', {'class': class_pattern})
    if element:
        content = element.get_text()

    print(content or 'No element found')

Explanation:

  • HTTP Request: requests.get(url) fetches the page’s HTML.
  • BeautifulSoup Parsing: BeautifulSoup(response.text, 'html.parser') creates a parser object.
  • Class Pattern: The RegEx pattern targets any string with the word ‘product’ in it.
  • Element Selection: Uses the find() method of BeautifulSoup to find the required element.
  • Output: The extracted text or a failure message is printed.

Need to know more about web scraping using the requests library? Read this ultimate guide to scrape websites with Python requests.

Go the hassle-free route with ScrapeHero

Why worry about expensive infrastructure, resource allocation and complex websites when ScrapeHero can scrape for you at a fraction of the cost?

Adaptive Web Scraping Using Fallback Selectors

Sometimes, attribute changes are so different that the flexible selector may fail or grab an element you don’t need. Such situations call for fallback selectors.

Consider this example: An element holding a writer’s name can have an ID like “author,” “byline,” “writer,” etc. You can’t use a single flexible selector to target all these IDs effectively. You can, however, test them one by one, which is what this method does.

The method to implement fallback selectors is the same for any parser: you iterate through a list of selectors and, in each iteration, try to extract the data points using the current selector.
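For example, here's a minimal sketch of the idea using BeautifulSoup; the selector list and id values are hypothetical, following the author/byline/writer example above:

from bs4 import BeautifulSoup

def extract_with_fallbacks(html_doc):
    soup = BeautifulSoup(html_doc, 'html.parser')

    # Hypothetical selectors, ordered from most to least preferred
    selectors = ['#author', '#byline', '#writer']

    for selector in selectors:
        element = soup.select_one(selector)
        if element:
            return element.get_text()
    return None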

Here’s how to implement fallback selectors using Playwright, which is typically used for scraping dynamic websites:

from playwright.sync_api import sync_playwright

def scrape_adaptive(url):
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url)

        selectors = {
            'primary': '.product-item',
            'fallback1': '.item-card',
            'fallback2': '[data-test="product"]'
        }

        content = None
        for key, selector in selectors.items():
            try:
                # A short timeout keeps failed selectors from stalling the loop
                content = page.locator(selector).first.text_content(timeout=3000)
                print(f'Used {key} selector')
                break
            except Exception as e:
                # Log the failure and move on to the next selector
                print(f'{key} selector failed: {e}')

        print(content or 'No element found')
        browser.close()

Explanation:

  • Browser Setup: sync_playwright() initializes Playwright, p.chromium.launch() launches a Chromium browser, new_page() creates a tab, and page.goto(url) loads the page.
  • Selector Dictionary: The selectors dictionary maps keys to CSS selectors for clarity.
  • Fallback Loop: The for loop iterates through selectors.items(), using page.locator(selector).text_content() to extract text. If a selector fails, the exception is caught and the loop moves to the next selector.
  • Output: The extracted text or a failure message is printed.
  • Cleanup: browser.close() is called within the with block to ensure proper resource cleanup.

Read this article to learn more about web scraping with Playwright.

Adaptive Web Scraping Using Large Language Models (LLMs)

You can use LLMs to analyze HTML code and extract the required details without using any selectors. Either install an open-source LLM like Meta’s Llama locally and run it on powerful graphics cards, or use a proprietary LLM API such as Google’s Gemini or OpenAI’s GPT models.

Here’s how you might use the Gemini API to extract product details:

import google.generativeai as genai
import os
import json

def extract_data(html):
    genai.configure(api_key=os.environ["GEMINI_API_KEY"].strip())

    # Create the model
    generation_config = {
        "temperature": 0,
        "max_output_tokens": 65536,
        "response_mime_type": "application/json",
    }

    model = genai.GenerativeModel(
        model_name="gemini-2.5-pro-preview",
        generation_config=generation_config,
    )

    chat_session = model.start_chat(history=[])

    prompt = f"extract details of all the products from the following HTML: {html}"
    response = chat_session.send_message(prompt)

    return json.loads(response.text)

Explanation:

  • API Setup: genai.configure() adds your API key for authentication with the model.
  • Configuration Dictionary: Creates a dictionary that specifies the model’s behavior.
  • Model Instance: genai.GenerativeModel() starts an instance of the model using the specified model name and the configuration dictionary.
  • Chat Session: model.start_chat() initializes a chat session with Gemini.
  • Prompt Creation: Sets a prompt that instructs the model to extract product details from the HTML code accepted by the function.
  • Response Parsing: json.loads() parses the response, which the function then returns.
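For context, here’s a brief usage sketch that fetches a page with requests and passes its HTML to extract_data(); the URL is a placeholder:

import requests

# Hypothetical usage: fetch a page and hand its HTML to the LLM-based extractor
response = requests.get('https://example.com/products')
products = extract_data(response.text)
print(products)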

Limitations

  • Anti-Scraping Measures: CAPTCHAs, IP bans, or bot detection (e.g., Cloudflare) can block scrapers. Bypassing these requires proxies, headless browser tweaks, or specialized services, which are beyond this article’s scope.
  • Unaccounted Changes: Unexpected HTML changes (e.g., site redesigns or new frameworks) can still break adaptive scrapers, requiring continuous monitoring.
  • Unreliability of LLMs: LLMs may fail to extract data because of unusual formatting, or they may generate incorrect data that doesn’t exist on the website (hallucination).

Worried about anti-scraping measures? Read this article on how to ethically avoid anti-scraping measures.

Why Use a Web Scraping Service

Adaptive web scraping aims to avoid issues due to HTML structure changes without manually changing the code. It usually involves using flexible and fallback selectors or LLMs to extract data. However, these methods have limitations.

The webpage may generate tags and attributes unaccounted for by the flexible or fallback selectors. Or, LLMs may hallucinate and generate incorrect data.

You can either deal with these limitations yourself or use a web scraping service.

A web scraping service like ScrapeHero can handle all these problems. We can take care of all the technicalities, including changing HTML structures and anti-scraping measures.

ScrapeHero is an enterprise-grade web scraping service provider. Contact ScrapeHero to get high-quality data for analysis without worrying about extraction.
