Shein Data Scraping Guide [Selenium + BeautifulSoup]

Data Scraped from Shein
Shein Data Scraping: The Environment
Shein Data Scraping: The Code
Shein Data Scraping: Code Limitations
Why Not Use a Web Scraping Service?
FAQ

Scraping Shein can be challenging because the site loads its content dynamically and uses anti-scraping measures. However, a browser automation library such as Selenium lets you interact with dynamic webpages and extract data from them.

Here’s a step-by-step guide on how to scrape Shein data using Selenium.

Data Scraped from Shein

This tutorial extracts three data points for each product on Shein’s super-deals page:

Product name
Product price
Product URL

You can use the browser’s inspect panel to determine which HTML elements on the super-deals page contain these details:

Right-click on a product detail, such as the price
Click ‘Inspect’

Shein’s search results page with the inspect panel on the right

Shein Data Scraping: The Environment

The code uses Selenium’s Python bindings to fetch the HTML source code of Shein’s super-deals page. Because Selenium drives a real browser, it can render JavaScript-heavy e-commerce pages like Shein’s before you read their source.

BeautifulSoup parses the source code, using lxml as the underlying parser.

Selenium, BeautifulSoup, and lxml are external libraries, so you need to install them using pip.

pip install beautifulsoup4 selenium lxml

Besides the external libraries, the code also uses three modules from the Python standard library:

json to save the extracted data to a JSON file
urllib.parse to make relative links absolute
time to delay the script execution
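
As a quick illustration of the two helpers that are easiest to misread, the sketch below shows how urljoin treats relative and absolute links and how json.dumps handles non-ASCII text. The product paths and names are made up for the example.

```python
import json
from urllib.parse import urljoin

BASE = "https://us.shein.com"

# A relative link is resolved against the base URL.
relative = urljoin(BASE, "/sample-item-p-123.html")  # hypothetical path

# An absolute link is returned unchanged; the base is ignored.
absolute = urljoin(BASE, "https://us.shein.com/other-item-p-456.html")

# ensure_ascii=False keeps characters such as "é" readable instead of
# escaping them as \u00e9.
record = json.dumps({"Name": "Café Tote", "Price": "$4.50"}, ensure_ascii=False)

print(relative)
print(absolute)
print(record)
```

Because the scraper in this tutorial already builds absolute URLs, urljoin leaves them as they are; it only matters when a page gives you relative links.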

Shein Data Scraping: The Code

Here’s the complete code if you want to get started with Shein data scraping right away.

import json
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from urllib.parse import urljoin
from time import sleep

source = "https://us.shein.com/super-deals"

# run Chrome without a visible window
options = webdriver.ChromeOptions()
options.add_argument("--headless=new")

# the context manager closes the browser when the block ends
with webdriver.Chrome(options=options) as browser:

    browser.get(source)

    products = []

    # give the dynamically loaded products time to render
    sleep(5)

    response = browser.page_source
    soup = BeautifulSoup(response, 'lxml')
    product_list = soup.find('div', {'class': 'thrifty-find-products'}).find_all('div', {'class': 'product-card'})

    print(len(product_list))

    for div in product_list:

        # extract details; skip cards missing an expected attribute or element
        try:
            name = div['aria-label']
            product_id = div['data-expose-id'].split('-')[1]
            slug = name.replace(' ', '-').lower()
            url = f"https://us.shein.com/{slug}-p-{product_id}.html"
            price = div.find('div', {'class': 'final-price'}).text.strip()
        except (KeyError, IndexError, AttributeError):
            continue

        # append the details to the list
        products.append(
            {
                "Name": name,
                "Price": price,
                "URL": urljoin("https://shein.com", url)
            }
        )

# save the results; utf-8 keeps non-ASCII product names intact
with open("shein.json", 'w', encoding="utf-8") as f:
    json.dump(products, f, indent=4, ensure_ascii=False)

Begin your code to scrape Shein’s website by importing the packages mentioned above.

import json
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from urllib.parse import urljoin
from time import sleep

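This code imports two Selenium modules, webdriver and By:

The webdriver module interacts with the browser (navigating to a URL, setting browser options, etc.)
The By module lets you specify how to locate an HTML element (by XPath, class name, etc.) when you search through Selenium. This script never calls it, because BeautifulSoup handles element selection, so the import is optional here.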

The scraper extracts the product details from Shein’s super deals page.

source = "https://us.shein.com/super-deals"

A Selenium-driven browser usually runs faster in headless mode because it doesn’t draw a visible window. You can start the browser in headless mode by adding the argument "--headless=new" to the browser’s options and launching the browser with them.

options = webdriver.ChromeOptions()
options.add_argument("--headless=new")

The code uses a context manager to handle launching and closing the browser.

with webdriver.Chrome(options=options) as browser:

Note: ChromeOptions() applies only when Selenium drives Chrome. For another browser, use that browser’s options class and driver, such as FirefoxOptions() with webdriver.Firefox().
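
As a rough sketch of how the browser choice could be isolated, the helper below returns a headless Chrome or Firefox driver. The function name make_browser and the set of supported names are assumptions made for this example, not part of the original script.

```python
def make_browser(name="chrome"):
    """Return a headless Selenium browser for the given name (hypothetical helper)."""
    name = name.lower()
    if name not in ("chrome", "firefox"):
        raise ValueError(f"unsupported browser: {name}")

    # Imported here so the name check above works even without Selenium installed.
    from selenium import webdriver

    if name == "chrome":
        options = webdriver.ChromeOptions()
        options.add_argument("--headless=new")
        return webdriver.Chrome(options=options)

    # Firefox uses its own options class and its own headless flag.
    options = webdriver.FirefoxOptions()
    options.add_argument("-headless")
    return webdriver.Firefox(options=options)
```

The returned driver works as a context manager in the same way, for example: with make_browser("firefox") as browser: browser.get(source).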

Within the context, use the get() method with the source variable as the argument to visit the super-deals page.

browser.get(source)

Before extracting the product details, declare an empty list to store them.

products = []

Next, pause the script execution for 5 seconds to allow all the products on the page to load.

sleep(5)
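
A fixed pause can be too short on a slow connection and needlessly long on a fast one. One alternative, sketched below, is to poll the page source until a marker string shows up. The helper name wait_for_marker and its default timings are assumptions for illustration; Selenium also provides WebDriverWait with expected conditions for element-level waits.

```python
from time import monotonic, sleep

def wait_for_marker(browser, marker, timeout=15.0, interval=0.5):
    """Poll browser.page_source until `marker` appears or `timeout` elapses.

    Returns True if the marker was found, False on timeout.
    """
    deadline = monotonic() + timeout
    while True:
        if marker in browser.page_source:
            return True
        if monotonic() >= deadline:
            return False
        sleep(interval)
```

In the script above, wait_for_marker(browser, "product-card") could take the place of sleep(5). Finding the marker only shows that at least one product card has rendered, not that every product has finished loading.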

You can now get the HTML source code using Selenium’s page_source attribute.

response = browser.page_source

The next step is to parse the fetched HTML source code. Pass the source code to BeautifulSoup with lxml as the parser.

soup = BeautifulSoup(response,'lxml')

Parsing creates a BeautifulSoup object that provides methods for extracting elements from the source code.

Use the .find() method to select the div element containing product listings and the .find_all() method to find each div element holding the product details.

product_list = soup.find('div',{'class':'thrifty-find-products'}).find_all('div',{'class':'product-card'})
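
If Shein serves a CAPTCHA page or changes its markup, .find() returns None and the chained .find_all() call raises an AttributeError. The sketch below shows one way to guard against that. The HTML fragment is a made-up sample that mimics the class names used in this tutorial, and html.parser is used only so the example runs without lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment that mimics the structure described above.
SAMPLE_HTML = """
<div class="thrifty-find-products">
  <div class="product-card" aria-label="Sample Tee">
    <div class="final-price"> $3.99 </div>
  </div>
  <div class="product-card" aria-label="Sample Hat">
    <div class="final-price"> $5.49 </div>
  </div>
</div>
"""

def find_product_cards(html):
    """Return the product-card divs, or an empty list if the container is missing."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", {"class": "thrifty-find-products"})
    if container is None:
        return []
    return container.find_all("div", {"class": "product-card"})

cards = find_product_cards(SAMPLE_HTML)
print([card["aria-label"] for card in cards])
print(len(find_product_cards("<html><body>blocked</body></html>")))
```

An empty result is then a signal to check whether the page was blocked or restructured, rather than a crash in the middle of the run.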

This gives you a list of div elements, each containing the details of a product. Iterate through the div elements, and in each iteration:

Extract name, URL, and price
Append the details to the list defined before the loop

for div in product_list:

    # extract details; skip cards missing an expected attribute or element
    try:
        name = div['aria-label']
        product_id = div['data-expose-id'].split('-')[1]
        slug = name.replace(' ', '-').lower()
        url = f"https://us.shein.com/{slug}-p-{product_id}.html"
        price = div.find('div', {'class': 'final-price'}).text.strip()
    except (KeyError, IndexError, AttributeError):
        continue

    # append the details to the list
    products.append(
        {
            "Name": name,
            "Price": price,
            "URL": urljoin("https://shein.com", url)
        }
    )
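
Note that the loop does not read the product URL from the page; it rebuilds it from the product name and the second hyphen-separated field of data-expose-id. The sketch below isolates that step in a hypothetical helper, build_product_url, so the assumption is easy to test. The sample name and ID are invented.

```python
def build_product_url(name, expose_id, base="https://us.shein.com"):
    """Rebuild a product URL the same way the loop above does.

    Returns None when expose_id has no second hyphen-separated field.
    """
    parts = expose_id.split("-")
    if len(parts) < 2 or not parts[1]:
        return None
    slug = name.replace(" ", "-").lower()
    return f"{base}/{slug}-p-{parts[1]}.html"

print(build_product_url("Sample Tee", "1-12345"))
print(build_product_url("Sample Tee", "12345"))
```

Since the slug is derived only by replacing spaces and lowercasing, names with punctuation or other special characters may not match the link Shein actually uses, so it is worth spot-checking a few generated URLs.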

Finally, save the extracted Shein product data to a JSON file.

with open("shein.json", 'w', encoding="utf-8") as f:
    json.dump(products, f, indent=4, ensure_ascii=False)
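
To check that the saved file reads back correctly, a small round trip like the one below can help. It writes a made-up record to a temporary directory instead of the real shein.json, and it passes an explicit UTF-8 encoding so non-ASCII names survive regardless of the platform default.

```python
import json
import os
import tempfile

# Made-up record standing in for the scraped products.
products = [
    {"Name": "Café Tote", "Price": "$4.50", "URL": "https://us.shein.com/cafe-tote-p-1.html"}
]

path = os.path.join(tempfile.mkdtemp(), "shein.json")

# Write the list as JSON with readable non-ASCII characters.
with open(path, "w", encoding="utf-8") as f:
    json.dump(products, f, indent=4, ensure_ascii=False)

# Read it back and confirm nothing was lost.
with open(path, encoding="utf-8") as f:
    loaded = json.load(f)

print(len(loaded), loaded[0]["Name"])
```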

Here is a flowchart showing the entire process.

Flowchart showing four steps for scraping Shein data: sending a request, parsing the response, extracting data, and saving the data

Shein Data Scraping: Code Limitations

Shein has strong anti-scraping mechanisms, such as CAPTCHA challenges and IP rate limiting. To get past these, you may need to rotate proxies and use CAPTCHA solvers. This code does neither, so it’s unsuitable for large-scale web scraping.
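
As a minimal sketch of proxy rotation, the code below cycles through a list of proxies and builds the Chrome arguments for each new browser session. The proxy addresses are placeholders, and the helper names are assumptions for this example. It relies on Chrome’s --proxy-server flag and does not cover CAPTCHA solving.

```python
from itertools import cycle

# Placeholder proxy addresses; replace with proxies you are allowed to use.
PROXIES = ["http://proxy1.example.com:8080", "http://proxy2.example.com:8080"]
_proxy_pool = cycle(PROXIES)

def next_chrome_arguments():
    """Return Chrome arguments that route traffic through the next proxy."""
    proxy = next(_proxy_pool)
    return ["--headless=new", f"--proxy-server={proxy}"]

def make_options(arguments):
    """Build ChromeOptions from a list of arguments (requires Selenium)."""
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for argument in arguments:
        options.add_argument(argument)
    return options

print(next_chrome_arguments())
print(next_chrome_arguments())
```

Each new session would then start with webdriver.Chrome(options=make_options(next_chrome_arguments())). Proxies that require a username and password typically need additional setup beyond this flag.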

Moreover, you need to monitor Shein’s website for changes in its HTML structure, because this code relies on specific class names and attributes to extract the product details.

Why Not Use a Web Scraping Service?

Selenium WebDriver and BeautifulSoup are excellent for Shein data scraping. This tutorial showed how to scrape Shein’s super-deals page; similarly, you can scrape their other pages by modifying the code.

You also need to change the code for large-scale web scraping or whenever Shein changes its HTML structure. However, you can avoid all that by choosing ScrapeHero’s Web Scraping Service.

ScrapeHero is a fully managed web scraping service provider capable of building large-scale web scrapers and crawlers.

FAQ

Is scraping Shein legal?

Scraping publicly available data is generally legal, but scraping Shein or any other website without permission may violate its terms of service. Consult a legal expert to ensure compliance. Check out this page on the legality of web scraping to learn more.

Why do I need proxies to scrape Shein?

Shein applies rate limits and IP bans to deter scraping. Rotating proxies let you send requests from different IPs, reducing the risk of being blocked.

Can I scrape Shein without using Selenium?

Scraping Shein without browser automation is difficult because the site uses JavaScript to load content dynamically. However, if you don’t want to use Selenium, you can use other browser automation libraries like Playwright or Puppeteer.

How do I avoid getting blocked while scraping Shein?

These practices reduce the chance of being blocked while scraping Shein:
1. Use rotating proxies
2. Mimic human behavior (like random delays between requests)
3. Respect the site’s robots.txt file
4. Avoid sending too many requests in a short period
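
Two of these practices, random delays and robots.txt checks, can be sketched with the standard library alone. The rules below are made up for illustration and are not Shein’s actual robots.txt; the helper names and default delay range are also assumptions.

```python
import random
from time import sleep
from urllib.robotparser import RobotFileParser

def build_robot_rules(robots_txt):
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

def polite_pause(min_seconds=2.0, max_seconds=6.0):
    """Sleep for a random interval and return the delay that was used."""
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay

# Made-up rules for illustration only.
SAMPLE_ROBOTS = """User-agent: *
Disallow: /private/
"""

rules = build_robot_rules(SAMPLE_ROBOTS)
print(rules.can_fetch("*", "https://example.com/super-deals"))
print(rules.can_fetch("*", "https://example.com/private/page"))
```

For a live site, RobotFileParser can also download the file itself through set_url() and read(). Calling polite_pause() between page loads spaces requests out unevenly, which looks less mechanical than a fixed interval.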

Published on: October 11, 2024
