How to Scrape Store Locations from Walmart.com using Python 3

Walmart is a multinational chain of hypermarkets, discount stores, and grocery stores. Knowing where Walmart stores are located helps you gauge market saturation and place your own stores strategically. However, gathering these details in person is impractical; web scraping Walmart store locations is the practical alternative.

In this tutorial, you will learn how web scraping Walmart using Python works.

Set Up The Environment for Web Scraping Walmart Store Locations

This tutorial uses two external Python libraries: selenium and pandas. You can use pip, Python’s package manager, to install these packages.

Run this command to install Selenium and Pandas.

pip install selenium pandas

Selenium can automate the browser to visit Walmart’s page and get the location details, and Pandas will enable you to save the extracted data to a CSV file.
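To verify the installation, you can print the installed versions; this is an optional check and not part of the tutorial's script:

import selenium
import pandas

print(selenium.__version__)
print(pandas.__version__)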

Data Scraped from Walmart

This tutorial for web scraping Walmart store locations will extract four data points:

  • Store name
  • ID
  • Address
  • Distance from the zip code

Data extracted while web scraping Walmart store locations

The code will get the zip code from the user via command-line arguments, visit the Walmart website, extract the data, clean it, and save it to a CSV file.

The Code for Web Scraping Walmart Store Locations

You will first import the necessary packages for scraping Walmart:

  1. Argparse, for enabling the script to accept arguments from the command line
  2. Selenium, for visiting the Walmart store location page and extracting the data
  3. Pandas, for saving the extracted data into a CSV file
import argparse
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
import pandas

Notice that the code doesn’t import selenium directly. Instead, it imports only the specific modules it needs from the Selenium library, which keeps the code cleaner.

It is cleaner to keep the page visit and the data extraction in a separate function; therefore, you will define a function locate_stores() that accepts the zip code as an argument and returns the scraped data.

def locate_stores(zip_code):

Walmart doesn’t like bots scraping its data, so you will likely encounter its anti-scraping measures. The site may detect that you are using an automated browser and block you, making it necessary to take steps to avoid detection.

This code uses one such method to scrape without getting blocked: it disables the flag that tells websites the browser is automation-controlled. You do this by passing the appropriate argument to add_argument() on an Options object.

options = Options()
options.add_argument("start-maximized")
options.add_argument('--disable-blink-features=AutomationControlled')
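If Walmart still detects the automated browser, two more Chrome options are often used together with this flag. This is a common hardening step, not something the original code relies on:

# Remove the "controlled by automated software" banner and the automation extension
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option("useAutomationExtension", False)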

After that, construct the store-finder URL with the zip code, start the browser instance with options as an argument, and visit the page.

url = "https://www.walmart.com/store-finder?location=%s&distance=500" % (zip_code)
driver = webdriver.Chrome(options=options)
driver.get(url)

Examining the HTML code will show you that the data points are inside a div element with the attribute aria-label='results-list'. You can locate this div element using its XPath.

results = driver.find_element(By.XPATH,"//div[@aria-label='results-list']")
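The store list may not be in the DOM the moment the page loads. If find_element() raises a NoSuchElementException, an explicit wait is a common remedy; this sketch uses Selenium's standard WebDriverWait and is not part of the original code:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Wait up to 10 seconds for the results list to appear
results = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.XPATH, "//div[@aria-label='results-list']"))
)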

There are two ways to extract the store details: you can get all the text from the results section and clean it, or you can find the XPath of each data point. This code uses the first method.

You will extract all the text from the section, then use the split() method to separate the store details. split() breaks a string at each occurrence of the separator you pass as an argument and returns the pieces as a list.

For example, splitting the string "grapes, oranges, apples, strawberries" with ", " as the separator gives ['grapes', 'oranges', 'apples', 'strawberries'].
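Here is that example as runnable Python:

fruits = "grapes, oranges, apples, strawberries"
print(fruits.split(", "))
# Output: ['grapes', 'oranges', 'apples', 'strawberries']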

The screenshot above shows the text “Make this my store” at the end of each store section. You will use this text, preceded by a newline character, as the argument for the split() method.

stores = results.text.split("\nMake this my store")

Now, you have the details of each store as separate items in a list.

However, the details of each store are still in a single string, separated by newline characters. Therefore, you must split each string on the newline character as well, giving you a list of lists.

store_details = []
for store in stores:
    store_details.append(store.split('\n'))
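To make the structure concrete, here is what one store string produced by the first split might look like, and how the second split breaks it apart. The store values below are made up for illustration, not real Walmart data:

# A hypothetical store string (the real page text may differ)
store = "Springfield Supercenter\n#1234\n123 Main St, Springfield, IL\n2.5 mi"
print(store.split('\n'))
# Output: ['Springfield Supercenter', '#1234', '123 Main St, Springfield, IL', '2.5 mi']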

The next step is to clean the data. You can do that with Pandas:

  • Convert the list of lists into a Pandas DataFrame.
    df = pandas.DataFrame(store_details)
  • Drop unwanted columns.
    df2 = df.drop(df.columns[0],axis=1)
    df3 = df2.drop(df.columns[1],axis=1)
  • Drop rows with null values.
    df4 = df3.dropna()

The function also transforms one of the columns, using a regular expression to extract the numeric distance from the string.

df4[5] = df4[5].str.extract(r'\.(\d+)',expand=False)
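To see how str.extract() behaves here, consider a small standalone example; the sample strings are hypothetical:

import pandas as pd

# Hypothetical distance strings; the capture group keeps the digits after the dot
s = pd.Series(["1.2mi", "3.45mi"])
print(s.str.extract(r'\.(\d+)', expand=False))
# Output: a Series with the values '2' and '45'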

Finally, it returns the extracted and cleaned data.

Outside the locate_stores() function, the script performs three tasks:

  • Use the argparse module to read the zip code from the command line
    argparser = argparse.ArgumentParser()
    argparser.add_argument('zip_code',help = 'zip code to search')
    args = argparser.parse_args()
    zip_code = args.zip_code
  • Call the locate_stores() function
    scraped_data = locate_stores(zip_code)
  • Save the extracted data as a CSV file using Pandas
    scraped_data.to_csv("walmart.csv")

Here is the complete code for web scraping Walmart locations.

import argparse
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
import pandas


def locate_stores(zip_code):
    # Hide the flag that tells websites the browser is automation-controlled
    options = Options()
    options.add_argument("start-maximized")
    options.add_argument('--disable-blink-features=AutomationControlled')

    # Build the store-finder URL for the given zip code
    url = "https://www.walmart.com/store-finder?location=%s&distance=500" % (zip_code)

    # Start Chrome and visit the page
    driver = webdriver.Chrome(options=options)
    driver.get(url)

    # Locate the div element that holds the list of stores
    results = driver.find_element(By.XPATH, "//div[@aria-label='results-list']")

    # Split the text into one string per store
    stores = results.text.split("\nMake this my store")

    # Close the browser once the text has been extracted
    driver.quit()

    # Split each store string into its individual details
    store_details = []
    for store in stores:
        store_details.append(store.split('\n'))

    # Clean the data: drop unwanted columns and rows with null values
    df = pandas.DataFrame(store_details)
    df2 = df.drop(df.columns[0], axis=1)
    df3 = df2.drop(df.columns[1], axis=1)
    df4 = df3.dropna()

    # Extract the numeric distance and rename the columns
    df4[5] = df4[5].str.extract(r'\.(\d+)', expand=False)
    new_names = ['Name', 'ID', 'Address', 'Distance']
    df4 = df4.set_axis(new_names, axis=1)

    return df4


if __name__ == "__main__":
    # Read the zip code from the command line
    argparser = argparse.ArgumentParser()
    argparser.add_argument('zip_code', help='zip code to search')
    args = argparser.parse_args()
    zip_code = args.zip_code

    # Scrape the store details and save them to a CSV file
    scraped_data = locate_stores(zip_code)
    scraped_data.to_csv("walmart.csv")
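Assuming you saved the script as walmart_locator.py (the file name here is arbitrary), you can run it with a zip code as the command-line argument:

python walmart_locator.py 10001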

And here are the Walmart store locations.

Results of web scraping Walmart store locations using Python

Code Limitations

This code can scrape Walmart store location data for a specific zip code, but it has some limitations.

Walmart uses strict anti-scraping measures, and this code does not use advanced techniques like proxy rotation to get around them.

Therefore, the code needs tweaking for large-scale data extraction from Walmart.
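For example, you could route traffic through a proxy using Chrome's --proxy-server flag. The endpoint below is a placeholder; genuine large-scale scraping would rotate through many such proxies:

# Hypothetical proxy endpoint; rotate proxies across sessions in real use
options.add_argument('--proxy-server=http://proxy.example.com:8080')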

Another issue is that XPaths depend on the site structure, so you must determine the XPaths again whenever Walmart changes its page layout.

Wrapping Up

You can scrape Walmart store data using Python and Selenium, but you must watch Walmart.com for changes in its HTML code. Whenever that happens, you must analyze the webpage again and figure out the new XPaths.

If you want to avoid coding yourself, try ScrapeHero. ScrapeHero can help you in two ways.

You can use our no-code ScrapeHero Walmart Scraper from ScrapeHero Cloud. It is a prebuilt, affordable web scraper you can try for free. With just a few clicks, you can get Walmart location data as JSON or CSV.

You can also use ScrapeHero services if you want large-scale data extraction. We are an enterprise-grade web scraping service provider capable of building custom web scrapers. You only have to mention the data specifications; we will cover everything else.
