When you search for a product on Target.com, the site executes JavaScript and fetches the search results from an external URL.
Therefore, you have two choices for web scraping Target.com:
- Use automated browsers like Selenium Python to execute JavaScript and extract details from the HTML elements
- Find out the external URL for web scraping Target product data
The second method, using the external URL, is faster because it skips JavaScript execution. Moreover, the URL delivers data in JSON format, from which you can extract data more efficiently.
This article shows how to scrape Target.com and get product data using Python.
Data Scraped from Target
The code for web scraping Target.com extracts data in two parts.
In the first part, the code makes an HTTP request to the external URL to get the details of the products on the search results page. You can get this external URL using the browser’s developer tools:
- Search for a product on Target.com
- Open the Network tab of the developer tools
- Select the Fetch/XHR filter
- Get the link from the item named product_summary_with_fulfillment
The product details obtained using the above method only contain the title and the product’s URL.
To get other product details, you must make an HTTP request to each product’s URL. This request will fetch the HTML source code of each product’s page; you can then extract the product details from this source code.
The code extracts six data points:
- Description
- Features
- Name
- Price
- URL
- Ratings
The details will be inside a script tag. You can analyze the page’s source code and find which script tag contains the data.
- Right-click on the page and select View page source. A new tab will open with the HTML source code.
- Search for a specific value, such as a product price; the script tag containing this price value will have the product details.
Note: The script tag has no ID, so you must identify it by a unique string inside it.
Web Scraping Target with Python: Environment
The code needs four packages to scrape Target.com:
- Python requests
- BeautifulSoup
- The json module
- The re module
Python requests handles HTTP requests; the code uses it to fetch data from the URL, and BeautifulSoup handles HTML parsing.
The json module enables you to extract data from a JSON string and write the extracted data to a JSON file.
The re module helps you find a string with a specific pattern.
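For instance, here is a minimal sketch of how re.search() pulls a value out of a string; the pattern and sample string are illustrative, not taken from Target’s pages.
import re

# Illustrative only: capture the number that follows "current_retail":
sample = '{"current_retail":9.99}'
match = re.search(r'"current_retail":([\d.]+)', sample)
if match:
    print(match.group(1))  # prints 9.99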
BeautifulSoup and Python requests are external Python libraries; therefore, install them using pip.
pip install bs4 requests
Web Scraping Target.com: The Code
Import all the required packages.
import requests
from bs4 import BeautifulSoup
import json
import re
Headers make your request look like it comes from a real user’s browser. Otherwise, Target.com will detect that the request comes from Python and block you.
Here are the headers used for web scraping Target.com.
headers = {
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,"
    "*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
    "accept-language": "en-GB;q=0.9,en-US;q=0.8,en;q=0.7",
    "dpr": "1",
    "sec-fetch-dest": "document",
    "sec-fetch-mode": "navigate",
    "sec-fetch-site": "none",
    "sec-fetch-user": "?1",
    "upgrade-insecure-requests": "1",
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36",
}
- The accept header defines what formats the response can contain.
- Similarly, the accept-language header specifies what languages the response can contain.
- The dpr header specifies the device pixel ratio.
- The sec-fetch headers increase security; they give the server the context of the request.
- “sec-fetch-dest: document” tells the server that the request intends to get a document.
- “sec-fetch-mode: navigate” tells the server that the request originated in the browser’s main frame.
- “sec-fetch-site: none” tells the server that the request was not triggered by another website.
- “sec-fetch-user: ?1” tells the server that the request is user-generated.
- The user-agent header spoofs the details of the device and the software from which the request originated.
Get the product URLs
Use the headers defined above to send a request to the URL from which Target.com fetches the product data.
product_summary = requests.get(
    "https://redsky.target.com/redsky_aggregations/v1/web/product_summary_with_fulfillment_v1?key=9f36aeafbe60771e321a7cc95a78140772ab3e96&tcins=87450164%2C89981814%2C76633629%2C76564372%2C75557222%2C83935767%2C90007101%2C87450172%2C84694230%2C83935763%2C75663217%2C51944685%2C81088223%2C81888650%2C86912997%2C91337758%2C91337835%2C91337712%2C91337754%2C91337804%2C91337741%2C91337794%2C91337728%2C91337843%2C91337807%2C91337698%2C91337703%2C91337759&zip=64160&state=TN&latitude=11.090&longitude=77.350&has_required_store_id=false&skip_price_promo=true&visitor_id=018FDD4CCE720201AA9FB7EAFCFB3EF9&channel=WEB&page=%2Fs%2Fpaper+towels",
    headers=headers,
).text
Next, use json.loads() to parse the response text, which is in JSON format.
product_summary = json.loads(product_summary)
Extract the value containing the details of all the products; they will be inside the key “product_summaries.”
products = product_summary["data"]["product_summaries"]
Define an array to store the URLs, iterate through all products, extract the product URLs, and store them in the array.
urls = []
for product in products:
    urls.append(product["item"]["enrichment"]["buy_url"])
Define another empty array to store the extracted product details.
targetProducts = []
Extract the Product Details
Iterate through the array containing the product URLs; a skeleton of the loop appears after this list. Each iteration will
- Make an HTTP request to the URL
- Extract the details
- Store them in a dict
- Append the dict to the array
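Put together, the loop has roughly this shape; the extraction and storage steps, covered in the next subsections, go where the comments indicate.
for url in urls:
    # Step 1: make an HTTP request to the product page
    pResponse = requests.get(url, headers=headers)
    # Steps 2-4: extract the details into a dict named
    # extractedProductDetails and append it to targetProducts
    # (shown in the subsections below)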
Making HTTP requests
While making the HTTP requests, use the get method of Python requests with the headers defined earlier.
pResponse = requests.get(url, headers=headers)
Extracting the details
Begin by parsing the contents using BeautifulSoup. Then, you can locate and extract the required details, which will be inside a script element.
soup = BeautifulSoup(pResponse.text, "html.parser")
Analyzing the HTML source code reveals that the product details are inside a script element containing a variable named __TGT_DATA__.
Therefore, to find the script element containing the product details,
- Find all the script elements.
scripts = soup.find_all("script")
- Iterate through all the scripts, and if any script contains __TGT_DATA__, assign it to the reqScript variable.
for script in scripts:
    if "__TGT_DATA__" in script.text:
        reqScript = script
Still inside the loop,
1. Extract the JSON string.
jsonProduct = re.search(r'parse\("({ .+)"\)\)', reqScript.text).group(1)
2. Clean the string
The nature of the JSON string determines how you clean it. In this case, it involves
- Putting the true and false values in double quotes
jsonProduct = jsonProduct.replace("false", '\\"false\\"') jsonProduct = jsonProduct.replace("true", '\\"true\\"')
- Removing backslashes before the double quotes so they remain unescaped.
jsonProduct = jsonProduct.replace('\\"', '"') jsonProduct = jsonProduct.replace('\\\\"', '\\"')
Now, you have a valid JSON string that you can load using the json module, giving you a JSON object.
jsonProduct = json.loads(jsonProduct)
You can extract the product details from this JSON object. Ensure you extract the required details in a try-except block, as there may be null values, leading to errors; a sketch of such a block appears after this list.
- Get the dict containing product descriptions
rawDescription = jsonProduct["__PRELOADED_QUERIES__"]["queries"][2][1]["data"]["product"]["children"][0]["item"]["product_description"]
- Get the product description and features from this dict
description = rawDescription["downstream_description"]
features = rawDescription["soft_bullets"]["bullets"]
- Get the title
name = rawDescription["title"]
- Extract the dict containing various types of product prices
rawPrice = jsonProduct["__PRELOADED_QUERIES__"]["queries"][2][1]["data"]["product"]["children"][0]["price"]
- Get the current price from this dict
price = rawPrice["current_retail"]
- Extract the ratings
ratings = jsonProduct["__PRELOADED_QUERIES__"]["queries"][2][1]["data"]["product"]["ratings_and_reviews"]["statistics"]
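Here is a minimal sketch of wrapping one of these extractions in a try-except block; falling back to None is an assumption about how you may want to represent missing values.
try:
    price = rawPrice["current_retail"]
except (KeyError, TypeError):
    # The key may be missing, or a parent value may be null
    price = None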
Storing the values in a dict and appending it to the array
extractedProductDetails = {
    "description": description,
    "features": features,
    "name": name,
    "price": price,
    "url": url,
    "ratings": ratings,
}
targetProducts.append(extractedProductDetails)
Finally, you can save the extracted details to a JSON file.
with open("target.json", "w", encoding="UTF-8") as targetFile:
    json.dump(targetProducts, targetFile, indent=4, ensure_ascii=False)
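To confirm that the file was written correctly, you can load it back with the json module; this quick check is optional.
with open("target.json", "r", encoding="UTF-8") as targetFile:
    savedProducts = json.load(targetFile)
print(len(savedProducts), "products saved")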
Code Limitations
The code relies on the structure of the JSON data and the URL to scrape Target product results; if Target.com changes them, you need to alter the code.
Moreover, the code is inefficient for large-scale web scraping, which requires a huge HTTP request volume, making your scraper more susceptible to Target.com’s anti-scraping measures.
This code does not have any techniques for bypassing anti-scraping measures.
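You can, at a minimum, slow the request rate so the scraper is less likely to trigger rate limiting; this is a courtesy measure rather than a bypass technique. A sketch, assuming the product-details loop from earlier:
import random
import time

for url in urls:
    pResponse = requests.get(url, headers=headers)
    # ... extraction steps from the previous sections ...
    # Wait 2-5 seconds between requests to reduce the request rate
    time.sleep(random.uniform(2, 5))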
ScrapeHero Target Scraper
If you want to scrape Target products without coding, try the ScrapeHero Target Scraper for free from ScrapeHero Cloud. It is a no-code web scraper that can gather product details from Target.
Its benefits include:
- Scheduled executions
- Automated delivery to your cloud storage
- Data available in multiple formats
Here’s how you use the scraper:
- Go to ScrapeHero Cloud and create an account.
- Search for ScrapeHero Target Scraper and add it to your crawler list
- Add the search URLs or keywords to the input
- Click gather data
Read how to scrape Target.com without coding to learn the process in detail.
Wrapping Up
Python is excellent for web scraping Target.com and storing the results in JSON format. Use Python requests to get the HTML code and the packages BeautifulSoup, json, and re to extract the content.
The code in this tutorial will work until Target.com changes the JSON structure or URLs. If that happens, you need to update the code. You also need to add more code for large-scale web scraping to bypass anti-scraping measures.
To avoid coding yourself, use the ScrapeHero Target Scraper mentioned above; this will be excellent for small-scale projects.
However, if you wish to scrape on a larger scale and have custom data requirements, try ScrapeHero services. We will take care of everything, including handling anti-scraping measures.
ScrapeHero is a full-service web scraping service provider. Provide us with your web-scraping requirements, and we will build an enterprise-grade web scraper for you. Our services range from large-scale web scraping and crawling to monitoring and custom RPA.