Online data is becoming increasingly localized
The days of a website showing the same data or information to all users are fast ending. Websites now dynamically show different content and data based on the location of the user. In this article we highlight the impact of this trend on web scraping and how web-based data gathering needs to adapt to this shift.
The data that most often changes based on location is price and availability.
The price one person sees, and whether a product is available for sale, shipping, or pickup, will vary based on the location of that person.
eCommerce Data
Covid-19 accelerated the move online, with traditional retailers and manufacturers increasingly shifting to online shopping. But the delivery of physical goods has always been local, even though the stores may not be local anymore.
Let us look at an example of location-specific options on Target.com.
When you first go to Target.com, the website tries to simplify the user experience and “guess” your location.
This means you have fewer clicks to make when selecting your store or location.
If this “guessing” of the location is incorrect or if you are not shopping at the location guessed by Target.com, you can expand the panel and pick a few options based on various criteria.
The goal of the website is to guess the location as accurately as possible without impairing the user experience, i.e. without forcing users to select a location before accessing the website or making it harder for them to browse or shop.
How do websites know my location?
Websites have various methods, ranging from unintrusive to very intrusive, for identifying your location. Here are a few current methods:
IP Address
Every device that accesses the Internet is assigned a number called the Internet Protocol (IP) address. In its common IPv4 form it is written as four numbers between 0 and 255 separated by dots, for example 192.0.2.123. Blocks of these addresses are allocated to your Internet Service Provider (ISP), which then assigns one to your connection or device temporarily or permanently.
When you visit a website, this IP address is sent to the website, which can look it up in a GeoIP database. These databases allow websites to map the IP address to a physical location such as a city, town, state, or country.
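To make the lookup concrete, here is a minimal sketch of how an IP-to-location mapping can work, using only Python's standard `ipaddress` module. The table below is a hand-made, hypothetical sample: the address ranges are reserved documentation ranges and the cities assigned to them are invented for illustration. Real GeoIP databases are far larger, are updated regularly, and are usually queried through a vendor's own library.

```python
import ipaddress

# Hypothetical sample table: network range -> location record.
# The ranges are reserved documentation ranges; the locations are made up.
GEOIP_TABLE = [
    (ipaddress.ip_network("192.0.2.0/24"),
     {"city": "Minneapolis", "region": "MN", "postal_code": "55403"}),
    (ipaddress.ip_network("198.51.100.0/24"),
     {"city": "Seattle", "region": "WA", "postal_code": "98101"}),
    (ipaddress.ip_network("203.0.113.0/24"),
     {"city": "Austin", "region": "TX", "postal_code": "73301"}),
]


def lookup_location(ip_string):
    """Return the location record for an IP address, or None if unknown."""
    # ip_address() raises ValueError if the string is not a valid address.
    address = ipaddress.ip_address(ip_string)
    for network, location in GEOIP_TABLE:
        if address in network:
            return location
    return None


if __name__ == "__main__":
    print(lookup_location("198.51.100.7"))
    print(lookup_location("10.0.0.1"))
```

A linear scan is enough for a three-row sample. A production database would use an indexed structure so that each lookup stays fast across millions of ranges.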
You cannot control whether you send the IP address to a website, but you can definitely try to hide it using services such as VPNs.
Here is an example of entering an IP address that belongs to Amazon Web Services (AWS) on a GeoIP lookup page.
You can see the city and zip code of this IP address. This is usually the first method that websites use to identify your location.
GPS Location
This method uses Global Positioning System (GPS) data from hardware chips in your device. GPS is a satellite-based positioning system available in most phones and in some computers that have this hardware.
The GPS location can be used directly by apps or be provided to the web browser that is used to access the website. The website gets the latitude and longitude coordinates of your location anywhere on the globe.
The access to this information can be controlled by the user at the device level and per app or website level (in most cases).
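Once a website holds a latitude and longitude, one plausible use is to pick the closest store. The sketch below illustrates that idea with the haversine great-circle distance. It is an assumption about how such matching could work, not a description of any particular retailer's logic, and the store names and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

# Hypothetical store list: store name -> (latitude, longitude) in degrees.
STORES = {
    "store-minneapolis": (44.9778, -93.2650),
    "store-chicago": (41.8781, -87.6298),
    "store-seattle": (47.6062, -122.3321),
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_store(lat, lon):
    """Return the name of the store closest to the given coordinates."""
    return min(STORES, key=lambda name: haversine_km(lat, lon, *STORES[name]))


if __name__ == "__main__":
    # Coordinates roughly in St. Paul, Minnesota.
    print(nearest_store(44.95, -93.09))
```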
Methods to get location based data from a website
The methods used to get location-specific data depend on each website, but generally they are:
- Use your IP address mapped location
- Allow the browser (and hence the website) to use your GPS location
- Set a specific zip code or store location explicitly
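For the third option in the list above, a scraper typically has to send the chosen location along with every request. The sketch below builds (but does not send) such a request with Python's standard library. The cookie names `zip_code` and `store_id`, the user agent string, and the example URL are all hypothetical. Every website uses its own mechanism (cookies, query parameters, or an API call), which has to be discovered by inspecting that site.

```python
from urllib.request import Request


def build_location_request(url, zip_code=None, store_id=None):
    """Build a request that carries an explicitly chosen location.

    The cookie names used here are placeholders for illustration only.
    """
    headers = {"User-Agent": "example-scraper/0.1"}
    cookies = []
    if zip_code:
        cookies.append(f"zip_code={zip_code}")  # hypothetical cookie name
    if store_id:
        cookies.append(f"store_id={store_id}")  # hypothetical cookie name
    if cookies:
        headers["Cookie"] = "; ".join(cookies)
    return Request(url, headers=headers)


if __name__ == "__main__":
    request = build_location_request(
        "https://www.example.com/product/123", zip_code="55403", store_id="1234"
    )
    print(request.full_url)
    print(request.get_header("Cookie"))
```

The first option in the list needs no change to the request itself. It instead requires routing the request through an IP address located in the target city.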
Impact of location specific data on web scraping
As location-specific data becomes more prevalent, its impact on web scraping for data gathering is significant.
On the positive front, the cost savings are significant because brands do not have to send “mystery shoppers” to physical stores to check product availability or pricing. This data can now be gathered with increasing accuracy online.
The data can now cover the whole country, all stores of a brand, and all brands. The savings versus the traditional approaches are huge, and the data coverage is much wider and more frequent.
On the negative front, the cost of gathering this data increases because of the need to get the data from multiple locations as opposed to a single location.
The cost of getting IPs that are specific to a city or location is also higher because local IPs are not readily available in every location.
The number of pages that need to be scraped also increases because the location has to be set for each product, and that sometimes takes 2 to 3 times the number of pages compared to a location-agnostic data gathering project.
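A rough back-of-the-envelope calculation shows how quickly the page count grows. The sketch below assumes a simple model in which every product is fetched once per location and each fetch needs a fixed number of pages. The function name, the model, and the sample numbers are illustrative assumptions, not measurements.

```python
def estimate_page_requests(num_products, num_locations=1, pages_per_product=1):
    """Estimate total page requests under a simple multiplicative model.

    pages_per_product covers the extra pages needed to set the location
    (the 2 to 3 times factor mentioned above); 1 means location agnostic.
    """
    if min(num_products, num_locations, pages_per_product) < 1:
        raise ValueError("all arguments must be at least 1")
    return num_products * num_locations * pages_per_product


if __name__ == "__main__":
    # Location-agnostic project: 1,000 products, one page each.
    print(estimate_page_requests(1000))
    # Location-specific project: same products, 50 locations, 3 pages each.
    print(estimate_page_requests(1000, num_locations=50, pages_per_product=3))
```

Under these assumed numbers, the location-specific project needs 150,000 page requests against 1,000 for the location-agnostic one.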
Overall, the costs are much lower in comparison to any physical store-level checks, and the data granularity and accuracy are much higher.
Visual examples of location based access on some popular websites
How to set your location on Amazon.com
How to set your location on Walmart.com
How does a website ask for my GPS location?
Here is an example of Walgreens.com asking for permission to use the computer’s GPS location on the Edge browser. Other browsers show similar prompts when a website asks for this data.