Web Scraping Using GPT-4 With Vision (GPT-4V)

Share:

Web Scraping Using GPT-4 With Vision

Are you aware that ChatGPT now responds to image prompts? A recently released multimodal version of ChatGPT, GPT-4 with Vision (GPT-4V) or GPT-4V(ision), comes with image and voice capabilities.

GPT-4 with Vision can extract text from images, which is useful for scraping data from websites that display information in images instead of HTML. Even though this cannot be fully interpreted as traditional web scraping, it still plays a role in data extraction.

If you want to know more about web scraping using ChatGPT, you can refer to our articles. Now let’s explore GPT-4V’s image web scraping capabilities through this article, taking some examples. Let’s begin.

Note: ChatGPT with Vision is available only to Plus subscribers.

Example 1: Web Scraping Amazon Product Listings

First, let’s see how web scraping using GPT-4V works. For that, let’s use an e-commerce site, say Amazon, to extract all the product details.

  1. Take a screenshot of the Amazon product page.
    Screenshot of Amazon product page for ChatGPT web scraping.
  2. Upload the screenshot in the chat box. Give a prompt to collect all the product data and store it in a table.
    Using GPT-4V for web scraping product data giving proper prompts.
  3. Using GPT-4 with Vision for web scraping produces the result in a tabular format as per the prompt.
    The final results given by ChatGPT after web scraping.

Amazon Product Details and Pricing Scraper: An Alternative Solution

Using ScrapeHero Cloud can be a better way of web scraping. Here, you don’t have to spend your time taking multiple screenshots and uploading them to get the data you require. Instead, just provide web page URLs and gather data. That’s it.

Our prebuilt scrapers allow you to configure your needs and fetch the data without any dedicated teams or hosting infrastructure. All you need to do is create an account where you get 25 credits for free upon signing up, unlike GPT-4V, which is paid. ScrapeHero Cloud makes web scraping hassle-free without any coding involved on your part.

You can try ScrapeHero Amazon product and pricing scraper, which can replace GPT-4 with Vision for web scraping Amazon product details. It can do the same task of fetching all the product information, pricing, FBA, and best seller rank from Amazon within seconds.

Example 2: Web Scraping Google Reviews

Next, let’s scrape Google Reviews. Let’s just display the reviews in the chat box without storing them in any files.

  1. Open any Google Review page of your choice, and then take a screenshot.
    Screenshot of Google reviews page for ChatGPT web scraping.
  2. Give a proper prompt to extract the data after uploading the screenshot.
    Using GPT-4V for web scraping Google reviews giving proper prompts.
  3. You can see that the reviews are extracted from the screenshot.
    The final results given by ChatGPT after web scraping.

Google Reviews Scraper: An Alternative Solution

We have an alternative to GPT-4 with Vision for web scraping that can save you time and effort. Taking a number of screenshots every time you need to collect details is not a possible solution. This is where the ScrapeHero Cloud comes in handy. With the URLs provided, you can scrape all the information you need from the web pages.

You don’t require specialized teams or hosting infrastructure to configure our prebuilt scrapers to meet your demands and retrieve the data. Unlike GPT-4V, which requires payment, all you have to do is register an account to receive 25 credits at the time of signup. Without requiring you to know any code, ScrapeHero Cloud makes web scraping simple.

Try ScrapeHero Google Reviews Scraper, which can do a better job of providing you with Google reviews for places and businesses, gathering information such as business name, address, reviews, ratings, and images.

Example 3: Web Scraping Glassdoor Listings

The next example that you can try out using GPT-4V for web scraping is Glassdoor listings. This time, let’s extract the details into an Excel sheet.

  1. First, take a screenshot of the Glassdoor listing page that you want to scrape.
    Screenshot of Glassdoor listings page for ChatGPT web scraping.
  2. Give a prompt to extract job details and save them into an Excel sheet.
    Using GPT-4V for web scraping Glassdoor listings giving proper prompts.
  3. Download the Excel file where the results are stored.
    The final results given by ChatGPT after web scraping Glassdoor

Glassdoor Job Listings Scraper: An Alternative Solution

A lot of human effort and time are needed to take screenshots of web pages and then upload them to GPT-4V. Also, scraping data from images is not always a practical way to acquire the required data, especially for massive-scale scraping. So as an alternative to GPT-4V, for web scraping, you can use ScrapeHero Cloud.

Instead of having to pay for GPT-4V, you can sign up for our pre-built scrapers for web scraping. With 25 credits for free on signing up, you can customize your needs and retrieve the data you need without any specialized teams or hosting infrastructure. An additional benefit is that you don’t even need to know coding to use ScrapeHero Cloud.

In order to scrape job listings from Glassdoor, you can make use of  ScrapeHero Glassdoor Job Listings Scraper, which can provide you with much information, such as job title, salary, job description, location, company name, number of reviews, and ratings from multiple Glassdoor domains.

Limitations of Using GPT-4V for Web Scraping

Using GPT-4V for web scraping has its own limitations. The main challenges are that it can be operated only in a limited context, and that too in a limited scalability.

1. Limited Context

GPT-4V can only scrape data that is visible in the image, unlike traditional web scraping, where data is extracted from HTML. It is unable to read any missing text, mathematical symbols, or characters.

It cannot recognize spatial locations as well as colors. To process a full web page, it is required to split into multiple smaller images, which is not a possible solution.

2. Limited Scalability

GPT-4V is a good option for beginners or non-coders doing small-scale web scraping. But for massive-scale or enterprise-grade web scraping services, which involves millions of web pages, it’s definitely not a perfect solution.

You can also check out the ScrapeHero data extraction services guide and checklist if you need to learn more about our data extraction services apart from our web scraping services.

Wrapping Up

There are other methods of web scraping using ChatGPT, web scraping with ChatGPT scraper plugin and code interpreter. But here we have utilized GPT-4 with Vision for web scraping since it comes with multimodality. It needs images for web scraping. But for bigger tasks, it is impossible to take huge amounts of screenshots and then scrape the data.

ScrapeHero Cloud can be an affordable and better alternative to GPT-4V for your web scraping needs, as mentioned earlier. We are also a fully managed enterprise-grade web scraping service provider specializing in custom solutions.

We can help with your data or automation needs

Turn the Internet into meaningful, structured and usable data



Please DO NOT contact us for any help with our Tutorials and Code using this form or by calling us, instead please add a comment to the bottom of the tutorial page for help

Table of content

Scrape any website, any format, no sweat.

ScrapeHero is the real deal for enterprise-grade scraping.

Ready to turn the internet into meaningful and usable data?

Contact us to schedule a brief, introductory call with our experts and learn how we can assist your needs.

Continue Reading

NoSQL vs. SQL databases

Stuck Choosing a Database? Explore NoSQL vs. SQL Databases in Detail

Find out which SQL and NoSQL databases are best suited to store your scraped data.
Scrape JavaScript-Rich Websites

Upgrade Your Web Scraping Skills: Scrape JavaScript-Rich Websites

Learn all about scraping JavaScript-rich websites.
Web scraping with mechanicalsoup

Ditch Multiple Libraries by Web Scraping with MechanicalSoup

Learn how you can replace Python requests and BeautifulSoup with MechanicalSoup.
ScrapeHero Logo

Can we help you get some data?