Excel is mainly used to log data or perform calculations. But is web scraping with Excel a possibility? Yes. Many users are unaware of its web scraping capabilities, which allow them to extract data from websites to Excel.
So how can we efficiently scrape data from websites to Excel? Read this article discussing the web scraping feature of Excel and the workings of Web Queries.
Initial Requirements for Web Scraping in Excel
Before beginning, make sure that you meet the following requirements for smooth web scraping with Excel.
- Download and install Microsoft Office to use Microsoft Excel.
- An active internet connection.
- A website URL where there’s a static data table on the page.
Step-by-Step Process of Creating an Excel Web Query
You might be wondering how to extract data from websites to Excel automatically. This can be done by using the Web Query feature in Excel.
The Microsoft Excel Web Query feature allows fetching data from a web page’s table directly into an Excel worksheet with no manual copy-pasting. Web Query supports formats such as HTML and XML, depending on the Excel version used. It can even refresh the Excel reports automatically from the original source database when there is an update in the database.
Microsoft Web Query makes use of the web browser of your operating system for loading the website. In older Windows, the browser will be Internet Explorer, and in newer ones, it will be Edge. This way, Web Query can render JavaScript-heavy websites with ease.
When the loading is complete, the Web Query automatically parses the whole page, finding and highlighting all the relevant static HTML tables. The user can choose the desired table, after which the Web Query starts extracting the table data.
To automatically pull data from the website to Excel:
- Select the Data menu on your Microsoft Excel sheet to begin web scraping with Excel.
- Select Get Data→ From Other Sources→ From Web
- The Web Query window appears. You can now paste the website URL from which data should be extracted. Select OK.
- Excel will pull the web content into the sheet. You can select a table that is displayed on the left side of the Navigator window for a preview of the table in the right side pane.
- If the table preview displays the data you need to scrape, then click on the Load button which is visible at the bottom of the window.
- You can now see the extracted data appearing in the cells of the Excel spreadsheet.
This is how you successfully scrape data from websites to Excel using the Excel Web Query.
Now, let’s discuss some advanced Web Query options that will help you with web scraping in Excel.
Advanced Excel Web Query Options
Web scraping with Excel using Web Queries is an easy method that helps you to pull data from websites to Excel.
Apart from the method listed above, you also get additional options for doing web scraping in Excel. For example, if you need your web data to be imported to another location in the Excel spreadsheet, you can choose the Load To option from the drop-down arrow next to the Load button.
An Import Data window will open now, and you can have options here for formatting and placing the data where you like on the sheet. Some options available are Standard Table, PivotTable Report, PivotChart, etc.
If you want to place the data in a different cell other than the default A1 cell, then you have another option.
Selecting the Advanced option will provide you with more choices than the Basic option on the URL page, as you will be able to provide advanced URL information.
The Advanced option is useful specifically when the web page from which the data is loaded has parameters (usually following the “?” character) or particular HTTP header requests.
With the advanced query features in Excel, scraping data from web pages that do not have a static URL is possible.
Customizing the Excel Web Query
An Excel Web Query can be customized by accessing the Web Query Editor. Right-click any result cell, select Table, and then further select Edit Query. The Web Query Editor window opens.
Alternatively, you can also select a result cell and choose the small sheet icon, which is visible to the right of the table in the Queries & Connections box. When a preview of the table data is open, select the Edit button.
You are able to deeply customize how the Web Query can pull data from websites to Excel using the Web Query Editor window.
The Web Query Editor window provides many other options for doing web scraping with Excel. This includes source URL modification, modifying Excel’s data transformations before placing them into the sheet, choosing the source table from which to extract the data, etc.
Refreshing Tables Manually and Automatically
While web scraping in Excel, you might have seen some options that can help in extracting the required data into the Excel sheets from remote web page tables.
You will be able to manually refresh by choosing the sheet with the table and then clicking the Data menu. Later, select the Refresh All option, which refreshes all the tables in the sheet that retrieve data from an external source.
You can also schedule the refreshes that are automatically done. For this, select the drop-down arrow of the Refresh All button and select Connection Properties.
Now the Query Properties window will be open. Select the Usage tab, and then enable the refresh time that you need.
Note: For web scraping in Excel, the automatic refresh works only when the Excel file is opened.
Drawbacks of Using Excel Web Query for Web Scraping
Web Query, a built-in feature of Microsoft Excel, is a great option to scrape data from websites to Excel. But this method has its own drawbacks when extracting web page data from Excel and these include:
- Scraping data from dynamic web pages or webpages with complex HTML structures is not possible with Web Queries.
- Since Web Queries rely on the web page HTML structure, if there are any changes, it may fail to extract the correct data.
- Extracting large quantities of data is not possible Using Excel Web Queries.
- Features such as Pagination, Infinite Scrolling, Deduplication, etc. cannot be used when web scraping in Excel is carried out.
Wrapping Up
Excel allows data extraction for websites with tables with its powerful Web Query feature. Simple data extraction tasks can be done with minimal interaction. Using Web Queries, you can also scrape data from websites with JavaScript.
Web scraping in Excel using Web Queries is not an apt method for enterprises that need to do web scraping on a large scale. Also, custom, sophisticated web scrapers cannot be developed with Web Query. In such cases, web scraping with Python or JavaScript will be an option.
ScrapeHero can assist you in overcoming the drawbacks of web scraping in Excel and automating your web scraping needs with the required scrapers. Try our premium web scraping tools, Homes.com Scraper, Realtor Scraper, Redfin Scraper, Trulia Scraper, and Zillow Scraper, to scrape data in a few clicks. Let’s connect if what you need is an enterprise-grade, large-scale scraping service.
We can help with your data or automation needs
Turn the Internet into meaningful, structured and usable data