How to Scrape Historical Data from Twitter

Accessing historical data from social media feeds can be useful for sentiment analysis and for understanding how users react to a particular event, product or statement. Twitter's advanced search lets you filter tweets by keyword and time frame, and scraping those search results makes this feature especially useful for market research. This tutorial shows you how to use the Web Scraper (webscraper.io) Chrome extension to scrape historical data from Twitter advanced search.

Install the Web Scraper Chrome Extension

We will use the Web Scraper extension to create and run the scraper. It can extract data from dynamic web pages, including pages that load more content as you scroll. With the extension, you create a sitemap that defines how the website should be traversed and what data should be extracted. The extension then navigates the site according to the sitemap, and the scraped data can later be exported as a CSV file.
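To make the idea of a sitemap concrete, here is a minimal sketch that builds one in Python and prints it as JSON. It assumes the layout Web Scraper uses for exported sitemaps: an `_id`, a `startUrl` list, and a flat `selectors` list in which each selector names its parent in `parentSelectors` (`_root` being the start page). The helper `make_sitemap`, the sitemap name and the CSS selectors are hypothetical placeholders, not the ones used in this tutorial's sitemap.

```python
import json


def make_sitemap(sitemap_id: str, start_url: str) -> dict:
    """Return a minimal Web Scraper-style sitemap as a dict (hypothetical helper).

    The CSS selectors are placeholders and will not match Twitter's markup.
    """
    return {
        "_id": sitemap_id,
        "startUrl": [start_url],
        "selectors": [
            {
                # Scrolls the start page and collects every matching tweet container.
                "id": "tweet",
                "type": "SelectorElementScroll",
                "parentSelectors": ["_root"],
                "selector": "div.tweet",  # hypothetical CSS selector
                "multiple": True,
                "delay": 2000,  # milliseconds to wait between scrolls
            },
            {
                # Extracts the text inside each container found by "tweet".
                "id": "text",
                "type": "SelectorText",
                "parentSelectors": ["tweet"],
                "selector": "p.tweet-text",  # hypothetical CSS selector
                "multiple": False,
                "delay": 0,
            },
        ],
    }


sitemap = make_sitemap("twitter-search", "https://twitter.com/search?q=tesla")
print(json.dumps(sitemap, indent=2))
```

Because each selector points to its parent, the flat list describes a tree: the scroll selector collects tweet containers from the start page, and the text selector runs inside each of those containers.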

Please install the Web Scraper extension from the Chrome Web Store to get started.

Import Twitter Advanced Search Scraper

Right-click anywhere on a page and select ‘Inspect’ to open Chrome Developer Tools. Open the Web Scraper tab, click ‘Create new sitemap’ and choose ‘Import sitemap’. Then paste the JSON from the gist linked below into the Sitemap JSON box.

https://gist.github.com/scrapehero/d0305d8d15b0e447dcefdf548a9846e9           


Get the advanced search result URL from Twitter

Twitter Advanced Search lets you find historical tweets that you can filter based on parameters like Words, People and Dates.

To get historical tweet data, open Twitter's advanced search at https://twitter.com/search-advanced?lang=en and set the filters you need. In this example, we search for all tweets that contain the text “tesla” and were posted between October 1 and October 5, 2018. Note that Twitter's until: operator returns tweets sent before the given date, so tweets from October 5 itself are not included.

Copy the URL of the search results page. Ours looks like this: https://twitter.com/search?l=&q=tesla%20since%3A2018-10-01%20until%3A2018-10-05&src=typd&lang=en
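If you run the same search for several keywords or date ranges, you can build the search URL yourself instead of copying it each time. This minimal sketch reproduces the example URL above. The helper name `advanced_search_url` is hypothetical, the `l`, `src` and `lang` parameters are simply copied from that URL rather than taken from any documented Twitter API, and Twitter may have changed its search URL format since this tutorial was written.

```python
from urllib.parse import quote, urlencode


def advanced_search_url(keyword: str, since: str, until: str) -> str:
    """Build a Twitter search URL for tweets containing `keyword` in a date range.

    `since` and `until` are YYYY-MM-DD strings; Twitter's until: is exclusive.
    This is a hypothetical helper, not part of any Twitter library.
    """
    query = f"{keyword} since:{since} until:{until}"
    # Parameter names and values copied from the example URL in this tutorial.
    params = {"l": "", "q": query, "src": "typd", "lang": "en"}
    # quote_via=quote encodes spaces as %20 and ':' as %3A.
    return "https://twitter.com/search?" + urlencode(params, quote_via=quote)


print(advanced_search_url("tesla", "2018-10-01", "2018-10-05"))
```

Encoding with `quote` rather than `urlencode`'s default `quote_plus` keeps spaces as `%20`, matching the URL Twitter produced.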

In the Web Scraper toolbar, click the Sitemap button (its label now reads Sitemap ‘your sitemap name’), select ‘Edit metadata’ and paste the URL of the Twitter advanced search results into the Start URL field.
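The same change can also be made to the sitemap JSON before you import it, which is handy if you scrape several date ranges. The minimal sketch below assumes the start URL is stored as a list under a `startUrl` key, as in sitemaps exported from Web Scraper; `set_start_url` is a hypothetical helper, not part of the extension.

```python
import json


def set_start_url(sitemap_json: str, start_url: str) -> str:
    """Return the sitemap JSON with its start URL replaced (hypothetical helper)."""
    sitemap = json.loads(sitemap_json)
    # Assumes start URLs are stored as a list under the "startUrl" key.
    sitemap["startUrl"] = [start_url]
    return json.dumps(sitemap, indent=2)


# Example with a stripped-down sitemap; a real one has a non-empty selectors list.
original = '{"_id": "twitter", "startUrl": ["https://twitter.com/"], "selectors": []}'
search_url = (
    "https://twitter.com/search?l=&q=tesla%20since%3A2018-10-01"
    "%20until%3A2018-10-05&src=typd&lang=en"
)
updated = set_start_url(original, search_url)
print(updated)
```

Paste the printed JSON into the Sitemap JSON box when importing, and the Start URL is already set.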


Run the scraper


To start scraping, open the Sitemap drop-down and click Scrape. A new Chrome window opens, scrolls through the search results and extracts the data automatically. When scraping is complete, the window closes on its own and a notification tells you the scrape has finished.

Download the data

To download the scraped data, go to the Sitemap drop-down > ‘Export as CSV’ > ‘Download Now’. A CSV file containing all the scraped data will then be downloaded.
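Once you have the CSV, a quick sanity check is to count how many tweets were collected per day. This is a minimal sketch using Python's standard `csv` module. The helper `tweets_per_day` and the column name `date` are hypothetical: the actual columns depend on the selector ids in your sitemap, and the sketch assumes the date values start with a `YYYY-MM-DD` prefix.

```python
import csv
from collections import Counter
from typing import Iterable


def tweets_per_day(lines: Iterable[str], date_column: str = "date") -> Counter:
    """Count CSV rows per calendar day (hypothetical helper).

    `lines` can be an open file or any iterable of CSV lines. The column
    name "date" is an assumption and must match your sitemap's selector id.
    """
    counts: Counter = Counter()
    for row in csv.DictReader(lines):
        value = (row.get(date_column) or "").strip()
        if value:
            counts[value[:10]] += 1  # keep only the YYYY-MM-DD prefix
    return counts


# Usage (assumes the export was saved as twitter.csv):
# with open("twitter.csv", newline="", encoding="utf-8-sig") as f:
#     print(tweets_per_day(f).most_common())
```

Opening the file with `utf-8-sig` strips a leading byte-order mark if one is present, so the first column name is read correctly.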


We can help with your data or automation needs

Turn the Internet into meaningful, structured and usable data

Disclaimer: Any code provided in our tutorials is for illustration and learning purposes only. We are not responsible for how it is used and assume no liability for any detrimental usage of the source code. The mere presence of this code on our site does not imply that we encourage scraping or scrape the websites referenced in the code and accompanying tutorial. The tutorials only help illustrate the technique of programming web scrapers for popular internet websites. We are not obligated to provide any support for the code; however, if you add your questions in the comments section, we may periodically address them.
