How to scrape Twitter for Historical Tweet Data

Historical data from social media feeds is useful for sentiment analysis and for understanding how users respond to a particular event, product or statement. Twitter advanced search results can be filtered by keyword or time frame and then scraped, which makes them especially useful for market research. This tutorial shows you how to use the Web Scraper (webscraper.io) Chrome extension to scrape historical data from Twitter advanced search.

How to Scrape Historical Data from Twitter

  1. Install the Web Scraper Chrome Extension
  2. Import the Twitter advanced search scraper
  3. Get the advanced search URL from Twitter
  4. Run the scraper
  5. Download data

If you don't like or want to code, ScrapeHero Cloud is just right for you!

Skip the hassle of installing software, programming and maintaining code. Download this data using ScrapeHero Cloud within seconds.

Get Started for Free
Deploy to ScrapeHero Cloud

The crawler scrapes the data without logging in, so the actual number of pages crawled might differ in ScrapeHero Cloud.

Install the Web Scraper Chrome Extension

We will use the Web Scraper extension to create and run the scraper. It is a web scraping tool for extracting data from dynamic web pages. Using the extension, you create a sitemap that defines how the website should be traversed and what data should be extracted. The extension follows the sitemap to navigate the site, and the scraped data can later be exported as a CSV file.

Please install the Web Scraper extension from the Chrome Web Store to get started.

Import Twitter Advanced Search Scraper

Right-click anywhere on a page and select ‘Inspect’ to open the developer tools. Click the Web Scraper tab, open the ‘Create new sitemap’ menu and choose the ‘Import sitemap’ option. Now paste the JSON (given in the gist link below) into the Sitemap JSON box.

https://gist.github.com/scrapehero/d0305d8d15b0e447dcefdf548a9846e9 

Get the advanced search result URL from Twitter

Twitter Advanced Search lets you find historical tweets that you can filter based on parameters like Words, People and Dates.

To get historical tweet data, open Twitter advanced search at https://twitter.com/search-advanced?lang=en and filter the results based on your needs. For this example, we will search for all tweets that contain the text “tesla” and were posted between October 1 and October 5, 2018.

Copy the search result URL. Our link looks like this: https://twitter.com/search?l=&q=tesla%20since%3A2018-10-01%20until%3A2018-10-05&src=typd&lang=en
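If you need search URLs for several keywords or date ranges, the same URL can be assembled in code instead of through the advanced search form. The following is a minimal Python sketch that simply mirrors the parameters visible in the example link above (l, q, src and lang); the function name build_search_url is our own, and Twitter may change or ignore these parameters at any time.

```python
from urllib.parse import quote, urlencode

SEARCH_BASE = "https://twitter.com/search"


def build_search_url(keyword, since, until, lang="en"):
    """Return a search result URL for `keyword` between two dates.

    `since` and `until` are dates written as YYYY-MM-DD, matching the
    since:/until: operators seen in the example URL.
    """
    query = f"{keyword} since:{since} until:{until}"
    # Parameter names are copied from the example link; they are not
    # taken from any official documentation.
    params = {"l": "", "q": query, "src": "typd", "lang": lang}
    # quote_via=quote encodes spaces as %20 (the default would emit '+').
    return f"{SEARCH_BASE}?{urlencode(params, quote_via=quote)}"


print(build_search_url("tesla", "2018-10-01", "2018-10-05"))
```

Calling the function with the keyword and dates used in this tutorial reproduces the link shown above.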

In the Web Scraper toolbar, click the Sitemap button (which will now read ‘Sitemap’ followed by your sitemap name), select the ‘Edit metadata’ option and paste the Twitter advanced search result URL you copied.
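As an alternative to editing the metadata by hand, the start URL can be changed in the sitemap JSON before it is imported. The sketch below assumes the sitemap stores its start URL under a top-level startUrl key holding a list of URLs, which is how Web Scraper sitemaps are commonly exported; check the JSON from the gist before relying on this. The helper name set_start_url is hypothetical.

```python
import json


def set_start_url(sitemap_json, url):
    """Return a copy of the sitemap JSON with its start URL replaced.

    Assumes a top-level "startUrl" key that holds a list of URLs.
    All other keys (for example the selectors) are left untouched.
    """
    sitemap = json.loads(sitemap_json)
    sitemap["startUrl"] = [url]
    return json.dumps(sitemap, indent=2)
```

To use it, pass the sitemap JSON text and your search result URL, then paste the returned JSON into the Sitemap JSON box when importing.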

Run the scraper

To start scraping, open the Sitemap drop-down and click ‘Scrape’. A new Chrome window will launch so that the extension can scroll the page and grab data automatically. Once the scrape is complete, the window closes by itself and a notification is shown.

Download the data

To download the scraped data, go to the Sitemap drop-down > ‘Export as CSV’ > ‘Download Now’. A CSV file containing all the scraped data will be downloaded.
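Once you have the CSV file, a few lines of Python are enough for a first look at it. This is a minimal sketch using only the standard library; the column names in your file depend on the selectors in the imported sitemap, so the file name tweets.csv and the column name handle used below are placeholders, not names taken from the actual export.

```python
import csv
from collections import Counter


def read_rows(lines):
    """Parse CSV text (an open file or any iterable of lines) into dicts."""
    return list(csv.DictReader(lines))


def count_values(rows, column):
    """Count how often each non-empty value appears in `column`."""
    return Counter(row[column] for row in rows if row.get(column))


def read_export(path):
    """Read an exported CSV file; utf-8-sig also strips a BOM if present."""
    with open(path, newline="", encoding="utf-8-sig") as f:
        return read_rows(f)


# Example usage (placeholder file and column names):
# rows = read_export("tweets.csv")
# print(len(rows), "rows scraped")
# print(count_values(rows, "handle").most_common(5))
```

Counting rows right after a scrape is also a quick way to check whether the scraper collected as many tweets as you expected.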

We can help with your data or automation needs

Turn the Internet into meaningful, structured and usable data

Disclaimer: Any code provided in our tutorials is for illustration and learning purposes only. We are not responsible for how it is used and assume no liability for any detrimental usage of the source code. The mere presence of this code on our site does not imply that we encourage scraping or scrape the websites referenced in the code and accompanying tutorial. The tutorials only help illustrate the technique of programming web scrapers for popular internet websites. We are not obligated to provide any support for the code, however, if you add your questions in the comments section, we may periodically address them.

Posted in:   Social Media Data Gathering, Web Scraping Tutorials

Responses

Lauren H November 20, 2019

Hello, i have attempted to scrape twitter data over the period of 9 months but only ended up extracting 100 tweets from one day, why might this be? thanks
