How to scrape Twitter for Historical Tweet Data

Historical data from social media feeds is useful for sentiment analysis and for understanding how users respond to a particular event, product or statement. Twitter advanced search results can be scraped by keyword or time frame, which makes them useful for market research. This tutorial shows you how to use the Web Scraper (webscraper.io) Chrome extension to scrape historical data from Twitter advanced search.

Install the Web Scraper Chrome Extension

We will use the Web Scraper extension to create and run the scraper. It is a web scraping tool for extracting data from dynamic web pages. Using the extension, you create a sitemap that defines how the website should be traversed and what data should be extracted. The scraped data can later be exported as a CSV.

Please install the Web Scraper extension from the Chrome Web Store to get started.

Import Twitter Advanced Search Scraper

Right-click anywhere on a page and choose ‘Inspect’ to open the developer tools. Click the Web Scraper tab, then click ‘Create new sitemap’ and choose the ‘Import sitemap’ option. Paste the JSON from the gist linked below into the Sitemap JSON box.

https://gist.github.com/scrapehero/d0305d8d15b0e447dcefdf548a9846e9           

Get the advanced search result URL from Twitter

Twitter Advanced Search lets you find historical tweets that you can filter based on parameters like Words, People and Dates.

To get historical tweet data, open Twitter advanced search at https://twitter.com/search-advanced?lang=en and filter the results based on your needs. For this example, we will search for all tweets that contain the text “tesla” and were posted between October 1 and October 5, 2018.

Copy the search result URL. Our link looks like this: https://twitter.com/search?l=&q=tesla%20since%3A2018-10-01%20until%3A2018-10-05&src=typd&lang=en
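If you need several search URLs (for example, one per keyword or date range), the same URL pattern can be generated in code. The sketch below is only an illustration based on the example link above: the function name `build_search_url` is hypothetical, and the parameters `l`, `src` and `lang` are simply copied from that link, so Twitter may change or ignore them.

```python
from urllib.parse import quote, urlencode


def build_search_url(keyword, since, until):
    """Build a search URL shaped like the example link in this tutorial.

    `since` and `until` are dates in YYYY-MM-DD format.
    """
    # The query combines the keyword with the since:/until: date operators.
    query = f"{keyword} since:{since} until:{until}"
    # These parameters mirror the example URL; they are an assumption,
    # not a documented Twitter API.
    params = {"l": "", "q": query, "src": "typd", "lang": "en"}
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return "https://twitter.com/search?" + urlencode(params, quote_via=quote)


print(build_search_url("tesla", "2018-10-01", "2018-10-05"))
```

Calling the function with “tesla” and the dates used in this tutorial yields the same link as the one copied from the browser.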

In the Web Scraper toolbar, click the Sitemap button (now labeled with your sitemap's name), select the ‘Edit metadata’ option and paste the search result URL you copied as the start URL.
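As an alternative to editing the metadata in the browser, the start URL could be changed in the sitemap JSON before you import it. The sketch below assumes the sitemap stores its start URL under a `startUrl` key, either as a string or as a list; check the JSON from the gist before relying on this. The helper name `set_start_url` and the sample sitemap are hypothetical.

```python
import json


def set_start_url(sitemap_json, search_url):
    """Return the sitemap JSON text with its start URL replaced."""
    sitemap = json.loads(sitemap_json)
    # Assumption: the start URL lives under "startUrl". Keep whichever
    # shape (list or string) the sitemap already uses.
    current = sitemap.get("startUrl")
    sitemap["startUrl"] = [search_url] if isinstance(current, list) else search_url
    return json.dumps(sitemap, indent=2)


# Hypothetical stand-in for the sitemap JSON you would paste into the import box.
sample_sitemap = '{"_id": "twitter-search", "startUrl": ["https://example.com/"], "selectors": []}'
search_url = (
    "https://twitter.com/search?l=&q=tesla%20since%3A2018-10-01"
    "%20until%3A2018-10-05&src=typd&lang=en"
)
updated_sitemap = set_start_url(sample_sitemap, search_url)
print(updated_sitemap)
```

The printed JSON can then be pasted into the Sitemap JSON box in place of the original.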

Run the scraper

To start scraping, open the Sitemap drop-down and click Scrape. A new Chrome window will launch, scroll through the results and grab the data automatically. Once the scrape is complete, the window closes by itself and you get a notification.

ScrapeHero Cloud

Try ScrapeHero Cloud to get tweets from Twitter: pass the search URLs from Twitter advanced search and download the data in JSON, CSV and XML formats in just a few clicks.

Check out our free trial, which has a 100-page credit limit. The crawler documentation is here: https://cloud.scrapehero.com/marketplace/twitter-advanced-search/

Download the data

To download the scraped data, go to the Sitemap drop-down > ‘Export as CSV’ > ‘Download Now’. A CSV file with all the scraped data will be downloaded.
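Once you have the CSV, a short script can load it for analysis. The sketch below is a minimal example using Python's standard `csv` module; it also drops exact duplicate rows in case the export contains any. The column names `handle` and `content` and the sample rows are hypothetical, since the real columns depend on the selectors defined in the sitemap.

```python
import csv
import io


def load_scraped_rows(csv_file):
    """Read rows from an exported CSV file object, skipping exact duplicates."""
    reader = csv.DictReader(csv_file)
    seen = set()
    rows = []
    for row in reader:
        # Use the row's values as a fingerprint to detect repeated rows.
        key = tuple(str(value) for value in row.values())
        if key not in seen:
            seen.add(key)
            rows.append(row)
    return rows


# Hypothetical sample standing in for the downloaded file.
sample = io.StringIO(
    "handle,content\n"
    "@example,First tweet about tesla\n"
    "@example,First tweet about tesla\n"
    "@other,Second tweet about tesla\n"
)
rows = load_scraped_rows(sample)
print(len(rows), "unique rows")
```

With a real export, pass an open file instead of the sample, for example `open("tesla.csv", newline="", encoding="utf-8")`, where the file name is whatever you saved the download as.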

Disclaimer: Any code provided in our tutorials is for illustration and learning purposes only. We are not responsible for how it is used and assume no liability for any detrimental usage of the source code. The mere presence of this code on our site does not imply that we encourage scraping or scrape the websites referenced in the code and accompanying tutorial. The tutorials only help illustrate the technique of programming web scrapers for popular internet websites. We are not obligated to provide any support for the code, however, if you add your questions in the comments section, we may periodically address them.

Posted in:   Social Media Data Gathering, Web Scraping Tutorials

Responses

Meaghan April 3, 2019

What is the significance of the value used in the request interval field?

k T April 18, 2019

This used to work, but it no longer returns any results when the process has completed. Would you be able to assist?

    ScrapeHero April 19, 2019

    You must be getting blocked by Twitter. We just tested this again, and it seems to be working fine. Would you mind sharing the link to the advanced search results you used?

      Kristian Pedersen April 23, 2019

      We are experiencing the same problem. Once the scraping tool has completed, there is no data available and we can't generate a CSV file. We are able to extract “top posts” for a given day, but only get around 90 observations.

      We are using this url “https://twitter.com/search?f=tweets&vertical=news&q=brexit%20since%3A2016-06-15%20until%3A2016-06-20&l=en&src=typd”

      Do you have any idea of why this may be?

Kristian Pedersen April 23, 2019

We are experiencing the same issue. Once the scraping tool has completed, we don’t get any data. Has this become a common problem?

    ScrapeHero April 23, 2019

    Hey Kristian, this might be a problem with the latest version of the Web Scraper extension. We haven’t been able to reproduce the problem yet. Would you mind sharing the link to the advanced search results you used?
