How to scrape Twitter for Historical Tweet Data

Accessing historical data from social media feeds can be quite useful for sentiment analysis and for understanding how users respond to a particular event, product, or statement. With the right tools, advanced search results can be scraped by keyword or time frame, which makes this feature very useful for market research. This tutorial shows you how to use the Web Scraper (WebScraper.io) Chrome extension to scrape historical data from Twitter advanced search.

Install the Web Scraper Chrome Extension

We will use the Web Scraper extension to create and run the scraper. It is a great web scraping tool for extracting data from dynamic web pages. Using the extension, you create a sitemap that defines how the website should be traversed and what data should be extracted. The scraper follows the sitemap to navigate the site, and the data can later be exported as a CSV.

Please install the Web Scraper extension from the Chrome Web Store to get started.

Import Twitter Advanced Search Scraper

Right-click anywhere on a page and choose ‘Inspect’ to open the developer tools. Click the Web Scraper tab, then click ‘Create new sitemap’ and choose the ‘Import sitemap’ option. Now paste the JSON from the gist linked below into the Sitemap JSON box.

https://gist.github.com/scrapehero/d0305d8d15b0e447dcefdf548a9846e9           
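
For orientation, a Web Scraper sitemap is a JSON document that names a start URL and a tree of selectors. The sketch below builds one in Python so the shape is visible. It is not the contents of the gist: the key names reflect what the extension typically exports, as far as we know, and the sitemap name, CSS selectors, and delay values are hypothetical placeholders.

```python
import json


def build_sitemap(sitemap_id, start_url):
    """Return a minimal Web Scraper-style sitemap as a dict.

    Key names are an assumption based on typical exports from the
    extension; the CSS selectors are hypothetical placeholders.
    """
    return {
        "_id": sitemap_id,
        "startUrl": [start_url],
        "selectors": [
            {
                # Parent selector: scrolls the page and yields one element per tweet.
                "id": "tweet",
                "type": "SelectorElementScroll",
                "parentSelectors": ["_root"],
                "selector": "li.stream-item",  # hypothetical selector
                "multiple": True,
                "delay": 2000,  # hypothetical wait between scrolls, in ms
            },
            {
                # Child selector: extracts the text inside each tweet element.
                "id": "text",
                "type": "SelectorText",
                "parentSelectors": ["tweet"],
                "selector": "p.tweet-text",  # hypothetical selector
                "multiple": False,
                "regex": "",
                "delay": 0,
            },
        ],
    }


sitemap = build_sitemap(
    "twitter-advanced-search",  # hypothetical sitemap name
    "https://twitter.com/search?q=tesla",
)
print(json.dumps(sitemap, indent=2))
```

The printed JSON has the same general shape as the text you paste into the Sitemap JSON box. For the actual scrape, use the JSON from the gist rather than this sketch.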

Get the advanced search result URL from Twitter

Twitter Advanced Search lets you find historical tweets that you can filter based on parameters like Words, People and Dates.

To get historical tweet data, open Twitter Advanced Search at https://twitter.com/search-advanced?lang=en and filter the results based on your needs. In this example we search for all tweets that contain the text “tesla” and were posted between October 1 and October 5, 2018.

Copy the search result URL. Our link looks like this: https://twitter.com/search?l=&q=tesla%20since%3A2018-10-01%20until%3A2018-10-05&src=typd&lang=en
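
The search URL is just the query text, percent-encoded, with `since:` and `until:` operators for the date range. If you need many such URLs, a small helper can generate them. This is a minimal sketch using only the Python standard library; the function name `build_search_url` is our own, and the fixed parameters (`l`, `src=typd`, `lang`) are simply copied from the example link above.

```python
from urllib.parse import quote


def build_search_url(keyword, since, until, lang="en"):
    """Build a Twitter search URL in the same shape as the example link.

    `since` and `until` are dates written as YYYY-MM-DD.
    """
    query = f"{keyword} since:{since} until:{until}"
    # safe="" makes quote() encode ":" as %3A; spaces become %20.
    encoded = quote(query, safe="")
    return f"https://twitter.com/search?l=&q={encoded}&src=typd&lang={lang}"


url = build_search_url("tesla", "2018-10-01", "2018-10-05")
print(url)
```

Called with the values from our example, the helper reproduces the link shown above.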

In the Web Scraper toolbar, click the Sitemap button (which will now read ‘Sitemap’ followed by your sitemap name), select the ‘Edit metadata’ option, and paste the Twitter advanced search result URL you copied.
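
As an alternative to editing the metadata by hand, the start URL can be swapped in the sitemap JSON before you import it. The sketch below assumes the sitemap stores its start URL under a `startUrl` key, either as a string or as a list of strings; that key name is an assumption about the export format, and `with_start_url` is a hypothetical helper of our own.

```python
import json


def with_start_url(sitemap_json, search_url):
    """Return sitemap JSON with its start URL replaced by `search_url`.

    Assumes the start URL lives under "startUrl". If the existing value
    is a list it stays a list; otherwise a plain string is stored.
    """
    sitemap = json.loads(sitemap_json)
    current = sitemap.get("startUrl")
    sitemap["startUrl"] = [search_url] if isinstance(current, list) else search_url
    return json.dumps(sitemap)


# Hypothetical, heavily shortened sitemap used only to show the call.
exported = '{"_id": "twitter", "startUrl": ["https://twitter.com/search?q=old"], "selectors": []}'
updated = with_start_url(exported, "https://twitter.com/search?q=tesla")
print(updated)
```

The returned string can be pasted into the Sitemap JSON box in place of the original.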

Run the scraper

To start scraping, open the Sitemap drop-down and click Scrape. A new Chrome window will launch, scroll through the results, and grab the data automatically. Once the scrape is complete, the window closes by itself and you receive a notification.

Download the data

To download the scraped data, open the Sitemap drop-down and choose ‘Export as CSV’, then ‘Download Now’. A CSV file with all the scraped data will be downloaded shortly.
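
Before working with the export, it helps to check that it actually contains rows, since a scrape that was blocked or found nothing still produces a file. The following is a minimal sketch using Python's standard `csv` module. The column names in the sample data are hypothetical; the real columns depend on the selector IDs in your sitemap.

```python
import csv
import io


def summarize_export(fileobj):
    """Return (column names, row count) for an exported CSV file object."""
    reader = csv.DictReader(fileobj)
    rows = list(reader)
    # fieldnames is None when the file is completely empty.
    columns = reader.fieldnames or []
    return columns, len(rows)


# Hypothetical sample standing in for a real export.
sample = io.StringIO("handle,text\n@example,first tweet\n@example,second tweet\n")
columns, row_count = summarize_export(sample)
print(columns, row_count)
```

For a real export, open the downloaded file with `open(path, newline="", encoding="utf-8")` and pass the file object to `summarize_export`. A row count of zero means the scrape returned no data.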


Disclaimer: Any code provided in our tutorials is for illustration and learning purposes only. We are not responsible for how it is used and assume no liability for any detrimental usage of the source code. The mere presence of this code on our site does not imply that we encourage scraping or scrape the websites referenced in the code and accompanying tutorial. The tutorials only help illustrate the technique of programming web scrapers for popular internet websites. We are not obligated to provide any support for the code, however, if you add your questions in the comments section, we may periodically address them.

Posted in:   Social Media Data Gathering, Web Scraping Tutorials

Responses

Meaghan April 3, 2019

What is the significance of the value used in the request interval field?

k T April 18, 2019

This used to work but no longer returns any results when the process has completed. Would you be able to assist?

    ScrapeHero April 19, 2019

    You must be getting blocked by Twitter. We just tested this again, and it seems to be working fine. Would you mind sharing the link to the advanced search results you used?

      Kristian Pedersen April 23, 2019

      We are experiencing the same problem. Once the scraping tool has completed, there is no data available and we can’t generate a CSV file. We are able to extract “top posts” for a given day, but only get around 90 observations.

      We are using this url “https://twitter.com/search?f=tweets&vertical=news&q=brexit%20since%3A2016-06-15%20until%3A2016-06-20&l=en&src=typd”

      Do you have any idea of why this may be?

        ScrapeHero April 29, 2019

        Could you also try using the advanced search for the same keyword IFB and limiting it to a shorter date range?

          K T April 29, 2019

          The older version worked with the URL previously provided for searches with the latest results.

          I tried the advanced search with the latest version of the extension, and it worked. Seems that it came down to a user error at the end of the day.

          Thanks, ScrapeHero !! You guys rock


          ScrapeHero April 30, 2019

          Glad to be of help 😉


Kristian Pedersen April 23, 2019

We are experiencing the same issue. Once the scraping tool has completed, we don’t get any data. Has this become a common problem?

    ScrapeHero April 23, 2019

    Hey Kristian, this might be a problem with the latest version of the Web Scraper extension. We haven’t been able to reproduce the problem yet. Would you mind sharing the link to the advanced search results you used?

    Marc Dunnink May 21, 2019

    Any ideas?

      ScrapeHero May 22, 2019

      Please refer to the comment by K T on April 29, 2019 above. This seems to be a problem with the latest version of web scraper extension.

        Marc Dunnink May 22, 2019

        Hi ScrapeHero

        Thanks for getting back to me.
        I had used the latest version of Web Scraper a few months ago with the Twitter advanced search, and it worked really well. It seems Twitter has changed its interface a bit since then, though: the actual search platform is different and the search results differ somewhat. I’m wondering if this perhaps has something to do with it, and maybe the JSON from GitHub needs to be updated accordingly.
        In any case, thanks for the assistance. I’ll also try to figure out how to get an older version of the scraper and see if that works. I’m still not sure how KT got the new version to work with the advanced search.

        cheers,
        M

JR May 26, 2019

same here. could it be the amount of data? when I scrape for shorter timeframes and feeds with less content, I get the data, but if longer/more intense feeds, doesn’t work.

Jagos Radovic May 26, 2019

same problem

lauren ansell July 16, 2019

Once again having a problem retrieving the data after a scrape. Your example is performing the scrape (I can see the page automatically scrolling), but when I change it to scrape the tweets I need, the new page does not even scroll. Neither run collects any data, and the CSV file is empty. Twitter have just launched an update, could this be the issue?

    ScrapeHero July 19, 2019

    Lauren,
    Please look at the error messages to see what’s going on.
    When we can, we will check the site and see if there is some change.
