Accessing historical data from social media feeds can be quite useful for conducting sentiment analysis and understanding user behavior towards a particular event, product or statement. With the right infrastructure, advanced search result data can be scraped by keyword or time frame, which makes it immensely useful for market research. This tutorial shows you how to use the WebScraper.io Chrome Extension to scrape historical data from Twitter advanced search.
Install the Web Scraper Chrome Extension
We will use the Web Scraper Extension to create and run the scraper. It is a great web scraping tool for extracting data from dynamic web pages. Using the extension, you create a sitemap that defines how the website should be traversed and what data should be extracted. With a sitemap in place, the extension can navigate the site as you specify, and the data can later be exported as a CSV.
Import Twitter Advanced Search Scraper
Right-click anywhere on a page and select ‘Inspect’ to open the developer tools. Click the Web Scraper tab, then click the ‘Create new sitemap’ button and choose the ‘Import sitemap’ option. Now paste the JSON (given in the gist link below) into the Sitemap JSON box.
Get the advanced search result URL from Twitter
Twitter Advanced Search lets you find historical tweets that you can filter based on parameters like Words, People and Dates.
To get the historical tweet data, open Twitter advanced search at https://twitter.com/search-advanced?lang=en and filter the data based on your needs. For now, we will search for all tweets that contain the text “tesla” and were posted between October 1 and October 5, 2018.
Copy the search result URL. Our link looks like this: https://twitter.com/search?l=&q=tesla%20since%3A2018-10-01%20until%3A2018-10-05&src=typd&lang=en
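The search result URL can also be assembled in code instead of being copied from the browser. The sketch below uses only the Python standard library to rebuild the example link above from a keyword and two dates. The function name `build_search_url` is hypothetical, and the query parameters (`l`, `q`, `src`, `lang`) are simply copied from the example link, so treat them as assumptions that may not match what Twitter expects today.

```python
from datetime import date
from urllib.parse import quote, urlencode


def build_search_url(keyword: str, since: date, until: date) -> str:
    """Build a Twitter search URL using the since:/until: search operators.

    Hypothetical helper: the parameter names mirror the example URL in this
    tutorial and are not taken from official Twitter documentation.
    """
    query = f"{keyword} since:{since.isoformat()} until:{until.isoformat()}"
    params = {"l": "", "q": query, "src": "typd", "lang": "en"}
    # quote_via=quote encodes spaces as %20 (not +), matching the example URL.
    return "https://twitter.com/search?" + urlencode(params, quote_via=quote)


if __name__ == "__main__":
    print(build_search_url("tesla", date(2018, 10, 1), date(2018, 10, 5)))
```

Changing the keyword or the dates gives a new start URL that can be pasted into the sitemap in the next step.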
In the Web Scraper toolbar, click the Sitemap button (its label will now read Sitemap ‘your sitemap name’), select the ‘Edit metadata’ option, and paste the Twitter search result URL you copied in the previous step.
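If you run many searches, it may be easier to set the URL in the sitemap JSON before importing it rather than editing the metadata each time. The following is a minimal sketch, assuming the sitemap keeps its start URL as a list under a `startUrl` key; that key name, the placeholder sitemap, and the helper `set_start_url` are all assumptions for illustration, so check them against the JSON you actually import.

```python
import json


def set_start_url(sitemap_json: str, search_url: str) -> str:
    """Return a copy of the sitemap JSON with its start URL replaced.

    Assumption: the sitemap stores its start URL(s) as a list under a
    "startUrl" key. Verify this against your own sitemap before relying on it.
    """
    sitemap = json.loads(sitemap_json)
    sitemap["startUrl"] = [search_url]
    return json.dumps(sitemap, indent=2)


if __name__ == "__main__":
    # Placeholder sitemap for illustration only; the real one comes from the gist.
    example = '{"_id": "twitter-search", "startUrl": ["https://example.com"], "selectors": []}'
    url = (
        "https://twitter.com/search?l=&q=tesla%20since%3A2018-10-01"
        "%20until%3A2018-10-05&src=typd&lang=en"
    )
    print(set_start_url(example, url))
```

The printed JSON can then be pasted into the Sitemap JSON box of the ‘Import sitemap’ option.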
Run the scraper
To start scraping, go to the Sitemap drop down and click Scrape. A new instance of Chrome will launch, scroll through the results and automatically grab the data. Once the scrape is complete, the browser will close by itself and send a notification.
Download the data
To download the scraped data, go to the Sitemap drop down > ‘Export as CSV’ > ‘Download Now’. A CSV file containing all the scraped data will be downloaded shortly.
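Once the CSV is on disk, a few lines of Python are enough to check what was scraped before moving on to analysis. This is a minimal sketch using the standard library; the file name `twitter-search.csv` is hypothetical, and because the column names depend on the selectors defined in the sitemap, the code prints whatever headers it finds rather than assuming any.

```python
import csv
from pathlib import Path


def load_rows(csv_path: str) -> list[dict[str, str]]:
    """Read the exported CSV into a list of dicts keyed by column header."""
    # utf-8-sig drops a leading byte-order mark if the export includes one.
    with Path(csv_path).open(newline="", encoding="utf-8-sig") as handle:
        return list(csv.DictReader(handle))


if __name__ == "__main__":
    rows = load_rows("twitter-search.csv")  # hypothetical file name
    print(f"{len(rows)} rows scraped")
    if rows:
        print("Columns:", ", ".join(rows[0].keys()))
```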
Disclaimer: Any code provided in our tutorials is for illustration and learning purposes only. We are not responsible for how it is used and assume no liability for any detrimental usage of the source code. The mere presence of this code on our site does not imply that we encourage scraping or scrape the websites referenced in the code and accompanying tutorial. The tutorials only help illustrate the technique of programming web scrapers for popular internet websites. We are not obligated to provide any support for the code, however, if you add your questions in the comments section, we may periodically address them.