Twitter scraper 2024

+ Sentiment analysis

View the Code

Installation Guide

Get Full Twitter Dataset

Sentiment Analysis on Topics Scraped from Twitter

Sentiment analysis plotted over time stratified by queries.

Over 20,000 unique Twitter posts were scraped using pre-selected search queries: "artificialintelligence", "politics", and "celebrities". The data is updated continuously, and the graph will keep changing over time as further queries are added.

Sentiment analysis was conducted using the Python Natural Language Processing (NLP) library TextBlob, which achieved a moderate prediction accuracy of around 45%. The text of each post was extracted and fed into TextBlob, which generated a post-specific sentiment value ranging from -1 (most negative) to +1 (most positive). The line graph plots the time series of mean sentiment for the artificial intelligence query, along with the other queries, with shaded areas representing the standard deviation.
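
The aggregation described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names, the sample scores, and the choice of daily buckets and population standard deviation are all assumptions. Only `TextBlob(text).sentiment.polarity` is the real TextBlob call that returns a score between -1 and +1.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, pstdev


def polarity(text):
    """Return TextBlob's polarity score for one post, in [-1.0, 1.0]."""
    from textblob import TextBlob  # third-party: pip install textblob
    return TextBlob(text).sentiment.polarity


def daily_sentiment(rows):
    """Group (query, day, score) rows into {query: [(day, mean, std), ...]}."""
    buckets = defaultdict(list)
    for query, day, score in rows:
        buckets[(query, day)].append(score)

    series = defaultdict(list)
    # Sorting the (query, day) keys keeps each query's points in date order.
    for (query, day), scores in sorted(buckets.items()):
        # Population standard deviation is an assumption; it is 0.0 for one post.
        series[query].append((day, mean(scores), pstdev(scores)))
    return dict(series)


# Hypothetical scores; in practice each would come from polarity(post_text).
rows = [
    ("artificialintelligence", date(2024, 1, 1), 0.5),
    ("artificialintelligence", date(2024, 1, 1), -0.1),
    ("politics", date(2024, 1, 1), -0.4),
    ("artificialintelligence", date(2024, 1, 2), 0.2),
]
series = daily_sentiment(rows)
```

Each query's list of (day, mean, std) tuples maps directly onto one line of the plot, with the shaded band drawn from mean minus std to mean plus std.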

All code to scrape Twitter and create the time-series sentiment plots is open source and available on GitHub (see the View the Code link above).

Sample data scraped using our bot

Installation and Usage Guide

Installation + Configuration

Downloading the scripts
  • Access the GitHub link at the top of the page (View the code)
  • Download the folder from GitHub manually, or clone it with $ git clone https://github.com/MegaBytten/Twitter-scraper-python.git
macosinit.sh
  • This script sets up the project quickly, prompting you for configuration settings and your Twitter bot's login details.
  • All details you enter are stored in the local config.cfg file.
  • To execute it, run $ chmod +x macosinit.sh to make it executable, followed by $ ./macosinit.sh
  • The script also installs all required Python dependencies, such as pandas, playwright, and the NLP libraries.
Non-macOS platforms
  • Currently, there is no bash script to automate setup on Windows or Linux.
  • The scraper scripts require a config.cfg file, written in the syntax of Python's configparser library, containing the username and password of your bot, so on these platforms the file must be created manually.
What if I want to run the script with no Twitter accounts?
  • Unfortunately, X.com (Twitter) has hidden its content behind a login portal.
  • This means a Twitter account is required to scrape data.
What if I don't want to enter my bot username/password?
  • The macosinit.sh bash script takes your username/password and writes them directly to config.cfg.
  • If you want to ensure no details are being shared, you can:
    • Create your own config.cfg and enter the username/password directly, following the configparser library syntax
    • Manually enter the details into the Python script, and prevent the script from searching for config.cfg
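
For anyone creating config.cfg by hand, the sketch below shows how a file in configparser syntax can be written and read back. The section name `twitter` and the keys `username` and `password` are assumptions made for illustration; check the scraper scripts for the names they actually look up.

```python
import configparser


def write_config(path, username, password):
    """Write bot credentials to a configparser-style file."""
    # interpolation=None lets passwords contain '%' without special handling.
    config = configparser.ConfigParser(interpolation=None)
    # Hypothetical section and key names; the real scripts may expect others.
    config["twitter"] = {"username": username, "password": password}
    with open(path, "w") as handle:
        config.write(handle)


def read_credentials(path):
    """Return (username, password) from the config file at `path`."""
    config = configparser.ConfigParser(interpolation=None)
    # read() returns the list of files it parsed; an empty list means none found.
    if not config.read(path):
        raise FileNotFoundError(f"No config file found at {path}")
    section = config["twitter"]
    return section["username"], section["password"]
```

Calling `write_config("config.cfg", "my_bot", "my_password")` would produce a file with a `[twitter]` header followed by `username = my_bot` and `password = my_password`, which is the same layout you would type by hand.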

Running the Scripts

twitterscraper.py
  • This is the first of two scraping python scripts
  • It iterates through a given list of queries sequentially, which has two drawbacks.
  • Firstly, it is more likely to be detected as a bot, and therefore to be throttled by Twitter and served reduced content.
  • Secondly, because it runs queries in sequence and cannot be distributed, it runs significantly slower.
distributedtwitterscraper.py
  • This is the latest, faster version of twitterscraper.py
  • It is designed to be run from multiple terminals at the same time, with each instance focusing on one query.
  • It is compatible with multiple different Twitter bot accounts, which reduces the bot footprint and decreases throttling.
  • It is therefore recommended to run this version in multiple terminal instances, each with a separate bot account, for the fastest scraping.
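
The idea of one query per worker, spread across several bot accounts, can be sketched as below. This is not how distributedtwitterscraper.py is implemented: it swaps separate terminals for threads in a single process so the example stays self-contained, and `assign_accounts`, `scrape_query`, `run_distributed`, and the account names are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle


def assign_accounts(queries, accounts):
    """Pair each query with a bot account, reusing accounts round-robin."""
    if not accounts:
        raise ValueError("at least one bot account is required")
    return list(zip(queries, cycle(accounts)))


def scrape_query(query, account):
    """Placeholder worker: a real one would log in as `account` and scrape `query`."""
    return f"{account} -> {query}"


def run_distributed(queries, accounts):
    """Run one worker per query concurrently and return results in query order."""
    jobs = assign_accounts(queries, accounts)
    with ThreadPoolExecutor(max_workers=len(jobs) or 1) as pool:
        return list(pool.map(lambda job: scrape_query(*job), jobs))


results = run_distributed(
    ["artificialintelligence", "politics", "celebrities"],
    ["bot_a", "bot_b"],
)
```

With three queries and two accounts, the first account is reused for the third query. Giving every query its own account avoids that reuse.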