Twitter scraper 2024

+ Sentiment analysis

View the Code

Installation Guide

Get Full Twitter Dataset

Sentiment Analysis on Topics Scraped from Twitter

Sentiment analysis plotted over time, stratified by query.

More than 20,000 unique Twitter posts were scraped using pre-selected search queries: "artificialintelligence", "politics", and "celebrities". The dataset is continuously updated, so the graph will keep changing over time as additional queries are added.

Sentiment analysis was conducted using TextBlob, a Python Natural Language Processing (NLP) library, which has a moderate prediction accuracy of around 45%. The text of each Twitter post was extracted and passed to TextBlob, which generated a post-specific sentiment (polarity) value ranging from -1 (most negative) to +1 (most positive). The line graph plots the mean sentiment over time for the artificial intelligence query alongside the other queries, with shaded areas representing the standard deviation.
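The aggregation behind a plot like this can be sketched in a few lines of Python. This is a minimal sketch, not the repository's code. It assumes each post has already been scored with TextBlob (`TextBlob(text).sentiment.polarity`, a value in [-1, 1]) and stored as a hypothetical `(query, iso_timestamp, polarity)` tuple. It also assumes posts are bucketed by calendar day. For each query and day it returns the mean (the plotted line) and the sample standard deviation (the shaded band).

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, stdev


def daily_sentiment(posts):
    """Aggregate scored posts into per-query daily time series.

    posts: iterable of (query, iso_timestamp, polarity) tuples (hypothetical
    format; polarity is assumed to come from TextBlob's sentiment.polarity).
    Returns {query: [(date, mean_polarity, std_polarity), ...]} sorted by date.
    """
    buckets = defaultdict(list)
    for query, timestamp, polarity in posts:
        # Assumption: one point per calendar day; change this to bin differently.
        day = datetime.fromisoformat(timestamp).date()
        buckets[(query, day)].append(polarity)

    series = defaultdict(list)
    # Sorting by (query, day) keeps each query's points in chronological order.
    for (query, day), scores in sorted(buckets.items()):
        # The sample standard deviation needs at least two posts; use 0.0 otherwise.
        spread = stdev(scores) if len(scores) > 1 else 0.0
        series[query].append((day, mean(scores), spread))
    return dict(series)
```

Each series can then be drawn as a line of means with a band of mean ± standard deviation, for example with matplotlib's `fill_between`.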

All code used to scrape Twitter and create the time-series sentiment plots is open source and available on GitHub (see "View the Code" at the top of the page).

Sample data scraped using our bot

Installation and Usage Guide

Installation + Configuration

Downloading the scripts

Open the GitHub link at the top of the page ("View the Code")
Download the folder from GitHub manually, or clone it with $ git clone https://github.com/MegaBytten/Twitter-scraper-python.git

macosinit.sh

This script speeds up setup by prompting you for configuration settings and your Twitter bot's username and password
All details you enter are stored in the local config.cfg file
To execute macosinit.sh, first make it executable with $ chmod +x macosinit.sh, then run it with $ ./macosinit.sh
The script also installs all required Python dependencies, such as pandas, Playwright, and the NLP libraries

Non-macOS platforms

Currently, there is no script to automate setup on Windows or Linux.
The Twitter scraper Python scripts require a config.cfg file, written in the configparser library's syntax, containing your bot's username and password
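Until a Windows/Linux setup script exists, config.cfg can be created with Python's standard configparser module. This is a minimal sketch and not part of the repository. The section name `twitter` and the keys `username`/`password` are hypothetical placeholders, so check which names the scraper scripts actually read before relying on it.

```python
import configparser


def write_config(path, username, password, section="twitter"):
    """Write bot credentials to an INI-style file that configparser can read.

    NOTE: the section name and key names are hypothetical; match them to the
    names the scraper scripts look up.
    """
    # interpolation=None stores values verbatim, so a '%' in a password
    # is not treated as configparser interpolation syntax.
    config = configparser.ConfigParser(interpolation=None)
    config[section] = {"username": username, "password": password}
    with open(path, "w") as fh:
        config.write(fh)
```

For example, `write_config("config.cfg", input("Bot username: "), getpass.getpass("Bot password: "))` (after `import getpass`) prompts for the details without echoing the password.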

What if I want to run the script without a Twitter account?

Unfortunately, X.com (Twitter) has placed its content behind a login portal
This means a Twitter account is required to scrape data.

What if I don't want to enter my bot username/password?

The macosinit.sh bash script takes your username/password and writes them directly to config.cfg
If you want to be sure that none of your details are shared, you can either:

Create your own config.cfg and enter the username/password directly, following the configparser library syntax
Manually enter the details into the Python script, and stop the script from looking for config.cfg
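Both options can be sketched as one small credential loader. This is a hedged illustration, not the scripts' actual code. The names `MANUAL_USERNAME`, `MANUAL_PASSWORD`, `load_credentials`, the `use_config_file` flag, and the `twitter` section with `username`/`password` keys are all hypothetical. With `use_config_file=False`, config.cfg is never opened.

```python
import configparser

# Hypothetical placeholders: fill these in only if you opt out of config.cfg.
MANUAL_USERNAME = ""
MANUAL_PASSWORD = ""


def load_credentials(path="config.cfg", section="twitter", use_config_file=True):
    """Return (username, password) for the bot account.

    With use_config_file=False the file is never searched for, and the
    in-script values above are returned instead. Section/key names are assumptions.
    """
    if not use_config_file:
        return MANUAL_USERNAME, MANUAL_PASSWORD
    config = configparser.ConfigParser(interpolation=None)
    # read() returns the list of files it parsed; an empty list means not found.
    if not config.read(path):
        raise FileNotFoundError(f"{path} not found; create it or pass use_config_file=False")
    return config[section]["username"], config[section]["password"]
```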

Running the Scripts

twitterscraper.py

This is the first of the two scraping Python scripts
This script iterates through a given list of queries sequentially, which has two drawbacks:
Firstly, it is more likely to be detected as a bot, and therefore throttled by Twitter so that it receives less content
Secondly, because it runs queries in sequence and cannot be distributed, it runs significantly slower.

distributedtwitterscraper.py

This is the latest, faster version of twitterscraper.py

It is designed to be run from multiple terminals simultaneously, with each instance focusing on a single query.

It supports multiple Twitter bot accounts, which reduces each bot's footprint and decreases throttling.

For the fastest scraping, we therefore recommend running this version in multiple terminal instances, each with a separate bot account.
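The one-query-per-worker idea can be illustrated in-process. This is a hedged sketch rather than how distributedtwitterscraper.py works, since the repository runs separate terminal instances. `scrape_fn` is a hypothetical stand-in for a function that scrapes one query with one bot account, and bots are assigned to queries round-robin.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle


def assign_bots(queries, bots):
    """Pair each query with a bot account, cycling through the bots round-robin."""
    if not bots:
        raise ValueError("at least one bot account is required")
    return list(zip(queries, cycle(bots)))


def run_distributed(queries, bots, scrape_fn, max_workers=None):
    """Run scrape_fn(query, bot) for every query concurrently; return {query: result}.

    scrape_fn is hypothetical. With fewer bots than queries, some bots are
    reused at the same time, which partly undoes the reduced bot footprint.
    """
    jobs = assign_bots(queries, bots)
    # One worker per query by default (at least one, so an empty list is safe).
    with ThreadPoolExecutor(max_workers=max_workers or max(1, len(jobs))) as pool:
        futures = {query: pool.submit(scrape_fn, query, bot) for query, bot in jobs}
        return {query: future.result() for query, future in futures.items()}
```

Threads keep the sketch self-contained. Separate terminals, each running its own process, also isolate every browser session and bot login.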