Data and code accompanying the paper Politics and Virality in the Time of Twitter

Last update: Jul 02, 2022

Overview

Politics and Virality in the Time of Twitter

This repository contains the data and code accompanying the paper Politics and Virality in the Time of Twitter.

Specifically, it includes:

the code used to train our models (./code/finetune_models.py and ./code/finetune_multi_cv.py)
a Jupyter Notebook containing the major parts of our analysis (./code/analysis.ipynb)
the model that was selected and used for the sentiment analysis
the manually annotated data used for training (./data/annotation/)
the IDs of the tweets used in our analysis and control experiments (./data/main/ and ./data/control)
the names, parties and handles of the MPs that were tracked (./data/mps_list.csv)

Annotated Data (./data/annotation/)

One folder for each language (English, Spanish, Greek).
In each directory there are three files:
1. *_900.csv contains the 900 tweets that annotators labelled individually (300 tweets per annotator).
2. *_tiebreak_100.csv contains the initial 100 tweets that all annotators labelled. 'annotator_3' indicates the annotator who was used as a tiebreaker.
3. *_combined.csv contains all tweets labelled for the language.
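
As an illustration of how a tiebreaker label might be applied, here is a minimal sketch. It assumes one possible scheme: the first two annotators' labels are kept when they agree, and the tiebreaker's label is used otherwise. Apart from 'annotator_3', the column names ('text', 'annotator_1', 'annotator_2') and the sample rows are hypothetical; check the actual CSV headers and the paper for the procedure that was really used.

```python
import csv
import io


def resolve_label(label_1, label_2, tiebreak_label):
    """Return the agreed label, or the tiebreaker's label on disagreement."""
    if label_1 == label_2:
        return label_1
    return tiebreak_label


# Tiny in-memory stand-in for a *_tiebreak_100.csv file.
# Column names and rows are hypothetical, not taken from the repository.
sample = io.StringIO(
    "text,annotator_1,annotator_2,annotator_3\n"
    "great news today,Positive,Positive,Neutral\n"
    "the vote is at noon,Neutral,Negative,Neutral\n"
)

resolved = [
    resolve_label(row["annotator_1"], row["annotator_2"], row["annotator_3"])
    for row in csv.DictReader(sample)
]
print(resolved)  # ['Positive', 'Neutral']
```

To try this on the real files, replace the in-memory sample with an opened CSV file and adjust the column names to match its header.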

Model

While we plan to upload all the models trained for our experiments to huggingface.co, currently only the main model used in our analysis is available, at: https://drive.google.com/file/d/1_Ngmh-uHGWEbKHFpKmQ1DhVf6LtDTglx/view?usp=sharing

The model, 'xlm-roberta-sentiment-multilingual', is based on 'cardiffnlp/twitter-xlm-roberta-base-sentiment' and was further fine-tuned on the annotated dataset.

Example usage

from transformers import AutoModelForSequenceClassification, pipeline

# Load the fine-tuned model from a local copy of the downloaded model directory
model = AutoModelForSequenceClassification.from_pretrained('./xlm-roberta-sentiment-multilingual/')
# The tokenizer is taken from the base model
sentiment_analysis_task = pipeline("sentiment-analysis", model=model, tokenizer="cardiffnlp/twitter-xlm-roberta-base-sentiment")

sentiment_analysis_task('Today is a good day')
# Out: [{'label': 'Positive', 'score': 0.978614866733551}]
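
When the pipeline is applied to many tweets, its predictions can be summarised per label. The following is a minimal sketch rather than the aggregation used in the paper: the helper label_shares and the sample predictions are hypothetical, and only the {'label': ..., 'score': ...} shape is taken from the output shown above.

```python
from collections import Counter


def label_shares(results):
    """Map each predicted label to its share of all predictions."""
    counts = Counter(r["label"] for r in results)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


# Hypothetical predictions in the same shape as the pipeline output.
predictions = [
    {"label": "Positive", "score": 0.98},
    {"label": "Negative", "score": 0.91},
    {"label": "Positive", "score": 0.75},
    {"label": "Negative", "score": 0.66},
]
print(label_shares(predictions))  # {'Positive': 0.5, 'Negative': 0.5}
```

This sketch counts each prediction equally and ignores the confidence score; weighting by score would be a different design choice.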

Reference paper

For more details, please see the reference paper. If you use the data contained in this repository for your research, please cite the paper using the following BibTeX entry:

@inproceedings{antypas2022politics,
  title={{Politics and Virality in the Time of Twitter: A Large-Scale Cross-Party Sentiment Analysis in Greece, Spain and United Kingdom}},
  author={Antypas, Dimosthenis and Preece, Alun and Camacho-Collados, Jose},
  booktitle={arXiv preprint arXiv:2202.00396},
  year={2022}
}

Owner

Cardiff NLP
