Model that predicts the probability of a Twitter user being anti-vaccination.

Overview
<style>body {text-align: justify}</style>

AVAXTAR: Anti-VAXx Tweet AnalyzeR

AVAXTAR is a python package to identify anti-vaccine users on twitter. The model outputs complimentary probabilities for [not anti-vaccine, anti-vaccine]. AVAXTAR was trained on 100GB of autolabeled twitter data.

The model supports both Twitter API v1 and v2. To predict with v1, the user needs its consumer key, consumer secret, access token and access secret. The v2 requires only a bearer token, but it can only predict based on user id, not on screen name. Predicting from the v2 api using screen name is only possible if v1 keys are passed to the model.

The methodology behind the package is described in full at {placeholder}

Citation

To cite this paper, please use: {placeholder}

Installation

Attention: this package relies on a pre-trained embedding model from sent2vec, with a size of 5 GB. The model will be automatically downloaded when the package is first instanced on a python script, and will then be saved on the package directory for future usage.

  1. Clone this repo:
git clone https://github.com/Matheus-Schmitz/avaxtar.git
  1. Go to the repo's root:
cd avaxtar/
  1. Install with pip:
pip install .

Usage Example

For prediction, use:

model.predict_from_userid_api_v1(userid)

and:

model.predict_from_userid_api_v2(userid)

For example:

from avaxtar import Avaxtar

consumer_key = ''
consumer_secret = ''
access_token = ''
access_secret = ''
bearer_token = ''


if __name__ == "__main__":

	# Get the userid
	userid = ''

	# Predict
	model = Avaxtar.AvaxModel(consumer_key, consumer_secret, access_token, access_secret, bearer_token)
	pred_proba = model.predict_from_userid_api_v1(userid)

	# Results
	print(f'User: {userid}')
	print(f'Class Probabilities: {pred_proba}')

Package Details

The AVAXTAR classifier is trained on a comprehensive labeled dataset that contains historical tweets of approximately 130K Twitter accounts. Each account from the dataset was assigned one out of two labels: positive for the accounts that actively spread anti-vaccination narrative \~70K and negative for the accounts that do not spread anti vaccination narrative \~60K.

Collecting positive samples: Positive samples are gathered through a snowball method to identify a set of hashtags and keywords associated with the anti-vaccination movement, and then queried the Twitter API and collected the historical tweets of accounts that used any of the identified keywords.

Collecting negative samples: To collect the negative samples, we first performed a mirror approach the positive samples and queried the Twitter API to get historical tweets of accounts that do not use any of the predefined keywords and hashtags. We then enlarge the number of negative samples, by gathering the tweets from accounts that are likely proponents of the vaccination. We identify the pro-ponents of the vaccines in the following way: First, we identify the set of twenty most prominent doctors and health experts active on Twitter. Then collected the covid-related Lists those health experts made on Twitter. From those lists, we collected approximately one thousand Twitter handles of prominent experts and doctors who tweet about the coronavirus and the pandemic. In the next step, we go through their latest 200 tweets and collected the Twitter handles of users who retweeted their tweets. That became our pool of pro-vaccine users. Finally, we collected the historical tweets of users from the pro-vaccine pool.

After model training, we identify the optimal classification threshold to be used, based on maximizing F1 score on the validation set. We find that a threshold of 0.5938 results in the best F1 Score, and thus recommend the usage of that threshold instead of the default threshold of 0.5. Using the optimized threshold, the resulting modelwas then evaluated on a test set of users, achieving the reasonable scores, as shown in the table below.

Metric Negative Class Positive Class
Accuracy 0.8680 0.8680
ROC-AUC 0.9270 0.9270
PRC-AUC 0.8427 0.9677
Precision 0.8675 0.8675
Recall 0.8680 0.8680
F1 0.8677 0.8678
👐OpenHands : Making Sign Language Recognition Accessible (WiP 🚧👷‍♂️🏗)

👐 OpenHands: Sign Language Recognition Library Making Sign Language Recognition Accessible Check the documentation on how to use the library: ReadThe

AI4Bhārat 69 Dec 12, 2022
A powerful framework for decentralized federated learning with user-defined communication topology

Scatterbrained Decentralized Federated Learning Scatterbrained makes it easy to build federated learning systems. In addition to traditional federated

Johns Hopkins Applied Physics Laboratory 7 Sep 26, 2022
Detecting Blurred Ground-based Sky/Cloud Images

Detecting Blurred Ground-based Sky/Cloud Images With the spirit of reproducible research, this repository contains all the codes required to produce t

1 Oct 20, 2021
Repository for training material for the 2022 SDSC HPC/CI User Training Course

hpc-training-2022 Repository for training material for the 2022 SDSC HPC/CI Training Series HPC/CI Training Series home https://www.sdsc.edu/event_ite

sdsc-hpc-training-org 21 Jul 27, 2022
The official implementation of paper Siamese Transformer Pyramid Networks for Real-Time UAV Tracking, accepted by WACV22

SiamTPN Introduction This is the official implementation of the SiamTPN (WACV2022). The tracker intergrates pyramid feature network and transformer in

Robotics and Intelligent Systems Control @ NYUAD 29 Jan 08, 2023
MERLOT: Multimodal Neural Script Knowledge Models

merlot MERLOT: Multimodal Neural Script Knowledge Models MERLOT is a model for learning what we are calling "neural script knowledge" -- representatio

Rowan Zellers 190 Dec 22, 2022
Breaking Shortcut: Exploring Fully Convolutional Cycle-Consistency for Video Correspondence Learning

Breaking Shortcut: Exploring Fully Convolutional Cycle-Consistency for Video Correspondence Learning Yansong Tang *, Zhenyu Jiang *, Zhenda Xie *, Yue

Zhenyu Jiang 12 Nov 16, 2022
"NAS-Bench-301 and the Case for Surrogate Benchmarks for Neural Architecture Search".

NAS-Bench-301 This repository containts code for the paper: "NAS-Bench-301 and the Case for Surrogate Benchmarks for Neural Architecture Search". The

AutoML-Freiburg-Hannover 57 Nov 30, 2022
A rule-based log analyzer & filter

Flog 一个根据规则集来处理文本日志的工具。 前言 在日常开发过程中,由于缺乏必要的日志规范,导致很多人乱打一通,一个日志文件夹解压缩后往往有几十万行。 日志泛滥会导致信息密度骤减,给排查问题带来了不小的麻烦。 以前都是用grep之类的工具先挑选出有用的,再逐条进行排查,费时费力。在忍无可忍之后决

上山打老虎 9 Jun 23, 2022
PyBullet CartPole and Quadrotor environments—with CasADi symbolic a priori dynamics—for learning-based control and reinforcement learning

safe-control-gym Physics-based CartPole and Quadrotor Gym environments (using PyBullet) with symbolic a priori dynamics (using CasADi) for learning-ba

Dynamic Systems Lab 300 Dec 28, 2022
Tensorflow implementation of Semi-supervised Sequence Learning (https://arxiv.org/abs/1511.01432)

Transfer Learning for Text Classification with Tensorflow Tensorflow implementation of Semi-supervised Sequence Learning(https://arxiv.org/abs/1511.01

DONGJUN LEE 82 Oct 22, 2022
A Tensorflow implementation of the Text Conditioned Auxiliary Classifier Generative Adversarial Network for Generating Images from text descriptions

A Tensorflow implementation of the Text Conditioned Auxiliary Classifier Generative Adversarial Network for Generating Images from text descriptions

Ayushman Dash 93 Aug 04, 2022
Özlem Taşkın 0 Feb 23, 2022
The official implementation for ACL 2021 "Challenges in Information Seeking QA: Unanswerable Questions and Paragraph Retrieval".

Code for "Challenges in Information Seeking QA: Unanswerable Questions and Paragraph Retrieval" (ACL 2021, Long) This is the repository for baseline m

Akari Asai 25 Oct 30, 2022
This repository for project that can Automate Number Plate Recognition (ANPR) in Morocco Licensed Vehicles. 💻 + 🚙 + 🇲🇦 = 🤖 🕵🏻‍♂️

MoroccoAI Data Challenge (Edition #001) This Reposotory is result of our work in the comepetiton organized by MoroccoAI in the context of the first Mo

SAFOINE EL KHABICH 14 Oct 31, 2022
Explaining Hyperparameter Optimization via PDPs

Explaining Hyperparameter Optimization via PDPs This repository gives access to an implementation of the methods presented in the paper submission “Ex

2 Nov 16, 2022
Code for the IJCAI 2021 paper "Structure Guided Lane Detection"

SGNet Project for the IJCAI 2021 paper "Structure Guided Lane Detection" Abstract Recently, lane detection has made great progress with the rapid deve

Jinming Su 27 Dec 08, 2022
Rank1 Conversation Emotion Detection Task

Rank1-Conversation_Emotion_Detection_Task accuracy macro-f1 recall 0.826 0.7544 0.719 基于预训练模型和时序预测模型的对话情感探测任务 1 摘要 针对对话情感探测任务,本文将其分为文本分类和时间序列预测两个子任务,分

Yuchen Han 2 Nov 28, 2021
Pytorch code for semantic segmentation using ERFNet

ERFNet (PyTorch version) This code is a toolbox that uses PyTorch for training and evaluating the ERFNet architecture for semantic segmentation. For t

Edu 394 Jan 01, 2023
GAN example for Keras. Cuz MNIST is too small and there should be something more realistic.

Keras-GAN-Animeface-Character GAN example for Keras. Cuz MNIST is too small and there should an example on something more realistic. Some results Trai

160 Sep 20, 2022