Package for extracting emotions from social media text. Tailored for financial data.

Overview

EmTract: Extracting Emotions from Social Media Text Tailored for Financial Contexts

EmTract is a tool that extracts emotions from social media text. It accounts for key features of social media data (e.g., non-standard phrases, emojis, and emoticons) and uses cutting-edge natural language processing (NLP) techniques to learn latent representations that capture word order, word usage, and local context, which it then uses to predict emotions.

Details on the model and text processing are in the appendix of EmTract: Investor Emotions and Market Behavior.

User Guide

Installation

Python 3 must be installed before the package can be used. We also recommend using a virtual environment so that the tool runs with the same dependencies it was developed with. Instructions on how to set up a virtual environment can be found here.

Once the basic requirements are set up, follow these instructions:

  1. Clone the repository: git clone https://github.com/dvamossy/EmTract.git
  2. Navigate into the repository: cd EmTract
  3. (Optional) Create and activate virtual environment:
    python3 -m venv venv
    source venv/bin/activate
    
  4. Run ./install.sh. This installs the Python requirements and downloads our model files.

Usage

Our package should be run with the following command:

python3 -m emtract.inference [args]

Where args are the following:

  • --model_type: twitter or stocktwits. Default is stocktwits.
  • --interactive: run in interactive mode.
  • --input_file/-i: input file to use for predictions (non-interactive mode only).
  • --output_file/-o: output location for predictions (non-interactive mode only).
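
When EmTract is called from another script, the flags above can be assembled programmatically. The following is a minimal sketch, not part of the package: build_emtract_command is a hypothetical helper that uses only the flags documented above, and it assumes that --interactive is never combined with input or output files.

```python
from typing import List, Optional


def build_emtract_command(
    model_type: str = "stocktwits",
    interactive: bool = False,
    input_file: Optional[str] = None,
    output_file: Optional[str] = None,
) -> List[str]:
    """Build the argument list for `python3 -m emtract.inference`.

    Hypothetical helper for illustration; it only mirrors the documented flags.
    """
    if model_type not in ("twitter", "stocktwits"):
        raise ValueError("model_type must be 'twitter' or 'stocktwits'")

    cmd = ["python3", "-m", "emtract.inference", "--model_type", model_type]

    if interactive:
        # Assumption: file arguments are meaningless in interactive mode.
        if input_file or output_file:
            raise ValueError("input/output files apply to non-interactive mode only")
        cmd.append("--interactive")
    else:
        # Non-interactive mode needs an input file to predict on.
        if not input_file:
            raise ValueError("input_file is required in non-interactive mode")
        cmd += ["--input_file", input_file]
        if output_file:
            cmd += ["--output_file", output_file]
    return cmd
```

The returned list can be passed to subprocess.run(cmd, check=True) from the repository root.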

Output

For each input (i.e., text), EmTract outputs probabilities that sum to 1 for seven emotional states: neutral, happy, sad, anger, disgust, surprise, and fear. It also labels the text with the most probable state, i.e., the argmax of the probabilities.
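
The labeling step can be illustrated with a short sketch. It is not EmTract's own code: the order of the emotions below simply follows the list above and is an assumption about the output layout, and the probabilities are invented for illustration.

```python
# Assumed ordering: the order in which the seven states are listed in this README.
EMOTIONS = ["neutral", "happy", "sad", "anger", "disgust", "surprise", "fear"]


def label_from_probabilities(probs, tol=1e-6):
    """Return the emotion with the highest probability (argmax)."""
    if len(probs) != len(EMOTIONS):
        raise ValueError("expected one probability per emotion")
    if abs(sum(probs) - 1.0) > tol:
        raise ValueError("probabilities should sum to 1")
    # Index of the largest probability; ties resolve to the first occurrence.
    best = max(range(len(probs)), key=probs.__getitem__)
    return EMOTIONS[best]


# Made-up probabilities for a single message.
example = [0.05, 0.60, 0.05, 0.10, 0.05, 0.10, 0.05]
print(label_from_probabilities(example))  # prints "happy"
```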

Modes

Our tool can be run in two execution modes.

Interactive mode allows the user to input a tweet and evaluate it in real time. This is great for exploratory analysis.

python3 -m emtract.inference --interactive

The other mode is intended for automating predictions. It requires an input file, which must be a CSV or text file with a single column containing the messages/text to predict on.

python3 -m emtract.inference -i tweets_example.csv -o predictions.csv
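
A one-column input file can be prepared with Python's standard csv module. This is a minimal sketch: write_single_column_csv is a hypothetical helper, the messages are invented, and whether EmTract expects a header row is not stated here, so the header is optional and off by default.

```python
import csv


def write_single_column_csv(messages, path, header=None):
    """Write one message per row so the file has exactly one column."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if header is not None:
            # Assumption: only add a header if the consumer expects one.
            writer.writerow([header])
        for message in messages:
            # csv.writer quotes commas and newlines, keeping each message in one field.
            writer.writerow([message])


if __name__ == "__main__":
    sample = ["to the moon, $TSLA 🚀", "not sure about this earnings call"]
    write_single_column_csv(sample, "my_messages.csv")
```

The resulting file can then be passed to the command above with -i my_messages.csv.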

Model Types

Our models use GloVe embeddings with a bidirectional GRU architecture.

We trained our emotion models on two different data sources, one from Twitter and one from StockTwits. The Twitter training data comes from here; it is available at data/twitter_emotion.csv. The StockTwits training data is explained in the paper.

One of the key concerns with using emotion packages is that it is unknown how well they transfer to financial text data. We alleviate this concern by hand-tagging 10,000 StockTwits messages. These are available at data/hand_tagged_sample.parquet.snappy; they were not used in training any of our models. We use this sample to test the performance of our models and of alternative emotion packages (notebooks/Alternative Packages.ipynb).

We found our StockTwits model to perform best on the hand-tagged sample, and therefore it is used as the default for predictions.

Alternative Models

We also have an implementation of DistilBERT on the Twitter data in notebooks/Alternative Models.ipynb, which can be easily extended to other state-of-the-art models. We find marginal performance gains on the hand-tagged sample, which come at the cost of far slower inference.

Citation

If you use EmTract in your research, please cite us as follows:

Domonkos Vamossy and Rolf Skog. EmTract: Investor Emotions and Market Behavior. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3975884, 2021.

Contributing and Feedback

This project welcomes contributions and suggestions.

Our goal is to provide a unified framework for extracting emotions from financial social media text. Labeled financial social media text would be particularly useful for research on emotions in financial contexts. We plan to upload sample text upon request.
