TCube generates rich and fluent narratives that describe the characteristics, trends, and anomalies of any time-series data (domain-agnostic) using the transfer learning capabilities of PLMs.

Last update: Oct 31, 2021

Overview

TCube: Domain-Agnostic Neural Time series Narration

This repository contains the code for the paper: "TCube: Domain-Agnostic Neural Time series Narration" (to appear in IEEE ICDM 2021).

The PLMs used in this effort (T5, BART, and GPT-2) are implemented using the HuggingFace library (https://huggingface.co/) and fine-tuned on the WebNLG v3 (https://gitlab.com/shimorina/webnlg-dataset/-/tree/master/release_v3.0) and DART (https://arxiv.org/abs/2007.02871) datasets.

Clones of both datasets are available under /Finetune PLMs/Datasets in this repository.

The PLMs fine-tuned on WebNLG/DART could not be uploaded due to the 1GB limit of Git LFS. However, this repository includes pre-made scripts (detailed below) for conveniently fine-tuning these models.

The entire repository is based on Python 3.6, and the results are visualized through IPython notebooks.

Dependencies

Interactive Environments

notebook
ipywidgets==7.5.1

Deep Learning Frameworks

torch==1.7.1 (suited to your CUDA version)
pytorch-lightning==0.9.0
transformers==3.1.0

NLP Toolkits

sentencepiece==0.1.91
nltk

Scientific Computing, Data Manipulation, and Visualizations

numpy
scipy
sklearn
matplotlib
pandas
pwlf

Evaluation

rouge-score
textstat
lexical_diversity
language-tool-python

Misc

xlrd
tqdm
cython

Please install the Python packages listed above, at the versions specified, in a separate virtual environment.
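To confirm that an environment matches the pinned versions, a small check like the one below may help. It is a minimal sketch and not part of this repository: the function names (installed_version, check_pins) are hypothetical, and only the packages that carry an explicit version in the list above are checked.

```python
# Pins copied from the dependency list above; unpinned packages are omitted.
PINNED = {
    "ipywidgets": "7.5.1",
    "torch": "1.7.1",
    "pytorch-lightning": "0.9.0",
    "transformers": "3.1.0",
    "sentencepiece": "0.1.91",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        from importlib import metadata  # available on Python 3.8+
    except ImportError:
        import pkg_resources  # fallback for Python 3.6/3.7 (part of setuptools)
        try:
            return pkg_resources.get_distribution(name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins, lookup=installed_version):
    """Return (name, expected, found) for every pin that is not satisfied."""
    problems = []
    for name, expected in pins.items():
        found = lookup(name)
        # CUDA builds of torch report versions such as "1.7.1+cu110",
        # so compare only the part before any "+" local-version suffix.
        if found is None or found.split("+")[0] != expected:
            problems.append((name, expected, found))
    return problems


if __name__ == "__main__":
    for name, expected, found in check_pins(PINNED):
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

Run inside the activated virtual environment, the script prints one line per mismatched or missing package and prints nothing when every pin is satisfied.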

Data-Preprocessing Scripts

Under /Finetune PLMs in this repository there are two scripts for pre-processing the WebNLG and DART datasets:

preprocess_webnlg.py
preprocess_dart.py

These scripts read the original datasets from /Finetune PLMs/Datasets/WebNLGv3 and /Finetune PLMs/Datasets/DART and write CSV files to /Finetune PLMs/Datasets, splitting each dataset into train, dev, and test sets in the format required by our PLMs.
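For readers unfamiliar with this kind of preprocessing, the sketch below shows the general idea of turning (subject, predicate, object) triples and their reference text into CSV rows. It is an illustration under stated assumptions, not the logic of preprocess_webnlg.py or preprocess_dart.py: the `<H>`/`<R>`/`<T>` markers, the column names, and the function names are all hypothetical, and the actual format used by the scripts may differ.

```python
import csv
import io


def linearize(triples):
    """Flatten (subject, predicate, object) triples into one source string.

    The <H>/<R>/<T> markers are an assumed convention for this sketch.
    """
    return " ".join(f"<H> {s} <R> {p} <T> {o}" for s, p, o in triples)


def write_split(rows, handle):
    """Write (triples, reference_text) pairs as a two-column CSV."""
    writer = csv.writer(handle)
    writer.writerow(["input_text", "target_text"])  # hypothetical column names
    for triples, text in rows:
        writer.writerow([linearize(triples), text])


# Toy row in the style of a data-to-text corpus, for illustration only.
example = [
    ([("Aarhus_Airport", "cityServed", "Aarhus")],
     "Aarhus Airport serves the city of Aarhus."),
]
buffer = io.StringIO()
write_split(example, buffer)
print(buffer.getvalue())
```

Writing to an in-memory buffer keeps the example free of side effects; passing a file opened with newline="" instead would produce a CSV file on disk.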

Fine-tuning Scripts

Under /Finetune PLMs in this repository there are three scripts for fine-tuning T5, BART, and GPT-2:

finetuneT5.py
finetuneBART.py
finetuneGPT2.py

Visualization and Evaluation Notebooks

In the root directory are 10 notebooks. For the descriptions of the time-series datasets used:

Datatsets.ipynb

For comparisons of segmentation and regime-change detection algorithms:

Error Determination.ipynb
Regime Detection.ipynb
Segmentation.ipynb
Trend Detection Plot.ipynb
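As a rough picture of what segmentation and trend detection involve, the sketch below splits a series into near-linear segments with a generic sliding-window approach and labels each segment by the sign of its fitted slope. This is a standalone illustration in pure Python, not the algorithms compared in the notebooks above; the function names and the max_error threshold are assumptions made for the example.

```python
def fit_line(ys):
    """Least-squares slope and intercept for points (0, ys[0]), (1, ys[1]), ..."""
    n = len(ys)
    if n < 2:
        return 0.0, float(ys[0])
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def max_residual(ys):
    """Largest absolute deviation of ys from its least-squares line."""
    slope, intercept = fit_line(ys)
    return max(abs(y - (slope * x + intercept)) for x, y in enumerate(ys))


def sliding_window_segments(series, max_error):
    """Return (start, end) index pairs, end exclusive, of near-linear segments."""
    segments, start = [], 0
    while start < len(series):
        end = min(start + 2, len(series))
        # Extend the window while a single line still fits within max_error.
        while end < len(series) and max_residual(series[start:end + 1]) <= max_error:
            end += 1
        segments.append((start, end))
        start = end
    return segments


def label_trend(ys, flat_tolerance=1e-9):
    """Label a segment by the sign of its least-squares slope."""
    slope, _ = fit_line(ys)
    if abs(slope) <= flat_tolerance:
        return "flat"
    return "increasing" if slope > 0 else "decreasing"


series = [0, 1, 2, 3, 4, 3, 2, 1, 0]
for start, end in sliding_window_segments(series, max_error=0.1):
    print(start, end, label_trend(series[start:end]))
```

A smaller max_error yields more, shorter segments; a larger one merges neighbouring regimes, which is the trade-off any comparison of segmentation methods has to weigh.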

For the evaluation of the TCube framework on respective time-series datasets:

T3-COVID.ipynb
T3-DOTS.ipynb
T3-Pollution.ipynb
T3-Population.ipynb
T3-Temperature.ipynb

Citation and Contact

If any part of this code repository or the TCube framework is used in your work, please cite our paper. Thanks!

Contact: Mandar Sharma ([email protected]), First Author.
