A framework for using LSTMs to detect anomalies in multivariate time series data. Includes spacecraft anomaly data and experiments from the Mars Science Laboratory and SMAP missions.

Overview

Telemanom (v2.0)

v2.0 updates:

  • Vectorized operations via numpy
  • Object-oriented restructure, improved organization
  • Merge branches into single branch for both processing modes (with/without labels)
  • Update requirements.txt and Dockerfile
  • Updated result output for both modes
  • PEP8 cleanup

Anomaly Detection in Time Series Data Using LSTMs and Automatic Thresholding

Telemanom employs vanilla LSTMs, built with Keras/TensorFlow, to identify anomalies in multivariate sensor data. The LSTMs are trained to learn normal system behavior using encoded command information and prior telemetry values. Predictions are generated at each time step, and the prediction errors represent deviations from expected behavior. Telemanom then uses a novel nonparametric, unsupervised approach to threshold these errors and identify anomalous sequences of errors.
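The prediction-error idea described above can be illustrated with a minimal sketch. This is not the repository's implementation: the function names, the smoothing factor `alpha`, and the fixed multiplier `z` are assumptions chosen for illustration, and a plain mean-plus-z-standard-deviations cutoff stands in for the nonparametric dynamic thresholding described in the paper.

```python
import numpy as np


def smoothed_errors(y_true, y_pred, alpha=0.5):
    """Absolute prediction errors smoothed with an exponentially weighted average."""
    errors = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    smoothed = np.empty_like(errors)
    smoothed[0] = errors[0]
    for t in range(1, len(errors)):
        smoothed[t] = alpha * errors[t] + (1 - alpha) * smoothed[t - 1]
    return smoothed


def flag_anomalies(smoothed, z=3.0):
    """Flag points whose smoothed error exceeds mean + z * std (simplified stand-in)."""
    threshold = smoothed.mean() + z * smoothed.std()
    return smoothed > threshold, threshold


# Toy stream: predictions match the telemetry except for a short burst of errors.
y_true = np.zeros(200)
y_pred = np.zeros(200)
y_pred[100:110] = 1.0

smoothed = smoothed_errors(y_true, y_pred)
flags, threshold = flag_anomalies(smoothed)
```

Smoothing keeps isolated one-step spikes from dominating, so sustained runs of error are what cross the cutoff in this toy example.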

This repo along with the linked data can be used to re-create the experiments in our 2018 KDD paper, "Detecting Spacecraft Anomalies Using LSTMs and Nonparametric Dynamic Thresholding", which describes the background, methodologies, and experiments in more detail. While the system was originally deployed to monitor spacecraft telemetry, it can be easily adapted to similar problems.

Getting Started

Clone the repo (only available from source currently):

git clone https://github.com/khundman/telemanom.git && cd telemanom

Configure system and modeling parameters in the config.yaml file (to recreate the experiments from the paper, leave it as is). For example:

  • train: True. If True, a new model is trained for each input stream. If False (default), an existing trained model is loaded and used to generate predictions.
  • predict: True. If True, new predictions are generated using the models. If False (default), existing saved predictions are used in evaluation (useful for tuning error thresholding while skipping prior processing steps).
  • l_s: 250. The number of previous timesteps input to the model at each timestep t (used to generate predictions).
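To make the role of l_s concrete, the sketch below slices a channel array of shape (n_timesteps, n_inputs) into overlapping input windows of length l_s, each paired with the telemetry value(s) that follow it. The helper name `make_windows` and the `n_predictions` argument are hypothetical and used only for illustration; the repository's own windowing code may differ in detail.

```python
import numpy as np


def make_windows(arr, l_s, n_predictions=1):
    """Build (inputs, targets) pairs from an (n_timesteps, n_inputs) array.

    Each input holds l_s consecutive timesteps of all features; each target
    holds the next n_predictions values of the first feature (the telemetry).
    """
    n_windows = len(arr) - l_s - n_predictions + 1
    X = np.stack([arr[i:i + l_s] for i in range(n_windows)])
    y = np.stack([arr[i + l_s:i + l_s + n_predictions, 0] for i in range(n_windows)])
    return X, y


# Toy channel with 10 timesteps and 2 input features.
channel = np.arange(20, dtype=float).reshape(10, 2)
X, y = make_windows(channel, l_s=3)
```

With l_s: 250, each prediction would be conditioned on the previous 250 timesteps in the same way.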

To run via Docker:

docker build -t telemanom .

# rerun experiment detailed in paper or run with your own set of labeled anomalies in 'labeled_anomalies.csv'
docker run telemanom -l labeled_anomalies.csv

# run without labeled anomalies
docker run telemanom

To run with a local or virtual environment:

From root of repo, curl and unzip data:

curl -O https://s3-us-west-2.amazonaws.com/telemanom/data.zip && unzip data.zip && rm data.zip

Install dependencies using Python 3.6+ (a virtualenv is recommended):

pip install -r requirements.txt

Begin processing (from root of repo):

# rerun experiment detailed in paper or run with your own set of labeled anomalies
python example.py -l labeled_anomalies.csv

# run without labeled anomalies
python example.py

A Jupyter notebook for evaluating the results of a run is at telemanom/result-viewer.ipynb. To launch the notebook:

jupyter notebook telemanom/result-viewer.ipynb

Plotly is used to generate interactive inline plots.

Data

Using your own data

Pre-split training and test sets must be placed in directories named data/train/ and data/test/. One .npy file should be generated for each channel or stream (for both train and test) with shape (n_timesteps, n_inputs). The filename should be a unique channel name or ID. The telemetry values being predicted in the test data must be the first feature in the input.

For example, a channel T-1 should have train and test sets that are both named T-1.npy, with shapes akin to (4900, 61) and (3925, 61), where the number of input dimensions (61) matches. The actual telemetry values should be in the first feature column, i.e. slices of shape (4900, 1) and (3925, 1).
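The layout described above can be produced with a few lines of NumPy. This is a minimal sketch: `save_channel` is a hypothetical helper, the random arrays are placeholders for real telemetry, and a temporary directory stands in for the root of the repo.

```python
import os
import tempfile

import numpy as np


def save_channel(root, channel_id, train, test):
    """Write <root>/data/train/<id>.npy and <root>/data/test/<id>.npy."""
    if train.shape[1] != test.shape[1]:
        raise ValueError("train and test must have the same number of inputs")
    for split, arr in (("train", train), ("test", test)):
        split_dir = os.path.join(root, "data", split)
        os.makedirs(split_dir, exist_ok=True)
        np.save(os.path.join(split_dir, channel_id + ".npy"), arr)


# Placeholder data: column 0 is the telemetry value, columns 1..60 are other inputs.
rng = np.random.default_rng(0)
train = rng.uniform(-1, 1, size=(4900, 61))
test = rng.uniform(-1, 1, size=(3925, 61))

root = tempfile.mkdtemp()  # stand-in for the repo root
save_channel(root, "T-1", train, test)

loaded = np.load(os.path.join(root, "data", "test", "T-1.npy"))
telemetry = loaded[:, :1]  # first feature column, shape (3925, 1)
```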

Raw experiment data

The raw data available for download represents real spacecraft telemetry data and anomalies from the Soil Moisture Active Passive satellite (SMAP) and the Curiosity Rover on Mars (MSL). All data has been anonymized with regard to time, and all telemetry values are pre-scaled between (-1,1) according to the min/max in the test set. Channel IDs are also anonymized, but the first letter indicates the type of channel (P = power, R = radiation, etc.). Model input data also includes one-hot encoded information about commands that were sent or received by specific spacecraft modules in a given time window. No identifying information related to the timing or nature of commands is included in the data.
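For readers preparing their own data in a comparable range, min/max scaling to (-1,1) can be sketched as follows. The helper name is hypothetical, and taking the bounds from the test values simply mirrors the description of the raw data; it is an assumption, not the exact preprocessing used to produce the released files.

```python
import numpy as np


def scale_to_unit_range(values, lo, hi):
    """Linearly map values from [lo, hi] to [-1, 1]."""
    return 2.0 * (values - lo) / (hi - lo) - 1.0


test_values = np.array([0.0, 5.0, 10.0])
scaled = scale_to_unit_range(test_values, test_values.min(), test_values.max())
```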

This data also includes pre-split test and training data, pre-trained models, predictions, and smoothed errors generated using the default settings in config.yaml. When getting familiar with the repo, running the result-viewer.ipynb notebook to visualize results is useful for developing intuition. The included data is also useful for isolating portions of the system. For example, to see the effects of changes to the thresholding parameters without training new models, set train and predict to False in config.yaml to use previously generated predictions from prior models.

Anomaly labels and metadata

The anomaly labels and metadata are available in labeled_anomalies.csv, which includes:

  • channel_id: anonymized channel ID; the first letter represents the nature of the channel (P = power, R = radiation, etc.)
  • spacecraft: the spacecraft that generated the telemetry stream
  • anomaly_sequences: start and end indices of true anomalies in the stream
  • class: the class of each anomaly (see the paper for discussion)
  • num_values: number of telemetry values in each stream

To provide your own labels, use the labeled_anomalies.csv file as a template. The only required fields/columns are channel_id and anomaly_sequences. anomaly_sequences is a list of lists that contain start and end indices of anomalous regions in the test dataset for a channel.
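As a rough illustration of the label format, the sketch below parses a list-of-lists anomaly_sequences field and expands it into a per-timestep mask. The column names follow the description above; the helper names are hypothetical, and treating the end index as inclusive is an assumption.

```python
import ast
import csv
import io


def load_anomaly_sequences(csv_file):
    """Map each channel_id to its list of [start, end] index pairs."""
    labels = {}
    for row in csv.DictReader(csv_file):
        labels[row["channel_id"]] = ast.literal_eval(row["anomaly_sequences"])
    return labels


def label_mask(sequences, n_values):
    """Boolean mask over the test stream (end index assumed inclusive)."""
    mask = [False] * n_values
    for start, end in sequences:
        for i in range(start, min(end, n_values - 1) + 1):
            mask[i] = True
    return mask


# In-memory stand-in for a labels file with only the two required columns.
sample = io.StringIO('channel_id,anomaly_sequences\nT-1,"[[2, 4], [7, 8]]"\n')
labels = load_anomaly_sequences(sample)
mask = label_mask(labels["T-1"], n_values=10)
```

Note that the list must be quoted in the CSV because it contains commas.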

Dataset and performance statistics:

Data

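|                                | SMAP     | MSL      | Total    |
|--------------------------------|----------|----------|----------|
| Total anomaly sequences        | 69       | 36       | 105      |
| Point anomalies (% tot.)       | 43 (62%) | 19 (53%) | 62 (59%) |
| Contextual anomalies (% tot.)  | 26 (38%) | 17 (47%) | 43 (41%) |
| Unique telemetry channels      | 55       | 27       | 82       |
| Unique ISAs                    | 28       | 19       | 47       |
| Telemetry values evaluated     | 429,735  | 66,709   | 496,444  |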

Performance (with default params specified in paper)

| Spacecraft      | Precision | Recall | F_0.5 Score |
|-----------------|-----------|--------|-------------|
| SMAP            | 85.5%     | 85.5%  | 0.71        |
| Curiosity (MSL) | 92.6%     | 69.4%  | 0.69        |
| Total           | 87.5%     | 80.0%  | 0.71        |

Processing

Each time the system is started, a unique datetime ID (e.g. 2018-05-17_16.28.00) is used to create the following:

  • a results file (in results/) that extends labeled_anomalies.csv to include identified anomalous sequences and related info
  • a data subdirectory containing data files for created models, predictions, and smoothed errors for each channel. A file called params.log is also created that contains parameter settings and logging output during processing.
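An ID in the format of the example above can be generated with the standard library. The `strftime` pattern here is inferred from that single example ID, and `make_run_id` is a hypothetical helper rather than a function from the repo.

```python
from datetime import datetime


def make_run_id(now=None):
    """Return a run ID such as 2018-05-17_16.28.00 for the given (or current) time."""
    now = now or datetime.now()
    return now.strftime("%Y-%m-%d_%H.%M.%S")


run_id = make_run_id(datetime(2018, 5, 17, 16, 28, 0))
```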

As mentioned, the jupyter notebook telemanom/result-viewer.ipynb can be used to visualize results for each stream.

Citation

If you use this work, please cite:

@article{hundman2018detecting,
  title={Detecting Spacecraft Anomalies Using LSTMs and Nonparametric Dynamic Thresholding},
  author={Hundman, Kyle and Constantinou, Valentino and Laporte, Christopher and Colwell, Ian and Soderstrom, Tom},
  journal={arXiv preprint arXiv:1802.04431},
  year={2018}
}

License

Telemanom is distributed under the Apache 2.0 license.

Contact: Kyle Hundman
