(Py)TOD: Tensor-based Outlier Detection, A General GPU-Accelerated Framework

Last update: Jan 05, 2023

Overview


Background: Outlier detection (OD) is a key data mining task for identifying abnormal objects among general samples, with numerous high-stakes applications including fraud detection and intrusion detection.

To scale outlier detection (OD) to large-scale, high-dimensional datasets, we propose TOD, a novel system that abstracts OD algorithms into basic tensor operations for efficient GPU acceleration.

See the corresponding paper (https://www.andrew.cmu.edu/user/yuezhao2/papers/21-preprint-tod.pdf). The code is being cleaned up and released; please watch and star the repository!

One reason to use it:

On average, TOD is 11 times faster than PyOD!

If you need another reason: it can handle much larger datasets, running OD on more than a million samples within an hour!

TOD features:

Unified APIs, detailed documentation, and examples for ease of use (under construction)
Support for more than 10 OD algorithms, with more being added
Multi-GPU acceleration
Advanced techniques such as provable quantization

Programming Model Interface

Complex OD algorithms can be abstracted into common tensor operators.

https://raw.githubusercontent.com/yzhao062/pytod/master/figs/abstraction.png
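As a small illustration of this idea (a sketch in plain PyTorch, not PyTOD's own code), a kNN-distance detector reduces to two tensor operators: a pairwise distance computation (`torch.cdist`) and a top-k selection (`torch.topk`). The function name `knn_outlier_scores` is hypothetical.

```python
import torch


def knn_outlier_scores(x: torch.Tensor, k: int = 5) -> torch.Tensor:
    """Score each row of x by its distance to its k-th nearest neighbor (self excluded)."""
    dist = torch.cdist(x, x)                                  # operator 1: pairwise distances, (n, n)
    dist.fill_diagonal_(float("inf"))                         # a point is not its own neighbor
    knn_dist, _ = torch.topk(dist, k, dim=1, largest=False)   # operator 2: k smallest per row, ascending
    return knn_dist[:, -1]                                    # larger score = more outlying


# Runs on the GPU when one is available, otherwise on the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.tensor([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [10.0, 10.0]], device=device)
scores = knn_outlier_scores(x, k=2)
print(scores.argmax().item())  # 4, the isolated point
```

This sketch materializes the full n×n distance matrix, which would not fit in memory for a million samples; a system at that scale has to process the matrix in tiles, which this example does not attempt.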

For instance, ABOD and COPOD can be assembled from these basic tensor operators.

https://raw.githubusercontent.com/yzhao062/pytod/master/figs/abstraction_example.png
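To make the COPOD case concrete, here is a rough sketch, assuming COPOD's construction of left-tail, right-tail, and skewness-corrected empirical CDF (ECDF) scores, built only from sort, searchsorted, log, and reduction operators. It is not PyTOD's implementation: the helper names `ecdf` and `copod_scores` are hypothetical, and details such as tie handling and the final aggregation may differ from PyOD and PyTOD.

```python
import torch


def ecdf(x: torch.Tensor) -> torch.Tensor:
    """Column-wise ECDF: fraction of samples <= each entry, via sort + searchsorted."""
    n = x.shape[0]
    sorted_cols = torch.sort(x, dim=0).values.T.contiguous()           # (d, n), each row sorted
    counts = torch.searchsorted(sorted_cols, x.T.contiguous(), right=True)  # (d, n) counts of <=
    return counts.T.to(x.dtype) / n                                    # back to (n, d)


def copod_scores(x: torch.Tensor) -> torch.Tensor:
    u_l = -torch.log(ecdf(x))     # left-tail: -log P(X <= x); never log(0), each value counts itself
    u_r = -torch.log(ecdf(-x))    # right-tail: -log P(X >= x)
    centered = x - x.mean(dim=0)
    left_skewed = (centered ** 3).mean(dim=0) < 0   # only the sign of skewness is needed
    u_s = torch.where(left_skewed, u_l, u_r)        # skewness-corrected tail, per dimension
    p = torch.stack([u_l.sum(dim=1), u_r.sum(dim=1), u_s.sum(dim=1)])  # (3, n)
    return p.max(dim=0).values                      # larger score = more outlying


x = torch.tensor([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [-0.1, 0.0], [5.0, 5.0]])
print(copod_scores(x).argmax().item())  # 4
```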

End-to-end Performance Comparison with PyOD

Overall, TOD is much faster than PyOD (11 times on average) and takes far less run time.

https://raw.githubusercontent.com/yzhao062/pytod/master/figs/run_time.png

Code is being released. Watch and star for the latest news!

Comments

Error while installing package
I installed PyTorch 1.10 from the official site, and it shows up in my virtual environment. When I run pip install pytod, the installation fails because pytod requires the "pytorch" package rather than the "torch" package.

ERROR: Could not find a version that satisfies the requirement pytorch>=1.7 (from pytod) (from versions: 0.1.2, 1.0.2)
ERROR: No matching distribution found for pytorch>=1.7
opened by nuriakiin
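The error shows that pytod declares a dependency on pytorch>=1.7, while PyTorch is published on PyPI under the distribution name "torch". Assuming the dependency is declared through setuptools' install_requires (an assumption; the packaging files are not shown here), a packaging-side fix could look roughly like this fragment:

```python
from setuptools import setup

setup(
    name="pytod",
    # Assumption: dependencies live in install_requires.
    # PyTorch's PyPI distribution name is "torch", not "pytorch".
    install_requires=["torch>=1.7"],  # other dependencies omitted in this sketch
)
```

Until such a change is released, installing torch first and then running pip install with the --no-deps option avoids the failing resolution, at the cost of installing pytod's remaining dependencies by hand.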
decision_function() returns None

Thanks for the package. When I fit LOF (or KNN) and call decision_function() on test data, it returns an empty object. Is there a fix for this? The following code reproduces the issue (on GPU):

```python
from pytod.models.lof import LOF
import torch
import numpy as np

x = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [75, 80]], dtype=np.float32)
x = torch.from_numpy(x)

y = np.array([[6, 5], [1, 2], [3, 4], [5, 1], [11, 12]], dtype=np.float32)
y = torch.from_numpy(y)

lof = LOF(n_neighbors=2, device='cuda:0')
lof.fit(x)

print(lof.decision_function(y))  # prints None
```

opened by sugatc
Support for novelty detection and changing distance metric with local outlier factor

The current implementation of LOF doesn't allow changing the distance metric (to 'cosine', for example) or setting novelty=True, which prevents it from being used for novelty detection tasks. It would be great if support for these could be added.

opened by sugatc
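Both requests, scoring unseen test data and swapping in a cosine metric, can be expressed with the same tensor operators. Below is a minimal plain-PyTorch sketch of LOF in a novelty-style setting (similar in spirit to scikit-learn's novelty=True mode): neighborhood statistics are fitted on training data only, and test points are scored against them. It is not PyTOD's LOF; `fit_lof`, `score_lof`, `cosine_distance`, and `LOFState` are hypothetical helpers, and the distance function is a pluggable parameter.

```python
from typing import Callable, NamedTuple

import torch
import torch.nn.functional as F


def cosine_distance(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Pairwise cosine distance (1 - cosine similarity) between rows of a and b."""
    return 1.0 - F.normalize(a, dim=1) @ F.normalize(b, dim=1).T


class LOFState(NamedTuple):
    train: torch.Tensor
    k: int
    k_dist: torch.Tensor  # k-distance of each training point
    lrd: torch.Tensor     # local reachability density of each training point
    dist_fn: Callable[[torch.Tensor, torch.Tensor], torch.Tensor]


def _lrd(knn_d: torch.Tensor, knn_idx: torch.Tensor, k_dist: torch.Tensor) -> torch.Tensor:
    # reach-dist(p, o) = max(k-distance(o), d(p, o)); lrd(p) = 1 / mean reach-dist
    reach = torch.maximum(knn_d, k_dist[knn_idx])
    return 1.0 / (reach.mean(dim=1) + 1e-10)


def fit_lof(train: torch.Tensor, k: int = 20, dist_fn=torch.cdist) -> LOFState:
    d = dist_fn(train, train)
    d.fill_diagonal_(float("inf"))  # exclude self-matches
    knn_d, knn_idx = torch.topk(d, k, dim=1, largest=False)
    k_dist = knn_d[:, -1]
    return LOFState(train, k, k_dist, _lrd(knn_d, knn_idx, k_dist), dist_fn)


def score_lof(state: LOFState, test: torch.Tensor) -> torch.Tensor:
    """LOF of each test point w.r.t. the fitted training set (about 1 for inliers)."""
    d = state.dist_fn(test, state.train)
    knn_d, knn_idx = torch.topk(d, state.k, dim=1, largest=False)
    lrd_test = _lrd(knn_d, knn_idx, state.k_dist)
    return state.lrd[knn_idx].mean(dim=1) / lrd_test


train = torch.tensor([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.5, 0.5]])
test = torch.tensor([[0.5, 0.4], [10.0, 10.0]])
scores = score_lof(fit_lof(train, k=2), test)
print(scores)  # the far-away second point scores much higher than the first
```

To use the cosine metric instead of Euclidean distance, pass dist_fn=cosine_distance to fit_lof; the same function is then used when scoring. Moving train and test to a CUDA device with .to("cuda") runs the same code on the GPU.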
can't fit model in colab

When I try to fit any model on a Colab GPU instance, I get the following error. My dataset has 2 columns and 1 million rows:

```
AttributeError                            Traceback (most recent call last)
in <module>()
      4 clf_name = 'KNN'
      5 clf = LOF()
----> 6 clf.fit(X)

3 frames
/usr/local/lib/python3.7/dist-packages/pandas/core/generic.py in __getattr__(self, name)
   5485         ):
   5486             return self[name]
-> 5487         return object.__getattribute__(self, name)
   5488
   5489     def __setattr__(self, name: str, value) -> None:

AttributeError: 'DataFrame' object has no attribute 'to'
```

opened by yairVanti
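The traceback shows fit() calling .to(...) on its input. That is a torch.Tensor method, which a pandas DataFrame does not have, so the models appear to expect a tensor. A minimal workaround sketch, assuming the DataFrame is fully numeric (the small frame below is a hypothetical stand-in for the reported dataset):

```python
import numpy as np
import pandas as pd
import torch

# Hypothetical stand-in for the 2-column, 1-million-row dataset from the report.
df = pd.DataFrame({"x1": [0.1, 0.2, 0.3], "x2": [1.0, 2.0, 3.0]})

# Convert to a float32 NumPy array, then wrap it as a tensor;
# torch.from_numpy shares memory with that array instead of copying it again.
X = torch.from_numpy(df.to_numpy(dtype=np.float32))
print(X.shape, X.dtype)  # torch.Size([3, 2]) torch.float32
```

The resulting X can then be passed to clf.fit(X) as in the report. float32 is also the dtype used in the reproduction code of the decision_function() issue.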
clean up reproducibility scripts

We are cleaning up these scripts so they are easy to run; in the meantime, the primary results can be reproduced with compare_real_data.py (https://github.com/yzhao062/pytod/tree/main/reproducibility).
enhancement

opened by yzhao062

Releases (v0.0.2)

v0.0.2 (Jun 19, 2022)

v<0.0.1>, <04/12/2021> -- Add LOF.
v<0.0.1>, <04/23/2021> -- Add ABOD.
v<0.0.2>, <06/19/2021> -- Add PCA and HBOS.
v<0.0.2>, <06/19/2021> -- Turn on test suites.

We have now updated both the paper and the repo to cover more algorithms.

Owner

Yue Zhao

Ph.D. Student @ CMU. Outlier Detection Systems | ML Systems (MLSys) | Anomaly/Outlier Detection | AutoML. Twitter: @yzhao062

GitHub Repository
Paper: https://www.andrew.cmu.edu/user/yuezhao2/papers/21-preprint-tod.pdf

Referring Video Object Segmentation

Awesome-Referring-Video-Object-Segmentation Welcome to starts ⭐ & comments 💹 & sharing 😀 !! - 2021.12.12: Recent papers (from 2021) - welcome to ad

57 Dec 11, 2022

The repository forked from NVlabs uses our data. (Differentiable rasterization applied to 3D model simplification tasks)

nvdiffmodeling [origin_code] Differentiable rasterization applied to 3D model simplification tasks, as described in the paper: Appearance-Driven Autom

2 Oct 31, 2022

Demonstration of the Model Training as a CI/CD System in Vertex AI

Model Training as a CI/CD System This project demonstrates the machine model training as a CI/CD system in GCP platform. You will see more detailed wo

19 Dec 28, 2022

Skipgram Negative Sampling in PyTorch

PyTorch SGNS Word2Vec's SkipGramNegativeSampling in Python. Yet another but quite general negative sampling loss implemented in PyTorch. It can be use

287 Dec 14, 2022

Object Database for Super Mario Galaxy 1/2.

Super Mario Galaxy Object Database Welcome to the public object database for Super Mario Galaxy and Super Mario Galaxy 2. Here, we document all object

9 Dec 04, 2022

Users can free try their models on SIDD dataset based on this code

SIDD benchmark 1 Train python train.py If you want to train your network, just modify the yaml in the options folder. 2 Validation python validation.p

2 May 20, 2022

ML From Scratch

ML from Scratch MACHINE LEARNING TOPICS COVERED - FROM SCRATCH Linear Regression Logistic Regression K Means Clustering K Nearest Neighbours Decision

66 Nov 02, 2022

Transformers are Graph Neural Networks!

🚀 Gated Graph Transformers Gated Graph Transformers for graph-level property prediction, i.e. graph classification and regression. Associated article

46 Jun 30, 2022

Official implementation of "Robust channel-wise illumination estimation"

This repository provides the official implementation of "Robust channel-wise illumination estimation." accepted in BMVC (2021).

4 Nov 08, 2022

PyTorch reimplementation of the paper Involution: Inverting the Inherence of Convolution for Visual Recognition [CVPR 2021].

Involution: Inverting the Inherence of Convolution for Visual Recognition Unofficial PyTorch reimplementation of the paper Involution: Inverting the I

100 Dec 01, 2022

Official implementation of VQ-Diffusion

Vector Quantized Diffusion Model for Text-to-Image Synthesis Overview This is the official repo for the paper: [Vector Quantized Diffusion Model for T

592 Jan 03, 2023

Pytorch implementation of Cut-Thumbnail in the paper Cut-Thumbnail:A Novel Data Augmentation for Convolutional Neural Network.

Cut-Thumbnail (Accepted at ACM MULTIMEDIA 2021) Tianshu Xie, Xuan Cheng, Xiaomin Wang, Minghui Liu, Jiali Deng, Tao Zhou, Ming Liu This is the officia

3 Apr 12, 2022


GPU implementation of $k$-Nearest Neighbors and Shared-Nearest Neighbors

Easy-to-use micro-wrappers for Gym and PettingZoo based RL Environments

One-line your code easily but still with the fun of doing so!

MediaPipe is an open-source framework from Google for building multimodal

A minimal implementation of Gaussian process regression in PyTorch

This repository provides the code for MedViLL(Medical Vision Language Learner).

Code for Talk-to-Edit (ICCV2021). Paper: Talk-to-Edit: Fine-Grained Facial Editing via Dialog.

Image Deblurring using Generative Adversarial Networks