TorchPQ is a python library for Approximate Nearest Neighbor Search (ANNS) and Maximum Inner Product Search (MIPS) on GPU using Product Quantization (PQ) algorithm.

Last update: Dec 28, 2022


Overview

TorchPQ

TorchPQ is a python library for Approximate Nearest Neighbor Search (ANNS) and Maximum Inner Product Search (MIPS) on GPU using Product Quantization (PQ) algorithm. TorchPQ is implemented mainly with PyTorch, with some extra CUDA kernels to accelerate clustering, indexing and searching.

Install

First, install the version of the CuPy library that matches your CUDA version:

pip install cupy-cuda90
pip install cupy-cuda100
pip install cupy-cuda101
...

Then install TorchPQ

pip install torchpq

For a full list of cupy-cuda packages, see the CuPy Installation Guide.

Quick Start

IVFPQ

InVerted File Product Quantization (IVFPQ) is an ANN search algorithm designed for fast, efficient vector search in million- or even billion-scale vector sets. Check the original paper for more details.
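To make the "inverted file" part concrete, here is a minimal pure-Python sketch of IVF search. It is an illustration under stated assumptions, not TorchPQ's implementation: the helper names (sq_l2, nearest_cells, build_ivf, ivf_search) are hypothetical, the coarse centroids are a random sample instead of trained k-means centroids, vectors are stored row by row (TorchPQ uses [d_vector, n_data]), and distances inside the probed cells are exact, whereas IVFPQ would compare PQ-compressed codes.

```python
import random

def sq_l2(a, b):
    # squared Euclidean distance between two equal-length vectors
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_cells(v, centroids, n):
    # indices of the n coarse centroids closest to v
    return sorted(range(len(centroids)), key=lambda c: sq_l2(v, centroids[c]))[:n]

def build_ivf(vectors, centroids):
    # inverted file: one list of vector ids per coarse cell (Voronoi cell)
    lists = [[] for _ in centroids]
    for vid, v in enumerate(vectors):
        lists[nearest_cells(v, centroids, 1)[0]].append(vid)
    return lists

def ivf_search(query, vectors, centroids, lists, n_probe, k):
    # only vectors stored in the n_probe cells closest to the query are scanned
    candidates = [vid for c in nearest_cells(query, centroids, n_probe) for vid in lists[c]]
    # exact distances here; IVFPQ would use PQ-approximated distances instead
    candidates.sort(key=lambda vid: sq_l2(query, vectors[vid]))
    return candidates[:k]

random.seed(0)
vectors = [[random.random() for _ in range(8)] for _ in range(500)]
centroids = random.sample(vectors, 16)  # stand-in for trained coarse centroids
lists = build_ivf(vectors, centroids)
query = [random.random() for _ in range(8)]
approx = ivf_search(query, vectors, centroids, lists, n_probe=4, k=10)
exact = sorted(range(len(vectors)), key=lambda i: sq_l2(query, vectors[i]))[:10]
```

Probing every cell (n_probe equal to the number of cells) degenerates into brute-force search, while a small n_probe trades recall for speed. That is the same trade-off the n_probe parameter controls in the Topk search example.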

Training

import torch
from torchpq import IVFPQ

n_data = 1000000 # number of data points
d_vector = 128 # dimensionality / number of features

index = IVFPQ(
  d_vector=d_vector,
  n_subvectors=64,
  n_cq_clusters=1024,
  n_pq_clusters=256,
  blocksize=128,
  distance="euclidean",
)

x = torch.randn(d_vector, n_data, device="cuda:0")
index.train(x)

Several important parameters deserve explanation:

d_vector: dimensionality of input vectors. There are 2 constraints on d_vector: (1) it must be divisible by n_subvectors; (2) it must be a multiple of 4.*
n_subvectors: number of subquantizers. Since each subquantizer code takes one byte, this is essentially the byte size of each quantized vector (64 bytes per vector in the example above).**
n_cq_clusters: number of coarse quantizer clusters.
n_pq_clusters: number of product quantizer clusters. This is assumed to be 256 throughout the entire project and should NOT be changed.
blocksize: initial capacity assigned to each Voronoi cell of the coarse quantizer. n_cq_clusters * blocksize is the number of vectors that can be stored initially. If a cell reaches its capacity, it is expanded automatically. If you add vectors frequently, a larger blocksize is recommended.

Remember that the shape of any tensor that contains data points has to be [d_vector, n_data].

* The second constraint may be removed in the future.
** The actual byte size is (n_subvectors + 9) bytes: 8 bytes for the ID and 1 byte for is_empty.
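The constraints and byte counts above can be checked with a little arithmetic. The sketch below uses the parameters from the training example. The helper names are hypothetical, and it counts only per-vector codes and metadata: coarse centroids, PQ codebooks and unused slots in each block are deliberately ignored.

```python
def check_d_vector(d_vector, n_subvectors):
    # the two constraints listed above
    return d_vector % n_subvectors == 0 and d_vector % 4 == 0

def bytes_per_vector(n_subvectors):
    # one uint8 code per subvector (256 PQ clusters) + 8-byte int64 ID + 1-byte is_empty flag
    return n_subvectors + 9

def initial_capacity(n_cq_clusters, blocksize):
    # number of vectors that fit before any cell has to be expanded
    return n_cq_clusters * blocksize

d_vector, n_subvectors, n_cq_clusters, blocksize, n_data = 128, 64, 1024, 128, 1_000_000

sub_dim = d_vector // n_subvectors                      # features per subvector: 2
per_vec = bytes_per_vector(n_subvectors)                # 73 bytes
raw_per_vec = d_vector * 4                              # float32 original: 512 bytes
capacity = initial_capacity(n_cq_clusters, blocksize)   # 131072 vectors
total_mb = per_vec * n_data / 1e6                       # about 73 MB for 1M vectors
```

With these settings, 1M vectors are roughly 7x smaller than the raw float32 data (512 vs. 73 bytes each), and because 1M is well above the initial capacity of 131072, cells will be expanded as vectors are added.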

Adding new vectors

ids = torch.arange(n_data, device="cuda")
index.add(x, input_ids=ids)

Each ID in ids must be a unique int64 (torch.long) value that identifies a vector in x. If input_ids is not provided, it is set to torch.arange(n_data, device="cuda") + previous_max_id.

Removing vectors

index.remove(ids)

index.remove(ids) virtually removes the vectors with the specified ids from storage. IDs that don't exist are ignored.
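"Virtual" removal pairs naturally with the 1-byte is_empty flag mentioned in the footnote above. The toy cell below is a hypothetical structure for illustration only, not TorchPQ's storage layout: it keeps parallel arrays of codes, IDs and is_empty flags, doubles its capacity when full (an assumed growth policy), and removes vectors by flipping the flag rather than moving data. IDs that are not stored are simply skipped.

```python
class ToyCell:
    """Hypothetical fixed-capacity cell: parallel arrays of codes, ids and is_empty flags."""

    def __init__(self, capacity):
        self.codes = [None] * capacity
        self.ids = [-1] * capacity
        self.is_empty = [True] * capacity

    def add(self, code, vid):
        if True not in self.is_empty:
            # cell is full: double its capacity (toy growth policy)
            extra = max(1, len(self.ids))
            self.codes += [None] * extra
            self.ids += [-1] * extra
            self.is_empty += [True] * extra
        slot = self.is_empty.index(True)
        self.codes[slot], self.ids[slot], self.is_empty[slot] = code, vid, False

    def remove(self, ids):
        # virtual removal: only the flag changes; unknown ids match nothing and are ignored
        targets = set(ids)
        for slot, vid in enumerate(self.ids):
            if vid in targets and not self.is_empty[slot]:
                self.is_empty[slot] = True

    def live_ids(self):
        return [vid for vid, empty in zip(self.ids, self.is_empty) if not empty]

cell = ToyCell(capacity=2)
for vid in (10, 11, 12):           # third add triggers an expansion
    cell.add(code=bytes([vid]), vid=vid)
cell.remove([11, 999])             # 999 was never added, so it is ignored
```

Because the removed slot still holds its old ID and code, removal is cheap, and searches only need to skip slots whose is_empty flag is set.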

Topk search

index.n_probe = 32
n_query = 10000
query = torch.randn(d_vector, n_query, device="cuda:0")
topk_values, topk_ids = index.topk(query, k=100)

When distance="inner", topk_values are the inner products between the queries and their top-k closest data points.
When distance="euclidean", topk_values are the negative squared L2 distances between the queries and their top-k closest data points.
When distance="manhattan", topk_values are the negative L1 distances between the queries and their top-k closest data points.
When distance="cosine", topk_values are the cosine similarities between the queries and their top-k closest data points.
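The sign conventions above mean that a larger value always indicates a closer point, so top-k can simply take the k largest scores. Below is a small brute-force sketch of these conventions in pure Python. The helpers score and topk are hypothetical illustrations, not TorchPQ functions, and they ignore quantization entirely.

```python
import math

def score(q, x, distance):
    # larger score = closer, following the conventions described above
    if distance == "inner":
        return sum(a * b for a, b in zip(q, x))
    if distance == "euclidean":
        return -sum((a - b) ** 2 for a, b in zip(q, x))   # negative squared L2
    if distance == "manhattan":
        return -sum(abs(a - b) for a, b in zip(q, x))     # negative L1
    if distance == "cosine":
        dot = sum(a * b for a, b in zip(q, x))
        return dot / (math.sqrt(sum(a * a for a in q)) * math.sqrt(sum(b * b for b in x)))
    raise ValueError(f"unknown distance: {distance}")

def topk(q, data, k, distance):
    # brute force: score every point, keep the k largest scores and their ids
    scores = [score(q, x, distance) for x in data]
    ids = sorted(range(len(data)), key=lambda i: scores[i], reverse=True)[:k]
    return [scores[i] for i in ids], ids

data = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
query = [1.0, 1.0]
vals, ids = topk(query, data, k=2, distance="euclidean")   # ([-1.0, -2.0], [0, 1])
```

Note how the choice of distance changes the ranking: with distance="inner" the long vector [3.0, 3.0] ranks first, while with distance="euclidean" the nearby [1.0, 0.0] does.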

Encode and Decode

You can also use IVFPQ as a vector codec for lossy compression of vectors:

code = index.encode(query)   # compression
reconstruction = index.decode(code) # reconstruction
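As a rough picture of what encode and decode do in product quantization, the sketch below splits a vector into n_subvectors chunks, stores the index of the nearest codeword for each chunk (one byte per chunk), and decodes by concatenating the chosen codewords. It is a sketch under assumptions: the codebooks are random instead of k-means-trained, each subspace has 16 codewords instead of 256, and pq_encode/pq_decode are hypothetical names rather than TorchPQ internals.

```python
import random

def sq_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def split(v, n_subvectors):
    # cut a d-dimensional vector into n_subvectors contiguous chunks
    step = len(v) // n_subvectors
    return [v[i * step:(i + 1) * step] for i in range(n_subvectors)]

def pq_encode(v, codebooks):
    # one byte per subvector: index of the nearest codeword in that subspace's codebook
    return bytes(
        min(range(len(book)), key=lambda j: sq_l2(sub, book[j]))
        for sub, book in zip(split(v, len(codebooks)), codebooks)
    )

def pq_decode(code, codebooks):
    # lossy reconstruction: concatenate the selected codewords
    return [x for j, book in zip(code, codebooks) for x in book[j]]

random.seed(1)
d, m, ks = 8, 4, 16  # d_vector, n_subvectors, codewords per subspace (TorchPQ uses 256)
codebooks = [[[random.gauss(0, 1) for _ in range(d // m)] for _ in range(ks)] for _ in range(m)]
x = [random.gauss(0, 1) for _ in range(d)]
code = pq_encode(x, codebooks)            # 4 bytes instead of 8 floats
reconstruction = pq_decode(code, codebooks)
```

The reconstruction is only approximately equal to x, which is why this is lossy compression; re-encoding the reconstruction, however, returns the same code.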

Save and Load

Most TorchPQ modules inherit from torch.nn.Module, so you can save and load them just like a regular PyTorch model:

# Save to PATH
torch.save(index.state_dict(), PATH)
# Load from PATH
index.load_state_dict(torch.load(PATH))

Clustering

K-means

from torchpq.kmeans import KMeans
import torch

n_data = 1000000 # number of data points
d_vector = 128 # dimensionality / number of features
x = torch.randn(d_vector, n_data, device="cuda")

kmeans = KMeans(n_clusters=4096, distance="euclidean")
labels = kmeans.fit(x)

Note that the tensor containing data points must have shape [d_vector, n_data]; this convention is consistent throughout TorchPQ.
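For readers new to k-means, the following pure-Python sketch runs plain Lloyd's iterations on data laid out as [d_vector][n_data], matching the convention above, and also provides a predict step that assigns new points to the nearest centroid. It is not TorchPQ's KMeans: the function names are hypothetical, only squared Euclidean distance is supported, and a deterministic farthest-first initialization stands in for whatever initialization TorchPQ actually uses.

```python
import random

def sq_l2(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

def transpose(m):
    # [rows][cols] -> [cols][rows]; turns [d_vector][n_data] into per-point rows
    return [list(col) for col in zip(*m)]

def nearest(p, cents):
    return min(range(len(cents)), key=lambda c: sq_l2(p, cents[c]))

def predict(x, centroids):
    # x: [d_vector][n_data], centroids: [d_vector][n_clusters] -> one label per data point
    cents = transpose(centroids)
    return [nearest(p, cents) for p in transpose(x)]

def kmeans(x, n_clusters, n_iter=20):
    points = transpose(x)
    # farthest-first initialization: deterministic stand-in for k-means++
    cents = [points[0][:]]
    while len(cents) < n_clusters:
        cents.append(max(points, key=lambda p: min(sq_l2(p, c) for c in cents))[:])
    for _ in range(n_iter):
        labels = [nearest(p, cents) for p in points]          # assignment step
        for c in range(n_clusters):                           # update step
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                cents[c] = [sum(col) / len(members) for col in zip(*members)]
    centroids = transpose(cents)  # back to [d_vector][n_clusters]
    return predict(x, centroids), centroids

rng = random.Random(42)
pts = [(rng.gauss(0, 0.1), rng.gauss(0, 0.1)) for _ in range(50)]
pts += [(rng.gauss(5, 0.1), rng.gauss(5, 0.1)) for _ in range(50)]
x = [[p[0] for p in pts], [p[1] for p in pts]]   # shape [d_vector=2, n_data=100]
labels, centroids = kmeans(x, n_clusters=2)
```

Keeping features on the first axis means each column is one data point. The two well-separated blobs end up with one label each, and predict on a new point near (5, 5) returns the label of the second blob.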

Multiple concurrent K-means

Sometimes we have multiple independent datasets to cluster. Instead of running several KMeans instances sequentially, we can run them concurrently with MultiKMeans:

from torchpq.kmeans import MultiKMeans
import torch

n_data = 1000000
n_kmeans = 16
d_vector = 64
x = torch.randn(n_kmeans, d_vector, n_data, device="cuda")
kmeans = MultiKMeans(n_clusters=256, distance="euclidean")
labels = kmeans.fit(x)

Prediction with K-means

labels = kmeans.predict(x)

Benchmark

All experiments were performed with a Tesla T4 GPU.

SIFT1M

IVFPQ

Faiss is one of the best-known ANN search libraries, and it also has a GPU implementation of IVFPQ, so we ran some comparison experiments against faiss.

How to read the plot:

The plot format follows the style of ann-benchmarks.
X axis is recall, Y axis is queries/second.
The closer to the top right corner, the better.
Indexes with the same parameters from different libraries have similar colors.
Different libraries have different line styles (TorchPQ is a solid line with circle markers, faiss is a dashed line with triangle markers).
Each node on a line represents a different n_probe, starting from 1 at the leftmost node and doubling at each following node (n_probe = 1, 2, 4, 8, 16, ...).
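The x axis of these plots is a recall metric, and its exact definition is not spelled out here. The sketch below shows one common definition, often written R@k: the fraction of queries whose true nearest neighbor appears among the first k returned ids. Treat it as an assumption about how the plots were produced. ann-benchmarks, for example, uses a different variant (the fraction of the true k nearest neighbors that were found). The sketch also lists the n_probe values that correspond to the nodes on each line.

```python
def recall_at_k(retrieved, ground_truth_nn, k):
    """Fraction of queries whose true nearest neighbor appears in the first k results.

    retrieved: one ranked list of ids per query; ground_truth_nn: true NN id per query.
    """
    hits = sum(1 for ids, gt in zip(retrieved, ground_truth_nn) if gt in ids[:k])
    return hits / len(ground_truth_nn)

# one plot node per n_probe value, doubling from 1
n_probes = [2 ** i for i in range(8)]

retrieved = [[3, 7, 1], [4, 2, 9], [8, 0, 5], [6, 6, 6]]
ground_truth = [3, 9, 5, 1]
r1 = recall_at_k(retrieved, ground_truth, 1)   # only the first query hits: 0.25
r3 = recall_at_k(retrieved, ground_truth, 3)   # three of four queries hit: 0.75
```

Increasing n_probe scans more cells, which typically raises recall and lowers queries/second, so each line moves right and down as n_probe grows.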

Summary:

For all the IVF16384 variants, TorchPQ outperforms faiss when n_probe > 16.
For IVF4096, TorchPQ has lower recall than faiss; this could be caused by not encoding residuals. An option to encode residuals will be added soon.
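Residual encoding means quantizing x minus its coarse centroid instead of x itself, and adding the centroid back when decoding. Residuals are small and similar across cells, so the same codebook budget can describe them more precisely. The toy sketch below shows the mechanism with a single subvector (plain vector quantization) and hand-picked codebooks, one fit to the raw vectors and one to the residuals. It is a general illustration of the idea, not TorchPQ's implementation, and all helper names are hypothetical.

```python
def nearest(v, book):
    return min(range(len(book)), key=lambda j: sum((a - b) ** 2 for a, b in zip(v, book[j])))

def encode(x, coarse, book, use_residual):
    cell = nearest(x, coarse)                     # coarse quantizer assignment
    target = [a - c for a, c in zip(x, coarse[cell])] if use_residual else x
    return cell, nearest(target, book)            # (cell id, codeword id)

def decode(cell, j, coarse, book, use_residual):
    if use_residual:
        return [c + r for c, r in zip(coarse[cell], book[j])]   # centroid + residual
    return list(book[j])

def mse(points, coarse, book, use_residual):
    err = 0.0
    for x in points:
        rec = decode(*encode(x, coarse, book, use_residual), coarse, book, use_residual)
        err += sum((a - b) ** 2 for a, b in zip(x, rec))
    return err / len(points)

coarse = [[0.0, 0.0], [100.0, 100.0]]
points = [[1.0, 1.0], [2.0, 2.0], [101.0, 101.0], [102.0, 102.0]]
raw_book = [[1.5, 1.5], [101.5, 101.5]]   # 2 codewords fit to the raw vectors
res_book = [[1.0, 1.0], [2.0, 2.0]]       # 2 codewords fit to the residuals
err_raw = mse(points, coarse, raw_book, use_residual=False)   # 0.5
err_res = mse(points, coarse, res_book, use_residual=True)    # 0.0
```

In this contrived example, the raw codebook has to spend its two codewords on the two far-apart clusters, while the residual codebook spends them on the fine structure shared by both cells.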

IVFPQ+R

GIST1M

coming soon...

Comments

torchPQ in a deep net

Hello,

Thank you very much for sharing the project. I am interested in using torchPQ inside a deep net (implemented in PyTorch), calling torchPQ in each forward pass. Is this possible?

Also, I saw https://ai.googleblog.com/2020/07/announcing-scann-efficient-vector.html, have you tried some comparison with other methods?

Thank you!

opened by Chen-Cai-OSU 9
About SM Size

Hi, thanks very much for sharing this project. I have been looking for a package supporting batch kmeans for a very long period. Very glad to find that TorchPQ supports that (MultiKMeans). Many thanks again.

But I have a question about the sm_size argument of MultiKMeans. I know it is the CUDA shared memory size, but I am not familiar with CUDA programming and cannot figure out what the default value 48 * 256 * 4 means (the code comments do not mention this argument), even after searching online. Could you briefly explain it? Also, I guess increasing this value could speed up the computation. Am I right? Thanks for your time.

opened by SEC4SR 6
Question about importing MultiKMeans

Thanks for the nice work! When I tried to import MultiKMeans with the command shown in README.md, from torchpq.kmeans import MultiKMeans, it failed with: ModuleNotFoundError: No module named 'torchpq.kmeans'. When I use from torchpq.clustering import MultiKMeans instead, it works. Is that correct, given that it differs from what README.md says?

opened by nmynol 6
CUDA error distributed training

Hi,

TorchPQ runs well on a single gpu, but it fails when I switch to multi-gpus. The error occurs in the synchronize step. Do you have any suggestions for multi-gpu usage?

Thanks!

opened by Songweiping 4
Inquiry about the centroids of the K-means method

Hi, firstly thanks for your wonderful work.

I want to get the centroids of the clusters and visualize them. However, from your introduction, it seems I can only get the labels of all samples. Do you have any suggestions for getting the centroids?

Thanks again for helping me out.

opened by hellodrx 3

Import Error in Minibatch K means

just tried this today

Traceback (most recent call last):
  File "/datadrive/phd-projects/PiCIE/eval_minimal.py", line 18, in <module>
    from torchpq.clustering import MinibatchKMeans
  File "/anaconda/envs/py38_pytorch/lib/python3.8/site-packages/torchpq/__init__.py", line 11, in <module>
    from . import experimental
ImportError: cannot import name 'experimental' from partially initialized module 'torchpq' (most likely due to a circular import) (/anaconda/envs/py38_pytorch/lib/python3.8/site-packages/torchpq/__init__.py)

opened by mhamilton723 3

Imports on CPU-only machine fail
Hello,

I am trying to run your awesome CUDA-powered k-means. For testing purposes, I would like to make it runnable also on CPU, but I am getting errors during importing because of this: https://github.com/DeMoriarty/TorchPQ/blob/b8bbadf7915b8fead9a1b0f2dafa964b4058f26d/torchpq/kernels/default_device.py#L3

which results in:

CUDARuntimeError: cudaErrorNoDevice: no CUDA-capable device is detected

Would you mind changing it to something like:

if torch.cuda.is_available():
    __device = cp.cuda.Device().id
else:
    __device = None

or hiding the imports of get_default_device and set_default_device (they seem to be imported after checking torch.cuda.is_available() anyway, so it should be possible)?

And also getting rid of / hiding this: https://github.com/DeMoriarty/TorchPQ/blob/b8bbadf7915b8fead9a1b0f2dafa964b4058f26d/torchpq/__init__.py#L22

opened by Tomiinek 2

KMeans and MultiKMeans: CUDA_ERROR_INVALID_VALUE: invalid argument

This issue seems to come up when the tensor length (n_data) is greater than 8388480.

import torch
from torchpq.clustering import MultiKMeans

n_data = 8388481 # works when n_data = 8388480
n_kmeans = 5
d_vector = 3
x = torch.randn(n_kmeans, d_vector, n_data, device="cuda")
kmeans = MultiKMeans(n_clusters=10, distance="euclidean")
labels = kmeans.fit(x)

Error message:

---------------------------------------------------------------------------
CUDADriverError                           Traceback (most recent call last)
<ipython-input-27-75b27aaadf4d> in <module>
      6 #x = x.float()
      7 kmeans = MultiKMeans(n_clusters=10, distance="euclidean")
----> 8 labels = kmeans3fit(x)

~/.local/lib/python3.8/site-packages/torchpq/clustering/MultiKMeans.py in fit(self, data, centroids)
    432       for j in range(self.max_iter):
    433         # 1 iteration of clustering
--> 434         maxsims, labels = self.get_labels(data, centroids) #top1 search
    435         new_centroids = self.compute_centroids(data, labels)
    436         error = self.calculate_error(centroids, new_centroids)

~/.local/lib/python3.8/site-packages/torchpq/clustering/MultiKMeans.py in get_labels(self, data, centroids)
    323         #   dim=2
    324         # )
--> 325         maxsims, labels = self.max_sim_cuda(
    326           data,
    327           centroids,

~/.local/lib/python3.8/site-packages/torchpq/kernels/MaxSimCuda.py in __call__(self, A, B, dim, mode)
    317       vals, inds = self._call_tt(A2, B2, dim)
    318     elif mode == "tn":
--> 319       vals, inds = self._call_tn(A2, B2, dim)
    320     elif mode == "nt":
    321       vals, inds = self._call_nt(A2, B2, dim)

~/.local/lib/python3.8/site-packages/torchpq/kernels/MaxSimCuda.py in _call_tn(self, A, B, dim)
    213     blocks_per_grid = (l, math.ceil(n/128), math.ceil(m/128))
    214 
--> 215     self._fn_tn(
    216       grid=blocks_per_grid,
    217       block=threads_per_block,

cupy/_core/raw.pyx in cupy._core.raw.RawKernel.__call__()

cupy/cuda/function.pyx in cupy.cuda.function.Function.__call__()

cupy/cuda/function.pyx in cupy.cuda.function._launch()

cupy_backends/cuda/api/driver.pyx in cupy_backends.cuda.api.driver.launchKernel()

cupy_backends/cuda/api/driver.pyx in cupy_backends.cuda.api.driver.check_status()

CUDADriverError: CUDA_ERROR_INVALID_VALUE: invalid argument

opened by mhudecheck 2
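The threshold in this report is exactly 65535 × 128 = 8388480. The traceback shows the kernel being launched with blocks_per_grid = (l, math.ceil(n/128), math.ceil(m/128)), and CUDA limits gridDim.y and gridDim.z to 65535, so one more data point pushes one grid dimension to 65536. The sketch below checks this arithmetic. It is an inference from the traceback, not a confirmed diagnosis, and it assumes that n_data ends up on one of the two tiled axes (here the third one). The helper names are hypothetical.

```python
import math

MAX_GRID_YZ = 65535  # CUDA limit on gridDim.y and gridDim.z

def blocks_per_grid(l, n, m, tile=128):
    # same launch shape as in the traceback: (l, ceil(n/128), ceil(m/128))
    return (l, math.ceil(n / tile), math.ceil(m / tile))

def launch_ok(grid):
    # gridDim.x has a much larger limit; y and z are capped at 65535
    return grid[1] <= MAX_GRID_YZ and grid[2] <= MAX_GRID_YZ

threshold = MAX_GRID_YZ * 128                          # 8388480
ok_grid = blocks_per_grid(5, 10, 8388480)              # (5, 1, 65535)
bad_grid = blocks_per_grid(5, 10, 8388481)             # (5, 1, 65536)
```

If this is the cause, possible workarounds would be to cluster the data in chunks below the threshold, or for the kernel to loop over tiles instead of mapping every tile to its own block.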

How to use MinibatchKMeans on multi GPUs machine?

I'm a beginner. How can I use multiple GPUs with MinibatchKMeans?

from torchpq.clustering import MinibatchKMeans
import torch

n_data = 10000 # number of data points
d_vector = 128 # dimensionality / number of features
x = torch.randn(d_vector, n_data, device="cuda")

minibatch_kmeans = MinibatchKMeans(n_clusters = 128)
minibatch_kmeans = torch.nn.DataParallel(minibatch_kmeans, device_ids=[0,1,2])
n_iter = 10
tol = 0.001
for i in range(n_iter):
    x = torch.randn(d_vector, n_data, device="cuda")
    minibatch_kmeans.fit_minibatch(x)
    if minibatch_kmeans.error < tol:
        break

I get the following output:

Traceback (most recent call last):
  File "kmean_torch.py", line 14, in <module>
    minibatch_kmeans.fit_minibatch(x)
  File "/data/home/dl/anaconda3/envs/clip/lib/python3.7/site-packages/torch/nn/modules/module.py", line 779, in __getattr__
    type(self).__name__, name))
torch.nn.modules.module.ModuleAttributeError: 'DataParallel' object has no attribute 'fit_minibatch'

opened by ZhangIceNight 1

readme does not run

Hello, I'm trying to run your README example and I get "__init__() got an unexpected keyword argument 'blocksize'". After removing blocksize, I see "__init__() got an unexpected keyword argument 'init_size'".

opened by lucidrains 1

Error while importing torchpq.clustering

I see the following error when I try to import torchpq.clustering.

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
/tmp/ipykernel_7302/3376715144.py in <module>
----> 1 from torchpq import clustering

~/.local/lib/python3.8/site-packages/torchpq/__init__.py in <module>
     18 from .CustomModule import CustomModule
     19 
---> 20 topk = fn.Topk()

~/.local/lib/python3.8/site-packages/torchpq/fn/Topk.py in __init__(self)
      4 class Topk:
      5   def __init__(self):
----> 6     self._top32_cuda = TopkSelectCuda(
      7       tpb = 32,
      8       queue_capacity = 4,

~/.local/lib/python3.8/site-packages/torchpq/kernels/TopkSelectCuda.py in __init__(self, tpb, queue_capacity, buffer_size)
     23     self.buffer_size = buffer_size
     24 
---> 25     with open(get_absolute_path("kernels", "cuda", "topk_select.cu"),'r') as f: ###
     26       self.kernel = f.read()
     27 

FileNotFoundError: [Errno 2] No such file or directory: '/home/XXXX/.local/lib/python3.8/site-packages/torchpq/kernels/cuda/topk_select.cu'

Installation details:

Used pip to install cupy-cuda110,
pytorch version: 1.7.1
Cuda: 11.0

However, I am able to run from torchpq.index import IVFPQIndex without any issue. Can you please help me fix this?

opened by abhinavvs 1

Releases(v0.3.0.1)

v0.3.0.1(Sep 27, 2022)

v0.3.0(Oct 22, 2021)

Improvements to IVFPQ; added DistributedCellContainer
v0.2.0.2(Oct 22, 2021)

Small improvements
v0.2.0(Jul 11, 2021)
What's new in v0.2.0?

IVFPQIndex search speed is greatly improved, now at least 2 times faster than before

IVFPQIndex now supports encoding residuals: pass pq_use_residual=True to the initializer to turn residual encoding on. This improves the recall rate, especially for low code sizes. You can see the difference between residual encoding on and off in the benchmark results.

Added new submodules, such as index, clustering, codec, transform and more

Old implementations from v0.1 were moved to torchpq/legacy/

More thorough benchmark results on the SIFT1M dataset

v0.1.4(Feb 9, 2021)

