CLASP - Contrastive Language-Aminoacid Sequence Pretraining

Last update: Dec 29, 2022

Related tags

Deep Learning clasp

Overview

CLASP - Contrastive Language-Aminoacid Sequence Pretraining

Repository for creating models pretrained on language and aminoacid sequences similar to ConVIRT, CLIP, and ALIGN.

Work in progress - more updates soon!

Requirements

You can install the requirements with the following

$ python setup.py install --user

Then, you must install Microsoft's sparse attention CUDA kernel with the following two steps.

$ sh install_deepspeed.sh

Next, you need to pip install the package triton

$ pip install triton

If both of the above succeeded, now you can train your long biosequences with CLASP

Usage

import torch
from torch.optim import Adam

from clasp import CLASP, Transformer, tokenize

# instantiate the attention models for text and bioseq

text_enc = Transformer(
    num_tokens = 20000,
    dim = 512,
    depth = 6,
    seq_len = 1024
)

bioseq_enc = Transformer(
    num_tokens = 21,
    dim = 512,
    depth = 6,
    seq_len = 512,
    sparse_attn = True
)

# clasp (CLIP) trainer

clasp = CLASP(
    text_encoder = text_enc,
    bioseq_encoder = bioseq_enc
)

# data

text, text_mask = tokenize(['Spike protein S2: HAMAP-Rule:MF_04099'], context_length = 1024, return_mask = True)

bioseq = torch.randint(0, 21, (1, 511))         # when using sparse attention, should be 1 less than the sequence length
bioseq_mask = torch.ones_like(bioseq).bool()

# do the below with large batch sizes for many many iterations

opt = Adam(clasp.parameters(), lr = 3e-4)

loss = clasp(
    text,
    bioseq,
    text_mask = text_mask,
    bioseq_mask = bioseq_mask,
    return_loss = True               # set return loss to True
)

loss.backward()

Once trained

scores = clasp(
    texts,
    bio_seq,
    text_mask = text_mask,
    bioseq_mask = bioseq_mask
)

Resources

See interesting resources (feel free to add interesting material that could be useful).

Citations

@article{zhang2020contrastive,
  title={Contrastive learning of medical visual representations from paired images and text},
  author={Zhang, Yuhao and Jiang, Hang and Miura, Yasuhide and Manning, Christopher D and Langlotz, Curtis P},
  journal={arXiv preprint arXiv:2010.00747},
  year={2020}
}

OpenAI blog post "CLIP: Connecting Text and Images"

@article{radford2021learning,
  title={Learning transferable visual models from natural language supervision},
  author={Radford, Alec and Kim, Jong Wook and Hallacy, Chris and Ramesh, Aditya and Goh, Gabriel and Agarwal, Sandhini and Sastry, Girish and Askell, Amanda and Mishkin, Pamela and Clark, Jack and others},
  journal={arXiv preprint arXiv:2103.00020},
  year={2021}
}

@article{jia2021scaling,
  title={Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision},
  author={Jia, Chao and Yang, Yinfei and Xia, Ye and Chen, Yi-Ting and Parekh, Zarana and Pham, Hieu and Le, Quoc V and Sung, Yunhsuan and Li, Zhen and Duerig, Tom},
  journal={arXiv preprint arXiv:2102.05918},
  year={2021}
}

CLASP - Contrastive Language-Aminoacid Sequence Pretraining

Related tags

Overview

CLASP - Contrastive Language-Aminoacid Sequence Pretraining

Requirements

Usage

Resources

Citations

Owner

Michael Pieler

Parameter Efficient Deep Probabilistic Forecasting

[Open Source]. The improved version of AnimeGAN. Landscape photos/videos to anime

CVPR 2022 "Online Convolutional Re-parameterization"

BrainGNN - A deep learning model for data-driven discovery of functional connectivity

Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers

Code for "Share With Thy Neighbors: Single-View Reconstruction by Cross-Instance Consistency" paper

The Python code for the paper A Hybrid Quantum-Classical Algorithm for Robust Fitting

RATE: Overcoming Noise and Sparsity of Textual Features in Real-Time Location Estimation (CIKM'17)

An educational resource to help anyone learn deep reinforcement learning.

A Dataset for Direct Quotation Extraction and Attribution in News Articles.

Reinforcement Learning for finance

Keras + Hyperopt: A very simple wrapper for convenient hyperparameter optimization

This package implements the algorithms introduced in Smucler, Sapienza, and Rotnitzky (2020) to compute optimal adjustment sets in causal graphical models.

Using contrastive learning and OpenAI's CLIP to find good embeddings for images with lossy transformations

Multiple Object Tracking with Yolov5!

Low-code/No-code approach for deep learning inference on devices

code release for USENIX'22 paper `On the Security Risks of AutoML`

LLVM-based compiler for LightGBM gradient-boosted trees. Speeds up prediction by ≥10x.

Code release for "Detecting Twenty-thousand Classes using Image-level Supervision".

SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking