Trained T5-base and T5-large models for generating keywords from text

Last update: Nov 24, 2022

Overview

text to keywords

T5-base and T5-large models trained to generate keywords from text. Supported language: Russian (ru).

Pretraining Large version | Pretraining Base version

habr article

Usage

Example usage (the code returns a list of keywords; duplicates are possible):

pip install transformers sentencepiece

from itertools import groupby
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer
model_name = "0x7194633/keyt5-large" # or 0x7194633/keyt5-base
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

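def generate(text, **kwargs):
    # Tokenize the input and run beam search without tracking gradients.
    inputs = tokenizer(text, return_tensors='pt')
    with torch.no_grad():
        hypotheses = model.generate(**inputs, num_beams=5, **kwargs)
    s = tokenizer.decode(hypotheses[0], skip_special_tokens=True)
    # Normalize the ';' separators, lowercase, and split into keywords;
    # [:-1] drops whatever follows the last ';' (possibly empty or truncated).
    s = s.replace('; ', ';').replace(' ;', ';').lower().split(';')[:-1]
    # groupby collapses only adjacent repeats, so non-adjacent duplicates remain.
    s = [el for el, _ in groupby(s)]
    return s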

article = """Reuters сообщил об отмене 3,6 тыс. авиарейсов из-за «омикрона» и погоды
Наибольшее число отмен авиарейсов 2 января пришлось на американские авиакомпании 
SkyWest и Southwest, у каждой — более 400 отмененных рейсов. При этом среди 
отмененных 2 января авиарейсов — более 2,1 тыс. рейсов в США. Также свыше 6400 
рейсов были задержаны."""

print(generate(article, top_p=1.0, max_length=64))  
# ['авиаперевозки', 'отмена авиарейсов', 'отмена рейсов', 'отмена авиарейсов', 'отмена рейсов', 'отмена авиарейсов']
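Since the returned list can contain repeated keywords, a small post-processing step may be useful. The sketch below is not part of the original code: `dedupe_keywords` is a hypothetical helper that removes repeats while keeping the order in which keywords first appear. It is applied here to the example output above, so it runs without loading the model.

```python
def dedupe_keywords(keywords):
    """Drop repeated keywords while keeping first-seen order (hypothetical helper)."""
    # Strip surrounding whitespace and skip empty entries before comparing.
    cleaned = (kw.strip() for kw in keywords)
    # dict preserves insertion order, so fromkeys behaves like an ordered set.
    return list(dict.fromkeys(kw for kw in cleaned if kw))


# The list printed by the usage example above.
raw_keywords = [
    'авиаперевозки', 'отмена авиарейсов', 'отмена рейсов',
    'отмена авиарейсов', 'отмена рейсов', 'отмена авиарейсов',
]

unique_keywords = dedupe_keywords(raw_keywords)
print(unique_keywords)
# ['авиаперевозки', 'отмена авиарейсов', 'отмена рейсов']
```

In practice the helper could wrap the call directly, for example `dedupe_keywords(generate(article, max_length=64))`.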

Training

KeyT5 models were trained on ~7000 compressed habr.com articles and support the Russian language only. Related files: data.csv, collect.py.

To train the keyT5-base and keyT5-large models, you need a table in CSV format with an input column X and a target column Y, like this:

X	Y
Some text that is fed to the input	The text that should come out
Some text that is fed to the input	The text that should come out
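A table in this X/Y layout could be written and read back with Python's standard `csv` module. The sketch below assumes a comma-delimited file with a header row `X,Y`; the file name `train_sample.csv` and the helpers `write_pairs` and `read_pairs` are hypothetical and do not come from the project's own scripts.

```python
import csv
import os
import tempfile


def write_pairs(path, pairs):
    """Write (input text, target keywords) pairs under an X,Y header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["X", "Y"])
        # csv.writer quotes fields that contain commas, quotes, or newlines.
        writer.writerows(pairs)


def read_pairs(path):
    """Read the table back as a list of (X, Y) tuples."""
    with open(path, newline="", encoding="utf-8") as f:
        return [(row["X"], row["Y"]) for row in csv.DictReader(f)]


# Placeholder rows; the second one checks that commas inside a field survive.
pairs = [
    ("Some text that is fed to the input", "The text that should come out"),
    ("Text with a comma, inside the field", "keyword one; keyword two;"),
]

path = os.path.join(tempfile.gettempdir(), "train_sample.csv")
write_pairs(path, pairs)
loaded = read_pairs(path)
print(loaded == pairs)
```

The target column here uses `;`-terminated keywords only because the usage code splits model output on `;`; the exact target format used for training is an assumption.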

See the training notebook for more details.


Owner

Danil
