Deep learning for NLP crash course at ABBYY.

Last update: Dec 18, 2022

Overview

Deep NLP Course at ABBYY

I'm gradually updating and translating the notebooks right now. Stay tuned.

Materials

Week 1: Introduction

Sentiment analysis on the IMDB movie review dataset: a short overview of classical machine learning for NLP + an indecently brief intro to Keras.

Russian version:

Updated English version:

Week 2: Word Embeddings: Part 1

Meet the Word Embeddings: an unsupervised method to capture some fun relationships between words.
Phrase similarity with a word embedding model + word-based machine translation without parallel data (with MUSE word embeddings).
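
As a rough illustration of phrase similarity with word embeddings, the sketch below averages the word vectors of each phrase and compares the averages with cosine similarity. This is one common baseline and not necessarily what the notebook does; the 3-dimensional vectors in `TOY_VECTORS` and all function names are made up for the example.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
# Real embeddings (word2vec, GloVe, MUSE) have hundreds of dimensions.
TOY_VECTORS = {
    "good": [0.9, 0.1, 0.0],
    "great": [0.8, 0.2, 0.1],
    "movie": [0.1, 0.9, 0.2],
    "film": [0.2, 0.8, 0.3],
    "terrible": [-0.8, 0.1, 0.1],
}


def phrase_vector(phrase, vectors):
    """Average the vectors of the words in the phrase that are in the vocabulary."""
    known = [vectors[w] for w in phrase.lower().split() if w in vectors]
    if not known:
        raise ValueError("no known words in phrase")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def phrase_similarity(p1, p2, vectors=TOY_VECTORS):
    """Cosine similarity between the averaged vectors of two phrases."""
    return cosine(phrase_vector(p1, vectors), phrase_vector(p2, vectors))


close = phrase_similarity("good movie", "great film")
far = phrase_similarity("good movie", "terrible film")
print(close > far)
```

With these toy vectors, "good movie" comes out closer to "great film" than to "terrible film".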

Russian version:

Updated English version:

Week 3: Word Embeddings: Part 2

Introduction to PyTorch. Implementation of pet linear regression in pure NumPy and in PyTorch. Implementations of CBoW, skip-gram, negative sampling and structured Word2vec models.
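
To make the difference between skip-gram and CBoW concrete, here is a minimal sketch of how training examples could be extracted from a token list. Skip-gram predicts each context word from the center word, while CBoW predicts the center word from its whole context. The function names and the `window` parameter are assumptions for the example, not the notebook's API.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every word within `window` of the center."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens, window=2):
    """Return (context_words, target) examples: the context predicts the center word."""
    examples = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + window + 1]
        if context:
            examples.append((context, target))
    return examples


tokens = "the cat sat on the mat".split()
print(skipgram_pairs(tokens, window=1)[:3])
print(cbow_examples(tokens, window=1)[:2])
```

Negative sampling would then pair each center word with a few randomly drawn "wrong" context words; that step is left out here.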

Russian version:

Updated English version:

Week 4: Convolutional Neural Networks

Introduction to convolutional networks. Relations between convolutions and n-grams. A simple surname detector built on character-level convolutions + fun visualizations.
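
One way to see the relation between convolutions and n-grams: a 1D convolution with kernel width `n` scores every character n-gram of the input against its filter. The sketch below uses one-hot characters and a hand-built filter that "looks for" the bigram `ov`. Everything here (function names, the example surname, the filter) is an assumption made for illustration; a trained network learns its filters instead.

```python
def char_ngrams(text, n):
    """All contiguous character n-grams of the text, left to right."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def one_hot(ch, alphabet):
    """One-hot vector for a character over the given alphabet."""
    return [1 if ch == a else 0 for a in alphabet]


def conv1d(embedded, kernel):
    """'Valid' 1D convolution (cross-correlation, as in deep learning libraries).

    `embedded` is a list of per-character vectors and `kernel` is a list of
    len(kernel) filter vectors. There is one output per n-gram window.
    """
    k = len(kernel)
    outputs = []
    for start in range(len(embedded) - k + 1):
        window = embedded[start:start + k]
        outputs.append(
            sum(w * x for kv, xv in zip(kernel, window) for w, x in zip(kv, xv))
        )
    return outputs


text = "ivanov"
alphabet = sorted(set(text))
embedded = [one_hot(ch, alphabet) for ch in text]

# Hand-built filter that responds most strongly to the bigram "ov".
kernel = [one_hot("o", alphabet), one_hot("v", alphabet)]

activations = conv1d(embedded, kernel)
best = max(range(len(activations)), key=activations.__getitem__)
print(char_ngrams(text, 2)[best])  # the bigram with the highest activation
```

Max pooling over `activations` then answers the question "does this n-gram occur anywhere in the word?", which is the kind of feature a surname detector can use.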

Russian version:

Updated English version:

Week 5: RNNs: Part 1

RNNs for text classification. Simple RNN implementation + memorization test. Surname detector in a multilingual setup: character-level LSTM classifier.
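
For reference, a simple (Elman) RNN updates its hidden state as h_t = tanh(W_xh x_t + W_hh h_{t-1} + b). The sketch below spells this out in plain Python lists so the recurrence is visible; the notebooks use PyTorch tensors instead, and all names and weights here are assumptions for the example.

```python
import math


def rnn_step(x, h_prev, w_xh, w_hh, b):
    """One step of a simple RNN: h_new = tanh(W_xh x + W_hh h_prev + b)."""
    hidden_size = len(h_prev)
    h_new = []
    for j in range(hidden_size):
        total = b[j]
        total += sum(w_xh[j][i] * x[i] for i in range(len(x)))
        total += sum(w_hh[j][k] * h_prev[k] for k in range(hidden_size))
        h_new.append(math.tanh(total))
    return h_new


def run_rnn(inputs, w_xh, w_hh, b):
    """Run the RNN over a sequence, starting from a zero hidden state.

    Returns the final hidden state, which a classifier can use as a
    fixed-size summary of the whole sequence.
    """
    h = [0.0] * len(b)
    for x in inputs:
        h = rnn_step(x, h, w_xh, w_hh, b)
    return h


# Tiny example: 1-dimensional inputs, 2 hidden units, arbitrary weights.
w_xh = [[0.5], [-0.5]]
w_hh = [[0.1, 0.0], [0.0, 0.1]]
b = [0.0, 0.0]
print(run_rnn([[1.0], [0.0], [1.0]], w_xh, w_hh, b))
```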

Russian version:

Updated English version:

Week 6: RNNs: Part 2

RNNs for sequence labelling. Part-of-speech tagger implementations based on word embeddings and character-level word embeddings.

Russian version:

Week 7: Language Models: Part 1

Character-level language model for generating Russian troll tweets: a fixed-window model via convolutions and an RNN model.
Simple conditional language model: surname generation given the source language.
Plus the Toxic Comment Classification Challenge, to apply your skills to a real-world problem.

Russian version:

Week 8: Language Models: Part 2

Word-level language model for poetry generation. Pet examples of transfer learning and multi-task learning applied to language models.

Russian version:

Week 9: Seq2seq

Seq2seq for machine translation and image captioning. Byte-pair encoding, beam search and other useful stuff for machine translation.
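
As a reminder of how byte-pair encoding learns its vocabulary, here is a minimal sketch of the usual merge loop: count adjacent symbol pairs across the corpus, merge the most frequent pair into a new symbol, and repeat. The function names, the `</w>` end-of-word marker and the toy word counts are assumptions for the example; production tokenizers add many details that are omitted here.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    merged = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def learn_bpe(word_counts, num_merges):
    """Learn up to `num_merges` BPE merges from a {word: count} dictionary."""
    # Start from characters, with an end-of-word marker on every word.
    vocab = {tuple(word) + ("</w>",): count for word, count in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for symbols, count in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += count
        if not pair_counts:
            break  # every word is already a single symbol
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        vocab = {merge_pair(symbols, best): count for symbols, count in vocab.items()}
    return merges


print(learn_bpe({"abab": 3, "ab": 1}, num_merges=2))
```

To segment a new word, the learned merges are applied in the same order they were learned.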

Russian version:

Week 10: Seq2seq with Attention

Seq2seq with attention for machine translation and image captioning.
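
The core of attention fits in a few lines: score every encoder state against the current decoder state, turn the scores into weights with a softmax, and take the weighted sum of the encoder states as the context vector. The sketch below uses plain dot-product scoring, which is only one of several scoring functions and is an assumption here, as are the function names.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_attention(query, encoder_states):
    """Dot-product attention.

    `query` is the current decoder state and `encoder_states` is a list of
    encoder vectors of the same size. Returns (weights, context).
    """
    scores = [sum(q * h for q, h in zip(query, state)) for state in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [
        sum(w * state[i] for w, state in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


encoder_states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
weights, context = dot_attention([2.0, 0.0], encoder_states)
print(weights)
print(context)
```

In this example the query points along the first axis, so the first encoder state receives the largest weight.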

Russian version:

Week 11: Transformers & Text Summarization

Implementation of the Transformer model for text summarization. Discussion of Pointer-Generator Networks for text summarization.

Russian version:

Week 12: Dialogue Systems: Part 1

Goal-oriented dialogue systems. Implementation of the multi-task model: intent classifier and token tagger for a dialogue manager.

Russian version:

Week 13: Dialogue Systems: Part 2

General conversation dialogue systems and DSSMs. Implementation of a question answering model on the SQuAD dataset and a chit-chat model on the OpenSubtitles dataset.

Russian version:

Week 14: Pretrained Models

Pretrained models for various tasks: Universal Sentence Encoder for sentence similarity, ELMo for sequence tagging (with a bit of CRF), BERT for SWAG (reasoning about possible continuations).

Russian version:

Final Presentation

NLP Summary: a summary of the cool stuff that did and didn't make it into the course.


Owner

Dan Anastasyev
