NLP: SLU tagging

Last update: Jan 14, 2022


Overview

Create the environment

conda create -n slu python=3.6
source activate slu
pip install torch==1.7.1

Run

Training: run the following from the repository root

python scripts/slu_baseline.py

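Testing: run the following from the repository root. It reads test_unlabelled.json and writes test.json to the data directory. The environment is the same as for training.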

python scripts/slu_evaluate.py
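Writing test.json means turning each predicted tag sequence back into semantic frames. This README does not show how the baseline does that, so the following is only a sketch of one common approach, BIO decoding. It assumes, hypothetically, that labels look like `B-<act>-<slot>`, `I-<act>-<slot>`, or `O` and that the output is a list of (act, slot, value) triples. `decode_bio` is a hypothetical helper, not part of the repository.

```python
from typing import List, Optional, Tuple


def decode_bio(chars: List[str], tags: List[str]) -> List[Tuple[str, str, str]]:
    """Convert a character-level BIO tag sequence into (act, slot, value) triples.

    Assumed (hypothetical) label format: "B-<act>-<slot>", "I-<act>-<slot>", or "O".
    """
    triples: List[Tuple[str, str, str]] = []
    current_label: Optional[str] = None  # "<act>-<slot>" of the span being built
    buffer: List[str] = []               # characters of the span being built

    def flush() -> None:
        # Emit the open span, if any, as a triple
        if current_label is not None and buffer:
            act, slot = current_label.split("-", 1)
            triples.append((act, slot, "".join(buffer)))

    for ch, tag in zip(chars, tags):
        if tag.startswith("B-"):
            flush()
            current_label, buffer = tag[2:], [ch]
        elif tag.startswith("I-") and current_label == tag[2:]:
            buffer.append(ch)
        else:
            # "O", or an I- tag that does not continue the open span:
            # close the open span; a stray I- tag is dropped
            flush()
            current_label, buffer = None, []
    flush()
    return triples
```

For example, `decode_bio(list("导航到上海"), ["O", "O", "O", "B-inform-终点名称", "I-inform-终点名称"])` yields a single triple for the value "上海". Dropping stray I- tags is a design choice. Some decoders treat them as the start of a new span instead.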

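Code overview

utils/args.py: defines all of the optional arguments. To change a parameter, modify the run command as follows:
```
  python scripts/slu_baseline.py --<arg> <value>
```
where <arg> is the name of the parameter to change and <value> is its new value.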
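utils/initialization.py: initializes system settings, including the random seed and GPU/CPU selection
utils/vocab.py: builds the vocabularies used to encode inputs and outputs
utils/word2vec.py: loads the word vectors
utils/example.py: loads the data
utils/batch.py: converts the data into model inputs, one batch at a time
model/slu_baseline_tagging.py: the baseline model
scripts/slu_baseline.py: the main script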
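Command-line overrides like the one described for utils/args.py are typically handled with Python's standard `argparse` module. The sketch below is an assumption about how such a file might be organized, not its actual contents: the options `--lr`, `--batch_size`, and `--max_epoch` and their defaults are hypothetical, so check utils/args.py for the real names.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical options for illustration only; the real names and
    # defaults are defined in utils/args.py.
    parser = argparse.ArgumentParser(description="SLU baseline (illustrative sketch)")
    parser.add_argument("--lr", type=float, default=1e-3, help="learning rate")
    parser.add_argument("--batch_size", type=int, default=32, help="training batch size")
    parser.add_argument("--max_epoch", type=int, default=100, help="number of training epochs")
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    # argv=None makes argparse read sys.argv[1:]; passing a list is handy for testing
    return build_parser().parse_args(argv)
```

With this parser, `parse(["--lr", "5e-4"])` sets `lr` to 0.0005 and leaves every other option at its default, which mirrors overriding a single parameter on the command line.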

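About pretrained language models

This code does not include support for pretrained language models. If you want to use one, we recommend the models listed below. Do not use large-size ("large") variants.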

Bert: https://huggingface.co/bert-base-chinese
Bert-WWM: https://huggingface.co/hfl/chinese-bert-wwm-ext
Roberta-WWM: https://huggingface.co/hfl/chinese-roberta-wwm-ext
MacBert: https://huggingface.co/hfl/chinese-macbert-base
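If you plug one of these models into a character-level tagger, the per-character labels must line up with the tokenizer output, which wraps each sentence in [CLS] and [SEP]. The sketch below assumes one token per Chinese character, which is typical for these Chinese BERT vocabularies but not guaranteed for digits or Latin text. It uses -100, the default `ignore_index` of PyTorch's `torch.nn.CrossEntropyLoss`, to exclude special and padding positions from the loss. `align_labels` is a hypothetical helper, not part of the repository.

```python
from typing import List

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def align_labels(char_label_ids: List[int], max_len: int) -> List[int]:
    """Align per-character label ids with a [CLS] ... [SEP] token sequence.

    Assumes one token per character; special and padding positions get IGNORE_INDEX.
    """
    if max_len < 2:
        raise ValueError("max_len must leave room for [CLS] and [SEP]")
    # Truncate so that [CLS] and [SEP] still fit within max_len
    body = char_label_ids[: max_len - 2]
    labels = [IGNORE_INDEX] + body + [IGNORE_INDEX]
    # Pad to max_len; padded positions are also ignored by the loss
    labels += [IGNORE_INDEX] * (max_len - len(labels))
    return labels
```

If a sentence contains characters the tokenizer merges or splits, this one-to-one assumption breaks, and you would need the tokenizer's offset information to align labels instead.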

Recommended libraries

transformers
- Library for using pretrained language models: https://huggingface.co/
nltk
- Powerful NLP toolkit: https://www.nltk.org/
stanza
- Powerful NLP toolkit: https://stanfordnlp.github.io/stanza/
jieba
- Chinese word segmentation tool: https://github.com/fxsjy/jieba
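For a character-level tagger, one way to use a segmenter such as jieba is to turn its word segmentation into per-character boundary features that can be embedded alongside each character. The sketch below assumes a list of words such as the output of `jieba.lcut(sentence)` and uses the common B/I/E/S scheme. `word_boundary_tags` is a hypothetical helper, and whether such features help the baseline is untested here.

```python
from typing import List


def word_boundary_tags(words: List[str]) -> List[str]:
    """Map a word segmentation to per-character B/I/E/S boundary tags.

    `words` is assumed to come from a segmenter, e.g. jieba.lcut(sentence).
    B = word begin, I = word inside, E = word end, S = single-character word.
    """
    tags: List[str] = []
    for word in words:
        if not word:
            continue  # skip empty strings defensively
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["I"] * (len(word) - 2) + ["E"])
    return tags
```

The output has exactly one tag per character of the joined sentence, so it can be zipped with the character sequence the tagger already consumes.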

Owner

北海若

Undergraduate at SJTU & MSRA.
