Choseong (initial consonant) interpreter based on Ko-BART

Last update: Oct 28, 2022

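Choseong Interpreter

Overview

A choseong interpreter: given a sentence written only in Korean initial consonants (choseong), it predicts the complete sentence.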

Choseong: ㄴㄴ ㄴㄹ ㅈㅇㅎ
Predicted sentence: 나는 너를 좋아해 ("I like you")
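
The reverse direction, from a full sentence to its choseong, is deterministic, so input/target pairs could plausibly be derived from any corpus this way. The following is a minimal sketch of that conversion using the standard Unicode layout of Hangul syllables. It is not taken from this repo, and the names `to_choseong`, `CHOSEONG`, and the constants are hypothetical.

```python
# Illustrative sketch only; not the repo's implementation.
# The 19 initial consonants, in the order Unicode uses to compose syllables.
CHOSEONG = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
HANGUL_BASE = 0xAC00  # '가', the first precomposed syllable
HANGUL_LAST = 0xD7A3  # '힣', the last precomposed syllable
SYLLABLES_PER_CHOSEONG = 21 * 28  # 21 vowels x 28 final-consonant slots


def to_choseong(sentence: str) -> str:
    """Replace every Hangul syllable with its initial consonant."""
    out = []
    for ch in sentence:
        code = ord(ch)
        if HANGUL_BASE <= code <= HANGUL_LAST:
            index = (code - HANGUL_BASE) // SYLLABLES_PER_CHOSEONG
            out.append(CHOSEONG[index])
        else:
            # Spaces, punctuation, digits, etc. are passed through unchanged.
            out.append(ch)
    return "".join(out)


print(to_choseong("나는 너를 좋아해"))  # ㄴㄴ ㄴㄹ ㅈㅇㅎ
```

Because many sentences share the same choseong, the mapping back to a full sentence is ambiguous, which is why a learned model is used for that direction.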

Model

The model is built on Ko-BART, released by SKT-AI.

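Data

Any sentence-level corpus can be used. However, inference quality depends heavily on the domain and size of the data, so choose a corpus that fits the performance you need. A dummy dataset is provided in the ./data directory; prepare your corpus in the same format as that dummy dataset.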
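
Before training, it may help to clean the corpus. The sketch below assumes one sentence per line and keeps only sentences made of complete Hangul syllables separated by single spaces, dropping duplicates. Both the one-sentence-per-line layout and the filtering rule are assumptions for illustration (check the dummy dataset in ./data for the actual format), and `clean_corpus` is a hypothetical helper, not part of this repo.

```python
import re

# Illustrative sketch only; the filtering rule is an assumption, not the repo's.
# Matches words of complete Hangul syllables (U+AC00-U+D7A3) separated by single spaces.
HANGUL_SENTENCE = re.compile(r"[가-힣]+(?: [가-힣]+)*")


def clean_corpus(lines):
    """Return unique, whitespace-normalized, Hangul-only sentences in input order."""
    seen = set()
    cleaned = []
    for line in lines:
        # Trim the line and collapse runs of whitespace into single spaces.
        sentence = " ".join(line.split())
        if not HANGUL_SENTENCE.fullmatch(sentence):
            continue  # empty line, or contains non-syllable characters
        if sentence in seen:
            continue  # exact duplicate
        seen.add(sentence)
        cleaned.append(sentence)
    return cleaned


raw = ["나는 너를 좋아해\n", "  배고프다 ", "hello world", "", "배고프다", "ㅋㅋㅋ"]
print(clean_corpus(raw))
```

A stricter or looser filter (for example, keeping punctuation) may suit other domains better; the rule here is only one reasonable choice.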

Training

python run_train.py

Inference

python run_inference.py --finetuned-model-path $FINETUNED_MODEL_PATH

Examples

Inference results from a model trained on a publicly available corpus.

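Choseong: ㅂㄱㅍㄷ 	 Predicted sentence: 배고픈데
Choseong: ㅂㄱㅍㄷ 	 Predicted sentence: 배고프다
Choseong: ㅂㄱㅍㄷ 	 Predicted sentence: 배고프대

Choseong: ㄴㅁㄴㅁ ㅅㄹㅎㅇ 	 Predicted sentence: 너무너무 사랑해요
Choseong: ㄴㅁㄴㅁ ㅅㄹㅎㅇ 	 Predicted sentence: 너무너무 사랑했어
Choseong: ㄴㅁㄴㅁ ㅅㄹㅎㅇ 	 Predicted sentence: 나만너무 사랑해요

Choseong: ㄴㄴ ㄴㄹ ㅈㅇㅎ 	 Predicted sentence: 나는 너를 좋아해
Choseong: ㄴㄴ ㄴㄹ ㅈㅇㅎ 	 Predicted sentence: 누나 나랑 좋아해
Choseong: ㄴㄴ ㄴㄹ ㅈㅇㅎ 	 Predicted sentence: 너는 나를 좋아해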

Notes

This repo does not include any training data.
This repo is licensed under the same modified-MIT license as Ko-BART.

Todo

Add test code

Owner

Dawoon Jung
