The aim of this task is to predict someone's English proficiency based on a text input.

Last update: Dec 13, 2021

Overview

English_proficiency_prediction_NLP

Using the NICT JLE Corpus, available here: https://alaginrc.nict.go.jp/nict_jle/index_E.html

The corpus consists of transcripts of audio-recorded speech samples from 1,281 participants (1.2 million words, 300 hours in total) in an English oral proficiency interview test. Based on this test, each participant received an SST (Standard Speaking Test) score between 1 (low proficiency) and 9 (high proficiency).

The goal is to build a machine learning model that predicts each participant's SST score from their transcript.

Steps:

1 - Pre-process the dataset: extract each participant's transcript (all tags). Inside the participant transcript, you can remove all other tags and keep only the English words.

2 - Process the dataset: extract features with the Bag of Words (BoW) technique.

3 - Train a classifier to predict the SST score.

4 - Compute the accuracy of your system (the proportion of participants classified correctly) and plot the confusion matrix.

5 - Try to improve your system (for example, you can try using GloVe word embeddings instead of BoW).
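A minimal sketch of step 1, assuming each participant turn is wrapped in a single XML-like tag. The tag name below (`PARTICIPANT_TAG = "B"`) is a hypothetical placeholder, as is the function name `extract_participant_text`; check the corpus documentation for the actual tag before running this on the real files. Inner tags are removed but the text they wrap is kept, and only alphabetic English words survive.

```python
import re

# HYPOTHETICAL: the tag wrapping the participant's turns. Verify the real tag
# name in the NICT JLE documentation before using this on the corpus.
PARTICIPANT_TAG = "B"


def extract_participant_text(raw: str, tag: str = PARTICIPANT_TAG) -> str:
    """Return the participant's words, lower-cased, with all inner tags removed."""
    t = re.escape(tag)
    # Collect the content of every <tag>...</tag> turn (non-greedy, spanning newlines).
    turns = re.findall(rf"<{t}>(.*?)</{t}>", raw, flags=re.DOTALL)
    # Remove any other markup inside those turns but keep the text it wraps.
    text = re.sub(r"<[^>]+>", " ", " ".join(turns))
    # Keep only English (ASCII alphabetic) words, allowing contractions like "i'm".
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    return " ".join(word.lower() for word in words)


# Toy example (invented text, not taken from the corpus).
sample = "<A>How are you?</A>\n<B>I'm <F>er</F> fine, thank you.</B>\n<B>I like 日本 food.</B>"
print(extract_participant_text(sample))  # → i'm er fine thank you i like food
```

Whether filler words such as "er" should be kept or dropped along with their tags is a design choice; this sketch keeps them.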
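Steps 2 to 4 can be sketched in plain Python with no third-party dependencies: a Bag of Words counter, a multinomial Naive Bayes classifier (one possible choice; the task does not prescribe a classifier), accuracy, and a 9 x 9 confusion matrix over SST scores 1 to 9. All function and class names are illustrative, and the toy transcripts and scores are invented for demonstration, not taken from the corpus.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Bag of Words: map each token to its count, ignoring word order."""
    return Counter(text.split())


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes over BoW counts, with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = set()
        self.word_counts = {c: Counter() for c in self.classes}
        for bow, label in zip(docs, labels):
            self.word_counts[label].update(bow)  # add this document's word counts
            self.vocab.update(bow)
        n_docs = Counter(labels)
        self.log_prior = {c: math.log(n_docs[c] / len(labels)) for c in self.classes}
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict_one(self, bow):
        best, best_score = None, -math.inf
        for c in self.classes:
            denom = self.totals[c] + self.alpha * len(self.vocab)
            score = self.log_prior[c]
            for word, count in bow.items():
                if word in self.vocab:  # words never seen in training are ignored
                    score += count * math.log((self.word_counts[c][word] + self.alpha) / denom)
            if score > best_score:
                best, best_score = c, score
        return best

    def predict(self, docs):
        return [self.predict_one(bow) for bow in docs]


def accuracy(y_true, y_pred):
    """Proportion of participants whose predicted SST score equals the true score."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def confusion_matrix(y_true, y_pred, labels=range(1, 10)):
    """Rows are true SST scores, columns are predicted scores (1..9 by default)."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(index) for _ in index]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


# Toy usage with invented transcripts and scores (not corpus data).
train_texts = [
    "i go school yesterday",
    "i go to school",
    "yesterday i went to school and studied",
    "i went to the library and studied hard",
]
train_scores = [2, 2, 5, 5]
model = MultinomialNaiveBayes().fit([bag_of_words(t) for t in train_texts], train_scores)

test_texts = ["i go school", "went to the library and studied"]
test_scores = [2, 5]
predicted = model.predict([bag_of_words(t) for t in test_texts])
print(predicted, accuracy(test_scores, predicted))  # → [2, 5] 1.0
```

On the real data you would train on one subset of participants and report accuracy on held-out participants. scikit-learn provides equivalents (`CountVectorizer`, `MultinomialNB`, `accuracy_score`, `confusion_matrix`), and `ConfusionMatrixDisplay.from_predictions` together with matplotlib can draw the plot asked for in step 4.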
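For step 5, one common way to use GloVe is to represent each transcript by the average of its words' pre-trained vectors. The sketch below assumes GloVe's plain-text format (one word followed by its space-separated vector components per line); the function names `load_glove` and `average_embedding` are hypothetical. The resulting dense vectors can contain negative values, so a classifier such as logistic regression suits them better than multinomial Naive Bayes.

```python
from typing import Dict, Iterable, List


def load_glove(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse GloVe's text format: 'word v1 v2 ... vd' per line.

    Pass an open file, e.g. open("glove.6B.100d.txt", encoding="utf-8").
    Assumes tokens contain no spaces.
    """
    vectors: Dict[str, List[float]] = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def average_embedding(text: str, vectors: Dict[str, List[float]], dim: int) -> List[float]:
    """Transcript feature: mean of the GloVe vectors of its in-vocabulary words."""
    known = [vectors[w] for w in text.split() if w in vectors]
    if not known:
        return [0.0] * dim  # no known words: fall back to a zero vector
    return [sum(column) / len(known) for column in zip(*known)]


# Toy 2-dimensional "embeddings" (invented values, not real GloVe vectors).
toy = load_glove(["good 1.0 0.0", "english 0.0 1.0", ""])
print(average_embedding("good english", toy, dim=2))  # → [0.5, 0.5]
```

Averaging discards word order just as BoW does, but it lets words with similar meanings contribute similar features even when they never co-occur in the training transcripts.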
