Text Classification Using LSTM

Overview

Text-Classification-Using-LSTM

Ontology Classification-Using-LSTM

Introduction

Text classification is the task of assigning a set of predefined categories to free text. Text classifiers can be used to organize, structure, and categorize pretty much anything. For example, new articles can be organized by topics, support tickets can be organized by urgency, chat conversations can be organized by language, brand mentions can be organized by sentiment, and so on.

Technologies Used

1. IDE - Pycharm
2. LSTM - As a classification Deep learning Model
3. GPU - P-4000
4. Google Colab - Text Analysis
5. Flas- Fast API
6. Postman - API Tester
7. Gensim - Word2Vec embeddings

๐Ÿ”‘ Prerequisites All the dependencies and required libraries are included in the file requirements.txt

  Python 3.6

Dataset

The DBpedia ontology classification dataset is constructed by picking 14 non-overlapping classes from DBpedia 2014. They are listed in classes.txt. From each of thse 14 ontology classes, we randomly choose 40,000 training samples and 5,000 testing samples. Therefore, the total size of the training dataset is 560,000 and testing dataset 70,000. The files train.csv and test.csv contain all the training samples as comma-sparated values. There are 3 columns in them, corresponding to class index (1 to 14), title and content. The title and content are escaped using double quotes ("), and any internal double quote is escaped by 2 double quotes (""). There are no new lines in title or content.

For Dataset Please click here

Process - Flow of This project

๐Ÿš€ Installation of Text-Classification-Using-LSTM

  1. Clone the repo
git clone https://github.com/KrishArul26/Text-Classification-DBpedia-ontology-classes-Using-LSTM.git
  1. Change your directory to the cloned repo
cd Text-Classification-DBpedia-ontology-classes-Using-LSTM

  1. Create a Python 3.6 version of virtual environment name 'lstm' and activate it
pip install virtualenv

virtualenv bert

lstm\Scripts\activate

  1. Now, run the following command in your Terminal/Command Prompt to install the libraries required!!!
pip install -r requirements.txt

๐Ÿ’ก Working

Type the following command:

python app.py

After that You will see the running IP adress just copy and paste into you browser and import or upload your speech then closk the predict button.

Implementations

In this section, contains the project directory, explanation of each python file presents in the directory.

1. Project Directory

Below picture illustrate the complete folder structure of this project.

2. preprocess.py

Below picture illustrate the preprocess.py file, It does the necessary text cleaning process such as removing punctuation, numbers, lemmatization. And it will create train_preprocessed, validation_preprocessed and test_preprocessed pickle files for the further analysis.

3. word_embedder_gensim.py

Below picture illustrate the word_embedder_gensim.py, After done with text pre-processing, this file will take those cleaned text as input and will be creating the Word2vec embedding for each word.

4. rnn_w2v.py

Below picture illustrate the rnn_w2v.py, After done with creating Word2vec for each word then those vectors will use as input for creating the LSTM model and Train the LSTM (RNN) model with body and Classes.

5. index.htmml

Below picture illustrate the index.html file, these files use to create the web frame for us.

6. main.py

Below picture illustrate the main.py, After evaluating the LSTM model, This files will create the Rest -API, To that It will use FLASK frameworks and get the request from the customer or client then It will Post into the prediction files and Answer will be deliver over the web browser.

7. Testing Rest-API

Owner
KrishArul26
Google Certified - TensorFlow Developer | Google Cloud Associated Engineer | Enthusiastic in Machine Learning | Deep Learning | Object Detection | AI
KrishArul26
๐Ÿงช Cutting-edge experimental spaCy components and features

spacy-experimental: Cutting-edge experimental spaCy components and features This package includes experimental components and features for spaCy v3.x,

Explosion 65 Dec 30, 2022
Open solution to the Toxic Comment Classification Challenge

Starter code: Kaggle Toxic Comment Classification Challenge More competitions ๐ŸŽ‡ Check collection of public projects ๐ŸŽ , where you can find multiple

minerva.ml 153 Jun 22, 2022
Chinese real time voice cloning (VC) and Chinese text to speech (TTS).

Chinese real time voice cloning (VC) and Chinese text to speech (TTS). ๅฅฝ็”จ็š„ไธญๆ–‡่ฏญ้Ÿณๅ…‹้š†ๅ…ผไธญๆ–‡่ฏญ้Ÿณๅˆๆˆ็ณป็ปŸ๏ผŒๅŒ…ๅซ่ฏญ้Ÿณ็ผ–็ ๅ™จใ€่ฏญ้Ÿณๅˆๆˆๅ™จใ€ๅฃฐ็ ๅ™จๅ’Œๅฏ่ง†ๅŒ–ๆจกๅ—ใ€‚

Kuang Dada 6 Nov 08, 2022
CMeEE ๆ•ฐๆฎ้›†ๅŒปๅญฆๅฎžไฝ“ๆŠฝๅ–

ๅŒปๅญฆๅฎžไฝ“ๆŠฝๅ–_GlobalPointer_torch ไป‹็ป ๆ€ๆƒณๆฅ่‡ชไบŽ่‹็ฅž GlobalPointer๏ผŒๅŽŸๅง‹็‰ˆๆœฌๆ˜ฏๅŸบไบŽkerasๅฎž็Žฐ็š„๏ผŒๆจกๅž‹็ป“ๆž„ๅฎž็Žฐๅ‚่€ƒ็Žฐๆœ‰ pytorch ๅค็Žฐไปฃ็ ใ€ๆ„Ÿ่ฐข!ใ€‘๏ผŒๅŸบไบŽtorch็™พๅˆ†็™พๅค็Žฐ่‹็ฅžๅŽŸๅง‹ๆ•ˆๆžœใ€‚ ๆ•ฐๆฎ้›† ไธญๆ–‡ๅŒปๅญฆๅ‘ฝๅๅฎžไฝ“ๆ•ฐๆฎ้›† ็‚น่ฟ™้‡Œ็”ณ่ฏท๏ผŒๅพˆ็ฎ€ๅ•๏ผŒๅ…ฑๅŒ…ๅซไน็ฑปๅŒปๅญฆ

85 Dec 28, 2022
Framework for fine-tuning pretrained transformers for Named-Entity Recognition (NER) tasks

NERDA Not only is NERDA a mesmerizing muppet-like character. NERDA is also a python package, that offers a slick easy-to-use interface for fine-tuning

Ekstra Bladet 141 Dec 30, 2022
End-to-End Speech Processing Toolkit

ESPnet: end-to-end speech processing toolkit system/pytorch ver. 1.0.1 1.1.0 1.2.0 1.3.1 1.4.0 1.5.1 1.6.0 1.7.1 1.8.1 ubuntu18/python3.8/pip ubuntu18

ESPnet 5.9k Jan 03, 2023
Sentiment Classification using WSD, Maximum Entropy & Naive Bayes Classifiers

Sentiment Classification using WSD, Maximum Entropy & Naive Bayes Classifiers

Pulkit Kathuria 173 Jan 04, 2023
Google's Meena transformer chatbot implementation

Here's my attempt at recreating Meena, a state of the art chatbot developed by Google Research and described in the paper Towards a Human-like Open-Domain Chatbot.

Francesco Pham 94 Dec 25, 2022
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

ALBERT ***************New March 28, 2020 *************** Add a colab tutorial to run fine-tuning for GLUE datasets. ***************New January 7, 2020

Google Research 3k Dec 26, 2022
This is a project built for FALLABOUT2021 event under SRMMIC, This project deals with NLP poetry generation.

FALLABOUT-SRMMIC 21 POETRY-GENERATION HINGLISH DESCRIPTION We have developed a NLP(natural language processing) model which automatically generates a

7 Sep 28, 2021
A versatile token stream for handwritten parsers.

Writing recursive-descent parsers by hand can be quite elegant but it's often a bit more verbose than expected, especially when it comes to handling indentation and reporting proper syntax errors. Th

Valentin Berlier 8 Nov 30, 2022
SciBERT is a BERT model trained on scientific text.

SciBERT is a BERT model trained on scientific text.

AI2 1.2k Dec 24, 2022
์ˆญ์‹ค๋Œ€ํ•™๊ต ์ปดํ“จํ„ฐํ•™๋ถ€ ์ „๊ณต์ข…ํ•ฉ์„ค๊ณ„ํ”„๋กœ์ ํŠธ

โœจ ์‹œ๊ฐ์žฅ์• ์ธ์„ ์œ„ํ•œ ๋ฒ„์Šค๋„์ฐฉ ์•Œ๋ฆผ ์žฅ์น˜ โœจ ๐Ÿ‘€ ๊ฐœ์š” ํ˜„๋Œ€ ์‚ฌํšŒ์—์„œ ๋Œ€์ค‘๊ตํ†ต ์œ„์น˜ ์ •๋ณด๋ฅผ ์ด์šฉํ•˜์—ฌ ์‚ฌ๋žŒ๋“ค์ด ๊ฐ„๋‹จํ•˜๊ฒŒ ์ด์šฉํ•  ๋Œ€์ค‘๊ตํ†ต์˜ ์ •๋ณด๋ฅผ ์–ป๊ณ  ์‰ฝ๊ฒŒ ๋Œ€์ค‘๊ตํ†ต์„ ์ด์šฉํ•  ์ˆ˜ ์žˆ๋‹ค. ํ•ด๋‹น ์ •๋ณด๋Š” ๊ฐ์ข… ์–ดํ”Œ๋ฆฌ์ผ€์ด์…˜๊ณผ ๋Œ€์ค‘๊ตํ†ต ์ด์šฉ์‹œ์„ค์—์„œ ์œ„์น˜ ์ •๋ณด๋ฅผ ์ œ๊ณตํ•˜๊ณ  ์žˆ์ง€๋งŒ ์‹œ๊ฐ

taegyun 3 Jan 25, 2022
A Survey of Natural Language Generation in Task-Oriented Dialogue System (TOD): Recent Advances and New Frontiers

A Survey of Natural Language Generation in Task-Oriented Dialogue System (TOD): Recent Advances and New Frontiers

Libo Qin 132 Nov 25, 2022
This is the writeup of all the challenges from Advent-of-cyber-2019 of TryHackMe

Advent-of-cyber-2019-writeup This is the writeup of all the challenges from Advent-of-cyber-2019 of TryHackMe https://tryhackme.com/shivam007/badges/c

shivam danawale 5 Jul 17, 2022
Training open neural machine translation models

Train Opus-MT models This package includes scripts for training NMT models using MarianNMT and OPUS data for OPUS-MT. More details are given in the Ma

Language Technology at the University of Helsinki 167 Jan 03, 2023
Implementation of Fast Transformer in Pytorch

Fast Transformer - Pytorch Implementation of Fast Transformer in Pytorch. This only work as an encoder. Yannic video AI Epiphany Install $ pip install

Phil Wang 167 Dec 27, 2022
NLP techniques such as named entity recognition, sentiment analysis, topic modeling, text classification with Python to predict sentiment and rating of drug from user reviews.

This file contains the following documents sumbited for Baruch CIS9665 group 9 fall 2021. 1. Dataset: drug_reviews.csv 2. python codes for text classi

Aarif Munwar Jahan 2 Jan 04, 2023
Anomaly Detection ์ด์ƒ์น˜ ํƒ์ง€ ์ „์ฒ˜๋ฆฌ ๋ชจ๋“ˆ

Anomaly Detection ์‹œ๊ณ„์—ด ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•œ ์ด์ƒ์น˜ ํƒ์ง€ 1. Kernel Density Estimation์„ ํ™œ์šฉํ•œ ์ด์ƒ์น˜ ํƒ์ง€ train_data_path์™€ test_data_path์— ์กด์žฌํ•˜๋Š” ์‹œ์  ์ •๋ณด๋ฅผ ํฌํ•จํ•˜๊ณ  ์žˆ๋Š” csv ํ˜•ํƒœ์˜ train data์™€

CLUST-consortium 43 Nov 28, 2022
Treemap visualisation of Maya scene files

Ever wondered which nodes are responsible for that 600 mb+ Maya scene file? Features Fast, resizable UI Parsing at 50 mb/sec Dependency-free, single-f

Marcus Ottosson 76 Nov 12, 2022