Code Generation using a large neural network called GPT-J

Last update: Dec 31, 2022

Overview

CodeGenX

CodeGenX is a code generation system powered by artificial intelligence! It is delivered as a Visual Studio Code extension and is free and open source!

Installation

You can find installation instructions and additional information about CodeGenX in the project documentation.

About CodeGenX

1. Languages Supported

CodeGenX currently supports only Python. We plan to add more languages in future releases.

2. Modules Trained On

CodeGenX was trained on Python code covering many common uses of the language. Libraries it was specifically trained on include:

TensorFlow
PyTorch
scikit-learn
Pandas
NumPy
OpenCV
Django
Flask
PyGame

3. How CodeGenX Works

At the core of CodeGenX lies GPT-J, a 6-billion-parameter transformer model trained on hundreds of gigabytes of text from the internet. We fine-tuned this model on a dataset of open-source Python code. Given an input with the right instructions, the fine-tuned model generates code.
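To make the idea of "generating code from an input" concrete, here is a minimal sketch of autoregressive generation, the general way a model like GPT-J produces text: it predicts one token at a time and feeds each prediction back in as context. This is not the CodeGenX implementation. `toy_model`, `TOY_NEXT_TOKEN`, `generate`, and the `<eos>` marker are hypothetical names invented for illustration, and the lookup table stands in for the real fine-tuned network.

```python
from typing import Callable, List

# Hypothetical stand-in for the fine-tuned model. A real model scores every
# token in its vocabulary; this toy table only knows a single continuation.
TOY_NEXT_TOKEN = {
    "# add two numbers\n": "def",
    "def": " add",
    " add": "(a, b):",
    "(a, b):": "\n    return a + b",
    "\n    return a + b": "<eos>",
}


def toy_model(tokens: List[str]) -> str:
    """Return the next token, looking only at the last token (toy assumption)."""
    return TOY_NEXT_TOKEN.get(tokens[-1], "<eos>")


def generate(
    prompt: str,
    next_token: Callable[[List[str]], str],
    max_new_tokens: int = 16,
) -> str:
    """Greedy autoregressive loop: predict, append, repeat until <eos> or the limit."""
    tokens = [prompt]
    for _ in range(max_new_tokens):
        token = next_token(tokens)
        if token == "<eos>":  # the model signals that the completion is done
            break
        tokens.append(token)
    # Return only the newly generated part, not the prompt itself.
    return "".join(tokens[1:])


completion = generate("# add two numbers\n", toy_model)
print(completion)
```

The prompt here is a comment describing the desired code, which is one plausible form of "the right instructions"; the exact prompt format CodeGenX uses is not stated in this document. The `max_new_tokens` limit keeps generation bounded even if the model never emits an end marker.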

Contributors ✨

This project would not have been possible without the help of these wonderful people:

Arya Manjaramkar
Matthias Wijnsma
Thomas Houtrique
Dominic Rampas
Bilel Medimegh
Josh Hills
Alex
Tiimo

Acknowledgements

Many thanks to the Google TPU Research Cloud for providing the compute needed for this project.

Owner

DeepGenX
