📝 Wrapper library for text generation / language models at char and word level with RNN in TensorFlow

Last update: May 22, 2021

Overview

tensorlm

Generate Shakespeare poems with 4 lines of code.

Installation

tensorlm is written in / for Python 3.4+ and TensorFlow 1.1+

pip3 install tensorlm

Basic Usage

Use the CharLM or WordLM class:

import tensorflow as tf
from tensorlm import CharLM
    
with tf.Session() as session:
    
    # Create a new model. You can also use WordLM
    model = CharLM(session, "datasets/sherlock/tinytrain.txt", max_vocab_size=96,
                   neurons_per_layer=100, num_layers=3, num_timesteps=15)
    
    # Train it 
    model.train(session, max_epochs=10, max_steps=500)
    
    # Let it generate a text
    generated = model.sample(session, "The ", num_steps=100)
    print("The " + generated)

This should output something like:

The  ee e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e

Command Line Usage

Train: python3 -m tensorlm.cli --train=True --level=char --train_text_path=datasets/sherlock/tinytrain.txt --max_vocab_size=96 --neurons_per_layer=100 --num_layers=2 --batch_size=10 --num_timesteps=15 --save_dir=out/model --max_epochs=300 --save_interval_hours=0.5

Sample: python3 -m tensorlm.cli --sample=True --level=char --neurons_per_layer=400 --num_layers=3 --num_timesteps=160 --save_dir=out/model

Evaluate: python3 -m tensorlm.cli --evaluate=True --level=char --evaluate_text_path=datasets/sherlock/tinyvalid.txt --neurons_per_layer=400 --num_layers=3 --batch_size=10 --num_timesteps=160 --save_dir=out/model

See python3 -m tensorlm.cli --help for all options.

Advanced Usage

Custom Input Data

The inputs and targets don't have to be text. GeneratingLSTM only expects token ids, so you can use any data type for the sequences, as long as you can encode the data to integer ids.

# We use integer ids from 0 to 19, so the vocab size is 20. The range of ids must always start
# at zero.
batch_inputs = np.array([[1, 2, 3, 4], [15, 16, 17, 18]])  # 2 batches, 4 time steps each
batch_targets = np.array([[2, 3, 4, 5], [16, 17, 18, 19]])

# Create the model in a TensorFlow graph
model = GeneratingLSTM(vocab_size=20, neurons_per_layer=10, num_layers=2, max_batch_size=2)

# Initialize all defined TF Variables
session.run(tf.global_variables_initializer())

for _ in range(5000):
    model.train_step(session, batch_inputs, batch_targets)

sampled = model.sample_ids(session, [15], num_steps=3)
print("Sampled: " + str(sampled))

This should output something like:

Sampled: [16, 18, 19]

Custom Training, Dropout etc.

Use the GeneratingLSTM class directly. This class is agnostic to the dataset type. It expects integer ids and returns integer ids.

import tensorflow as tf
from tensorlm import Vocabulary, Dataset, GeneratingLSTM

BATCH_SIZE = 20
NUM_TIMESTEPS = 15

with tf.Session() as session:
    # Generate a token -> id vocabulary based on the text
    vocab = Vocabulary.create_from_text("datasets/sherlock/tinytrain.txt", max_vocab_size=96,
                                        level="char")

    # Obtain input and target batches from the text file
    dataset = Dataset("datasets/sherlock/tinytrain.txt", vocab, BATCH_SIZE, NUM_TIMESTEPS)

    # Create the model in a TensorFlow graph
    model = GeneratingLSTM(vocab_size=vocab.get_size(), neurons_per_layer=100, num_layers=2,
                           max_batch_size=BATCH_SIZE, output_keep_prob=0.5)

    # Initialize all defined TF Variables
    session.run(tf.global_variables_initializer())

    # Do the training
    epoch = 1
    step = 1
    for epoch in range(20):
        for inputs, targets in dataset:
            loss = model.train_step(session, inputs, targets)

            if step % 100 == 0:
                # Evaluate from time to time
                dev_dataset = Dataset("datasets/sherlock/tinyvalid.txt", vocab,
                                      batch_size=BATCH_SIZE, num_timesteps=NUM_TIMESTEPS)
                dev_loss = model.evaluate(session, dev_dataset)
                print("Epoch: %d, Step: %d, Train Loss: %f, Dev Loss: %f" % (
                    epoch, step, loss, dev_loss))

                # Sample from the model from time to time
                print("Sampled: \"The " + model.sample_text(session, vocab, "The ") + "\"")

            step += 1

This should output something like:

Epoch: 3, Step: 100, Train Loss: 3.824941, Dev Loss: 3.778008
Sampled: "The                                                                                                     "
Epoch: 7, Step: 200, Train Loss: 2.832825, Dev Loss: 2.896187
Sampled: "The                                                                                                     "
Epoch: 11, Step: 300, Train Loss: 2.778579, Dev Loss: 2.830176
Sampled: "The         eee                                                                                         "
Epoch: 15, Step: 400, Train Loss: 2.655153, Dev Loss: 2.684828
Sampled: "The        ee    e  e   e  e  e  e  e  e  e   e  e  e   e  e  e   e  e  e   e  e  e   e  e  e   e  e  e "
Epoch: 19, Step: 500, Train Loss: 2.444502, Dev Loss: 2.479753
Sampled: "The    an  an  an  on  on  on  on  on  on  on  on  on  on  on  on  on  on  on  on  on  on  on  on  on  o"

RNN Predict Street Commercial Vitality

RNN-for-Predicting-Street-Vitality Code and dataset for Predicting the Vitality of Stores along the Street based on Business Type Sequence via Recurre

1 Dec 15, 2021

Emotion classification of online comments based on RNN

emotion_classification Emotion classification of online comments based on RNN, the accuracy of the model in the test set reaches 99% data: Large Movie

1 Nov 23, 2021

Pytorch implementation of the popular Improv RNN model originally proposed by the Magenta team.

Pytorch Implementation of Improv RNN Overview This code is a pytorch implementation of the popular Improv RNN model originally implemented by the Mage

3 Nov 11, 2022

Static Features Classifier - A static features classifier for Point-Could clusters using an Attention-RNN model

Static Features Classifier This is a static features classifier for Point-Could

1 Jan 25, 2022

Source code for the GPT-2 story generation models in the EMNLP 2020 paper "STORIUM: A Dataset and Evaluation Platform for Human-in-the-Loop Story Generation"

Storium GPT-2 Models This is the official repository for the GPT-2 models described in the EMNLP 2020 paper [STORIUM: A Dataset and Evaluation Platfor

27 Dec 20, 2022

Deep learning library featuring a higher-level API for TensorFlow.

TFLearn: Deep learning library featuring a higher-level API for TensorFlow. TFlearn is a modular and transparent deep learning library built on top of

9.6k Jan 2, 2023

Deep learning library featuring a higher-level API for TensorFlow.

TFLearn: Deep learning library featuring a higher-level API for TensorFlow. TFlearn is a modular and transparent deep learning library built on top of

9.5k Feb 12, 2021

Image-generation-baseline - MUGE Text To Image Generation Baseline

MUGE Text To Image Generation Baseline Requirements and Installation More detail

23 Oct 17, 2022

Implementation of Transformer in Transformer, pixel level attention paired with patch level attention for image classification, in Pytorch

Transformer in Transformer Implementation of Transformer in Transformer, pixel level attention paired with patch level attention for image c

272 Dec 23, 2022

Comments

Bump numpy from 1.13.1 to 1.21.0
Bumps numpy from 1.13.1 to 1.21.0.

Release notes

Sourced from numpy's releases.

v1.21.0

NumPy 1.21.0 Release Notes

The NumPy 1.21.0 release highlights are

continued SIMD work covering more functions and platforms,

initial work on the new dtype infrastructure and casting,

universal2 wheels for Python 3.8 and Python 3.9 on Mac,

improved documentation,

improved annotations,

new PCG64DXSM bitgenerator for random numbers.

In addition there are the usual large number of bug fixes and other improvements.

The Python versions supported for this release are 3.7-3.9. Official support for Python 3.10 will be added when it is released.

:warning: Warning: there are unresolved problems compiling NumPy 1.21.0 with gcc-11.1 .

Optimization level -O3 results in many wrong warnings when running the tests.

On some hardware NumPy will hang in an infinite loop.

New functions

Add PCG64DXSM BitGenerator

Uses of the PCG64 BitGenerator in a massively-parallel context have been shown to have statistical weaknesses that were not apparent at the first release in numpy 1.17. Most users will never observe this weakness and are safe to continue to use PCG64. We have introduced a new PCG64DXSM BitGenerator that will eventually become the new default BitGenerator implementation used by default_rng in future releases. PCG64DXSM solves the statistical weakness while preserving the performance and the features of PCG64.

See upgrading-pcg64 for more details.

(gh-18906)

Expired deprecations

The shape argument numpy.unravel_index cannot be passed as dims keyword argument anymore. (Was deprecated in NumPy 1.16.)

... (truncated)

Commits

b235f9e Merge pull request #19283 from charris/prepare-1.21.0-release

34aebc2 MAINT: Update 1.21.0-notes.rst

493b64b MAINT: Update 1.21.0-changelog.rst

07d7e72 MAINT: Remove accidentally created directory.

032fca5 Merge pull request #19280 from charris/backport-19277

7d25b81 BUG: Fix refcount leak in ResultType

fa5754e BUG: Add missing DECREF in new path

61127bb Merge pull request #19268 from charris/backport-19264

143d45f Merge pull request #19269 from charris/backport-19228

d80e473 BUG: Removed typing for == and != in dtypes

Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR

@dependabot recreate will recreate this PR, overwriting any edits that have been made to it

@dependabot merge will merge this PR after your CI passes on it

@dependabot squash and merge will squash and merge this PR after your CI passes on it

@dependabot cancel merge will cancel a previously requested merge and block automerging

@dependabot reopen will reopen this PR if it is closed

@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually

@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

@dependabot use these labels will set the current labels as the default for future PRs for this repo and language

@dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language

@dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language

@dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the Security Alerts page.

dependencies
opened by dependabot[bot] 1
Bump numpy from 1.13.1 to 1.22.0
Bumps numpy from 1.13.1 to 1.22.0.

Release notes

Sourced from numpy's releases.

v1.22.0

NumPy 1.22.0 Release Notes

NumPy 1.22.0 is a big release featuring the work of 153 contributors spread over 609 pull requests. There have been many improvements, highlights are:

Annotations of the main namespace are essentially complete. Upstream is a moving target, so there will likely be further improvements, but the major work is done. This is probably the most user visible enhancement in this release.

A preliminary version of the proposed Array-API is provided. This is a step in creating a standard collection of functions that can be used across application such as CuPy and JAX.

NumPy now has a DLPack backend. DLPack provides a common interchange format for array (tensor) data.

New methods for quantile, percentile, and related functions. The new methods provide a complete set of the methods commonly found in the literature.

A new configurable allocator for use by downstream projects.

These are in addition to the ongoing work to provide SIMD support for commonly used functions, improvements to F2PY, and better documentation.

The Python versions supported in this release are 3.8-3.10, Python 3.7 has been dropped. Note that 32 bit wheels are only provided for Python 3.8 and 3.9 on Windows, all other wheels are 64 bits on account of Ubuntu, Fedora, and other Linux distributions dropping 32 bit support. All 64 bit wheels are also linked with 64 bit integer OpenBLAS, which should fix the occasional problems encountered by folks using truly huge arrays.

Expired deprecations

Deprecated numeric style dtype strings have been removed

Using the strings "Bytes0", "Datetime64", "Str0", "Uint32", and "Uint64" as a dtype will now raise a TypeError.

(gh-19539)

Expired deprecations for loads, ndfromtxt, and mafromtxt in npyio

numpy.loads was deprecated in v1.15, with the recommendation that users use pickle.loads instead. ndfromtxt and mafromtxt were both deprecated in v1.17 - users should use numpy.genfromtxt instead with the appropriate value for the usemask parameter.

(gh-19615)

... (truncated)

Commits

4adc87d Merge pull request #20685 from charris/prepare-for-1.22.0-release

fd66547 REL: Prepare for the NumPy 1.22.0 release.

125304b wip

c283859 Merge pull request #20682 from charris/backport-20416

5399c03 Merge pull request #20681 from charris/backport-20954

f9c45f8 Merge pull request #20680 from charris/backport-20663

794b36f Update armccompiler.py

d93b14e Update test_public_api.py

7662c07 Update init.py

311ab52 Update armccompiler.py

Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR

@dependabot recreate will recreate this PR, overwriting any edits that have been made to it

@dependabot merge will merge this PR after your CI passes on it

@dependabot squash and merge will squash and merge this PR after your CI passes on it

@dependabot cancel merge will cancel a previously requested merge and block automerging

@dependabot reopen will reopen this PR if it is closed

@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually

@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

@dependabot use these labels will set the current labels as the default for future PRs for this repo and language

@dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language

@dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language

@dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the Security Alerts page.

dependencies
opened by dependabot[bot] 0
Bump nltk from 3.2.4 to 3.4.5
Bumps nltk from 3.2.4 to 3.4.5.

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR

@dependabot recreate will recreate this PR, overwriting any edits that have been made to it

@dependabot merge will merge this PR after your CI passes on it

@dependabot squash and merge will squash and merge this PR after your CI passes on it

@dependabot cancel merge will cancel a previously requested merge and block automerging

@dependabot reopen will reopen this PR if it is closed

@dependabot ignore this [patch|minor|major] version will close this PR and stop Dependabot creating any more for this minor/major version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

@dependabot use these labels will set the current labels as the default for future PRs for this repo and language

@dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language

@dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language

@dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the Security Alerts page.

dependencies
opened by dependabot[bot] 0

Releases(v0.4.2)

v0.4.2(Apr 9, 2018)

Source code(tar.gz)
Source code(zip)
v0.4.1(Apr 9, 2018)

Source code(tar.gz)
Source code(zip)
v0.4(Apr 9, 2018)

Adds the temperature parameter to control the degree of randomness during sampling.
Source code(tar.gz)
Source code(zip)

Owner

Kilian Batzner

GitHub Repository

Bagua is a flexible and performant distributed training algorithm development framework.

786 Dec 17, 2022

Source code for our Paper "Learning in High-Dimensional Feature Spaces Using ANOVA-Based Matrix-Vector Multiplication"

NFFT4ANOVA Source code for our Paper "Learning in High-Dimensional Feature Spaces Using ANOVA-Based Matrix-Vector Multiplication" This package uses th

1 Aug 10, 2022

PyDEns is a framework for solving Ordinary and Partial Differential Equations (ODEs & PDEs) using neural networks

PyDEns PyDEns is a framework for solving Ordinary and Partial Differential Equations (ODEs & PDEs) using neural networks. With PyDEns one can solve PD

220 Dec 26, 2022

Official implementation of the ICCV 2021 paper: "The Power of Points for Modeling Humans in Clothing".

The Power of Points for Modeling Humans in Clothing (ICCV 2021) This repository contains the official PyTorch implementation of the ICCV 2021 paper: T

158 Nov 24, 2022

[CVPRW 21] "BNN - BN = ? Training Binary Neural Networks without Batch Normalization", Tianlong Chen, Zhenyu Zhang, Xu Ouyang, Zechun Liu, Zhiqiang Shen, Zhangyang Wang

BNN - BN = ? Training Binary Neural Networks without Batch Normalization Codes for this paper BNN - BN = ? Training Binary Neural Networks without Bat

40 Dec 30, 2022

UIUCTF 2021 Public Challenge Repository

UIUCTF-2021-Public UIUCTF 2021 Public Challenge Repository Notes: every challenge folder contains a challenge.yml file in the format for ctfcli, CTFd'

15 Nov 03, 2022

Official Keras Implementation for UNet++ in IEEE Transactions on Medical Imaging and DLMIA 2018

UNet++: A Nested U-Net Architecture for Medical Image Segmentation UNet++ is a new general purpose image segmentation architecture for more accurate i

1.8k Jan 07, 2023

Flexible Networks for Learning Physical Dynamics of Deformable Objects (2021)

Flexible Networks for Learning Physical Dynamics of Deformable Objects (2021) By Jinhyung Park, Dohae Lee, In-Kwon Lee from Yonsei University (Seoul,

0 Jan 09, 2022

Migration of Edge-based Distributed Federated Learning

FedFly: Towards Migration in Edge-based Distributed Federated Learning About the research Due to mobility, a device participating in Federated Learnin

11 Nov 13, 2022

"Moshpit SGD: Communication-Efficient Decentralized Training on Heterogeneous Unreliable Devices", official implementation

Moshpit SGD: Communication-Efficient Decentralized Training on Heterogeneous Unreliable Devices This repository contains the official PyTorch implemen

21 Oct 18, 2022

A PyTorch implementation for V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation

A PyTorch implementation of V-Net Vnet is a PyTorch implementation of the paper V-Net: Fully Convolutional Neural Networks for Volumetric Medical Imag

606 Dec 21, 2022

DecoupledNet is semantic segmentation system which using heterogeneous annotations

DecoupledNet: Decoupled Deep Neural Network for Semi-supervised Semantic Segmentation Created by Seunghoon Hong, Hyeonwoo Noh and Bohyung Han at POSTE

74 Sep 22, 2021

Marine debris detection with commercial satellite imagery and deep learning.

Marine debris detection with commercial satellite imagery and deep learning. Floating marine debris is a global pollution problem which threatens mari

56 Dec 16, 2022

Pytorch implementation of our paper accepted by NeurIPS 2021 -- Revisiting Discriminator in GAN Compression: A Generator-discriminator Cooperative Compression Scheme

Revisiting Discriminator in GAN Compression: A Generator-discriminator Cooperative Compression Scheme (NeurIPS2021) (Link) Overview Prerequisites Linu

34 Mar 31, 2022

📝 Wrapper library for text generation / language models at char and word level with RNN in TensorFlow

Related tags

Overview

tensorlm

Installation

Basic Usage

Command Line Usage

Advanced Usage

Custom Input Data

Custom Training, Dropout etc.

You might also like...

RNN Predict Street Commercial Vitality

Emotion classification of online comments based on RNN

Pytorch implementation of the popular Improv RNN model originally proposed by the Magenta team.

Static Features Classifier - A static features classifier for Point-Could clusters using an Attention-RNN model

Source code for the GPT-2 story generation models in the EMNLP 2020 paper "STORIUM: A Dataset and Evaluation Platform for Human-in-the-Loop Story Generation"

Deep learning library featuring a higher-level API for TensorFlow.

Deep learning library featuring a higher-level API for TensorFlow.

Image-generation-baseline - MUGE Text To Image Generation Baseline

Implementation of Transformer in Transformer, pixel level attention paired with patch level attention for image classification, in Pytorch

Comments

Bump numpy from 1.13.1 to 1.21.0

v1.21.0

NumPy 1.21.0 Release Notes

New functions

Add PCG64DXSM BitGenerator

Expired deprecations

Bump numpy from 1.13.1 to 1.22.0

v1.22.0

NumPy 1.22.0 Release Notes

Expired deprecations

Deprecated numeric style dtype strings have been removed

Expired deprecations for loads, ndfromtxt, and mafromtxt in npyio

Bump nltk from 3.2.4 to 3.4.5

Releases(v0.4.2)

v0.4.2(Apr 9, 2018)

v0.4.1(Apr 9, 2018)

v0.4(Apr 9, 2018)

Owner

Kilian Batzner

Bagua is a flexible and performant distributed training algorithm development framework.

Source code for our Paper "Learning in High-Dimensional Feature Spaces Using ANOVA-Based Matrix-Vector Multiplication"

PyDEns is a framework for solving Ordinary and Partial Differential Equations (ODEs & PDEs) using neural networks

Official implementation of the ICCV 2021 paper: "The Power of Points for Modeling Humans in Clothing".

[CVPRW 21] "BNN - BN = ? Training Binary Neural Networks without Batch Normalization", Tianlong Chen, Zhenyu Zhang, Xu Ouyang, Zechun Liu, Zhiqiang Shen, Zhangyang Wang

UIUCTF 2021 Public Challenge Repository

Official Keras Implementation for UNet++ in IEEE Transactions on Medical Imaging and DLMIA 2018

Flexible Networks for Learning Physical Dynamics of Deformable Objects (2021)

Migration of Edge-based Distributed Federated Learning

"Moshpit SGD: Communication-Efficient Decentralized Training on Heterogeneous Unreliable Devices", official implementation

A PyTorch implementation for V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation

DecoupledNet is semantic segmentation system which using heterogeneous annotations

Marine debris detection with commercial satellite imagery and deep learning.

Pytorch implementation of our paper accepted by NeurIPS 2021 -- Revisiting Discriminator in GAN Compression: A Generator-discriminator Cooperative Compression Scheme

Two-Stage Peer-Regularized Feature Recombination for Arbitrary Image Style Transfer

Python script to download the celebA-HQ dataset from google drive

Drone detection using YOLOv5

A Physics-based Noise Formation Model for Extreme Low-light Raw Denoising (CVPR 2020 Oral & TPAMI 2021)

SporeAgent: Reinforced Scene-level Plausibility for Object Pose Refinement

Like Dirt-Samples, but cleaned up

Expired deprecations for `loads`, `ndfromtxt`, and `mafromtxt` in npyio