Independent and minimal implementations of some reinforcement learning algorithms using PyTorch (including PPO, A3C, A2C, ...).

Overview

PyTorch RL Minimal Implementations

This repository contains minimal implementations of several reinforcement learning algorithms, with the following characteristics:

  1. Few dependencies: only PyTorch and Gym need to be installed, for building neural networks and testing algorithm performance respectively.
  2. Independent implementations: each RL algorithm is implemented in its own file, which makes it easy to follow the training process and to adapt the code to other tasks.
  3. Flexible configuration: common parameters and tricks, such as reward normalization, advantage normalization, TensorBoard logging, and tqdm progress bars, can be switched on and off conveniently.

RL Algorithms List

| Name | Type | Estimator | Paper | File |
| --- | --- | --- | --- | --- |
| Q-Learning | Value-based / Off-policy | TD | Watkins et al. Q-Learning. Machine Learning, 1992. | q_learning.py |
| REINFORCE | Policy-based / On-policy | MC | Sutton et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation. In NeurIPS, 2000. | reinforce.py |
| DQN | Value-based / Off-policy | TD | Mnih et al. Human-level Control through Deep Reinforcement Learning. Nature, 2015. | in progress |
| A2C | Actor-Critic / On-policy | n-step TD | Mnih et al. Asynchronous Methods for Deep Reinforcement Learning. In ICML, 2016. | a2c.py |
| A3C | Actor-Critic / On-policy | n-step TD | Mnih et al. Asynchronous Methods for Deep Reinforcement Learning. In ICML, 2016. | a3c.py |
| ACER | Actor-Critic / Off-policy | GAE | Wang et al. Sample Efficient Actor-Critic with Experience Replay. In ICLR, 2017. | in progress |
| ACKTR | Actor-Critic / On-policy | GAE | Wu et al. Scalable Trust-Region Method for Deep Reinforcement Learning using Kronecker-Factored Approximation. In NeurIPS, 2017. | in progress |
| PPO | Actor-Critic / On-policy | GAE | Schulman et al. Proximal Policy Optimization Algorithms. arXiv, 2017. | ppo.py |
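
Several of the actor-critic entries above use Generalized Advantage Estimation (GAE) as their estimator. For reference, a minimal GAE computation looks roughly like the sketch below; this is an illustrative snippet, not the code from ppo.py, and the tensor names (rewards, values, dones) are assumptions.

```python
import torch

def compute_gae(rewards, values, dones, gamma=0.99, lam=0.95):
    """Minimal GAE sketch.

    rewards, dones: tensors of shape [T]; values: tensor of shape [T + 1]
    (the last entry is the bootstrap value of the final observation).
    """
    T = rewards.shape[0]
    advantages = torch.zeros(T)
    gae = 0.0
    for t in reversed(range(T)):
        # TD residual: delta_t = r_t + gamma * V(s_{t+1}) * (1 - done_t) - V(s_t)
        delta = rewards[t] + gamma * values[t + 1] * (1 - dones[t]) - values[t]
        # Exponentially weighted sum of future TD residuals
        gae = delta + gamma * lam * (1 - dones[t]) * gae
        advantages[t] = gae
    returns = advantages + values[:-1]  # targets for the critic
    return advantages, returns
```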

Quick Start

Requirements

pytorch
gym

tensorboard  # optional, for the summary writer
tqdm         # optional, for the progress bar
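
The two optional packages are only needed when the corresponding switches (open_tb, open_tqdm, see the parameter table below) are enabled. A minimal sketch of how such gating could look, assuming lazy imports (illustrative only, not the repository's exact code):

```python
def make_writer(open_tb, log_dir="runs"):
    # Import TensorBoard only when the summary writer is actually requested.
    if not open_tb:
        return None
    from torch.utils.tensorboard import SummaryWriter
    return SummaryWriter(log_dir=log_dir)


def make_progress(iterable, open_tqdm):
    # Wrap an iterable with a tqdm progress bar only when enabled.
    if not open_tqdm:
        return iterable
    from tqdm import tqdm
    return tqdm(iterable)
```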

Abstract Agent

Components / Parameters

| Component / Parameter | Description |
| --- | --- |
| policy | neural network model |
| gamma | discount factor of the cumulative reward |
| lr | learning rate(s), e.g. lr_actor, lr_critic |
| lr_decay | decay factor used to schedule the learning rate |
| lr_scheduler | scheduler for the learning rate |
| coef_critic_loss | coefficient of the critic loss |
| coef_entropy_loss | coefficient of the entropy loss |
| writer | summary writer used to record training information |
| buffer | replay buffer that stores historical trajectories |
| use_cuda | whether to run on the GPU |
| clip_grad | whether to clip gradients |
| max_grad_norm | maximum norm of the clipped gradients |
| norm_advantage | whether to normalize advantages |
| open_tb | whether to enable the TensorBoard summary writer |
| open_tqdm | whether to enable the tqdm progress bar |
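
To make the roles of these parameters concrete, the sketch below shows how an actor-critic update step could combine them. The parameter names follow the table, but the code itself is an illustrative sketch, not the repository's implementation.

```python
import torch
import torch.nn.functional as F

def actor_critic_update(policy, optimizer,
                        log_probs, entropies, advantages,
                        value_preds, value_targets,
                        coef_critic_loss=0.5, coef_entropy_loss=0.01,
                        norm_advantage=True, clip_grad=True, max_grad_norm=0.5):
    # Advantage normalization (controlled by norm_advantage).
    if norm_advantage:
        advantages = (advantages - advantages.mean()) / (advantages.std() + 1e-8)

    actor_loss = -(log_probs * advantages.detach()).mean()
    critic_loss = F.mse_loss(value_preds, value_targets)
    entropy_loss = -entropies.mean()

    # Weighted sum controlled by coef_critic_loss and coef_entropy_loss.
    loss = actor_loss + coef_critic_loss * critic_loss + coef_entropy_loss * entropy_loss

    optimizer.zero_grad()
    loss.backward()
    # Gradient clipping (clip_grad / max_grad_norm).
    if clip_grad:
        torch.nn.utils.clip_grad_norm_(policy.parameters(), max_grad_norm)
    optimizer.step()
    return loss.item()
```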

Methods

| Method | Description |
| --- | --- |
| preprocess_obs() | preprocess an observation before feeding it to the neural network |
| select_action() | use the actor network to select an action from the policy distribution |
| estimate_obs() | use the critic network to estimate the value of an observation |
| update() | update the parameters by computing losses and gradients |
| train() | set the neural network to training mode |
| eval() | set the neural network to evaluation mode |
| save() | save the model parameters |
| load() | load the model parameters |
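
Put together, the abstract agent can be pictured as a skeleton like the one below. The method names follow the table above, but the signatures and bodies are assumptions rather than the repository's exact API.

```python
import torch

class AbstractAgent:
    """Illustrative skeleton of the abstract agent described above."""

    def __init__(self, policy, gamma=0.99, use_cuda=False):
        self.device = torch.device("cuda" if use_cuda and torch.cuda.is_available() else "cpu")
        self.policy = policy.to(self.device)
        self.gamma = gamma

    def preprocess_obs(self, obs):
        # Convert a raw Gym observation into a batched float tensor.
        return torch.as_tensor(obs, dtype=torch.float32, device=self.device).unsqueeze(0)

    def select_action(self, obs):
        raise NotImplementedError  # sample from the actor's policy distribution

    def estimate_obs(self, obs):
        raise NotImplementedError  # critic value estimate V(s)

    def update(self, buffer):
        raise NotImplementedError  # compute losses and apply gradients

    def train(self):
        self.policy.train()

    def eval(self):
        self.policy.eval()

    def save(self, path):
        torch.save(self.policy.state_dict(), path)

    def load(self, path):
        self.policy.load_state_dict(torch.load(path, map_location=self.device))
```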

Update & To-do & Limitations

Update History

  • 2021-12-09 ADD TRICK: norm_critic_loss in PPO
  • 2021-12-09 ADD PARAM: coef_critic_loss, coef_entropy_loss, log_step
  • 2021-12-07 ADD ALGO: A3C
  • 2021-12-05 ADD ALGO: PPO
  • 2021-11-28 ADD ALGO: A2C
  • 2021-11-20 ADD ALGO: Q-Learning, REINFORCE

To-do List

  • ADD ALGO: DQN, Double DQN, Dueling DQN, DDPG
  • ADD NN: RNN mode

Current Limitations

  • No support for vectorized environments
  • No support for continuous action spaces
  • No support for RNN-based models
  • No support for imitation learning

Reference & Acknowledgements
