Generalized Proximal Policy Optimization with Sample Reuse (GePPO)

Related tags

Deep Learninggeppo
Overview

Generalized Proximal Policy Optimization with Sample Reuse

This repository is the official implementation of the reinforcement learning algorithm Generalized Proximal Policy Optimization with Sample Reuse (GePPO), which was introduced in the NeurIPS 2021 paper with the same name.

GePPO improves the sample efficiency of the popular on-policy algorithm PPO through principled sample reuse, while still retaining PPO's approximate policy improvement guarantees. GePPO is theoretically supported by a generalized policy improvement lower bound that can be approximated using data from all recent policies.

Requirements

The source code requires the following packages to be installed (we have included the version used to produce the results found in the paper in parentheses):

  • python (3.7.7)
  • gurobi (9.0.2)
  • gym (0.17.1)
  • matplotlib (3.1.3)
  • mujoco-py (1.50.1.68)
  • numpy (1.18.1)
  • scipy (1.4.1)
  • seaborn (0.10.1)
  • tensorflow (2.1.0)

See the file environment.yml for the conda environment used to run our experiments, which can be built with conda using the command conda env create.

The MuJoCo environments used in our experiments require the MuJoCo physics engine and a MuJoCo license. Please see the MuJoCo website for more information on downloading MuJoCo and obtaining a license.

Our implementation of GePPO uses Gurobi to determine the optimal policy weights used in the algorithm, which requires a Gurobi license. Please see the Gurobi website for more information on downloading Gurobi and obtaining a license. Alternatively, GePPO can be run without Gurobi by using uniform policy weights with the --uniform option.

Training

Simulations can be run by calling run on the command line. For example, we can run simulations on the HalfCheetah-v3 environment with PPO and GePPO as follows:

python -m geppo.run --env_name HalfCheetah-v3 --alg_name ppo
python -m geppo.run --env_name HalfCheetah-v3 --alg_name geppo

By default, all algorithm hyperparameters are set to the default values used in the paper. Hyperparameters can be changed to non-default values by using the relevant option on the command line. For more information on the inputs accepted by run, use the --help option.

The results of simulations are saved in the logs/ folder upon completion.

Evaluation

The results of simulations saved in the logs/ folder can be visualized by calling plot on the command line:

python -m geppo.plot --ppo_file <filename> --geppo_file <filename>

By default, this command saves a plot of average performance throughout training in the figs/ folder. Other metrics can be plotted using the --metric option. For more information on the inputs accepted by plot, use the --help option.

Owner
Jimmy Queeney
Jimmy Queeney
Contrastive Feature Loss for Image Prediction

Contrastive Feature Loss for Image Prediction We provide a PyTorch implementation of our contrastive feature loss presented in: Contrastive Feature Lo

Alex Andonian 44 Oct 05, 2022
This is an implementation for the CVPR2020 paper "Learning Invariant Representation for Unsupervised Image Restoration"

Learning Invariant Representation for Unsupervised Image Restoration (CVPR 2020) Introduction This is an implementation for the paper "Learning Invari

GarField 88 Nov 07, 2022
Code for ICCV 2021 paper "HuMoR: 3D Human Motion Model for Robust Pose Estimation"

Code for ICCV 2021 paper "HuMoR: 3D Human Motion Model for Robust Pose Estimation"

Davis Rempe 367 Dec 24, 2022
PyTorchMemTracer - Depict GPU memory footprint during DNN training of PyTorch

A Memory Tracer For PyTorch OOM is a nightmare for PyTorch users. However, most

Jiarui Fang 9 Nov 14, 2022
PESTO: Switching Point based Dynamic and Relative Positional Encoding for Code-Mixed Languages

PESTO: Switching Point based Dynamic and Relative Positional Encoding for Code-Mixed Languages Abstract NLP applications for code-mixed (CM) or mix-li

Mohsin Ali, Mohammed 1 Nov 12, 2021
Playable Video Generation

Playable Video Generation Playable Video Generation Willi Menapace, Stéphane Lathuilière, Sergey Tulyakov, Aliaksandr Siarohin, Elisa Ricci Paper: ArX

Willi Menapace 136 Dec 31, 2022
Code to reproduce experiments in the paper "Explainability Requires Interactivity".

Explainability Requires Interactivity This repository contains the code to train all custom models used in the paper Explainability Requires Interacti

Digital Health & Machine Learning 5 Apr 07, 2022
Vector Neurons: A General Framework for SO(3)-Equivariant Networks

Vector Neurons: A General Framework for SO(3)-Equivariant Networks Created by Congyue Deng, Or Litany, Yueqi Duan, Adrien Poulenard, Andrea Tagliasacc

Congyue Deng 332 Dec 29, 2022
Normalizing Flows with a resampled base distribution

Resampling Base Distributions of Normalizing Flows Normalizing flows are a popular class of models for approximating probability distributions. Howeve

Vincent Stimper 24 Nov 03, 2022
Episodic-memory - Ego4D Episodic Memory Benchmark

Ego4D Episodic Memory Benchmark EGO4D is the world's largest egocentric (first p

3 Feb 18, 2022
Machine learning, in numpy

numpy-ml Ever wish you had an inefficient but somewhat legible collection of machine learning algorithms implemented exclusively in NumPy? No? Install

David Bourgin 11.6k Dec 30, 2022
Stacked Recurrent Hourglass Network for Stereo Matching

SRH-Net: Stacked Recurrent Hourglass Introduction This repository is supplementary material of our RA-L submission, which helps reviewers to understan

28 Jan 03, 2023
Forecasting directional movements of stock prices for intraday trading using LSTM and random forest

Forecasting directional movements of stock-prices for intraday trading using LSTM and random-forest https://arxiv.org/abs/2004.10178 Pushpendu Ghosh,

Pushpendu Ghosh 270 Dec 24, 2022
PyTorch Implementation for AAAI'21 "Do Response Selection Models Really Know What's Next? Utterance Manipulation Strategies for Multi-turn Response Selection"

UMS for Multi-turn Response Selection Implements the model described in the following paper Do Response Selection Models Really Know What's Next? Utte

Taesun Whang 47 Nov 22, 2022
A library for using chemistry in your applications

Chemistry in python Resources Used The following items are not made by me! Click the words to go to the original source Periodic Tab Json - Used in -

Tech Penguin 28 Dec 17, 2021
Phylogeny Partners

Phylogeny-Partners Two states models Instalation You may need to install the cython, networkx, numpy, scipy package: pip install cython, networkx, num

1 Sep 19, 2022
Implementation of our paper "Video Playback Rate Perception for Self-supervised Spatio-Temporal Representation Learning".

PRP Introduction This is the implementation of our paper "Video Playback Rate Perception for Self-supervised Spatio-Temporal Representation Learning".

yuanyao366 39 Dec 29, 2022
PyTorch implementation of the paper Ultra Fast Structure-aware Deep Lane Detection

PyTorch implementation of the paper Ultra Fast Structure-aware Deep Lane Detection

1.4k Jan 06, 2023
Diagnostic tests for linguistic capacities in language models

LM diagnostics This repository contains the diagnostic datasets and experimental code for What BERT is not: Lessons from a new suite of psycholinguist

61 Jan 02, 2023
This is a collection of all challenges in HKCERT CTF 2021

香港網絡保安新生代奪旗挑戰賽 2021 (HKCERT CTF 2021) This is a collection of all challenges (and writeups) in HKCERT CTF 2021 Challenges ID Chinese name Name Score S

10 Jan 27, 2022