A clean implementation based on AlphaZero for any game in any framework + tutorial + Othello/Gobang/TicTacToe/Connect4 and more

Last update: Jan 05, 2023

Overview

Alpha Zero General (any game, any framework!)

A simplified, highly flexible, commented and (hopefully) easy-to-understand implementation of self-play reinforcement learning based on the AlphaGo Zero paper (Silver et al.). It is designed to be easy to adapt to any two-player turn-based adversarial game and any deep learning framework of your choice. A sample implementation is provided for the game of Othello in PyTorch, Keras, TensorFlow and Chainer. An accompanying tutorial can be found here. We also have implementations for GoBang and TicTacToe.

To use a game of your choice, subclass the classes in Game.py and NeuralNet.py and implement their functions. Example implementations for Othello can be found in othello/OthelloGame.py and othello/{pytorch,keras,tensorflow,chainer}/NNet.py.
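As a rough illustration of what "subclass and implement" involves, the sketch below defines a stand-in base class and a tiny single-pile Nim game on top of it. The base class `TwoPlayerGame` and all of its method names are hypothetical, chosen for this example only; they are not the contents of Game.py, so check that file for the actual interface before writing a real game.

```python
from abc import ABC, abstractmethod


class TwoPlayerGame(ABC):
    """Hypothetical stand-in for the base class in Game.py (names are assumptions)."""

    @abstractmethod
    def initial_state(self): ...

    @abstractmethod
    def action_size(self): ...

    @abstractmethod
    def valid_moves(self, state): ...

    @abstractmethod
    def next_state(self, state, player, action): ...

    @abstractmethod
    def game_ended(self, state, player): ...


class NimGame(TwoPlayerGame):
    """Single-pile Nim: remove 1 to 3 stones; whoever takes the last stone wins."""

    def __init__(self, stones=7, max_take=3):
        self.stones = stones
        self.max_take = max_take

    def initial_state(self):
        return self.stones

    def action_size(self):
        return self.max_take

    def valid_moves(self, state):
        # Action index i means "take i + 1 stones"; 1 marks a legal move.
        return [1 if i + 1 <= state else 0 for i in range(self.max_take)]

    def next_state(self, state, player, action):
        # Players are +1 and -1; the turn passes to the other player.
        return state - (action + 1), -player

    def game_ended(self, state, player):
        # `player` is the side to move. An empty pile means the previous
        # mover took the last stone, so the side to move has lost (-1).
        return -1 if state == 0 else 0


game = NimGame()
state, player = game.initial_state(), 1
state, player = game.next_state(state, player, 2)  # player +1 takes 3 stones
print(state, player, game.valid_moves(state))
```

A fixed-size action vector with a validity mask, as used here, is a common way to let one network output layer cover every position of a game.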

Coach.py contains the core training loop and MCTS.py performs the Monte Carlo Tree Search. The parameters for the self-play can be specified in main.py. Additional neural network parameters are in othello/{pytorch,keras,tensorflow,chainer}/NNet.py (cuda flag, batch size, epochs, learning rate etc.).
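For readers new to the search step, the following is a minimal sketch of the action-selection rule used in AlphaGo Zero-style tree search, where each valid action is scored as Q + c_puct * P * sqrt(total visits) / (1 + N). It is not the code in MCTS.py; the function name `puct_select`, its parameters, and the default `c_puct=1.0` are assumptions made for this example.

```python
import math


def puct_select(prior, q_value, visit_count, valid, c_puct=1.0):
    """Return the index of the valid action with the highest PUCT score.

    prior, q_value, visit_count and valid are per-action lists for one node.
    """
    total_visits = sum(visit_count)
    best_action, best_score = -1, -math.inf
    for a, is_valid in enumerate(valid):
        if not is_valid:
            continue  # masked-out moves are never selected
        # Exploration bonus: large for high-prior, rarely visited actions.
        exploration = c_puct * prior[a] * math.sqrt(total_visits) / (1 + visit_count[a])
        score = q_value[a] + exploration
        if score > best_score:
            best_action, best_score = a, score
    return best_action


prior = [0.5, 0.3, 0.2]
q_value = [0.1, 0.4, 0.0]
visit_count = [10, 2, 0]
print(puct_select(prior, q_value, visit_count, valid=[1, 1, 1]))
print(puct_select(prior, q_value, visit_count, valid=[1, 0, 1]))
```

Note that with zero total visits the exploration term vanishes in this form, so the choice falls back to the Q values alone; some implementations add a small constant under the square root to avoid that.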

To start training a model for Othello:

python main.py

Choose your framework and game in main.py.

Docker Installation

For easy environment setup, you can use nvidia-docker. Once nvidia-docker is set up, simply run:

./setup_env.sh

to set up a Jupyter Docker container (PyTorch by default). You can now open a new terminal and enter:

docker exec -ti pytorch_notebook python main.py

Experiments

We trained a PyTorch model for 6x6 Othello (~80 iterations, 100 episodes per iteration and 25 MCTS simulations per turn). This took about 3 days on an NVIDIA Tesla K80. The pretrained PyTorch model can be found in pretrained_models/othello/pytorch/. You can play a game against it using pit.py. Below is the performance of the model against a random and a greedy baseline as a function of the number of training iterations.

A concise description of our algorithm can be found here.

Contributing

While the current code is fairly functional, we could benefit from the following contributions:

Game logic files for more games that follow the specifications in Game.py, along with their neural networks
Neural networks in other frameworks
Pre-trained models for different game configurations
An asynchronous version of the code: parallel processes for self-play, neural net training and model comparison
Asynchronous MCTS as described in the paper

Contributors and Credits

Shantanu Thakoor and Megha Jhunjhunwala helped with core design and implementation.
Shantanu Kumar contributed TensorFlow and Keras models for Othello.
Evgeny Tyurin contributed rules and a trained model for TicTacToe.
MBoss contributed rules and a model for GoBang.
Jernej Habjan contributed an RTS game.
Adam Lawson contributed rules and a trained model for 3D TicTacToe.
Carlos Aguayo contributed rules and a trained model for Dots and Boxes along with a JavaScript implementation.
Robert Ronan contributed rules for Santorini.


Owner

Surag Nair
