An introduction of Markov decision process (MDP) and two algorithms that solve MDPs (value iteration, policy iteration) along with their Python implementations.

Overview

Markov Decision Process

A Markov decision process (MDP), by definition, is a sequential decision problem for a fully observable, stochastic environment with a Markovian transition model and additive rewards. It consists of a set of states, a set of actions, a transition model, and a reward function. Here's an example.

This is a simple 4 x 3 environment, and each block represents a state. The agent can move left, right, up, or down from a state. The "intended" outcome occurs with probability 0.8, but with probability 0.2 the agent moves at right angles to the intended direction. A collision with the wall or boundary results in no movement. The two terminal states have reward +1 and -1, respectively, and all other states have a constant reward (e.g. of -0.01).

Also, a MDP usually has a discount factor γ , a number between 0 and 1, that describes the preference of an agent for current rewards over future rewards.

Policy

A solution to a MDP is called a policy π(s). It specifies an action for each state s. In a MDP, we aim to find the optimal policy that yields the highest expected utility. Here's an example of a policy.

Value Iteration

Value iteration is an algorithm that gives an optimal policy for a MDP. It calculates the utility of each state, which is defined as the expected sum of discounted rewards from that state onward.

This is called the Bellman equation. For example, the utility of the state (1, 1) in the MDP example shown above is:

For n states, there are n Bellman equations with n unknowns (the utilities of states). To solve this system of equations, value iteration uses an iterative approach that repeatedly updates the utility of each state (starting from zero) until an equilibrium is reached (converge). The iteration step, called a Bellman update, looks like this:

Here's the pseudocode for calculating the utilities of states.

Then, after the utilities of states are calculated, we can use them to select an optimal action for each state.

valueIteration.py contains the Python implementation of this algorithm for the MDP example shown above.

(Modified) Policy Iteration

Policy iteration is another algorithm that solves MDPs. It starts with a random policy and alternates the following two steps until the policy improvement step yields no change:

(1) Policy evaluation: given a policy, calculate the utility U(s) of each state s if the policy is executed;

(2) Policy improvement: update the policy based on U(s).

For the policy evaluation step, we use a simplified version of the Bellman equation to calculate the utility of each state.

For n states, there are n linear equations with n unknowns (the utilities of states), which can be solved in O(n^3) time. To make the algorithm more efficient, we can perform some number of simplified Bellman updates (simplified because the policy is fixed) to get an approximation of the utilities instead of calculating the exact solutions.

Here's the pseudocode for policy iteration.

policyIteration.py contains the Python implementation of this algorithm for the MDP example shown above.

Reference

Stuart Russell and Peter Norvig. Artificial Intelligence: A Modern Approach (3rd ed.).

Owner
Yu Shen
UCLA '24
Yu Shen
A JSON Web Token authentication plugin for the Django REST Framework.

Simple JWT Abstract Simple JWT is a JSON Web Token authentication plugin for the Django REST Framework. For full documentation, visit django-rest-fram

Jazzband 3.2k Dec 28, 2022
Includes Automation and Personal Projects

Python Models, and Connect Forclient & OpenCv projects Completed Automation** Alarm (S

tushar malhan 1 Jan 15, 2022
OpenStack Keystone auth plugin for HTTPie

httpie-keystone-auth OpenStack Keystone auth plugin for HTTPie. Installation $ pip install --upgrade httpie-keystone-auth You should now see keystone

Pavlo Shchelokovskyy 1 Oct 20, 2021
An extension of django rest framework, providing a configurable password reset strategy

Django Rest Password Reset This python package provides a simple password reset strategy for django rest framework, where users can request password r

Anexia 363 Dec 24, 2022
An open source Flask extension that provides JWT support (with batteries included)!

Flask-JWT-Extended Features Flask-JWT-Extended not only adds support for using JSON Web Tokens (JWT) to Flask for protecting views, but also many help

Landon Gilbert-Bland 1.4k Jan 04, 2023
Creation & manipulation of PyPI tokens

PyPIToken: Manipulate PyPI API tokens PyPIToken is an open-source Python 3.6+ library for generating and manipulating PyPI tokens. PyPI tokens are ver

Joachim Jablon 8 Nov 01, 2022
JSON Web Token Authentication support for Django REST Framework

REST framework JWT Auth Notice This project is currently unmaintained. Check #484 for more details and suggested alternatives. JSON Web Token Authenti

José Padilla 3.2k Dec 31, 2022
Some scripts to utilise device code authorization for phishing.

OAuth Device Code Authorization Phishing Some scripts to utilise device code authorization for phishing. High level overview as per the instructions a

Daniel Underhay 6 Oct 03, 2022
Authentication Module for django rest auth

django-rest-knox Authentication Module for django rest auth Knox provides easy to use authentication for Django REST Framework The aim is to allow for

James McMahon 878 Jan 04, 2023
A Python library for OAuth 1.0/a, 2.0, and Ofly.

Rauth A simple Python OAuth 1.0/a, OAuth 2.0, and Ofly consumer library built on top of Requests. Features Supports OAuth 1.0/a, 2.0 and Ofly Service

litl 1.6k Dec 08, 2022
A JSON Web Token authentication plugin for the Django REST Framework.

Simple JWT Abstract Simple JWT is a JSON Web Token authentication plugin for the Django REST Framework. For full documentation, visit django-rest-fram

Simple JWT 3.3k Jan 01, 2023
Simple Login - Login Extension for Flask - maintainer @cuducos

Login Extension for Flask The simplest way to add login to flask! Top Contributors Add yourself, send a PR! How it works First install it from PyPI. p

Flask Extensions 181 Jan 01, 2023
python implementation of JSON Web Signatures

python-jws 🚨 This is Unmaintained 🚨 This library is unmaintained and you should probably use For histo

Brian J Brennan 57 Apr 18, 2022
Provide OAuth2 access to your app

django-oml Welcome to the documentation for django-oml! OML means Object Moderation Layer, the idea is to have a mixin model that allows you to modera

Caffeinehit 334 Jul 27, 2022
Basic auth for Django.

Basic auth for Django.

bichanna 2 Mar 25, 2022
JWT Key Confusion PoC (CVE-2015-9235) Written for the Hack the Box challenge - Under Construction

JWT Key Confusion PoC (CVE-2015-9235) Written for the Hack the Box challenge - Under Construction This script performs a Java Web Token Key Confusion

Alex Fronteddu 1 Jan 13, 2022
Social auth made simple

Python Social Auth Python Social Auth is an easy-to-setup social authentication/registration mechanism with support for several frameworks and auth pr

Matías Aguirre 2.8k Dec 24, 2022
python-social-auth and oauth2 support for django-rest-framework

Django REST Framework Social OAuth2 This module provides OAuth2 social authentication support for applications in Django REST Framework. The aim of th

1k Dec 22, 2022
Basic auth for Django.

easy-basicauth WARNING! THIS LIBRARY IS IN PROGRESS! ANYTHING CAN CHANGE AT ANY MOMENT WITHOUT ANY NOTICE! Installation pip install easy-basicauth Usa

bichanna 2 Mar 25, 2022
Django x Elasticsearch Templates

Django x Elasticsearch Requirements Python 3.7 Django = 3 Elasticsearch 7.15 Setup Elasticsearch Install via brew Install brew tap elastic/tap brew

Aji Pratama 0 May 22, 2022