gym @ 12e8b763
Subproject commit 12e8b763d5dcda4962cbd17887d545f0eec6808a
*.swp
*.pyc
*.py~
.DS_Store
.cache
.pytest_cache/
# Setuptools distribution and build folders.
/dist/
/build
# Virtualenv
/env
# Python egg metadata, regenerated from source files by setuptools.
/*.egg-info
*.sublime-project
*.sublime-workspace
logs/
.ipynb_checkpoints
ghostdriver.log
junk
MUJOCO_LOG.txt
rllab_mujoco
tutorial/*.html
# IDE files
.eggs
.tox
# PyCharm project files
.idea
vizdoom.ini
sudo: required
language: python
services:
- docker
env:
# - UBUNTU_VER=14.04 - problems with atari-py
- UBUNTU_VER=16.04
- UBUNTU_VER=18.04
install: "" # so travis doesn't do pip install requirements.txt
script:
- docker build -f test.dockerfile.${UBUNTU_VER} -t gym-test --build-arg MUJOCO_KEY=$MUJOCO_KEY .
- docker run -e MUJOCO_KEY=$MUJOCO_KEY gym-test tox
deploy:
  provider: pypi
  username: $TWINE_USERNAME
  password: $TWINE_PASSWORD
  on:
    tags: true
    condition: $UBUNTU_VER = 16.04
OpenAI Gym is dedicated to providing a harassment-free experience for
everyone, regardless of gender, gender identity and expression, sexual
orientation, disability, physical appearance, body size, age, race, or
religion. We do not tolerate harassment of participants in any form.
This code of conduct applies to all OpenAI Gym spaces (including Gist
comments) both online and off. Anyone who violates this code of
conduct may be sanctioned or expelled from these spaces at the
discretion of the OpenAI team.
We may add additional rules over time, which will be made clearly
available to participants. Participants are responsible for knowing
and abiding by these rules.
# gym
The MIT License
Copyright (c) 2016 OpenAI (https://openai.com)
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
# Mujoco models
This work is derived from [MuJoCo models](http://www.mujoco.org/forum/index.php?resources/) used under the following license:
```
This file is part of MuJoCo.
Copyright 2009-2015 Roboti LLC.
Mujoco :: Advanced physics simulation engine
Source : www.roboti.us
Version : 1.31
Released : 23Apr16
Author :: Vikash Kumar
Contacts : kumar@roboti.us
```
.PHONY: install test
install:
	pip install -r requirements.txt

base:
	docker pull ubuntu:14.04
	docker tag ubuntu:14.04 quay.io/openai/gym:base
	docker push quay.io/openai/gym:base

test:
	docker build -f test.dockerfile -t quay.io/openai/gym:test .
	docker push quay.io/openai/gym:test

upload:
	rm -rf dist
	python setup.py sdist
	twine upload dist/*

docker-build:
	docker build -t quay.io/openai/gym .

docker-run:
	docker run -ti quay.io/openai/gym bash
**Status:** Maintenance (expect bug fixes and minor updates)
OpenAI Gym
**********
**OpenAI Gym is a toolkit for developing and comparing reinforcement learning algorithms.** This is the ``gym`` open-source library, which gives you access to a standardized set of environments.
.. image:: https://travis-ci.org/openai/gym.svg?branch=master
    :target: https://travis-ci.org/openai/gym
`See What's New section below <#what-s-new>`_
``gym`` makes no assumptions about the structure of your agent, and is compatible with any numerical computation library, such as TensorFlow or Theano. You can use it from Python code, and soon from other languages.
If you're not sure where to start, we recommend beginning with the
`docs <https://gym.openai.com/docs>`_ on our site. See also the `FAQ <https://github.com/openai/gym/wiki/FAQ>`_.
A whitepaper for OpenAI Gym is available at http://arxiv.org/abs/1606.01540, and here's a BibTeX entry that you can use to cite it in a publication::

  @misc{1606.01540,
    Author = {Greg Brockman and Vicki Cheung and Ludwig Pettersson and Jonas Schneider and John Schulman and Jie Tang and Wojciech Zaremba},
    Title = {OpenAI Gym},
    Year = {2016},
    Eprint = {arXiv:1606.01540},
  }

.. contents:: **Contents of this document**
   :depth: 2
Basics
======
There are two basic concepts in reinforcement learning: the
environment (namely, the outside world) and the agent (namely, the
algorithm you are writing). The agent sends `actions` to the
environment, and the environment replies with `observations` and
`rewards` (that is, a score).
The core `gym` interface is `Env <https://github.com/openai/gym/blob/master/gym/core.py>`_, which is
the unified environment interface. There is no interface for agents;
that part is left to you. The following are the ``Env`` methods you
should know:
- `reset(self)`: Reset the environment's state. Returns `observation`.
- `step(self, action)`: Step the environment by one timestep. Returns `observation`, `reward`, `done`, `info`.
- `render(self, mode='human')`: Render one frame of the environment. The default mode will do something human friendly, such as pop up a window.
Installation
============
You can perform a minimal install of ``gym`` with:

.. code:: shell

    git clone https://github.com/openai/gym.git
    cd gym
    pip install -e .

If you prefer, you can do a minimal install of the packaged version directly from PyPI:

.. code:: shell

    pip install gym

You'll be able to run a few environments right away:
- algorithmic
- toy_text
- classic_control (you'll need ``pyglet`` to render though)
We recommend playing with those environments at first, and then later
installing the dependencies for the remaining environments.
Installing everything
---------------------
To install the full set of environments, you'll need to have some system
packages installed. We'll build out the list here over time; please let us know
what you end up installing on your platform. Also, take a look at the docker files (test.dockerfile.xx.xx) to
see the composition of our CI-tested images.
On OSX:

.. code:: shell

    brew install cmake boost boost-python sdl2 swig wget

On Ubuntu 14.04 (non-mujoco only):

.. code:: shell

    apt-get install libjpeg-dev cmake swig python-pyglet python3-opengl libboost-all-dev \
            libsdl2-2.0.0 libsdl2-dev libglu1-mesa libglu1-mesa-dev libgles2-mesa-dev \
            freeglut3 xvfb libav-tools

On Ubuntu 16.04:

.. code:: shell

    apt-get install -y python-pyglet python3-opengl zlib1g-dev libjpeg-dev patchelf \
            cmake swig libboost-all-dev libsdl2-dev libosmesa6-dev xvfb ffmpeg

On Ubuntu 18.04:

.. code:: shell

    apt install -y python3-dev zlib1g-dev libjpeg-dev cmake swig python-pyglet python3-opengl libboost-all-dev libsdl2-dev \
        libosmesa6-dev patchelf ffmpeg xvfb

MuJoCo has a proprietary dependency we can't set up for you. Follow
the
`instructions <https://github.com/openai/mujoco-py#obtaining-the-binaries-and-license-key>`_
in the ``mujoco-py`` package for help.
Once you're ready to install everything, run ``pip install -e '.[all]'`` (or ``pip install 'gym[all]'``).
Supported systems
-----------------
We currently support Linux and OS X running Python 2.7 or 3.5. Some users on OSX + Python3 may need to run
.. code:: shell

    brew install boost-python --with-python3

If you want to access Gym from languages other than python, we have limited support for non-python
frameworks, such as lua/Torch, using the OpenAI Gym `HTTP API <https://github.com/openai/gym-http-api>`_.
Pip version
-----------
To run ``pip install -e '.[all]'``, you'll need a semi-recent pip.
Please make sure your pip is at least at version ``1.5.0``. You can
upgrade using the following: ``pip install --ignore-installed
pip``. Alternatively, you can open `setup.py
<https://github.com/openai/gym/blob/master/setup.py>`_ and
install the dependencies by hand.
Rendering on a server
---------------------
If you're trying to render video on a server, you'll need to connect a
fake display. The easiest way to do this is by running under
``xvfb-run`` (on Ubuntu, install the ``xvfb`` package):
.. code:: shell

    xvfb-run -s "-screen 0 1400x900x24" bash

Installing dependencies for specific environments
-------------------------------------------------
If you'd like to install the dependencies for only specific
environments, see `setup.py
<https://github.com/openai/gym/blob/master/setup.py>`_. We
maintain the lists of dependencies on a per-environment group basis.
Environments
============
The code for each environment group is housed in its own subdirectory
`gym/envs
<https://github.com/openai/gym/blob/master/gym/envs>`_. The
specification of each task is in `gym/envs/__init__.py
<https://github.com/openai/gym/blob/master/gym/envs/__init__.py>`_. It's
worth browsing through both.
Algorithmic
-----------
These are a variety of algorithmic tasks, such as learning to copy a
sequence.
.. code:: python

    import gym
    env = gym.make('Copy-v0')
    env.reset()
    env.render()

Atari
-----
The Atari environments are a variety of Atari video games. If you didn't do the full install, you can install dependencies via ``pip install -e '.[atari]'`` (you'll need ``cmake`` installed) and then get started as follows:

.. code:: python

    import gym
    env = gym.make('SpaceInvaders-v0')
    env.reset()
    env.render()

This will install ``atari-py``, which automatically compiles the `Arcade Learning Environment <http://www.arcadelearningenvironment.org/>`_. This can take quite a while (a few minutes on a decent laptop), so just be prepared.
Box2d
-----------
Box2d is a 2D physics engine. You can install it via ``pip install -e '.[box2d]'`` and then get started as follows:

.. code:: python

    import gym
    env = gym.make('LunarLander-v2')
    env.reset()
    env.render()

Classic control
---------------
These are a variety of classic control tasks, which would appear in a typical reinforcement learning textbook. If you didn't do the full install, you will need to run ``pip install -e '.[classic_control]'`` to enable rendering. You can get started with them via:
.. code:: python

    import gym
    env = gym.make('CartPole-v0')
    env.reset()
    env.render()

MuJoCo
------
`MuJoCo <http://www.mujoco.org/>`_ is a physics engine which can do
very detailed efficient simulations with contacts. It's not
open-source, so you'll have to follow the instructions in `mujoco-py
<https://github.com/openai/mujoco-py#obtaining-the-binaries-and-license-key>`_
to set it up. You'll have to also run ``pip install -e '.[mujoco]'`` if you didn't do the full install.
.. code:: python

    import gym
    env = gym.make('Humanoid-v2')
    env.reset()
    env.render()

Robotics
--------
`MuJoCo <http://www.mujoco.org/>`_ is a physics engine which can do
very detailed efficient simulations with contacts and we use it for all robotics environments. It's not
open-source, so you'll have to follow the instructions in `mujoco-py
<https://github.com/openai/mujoco-py#obtaining-the-binaries-and-license-key>`_
to set it up. You'll have to also run ``pip install -e '.[robotics]'`` if you didn't do the full install.
.. code:: python

    import gym
    env = gym.make('HandManipulateBlock-v0')
    env.reset()
    env.render()

You can also find additional details in the accompanying `technical report <https://arxiv.org/abs/1802.09464>`_ and `blog post <https://blog.openai.com/ingredients-for-robotics-research/>`_.
If you use these environments, you can cite them as follows::

  @misc{1802.09464,
    Author = {Matthias Plappert and Marcin Andrychowicz and Alex Ray and Bob McGrew and Bowen Baker and Glenn Powell and Jonas Schneider and Josh Tobin and Maciek Chociej and Peter Welinder and Vikash Kumar and Wojciech Zaremba},
    Title = {Multi-Goal Reinforcement Learning: Challenging Robotics Environments and Request for Research},
    Year = {2018},
    Eprint = {arXiv:1802.09464},
  }

Toy text
--------
Toy environments which are text-based. There's no extra dependency to install, so to get started, you can just do:
.. code:: python

    import gym
    env = gym.make('FrozenLake-v0')
    env.reset()
    env.render()

Examples
========
See the ``examples`` directory.
- Run `examples/agents/random_agent.py <https://github.com/openai/gym/blob/master/examples/agents/random_agent.py>`_ to run a simple random agent.
- Run `examples/agents/cem.py <https://github.com/openai/gym/blob/master/examples/agents/cem.py>`_ to run an actual learning agent (using the cross-entropy method).
- Run `examples/scripts/list_envs <https://github.com/openai/gym/blob/master/examples/scripts/list_envs>`_ to generate a list of all environments.
Testing
=======
We are using `pytest <http://doc.pytest.org>`_ for tests. You can run them via:
.. code:: shell

    pytest

.. _See What's New section below:
What's new
==========
- 2018-02-28: Release of a set of new robotics environments.
- 2018-01-25: Made some aesthetic improvements and removed unmaintained parts of gym. This may seem like a downgrade in functionality, but it is actually a long-needed cleanup in preparation for some great new things that will be released in the next month.

  + Now your `Env` and `Wrapper` subclasses should define `step`, `reset`, `render`, `close`, `seed` rather than underscored method names.
  + Removed the `board_game`, `debugging`, `safety`, `parameter_tuning` environments since they're not being maintained by us at OpenAI. We encourage authors and users to create new repositories for these environments.
  + Changed `MultiDiscrete` action space to range from `[0, ..., n-1]` rather than `[a, ..., b-1]`.
  + No more `render(close=True)`; use env-specific methods to close the rendering.
  + Removed `scoreboard` directory, since the site doesn't exist anymore.
  + Moved `gym/monitoring` to `gym/wrappers/monitoring`.
  + Added `dtype` to `Space`.
  + Not using Python's built-in `logging` module anymore; using `gym.logger` instead.

- 2018-01-24: All continuous control environments now use mujoco_py >= 1.50.
  Versions have been updated accordingly to -v2, e.g. HalfCheetah-v2. Performance
  should be similar (see https://github.com/openai/gym/pull/834) but there are likely
  some differences due to changes in MuJoCo.
- 2017-06-16: Make env.spec into a property to fix a bug that occurs
  when you try to print out an unregistered Env.
- 2017-05-13: BACKWARDS INCOMPATIBILITY: The Atari environments are now at
  *v4*. To keep using the old v3 environments, keep gym <= 0.8.2 and atari-py
  <= 0.0.21. Note that the v4 environments will not give identical results to
  existing v3 results, although differences are minor. The v4 environments
  incorporate the latest Arcade Learning Environment (ALE), including several
  ROM fixes, and now handle loading and saving of the emulator state. While
  seeds still ensure determinism, the effect of any given seed is not preserved
  across this upgrade because the random number generator in ALE has changed.
  The `*NoFrameskip-v4` environments should be considered the canonical Atari
  environments from now on.
- 2017-03-05: BACKWARDS INCOMPATIBILITY: The `configure` method has been removed
  from `Env`. `configure` was not used by `gym`, but was used by some dependent
  libraries including `universe`. These libraries will migrate away from the
  configure method by using wrappers instead. This change is on master and will be released with 0.8.0.
- 2016-12-27: BACKWARDS INCOMPATIBILITY: The gym monitor is now a
  wrapper. Rather than starting monitoring as
  `env.monitor.start(directory)`, envs are now wrapped as follows:
  `env = wrappers.Monitor(env, directory)`. This change is on master
  and will be released with 0.7.0.
- 2016-11-01: Several experimental changes to how a running monitor interacts
  with environments. The monitor will now raise an error if reset() is called
  when the env has not returned done=True. The monitor will only record complete
  episodes where done=True. Finally, the monitor no longer calls seed() on the
  underlying env, nor does it record or upload seed information.
- 2016-10-31: We're experimentally expanding the environment ID format
  to include an optional username.
- 2016-09-21: Switch the Gym automated logger setup to configure the
  root logger rather than just the 'gym' logger.
- 2016-08-17: Calling `close` on an env will also close the monitor
  and any rendering windows.
- 2016-08-17: The monitor will no longer write manifest files in
  real-time, unless `write_upon_reset=True` is passed.
- 2016-05-28: For controlled reproducibility, envs now support seeding
  (cf #91 and #135). The monitor records which seeds are used. We will
  soon add seed information to the display on the scoreboard.
#!/bin/bash
# This script is the entrypoint for our Docker image.
set -ex
# Set up display; otherwise rendering will fail
Xvfb -screen 0 1024x768x24 &
export DISPLAY=:0
# Wait for the file to come up
display=0
file="/tmp/.X11-unix/X$display"
for i in $(seq 1 10); do
    if [ -e "$file" ]; then
        break
    fi

    echo "Waiting for $file to be created (try $i/10)"
    sleep "$i"
done
if ! [ -e "$file" ]; then
    echo "Timing out: $file was not created"
    exit 1
fi
exec "$@"
#!/usr/bin/env python3
import argparse
import gym
parser = argparse.ArgumentParser(description='Renders a Gym environment for quick inspection.')
parser.add_argument('env_id', type=str, help='the ID of the environment to be rendered (e.g. HalfCheetah-v1)')
parser.add_argument('--step', type=int, default=1)
args = parser.parse_args()
env = gym.make(args.env_id)
env.reset()
step = 0
while True:
    if args.step:
        env.step(env.action_space.sample())
    env.render()
    if step % 10 == 0:
        env.reset()
    step += 1
# Agents
An "agent" describes the method of running an RL algorithm against an environment in the gym. The agent may contain the algorithm itself or simply provide an integration between an algorithm and the gym environments.
## RandomAgent
A sample agent located in this repo at `gym/examples/agents/random_agent.py`. This simple agent leverages the environment's ability to produce a random valid action and does so for each step.
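
The loop such an agent runs can be sketched without installing gym at all. The snippet below is only an illustration: `StubDiscreteSpace` and `StubEnv` are hypothetical stand-ins that mimic the `action_space.sample()`, `reset()` and `step()` calls described above, and `run_episode` is a helper name made up for this example. Only the `RandomAgent` class mirrors the shape of the one in `random_agent.py`.

```python
import random


class StubDiscreteSpace:
    """Hypothetical stand-in for a discrete action space with n actions."""

    def __init__(self, n, seed=0):
        self.n = n
        self._rng = random.Random(seed)

    def sample(self):
        # Return a random valid action, like a gym action space would.
        return self._rng.randrange(self.n)


class StubEnv:
    """Hypothetical environment: every step pays 1.0, episode ends after max_steps."""

    def __init__(self, max_steps=5):
        self.action_space = StubDiscreteSpace(2)
        self.max_steps = max_steps
        self._t = 0

    def reset(self):
        self._t = 0
        return 0  # initial observation

    def step(self, action):
        self._t += 1
        done = self._t >= self.max_steps
        return self._t, 1.0, done, {}  # observation, reward, done, info


class RandomAgent:
    """Ignores what it sees and samples a random valid action each step."""

    def __init__(self, action_space):
        self.action_space = action_space

    def act(self, observation, reward, done):
        return self.action_space.sample()


def run_episode(env, agent):
    """Run one episode and return the summed reward."""
    ob, reward, done, total = env.reset(), 0.0, False, 0.0
    while not done:
        action = agent.act(ob, reward, done)
        ob, reward, done, _ = env.step(action)
        total += reward
    return total


env = StubEnv(max_steps=5)
agent = RandomAgent(env.action_space)
total_reward = run_episode(env, agent)
print(total_reward)
```

With a real environment, the stub would presumably be replaced by the result of `gym.make(...)`, leaving the agent and the loop unchanged.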
## cem.py
A generic Cross-Entropy agent located in this repo at `gym/examples/agents/cem.py`. This agent defaults to 10 iterations of 25 episodes considering the top 20% "elite".
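
To make the idea concrete, here is a rough pure-Python sketch of the cross-entropy method with the same defaults (10 iterations, batches of 25, top 20% kept as elite). It is not the code in `cem.py`, which uses NumPy and scores each parameter vector by running an episode. The `toy_objective` function and its optimum are assumptions invented for this illustration.

```python
import random
import statistics


def cem(f, mean, batch_size=25, n_iter=10, elite_frac=0.2, initial_std=1.0, seed=0):
    """Maximize f with a diagonal-Gaussian cross-entropy method (sketch)."""
    rng = random.Random(seed)
    dim = len(mean)
    std = [initial_std] * dim
    n_elite = max(1, int(round(batch_size * elite_frac)))
    for _ in range(n_iter):
        # Sample a batch of parameter vectors around the current mean.
        batch = [[rng.gauss(mean[d], std[d]) for d in range(dim)]
                 for _ in range(batch_size)]
        # Keep the highest-scoring fraction (the "elite" set).
        elite = sorted(batch, key=f, reverse=True)[:n_elite]
        # Refit the mean and standard deviation to the elite set.
        mean = [statistics.mean(th[d] for th in elite) for d in range(dim)]
        std = [statistics.pstdev([th[d] for th in elite]) for d in range(dim)]
    return mean


def toy_objective(theta):
    # Hypothetical black-box function whose maximum sits at (1.0, -0.5).
    return -((theta[0] - 1.0) ** 2 + (theta[1] + 0.5) ** 2)


best = cem(toy_objective, [0.0, 0.0])
print(best, toy_objective(best))
```

In the agent, the role of `toy_objective` is played by the total reward of a rollout, so each evaluation is noisy rather than deterministic.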
## dqn
This is a very basic DQN (with experience replay) implementation, which uses OpenAI's gym environment and Keras/Theano neural networks. [/sherjilozair/dqn](https://github.com/sherjilozair/dqn)
## Simple DQN
Simple, fast and easy to extend DQN implementation using [Neon](https://github.com/NervanaSystems/neon) deep learning library. Comes with out-of-box tools to train, test and visualize models. For details see [this blog post](https://www.nervanasys.com/deep-reinforcement-learning-with-neon/) or check out the [repo](https://github.com/tambetm/simple_dqn).
## AgentNet
A library that allows you to develop custom deep/convolutional/recurrent reinforcement learning agents with full integration with Theano/Lasagne. Also contains a toolkit for various reinforcement learning algorithms, policies, memory augmentations, etc.
- The repo's here: [AgentNet](https://github.com/yandexdataschool/AgentNet)
- [A step-by-step demo for Atari SpaceInvaders ](https://github.com/yandexdataschool/AgentNet/blob/master/examples/Playing%20Atari%20with%20Deep%20Reinforcement%20Learning%20%28OpenAI%20Gym%29.ipynb)
## rllab
A framework for developing and evaluating reinforcement learning algorithms, fully compatible with OpenAI Gym. It includes a wide range of continuous control tasks plus implementations of many algorithms. [/rllab/rllab](https://github.com/rllab/rllab)
## [keras-rl](https://github.com/matthiasplappert/keras-rl)
[keras-rl](https://github.com/matthiasplappert/keras-rl) implements some state-of-the-art deep reinforcement learning algorithms. It was built with OpenAI Gym in mind, and also built on top of the deep learning library [Keras](https://keras.io/) and utilises similar design patterns like callbacks and user-definable metrics.
# Environments
The gym comes prepackaged with many environments. It's this common API around many environments that makes the gym so great. Here we will list additional environments that do not come prepackaged with the gym. Submit another to this list via a pull request.
_**NOTICE**: It's possible that in time OpenAI will develop a full-fledged repository of supplemental environments. Until then this bit of markdown will suffice._
## PGE: Parallel Game Engine
PGE is a FOSS 3D engine for AI simulations, and can interoperate with the Gym. Contains environments with modern 3D graphics, and uses Bullet for physics.
Learn more here: https://github.com/222464/PGE
## gym-inventory: Inventory Control Environments
gym-inventory is a single agent domain featuring discrete state and action spaces that an AI agent might encounter in inventory control problems.
Learn more here: https://github.com/paulhendricks/gym-inventory
## gym-gazebo: training Robots in Gazebo
gym-gazebo presents an extension of the initial OpenAI gym for robotics using ROS and Gazebo, an advanced 3D modeling and
rendering tool.
Learn more here: https://github.com/erlerobot/gym-gazebo/
## gym-maze: 2D maze environment
A simple 2D maze environment where an agent finds its way from the start position to the goal.
Learn more here: https://github.com/tuzzer/gym-maze/
## gym-minigrid: Minimalistic Gridworld Environment
A minimalistic gridworld environment. Seeks to minimize software dependencies, be easy to extend and deliver good performance for faster training.
Learn more here: https://github.com/maximecb/gym-minigrid
## gym-sokoban: 2D Transportation Puzzles
The environment consists of transportation puzzles in which the player's goal is to push all boxes on the warehouse's storage locations.
The advantage of the environment is that it generates a new random level every time it is initialized or reset, which prevents overfitting to predefined levels.
Learn more here: https://github.com/mpSchrader/gym-sokoban
## gym-duckietown: Lane-Following Simulator for Duckietown
A lane-following simulator built for the [Duckietown](http://duckietown.org/) project (small-scale self-driving car course).
Learn more here: https://github.com/duckietown/gym-duckietown
# Miscellaneous
Here we have a bunch of tools, libs, apis, tutorials, resources, etc. provided by the community to add value to the gym ecosystem.
## OpenAIGym.jl
Convenience wrapper of the OpenAI Gym for the Julia language [/tbreloff/OpenAIGym.jl](https://github.com/tbreloff/OpenAIGym.jl)
# Table of Contents
- [Agents](agents.md) contains a listing of agents compatible with gym environments. Agents facilitate the running of an algorithm against an environment.
- [Environments](environments.md) lists more environments to run your algorithms against. These do not come prepackaged with the gym.
- [Miscellaneous](misc.md) is a collection of other value-add tools and utilities. These could be anything from a small convenience lib to a collection of video tutorials or a new language binding.
# Support code for cem.py
class BinaryActionLinearPolicy(object):
    def __init__(self, theta):
        self.w = theta[:-1]
        self.b = theta[-1]
    def act(self, ob):
        y = ob.dot(self.w) + self.b
        a = int(y < 0)
        return a

class ContinuousActionLinearPolicy(object):
    def __init__(self, theta, n_in, n_out):
        assert len(theta) == (n_in + 1) * n_out
        self.W = theta[0 : n_in * n_out].reshape(n_in, n_out)
        self.b = theta[n_in * n_out : None].reshape(1, n_out)
    def act(self, ob):
        a = ob.dot(self.W) + self.b
        return a
from __future__ import print_function
import gym
from gym import wrappers, logger
import numpy as np
from six.moves import cPickle as pickle
import json, sys, os
from os import path
from _policies import BinaryActionLinearPolicy # Different file so it can be unpickled
import argparse
def cem(f, th_mean, batch_size, n_iter, elite_frac, initial_std=1.0):
    """
    Generic implementation of the cross-entropy method for maximizing a black-box function

    f: a function mapping from vector -> scalar
    th_mean: initial mean over input distribution
    batch_size: number of samples of theta to evaluate per batch
    n_iter: number of batches
    elite_frac: each batch, select this fraction of the top-performing samples
    initial_std: initial standard deviation over parameter vectors
    """
    n_elite = int(np.round(batch_size*elite_frac))
    th_std = np.ones_like(th_mean) * initial_std

    for _ in range(n_iter):
        ths = np.array([th_mean + dth for dth in th_std[None,:]*np.random.randn(batch_size, th_mean.size)])
        ys = np.array([f(th) for th in ths])
        elite_inds = ys.argsort()[::-1][:n_elite]
        elite_ths = ths[elite_inds]
        th_mean = elite_ths.mean(axis=0)
        th_std = elite_ths.std(axis=0)
        yield {'ys' : ys, 'theta_mean' : th_mean, 'y_mean' : ys.mean()}
def do_rollout(agent, env, num_steps, render=False):
    total_rew = 0
    ob = env.reset()
    for t in range(num_steps):
        a = agent.act(ob)
        (ob, reward, done, _info) = env.step(a)
        total_rew += reward
        if render and t%3==0: env.render()
        if done: break
    return total_rew, t+1
if __name__ == '__main__':
    logger.set_level(logger.INFO)

    parser = argparse.ArgumentParser()
    parser.add_argument('--display', action='store_true')
    parser.add_argument('target', nargs="?", default="CartPole-v0")
    args = parser.parse_args()

    env = gym.make(args.target)
    env.seed(0)
    np.random.seed(0)
    params = dict(n_iter=10, batch_size=25, elite_frac = 0.2)
    num_steps = 200

    # You provide the directory to write to (can be an existing
    # directory, but can't contain previous monitor results). You can
    # also dump to a tempdir if you'd like: tempfile.mkdtemp().
    outdir = '/tmp/cem-agent-results'
    env = wrappers.Monitor(env, outdir, force=True)

    # Prepare snapshotting
    # ----------------------------------------
    def writefile(fname, s):
        with open(path.join(outdir, fname), 'w') as fh: fh.write(s)
    info = {}
    info['params'] = params
    info['argv'] = sys.argv
    info['env_id'] = env.spec.id
    # ------------------------------------------

    def noisy_evaluation(theta):
        agent = BinaryActionLinearPolicy(theta)
        rew, T = do_rollout(agent, env, num_steps)
        return rew

    # Train the agent, and snapshot each stage
    for (i, iterdata) in enumerate(
        cem(noisy_evaluation, np.zeros(env.observation_space.shape[0]+1), **params)):
        print('Iteration %2i. Episode mean reward: %7.3f'%(i, iterdata['y_mean']))
        agent = BinaryActionLinearPolicy(iterdata['theta_mean'])
        if args.display: do_rollout(agent, env, 200, render=True)
        writefile('agent-%.4i.pkl'%i, str(pickle.dumps(agent, -1)))

    # Write out the env at the end so we store the parameters of this
    # environment.
    writefile('info.json', json.dumps(info))

    env.close()
#!/usr/bin/env python
from __future__ import print_function
import sys, gym, time
#
# Test yourself as a learning agent! Pass environment name as a command-line argument, for example:
#
# python keyboard_agent.py SpaceInvadersNoFrameskip-v4
#
env = gym.make('LunarLander-v2' if len(sys.argv)<2 else sys.argv[1])
if not hasattr(env.action_space, 'n'):
    raise Exception('Keyboard agent only supports discrete action spaces')
ACTIONS = env.action_space.n
SKIP_CONTROL = 0 # Use previous control decision SKIP_CONTROL times, that's how you
# can test what skip is still usable.
human_agent_action = 0
human_wants_restart = False
human_sets_pause = False
def key_press(key, mod):
    global human_agent_action, human_wants_restart, human_sets_pause
    if key==0xff0d: human_wants_restart = True
    if key==32: human_sets_pause = not human_sets_pause
    a = int( key - ord('0') )
    if a <= 0 or a >= ACTIONS: return
    human_agent_action = a

def key_release(key, mod):
    global human_agent_action
    a = int( key - ord('0') )
    if a <= 0 or a >= ACTIONS: return
    if human_agent_action == a:
        human_agent_action = 0
env.render()
env.unwrapped.viewer.window.on_key_press = key_press
env.unwrapped.viewer.window.on_key_release = key_release
def rollout(env):
    global human_agent_action, human_wants_restart, human_sets_pause
    human_wants_restart = False
    obser = env.reset()
    skip = 0
    total_reward = 0
    total_timesteps = 0
    while 1:
        if not skip:
            #print("taking action {}".format(human_agent_action))
            a = human_agent_action
            total_timesteps += 1
            skip = SKIP_CONTROL
        else:
            skip -= 1

        obser, r, done, info = env.step(a)
        if r != 0:
            print("reward %0.3f" % r)
        total_reward += r
        window_still_open = env.render()
        if window_still_open==False: return False
        if done: break
        if human_wants_restart: break
        while human_sets_pause:
            env.render()
            time.sleep(0.1)
        time.sleep(0.1)
    print("timesteps %i reward %0.2f" % (total_timesteps, total_reward))
print("ACTIONS={}".format(ACTIONS))
print("Press keys 1 2 3 ... to take actions 1 2 3 ...")
print("No keys pressed is taking action 0")
while 1:
    window_still_open = rollout(env)
    if window_still_open==False: break
import argparse
import sys
import gym
from gym import wrappers, logger
class RandomAgent(object):
    """The world's simplest agent!"""
    def __init__(self, action_space):
        self.action_space = action_space

    def act(self, observation, reward, done):
        return self.action_space.sample()
if __name__ == '__main__':
    parser = argparse.ArgumentParser(description=None)
    parser.add_argument('env_id', nargs='?', default='CartPole-v0', help='Select the environment to run')
    args = parser.parse_args()

    # You can set the level to logger.DEBUG or logger.WARN if you
    # want to change the amount of output.
    logger.set_level(logger.INFO)

    env = gym.make(args.env_id)

    # You provide the directory to write to (can be an existing
    # directory, including one with existing data -- all monitor files
    # will be namespaced). You can also dump to a tempdir if you'd
    # like: tempfile.mkdtemp().
    outdir = '/tmp/random-agent-results'
    env = wrappers.Monitor(env, directory=outdir, force=True)
    env.seed(0)
    agent = RandomAgent(env.action_space)

    episode_count = 100
    reward = 0
    done = False

    for i in range(episode_count):
        ob = env.reset()
        while True:
            action = agent.act(ob, reward, done)
            ob, reward, done, _ = env.step(action)
            if done:
                break
            # Note there's no env.render() here. But the environment still can open window and
            # render if asked by env.monitor: it calls env.render('rgb_array') to record video.
            # Video is not recorded every episode, see capped_cubic_video_schedule for details.

    # Close the env and write monitor result info to disk
    env.close()
#!/usr/bin/env python
#
# Run all the tasks on a benchmark using a random agent.
#
# This script assumes you have set an OPENAI_GYM_API_KEY environment
# variable. You can find your API key in the web interface:
# https://gym.openai.com/settings/profile.
#
import argparse
import logging
import os
import sys
import gym
# In modules, use `logger = logging.getLogger(__name__)`
from gym import wrappers
from gym.scoreboard.scoring import benchmark_score_from_local
import openai_benchmark
logger = logging.getLogger()
def main():
    parser = argparse.ArgumentParser(description=None)
    parser.add_argument('-b', '--benchmark-id', help='id of benchmark to run e.g. Atari7Ram-v0')
    parser.add_argument('-v', '--verbose', action='count', dest='verbosity', default=0, help='Set verbosity.')
    parser.add_argument('-f', '--force', action='store_true', dest='force', default=False)
    parser.add_argument('-t', '--training-dir', default="/tmp/gym-results", help='What directory to upload.')
    args = parser.parse_args()

    if args.verbosity == 0:
        logger.setLevel(logging.INFO)
    elif args.verbosity >= 1:
        logger.setLevel(logging.DEBUG)

    benchmark_id = args.benchmark_id
    if benchmark_id is None:
        logger.info("Must supply a valid benchmark")
        return 1

    try:
        benchmark = gym.benchmark_spec(benchmark_id)
    except Exception:
        logger.info("Invalid benchmark")
        return 1

    # run benchmark tasks
    for task in benchmark.tasks:
        logger.info("Running on env: {}".format(task.env_id))
        for trial in range(task.trials):
            env = gym.make(task.env_id)
            training_dir_name = "{}/{}-{}".format(args.training_dir, task.env_id, trial)
            env = wrappers.Monitor(env, training_dir_name, video_callable=False, force=args.force)
            env.reset()
            for _ in range(task.max_timesteps):
                o, r, done, _ = env.step(env.action_space.sample())
                if done:
                    env.reset()
            env.close()

    logger.info("""Computing statistics for this benchmark run...
{{
score: {score},
num_envs_solved: {num_envs_solved},
summed_training_seconds: {summed_training_seconds},
start_to_finish_seconds: {start_to_finish_seconds},
}}
""".rstrip().format(**benchmark_score_from_local(benchmark_id, args.training_dir)))

    logger.info("""Done running, upload results using the following command:
python -c "import gym; gym.upload('{}', benchmark_id='{}', algorithm_id='(unknown)')"
""".rstrip().format(args.training_dir, benchmark_id))

    return 0

if __name__ == '__main__':
    sys.exit(main())
#!/usr/bin/env python
from gym import envs
envids = [spec.id for spec in envs.registry.all()]
for envid in sorted(envids):
    print(envid)
#!/usr/bin/env python
import gym
from gym import spaces, envs
import argparse
import numpy as np
import itertools
import time
parser = argparse.ArgumentParser()
parser.add_argument("env")
parser.add_argument("--mode", choices=["noop", "random", "static", "human"],
default="random")
parser.add_argument("--max_steps", type=int, default=0)
parser.add_argument("--fps",type=float)
parser.add_argument("--once", action="store_true")
parser.add_argument("--ignore_done", action="store_true")
args = parser.parse_args()
env = envs.make(args.env)
ac_space = env.action_space
fps = args.fps or env.metadata.get('video.frames_per_second') or 100
if args.max_steps == 0: args.max_steps = env.spec.tags['wrapper_config.TimeLimit.max_episode_steps']
while True:
    env.reset()
    env.render(mode='human')
    print("Starting a new trajectory")
    for t in range(args.max_steps) if args.max_steps else itertools.count():
        done = False
        if args.mode == "noop":
            if isinstance(ac_space, spaces.Box):
                a = np.zeros(ac_space.shape)
            elif isinstance(ac_space, spaces.Discrete):
                a = 0
            else:
                raise NotImplementedError("noop not implemented for class {}".format(type(ac_space)))
            _, _, done, _ = env.step(a)
            time.sleep(1.0/fps)
        elif args.mode == "random":
            a = ac_space.sample()
            _, _, done, _ = env.step(a)
            time.sleep(1.0/fps)
        elif args.mode == "static":
            time.sleep(1.0/fps)
        elif args.mode == "human":
            a = raw_input("type action from {0,...,%i} and press enter: "%(ac_space.n-1))
            try:
                a = int(a)
            except ValueError:
                print("WARNING: ignoring illegal action '{}'.".format(a))
                a = 0
            if a >= ac_space.n:
                print("WARNING: ignoring illegal action {}.".format(a))
                a = 0
            _, _, done, _ = env.step(a)

        env.render()
        if done and not args.ignore_done: break
    print("Done after {} steps".format(t+1))
    if args.once:
        break
    else:
        raw_input("Press enter to continue")
import distutils.version
import os
import sys
import warnings
from gym import error
from gym.utils import reraise
from gym.version import VERSION as __version__
from gym.core import Env, GoalEnv, Space, Wrapper, ObservationWrapper, ActionWrapper, RewardWrapper
from gym.envs import make, spec
from gym import logger
__all__ = ["Env", "Space", "Wrapper", "make", "spec"]
from gym import logger
import gym
from gym import error
from gym.utils import closer
env_closer = closer.Closer()
# Env-related abstractions
class Env(object):
    """The main OpenAI Gym class. It encapsulates an environment with
    arbitrary behind-the-scenes dynamics. An environment can be
    partially or fully observed.

    The main API methods that users of this class need to know are:

        step
        reset
        render
        close
        seed

    And set the following attributes:

        action_space: The Space object corresponding to valid actions
        observation_space: The Space object corresponding to valid observations
        reward_range: A tuple corresponding to the min and max possible rewards

    Note: a default reward range set to [-inf,+inf] already exists. Set it if you want a narrower range.

    The methods are accessed publicly as "step", "reset", etc. The
    non-underscored versions are wrapper methods to which we may add
    functionality over time.
    """

    # Set this in SOME subclasses
    metadata = {'render.modes': []}
    reward_range = (-float('inf'), float('inf'))
    spec = None

    # Set these in ALL subclasses
    action_space = None
    observation_space = None

    def step(self, action):
        """Run one timestep of the environment's dynamics. When end of
        episode is reached, you are responsible for calling `reset()`
        to reset this environment's state.

        Accepts an action and returns a tuple (observation, reward, done, info).

        Args:
            action (object): an action provided by the agent

        Returns:
            observation (object): agent's observation of the current environment
            reward (float) : amount of reward returned after previous action
            done (boolean): whether the episode has ended, in which case further step() calls will return undefined results
            info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
        """
        raise NotImplementedError

    def reset(self):
        """Resets the state of the environment and returns an initial observation.

        Returns: observation (object): the initial observation of the
            space.
        """
        raise NotImplementedError

    def render(self, mode='human'):
        """Renders the environment.

        The set of supported modes varies per environment. (And some
        environments do not support rendering at all.) By convention,
        if mode is:

        - human: render to the current display or terminal and
          return nothing. Usually for human consumption.
        - rgb_array: Return a numpy.ndarray with shape (x, y, 3),
          representing RGB values for an x-by-y pixel image, suitable
          for turning into a video.
        - ansi: Return a string (str) or StringIO.StringIO containing a
          terminal-style text representation. The text can include newlines
          and ANSI escape sequences (e.g. for colors).

        Note:
            Make sure that your class's metadata 'render.modes' key includes
            the list of supported modes. It's recommended to call super()
            in implementations to use the functionality of this method.

        Args:
            mode (str): the mode to render with

        Example:

        class MyEnv(Env):
            metadata = {'render.modes': ['human', 'rgb_array']}

            def render(self, mode='human'):
                if mode == 'rgb_array':
                    return np.array(...) # return RGB frame suitable for video
                elif mode == 'human':
                    ... # pop up a window and render
                else:
                    super(MyEnv, self).render(mode=mode) # just raise an exception
        """
        raise NotImplementedError

    def close(self):
        """Override close in your subclass to perform any necessary cleanup.

        Environments will automatically close() themselves when
        garbage collected or when the program exits.
        """
        return

    def seed(self, seed=None):
        """Sets the seed for this env's random number generator(s).

        Note:
            Some environments use multiple pseudorandom number generators.
            We want to capture all such seeds used in order to ensure that
            there aren't accidental correlations between multiple generators.

        Returns:
            list<bigint>: Returns the list of seeds used in this env's random
              number generators. The first value in the list should be the
              "main" seed, or the value which a reproducer should pass to
              'seed'. Often, the main seed equals the provided 'seed', but
              this won't be true if seed=None, for example.
        """
        logger.warn("Could not seed environment %s", self)
        return

    @property
    def unwrapped(self):
        """Completely unwrap this env.

        Returns:
            gym.Env: The base non-wrapped gym.Env instance
        """
        return self

    def __str__(self):
        if self.spec is None:
            return '<{} instance>'.format(type(self).__name__)
        else:
            return '<{}<{}>>'.format(type(self).__name__, self.spec.id)
class GoalEnv(Env):
    """A goal-based environment. It functions just as any regular OpenAI Gym environment but it
    imposes a required structure on the observation_space. More concretely, the observation
    space is required to contain at least three elements, namely `observation`, `desired_goal`, and
    `achieved_goal`. Here, `desired_goal` specifies the goal that the agent should attempt to achieve.
    `achieved_goal` is the goal that it currently achieved instead. `observation` contains the
    actual observations of the environment as per usual.
    """

    def reset(self):
        # Enforce that each GoalEnv uses a Goal-compatible observation space.
        if not isinstance(self.observation_space, gym.spaces.Dict):
            raise error.Error('GoalEnv requires an observation space of type gym.spaces.Dict')
        result = super(GoalEnv, self).reset()
        for key in ['observation', 'achieved_goal', 'desired_goal']:
            if key not in result:
                raise error.Error('GoalEnv requires the "{}" key to be part of the observation dictionary.'.format(key))
        return result

    def compute_reward(self, achieved_goal, desired_goal, info):
        """Compute the step reward. This externalizes the reward function and makes
        it dependent on a desired goal and the one that was achieved. If you wish to include
        additional rewards that are independent of the goal, you can include the necessary values
        to derive it in info and compute it accordingly.

        Args:
            achieved_goal (object): the goal that was achieved during execution
            desired_goal (object): the desired goal that we asked the agent to attempt to achieve
            info (dict): an info dictionary with additional information

        Returns:
            float: The reward that corresponds to the provided achieved goal w.r.t. the desired
            goal. Note that the following should always hold true:

                ob, reward, done, info = env.step()
                assert reward == env.compute_reward(ob['achieved_goal'], ob['desired_goal'], info)
        """
        raise NotImplementedError()
# Space-related abstractions
class Space(object):
    """Defines the observation and action spaces, so you can write generic
    code that applies to any Env. For example, you can choose a random
    action.
    """

    def __init__(self, shape=None, dtype=None):
        import numpy as np # takes about 300-400ms to import, so we load lazily
        self.shape = None if shape is None else tuple(shape)
        self.dtype = None if dtype is None else np.dtype(dtype)

    def sample(self):
        """
        Uniformly sample a random element of this space
        """
        raise NotImplementedError

    def contains(self, x):
        """
        Return boolean specifying if x is a valid
        member of this space
        """
        raise NotImplementedError

    __contains__ = contains

    def to_jsonable(self, sample_n):
        """Convert a batch of samples from this space to a JSONable data type."""
        # By default, assume identity is JSONable
        return sample_n

    def from_jsonable(self, sample_n):
        """Convert a JSONable data type to a batch of samples from this space."""
        # By default, assume identity is JSONable
        return sample_n
warn_once = True

def deprecated_warn_once(text):
    global warn_once
    if not warn_once: return
    warn_once = False
    logger.warn(text)
class Wrapper(Env):
    env = None

    def __init__(self, env):
        self.env = env
        self.action_space = self.env.action_space
        self.observation_space = self.env.observation_space
        self.reward_range = self.env.reward_range
        self.metadata = self.env.metadata

    @classmethod
    def class_name(cls):
        return cls.__name__

    def step(self, action):
        if hasattr(self, "_step"):
            deprecated_warn_once("%s doesn't implement 'step' method, but it implements deprecated '_step' method." % type(self))
            self.step = self._step
            return self.step(action)
        else:
            deprecated_warn_once("%s doesn't implement 'step' method, " % type(self) +
                "which is required for wrappers derived directly from Wrapper. Deprecated default implementation is used.")
            return self.env.step(action)

    def reset(self, **kwargs):
        if hasattr(self, "_reset"):
            deprecated_warn_once("%s doesn't implement 'reset' method, but it implements deprecated '_reset' method." % type(self))
            self.reset = self._reset
            return self._reset(**kwargs)
        else:
            deprecated_warn_once("%s doesn't implement 'reset' method, " % type(self) +
                "which is required for wrappers derived directly from Wrapper. Deprecated default implementation is used.")
            return self.env.reset(**kwargs)

    def render(self, mode='human', **kwargs):
        return self.env.render(mode, **kwargs)

    def close(self):
        if self.env:
            return self.env.close()

    def seed(self, seed=None):
        return self.env.seed(seed)

    def compute_reward(self, achieved_goal, desired_goal, info):
        return self.env.compute_reward(achieved_goal, desired_goal, info)

    def __str__(self):
        return '<{}{}>'.format(type(self).__name__, self.env)

    def __repr__(self):
        return str(self)

    @property
    def unwrapped(self):
        return self.env.unwrapped

    @property
    def spec(self):
        return self.env.spec
class ObservationWrapper(Wrapper):
    def step(self, action):
        observation, reward, done, info = self.env.step(action)
        return self.observation(observation), reward, done, info

    def reset(self, **kwargs):
        observation = self.env.reset(**kwargs)
        return self.observation(observation)

    def observation(self, observation):
        deprecated_warn_once("%s doesn't implement 'observation' method. Maybe it implements deprecated '_observation' method." % type(self))
        return self._observation(observation)

class RewardWrapper(Wrapper):
    # Accept **kwargs so the signature matches Wrapper.reset and ObservationWrapper.reset.
    def reset(self, **kwargs):
        return self.env.reset(**kwargs)

    def step(self, action):
        observation, reward, done, info = self.env.step(action)
        return observation, self.reward(reward), done, info

    def reward(self, reward):
        deprecated_warn_once("%s doesn't implement 'reward' method. Maybe it implements deprecated '_reward' method." % type(self))
        return self._reward(reward)

class ActionWrapper(Wrapper):
    def step(self, action):
        action = self.action(action)
        return self.env.step(action)

    # Accept **kwargs so the signature matches Wrapper.reset and ObservationWrapper.reset.
    def reset(self, **kwargs):
        return self.env.reset(**kwargs)

    def action(self, action):
        deprecated_warn_once("%s doesn't implement 'action' method. Maybe it implements deprecated '_action' method." % type(self))
        return self._action(action)

    def reverse_action(self, action):
        deprecated_warn_once("%s doesn't implement 'reverse_action' method. Maybe it implements deprecated '_reverse_action' method." % type(self))
        return self._reverse_action(action)
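
The observation-wrapper pattern in the classes above can be illustrated without importing gym. What follows is a minimal, self-contained sketch under stated assumptions: `CounterEnv` and `DoubleObservation` are hypothetical names that do not exist in gym, and `DoubleObservation` only mirrors the `reset`/`step`/`observation` method layout of `ObservationWrapper` rather than subclassing the real class.

```python
class CounterEnv(object):
    """Hypothetical toy environment: the state is a counter that ends the
    episode once it reaches `limit`."""

    def __init__(self, limit=3):
        self.limit = limit
        self.count = 0

    def reset(self):
        self.count = 0
        return self.count

    def step(self, action):
        # Same 4-tuple contract as Env.step: (observation, reward, done, info).
        self.count += action
        done = self.count >= self.limit
        reward = 1.0 if done else 0.0
        return self.count, reward, done, {}


class DoubleObservation(object):
    """Hypothetical stand-in for an ObservationWrapper subclass: it forwards
    reset/step to the wrapped env and transforms only the observation."""

    def __init__(self, env):
        self.env = env

    def observation(self, observation):
        return 2 * observation

    def reset(self, **kwargs):
        return self.observation(self.env.reset(**kwargs))

    def step(self, action):
        observation, reward, done, info = self.env.step(action)
        return self.observation(observation), reward, done, info


env = DoubleObservation(CounterEnv(limit=3))
first = env.reset()
trajectory = []
done = False
while not done:
    ob, reward, done, info = env.step(1)
    trajectory.append((ob, reward, done))
```

Only the observation changes on its way out of the wrapper; reward, done and info pass through untouched, which is the same division of labour that `RewardWrapper` and `ActionWrapper` apply to rewards and actions.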
# Envs
These are the core integrated environments. Note that we may later
restructure any of the files, but will keep the environments available
at the relevant package's top-level. So for example, you should access
`AntEnv` as follows:
```
# Will be supported in future releases
from gym.envs import mujoco
mujoco.AntEnv
```
Rather than:
```
# May break in future releases
from gym.envs.mujoco import ant
ant.AntEnv
```
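
To illustrate why the package-level path is the stable one, here is a minimal sketch that builds two throwaway modules in memory. It is not gym's actual layout: `fakepkg` and `fakepkg.ant` are hypothetical names, and the assignment `pkg.AntEnv = ant.AntEnv` stands in for the re-export an `__init__.py` would perform.

```python
import sys
import types


class AntEnv(object):
    """Placeholder class standing in for a real environment."""


# Hypothetical submodule that currently holds the class (plays the role of ant.py).
ant = types.ModuleType("fakepkg.ant")
ant.AntEnv = AntEnv

# Hypothetical package whose top level re-exports the class, as an
# __init__.py containing `from fakepkg.ant import AntEnv` would.
pkg = types.ModuleType("fakepkg")
pkg.__path__ = []  # marks the module as a package
pkg.ant = ant
pkg.AntEnv = ant.AntEnv

sys.modules["fakepkg"] = pkg
sys.modules["fakepkg.ant"] = ant

# Stable path: only the package's re-export line changes if files move.
from fakepkg import AntEnv as stable
# Fragile path: tied to the current file layout.
from fakepkg.ant import AntEnv as fragile

same_class = stable is fragile
```

Both imports resolve to the same class today. If the submodule were renamed, only the re-export inside the package would need updating, while callers of the second form would break.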
## How to create new environments for Gym
* Create a new repo called gym-foo, which should also be a pip package.
* A good example is https://github.com/openai/gym-soccer.
* It should have at least the following files:
<