This repository is updated frequently, so please make sure your fork is up to date.
This repository implements classic and state-of-the-art deep reinforcement learning algorithms. Its aim is to provide clear PyTorch code for people learning deep reinforcement learning.
In the future, more state-of-the-art algorithms will be added, and the existing code will be maintained.
If you need help implementing RL, you can send me an email.
My email address is johnyhe1997 at gmail dot com
- python <=3.6
- gym >= 0.10
- pytorch >= 0.4
Note that TensorFlow 1.x does not support Python 3.7.
```bash
pip install -r requirements.txt
```
If that fails, install the dependencies individually:
- Install gym

```bash
pip install gym
```
- Install PyTorch

Please go to the official website to install it: https://pytorch.org/. Using an Anaconda virtual environment to manage your packages is recommended.
- Install tensorboardX

```bash
pip install tensorboardX
pip install tensorflow==1.12
```
```bash
cd Char10\ TD3/
python TD3_BipedalWalker-v2.py --mode test
```
You should see a BipedalWalker if the installation succeeded.
- Install openai-baselines (optional)
```bash
# clone the openai baselines
git clone https://github.com/openai/baselines.git
cd baselines
pip install -e .
```
Here I uploaded two DQN models, trained on CartPole-v0 and MountainCar-v0.
Tips for MountainCar-v0
This is a sparse binary reward task: a non-zero reward is given only when the car reaches the top of the mountain. In general, it may take around 1e5 steps with a stochastic policy. You can add a shaping term to the reward, for example one that is positively related to the car's current position. A more advanced approach is inverse reinforcement learning.
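One way to apply the position-based shaping tip above is a small helper that adds a bonus proportional to the car's position on top of the environment's own reward. This is a minimal sketch; the function name `shaped_reward` and the `scale` parameter are illustrative, not part of this repository.

```python
def shaped_reward(env_reward, observation, scale=0.1):
    """Add a position-based shaping bonus to MountainCar-v0's sparse reward.

    The MountainCar-v0 observation is (position, velocity), with position
    in [-1.2, 0.6]; the offset makes the bonus zero at the leftmost point.
    """
    position = observation[0]
    return env_reward + scale * (position + 1.2)
```

During training you would call this on each transition, e.g. `r = shaped_reward(r, next_state)`, while still reporting the unshaped reward for evaluation.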
This is the value loss for DQN. We can see that the loss increased to around 1e13, yet the network still performs well. This is because the target_net and act_net diverge as training progresses, so the computed loss grows large. The earlier loss was small because the reward was very sparse, resulting in only small updates to the two networks.
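The loss being discussed compares act_net's estimate against a bootstrapped target built from target_net. As a sketch of that target computation (the helper name `dqn_targets` is illustrative, assuming batched tensors of rewards, the target network's Q-values for next states, and done flags):

```python
import torch

def dqn_targets(rewards, next_q_values, dones, gamma=0.99):
    """Compute DQN TD targets: y = r + gamma * max_a Q_target(s', a).

    rewards:       shape (batch,)
    next_q_values: shape (batch, n_actions), from the frozen target_net
    dones:         shape (batch,), 1.0 for terminal transitions
    """
    # max over actions of the *target* network's Q-values
    max_next_q = next_q_values.max(dim=1).values
    # no bootstrapping past terminal states
    return rewards + gamma * max_next_q * (1.0 - dones)
```

The value loss is then, e.g., `F.mse_loss(act_net(states).gather(1, actions), dqn_targets(...).unsqueeze(1))`; as act_net and target_net drift apart between target updates, this quantity can grow very large without the policy degrading.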
Papers Related to the DQN
- Playing Atari with Deep Reinforcement Learning [arxiv] [code]
- Deep Reinforcement Learning with Double Q-learning [arxiv] [code]
- Dueling Network Architectures for Deep Reinforcement Learning [arxiv] [code]
- Prioritized Experience Replay [arxiv] [code]
- Noisy Networks for Exploration [arxiv] [code]
- A Distributional Perspective on Reinforcement Learning [arxiv] [code]
- Rainbow: Combining Improvements in Deep Reinforcement Learning [arxiv] [code]
- Distributional Reinforcement Learning with Quantile Regression [arxiv] [code]
- Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation [arxiv] [code]
- Neural Episodic Control [arxiv] [code]
Use the following command to run a saved model:
Use the following command to train a model:
This is a model that I have trained.
This is an algorithmic framework, and the classic REINFORCE method is stored under Actor-Critic.
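The classic REINFORCE update mentioned above maximizes the expected return by weighting each action's log-probability with the discounted return that followed it. A minimal sketch of that loss (the helper name `reinforce_loss` and the return normalization are illustrative conventions, not this repository's exact code):

```python
import torch

def reinforce_loss(log_probs, rewards, gamma=0.99):
    """REINFORCE policy-gradient loss for one episode.

    log_probs: list of scalar tensors, log pi(a_t | s_t)
    rewards:   list of floats, r_t for each step
    """
    # discounted returns G_t, computed backward from the episode's end
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.insert(0, g)
    returns = torch.tensor(returns)
    # normalize returns for stability (a common practice)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)
    # negate because optimizers minimize: maximize sum log pi * G_t
    return -(torch.stack(log_probs) * returns).sum()
```

Calling `loss.backward()` on this and stepping an optimizer over the policy network's parameters gives the vanilla REINFORCE update; the Actor-Critic variants in this folder replace `G_t` with advantage estimates.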
Episode reward in Pendulum-v0:
- Original paper: https://arxiv.org/abs/1707.06347
- Openai Baselines blog post: https://blog.openai.com/openai-baselines-ppo/
Advantage Policy Gradient: a paper in 2017 pointed out that the difference in performance betw