SLM Lab is a modular deep reinforcement learning framework in PyTorch.
v4.1.0
This marks a stable release of SLM Lab with full benchmark results.
RAdam+Lookahead optimizer
- The Lookahead + RAdam optimizer significantly improves the performance of some RL algorithms (A2C (n-step), PPO) on continuous-control problems, but does not improve others (A2C (GAE), SAC). #416 See the sketch below for the Lookahead wrapper idea.
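As a rough illustration of how Lookahead composes with a fast optimizer such as RAdam, here is a minimal, hypothetical sketch (the class name `LookaheadSketch` and all hyperparameter values are illustrative assumptions, not SLM Lab's actual optimizer code):

```python
# Illustrative sketch only; SLM Lab's actual optimizer implementation lives in the repo.
import torch
import torch.nn as nn


class LookaheadSketch:
    """Wraps a base (fast) optimizer, e.g. RAdam, with Lookahead slow weights."""

    def __init__(self, base_optimizer, k=5, alpha=0.5):
        self.base = base_optimizer
        self.k = k              # sync the slow weights every k fast steps
        self.alpha = alpha      # slow-weight interpolation factor
        self.step_count = 0
        # a snapshot of the initial parameters serves as the "slow" weights
        self.slow_params = [[p.detach().clone() for p in group['params']]
                            for group in self.base.param_groups]

    def zero_grad(self):
        self.base.zero_grad()

    def step(self):
        self.base.step()        # fast update (RAdam, Adam, ...)
        self.step_count += 1
        if self.step_count % self.k == 0:
            for group, slow_group in zip(self.base.param_groups, self.slow_params):
                for p, slow in zip(group['params'], slow_group):
                    # slow <- slow + alpha * (fast - slow), then fast <- slow
                    slow.add_(p.data - slow, alpha=self.alpha)
                    p.data.copy_(slow)


# usage: wrap RAdam (torch.optim.RAdam is available in recent PyTorch releases)
net = nn.Linear(4, 2)
optimizer = LookaheadSketch(torch.optim.RAdam(net.parameters(), lr=1e-3))
loss = net(torch.randn(8, 4)).pow(2).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()
```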
TensorBoard
- Added TensorBoard support in `Body` to auto-log summary variables, the model graph, network parameter histograms, and the action histogram. To launch TensorBoard, run `tensorboard --logdir=data` after a session/trial is completed. A minimal logging sketch is shown below.
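For readers unfamiliar with the mechanism, the following is a minimal illustration (not SLM Lab's internal code) of the kind of variables being logged, using PyTorch's `SummaryWriter`; the tag names and values here are made-up placeholders:

```python
import torch
from torch.utils.tensorboard import SummaryWriter

# write under data/ so that `tensorboard --logdir=data` picks it up
writer = SummaryWriter(log_dir='data/tb_example')
for step in range(100):
    total_reward = float(step)                    # placeholder scalar summary variable
    actions = torch.randint(0, 4, (32,)).float()  # placeholder batch of discrete actions
    writer.add_scalar('total_reward', total_reward, step)
    writer.add_histogram('action', actions, step)
writer.close()
```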
Full Benchmark Upload
Plot Legend
Discrete Benchmark
| Env. \ Alg. | DQN | DDQN+PER | A2C (GAE) | A2C (n-step) | PPO | SAC |
|---|---|---|---|---|---|---|
| Breakout | 80.88 | 182 | 377 | 398 | 443 | - |
| Pong | 18.48 | 20.5 | 19.31 | 19.56 | 20.58 | 19.87* |
| Seaquest | 1185 | 4405 | 1070 | 1684 | 1715 | - |
| Qbert | 5494 | 11426 | 12405 | 13590 | 13460 | 214* |
| LunarLander | 192 | 233 | 25.21 | 68.23 | 214 | 276 |
| UnityHallway | -0.32 | 0.27 | 0.08 | -0.96 | 0.73 | - |
| UnityPushBlock | 4.88 | 4.93 | 4.68 | 4.93 | 4.97 | - |
Episode score at the end of training attained by SLM Lab implementations on discrete-action control problems. Reported episode scores are the average over the last 100 checkpoints, then averaged over 4 Sessions. Results marked with * were trained using the hybrid synchronous/asynchronous version of SAC to parallelize and speed up training time.
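The averaging described above is straightforward; a tiny sketch with hypothetical (randomly generated, not real benchmark) per-checkpoint returns shows how a reported table entry could be computed:

```python
import numpy as np

# hypothetical per-checkpoint returns for 4 Sessions, shape [num_sessions, num_checkpoints]
session_returns = np.random.uniform(0, 400, size=(4, 500))

# mean over the last 100 checkpoints of each Session, then mean over Sessions
per_session_score = session_returns[:, -100:].mean(axis=1)
reported_score = per_session_score.mean()
print(round(reported_score, 2))
```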
For the full Atari benchmark, see Atari Benchmark.
Continuous Benchmark
| Env. \ Alg. | A2C (GAE) | A2C (n-step) | PPO | SAC |
|---|---|---|---|---|
| RoboschoolAnt | 787 | 1396 | 1843 | 2915 |
| RoboschoolAtlasForwardWalk | 59.87 | 88.04 | 172 | 800 |
| RoboschoolHalfCheetah | 712 | 439 | 1960 | 2497 |
| RoboschoolHopper | 710 | 285 | 2042 | 2045 |
| RoboschoolInvertedDoublePendulum | 996 | 4410 | 8076 | 8085 |
| RoboschoolInverted | | | | |
Discrete SAC benchmark update: 1b634c0