ray/rllib/examples/sb2rllib_rllib_example.py

"""
Example script on how to train, save, load, and test an RLlib agent.
Equivalent script with stable baselines: sb2rllib_sb_example.py.
Demonstrates transition from stable_baselines to Ray RLlib.
Run example: python sb2rllib_rllib_example.py
"""
import gym
import ray
import ray.rllib.agents.ppo as ppo

# settings used for both stable baselines and rllib
env_name = "CartPole-v1"
train_steps = 10000
learning_rate = 1e-3
save_dir = "saved_models"

# training and saving
analysis = ray.tune.run(
    "PPO",
    stop={"timesteps_total": train_steps},
    config={"env": env_name, "lr": learning_rate},
    checkpoint_at_end=True,
    local_dir=save_dir,
)
# retrieve the checkpoint path
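# (ray.tune.run returned an ExperimentAnalysis; setting default_metric and
# default_mode lets get_best_trial() and get_best_checkpoint() be called
# without arguments)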
analysis.default_metric = "episode_reward_mean"
analysis.default_mode = "max"
checkpoint_path = analysis.get_best_checkpoint(trial=analysis.get_best_trial())
print(f"Trained model saved at {checkpoint_path}")

# load and restore model
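# a fresh PPOTrainer is created with the default PPO config (only the env is
# specified); restore() then reloads the trained weights and trainer state
# saved at the checkpoint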
agent = ppo.PPOTrainer(env=env_name)
agent.restore(checkpoint_path)
print(f"Agent loaded from saved model at {checkpoint_path}")

# inference
env = gym.make(env_name)
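# note: this loop assumes the classic gym API used by `import gym` above
# (reset() returns obs; step() returns a 4-tuple); newer gym/gymnasium
# versions return (obs, info) from reset() and a 5-tuple from step()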
obs = env.reset()
for i in range(1000):
    action = agent.compute_single_action(obs)
    obs, reward, done, info = env.step(action)
    env.render()
    if done:
        print(f"Cart pole dropped after {i} steps.")
        break