(tune-rllib-example)=

# Using RLlib with Tune

```{image} /rllib/images/rllib-logo.png
:align: center
:alt: RLlib Logo
:height: 120px
:target: https://docs.ray.io
```

```{contents}
:backlinks: none
:local: true
```

## Example

This example uses Population Based Training (PBT) to tune RLlib's PPO algorithm.

Running all trials concurrently requires a cluster with at least 8 GPUs. Otherwise, PBT trains the trials in round-robin fashion, which is less efficient. Alternatively, set `"num_gpus": 0` to run SGD on CPUs instead; the configuration below already does this.

Tune itself does not require 8 GPUs; this example is simply more computationally demanding than most.
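The example below configures `PopulationBasedTraining` with `hyperparam_mutations` and `resample_probability=0.25`. As a rough mental model of a single perturbation, here is a simplified, Ray-free sketch. The `perturb` helper is hypothetical, and its rules are assumptions loosely modeled on how PBT is usually described: with the given probability a hyperparameter is resampled. Otherwise, a list value steps to a neighboring entry, and a value produced by a sampling function is scaled by 1.2 or 0.8. Tune's actual implementation may differ in the details.

```python
import random

# Assumed perturbation factors for continuous values (illustrative only).
PERTURBATION_FACTORS = (1.2, 0.8)


def perturb(config, mutations, resample_probability, rng):
    """Hypothetical, simplified PBT explore step (not Ray Tune's implementation).

    Only keys listed in `mutations` change; all other keys are copied as-is.
    """
    new_config = dict(config)
    for key, spec in mutations.items():
        resample = rng.random() < resample_probability
        if isinstance(spec, list):
            if resample or config[key] not in spec:
                # Draw a fresh value from the list.
                new_config[key] = rng.choice(spec)
            else:
                # Step to a neighboring list entry, clamped to the list bounds.
                idx = spec.index(config[key]) + rng.choice([-1, 1])
                new_config[key] = spec[min(max(idx, 0), len(spec) - 1)]
        elif callable(spec):
            if resample:
                # Draw a fresh value from the sampling function.
                new_config[key] = spec()
            else:
                # Scale the current value up or down.
                value = config[key] * rng.choice(PERTURBATION_FACTORS)
                # Keep integer-valued hyperparameters integral.
                if isinstance(config[key], int):
                    value = round(value)
                new_config[key] = value
        # Other spec types are ignored in this sketch.
    return new_config


if __name__ == "__main__":
    rng = random.Random(0)
    mutations = {
        "lambda": lambda: rng.uniform(0.9, 1.0),
        "lr": [1e-3, 5e-4, 1e-4, 5e-5, 1e-5],
        "num_sgd_iter": lambda: rng.randint(1, 30),
    }
    config = {"lambda": 0.95, "lr": 1e-4, "num_sgd_iter": 10, "kl_coeff": 1.0}
    for step in range(3):
        config = perturb(config, mutations, resample_probability=0.25, rng=rng)
        print(step, config)
```

Keys that are not listed in `mutations` (here `kl_coeff`) are carried over unchanged, which mirrors how the example keeps `kl_coeff` fixed while the mutated hyperparameters evolve.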

```python
import random

from ray import tune
from ray.tune.schedulers import PopulationBasedTraining

if __name__ == "__main__":

    # Postprocess the perturbed config to ensure it's still valid
    def explore(config):
        # Ensure we collect enough timesteps to do SGD
        if config["train_batch_size"] < config["sgd_minibatch_size"] * 2:
            config["train_batch_size"] = config["sgd_minibatch_size"] * 2
        # Ensure we run at least one SGD iteration
        if config["num_sgd_iter"] < 1:
            config["num_sgd_iter"] = 1
        return config

    pbt = PopulationBasedTraining(
        # Perturb every 120 seconds of total training time
        time_attr="time_total_s",
        perturbation_interval=120,
        resample_probability=0.25,
        # Specifies the mutations of these hyperparams
        hyperparam_mutations={
            "lambda": lambda: random.uniform(0.9, 1.0),
            "clip_param": lambda: random.uniform(0.01, 0.5),
            "lr": [1e-3, 5e-4, 1e-4, 5e-5, 1e-5],
            "num_sgd_iter": lambda: random.randint(1, 30),
            "sgd_minibatch_size": lambda: random.randint(128, 16384),
            "train_batch_size": lambda: random.randint(2000, 160000),
        },
        custom_explore_fn=explore,
    )

    tuner = tune.Tuner(
        "PPO",
        tune_config=tune.TuneConfig(
            metric="episode_reward_mean",
            mode="max",
            scheduler=pbt,
            num_samples=1,
        ),
        param_space={
            # MuJoCo environment. Adjust the version suffix to one that
            # your installed gym release registers.
            "env": "Humanoid-v1",
            "kl_coeff": 1.0,
            "num_workers": 8,  # number of rollout workers
            "num_gpus": 0,  # number of GPUs for SGD; 0 trains on CPU
            "model": {"free_log_std": True},
            # These params are tuned from a fixed starting value.
            "lambda": 0.95,
            "clip_param": 0.2,
            "lr": 1e-4,
            # These params start off randomly drawn from a set.
            "num_sgd_iter": tune.choice([10, 20, 30]),
            "sgd_minibatch_size": tune.choice([128, 512, 2048]),
            "train_batch_size": tune.choice([10000, 20000, 40000]),
        },
    )
    results = tuner.fit()

    print("best hyperparameters: ", results.get_best_result().config)
```


In the recorded run below, the single trial failed while constructing the `PPO` algorithm, so the script printed `None` for the best hyperparameters:

```text
2022-07-22 16:45:08,004 INFO services.py:1483 -- View the Ray dashboard at http://127.0.0.1:8274
  "Consider boosting PBT performance by enabling `reuse_actors` as "

Trial name                   status  loc  num_sgd_iter  sgd_minibatch_size  train_batch_size
PPO_Humanoid-v1_45196_00000  ERROR        30            128                 20000

Trial name                   # failures  error file
PPO_Humanoid-v1_45196_00000  1           /Users/kai/ray_results/PPO/PPO_Humanoid-v1_45196_00000_0_num_sgd_iter=30,sgd_minibatch_size=128,train_batch_size=20000_2022-07-22_16-45-11/error.txt

2022-07-22 16:45:11,640 INFO plugin_schema_manager.py:52 -- Loading the default runtime env schemas: ['/Users/kai/coding/ray/python/ray/_private/runtime_env/../../runtime_env/schemas/working_dir_schema.json', '/Users/kai/coding/ray/python/ray/_private/runtime_env/../../runtime_env/schemas/pip_schema.json'].
(PPO pid=53765) 2022-07-22 16:45:21,449 INFO algorithm.py:1855 -- Your framework setting is 'tf', meaning you are using static-graph mode. Set framework='tf2' to enable eager execution with tf2.x. You may also then want to set eager_tracing=True in order to reach similar execution speed as with static-graph mode.
(PPO pid=53765) 2022-07-22 16:45:21,450 INFO ppo.py:379 -- In multi-agent mode, policies will be optimized sequentially by the multi-GPU optimizer. Consider setting simple_optimizer=True if this doesn't work for you.
(PPO pid=53765) 2022-07-22 16:45:21,450 INFO algorithm.py:343 -- Current log_level is WARN. For more information, set 'l

Result for PPO_Humanoid-v1_45196_00000:
  trial_id: '45196_00000'

2022-07-22 16:45:36,688 ERROR ray_trial_executor.py:104 -- An exception occurred when trying to stop the Ray actor:Traceback (most recent call last):
  File "/Users/kai/coding/ray/python/ray/tune/execution/ray_trial_executor.py", line 94, in post_stop_cleanup
    ray.get(future, timeout=0)
  File "/Users/kai/coding/ray/python/ray/_private/client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "/Users/kai/coding/ray/python/ray/_private/worker.py", line 2199, in get
    raise value
  File "python/ray/_raylet.pyx", line 812, in ray._raylet.task_execution_handler
  File "python/ray/_raylet.pyx", line 623, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 772, in ray._raylet.execute_task
ray.exceptions.RayActorError: The actor died because of an error raised in its creation task, ray::PPO.__init__() (pid=53765, ip=127.0.0.1, repr=PPO)
  File "/Users/kai/coding/ray/python/ray/rllib/evaluation/worker_set.py", line 127, in __init__
    vali

best hyperparameters:  None
```
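The `explore` function passed as `custom_explore_fn` repairs combinations that independent mutations can produce, such as a resampled `sgd_minibatch_size` that no longer fits into `train_batch_size`. The standalone snippet below reuses the same logic on a hand-made perturbed config (the values are illustrative, not taken from a real run) so both guards can be seen firing without starting Ray.

```python
def explore(config):
    """Post-process a perturbed config so it remains valid (same logic as the example)."""
    # Collect at least two minibatches' worth of timesteps per training batch.
    if config["train_batch_size"] < config["sgd_minibatch_size"] * 2:
        config["train_batch_size"] = config["sgd_minibatch_size"] * 2
    # Run at least one SGD iteration.
    if config["num_sgd_iter"] < 1:
        config["num_sgd_iter"] = 1
    return config


# Hypothetical perturbed config: a minibatch size of 16384 is too large for a
# train batch of 20000, and num_sgd_iter has dropped to 0.
perturbed = {"train_batch_size": 20000, "sgd_minibatch_size": 16384, "num_sgd_iter": 0}
fixed = explore(dict(perturbed))  # copy so `perturbed` stays untouched
print(fixed)
# {'train_batch_size': 32768, 'sgd_minibatch_size': 16384, 'num_sgd_iter': 1}
```

Because `explore` mutates and returns the same dict, passing a copy keeps the original perturbed values available for comparison.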


## More RLlib Examples

- {doc}`/tune/examples/includes/pb2_ppo_example`:
  Example of optimizing a distributed RLlib algorithm (PPO) with the PB2 scheduler.
  It uses a small population size of 4, so it can train on a laptop.