import logging

import ray
from ray.rllib.policy.sample_batch import SampleBatch

logger = logging.getLogger(__name__)


def collect_samples(agents, rollout_fragment_length, num_envs_per_worker,
                    train_batch_size):
    """Collects at least train_batch_size samples, never discarding any."""
    num_timesteps_so_far = 0
    trajectories = []
    agent_dict = {}

    # Kick off one rollout task per agent.
    for agent in agents:
        fut_sample = agent.sample.remote()
        agent_dict[fut_sample] = agent

    while agent_dict:
        # Block until any one pending rollout task completes, then collect
        # its result.
        [fut_sample], _ = ray.wait(list(agent_dict))
        agent = agent_dict.pop(fut_sample)
        next_sample = ray.get(fut_sample)
        num_timesteps_so_far += next_sample.count
        trajectories.append(next_sample)

        # Only launch more tasks if we don't already have enough pending.
        # Each pending task is expected to yield roughly
        # rollout_fragment_length * num_envs_per_worker timesteps.
        pending = len(
            agent_dict) * rollout_fragment_length * num_envs_per_worker
        if num_timesteps_so_far + pending < train_batch_size:
            fut_sample2 = agent.sample.remote()
            agent_dict[fut_sample2] = agent

    return SampleBatch.concat_samples(trajectories)
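

if __name__ == "__main__":
    # A minimal usage sketch (illustration only, not part of RLlib's public
    # API): `_DummyWorker` below is a hypothetical stand-in for a rollout
    # worker. The only contract collect_samples relies on is that each agent
    # exposes a remote `sample()` method returning a SampleBatch whose
    # `count` is the number of timesteps in the fragment.
    import numpy as np

    @ray.remote
    class _DummyWorker:
        def sample(self):
            # One fragment: 50 timesteps of dummy 4-dim observations.
            return SampleBatch({"obs": np.zeros((50, 4))})

    ray.init()
    workers = [_DummyWorker.remote() for _ in range(4)]
    batch = collect_samples(
        workers,
        rollout_fragment_length=50,  # matches the fragment size above
        num_envs_per_worker=1,
        train_batch_size=500)
    # collect_samples returns at least train_batch_size timesteps and never
    # discards a completed fragment, so the result may overshoot slightly.
    assert batch.count >= 500
    ray.shutdown()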