hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-06 10:31:39 -05:00

Author	SHA1	Message	Date
Sven Mika	9e6b871739	[RLlib] Better utils for flattening complex inputs and enable prev-actions for LSTM/attention for complex action spaces. (#21330 )	2022-01-05 11:29:44 +01:00
Sven Mika	0b308719f8	[RLlib; Docs overhaul] Docstring cleanup: rllib/utils (#19829 )	2021-11-01 21:46:02 +01:00
Sven Mika	b4300dd532	[RLlib] Issue 18812: Torch multi-GPU stats not protected against race conditions. (#18937 )	2021-10-04 13:29:00 +02:00
Sven Mika	ed85f59194	[RLlib] Unify all RLlib Trainer.train() -> results[info][learner][policy ID][learner_stats] and add structure tests. (#18879 )	2021-09-30 16:39:05 +02:00
Sven Mika	924f11cd45	[RLlib] Torch algos use now-framework-agnostic MultiGPUTrainOneStep execution op (~33% speedup for PPO-torch + GPU). (#17371 )	2021-08-03 11:35:49 -04:00
Sven Mika	e0640ad0dc	[RLlib] Fix seeding for ES and ARS. (#16744 )	2021-07-19 13:13:05 -04:00
Michael Luo	4cbe13cdfd	[RLlib] CQL loss fn fixes, MuJoCo + Pendulum benchmarks, offline-RL example script w/ json file. (#15603 ) Co-authored-by: Sven Mika <sven@anyscale.io> Co-authored-by: sven1977 <svenmika1977@gmail.com>	2021-05-04 19:06:19 +02:00
Sven Mika	cecfc3b43b	[RLlib] Multi-GPU support for Torch algorithms. (#14709 )	2021-04-16 09:16:24 +02:00
Sven Mika	2e3655e8a9	[RLlib] Issue 9071 A3C w/ RNN not working due to VF assuming no RNN. (#13238 )	2021-01-19 14:22:36 +01:00
Sven Mika	1e74187179	[RLlib] TorchPolicies: Accessing "infos" dict in train_batch causes `TypeError`. (#13039 )	2020-12-23 11:30:50 -05:00
Sven Mika	01faeabc17	[RLlib] Issue 12789: RLlib throws the warning "The given NumPy array is not writeable" (#12793 )	2020-12-22 09:28:07 -05:00
Sven Mika	5b788ccb13	[RLlib] Trajectory view API (prep PR for switching on by default across all RLlib; plumbing only) (#11717 )	2020-11-03 12:53:34 -08:00
Sven Mika	36bda8432b	[RLlib] Trajectory view API: Simple List Collector (on by default for PPO); LSTM-agnostic (#11056 )	2020-10-01 16:57:10 +02:00
Sven Mika	244aafdcf8	[RLlib] Curiosity enhancements. (#10373 )	2020-09-05 13:14:24 +02:00
Eric Liang	deea1861ab	[rllib] Try fixing torch GPU and masking errors (#10168 )	2020-08-25 18:34:19 -07:00
Sven Mika	b0b0463161	[RLlib] Trajectory View API (preparatory cleanup and enhancements). (#9678 )	2020-07-29 21:15:09 +02:00
Sven Mika	5dc4b6686e	[RLlib] Implement DQN PyTorch distributional head. (#9589 )	2020-07-25 09:29:24 +02:00
Sven Mika	935d8308fb	[RLlib] Issue #9437 (PyTorch converts to CPU tensor, even if on GPU). (#9497 )	2020-07-16 14:55:50 +02:00
Sven Mika	5b2a97597b	[RLlib] Retire `try_import_tree` (should be installed along with other requirements). (#9211 ) - Retire try_import_tree. - Stabilize test_supported_multi_agent.py.	2020-07-02 13:06:34 +02:00
Sven Mika	5c6d5d4ab1	This PR fixes the currently broken lstm_use_prev_action_reward flag for default lstm models (model.use_lstm=True). (#8970 )	2020-06-27 20:50:01 +02:00
Tanay Wakhare	f77c638d6d	Pytorch AttentionNet (#9088 )	2020-06-23 20:42:30 +02:00
Sven Mika	7008902cff	[RLlib] Minor `rllib.utils` cleanup. (#8932 )	2020-06-16 08:52:20 +02:00
Sven Mika	25c0974543	[RLlib] Issue 8412 (Adam vars not stored in ModelV2). (#8480 )	2020-06-05 21:07:02 +02:00
Sven Mika	1775e89f26	[RLlib] Remove TupleActions and support arbitrarily nested action spaces. (#8143 ) Deprecate TupleActions and support arbitrarily nested action spaces. Closes issue #8143.	2020-04-28 14:59:16 +02:00
Sven Mika	499ad5fbe4	[RLlib] PyTorch version of APPO. (#8120 ) - Translate all vtrace functionality to torch and added torch to the framework_iterator-loop in all existing vtrace test cases. - Add learning test cases for APPO torch (both w/ and w/o v-trace). - Add quick compilation tests for APPO (tf and torch, v-trace and no v-trace).	2020-04-23 09:11:12 +02:00
Sven Mika	d0fab84e4d	[RLlib] DDPG PyTorch version. (#7953 ) The DDPG/TD3 algorithms currently do not have a PyTorch implementation. This PR adds PyTorch support for DDPG/TD3 to RLlib. This PR: - Depends on the re-factor PR for DDPG (Functional Algorithm API). - Adds learning regression tests for the PyTorch version of DDPG and a DDPG (torch) - Updates the documentation to reflect that DDPG and TD3 now support PyTorch. * Learning Pendulum-v0 on torch version (same config as tf). Wall time a little slower (~20% than tf). * Fix GPU target model problem.	2020-04-16 10:20:01 +02:00
Sven Mika	428516056a	[RLlib] SAC Torch (incl. Atari learning) (#7984 ) * Policy-classes cleanup and torch/tf unification. - Make Policy abstract. - Add `action_dist` to call to `extra_action_out_fn` (necessary for PPO torch). - Move some methods and vars to base Policy (from TFPolicy): num_state_tensors, ACTION_PROB, ACTION_LOGP and some more. * Fix `clip_action` import from Policy (should probably be moved into utils altogether). * - Move `is_recurrent()` and `num_state_tensors()` into TFPolicy (from DynamicTFPolicy). - Add config to all Policy c'tor calls (as 3rd arg after obs and action spaces). * Add `config` to c'tor call to TFPolicy. * Add missing `config` to c'tor call to TFPolicy in marvil_policy.py. * Fix test_rollout_worker.py::MockPolicy and BadPolicy classes (Policy base class is now abstract). * Fix LINT errors in Policy classes. * Implement StatefulPolicy abstract methods in test cases: test_multi_agent_env.py. * policy.py LINT errors. * Create a simple TestPolicy to sub-class from when testing Policies (reduces code in some test cases). * policy.py - Remove abstractmethod from `apply_gradients` and `compute_gradients` (these are not required iff `learn_on_batch` implemented). - Fix docstring of `num_state_tensors`. * Make QMIX torch Policy a child of TorchPolicy (instead of Policy). * QMixPolicy add empty implementations of abstract Policy methods. * Store Policy's config in self.config in base Policy c'tor. * - Make only compute_actions in base Policy's an abstractmethod and provide pass implementation to all other methods if not defined. - Fix state_batches=None (most Policies don't have internal states). * Cartpole tf learning. * Cartpole tf AND torch learning (in ~ same ts). * Cartpole tf AND torch learning (in ~ same ts). 2 * Cartpole tf (torch syntax-broken) learning (in ~ same ts). 3 * Cartpole tf AND torch learning (in ~ same ts). 4 * Cartpole tf AND torch learning (in ~ same ts). 5 * Cartpole tf AND torch learning (in ~ same ts). 6 * Cartpole tf AND torch learning (in ~ same ts). Pendulum tf learning. * WIP. * WIP. * SAC torch learning Pendulum. * WIP. * SAC torch and tf learning Pendulum and Cartpole after cleanup. * WIP. * LINT. * LINT. * SAC: Move policy.target_model to policy.device as well. * Fixes and cleanup. * Fix data-format of tf keras Conv2d layers (broken for some tf-versions which have data_format="channels_first" as default). * Fixes and LINT. * Fixes and LINT. * Fix and LINT. * WIP. * Test fixes and LINT. * Fixes and LINT. Co-authored-by: Sven Mika <sven@Svens-MacBook-Pro.local>	2020-04-15 13:25:16 +02:00
Jan Blumenkamp	8e439688fc	Torch sequence_mask now works for tensors on different devices (#7980 )	2020-04-15 07:21:51 +02:00
Sven Mika	22ccc43670	[RLlib] DQN torch version. (#7597 ) * Fix. * Rollback. * WIP. * WIP. * WIP. * WIP. * WIP. * WIP. * WIP. * WIP. * Fix. * Fix. * Fix. * Fix. * Fix. * WIP. * WIP. * Fix. * Test case fixes. * Test case fixes and LINT. * Test case fixes and LINT. * Rollback. * WIP. * WIP. * Test case fixes. * Fix. * Fix. * Fix. * Add regression test for DQN w/ param noise. * Fixes and LINT. * Fixes and LINT. * Fixes and LINT. * Fixes and LINT. * Fixes and LINT. * Comment * Regression test case. * WIP. * WIP. * LINT. * LINT. * WIP. * Fix. * Fix. * Fix. * LINT. * Fix (SAC does currently not support eager). * Fix. * WIP. * LINT. * Update rllib/evaluation/sampler.py Co-Authored-By: Eric Liang <ekhliang@gmail.com> * Update rllib/evaluation/sampler.py Co-Authored-By: Eric Liang <ekhliang@gmail.com> * Update rllib/utils/exploration/exploration.py Co-Authored-By: Eric Liang <ekhliang@gmail.com> * Update rllib/utils/exploration/exploration.py Co-Authored-By: Eric Liang <ekhliang@gmail.com> * WIP. * WIP. * Fix. * LINT. * LINT. * Fix and LINT. * WIP. * WIP. * WIP. * WIP. * Fix. * LINT. * Fix. * Fix and LINT. * Update rllib/utils/exploration/exploration.py * Update rllib/policy/dynamic_tf_policy.py Co-Authored-By: Eric Liang <ekhliang@gmail.com> * Update rllib/policy/dynamic_tf_policy.py Co-Authored-By: Eric Liang <ekhliang@gmail.com> * Update rllib/policy/dynamic_tf_policy.py Co-Authored-By: Eric Liang <ekhliang@gmail.com> * Fixes. * WIP. * LINT. * Fixes and LINT. * LINT and fixes. * LINT. * Move action_dist back into torch extra_action_out_fn and LINT. * Working SimpleQ learning cartpole on both torch AND tf. * Working Rainbow learning cartpole on tf. * Working Rainbow learning cartpole on tf. * WIP. * LINT. * LINT. * Update docs and add torch to APEX test. * LINT. * Fix. * LINT. * Fix. * Fix. * Fix and docstrings. * Fix broken RLlib tests in master. * Split BAZEL learning tests into cartpole and pendulum (reached the 60min barrier). * Fix error_outputs option in BAZEL for RLlib regression tests. * Fix. * Tune param-noise tests. * LINT. * Fix. * Fix. * test * test * test * Fix. * Fix. * WIP. * WIP. * WIP. * WIP. * LINT. * WIP. Co-authored-by: Eric Liang <ekhliang@gmail.com>	2020-04-06 11:56:16 -07:00
Sven Mika	2fb219a658	[Ray RLlib] Fix tree import (#7662 ) * Rollback. * Fix import tree error by adding meaningful error and replacing by tf.nest wherever possible. * LINT. * LINT. * Fix. * Fix log-likelihood test case failing on travis.	2020-03-22 13:51:24 -07:00
Sven Mika	876a1ba5bd	[RLlib] Issue 7421: can't convert cuda tensor to numpy in torch ppo. (#7445 )	2020-03-06 12:45:30 -08:00
Eric Liang	1660b52751	[rllib] Fix torch GPU / yaml load warning (#7278 ) * fix * safe load * reduce num buffer shards	2020-02-23 13:13:43 -08:00
Sven Mika	e2edca45d4	[RLlib] PPO torch memory leak and unnecessary torch.Tensor creation and gc'ing. (#7238 ) * Take out stats to analyze memory leak in torch PPO. * WIP * WIP * WIP * WIP * WIP * WIP. * WIP. * WIP. * WIP. * WIP. * WIP. * WIP. * WIP. * WIP. * WIP. * WIP. * WIP. * WIP. * WIP. * WIP. * WIP. * WIP. * WIP. * WIP. * WIP. * WIP. * LINT. * Fix determine_tests_to_run.py. * minor change to re-test after determine_tests_to_run.py. * LINT. * update comments. * WIP * WIP * WIP * FIX. * Fix sequence_mask being dependent on torch being installed. * Fix strange ray-core tf-error in test_memory_scheduling test case. * Fix strange ray-core tf-error in test_memory_scheduling test case. * Fix strange ray-core tf-error in test_memory_scheduling test case. * Fix strange ray-core tf-error in test_memory_scheduling test case.	2020-02-22 11:02:31 -08:00
Sven Mika	6e1c3ea824	[RLlib] Exploration API (+EpsilonGreedy sub-class). (#6974 )	2020-02-10 15:22:07 -08:00
Sven	60d4d5e1aa	Remove future imports (#6724 ) * Remove all __future__ imports from RLlib. * Remove (object) again from tf_run_builder.py::TFRunBuilder. * Fix 2xLINT warnings. * Fix broken appo_policy import (must be appo_tf_policy) * Remove future imports from all other ray files (not just RLlib). * Remove future imports from all other ray files (not just RLlib). * Remove future import blocks that contain `unicode_literals` as well. Revert appo_tf_policy.py to appo_policy.py (belongs to another PR). * Add two empty lines before Schedule class. * Put back __future__ imports into determine_tests_to_run.py. Fails otherwise on a py2/print related error.	2020-01-09 00:15:48 -08:00
Sven	8b16847c02	Get utils ready for better Agent torch support. (#6561 )	2019-12-30 12:27:32 -08:00

36 commits