hiro/ray - Forgejo: Beyond coding. We Forge.

hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-07 02:51:39 -05:00

Author	SHA1	Message	Date
Sven Mika	6c2b9a4cfa	[RLlib] Remove tf.py_function from all Schedule classes (not differentiable and causes other bugs in MA setups). (#8304 ) Remove tf.py_function from all Schedule classes (not differentiable and causes other bugs in MA setups). (#8304)	2020-05-04 23:53:38 +02:00
Sven Mika	a00144f746	[RLlib] Fix issue 8135 (DDPG inf actions when using [-inf,inf] action space). (#8302 )	2020-05-04 22:27:30 +02:00
Sven Mika	b95e28faea	[RLlib] APEX_DDPG (PyTorch) test case and docs. (#8288 ) APEX_DDPG (PyTorch) test case and docs.	2020-05-04 09:36:27 +02:00
Sven Mika	166bb5d690	[RLlib] IMPALA PyTorch (#8287 ) This PR adds an IMPALA PyTorch implementation. - adds compilation tests for LSTM and w/o LSTM. - adds learning test for CartPole.	2020-05-03 13:44:25 +02:00
Sven Mika	76e1a4df9e	Fix TD3 torch via GaussianNoise torch bug. (#8276 )	2020-05-02 08:12:21 +02:00
Sven Mika	42991d723f	[RLlib] rllib/examples folder restructuring (#8250 ) Cleans up of the rllib/examples folder by moving all example Envs into rllibexamples/env (so they can be used by other scripts and tests as well).	2020-05-01 22:59:34 +02:00
Sven Mika	eea75ac623	[RLlib] Beta distribution. (#8229 )	2020-04-30 11:09:33 -07:00
Eric Liang	baadbdf8d4	[rllib] Execute PPO using training workflow (#8206 ) * wip * add kl * kl * works now * doc update * reorg * add ddppo * add stats * fix fetch * comment * fix learner stat regression * test fixes * fix test	2020-04-30 01:18:09 -07:00
Sven Mika	bf25aee392	[RLlib] Deprecate all Model(v1) usage. (#8146 ) Deprecate all Model(v1) usage.	2020-04-29 12:12:59 +02:00
Sven Mika	eb91619175	Fix release 0.8.5 tests for PPO torch Breakout. (#8226 )	2020-04-29 10:36:41 +02:00
Sven Mika	1775e89f26	[RLlib] Remove TupleActions and support arbitrarily nested action spaces. (#8143 ) Deprecate TupleActions and support arbitrarily nested action spaces. Closes issue #8143.	2020-04-28 14:59:16 +02:00
Sven Mika	7ec2223c84	[RLlib] DDPG PyTorch actor-model was missing sigmoid layer (#8188 ) Fix DDPG PyTorch (missing sigmoid layer (to squash action outputs) after deterministic action outputs).	2020-04-26 23:08:13 +02:00
Eric Liang	2298f6fb40	[rllib] Port DQN/Ape-X to training workflow api (#8077 )	2020-04-23 12:39:19 -07:00
Sven Mika	499ad5fbe4	[RLlib] PyTorch version of APPO. (#8120 ) - Translate all vtrace functionality to torch and added torch to the framework_iterator-loop in all existing vtrace test cases. - Add learning test cases for APPO torch (both w/ and w/o v-trace). - Add quick compilation tests for APPO (tf and torch, v-trace and no v-trace).	2020-04-23 09:11:12 +02:00
Sven Mika	d15609ba2a	[RLlib] PyTorch version of ARS (Augmented Random Search). (#8106 ) This PR implements a PyTorch version of RLlib's ARS algorithm using RLlib's functional algo builder API. It also adds a regression test for ARS (torch) on CartPole.	2020-04-21 09:47:52 +02:00
Sven Mika	3812bfedda	[RLlib] PyTorch version of ES (Evolution Strategies). (#8104 ) PyTorch version of Evolution Strategies (ES) Algo.	2020-04-20 21:47:28 +02:00
Sven Mika	d6cb7d865e	[RLlib] Torch DQN (APEX) TD-Error/prio. replay fixes. (#8082 ) PyTorch APEX_DQN with Prioritized Replay enabled would not work properly due to the td_error not being retrievable by the AsyncReplayOptimizer.	2020-04-20 10:03:25 +02:00
Sven Mika	f7e4dae852	[RLlib] DQN and SAC Atari benchmark fixes. (#7962 ) * Add Atari SAC-discrete (learning MsPacman in 40k ts up to 780 rewards). * SAC loss function test case fix.	2020-04-17 08:49:15 +02:00
roireshef	dbcad35022	[RLlib] Added DefaultCallbacks which replaces old callbacks dict interface (#6972 )	2020-04-16 16:06:42 -07:00
Sven Mika	d0fab84e4d	[RLlib] DDPG PyTorch version. (#7953 ) The DDPG/TD3 algorithms currently do not have a PyTorch implementation. This PR adds PyTorch support for DDPG/TD3 to RLlib. This PR: - Depends on the re-factor PR for DDPG (Functional Algorithm API). - Adds learning regression tests for the PyTorch version of DDPG and a DDPG (torch) - Updates the documentation to reflect that DDPG and TD3 now support PyTorch. * Learning Pendulum-v0 on torch version (same config as tf). Wall time a little slower (~20% than tf). * Fix GPU target model problem.	2020-04-16 10:20:01 +02:00
Xianyang Liu	e1d3f7eba6	[rllib]Add config for rllib to support set python environments (#8026 ) * support set extra python environments * wrap value with str * Apply suggestions from code review Co-Authored-By: Eric Liang <ekhliang@gmail.com> * addresses comments * fix lint errors * remove unrelated changes due to format.sh * remove unrelated changes due to format.sh Co-authored-by: Eric Liang <ekhliang@gmail.com>	2020-04-16 01:13:45 -07:00
Sven Mika	428516056a	[RLlib] SAC Torch (incl. Atari learning) (#7984 ) * Policy-classes cleanup and torch/tf unification. - Make Policy abstract. - Add `action_dist` to call to `extra_action_out_fn` (necessary for PPO torch). - Move some methods and vars to base Policy (from TFPolicy): num_state_tensors, ACTION_PROB, ACTION_LOGP and some more. * Fix `clip_action` import from Policy (should probably be moved into utils altogether). * - Move `is_recurrent()` and `num_state_tensors()` into TFPolicy (from DynamicTFPolicy). - Add config to all Policy c'tor calls (as 3rd arg after obs and action spaces). * Add `config` to c'tor call to TFPolicy. * Add missing `config` to c'tor call to TFPolicy in marvil_policy.py. * Fix test_rollout_worker.py::MockPolicy and BadPolicy classes (Policy base class is now abstract). * Fix LINT errors in Policy classes. * Implement StatefulPolicy abstract methods in test cases: test_multi_agent_env.py. * policy.py LINT errors. * Create a simple TestPolicy to sub-class from when testing Policies (reduces code in some test cases). * policy.py - Remove abstractmethod from `apply_gradients` and `compute_gradients` (these are not required iff `learn_on_batch` implemented). - Fix docstring of `num_state_tensors`. * Make QMIX torch Policy a child of TorchPolicy (instead of Policy). * QMixPolicy add empty implementations of abstract Policy methods. * Store Policy's config in self.config in base Policy c'tor. * - Make only compute_actions in base Policy's an abstractmethod and provide pass implementation to all other methods if not defined. - Fix state_batches=None (most Policies don't have internal states). * Cartpole tf learning. * Cartpole tf AND torch learning (in ~ same ts). * Cartpole tf AND torch learning (in ~ same ts). 2 * Cartpole tf (torch syntax-broken) learning (in ~ same ts). 3 * Cartpole tf AND torch learning (in ~ same ts). 4 * Cartpole tf AND torch learning (in ~ same ts). 5 * Cartpole tf AND torch learning (in ~ same ts). 6 * Cartpole tf AND torch learning (in ~ same ts). Pendulum tf learning. * WIP. * WIP. * SAC torch learning Pendulum. * WIP. * SAC torch and tf learning Pendulum and Cartpole after cleanup. * WIP. * LINT. * LINT. * SAC: Move policy.target_model to policy.device as well. * Fixes and cleanup. * Fix data-format of tf keras Conv2d layers (broken for some tf-versions which have data_format="channels_first" as default). * Fixes and LINT. * Fixes and LINT. * Fix and LINT. * WIP. * Test fixes and LINT. * Fixes and LINT. Co-authored-by: Sven Mika <sven@Svens-MacBook-Pro.local>	2020-04-15 13:25:16 +02:00
Eric Liang	31b40b00f6	[rllib] Pull out experimental dsl into rllib.execution module, add initial unit tests (#7958 )	2020-04-10 00:56:08 -07:00
Sven Mika	1b31c11806	[RLlib] DDPG re-factor to fit into RLlib's functional algorithm builder API. (#7934 )	2020-04-09 14:04:21 -07:00
Eric Liang	e8c19aba41	[rllib] Add test case that we don't have a hard torch dependency (#7926 )	2020-04-07 18:07:39 -07:00
Sven Mika	81314143eb	[RLlib] Use framework_iterator (add torch/eager/tf) to PPO and PG tests. (#7915 )	2020-04-07 12:40:34 -07:00
Sven Mika	c2cb5c2214	[RLlib] MARWIL torch. (#7836 ) * WIP. * WIP. * LINT. * Fix MARWIL so it can run with eager-mode. * LINT.	2020-04-06 16:38:50 -07:00
Sven Mika	22ccc43670	[RLlib] DQN torch version. (#7597 ) * Fix. * Rollback. * WIP. * WIP. * WIP. * WIP. * WIP. * WIP. * WIP. * WIP. * Fix. * Fix. * Fix. * Fix. * Fix. * WIP. * WIP. * Fix. * Test case fixes. * Test case fixes and LINT. * Test case fixes and LINT. * Rollback. * WIP. * WIP. * Test case fixes. * Fix. * Fix. * Fix. * Add regression test for DQN w/ param noise. * Fixes and LINT. * Fixes and LINT. * Fixes and LINT. * Fixes and LINT. * Fixes and LINT. * Comment * Regression test case. * WIP. * WIP. * LINT. * LINT. * WIP. * Fix. * Fix. * Fix. * LINT. * Fix (SAC does currently not support eager). * Fix. * WIP. * LINT. * Update rllib/evaluation/sampler.py Co-Authored-By: Eric Liang <ekhliang@gmail.com> * Update rllib/evaluation/sampler.py Co-Authored-By: Eric Liang <ekhliang@gmail.com> * Update rllib/utils/exploration/exploration.py Co-Authored-By: Eric Liang <ekhliang@gmail.com> * Update rllib/utils/exploration/exploration.py Co-Authored-By: Eric Liang <ekhliang@gmail.com> * WIP. * WIP. * Fix. * LINT. * LINT. * Fix and LINT. * WIP. * WIP. * WIP. * WIP. * Fix. * LINT. * Fix. * Fix and LINT. * Update rllib/utils/exploration/exploration.py * Update rllib/policy/dynamic_tf_policy.py Co-Authored-By: Eric Liang <ekhliang@gmail.com> * Update rllib/policy/dynamic_tf_policy.py Co-Authored-By: Eric Liang <ekhliang@gmail.com> * Update rllib/policy/dynamic_tf_policy.py Co-Authored-By: Eric Liang <ekhliang@gmail.com> * Fixes. * WIP. * LINT. * Fixes and LINT. * LINT and fixes. * LINT. * Move action_dist back into torch extra_action_out_fn and LINT. * Working SimpleQ learning cartpole on both torch AND tf. * Working Rainbow learning cartpole on tf. * Working Rainbow learning cartpole on tf. * WIP. * LINT. * LINT. * Update docs and add torch to APEX test. * LINT. * Fix. * LINT. * Fix. * Fix. * Fix and docstrings. * Fix broken RLlib tests in master. * Split BAZEL learning tests into cartpole and pendulum (reached the 60min barrier). * Fix error_outputs option in BAZEL for RLlib regression tests. * Fix. * Tune param-noise tests. * LINT. * Fix. * Fix. * test * test * test * Fix. * Fix. * WIP. * WIP. * WIP. * WIP. * LINT. * WIP. Co-authored-by: Eric Liang <ekhliang@gmail.com>	2020-04-06 11:56:16 -07:00
Sven Mika	82c2d9faba	[RLlib] Fix broken RLlib tests in master. (#7894 )	2020-04-05 09:34:23 -07:00
Sven Mika	1d4823c0ec	[RLlib] Add testing framework_iterator. (#7852 ) * Add testing framework_iterator. * LINT. * WIP. * Fix and LINT. * LINT fix.	2020-04-03 12:24:25 -07:00
Sven Mika	bb6c675231	[RLlib] Bug fix: Copy `is_exploring` placeholder for multi-GPU tower generation. (#7846 )	2020-04-03 10:44:58 -07:00
Sven Mika	5537fe13b0	[RLlib] Exploration API: ParamNoise Integration into DQN; working example/test cases. (#7814 )	2020-04-03 10:44:25 -07:00
Sven Mika	7b08db9f8c	[RLlib] Remove all instances of tf.contrib.layers. ... from RLlib code (deprecated). (#7851 )	2020-04-01 18:03:14 -07:00
Sven Mika	e153e3179f	[RLlib] Exploration API: Policy changes needed for forward pass noisifications. (#7798 ) * Rollback. * WIP. * WIP. * LINT. * WIP. * Fix. * Fix. * Fix. * LINT. * Fix (SAC does currently not support eager). * Fix. * WIP. * LINT. * Update rllib/evaluation/sampler.py Co-Authored-By: Eric Liang <ekhliang@gmail.com> * Update rllib/evaluation/sampler.py Co-Authored-By: Eric Liang <ekhliang@gmail.com> * Update rllib/utils/exploration/exploration.py Co-Authored-By: Eric Liang <ekhliang@gmail.com> * Update rllib/utils/exploration/exploration.py Co-Authored-By: Eric Liang <ekhliang@gmail.com> * WIP. * WIP. * Fix. * LINT. * LINT. * Fix and LINT. * WIP. * WIP. * WIP. * WIP. * Fix. * LINT. * Fix. * Fix and LINT. * Update rllib/utils/exploration/exploration.py * Update rllib/policy/dynamic_tf_policy.py Co-Authored-By: Eric Liang <ekhliang@gmail.com> * Update rllib/policy/dynamic_tf_policy.py Co-Authored-By: Eric Liang <ekhliang@gmail.com> * Update rllib/policy/dynamic_tf_policy.py Co-Authored-By: Eric Liang <ekhliang@gmail.com> * Fixes. * LINT. * WIP. Co-authored-by: Eric Liang <ekhliang@gmail.com>	2020-04-01 00:43:21 -07:00
Sven Mika	66df8b8c35	[RLlib] Working/learning example: PPO + torch + LSTM. (#7797 )	2020-03-31 22:00:28 -07:00
Sven Mika	e4bd5db4d8	[RLlib] Minimal ParamNoise PR. (#7772 )	2020-03-28 16:16:30 -07:00
Sven Mika	93b5c38b7d	[RLlib] Noisy layers in DQN throw different errors (issue #7635 ). (#7750 ) * Rollback. * Fix issue 7635. * Fix issue 7635. * LINT and bug fix.	2020-03-26 22:08:34 -07:00
Sven Mika	bcf963a53b	[RLlib] Bug default policy overrides torch policy. (#7756 ) * Rollback. * Bug fix!	2020-03-26 10:03:20 -07:00
Sven Mika	1138f2ebed	[RLlib] Issue 7046 cannot restore keras model from h5 file. (#7482 )	2020-03-23 12:19:30 -07:00
Robert Nishihara	ee8c9ff732	Remove six and cloudpickle from setup.py. (#7694 )	2020-03-23 11:42:05 -07:00
Eric Liang	288933ec6b	[rllib] Fix shared metrics context in parallel iterators (#7666 ) * debug * build * update * wip * wpi * update * recurisve sync * comment * stream * fix * Update .travis.yml	2020-03-22 14:15:01 -07:00
Eric Liang	7ebc6783e4	[rllib] Add back get_policy_output method for SAC model (#7604 )	2020-03-20 12:44:04 -07:00
Eric Liang	9392cdbf74	[rllib] Add high-performance external application connector (#7641 )	2020-03-20 12:43:57 -07:00
Eric Liang	dd70720578	[rllib] Rename sample_batch_size => rollout_fragment_length (#7503 ) * bulk rename * deprecation warn * update doc * update fig * line length * rename * make pytest comptaible * fix test * fi sys * rename * wip * fix more * lint * update svg * comments * lint * fix use of batch steps	2020-03-14 12:05:04 -07:00
Eric Liang	52cf77f5a9	[rllib] SAC no_done_at_end should default to False (#7594 ) * update * update doc * stochastic * cleanu	2020-03-14 11:16:54 -07:00
Eric Liang	c3a8ba399f	[rllib] Enable distributed exec api for A2C, A3C, PG by default (#7580 )	2020-03-13 18:48:41 -07:00
Eric Liang	f5d12a958b	[rllib] Port Ape-X to distributed execution API (#7497 )	2020-03-12 00:54:08 -07:00
Sven Mika	20ef4a8603	[RLlib] Cleanup/unify all test cases. (#7533 )	2020-03-11 20:39:47 -07:00
Sven Mika	dded5b6d22	[RLlib] ES `env_config` is not a EnvContext object (e.g. does not contain `worker_index`). (#7560 )	2020-03-11 20:33:20 -07:00
Eric Liang	be48e1964b	[rllib] Fix per-worker exploration in Ape-X; make more kwargs required for future safety (#7504 ) * fix sched * lintc * lint * fix * add unit test * fix * format * fix test * fix test	2020-03-10 11:14:14 -07:00

1 2 3 4 5

223 commits