Commit graph

60 commits

Author SHA1 Message Date
Rohan138
b9c9cc5946
[RLlib] Updated PettingZoo+RLlib tutorial; Removed pettingzoo example script (#19069)
* Updated PettingZoo+RLlib tutorial

Updated the tutorial and added link to the blog post by the PettingZoo team.

* Ran linting

* Converted link to tinyurl for linting

* fixed line lengths

* Decrease num_workers to 1

* Added comments

* Decreased num_workers

* Decreased timesteps

* Increased num_workers

* Update links and remove pettingzoo_env.py

* remove pettingzoo.py script from tests

Co-authored-by: sven1977 <svenmika1977@gmail.com>
2021-10-29 10:57:10 +02:00
Renos Zabounidis
41dd037ae9
[RLlib; Docs] Correcting documentation with respect to postprocess_trajectory (#19672)
postprocess_trajectory is referred to incorrectly in the rllib-environments documentation. When defining a custom policy, a user never directly modifies Policy.postprocess_trajectory; instead, they define a postprocess_fn, which is in turn called by postprocess_trajectory.
2021-10-25 09:37:58 +02:00
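The distinction the commit above corrects, as a minimal sketch (the signature and build_tf_policy usage follow the RLlib API of that era; treat exact details as assumptions):

```python
# Hedged sketch: with a custom policy, you supply a postprocess_fn at
# build time; RLlib's Policy.postprocess_trajectory then calls it for
# you. You never override postprocess_trajectory directly.
from ray.rllib.policy.sample_batch import SampleBatch


def my_postprocess_fn(policy, sample_batch, other_agent_batches=None, episode=None):
    # Example tweak: add a small constant bonus to every reward.
    sample_batch[SampleBatch.REWARDS] = sample_batch[SampleBatch.REWARDS] + 0.01
    return sample_batch


# Passed when building the policy (policy and loss names hypothetical):
# MyPolicy = build_tf_policy(
#     name="MyPolicy",
#     loss_fn=my_loss_fn,
#     postprocess_fn=my_postprocess_fn)
```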
Sven Mika
c95dea51e9
[RLlib] External env enhancements + more examples. (#16583) 2021-06-23 09:09:01 +02:00
Sven Mika
391cdfae8c
[RLlib] Trajectory view API docs. (#12718) 2020-12-30 17:32:21 -08:00
Benjamin Black
1999266bba
Updated pettingzoo env to accommodate API changes and fixes (#11873)
* Updated pettingzoo env to accommodate API changes and fixes

* fixed test failure

* fixed linting issue

* fixed test failure
2020-11-09 16:09:49 -08:00
Carlos Aguayo
86b1814e62
Update rllib-env.rst (#10891)
Tiny typo
2020-09-19 00:29:56 -07:00
Benjamin Black
f2408b719c
Fixed PettingZooEnv (#10847) 2020-09-17 11:28:42 -07:00
Stefan Schneider
6db55ca8db
[docs][rllib] Recommended workflow for training, saving, and testing (#9319) 2020-07-09 15:47:10 -07:00
Sven Mika
4da0e542d5
[RLlib] DDPG and SAC eager support (preparation for tf2.x) (#9204) 2020-07-08 16:12:20 +02:00
Benjamin Black
1425cdf834
Pettingzoo environment support (#9271)
* added pettingzoo wrapper env and example

* added docs, examples for pettingzoo env support

* fixed pettingzoo env flake8, added test

* fixed pettingzoo env import

* fixed pettingzoo env import

* fixed pettingzoo import issue

* fixed pettingzoo test

* fixed linting problem

* fixed bad quotes

* future proofed pettingzoo dependency

* fixed ray init in pettingzoo env

* lint

* manual lint

Co-authored-by: Eric Liang <ekhliang@gmail.com>
2020-07-06 21:32:26 -07:00
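A rough usage sketch of the wrapper this PR adds (the import path and the pistonball version suffix are assumptions; both moved around across releases):

```python
# Hedged sketch: train PPO on a PettingZoo environment via RLlib's
# PettingZooEnv wrapper introduced in this PR.
from ray import tune
from ray.tune.registry import register_env
from ray.rllib.env import PettingZooEnv
from pettingzoo.butterfly import pistonball_v6  # version suffix varies


def env_creator(config):
    return PettingZooEnv(pistonball_v6.env())


register_env("pistonball", env_creator)
tune.run("PPO", config={"env": "pistonball", "num_workers": 1})
```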
Sven Mika
d8a081a185
[RLlib] Unity3D integration (n Unity3D clients vs learning server). (#8590) 2020-05-30 22:48:34 +02:00
Eric Liang
f48da50e1c
[rllib] observation function api for multi-agent (#8236) 2020-05-04 22:13:49 -07:00
Eric Liang
5cebee68d6
[rllib] Add scaling guide to documentation, improve bandit docs (#7780)
* update

* reword

* update

* ms

* multi node sgd

* reorder

* improve bandit docs

* contrib

* update

* ref

* improve refs

* fix build

* add pillow dep

* add pil

* update pil

* pillow

* remove false
2020-03-27 22:05:43 -07:00
hubcity
3d0a8662b3
#7246 - Fixing broken links (#7247)
* #7246 - Fixing broken links

* Apply suggestions from code review

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-03-25 21:46:13 -07:00
Eric Liang
9392cdbf74
[rllib] Add high-performance external application connector (#7641) 2020-03-20 12:43:57 -07:00
Eric Liang
dd70720578
[rllib] Rename sample_batch_size => rollout_fragment_length (#7503)
* bulk rename

* deprecation warn

* update doc

* update fig

* line length

* rename

* make pytest compatible

* fix test

* fix sys

* rename

* wip

* fix more

* lint

* update svg

* comments

* lint

* fix use of batch steps
2020-03-14 12:05:04 -07:00
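Since the rename above is a pure config-key change, the before/after is one line (value illustrative):

```python
# Before this PR (later emits a deprecation warning):
config = {"sample_batch_size": 200}

# After this PR:
config = {"rollout_fragment_length": 200}
```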
Sven Mika
510c850651
[RLlib] SAC add discrete action support. (#7320)
* Exploration API (+EpsilonGreedy sub-class).

* Exploration API (+EpsilonGreedy sub-class).

* Cleanup/LINT.

* Add `deterministic` to generic Trainer config (NOTE: this is still ignored by most Agents).

* Add `error` option to deprecation_warning().

* WIP.

* Bug fix: Get exploration-info for tf framework.
Bug fix: Properly deprecate some DQN config keys.

* WIP.

* LINT.

* WIP.

* Split PerWorkerEpsilonGreedy out of EpsilonGreedy.
Docstrings.

* Fix bug in sampler.py in case Policy has self.exploration = None

* Update rllib/agents/dqn/dqn.py

Co-Authored-By: Eric Liang <ekhliang@gmail.com>

* WIP.

* Update rllib/agents/trainer.py

Co-Authored-By: Eric Liang <ekhliang@gmail.com>

* WIP.

* Change requests.

* LINT

* In tune/utils/util.py::deep_update(): only keep deep-updating if both original and value are dicts. If value is not a dict, set it directly.

* Make sync_replay_optimizer.py's parameters schedule_max_timesteps AND beta_annealing_fraction completely obsolete (replaced with prioritized_replay_beta_annealing_timesteps).

* Update rllib/evaluation/worker_set.py

Co-Authored-By: Eric Liang <ekhliang@gmail.com>

* Review fixes.

* Fix default value for DQN's exploration spec.

* LINT

* Fix recursion bug (wrong parent c'tor).

* Do not pass timestep to get_exploration_info.

* Update tf_policy.py

* Fix some remaining issues with test cases and remove more deprecated DQN/APEX exploration configs.

* Bug fix tf-action-dist

* DDPG incompatibility bug fix with new DQN exploration handling (which is imported by DDPG).

* Switch off exploration when getting action probs from off-policy-estimator's policy.

* LINT

* Fix test_checkpoint_restore.py.

* Deprecate all SAC exploration (unused) configs.

* Properly use `model.last_output()` everywhere, instead of `model._last_output`.

* WIP.

* Take out set_epsilon from multi-agent-env test (not needed, decays anyway).

* WIP.

* Trigger re-test (flaky checkpoint-restore test).

* WIP.

* WIP.

* Add test case for deterministic action sampling in PPO.

* bug fix.

* Added deterministic test cases for different Agents.

* Fix problem with TupleActions in dynamic-tf-policy.

* Separate supported_spaces tests so they can be run separately for easier debugging.

* LINT.

* Fix autoregressive_action_dist.py test case.

* Re-test.

* Fix.

* Remove duplicate py_test rule from bazel.

* LINT.

* WIP.

* WIP.

* SAC fix.

* SAC fix.

* WIP.

* WIP.

* WIP.

* FIX 2 examples tests.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* Fix.

* LINT.

* Renamed test file.

* WIP.

* Add unittest.main.

* Make action_dist_class mandatory.

* fix

* FIX.

* WIP.

* WIP.

* Fix.

* Fix.

* Fix explorations test case (contextlib cannot find its own nullcontext??).

* Force torch to be installed for QMIX.

* LINT.

* Fix determine_tests_to_run.py.

* Fix determine_tests_to_run.py.

* WIP

* Add Random exploration component to tests (fixed issue with "static-graph randomness" via py_function).

* Add Random exploration component to tests (fixed issue with "static-graph randomness" via py_function).

* Rename some stuff.

* Rename some stuff.

* WIP.

* update.

* WIP.

* Gumbel Softmax Dist.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP

* WIP.

* WIP.

* Hypertune.

* Hypertune.

* Hypertune.

* Lock-in.

* Cleanup.

* LINT.

* Fix.

* Update rllib/policy/eager_tf_policy.py

Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com>

* Update rllib/agents/sac/sac_policy.py

Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com>

* Update rllib/agents/sac/sac_policy.py

Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com>

* Update rllib/models/tf/tf_action_dist.py

Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com>

* Update rllib/models/tf/tf_action_dist.py

Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com>

* Fix items from review comments.

* Add dm_tree to RLlib dependencies.

* Add dm_tree to RLlib dependencies.

* Fix DQN test cases ((Torch)Categorical).

* Fix wrong pip install.

Co-authored-by: Eric Liang <ekhliang@gmail.com>
Co-authored-by: Kristian Hartikainen <kristian.hartikainen@gmail.com>
2020-03-06 10:37:12 -08:00
Eric Liang
249ca2cf9e
[rllib] add blog posts to examples list (#5762)
* add blog post

* remove

* link
2019-09-23 10:42:21 -07:00
gehring
b520f6141e [rllib] Adds eager support with a generic TFEagerPolicy class (#5436) 2019-08-23 14:21:11 +08:00
Eric Liang
a1d2e17623
[rllib] Autoregressive action distributions (#5304) 2019-08-10 14:05:12 -07:00
Eric Liang
592f313210
[rllib] Centralized critic / PPO example on TwoStepGame (#5392) 2019-08-08 14:03:28 -07:00
Eric Liang
5d7afe8092
[rllib] Try moving RLlib to top level dir (#5324) 2019-08-05 23:25:49 -07:00
Kristian Hartikainen
13fb9fe3db [rllib] Feature/soft actor critic v2 (#5328)
* Add base for Soft Actor-Critic

* Pick changes from old SAC branch

* Update sac.py

* First implementation of sac model

* Remove unnecessary SAC imports

* Prune unnecessary noise and exploration code

* Implement SAC model and use that in SAC policy

* runs but doesn't learn

* clear state

* fix batch size

* Add missing alpha grads and vars

* -200 by 2k timesteps

* doc

* lazy squash

* one file

* ignore tfp

* revert done
2019-08-01 23:37:36 -07:00
Eric Liang
20450a4e82
[rllib] Add rock paper scissors multi-agent example (#5336) 2019-08-01 13:03:59 -07:00
Eric Liang
a45c61e19b
[rllib] Update concepts docs and add "Building Policies in Torch/TensorFlow" section (#4821)
* wip

* fix index

* fix bugs

* todo

* add imports

* note on get ph

* note on get ph

* rename to building custom algs

* add rnn state info
2019-05-27 14:17:32 -07:00
Eric Liang
02583a8598 [rllib] Rename PolicyGraph => Policy, move from evaluation/ to policy/ (#4819)
This implements some of the renames proposed in #4813.
We leave behind backwards-compatibility aliases for *PolicyGraph and SampleBatch.
2019-05-20 16:46:05 -07:00
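In user code, the rename (with the mentioned backwards-compatibility aliases) looks roughly like this; exact import paths are assumptions:

```python
# Old name/location; kept working via a backwards-compatibility alias:
from ray.rllib.evaluation import TFPolicyGraph

# New name/location after this PR:
from ray.rllib.policy import TFPolicy
```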
Eric Liang
3807fb505b
[rllib] TensorFlow 2 compatibility (#4802) 2019-05-16 22:12:07 -07:00
Eric Liang
7d5ef6d99c
[rllib] Support continuous action distributions in IMPALA/APPO (#4771) 2019-05-16 22:05:07 -07:00
Eric Liang
37208216ae
[rllib] Rename Agent to Trainer (#4556) 2019-04-07 00:36:18 -07:00
bjg2
77005d1814 [rllib] Make batch timeout for remote workers tunable (#4435) 2019-03-29 13:19:42 -07:00
Eric Liang
5b8eb475ce
[rllib] Allow None to be specified in multi-agent envs (#4464)
* wip

* check

* doc update

* Update hierarchical_training.py
2019-03-25 11:38:17 -07:00
Eric Liang
c7f74dbdc7
[rllib] Add async remote workers (#4253) 2019-03-08 15:39:48 -08:00
Eric Liang
d9da183c7d
[rllib] Custom supervised loss API (#4083) 2019-02-24 15:36:13 -08:00
Eric Liang
f8bef004da
[rllib] Improve error message for bad envs, add remote env docs (#4044)
* commit

* fix up rew
2019-02-18 01:28:19 -08:00
Eric Liang
fb73cedf70
[rllib] Add examples page, add hierarchical training example, delete SC2 examples (#3815)
* wip

* lint

* wip

* up

* wip

* update examples

* wip

* remove carla

* update

* improve envspec

* link to custom

* Update rllib-env.rst

* update

* fix

* fn

* lint

* ds

* ssd games

* desc

* fix up docs

* fix
2019-01-29 21:06:09 -08:00
Eric Liang
04ec47cbd4
[rllib] annotate public vs developer vs private APIs (#3808) 2019-01-23 21:27:26 -08:00
Michael Luo
16f7ca45e4 Appo (#3779)
* Deleted old fork, updated to new Ray, and moved PPO-Impala to APPO in the ppo folder

* Deleted unnecessary vtrace.py file

* Update pong-impala.yaml

* Cleaned PPO Code

* Update pong-impala.yaml

* Update pong-impala.yaml

* wip

* new file

* refactor

* add vtrace off option

* revert

* support any space

* docs

* fix comment

* remove kl

* Update cartpole-appo-vtrace.yaml
2019-01-18 13:40:26 -08:00
Jones Wong
319c1340cb [rllib] Develop MARWIL (#3635)
*  add marwil policy graph

*  fix typo

*  add offline optimizer and enable running marwil

*  fix loss function

*  maintain the moving average of the advantage norm

*  use sync replay optimizer for unification

*  remove offline optimizer and use sync replay optimizer

*  format by yapf

*  add imitation learning objective

*  fix according to eric's review

*  format by yapf

* revise

* add test data

* marwil
2019-01-16 19:00:43 -08:00
Eric Liang
401e656b95 [rllib] Sync filters at end of iteration not start; hierarchical docs (#3769) 2019-01-15 16:25:25 -08:00
Eric Liang
ca864faece
[rllib] Documentation for I/O API and multi-agent support / cleanup (#3650) 2019-01-03 15:15:36 +08:00
Eric Liang
303883a3b6 [rllib] [rfc] add contrib module and guideline for merging (#3565)
This adds guidelines for merging code into `rllib/contrib` vs `rllib/agents`. It also cleans up the agent import code to make registration easier.
2018-12-20 10:44:34 -08:00
Eric Liang
db0dee573e
[rllib] Q-Mix implementation (Q-Mix, VDN, IQN, and Ape-X variants) (#3548) 2018-12-18 10:40:01 -08:00
Eric Liang
ce388a45cf
[rllib] Learner should not see clipped actions (#3496) 2018-12-09 21:57:11 -08:00
Eric Liang
d864f299d7
[rllib] fixes from dogfooding multi-agent (#3456)
* auto wrap multi-agent dict and tuple spaces by keeping a policy -> preprocessor map in the sampler
* add some Q-learning debug stats
* report min and max of custom metrics
* better errors
2018-12-05 23:31:45 -08:00
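A sketch of what the auto-wrapping enables: a multi-agent env with a per-agent Dict observation space that previously needed manual flattening (the env itself is hypothetical):

```python
# Hypothetical multi-agent env with Dict observations; after this fix,
# the sampler keeps a per-policy preprocessor, so RLlib flattens these
# observations automatically.
from gym.spaces import Box, Dict, Discrete
from ray.rllib.env import MultiAgentEnv


class DictObsEnv(MultiAgentEnv):
    def __init__(self, config=None):
        self.observation_space = Dict({
            "pos": Box(-1.0, 1.0, shape=(2,)),
            "goal": Box(-1.0, 1.0, shape=(2,)),
        })
        self.action_space = Discrete(4)

    def reset(self):
        return {"agent_1": self.observation_space.sample()}

    def step(self, action_dict):
        obs = {"agent_1": self.observation_space.sample()}
        return obs, {"agent_1": 0.0}, {"__all__": True}, {}
```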
Eric Liang
ce355d13d4
[rllib] Allow envs to be auto-registered; add on_train_result callback with curriculum example (#3451)
* train step and docs

* debug

* doc

* doc

* fix examples

* fix code

* integration test

* fix

* ...

* space

* instance

* Update .travis.yml

* fix test
2018-12-03 23:15:43 -08:00
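The curriculum pattern this enables, roughly (the callback info keys and the evaluator/worker API changed across versions, so treat the names here as assumptions):

```python
# Hedged sketch: raise the env's difficulty ("phase") once training
# crosses a reward threshold, from inside on_train_result.
def on_train_result(info):
    result = info["result"]
    trainer = info["trainer"]  # "agent" in some older versions
    phase = 2 if result["episode_reward_mean"] > 200 else 1
    # Push the new phase into every env copy on every worker.
    trainer.optimizer.foreach_evaluator(
        lambda ev: ev.foreach_env(lambda env: env.set_phase(phase)))


config = {
    "callbacks": {"on_train_result": on_train_result},
}
```

set_phase here is a method the user's curriculum-capable env is assumed to define.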
Eric Liang
f0df97db6f
[rllib] example and docs on how to use parametric actions with DQN / PG algorithms (#3384) 2018-11-27 23:35:19 -08:00
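The parametric-actions idea in brief (a hedged sketch; the "action_mask"/"real_obs" naming follows the example docs of that era, and the custom-model wiring is omitted):

```python
# Hedged sketch: the env emits the true observation plus a mask of
# currently valid actions; a custom model pushes invalid actions'
# logits toward -inf so they are never sampled.
import numpy as np
from gym.spaces import Box, Dict, Discrete

MAX_ACTIONS = 6

observation_space = Dict({
    "action_mask": Box(0, 1, shape=(MAX_ACTIONS,)),  # 1 = valid this step
    "real_obs": Box(-1.0, 1.0, shape=(4,)),
})
action_space = Discrete(MAX_ACTIONS)


def mask_logits(logits, action_mask):
    # log(0) = -inf for invalid actions; clip to keep values finite.
    return logits + np.clip(np.log(action_mask), -1e10, 1e10)
```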
Eric Liang
8b76bab25c
[rllib] docs for td3 (#3381)
* td3 doc

* Update rllib-env.rst
2018-11-22 13:36:47 -08:00
Eric Liang
bd0dbde149
[rllib] Rename ServingEnv => ExternalEnv (#3302) 2018-11-12 16:31:27 -08:00
Eric Liang
9dd3eedbac [rllib] rollout.py should reduce num workers (#3263)
## What do these changes do?

Don't create an excessive number of workers for rollout.py, and also fix up the env wrapping to be consistent with the internal agent wrapper.

## Related issue number

Closes #3260.
2018-11-09 12:29:16 -08:00
Eric Liang
369cb833fe
[rllib] Implement custom metrics (#3144) 2018-11-03 18:48:32 -07:00