hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-07 02:51:39 -05:00

Author	SHA1	Message	Date
Richard Liaw	b70f31339c	[sgd] Benchmark Fixes (#7553 ) * fix * fix	2020-03-11 13:08:27 -07:00
Markus Cozowicz	ea99063c10	added json schema to setup.py (#7554 )	2020-03-11 09:53:21 -07:00
mehrdadn	3b9caa98ba	Fix fate-sharing warning (#7545 ) * Fix kernel_fate_sharing being None instead of False * Remove fate-sharing warning Co-authored-by: Mehrdad <noreply@github.com>	2020-03-11 08:27:54 -07:00
Richard Liaw	fbac256982	[sgd] Add benchmarks (#7454 ) * Init fp16 * fp16 and schedulers * scheduler linking and fp16 * to fp16 * loss scaling and documentation * more documentation * add tests, refactor config * moredocs * more docs * fix logo, add test mode, add fp16 flag * fix tests * fix scheduler * fix apex * improve safety * fix tests * fix tests * remove pin memory default * rm * fix * Update doc/examples/doc_code/raysgd_torch_signatures.py * fix * migrate changes from other PR * ok thanks * pass * signatures * lint' * Update python/ray/experimental/sgd/pytorch/utils.py * Apply suggestions from code review Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com> * should address most comments * comments * fix this ci * first_pass * add overrides * override * fixing up operators * format * sgd * constants * rm * revert * save * failures * fixes * trainer * run test * operator * code * op * ok done * operator * sgd test fixes * ok * trainer * format * Apply suggestions from code review Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com> * Update doc/source/raysgd/raysgd_pytorch.rst * docstring * dcgan * doc * commits * nit * testing * revert * Start renaming pytorch to torch * Rename PyTorchTrainer to TorchTrainer * Rename PyTorch runners to Torch runners * Finish renaming API * Rename to torch in tests * Finish renaming docs + tests * Run format + fix DeprecationWarning * fix * move tests up * benchmarks * rename * remove some args * better metrics output * fix up the benchmark * benchmark-yaml * horovod-benchmark * benchmarks * Remove benchmark code for cleanups * benchmark-code * nits * benchmark yamls * benchmark yaml * ok * ok * ok * benchmark * nit * finish_bench * makedatacreator * relax * metrics * autosetsampler * profile * movements * OK * smoothen * fix * nitdocs * loss * envflag * comments * nit * format * visible * images * move_images * fix * rernder * rrender * rest * multgpu * fix * nit * finish * extrra * setup * revert Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com> Co-authored-by: Maksim Smolin <maximsmol@gmail.com>	2020-03-11 01:09:08 -07:00
Markus Cozowicz	49439611f1	[autoscaler] Replace cluster yaml validation with json schema v… (#7261 ) * replace manual cluster yaml validation with json schema - improved error message - support for intellisense in VSCode (or other IDEs) - run linting - moved schema to ray/autoscaler - fixed typo - remove importlib dependency * Update python/ray/autoscaler/autoscaler.py * read * restrict allowed properties * added unit test for invalid yaml added ray[test] package (remove pytest from default dependencies) * updated autoscaler test to use ValidationError exception * add missing dependency * added pytest * replace manual cluster yaml validation with json schema - improved error message - support for intellisense in VSCode (or other IDEs) - run linting - moved schema to ray/autoscaler - fixed typo - remove importlib dependency * Update python/ray/autoscaler/autoscaler.py * read * restrict allowed properties * added unit test for invalid yaml added ray[test] package (remove pytest from default dependencies) * updated autoscaler test to use ValidationError exception * add missing dependency * added pytest * removed parameterized dependency reverted ray[test] intro * removed parameterized * fix_tests * format Co-authored-by: Ubuntu <marcozo@mc-ray-jumpbox.chcbtljllnieveqhw3e4c1ducc.xx.internal.cloudapp.net> Co-authored-by: Richard Liaw <rliaw@berkeley.edu>	2020-03-10 18:58:55 -07:00
Richard Liaw	6163b21458	[raysgd] Better user errors! (#7546 ) * format * callable * Update python/ray/util/sgd/torch/torch_trainer.py Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com> * Update python/ray/util/sgd/torch/torch_trainer.py Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com> * data * torchtrainer * num_rep Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>	2020-03-10 18:58:19 -07:00
Edward Oakes	7b609ca211	Remove instances of 'raise Exception' (#7523 )	2020-03-10 17:51:22 -07:00
Stephanie Wang	fdb528514b	[core] Ref counting for actor handles (#7434 ) * tmp * Move Exit handler into CoreWorker, exit once owner's ref count goes to 0 * fix build * Remove __ray_terminate__ and add test case for distributed ref counting * lint * Remove unused * Fixes for detached actor, duplicate actor handles * Remove unused * Remove creation return ID * Remove ObjectIDs from python, set references in CoreWorker * Fix crash * Fix memory crash * Fix tests * fix * fixes * fix tests * fix java build * fix build * fix * check status * check status	2020-03-10 17:45:07 -07:00
Richard Liaw	d192ef0611	[raysgd] Cleanup User API (#7384 ) * Init fp16 * fp16 and schedulers * scheduler linking and fp16 * to fp16 * loss scaling and documentation * more documentation * add tests, refactor config * moredocs * more docs * fix logo, add test mode, add fp16 flag * fix tests * fix scheduler * fix apex * improve safety * fix tests * fix tests * remove pin memory default * rm * fix * Update doc/examples/doc_code/raysgd_torch_signatures.py * fix * migrate changes from other PR * ok thanks * pass * signatures * lint' * Update python/ray/experimental/sgd/pytorch/utils.py * Apply suggestions from code review Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com> * should address most comments * comments * fix this ci * first_pass * add overrides * override * fixing up operators * format * sgd * constants * rm * revert * save * failures * fixes * trainer * run test * operator * code * op * ok done * operator * sgd test fixes * ok * trainer * format * Apply suggestions from code review Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com> * Update doc/source/raysgd/raysgd_pytorch.rst * docstring * dcgan * doc * commits * nit * testing * revert * Start renaming pytorch to torch * Rename PyTorchTrainer to TorchTrainer * Rename PyTorch runners to Torch runners * Finish renaming API * Rename to torch in tests * Finish renaming docs + tests * Run format + fix DeprecationWarning * fix * move tests up * benchmarks * rename * remove some args * better metrics output * fix up the benchmark * benchmark-yaml * horovod-benchmark * benchmarks * Remove benchmark code for cleanups * makedatacreator * relax * metrics * autosetsampler * profile * movements * OK * smoothen * fix * nitdocs * loss * comments * fix * fix * runner_tests * codes * example * fix_test * fix * tests Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com> Co-authored-by: Maksim Smolin <maximsmol@gmail.com>	2020-03-10 08:41:42 -07:00
Anthony Yu	89ec4adb72	[tune] Dragonfly Optimizer (#5955 ) * Add sample example * Copy relevant lines of ask from inherited Optimizer * Ignore strategy * Additional changes * Add DragonflySearch for tune connector for Dragonfly * Add example and fix small errors * lint * Remove skopt references * Update example based off of Dragonfly changes * Edit example for final Dragonfly edits * Formatting and documentation edits * Add documentation and add to test pipeline * Address PR comments * Fix Jenkins test * Adjust Dragonfly to PR#7366 * Lint * fix_tests Co-authored-by: Richard Liaw <rliaw@berkeley.edu>	2020-03-10 08:40:36 -07:00
Eric Liang	90e23a5c43	[iterators] Add duplicate() call and fix broken test case (#7510 )	2020-03-09 17:18:52 -07:00
Edward Oakes	4ab80eafb9	Deprecate use_pickle flag (#7474 )	2020-03-09 16:03:56 -07:00
Edward Oakes	0c254295b0	Remove experimental.signal API (#7477 ) * Remove experimental.signal API * fix test	2020-03-09 16:03:36 -07:00
Ujval Misra	023d4c02a9	[tune] Prevent deletion of checkpoint from user-initiated resto… (#7501 ) * Fix restore bug * Add test * Lint * Indent	2020-03-09 15:53:10 -07:00
Edward Oakes	b4e2d5317e	Remove experimental.NoReturn (#7475 )	2020-03-09 11:09:36 -07:00
Stephanie Wang	95bb0c5357	Upgrade plasma to latest version, use synchronous Seal (#7470 ) * Upgrade arrow to master * fix build * todo * lint * Fix hanging test	2020-03-09 10:30:44 -07:00
Eric Liang	a644060daa	[rllib] First pass at pipeline implementation of DQN (#7433 ) * wip iters * add test * speed up * update docs * document it * support serial sampling * add test * spacing * annotate it * update * rename to pipeline * comment * iter2 wip * update * update * context test * update * fix * fix * a3c pipeline * doc * update * move timer * comment * add piepline test * fix * clean up * document * iter s * wip dqn * wip * wip * metrics * metrics rename * metrics ctx * wip * constants * add todo * suppport .union * wip * support union * remove prints * add todo * remove auto timer * fix up * fix pipeline test * typing * fix breakage * remove bad assert * wip * fix multiagent example * fixapply * update a3c * remove a2c pl * 0 workers * wip * wip * share metrics * wip * wip * doc * fix weight sync and global var updates * mode * fix * fix * doc * fix	2020-03-07 14:47:58 -08:00
Landcold7	beb9b02dbd	Add numba test (#7298 ) (#7487 )	2020-03-07 11:12:25 -08:00
Richard Liaw	115468de2c	[tune] Repeated evals (#7366 ) * easyrepeat * done * suggest * doc * ok * commit * Apply suggestions from code review Co-Authored-By: Ujval Misra <misraujval@gmail.com> * Apply suggestions from code review Co-Authored-By: Ujval Misra <misraujval@gmail.com> * Apply suggestions from code review * ok * docs Co-authored-by: Ujval Misra <misraujval@gmail.com>	2020-03-07 11:08:23 -08:00
mehrdadn	a8bda9b551	Fix incorrect handling of command-lines (#7439 )	2020-03-06 15:51:49 -08:00
Sven Mika	510c850651	[RLlib] SAC add discrete action support. (#7320 ) * Exploration API (+EpsilonGreedy sub-class). * Exploration API (+EpsilonGreedy sub-class). * Cleanup/LINT. * Add `deterministic` to generic Trainer config (NOTE: this is still ignored by most Agents). * Add `error` option to deprecation_warning(). * WIP. * Bug fix: Get exploration-info for tf framework. Bug fix: Properly deprecate some DQN config keys. * WIP. * LINT. * WIP. * Split PerWorkerEpsilonGreedy out of EpsilonGreedy. Docstrings. * Fix bug in sampler.py in case Policy has self.exploration = None * Update rllib/agents/dqn/dqn.py Co-Authored-By: Eric Liang <ekhliang@gmail.com> * WIP. * Update rllib/agents/trainer.py Co-Authored-By: Eric Liang <ekhliang@gmail.com> * WIP. * Change requests. * LINT * In tune/utils/util.py::deep_update() Only keep deep_updat'ing if both original and value are dicts. If value is not a dict, set * Completely obsolete syn_replay_optimizer.py's parameters schedule_max_timesteps AND beta_annealing_fraction (replaced with prioritized_replay_beta_annealing_timesteps). * Update rllib/evaluation/worker_set.py Co-Authored-By: Eric Liang <ekhliang@gmail.com> * Review fixes. * Fix default value for DQN's exploration spec. * LINT * Fix recursion bug (wrong parent c'tor). * Do not pass timestep to get_exploration_info. * Update tf_policy.py * Fix some remaining issues with test cases and remove more deprecated DQN/APEX exploration configs. * Bug fix tf-action-dist * DDPG incompatibility bug fix with new DQN exploration handling (which is imported by DDPG). * Switch off exploration when getting action probs from off-policy-estimator's policy. * LINT * Fix test_checkpoint_restore.py. * Deprecate all SAC exploration (unused) configs. * Properly use `model.last_output()` everywhere. Instead of `model._last_output`. * WIP. * Take out set_epsilon from multi-agent-env test (not needed, decays anyway). * WIP. * Trigger re-test (flaky checkpoint-restore test). * WIP. * WIP. 
* Add test case for deterministic action sampling in PPO. * bug fix. * Added deterministic test cases for different Agents. * Fix problem with TupleActions in dynamic-tf-policy. * Separate supported_spaces tests so they can be run separately for easier debugging. * LINT. * Fix autoregressive_action_dist.py test case. * Re-test. * Fix. * Remove duplicate py_test rule from bazel. * LINT. * WIP. * WIP. * SAC fix. * SAC fix. * WIP. * WIP. * WIP. * FIX 2 examples tests. * WIP. * WIP. * WIP. * WIP. * WIP. * Fix. * LINT. * Renamed test file. * WIP. * Add unittest.main. * Make action_dist_class mandatory. * fix * FIX. * WIP. * WIP. * Fix. * Fix. * Fix explorations test case (contextlib cannot find its own nullcontext??). * Force torch to be installed for QMIX. * LINT. * Fix determine_tests_to_run.py. * Fix determine_tests_to_run.py. * WIP * Add Random exploration component to tests (fixed issue with "static-graph randomness" via py_function). * Add Random exploration component to tests (fixed issue with "static-graph randomness" via py_function). * Rename some stuff. * Rename some stuff. * WIP. * update. * WIP. * Gumbel Softmax Dist. * WIP. * WIP. * WIP. * WIP. * WIP. * WIP. * WIP * WIP. * WIP. * Hypertune. * Hypertune. * Hypertune. * Lock-in. * Cleanup. * LINT. * Fix. * Update rllib/policy/eager_tf_policy.py Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com> * Update rllib/agents/sac/sac_policy.py Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com> * Update rllib/agents/sac/sac_policy.py Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com> * Update rllib/models/tf/tf_action_dist.py Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com> * Update rllib/models/tf/tf_action_dist.py Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com> * Fix items from review comments. * Add dm_tree to RLlib dependencies. * Add dm_tree to RLlib dependencies. * Fix DQN test cases ((Torch)Categorical). 
* Fix wrong pip install. Co-authored-by: Eric Liang <ekhliang@gmail.com> Co-authored-by: Kristian Hartikainen <kristian.hartikainen@gmail.com>	2020-03-06 10:37:12 -08:00
Eric Liang	476b5c6196	[Parallel Iterators] Allow for operator chaining after repartition (#7268 ) * bug fix repartition * change add_transform from private to inner * formatting * addressing comments * formatting	2020-03-04 14:42:52 -08:00
Philipp Moritz	de0c99876e	Fix fate_share not being passed to Redis shards (#7432 )	2020-03-04 11:29:45 -08:00
Edward Oakes	0abcca258f	Add entries to in-memory store on Put() (#7085 )	2020-03-04 10:17:27 -08:00
Philipp Moritz	fb1c1e2d27	Revert "Keep cloudpickle up-to-date with the upstream (#7406 )" (#7437 ) This reverts commit `f6883bf725`.	2020-03-03 18:36:15 -08:00
Maksim Smolin	3a134c7224	[RaySGD] Rename PyTorch API endpoints to start with Torch (#7425 ) * Start renaming pytorch to torch * Rename PyTorchTrainer to TorchTrainer * Rename PyTorch runners to Torch runners * Finish renaming API * Rename to torch in tests * Finish renaming docs + tests * Run format + fix DeprecationWarning * fix * move tests up * rename Co-authored-by: Richard Liaw <rliaw@berkeley.edu>	2020-03-03 16:44:42 -08:00
Siyuan (Ryans) Zhuang	f6883bf725	Keep cloudpickle up-to-date with the upstream (#7406 )	2020-03-03 13:52:54 -08:00
Edward Oakes	b0bf5450c2	Fix flaky multiprocessing tests (#7413 )	2020-03-03 15:07:59 -06:00
ijrsvt	fb76092d75	Re-route asyncio plasma code path through raylet instead of direct plasma connection (#7234 )	2020-03-03 15:43:46 -05:00
Edward Oakes	04ec599441	Use ray.kill() in multiprocessing.Pool (#7409 )	2020-03-03 12:49:13 -06:00
Allen	b74eb5fce6	Capture output for commands run by the autoscaler (#7381 )	2020-03-03 10:19:21 -08:00
mehrdadn	4d42664b2a	Use prctl(PR_SET_PDEATHSIG) on Linux instead of reaper (#7150 )	2020-03-03 11:45:42 -06:00
ijrsvt	584645cc7d	Fix Experimental Async API (#7391 )	2020-03-02 22:24:20 -06:00
Edward Oakes	580b017b43	Fix flaky global GC tests (#7407 )	2020-03-02 21:03:01 -06:00
Edward Oakes	9e9f1962c7	Enable test_actor_pool in CI (#7405 )	2020-03-02 20:24:36 -06:00
Edward Oakes	2b6f00724a	Enable test_joblib in CI (#7404 )	2020-03-02 20:03:27 -06:00
Edward Oakes	d69fe54f6d	Temporarily skip testEndToEndReporting (#7402 )	2020-03-02 18:27:34 -06:00
Siyuan (Ryans) Zhuang	0792b5cb93	Fix the numpy ndarray subclass serialization bug (#7392 )	2020-03-01 23:05:59 -08:00
Richard Liaw	48cdca843f	[raysgd] Custom training operator (#7211 )	2020-03-01 21:22:48 -08:00
Eric Liang	3c6b94f3f5	[rllib] Enable performance metrics reporting for RLlib pipelines, add A3C (#7299 )	2020-02-28 16:44:17 -08:00
Richard Liaw	fb73d51d4d	[tune] fix hparams for tbx (#7312 ) * fix * test_hist * remove unnecessary value check * pbt * queue * skip_for_now * Apply suggestions from code review	2020-02-28 11:51:56 -08:00
Richard Liaw	ca40b0fcc6	[tune][minor] Avoid throwing error when gpu check fails (#7362 )	2020-02-28 11:32:44 -08:00
Edward Oakes	f321eaec9b	Working but not passing test (#7358 )	2020-02-28 12:57:28 -06:00
mehrdadn	fb0bc7b947	Partially revert "[Core/RLlib] Move `log_once` from rllib to ray.util. (#7273 )" (#7361 ) This partially reverts commit `357232d124`. The addition of python/__init__.py broke the build on Windows. However, this is difficult to notice because Bazel doesn't seem to notice this dependency. You first have to go to a commit that fails on this issue, and then try to re-build this commit, so that Bazel actually performs a rebuild. A useful command-line for triggering the exact build is: bazel build --compile_one_dependency //:python/ray/_raylet.pyx	2020-02-28 10:27:45 -08:00
Edward Oakes	93fe4b0b58	Change actor.__ray_kill__() to ray.kill(actor) (#7360 )	2020-02-28 11:55:13 -06:00
Richard Liaw	3fc162f93c	[tune] Add Unit Test for nested PBT + Jenkins (#7324 )	2020-02-27 18:17:11 -08:00
mehrdadn	8730996682	Windows changes (#7315 )	2020-02-27 15:14:10 -08:00
Edward Oakes	ced062319d	Decrease test_object_manager put size to avoid OOMs in CI (#7355 )	2020-02-27 11:08:10 -08:00
Edward Oakes	cbf55d69a6	Remove serialized from_random object ids in tests (#7340 )	2020-02-27 11:04:06 -08:00
Edward Oakes	bd9411f849	Call TriggerGlobalGC when the plasma store is full (#7337 )	2020-02-27 11:01:49 -08:00

2220 commits