hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-06 10:31:39 -05:00

Author	SHA1	Message	Date
Sven Mika	80d314ae5e	[RLlib] Add all agents to `rllib rollout` tests. (#7534 )	2020-03-12 11:02:51 -07:00
ZhuSenlin	b663bc6d67	Use gcs server to replace raylet monitor when RAY_GCS_SERVICE_ENABLED=true (#7166 )	2020-03-12 22:13:56 +08:00
fangfengbin	428fb79b27	Fix streaming compile bug (#7577 ) Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>	2020-03-12 17:26:45 +08:00
Eric Liang	f5d12a958b	[rllib] Port Ape-X to distributed execution API (#7497 )	2020-03-12 00:54:08 -07:00
fangfengbin	4c834b9d68	Fix the issue that gcs service client ignores error status code (#7539 ) * add gcs reply status * rebase master * use macro to simplify * convert status in gcs rpc client * define a Status message in protobuf Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>	2020-03-12 15:08:29 +08:00
Sven Mika	20ef4a8603	[RLlib] Cleanup/unify all test cases. (#7533 )	2020-03-11 20:39:47 -07:00
Sven Mika	dded5b6d22	[RLlib] ES `env_config` is not a EnvContext object (e.g. does not contain `worker_index`). (#7560 )	2020-03-11 20:33:20 -07:00
Sven Mika	bc120730e5	[RLlib] PPO(torch) on CartPole not tuned well enough for consistent learning (#7556 )	2020-03-11 20:31:27 -07:00
Kai Yang	932a749fa9	Fix the `java_worker_options` parameter (#7537 ) * fix Java CI * Minor fix * move json.loads out of build_java_worker_command * lint * fix cross language test	2020-03-12 10:44:23 +08:00
Markus Cozowicz	ba1b081477	Azure Portal cluster deployment | Support spot instances (#7558 ) * added priority option * added head node priority * upgrade api version	2020-03-11 18:46:11 -07:00
Simon Mo	31d63d3ca7	Fix global state actors() call (#7567 )	2020-03-11 16:59:50 -07:00
Richard Liaw	b38ed4be71	[raysgd] Fix More Docs (#7565 )	2020-03-11 14:17:47 -07:00
Richard Liaw	d046faeb9c	[sgd] Readme fix (#7564 ) * readme fix * replicas	2020-03-11 13:40:18 -07:00
Richard Liaw	b70f31339c	[sgd] Benchmark Fixes (#7553 ) * fix * fix	2020-03-11 13:08:27 -07:00
Markus Cozowicz	ea99063c10	added json schema to setup.py (#7554 )	2020-03-11 09:53:21 -07:00
mehrdadn	3b9caa98ba	Fix fate-sharing warning (#7545 ) * Fix kernel_fate_sharing being None instead of False * Remove fate-sharing warning Co-authored-by: Mehrdad <noreply@github.com>	2020-03-11 08:27:54 -07:00
Richard Liaw	fbac256982	[sgd] Add benchmarks (#7454 ) * Init fp16 * fp16 and schedulers * scheduler linking and fp16 * to fp16 * loss scaling and documentation * more documentation * add tests, refactor config * moredocs * more docs * fix logo, add test mode, add fp16 flag * fix tests * fix scheduler * fix apex * improve safety * fix tests * fix tests * remove pin memory default * rm * fix * Update doc/examples/doc_code/raysgd_torch_signatures.py * fix * migrate changes from other PR * ok thanks * pass * signatures * lint' * Update python/ray/experimental/sgd/pytorch/utils.py * Apply suggestions from code review Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com> * should address most comments * comments * fix this ci * first_pass * add overrides * override * fixing up operators * format * sgd * constants * rm * revert * save * failures * fixes * trainer * run test * operator * code * op * ok done * operator * sgd test fixes * ok * trainer * format * Apply suggestions from code review Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com> * Update doc/source/raysgd/raysgd_pytorch.rst * docstring * dcgan * doc * commits * nit * testing * revert * Start renaming pytorch to torch * Rename PyTorchTrainer to TorchTrainer * Rename PyTorch runners to Torch runners * Finish renaming API * Rename to torch in tests * Finish renaming docs + tests * Run format + fix DeprecationWarning * fix * move tests up * benchmarks * rename * remove some args * better metrics output * fix up the benchmark * benchmark-yaml * horovod-benchmark * benchmarks * Remove benchmark code for cleanups * benchmark-code * nits * benchmark yamls * benchmark yaml * ok * ok * ok * benchmark * nit * finish_bench * makedatacreator * relax * metrics * autosetsampler * profile * movements * OK * smoothen * fix * nitdocs * loss * envflag * comments * nit * format * visible * images * move_images * fix * rernder * rrender * rest * multgpu * fix * nit * finish * extrra * setup * revert Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com> Co-authored-by: Maksim Smolin <maximsmol@gmail.com>	2020-03-11 01:09:08 -07:00
Markus Cozowicz	49439611f1	[autoscaler] Replace cluster yaml validation with json schema v… (#7261 ) * replace manual cluster yaml validation with json schema - improved error message - support for intellisense in VSCode (or other IDEs) - run linting - moved schema to ray/autoscaler - fixed typo - remove importlib dependency * Update python/ray/autoscaler/autoscaler.py * read * restrict allowed properties * added unit test for invalid yaml added ray[test] package (remove pytest from default dependencies) * updated autoscaler test to use ValidationError exception * add missing dependency * added pytest * replace manual cluster yaml validation with json schema - improved error message - support for intellisense in VSCode (or other IDEs) - run linting - moved schema to ray/autoscaler - fixed typo - remove importlib dependency * Update python/ray/autoscaler/autoscaler.py * read * restrict allowed properties * added unit test for invalid yaml added ray[test] package (remove pytest from default dependencies) * updated autoscaler test to use ValidationError exception * add missing dependency * added pytest * removed parameterized dependency reverted ray[test] intro * removed parameterized * fix_tests * format Co-authored-by: Ubuntu <marcozo@mc-ray-jumpbox.chcbtljllnieveqhw3e4c1ducc.xx.internal.cloudapp.net> Co-authored-by: Richard Liaw <rliaw@berkeley.edu>	2020-03-10 18:58:55 -07:00
Richard Liaw	6163b21458	[raysgd] Better user errors! (#7546 ) * format * callable * Update python/ray/util/sgd/torch/torch_trainer.py Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com> * Update python/ray/util/sgd/torch/torch_trainer.py Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com> * data * torchtrainer * num_rep Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>	2020-03-10 18:58:19 -07:00
Edward Oakes	7b609ca211	Remove instances of 'raise Exception' (#7523 )	2020-03-10 17:51:22 -07:00
Stephanie Wang	fdb528514b	[core] Ref counting for actor handles (#7434 ) * tmp * Move Exit handler into CoreWorker, exit once owner's ref count goes to 0 * fix build * Remove __ray_terminate__ and add test case for distributed ref counting * lint * Remove unused * Fixes for detached actor, duplicate actor handles * Remove unused * Remove creation return ID * Remove ObjectIDs from python, set references in CoreWorker * Fix crash * Fix memory crash * Fix tests * fix * fixes * fix tests * fix java build * fix build * fix * check status * check status	2020-03-10 17:45:07 -07:00
Edward Oakes	119a303ea0	Remove static concurrency limit from gRPC server (#7544 )	2020-03-10 16:27:02 -07:00
Edward Oakes	dbbf0c0e70	Add Apache 2 license to C++ files (#7520 )	2020-03-10 16:07:17 -07:00
Eric Liang	be48e1964b	[rllib] Fix per-worker exploration in Ape-X; make more kwargs required for future safety (#7504 ) * fix sched * lintc * lint * fix * add unit test * fix * format * fix test * fix test	2020-03-10 11:14:14 -07:00
Richard Liaw	d192ef0611	[raysgd] Cleanup User API (#7384 ) * Init fp16 * fp16 and schedulers * scheduler linking and fp16 * to fp16 * loss scaling and documentation * more documentation * add tests, refactor config * moredocs * more docs * fix logo, add test mode, add fp16 flag * fix tests * fix scheduler * fix apex * improve safety * fix tests * fix tests * remove pin memory default * rm * fix * Update doc/examples/doc_code/raysgd_torch_signatures.py * fix * migrate changes from other PR * ok thanks * pass * signatures * lint' * Update python/ray/experimental/sgd/pytorch/utils.py * Apply suggestions from code review Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com> * should address most comments * comments * fix this ci * first_pass * add overrides * override * fixing up operators * format * sgd * constants * rm * revert * save * failures * fixes * trainer * run test * operator * code * op * ok done * operator * sgd test fixes * ok * trainer * format * Apply suggestions from code review Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com> * Update doc/source/raysgd/raysgd_pytorch.rst * docstring * dcgan * doc * commits * nit * testing * revert * Start renaming pytorch to torch * Rename PyTorchTrainer to TorchTrainer * Rename PyTorch runners to Torch runners * Finish renaming API * Rename to torch in tests * Finish renaming docs + tests * Run format + fix DeprecationWarning * fix * move tests up * benchmarks * rename * remove some args * better metrics output * fix up the benchmark * benchmark-yaml * horovod-benchmark * benchmarks * Remove benchmark code for cleanups * makedatacreator * relax * metrics * autosetsampler * profile * movements * OK * smoothen * fix * nitdocs * loss * comments * fix * fix * runner_tests * codes * example * fix_test * fix * tests Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com> Co-authored-by: Maksim Smolin <maximsmol@gmail.com>	2020-03-10 08:41:42 -07:00
Anthony Yu	89ec4adb72	[tune] Dragonfly Optimizer (#5955 ) * Add sample example * Copy relevant lines of ask from inherited Optimizer * Ignore strategy * Additional changes * Add DragonflySearch for tune connector for Dragonfly * Add example and fix small errors * lint * Remove skopt references * Update example based off of Dragonfly changes * Edit example for final Dragonfly edits * Formatting and documentation edits * Add documentation and add to test pipeline * Address PR comments * Fix Jenkins test * Adjust Dragonfly to PR#7366 * Lint * fix_tests Co-authored-by: Richard Liaw <rliaw@berkeley.edu>	2020-03-10 08:40:36 -07:00
fangfengbin	fa785a2ad2	ServiceBasedGcsClient support detect gcs server availability and retry (#7292 )	2020-03-10 21:01:07 +08:00
mehrdadn	fc76586518	Redis on Windows (#7509 ) * Switch hiredis on Windows to that of the Windows port of Redis * Use boost::asio::ip::tcp::socket::native_handle_type * Use normal hiredis instead of Windows-specific one * Finish up using normal hiredis Co-authored-by: Mehrdad <noreply@github.com>	2020-03-09 18:49:54 -07:00
Eric Liang	90e23a5c43	[iterators] Add duplicate() call and fix broken test case (#7510 )	2020-03-09 17:18:52 -07:00
Edward Oakes	883ee4912d	Return reconcile.Result{}, not nil (#7521 )	2020-03-09 16:27:15 -07:00
Edward Oakes	4ab80eafb9	Deprecate use_pickle flag (#7474 )	2020-03-09 16:03:56 -07:00
Edward Oakes	0c254295b0	Remove experimental.signal API (#7477 ) * Remove experimental.signal API * fix test	2020-03-09 16:03:36 -07:00
Ujval Misra	023d4c02a9	[tune] Prevent deletion of checkpoint from user-initiated resto… (#7501 ) * Fix restore bug * Add test * Lint * Indent	2020-03-09 15:53:10 -07:00
Edward Oakes	08d4cb3822	[operator] Minor cleanup (#7498 )	2020-03-09 11:23:46 -07:00
Edward Oakes	b4e2d5317e	Remove experimental.NoReturn (#7475 )	2020-03-09 11:09:36 -07:00
Edward Oakes	27b4ffa98e	Improve k8s operator documentation (#7496 )	2020-03-09 11:09:06 -07:00
Stephanie Wang	95bb0c5357	Upgrade plasma to latest version, use synchronous Seal (#7470 ) * Upgrade arrow to master * fix build * todo * lint * Fix hanging test	2020-03-09 10:30:44 -07:00
Markus Cozowicz	e03259455f	[autoscaler] azure init script path (#7515 )	2020-03-09 09:49:07 -07:00
Markus Cozowicz	145ebe14c7	added Azure Resource Manager (ARM) template (#7494 ) * added Azure Resource Manager (ARM) template * removed Azure doc (moved to separate PR) * nit * fixpaths * nit Co-authored-by: Richard Liaw <rliaw@berkeley.edu>	2020-03-08 22:29:10 -07:00
Eric Liang	e7bc5c612d	Add testing strategy to PR template (#7505 )	2020-03-08 15:16:49 -07:00
Sven Mika	f08687f550	[RLlib] `rllib train` crashes when using torch PPO/PG/A2C. (#7508 ) * Fix. * Rollback. * TEST. * TEST. * TEST. * TEST. * TEST. * TEST. * TEST. * TEST. * TEST. * TEST. * TEST. * TEST. * TEST. * TEST. * TEST. * TEST. * TEST. * TEST. * TEST. * TEST. * TEST. * TEST. * TEST.	2020-03-08 13:03:18 -07:00
Sven Mika	bc637a2546	[Tune Jenkins tests] Add dm_tree to docker. (#7500 ) * Fix. * Rollback. * Add dm_tree to docker examples and tune_test containers.	2020-03-07 23:16:00 -08:00
Eric Liang	a644060daa	[rllib] First pass at pipeline implementation of DQN (#7433 ) * wip iters * add test * speed up * update docs * document it * support serial sampling * add test * spacing * annotate it * update * rename to pipeline * comment * iter2 wip * update * update * context test * update * fix * fix * a3c pipeline * doc * update * move timer * comment * add piepline test * fix * clean up * document * iter s * wip dqn * wip * wip * metrics * metrics rename * metrics ctx * wip * constants * add todo * suppport .union * wip * support union * remove prints * add todo * remove auto timer * fix up * fix pipeline test * typing * fix breakage * remove bad assert * wip * fix multiagent example * fixapply * update a3c * remove a2c pl * 0 workers * wip * wip * share metrics * wip * wip * doc * fix weight sync and global var updates * mode * fix * fix * doc * fix	2020-03-07 14:47:58 -08:00
Landcold7	beb9b02dbd	Add numba test (#7298 ) (#7487 )	2020-03-07 11:12:25 -08:00
Richard Liaw	115468de2c	[tune] Repeated evals (#7366 ) * easyrepeat * done * suggest * doc * ok * commit * Apply suggestions from code review Co-Authored-By: Ujval Misra <misraujval@gmail.com> * Apply suggestions from code review Co-Authored-By: Ujval Misra <misraujval@gmail.com> * Apply suggestions from code review * ok * docs Co-authored-by: Ujval Misra <misraujval@gmail.com>	2020-03-07 11:08:23 -08:00
mehrdadn	a8bda9b551	Fix incorrect handling of command-lines (#7439 )	2020-03-06 15:51:49 -08:00
Sven Mika	876a1ba5bd	[RLlib] Issue 7421: can't convert cuda tensor to numpy in torch ppo. (#7445 )	2020-03-06 12:45:30 -08:00
Sven Mika	510c850651	[RLlib] SAC add discrete action support. (#7320 ) * Exploration API (+EpsilonGreedy sub-class). * Exploration API (+EpsilonGreedy sub-class). * Cleanup/LINT. * Add `deterministic` to generic Trainer config (NOTE: this is still ignored by most Agents). * Add `error` option to deprecation_warning(). * WIP. * Bug fix: Get exploration-info for tf framework. Bug fix: Properly deprecate some DQN config keys. * WIP. * LINT. * WIP. * Split PerWorkerEpsilonGreedy out of EpsilonGreedy. Docstrings. * Fix bug in sampler.py in case Policy has self.exploration = None * Update rllib/agents/dqn/dqn.py Co-Authored-By: Eric Liang <ekhliang@gmail.com> * WIP. * Update rllib/agents/trainer.py Co-Authored-By: Eric Liang <ekhliang@gmail.com> * WIP. * Change requests. * LINT * In tune/utils/util.py::deep_update() Only keep deep_updat'ing if both original and value are dicts. If value is not a dict, set * Completely obsolete syn_replay_optimizer.py's parameters schedule_max_timesteps AND beta_annealing_fraction (replaced with prioritized_replay_beta_annealing_timesteps). * Update rllib/evaluation/worker_set.py Co-Authored-By: Eric Liang <ekhliang@gmail.com> * Review fixes. * Fix default value for DQN's exploration spec. * LINT * Fix recursion bug (wrong parent c'tor). * Do not pass timestep to get_exploration_info. * Update tf_policy.py * Fix some remaining issues with test cases and remove more deprecated DQN/APEX exploration configs. * Bug fix tf-action-dist * DDPG incompatibility bug fix with new DQN exploration handling (which is imported by DDPG). * Switch off exploration when getting action probs from off-policy-estimator's policy. * LINT * Fix test_checkpoint_restore.py. * Deprecate all SAC exploration (unused) configs. * Properly use `model.last_output()` everywhere. Instead of `model._last_output`. * WIP. * Take out set_epsilon from multi-agent-env test (not needed, decays anyway). * WIP. * Trigger re-test (flaky checkpoint-restore test). * WIP. * WIP. * Add test case for deterministic action sampling in PPO. * bug fix. * Added deterministic test cases for different Agents. * Fix problem with TupleActions in dynamic-tf-policy. * Separate supported_spaces tests so they can be run separately for easier debugging. * LINT. * Fix autoregressive_action_dist.py test case. * Re-test. * Fix. * Remove duplicate py_test rule from bazel. * LINT. * WIP. * WIP. * SAC fix. * SAC fix. * WIP. * WIP. * WIP. * FIX 2 examples tests. * WIP. * WIP. * WIP. * WIP. * WIP. * Fix. * LINT. * Renamed test file. * WIP. * Add unittest.main. * Make action_dist_class mandatory. * fix * FIX. * WIP. * WIP. * Fix. * Fix. * Fix explorations test case (contextlib cannot find its own nullcontext??). * Force torch to be installed for QMIX. * LINT. * Fix determine_tests_to_run.py. * Fix determine_tests_to_run.py. * WIP * Add Random exploration component to tests (fixed issue with "static-graph randomness" via py_function). * Add Random exploration component to tests (fixed issue with "static-graph randomness" via py_function). * Rename some stuff. * Rename some stuff. * WIP. * update. * WIP. * Gumbel Softmax Dist. * WIP. * WIP. * WIP. * WIP. * WIP. * WIP. * WIP * WIP. * WIP. * Hypertune. * Hypertune. * Hypertune. * Lock-in. * Cleanup. * LINT. * Fix. * Update rllib/policy/eager_tf_policy.py Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com> * Update rllib/agents/sac/sac_policy.py Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com> * Update rllib/agents/sac/sac_policy.py Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com> * Update rllib/models/tf/tf_action_dist.py Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com> * Update rllib/models/tf/tf_action_dist.py Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com> * Fix items from review comments. * Add dm_tree to RLlib dependencies. * Add dm_tree to RLlib dependencies. * Fix DQN test cases ((Torch)Categorical). * Fix wrong pip install. Co-authored-by: Eric Liang <ekhliang@gmail.com> Co-authored-by: Kristian Hartikainen <kristian.hartikainen@gmail.com>	2020-03-06 10:37:12 -08:00
Qing Wang	7a33a6ea3c	[Java] Enable skipped direct call cases (#7363 ) * Comment out * Refine * Revert	2020-03-06 16:22:08 +08:00
Stephanie Wang	7c174d0ffe	Make the ref counting test more stressful (#7473 )	2020-03-05 20:51:24 -08:00

4393 commits