hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-06 18:41:40 -05:00

Author	SHA1	Message	Date
Eric Liang	86f89fc3b3	[tune] Higher timeout for progress reporter test (#7679 ) * wip * medium size	2020-03-22 13:47:08 -07:00
Stephanie Wang	ba86a02b37	[core] Revert lineage pinning (#7499 ) (#7692 ) * Revert "fix (#7681)" This reverts commit `6a12a31b2e`. * Revert "[core] Pin lineage of plasma objects that are still in scope (#7499)" This reverts commit `014929e658`.	2020-03-21 18:35:43 -07:00
Simon Mo	89d959fd6a	Stop gap solution for cython functions breaking in memory monitor (#7687 )	2020-03-21 15:16:12 -07:00
Zhijun Fu	a7a5d172b1	[core] fix bug that actor tasks from reconstructed actor is ignored by scheduling queue (#7637 )	2020-03-21 13:05:24 +08:00
Edward Oakes	58dc70f90e	[minor] Remove get_global_worker(), RuntimeContext (#7638 )	2020-03-20 15:45:29 -05:00
Stephanie Wang	014929e658	[core] Pin lineage of plasma objects that are still in scope (#7499 ) * Add a lineage_ref_count to References * Refactor TaskManager to store TaskEntry as a struct * Refactor to fix deadlock between TaskManager and ReferenceCounter Add references to task specs * Pin TaskEntries and References in the lineage of any ObjectIDs in scope * Fix deadlock, convert num_plasma_returns to a set of object IDs * fix unit tests * Feature flag * Do not release lineage for objects that were promoted to plasma * fix build * fix build * Remove num executions * Simplify num return values * Remove unused * doc * Set num returns * Move lineage pinning flag to ReferenceCounter * comments * Fixes * Remove irrelevant test (replaced by ref counting tests)	2020-03-20 10:56:43 -07:00
ZhuSenlin	7d08b418fc	fix test_worker_stats (#7655 ) * fix test_worker_stats * fix lint error * fix lint error Co-authored-by: senlin.zsl <senlin.zsl@antfin.com>	2020-03-20 14:53:40 +08:00
mehrdadn	e69664b74b	Miscellaneous Windows compatibility bugfixes (#7658 ) * Windows compatibility bug fixes * Use WSASend/WSARecv as WSASendMsg/WSARecvMsg do not work with TCP sockets * Clean up some TODOs * Fix duplicate compilations * RedisAsioClient boost::asio::error::connection_reset Co-authored-by: Mehrdad <noreply@github.com>	2020-03-19 19:32:53 -07:00
Eric Liang	5a112ab212	Remove object store memory cap (#7654 )	2020-03-19 16:00:30 -07:00
Clark Zinzow	c37f6e745a	Remove duplicate jsonschema from setup.py (#7665 )	2020-03-19 13:12:47 -07:00
Stephanie Wang	b499100a88	Enable distributed ref counting by default (#7628 ) * enable * Turn on eager eviction * Shorten tests and drain ReferenceCounter * Don't force kill actor handles that have gone out of scope, lint * Fix locks * Cleanup Plasma Async Callback (#7452) * [rllib][tune] fix some nans (#7611) * Change /tmp to platform-specific temporary directory (#7529) * [Serve] UI Improvements (#7569) * bugfix about test_dynres.py (#7615) Co-authored-by: senlin.zsl <senlin.zsl@antfin.com> * Java call Python actor method use actor.call (#7614) * bug fix about useage of absl::flat_hash_map::erase and absl::flat_hash_set::erase (#7633) Co-authored-by: senlin.zsl <senlin.zsl@antfin.com> * [Java] Make both `RayActor` and `RayPyActor` inheriting from `BaseActor` (#7462) * [Java] Fix the issue that the cached value in `RayObject` is serialized (#7613) * Add failure tests to test_reference_counting (#7400) * Fix typo in asyncio documentation (#7602) * Fix segfault * debug * Force kill actor * Fix test	2020-03-18 22:39:21 -07:00
fangfengbin	fca9dc73e1	Fix test_raylet_pending_tasks test case failed (#7636 )	2020-03-19 11:09:38 +08:00
Seung Hyeon, Kim	ee49f4a875	[tune] Fix an example for _Brackets of async hyperband scheduler (#7538 )	2020-03-18 19:06:32 -07:00
Richard Liaw	ea10cd212c	[tune] add accessible trial_info (#7378 ) * add accessible trial_info * trial name and info * doc * fix gp * Update doc/source/tune-package-ref.rst * Apply suggestions from code review * fix * trial * fixtest * testfix	2020-03-17 23:44:18 -07:00
Eric Liang	745b9d643d	First pass at `ray memory` command for memory debugging (#7589 )	2020-03-17 20:45:07 -07:00
Edward Oakes	c1b0f9ccdf	Add failure tests to test_reference_counting (#7400 )	2020-03-17 10:30:21 -05:00
fyrestone	7697ea2be2	Java call Python actor method use actor.call (#7614 )	2020-03-17 14:52:43 +08:00
Simon Mo	ce0885a897	[Serve] UI Improvements (#7569 )	2020-03-16 22:23:16 -07:00
mehrdadn	a0700e2f86	Change /tmp to platform-specific temporary directory (#7529 )	2020-03-16 18:10:14 -07:00
Eric Liang	797e6cfc2a	[rllib][tune] fix some nans (#7611 )	2020-03-16 11:19:58 -07:00
ijrsvt	46953c53b1	Cleanup Plasma Async Callback (#7452 )	2020-03-16 10:12:44 -07:00
Scott Graham	37e4d29f87	[autoscaler] Adding Azure Support (#7080 ) * adding directory and node_provider entry for azure autoscaler * adding initial cut at azure autoscaler functionality, needs testing and node_provider methods need updating * adding todos and switching to auth file for service principal authentication * adding role / scope to service principal * resolving issues with app credentials * adding retry for setting service principal role * typo and adding retry to nic creation * adding nsg to config, moving nic/public ip to node provider, cleanup node_provider, leaving in NodeProvider stub for testing * linting * updating cleanup and fixing bugs * adding directory and node_provider entry for azure autoscaler * adding initial cut at azure autoscaler functionality, needs testing and node_provider methods need updating * adding todos and switching to auth file for service principal authentication * adding role / scope to service principal * resolving issues with app credentials * adding retry for setting service principal role * typo and adding retry to nic creation * adding nsg to config, moving nic/public ip to node provider, cleanup node_provider, leaving in NodeProvider stub for testing * linting * updating cleanup and fixing bugs * minor fixes * first working version :) * added tag support * added msi identity intermediate * enable MSI through user managed identity * updated schema * extend yaml schema remove service principal code add re-use of managed user identity * fix rg_id * fix logging * replace manual cluster yaml validation with json schema - improved error message - support for intellisense in VSCode (or other IDEs) * run linting * updating yaml configs and formatting * updating yaml configs and formatting * typo in example config * pulling default config from example-full * resetting min, init worker prop * adding docs for azure autoscaler and fixing status * add azure to docs, fix config for spot instances, update azure provider to avoid caching issues during deployment * fix for default subscription in azure node provider * vm dev image build * minor change * keeping example-full.yaml in autoscaler/azure, updating azure example config * linting azure config * extending retries on azure config * lint * support for internal ips, fix to azure docs, and new azure gpu example config * linting * Update python/ray/autoscaler/azure/node_provider.py Co-Authored-By: Richard Liaw <rliaw@berkeley.edu> * revert_this * remove_schema * updating configs and removing ssh keygen, tweak azure node provider terminate * minor tweaks Co-authored-by: Markus Cozowicz <marcozo@microsoft.com> Co-authored-by: Ubuntu <marcozo@mc-ray-jumpbox.chcbtljllnieveqhw3e4c1ducc.xx.internal.cloudapp.net> Co-authored-by: Richard Liaw <rliaw@berkeley.edu>	2020-03-15 14:48:27 -07:00
Simon Mo	3f1fcaa024	Blocking ray.get/wait inside async context will warn instead of error (#7262 )	2020-03-14 22:02:30 -07:00
Kai Yang	630e48967d	[Java] Allow passing internal config from raylet to Java worker (#7532 )	2020-03-15 12:03:38 +08:00
Stephanie Wang	53549314c5	[core] Option to fallback to LRU on OutOfMemory (#7410 ) * Add a test for LRU fallback * Update error message * Upgrade arrow to master * Integrate with arrow * Revert "Bazel mirrors (#7385)" This reverts commit `44aded5272`. * Don't LRU evict * Revert "Revert "Bazel mirrors (#7385)"" This reverts commit b6359fea78d1bd3925452ca88ac71e0c9e5c7dd3. * Add lru_evict flag * fix internal config * Fix * upgrade arrow * debug * Set free period in config for lru_evict, override max retries to fix test * Fix test? * fix test * Revert "debug" This reverts commit 98f01c63a267f38218f5047b1866e4c1c8280017. * fix exception str * Fix ref count test * Shorten travis test?	2020-03-14 11:28:43 -07:00
Anthony Yu	094125cf03	[tune] Dragonfly integration ask tell nit (#7593 ) * Add sample example * Copy relevant lines of ask from inherited Optimizer * Ignore strategy * Additional changes * Add DragonflySearch for tune connector for Dragonfly * Add example and fix small errors * lint * Remove skopt references * Update example based off of Dragonfly changes * Edit example for final Dragonfly edits * Formatting and documentation edits * Add documentation and add to test pipeline * Address PR comments * Fix Jenkins test * Adjust Dragonfly to PR#7366 * Lint * fix_tests * Minor changes to ordering Co-authored-by: Richard Liaw <rliaw@berkeley.edu>	2020-03-13 15:27:03 -07:00
Kai Yang	d6e8f47065	Add a flag to disable reconstruction for a killed actor (#7346 )	2020-03-13 19:10:21 +08:00
Ujval Misra	6022eb53c4	[tune] Use newest checkpoint in normal operation (#7563 ) * Use persistent checkpoint for failures * Fix test * Add unpause test * move test * Fix tests * remove debug statement * Mark test as flaky	2020-03-12 22:21:42 -07:00
ZhuSenlin	b663bc6d67	Use gcs server to replace raylet monitor when RAY_GCS_SERVICE_ENABLED=true (#7166 )	2020-03-12 22:13:56 +08:00
Eric Liang	f5d12a958b	[rllib] Port Ape-X to distributed execution API (#7497 )	2020-03-12 00:54:08 -07:00
Kai Yang	932a749fa9	Fix the `java_worker_options` parameter (#7537 ) * fix Java CI * Minor fix * move json.loads out of build_java_worker_command * lint * fix cross language test	2020-03-12 10:44:23 +08:00
Simon Mo	31d63d3ca7	Fix global state actors() call (#7567 )	2020-03-11 16:59:50 -07:00
Richard Liaw	b70f31339c	[sgd] Benchmark Fixes (#7553 ) * fix * fix	2020-03-11 13:08:27 -07:00
Markus Cozowicz	ea99063c10	added json schema to setup.py (#7554 )	2020-03-11 09:53:21 -07:00
mehrdadn	3b9caa98ba	Fix fate-sharing warning (#7545 ) * Fix kernel_fate_sharing being None instead of False * Remove fate-sharing warning Co-authored-by: Mehrdad <noreply@github.com>	2020-03-11 08:27:54 -07:00
Richard Liaw	fbac256982	[sgd] Add benchmarks (#7454 ) * Init fp16 * fp16 and schedulers * scheduler linking and fp16 * to fp16 * loss scaling and documentation * more documentation * add tests, refactor config * moredocs * more docs * fix logo, add test mode, add fp16 flag * fix tests * fix scheduler * fix apex * improve safety * fix tests * fix tests * remove pin memory default * rm * fix * Update doc/examples/doc_code/raysgd_torch_signatures.py * fix * migrate changes from other PR * ok thanks * pass * signatures * lint' * Update python/ray/experimental/sgd/pytorch/utils.py * Apply suggestions from code review Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com> * should address most comments * comments * fix this ci * first_pass * add overrides * override * fixing up operators * format * sgd * constants * rm * revert * save * failures * fixes * trainer * run test * operator * code * op * ok done * operator * sgd test fixes * ok * trainer * format * Apply suggestions from code review Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com> * Update doc/source/raysgd/raysgd_pytorch.rst * docstring * dcgan * doc * commits * nit * testing * revert * Start renaming pytorch to torch * Rename PyTorchTrainer to TorchTrainer * Rename PyTorch runners to Torch runners * Finish renaming API * Rename to torch in tests * Finish renaming docs + tests * Run format + fix DeprecationWarning * fix * move tests up * benchmarks * rename * remove some args * better metrics output * fix up the benchmark * benchmark-yaml * horovod-benchmark * benchmarks * Remove benchmark code for cleanups * benchmark-code * nits * benchmark yamls * benchmark yaml * ok * ok * ok * benchmark * nit * finish_bench * makedatacreator * relax * metrics * autosetsampler * profile * movements * OK * smoothen * fix * nitdocs * loss * envflag * comments * nit * format * visible * images * move_images * fix * rernder * rrender * rest * multgpu * fix * nit * finish * extrra * setup * revert Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com> Co-authored-by: Maksim Smolin <maximsmol@gmail.com>	2020-03-11 01:09:08 -07:00
Markus Cozowicz	49439611f1	[autoscaler] Replace cluster yaml validation with json schema v… (#7261 ) * replace manual cluster yaml validation with json schema - improved error message - support for intellisense in VSCode (or other IDEs) - run linting - moved schema to ray/autoscaler - fixed typo - remove importlib dependency * Update python/ray/autoscaler/autoscaler.py * read * restrict allowed properties * added unit test for invalid yaml added ray[test] package (remove pytest from default dependencies) * updated autoscaler test to use ValidationError exception * add missing dependency * added pytest * replace manual cluster yaml validation with json schema - improved error message - support for intellisense in VSCode (or other IDEs) - run linting - moved schema to ray/autoscaler - fixed typo - remove importlib dependency * Update python/ray/autoscaler/autoscaler.py * read * restrict allowed properties * added unit test for invalid yaml added ray[test] package (remove pytest from default dependencies) * updated autoscaler test to use ValidationError exception * add missing dependency * added pytest * removed parameterized dependency reverted ray[test] intro * removed parameterized * fix_tests * format Co-authored-by: Ubuntu <marcozo@mc-ray-jumpbox.chcbtljllnieveqhw3e4c1ducc.xx.internal.cloudapp.net> Co-authored-by: Richard Liaw <rliaw@berkeley.edu>	2020-03-10 18:58:55 -07:00
Richard Liaw	6163b21458	[raysgd] Better user errors! (#7546 ) * format * callable * Update python/ray/util/sgd/torch/torch_trainer.py Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com> * Update python/ray/util/sgd/torch/torch_trainer.py Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com> * data * torchtrainer * num_rep Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>	2020-03-10 18:58:19 -07:00
Edward Oakes	7b609ca211	Remove instances of 'raise Exception' (#7523 )	2020-03-10 17:51:22 -07:00
Stephanie Wang	fdb528514b	[core] Ref counting for actor handles (#7434 ) * tmp * Move Exit handler into CoreWorker, exit once owner's ref count goes to 0 * fix build * Remove __ray_terminate__ and add test case for distributed ref counting * lint * Remove unused * Fixes for detached actor, duplicate actor handles * Remove unused * Remove creation return ID * Remove ObjectIDs from python, set references in CoreWorker * Fix crash * Fix memory crash * Fix tests * fix * fixes * fix tests * fix java build * fix build * fix * check status * check status	2020-03-10 17:45:07 -07:00
Richard Liaw	d192ef0611	[raysgd] Cleanup User API (#7384 ) * Init fp16 * fp16 and schedulers * scheduler linking and fp16 * to fp16 * loss scaling and documentation * more documentation * add tests, refactor config * moredocs * more docs * fix logo, add test mode, add fp16 flag * fix tests * fix scheduler * fix apex * improve safety * fix tests * fix tests * remove pin memory default * rm * fix * Update doc/examples/doc_code/raysgd_torch_signatures.py * fix * migrate changes from other PR * ok thanks * pass * signatures * lint' * Update python/ray/experimental/sgd/pytorch/utils.py * Apply suggestions from code review Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com> * should address most comments * comments * fix this ci * first_pass * add overrides * override * fixing up operators * format * sgd * constants * rm * revert * save * failures * fixes * trainer * run test * operator * code * op * ok done * operator * sgd test fixes * ok * trainer * format * Apply suggestions from code review Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com> * Update doc/source/raysgd/raysgd_pytorch.rst * docstring * dcgan * doc * commits * nit * testing * revert * Start renaming pytorch to torch * Rename PyTorchTrainer to TorchTrainer * Rename PyTorch runners to Torch runners * Finish renaming API * Rename to torch in tests * Finish renaming docs + tests * Run format + fix DeprecationWarning * fix * move tests up * benchmarks * rename * remove some args * better metrics output * fix up the benchmark * benchmark-yaml * horovod-benchmark * benchmarks * Remove benchmark code for cleanups * makedatacreator * relax * metrics * autosetsampler * profile * movements * OK * smoothen * fix * nitdocs * loss * comments * fix * fix * runner_tests * codes * example * fix_test * fix * tests Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com> Co-authored-by: Maksim Smolin <maximsmol@gmail.com>	2020-03-10 08:41:42 -07:00
Anthony Yu	89ec4adb72	[tune] Dragonfly Optimizer (#5955 ) * Add sample example * Copy relevant lines of ask from inherited Optimizer * Ignore strategy * Additional changes * Add DragonflySearch for tune connector for Dragonfly * Add example and fix small errors * lint * Remove skopt references * Update example based off of Dragonfly changes * Edit example for final Dragonfly edits * Formatting and documentation edits * Add documentation and add to test pipeline * Address PR comments * Fix Jenkins test * Adjust Dragonfly to PR#7366 * Lint * fix_tests Co-authored-by: Richard Liaw <rliaw@berkeley.edu>	2020-03-10 08:40:36 -07:00
Eric Liang	90e23a5c43	[iterators] Add duplicate() call and fix broken test case (#7510 )	2020-03-09 17:18:52 -07:00
Edward Oakes	4ab80eafb9	Deprecate use_pickle flag (#7474 )	2020-03-09 16:03:56 -07:00
Edward Oakes	0c254295b0	Remove experimental.signal API (#7477 ) * Remove experimental.signal API * fix test	2020-03-09 16:03:36 -07:00
Ujval Misra	023d4c02a9	[tune] Prevent deletion of checkpoint from user-initiated resto… (#7501 ) * Fix restore bug * Add test * Lint * Indent	2020-03-09 15:53:10 -07:00
Edward Oakes	b4e2d5317e	Remove experimental.NoReturn (#7475 )	2020-03-09 11:09:36 -07:00
Stephanie Wang	95bb0c5357	Upgrade plasma to latest version, use synchronous Seal (#7470 ) * Upgrade arrow to master * fix build * todo * lint * Fix hanging test	2020-03-09 10:30:44 -07:00
Eric Liang	a644060daa	[rllib] First pass at pipeline implementation of DQN (#7433 ) * wip iters * add test * speed up * update docs * document it * support serial sampling * add test * spacing * annotate it * update * rename to pipeline * comment * iter2 wip * update * update * context test * update * fix * fix * a3c pipeline * doc * update * move timer * comment * add piepline test * fix * clean up * document * iter s * wip dqn * wip * wip * metrics * metrics rename * metrics ctx * wip * constants * add todo * suppport .union * wip * support union * remove prints * add todo * remove auto timer * fix up * fix pipeline test * typing * fix breakage * remove bad assert * wip * fix multiagent example * fixapply * update a3c * remove a2c pl * 0 workers * wip * wip * share metrics * wip * wip * doc * fix weight sync and global var updates * mode * fix * fix * doc * fix	2020-03-07 14:47:58 -08:00
Landcold7	beb9b02dbd	Add numba test (#7298 ) (#7487 )	2020-03-07 11:12:25 -08:00

2152 commits