DK.Pino
7f6d326ad8
[Placement Group]Add detached support for placement group. ( #13582 )
2021-01-27 18:51:26 +08:00
SangBin Cho
d2963f4ee1
[Object Spilling] Clean up FS storage upon sigint for ray.init(). ( #13649 )
...
* Initial iteration done.
* Remove unnecessary messages.
* Addressed code review.
* Addressed code review.
* fix issues.
* addressed code review.
* Addressed the last code review.
2021-01-26 23:10:29 -08:00
SangBin Cho
8baafacb1e
[Logging] Log rotation config ( #13375 )
...
* In Progress.
* formatting.
* in progress.
* linting.
* Done.
* Fix typo.
* Fixed the issue.
2021-01-26 20:15:55 -08:00
Simon Mo
9cf0c49015
[CI] Skip test_multi_node_3 on Windows ( #13723 )
...
test_multi_node_3 was recently split from test_multi_node, but we forgot
to skip it on Windows
2021-01-26 16:12:13 -08:00
Ian Rodney
4db0a31130
[Core] Better error if /dev/shm is too small ( #13624 )
2021-01-26 15:26:45 -08:00
Rand Xie
4f4e1b664b
Fix multiprocessing starmap to allow passing in zip ( #13664 )
2021-01-26 16:15:35 -06:00
Simon Mo
2f482193b9
Revert "[CLI] Fix Ray Status with ENV Variable set ( #13707 )" ( #13719 )
...
This reverts commit 5d82654022.
2021-01-26 14:14:51 -08:00
Ian Rodney
ab6a634a94
[Serve] Revert "Revert "[Serve] Refactor BackendState" ( #13626 ) ( #13697 )
2021-01-26 15:31:01 -06:00
Barak Michener
f490e2be43
[ray_client] Fix and extend get_actor test to detached actors ( #13016 )
2021-01-26 15:19:51 -06:00
Amog Kamsetty
6b477dd37a
[CI] Split test_multi_node to avoid timeouts ( #13712 )
2021-01-26 12:06:19 -08:00
Barak Michener
0c46d09940
[ray_client]: Monitor client stream errors ( #13386 )
2021-01-26 10:56:56 -08:00
Ian Rodney
5d82654022
[CLI] Fix Ray Status with ENV Variable set ( #13707 )
2021-01-26 10:29:42 -08:00
Dmitri Gekhtman
ddcbd229ba
Rename the ray.operator module to ray.ray_operator ( #13705 )
...
* Rename ray.operator module
* mypy
2021-01-26 10:29:07 -08:00
Amog Kamsetty
4aff86bfa7
[CI] skip failing java tests ( #13702 )
2021-01-26 10:17:58 -08:00
Edward Oakes
5d882b062d
[Serve] fix k8s doc ( #13713 )
2021-01-26 10:09:13 -08:00
dependabot[bot]
148b1022d6
[tune](deps): Bump autogluon-core in /python/requirements ( #13698 )
...
Bumps [autogluon-core](https://github.com/awslabs/autogluon ) from 0.0.16b20210122 to 0.0.16b20210125.
- [Release notes](https://github.com/awslabs/autogluon/releases )
- [Changelog](https://github.com/awslabs/autogluon/blob/master/docs/ReleaseInstructions.md )
- [Commits](https://github.com/awslabs/autogluon/commits )
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-01-26 11:32:56 +01:00
dependabot[bot]
ef1f7e4d42
[tune](deps): Bump smart-open[s3] in /python/requirements ( #13699 )
...
Bumps [smart-open[s3]](https://github.com/piskvorky/smart_open ) from 4.0.1 to 4.1.2.
- [Release notes](https://github.com/piskvorky/smart_open/releases )
- [Changelog](https://github.com/RaRe-Technologies/smart_open/blob/develop/CHANGELOG.md )
- [Commits](https://github.com/piskvorky/smart_open/compare/4.0.1...v4.1.2 )
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-01-26 11:32:17 +01:00
Hao Zhang
7a78f4e959
[Collective][PR 4/6] NCCL Communicator caching and preliminary stream management ( #13030 )
...
Co-authored-by: Dacheng Li <dal177@ucsd.edu>
2021-01-26 01:05:21 -08:00
Alex Wu
840987c7af
Scalability Envelope Tests ( #13464 )
2021-01-25 18:48:31 -08:00
Simon Mo
f2867b0609
[CI] Remove object_manager_test ( #13703 )
...
0998d69968 removed the object_manager_test.
2021-01-25 17:33:41 -08:00
Simon Mo
fe8262afd0
Add K8s test to release process ( #13694 )
2021-01-25 16:53:52 -08:00
Simon Mo
8b8d6b984b
[Buildkite] Add all Python tests ( #13566 )
2021-01-25 16:05:59 -08:00
dependabot[bot]
0d75f37c1f
[tune](deps): Bump distributed in /python/requirements ( #13643 )
...
Bumps [distributed](https://github.com/dask/distributed ) from 2020.12.0 to 2021.1.1.
- [Release notes](https://github.com/dask/distributed/releases )
- [Changelog](https://github.com/dask/distributed/blob/master/docs/release-procedure.md )
- [Commits](https://github.com/dask/distributed/compare/2020.12.0...2021.01.1 )
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-01-26 00:03:38 +01:00
Amog Kamsetty
9feae90e3b
skip test_spill ( #13693 )
2021-01-25 14:37:07 -08:00
Amog Kamsetty
d96a9fa192
Revert "Revert "[dashboard] Fix RAY_RAYLET_PID KeyError on Windows ( #12948 )" ( #13572 )" ( #13685 )
...
This reverts commit c4a710369b.
2021-01-25 10:35:25 -08:00
Edward Oakes
1c77cc7e23
[docs] Remove API warning from mp.Pool ( #13683 )
2021-01-25 09:59:46 -08:00
Dmitri Gekhtman
79209110c5
[kubernetes][operator][hotfix] Dictionary fix ( #13663 )
2021-01-25 10:40:59 -06:00
Lingxuan Zuo
f9f2bfa778
[Metric] Fix crashed when register metric view in multithread ( #13485 )
...
* Fix crashed when register metric view in multithread
* fix comments
* fix
2021-01-25 20:32:08 +08:00
DK.Pino
db2c836587
[Placement Group] Move PlacementGroup public method to interface. ( #13629 )
2021-01-25 20:14:21 +08:00
Maltimore
b4702de1c2
[RLlib] move evaluation to trainer.step() such that the result is properly logged ( #12708 )
2021-01-25 12:56:00 +01:00
Jan Blumenkamp
964689b280
[RLlib] Fix bug in ModelCatalog when using custom action distribution ( #12846 )
...
* return tuple returned from _get_multi_action_distribution when using custom action dict
* Always return dst_class and required_model_output_shape in _get_multi_action_distribution
* pass model config to _get_multi_action_distribution
2021-01-25 12:42:39 +01:00
Sven Mika
9423930bcc
[RLlib] MAML: Add cartpole mass test for PyTorch. ( #13679 )
2021-01-25 12:32:41 +01:00
Kai Yang
e9103eeb6d
[Java] [Test] Move multi-worker config to ray.conf file ( #13583 )
2021-01-25 18:07:45 +08:00
Ameer Haj Ali
4dabf017ee
Close #12031 (Autoscaler is overriding your resource for same quantity) ( #13671 )
2021-01-24 16:31:53 -08:00
SangBin Cho
edbb2937d3
[Object Spilling] Multi node file spilling V2. ( #13542 )
...
* done.
* done.
* Fix a mistake.
* Ready.
* Fix issues.
* fix.
* Finished the first round of code review.
* formatting.
* In progress.
* Formatting.
* Addressed code review.
* Formatting
* Fix tests.
* fix bugs.
* Skip flaky tests for now.
2021-01-23 23:15:32 -08:00
Barak Michener
e675e5b75a
[ray_client]: Add more retry logic ( #13478 )
2021-01-23 23:11:39 -08:00
Ameer Haj Ali
b7dd7ddb52
deprecate useless fields in the cluster yaml. ( #13637 )
...
* prepare for head node
* move command runner interface outside _private
* remove space
* Eric
* flake
* min_workers in multi node type
* fixing edge cases
* eric not idle
* fix target_workers to consider min_workers of node types
* idle timeout
* minor
* minor fix
* test
* lint
* eric v2
* eric 3
* min_workers constraint before bin packing
* Update resource_demand_scheduler.py
* Revert "Update resource_demand_scheduler.py"
This reverts commit 818a63a2c86d8437b3ef21c5035d701c1d1127b5.
* reducing diff
* make get_nodes_to_launch return a dict
* merge
* weird merge fix
* auto fill instance types for AWS
* Alex/Eric
* Update doc/source/cluster/autoscaling.rst
* merge autofill and input from user
* logger.exception
* make the yaml use the default autofill
* docs Eric
* remove test_autoscaler_yaml from windows tests
* lets try changing the test a bit
* return test
* lets see
* edward
* Limit max launch concurrency
* commenting frac TODO
* move to resource demand scheduler
* use STATUS UP TO DATE
* Eric
* make logger of gc freed refs debug instead of info
* add cluster name to docker mount prefix directory
* grrR
* fix tests
* moving docker directory to sdk
* move the import to prevent circular dependency
* smallf fix
* ian
* fix max launch concurrency bug to assume failing nodes as pending and consider only load_metric's connected nodes as running
* small fix
* deflake test_joblib
* lint
* placement groups bypass
* remove space
* Eric
* first ocmmit
* lint
* exmaple
* documentation
* hmm
* file path fix
* fix test
* some format issue in docs
* modified docs
* joblib strikes again on windows
* add ability to not start autoscaler/monitor
* a
* remove worker_default
* Remove default pod type from operator
* Remove worker_default_node_type from rewrite_legacy_yaml_to_availble_node_types
* deprecate useless fields
Co-authored-by: Ameer Haj Ali <ameerhajali@ameers-mbp.lan>
Co-authored-by: Alex Wu <alex@anyscale.io>
Co-authored-by: Alex Wu <itswu.alex@gmail.com>
Co-authored-by: Eric Liang <ekhliang@gmail.com>
Co-authored-by: Ameer Haj Ali <ameerhajali@Ameers-MacBook-Pro.local>
Co-authored-by: root <root@ip-172-31-56-188.us-west-2.compute.internal>
Co-authored-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>
2021-01-23 12:06:51 -08:00
Kai Fricke
17760e1510
[tune] update Optuna integration to 2.4.0 API ( #13631 )
...
Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com>
2021-01-23 00:32:37 -08:00
Qing Wang
8ef835ff03
Remove idle actor from worker pool. ( #13523 )
2021-01-23 13:57:30 +08:00
Amog Kamsetty
01d74af89d
[horovod] Horovod+Ray Pytorch Lightning Accelerator ( #13458 )
2021-01-22 16:30:10 -08:00
Amog Kamsetty
25e1b78eed
[Dependencies] Move requirements.txt to requirements directory. ( #13636 )
2021-01-22 16:29:05 -08:00
architkulkarni
0c3d9a3eaa
[Metrics] Fix serialization for custom metrics ( #13571 )
2021-01-22 14:11:59 -06:00
Amog Kamsetty
c4a710369b
Revert "[dashboard] Fix RAY_RAYLET_PID KeyError on Windows ( #12948 )" ( #13572 )
...
This reverts commit ef6d859e9b.
2021-01-22 14:10:24 -06:00
Dmitri Gekhtman
7fec19dad2
[kubernetes][operator][minutiae] Backwards compatibility of operator ( #13623 )
2021-01-22 14:07:25 -06:00
Sven Mika
d629292d63
[RLlib] Add grad_clip config option to MARWIL and stabilize grad clipping against inf global_norms. ( #13634 )
2021-01-22 19:36:02 +01:00
architkulkarni
da5928304a
[Metrics] Cache metrics ports in a file at each node ( #13501 )
...
* cache metric ports in a file at each node
* remove old assignment of export port
* lint
* lint
* move e2e test to top of file to avoid shutdown bug
2021-01-22 09:59:20 -08:00
Kai Yang
90f1e408de
[Java] Add fetchLocal parameter in Ray.wait() ( #13604 )
2021-01-22 17:55:00 +08:00
Amog Kamsetty
00c14ce4a4
[Object Spilling] Skip flaky tests ( #13628 )
...
* skip flaky tests
* lint
* skip one more
* fix
2021-01-22 00:31:33 -08:00
Amog Kamsetty
39755fdb20
Revert "[Serve] Refactor BackendState" ( #13626 )
...
This reverts commit 68038741ac.
2021-01-21 23:06:15 -08:00
Tao Wang
aa5d7a5e6c
[Dashboard]Don't set node actors when node_id of actor is Nil ( #13573 )
...
* Don't set node actors when node_id of actor is Nil
* add test per comment
2021-01-21 20:18:34 -08:00