Eric Liang
7f342eb371
Update example shuffle script ( #14021 )
2021-02-09 20:47:41 -08:00
Clark Zinzow
79c7c181f3
[dask-on-ray] Add multiple return DataFrame shuffle optimization. ( #13951 )
2021-02-09 15:39:48 -08:00
Kai Yang
e0b81796c5
Revert "Revert "[Java] fix test hang occasionally when running FailureTest ( #13934 )" ( #13992 )" ( #14008 )
2021-02-09 12:43:26 -08:00
Simon Mo
f51c26bae6
Revert "[Core]Fix ray.kill doesn't cancel pending actor bug ( #13254 )" ( #14013 )
This reverts commit 2092b097ea.
2021-02-09 11:36:38 -08:00
Alex Wu
1dcdfe9101
[autoscaler/dashboard] Publish resource usage in units of bytes ( #14002 )
2021-02-09 10:27:26 -08:00
Crissman Loomis
43083b9653
[docs] optuna variable typo ( #14006 )
* fix variable name typo
* align
2021-02-09 09:51:29 -08:00
Kai Fricke
3c8b164882
[tune] pass trainable function name when using tune.with_parameters ( #14009 )
2021-02-09 08:51:14 -08:00
Sven Mika
d7301a51f4
[RLlib]: Trajectory View API: Keep env infos (e.g. for postprocessing callbacks), no matter what. ( #13555 )
2021-02-09 17:05:26 +01:00
fangfengbin
2092b097ea
[Core]Fix ray.kill doesn't cancel pending actor bug ( #13254 )
2021-02-09 10:59:14 +08:00
Simon Mo
914696ac3f
Skip placement tests on Windows ( #14000 )
2021-02-08 18:27:11 -08:00
Dmitri Gekhtman
081f3e5f07
[autoscaler][kubernetes] Ray client setup, example config simplification, example scripts. ( #13920 )
2021-02-08 20:00:34 -06:00
Ameer Haj Ali
1643bc5c4f
Fix autoscaler wrong parameter names ( #13966 )
* prepare for head node
* move command runner interface outside _private
* remove space
* Eric
* flake
* min_workers in multi node type
* fixing edge cases
* eric not idle
* fix target_workers to consider min_workers of node types
* idle timeout
* minor
* minor fix
* test
* lint
* eric v2
* eric 3
* min_workers constraint before bin packing
* Update resource_demand_scheduler.py
* Revert "Update resource_demand_scheduler.py"
This reverts commit 818a63a2c86d8437b3ef21c5035d701c1d1127b5.
* reducing diff
* make get_nodes_to_launch return a dict
* merge
* weird merge fix
* auto fill instance types for AWS
* Alex/Eric
* Update doc/source/cluster/autoscaling.rst
* merge autofill and input from user
* logger.exception
* make the yaml use the default autofill
* docs Eric
* remove test_autoscaler_yaml from windows tests
* let's try changing the test a bit
* return test
* let's see
* edward
* Limit max launch concurrency
* commenting frac TODO
* move to resource demand scheduler
* use STATUS UP TO DATE
* Eric
* make logger of gc freed refs debug instead of info
* add cluster name to docker mount prefix directory
* grrR
* fix tests
* moving docker directory to sdk
* move the import to prevent circular dependency
* small fix
* ian
* fix max launch concurrency bug to assume failing nodes as pending and consider only load_metric's connected nodes as running
* small fix
* improve code readability
* lint
Co-authored-by: Ameer Haj Ali <ameerhajali@ameers-mbp.lan>
Co-authored-by: Alex Wu <alex@anyscale.io>
Co-authored-by: Alex Wu <itswu.alex@gmail.com>
Co-authored-by: Eric Liang <ekhliang@gmail.com>
Co-authored-by: Ameer Haj Ali <ameerhajali@Ameers-MacBook-Pro.local>
2021-02-08 13:19:33 -08:00
SongGuyang
09242e6d31
random a job id in c++ worker ( #13982 )
2021-02-08 12:57:25 -08:00
Simon Mo
ec94214957
Revert "[Java] fix test hang occasionally when running FailureTest ( #13934 )" ( #13992 )
This reverts commit bcf9457abb.
2021-02-08 11:30:30 -08:00
SangBin Cho
0e07b5fa89
[Doc] Update actor resource information ( #13909 )
* in progress.
* Revert "in progress."
This reverts commit 21a91a47522797210bdc5db9477bd0b02ed9d926.
* done.
* done.
2021-02-08 10:23:57 -08:00
Sven Mika
eb0038612f
[RLlib] Extend on_learn_on_batch callback to allow for custom metrics to be added. ( #13584 )
2021-02-08 15:02:19 +01:00
Chace Ashcraft
ebeee1d59a
[RLlib] Pytorch MAML fix for more than two workers with discrete actions ( #13835 )
2021-02-08 12:06:02 +01:00
Sven Mika
d001af3e59
[RLlib] Allow rllib rollout to run distributed via evaluation workers. ( #13718 )
2021-02-08 12:05:16 +01:00
Kai Yang
bcf9457abb
[Java] fix test hang occasionally when running FailureTest ( #13934 )
2021-02-08 18:21:50 +08:00
Xianyang Liu
918ad84f08
[core] Java worker should respect the user provided node_ip_address ( #13732 )
2021-02-08 11:59:06 +08:00
Richard Liaw
7231b6b91c
[core/client] enable more tests ( #13961 )
2021-02-07 19:37:52 -08:00
Richard Liaw
3a230fa1a4
[ray_client] close ray connection upon client deactivation ( #13919 )
2021-02-07 13:11:38 -08:00
Kai Yang
4b4941435d
[Java] fix actor restart failure when multi-worker is turned on ( #13793 )
2021-02-07 21:12:54 +08:00
Devin Petersohn
1412f3c546
[docs] page for using Modin with Ray ( #13937 )
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-02-06 00:28:04 -08:00
Clark Zinzow
f070b3c9a9
[dask-on-ray] Fix Dask-on-Ray test: Python 3 dictionary .values() is a view, and is not indexable ( #13945 )
2021-02-05 21:21:41 -08:00
Simon Mo
ea4154df80
[Hotfix] Master compilation error on MacOS. ( #13946 )
2021-02-05 16:07:45 -08:00
Travis Addair
cbd3598970
[tune] Fixed wait_for_gpu to handle str representations of ordinal IDs ( #13936 )
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-02-05 15:41:24 -08:00
Hao Chen
e1a5e5bad4
Fix test_actor_restart ( #13901 )
2021-02-05 14:08:43 -08:00
Simon Mo
4a3dd6858d
Buildkite determine-to-run support ( #13866 )
2021-02-05 12:58:07 -08:00
Amog Kamsetty
f44f368eae
[Tune] Add try-except to FailureInjectorCallback ( #13939 )
2021-02-05 11:02:42 -08:00
Eric Liang
f782ed59a0
Ray client version check strict eq ( #13926 )
2021-02-05 00:06:10 -08:00
fyrestone
eee624cf5f
Revert "Fix passing env on windows ( #13253 )" ( #13828 )
2021-02-05 13:03:16 +08:00
fangfengbin
8a5999c12a
[GCS]Fix bug that gcs client does not set last_resource_usage_ ( #13856 )
2021-02-05 11:51:25 +08:00
DK.Pino
fb89f9c2c8
[Placement Group] Support named placement group ( #13755 )
2021-02-05 11:04:51 +08:00
Dmitri Gekhtman
40bad86c7a
[hotfix][test][windows] Exclude k8s operator mock test from build. ( #13924 )
2021-02-04 18:35:10 -08:00
Kathryn Zhou
982c606b86
Add more user-friendly error message upon async def remote task ( #13915 )
2021-02-04 18:33:33 -08:00
architkulkarni
e89bbcbd44
[Serve] Revert "Revert "[Serve] Fix ServeHandle serialization"" and disable failing Windows test ( #13771 )
2021-02-04 14:50:01 -08:00
Edward Oakes
7af0c999f3
[serve] Built-in support for imported backends ( #13867 )
2021-02-04 15:09:12 -06:00
Dmitri Gekhtman
db59736b1a
[autoscaler][kubernetes] Add ability to not copy cluster config to head node when calling create_or_update_head_node. ( #13720 )
* Add option to skip bootstrapping head node autoscaling config
* don't close remote config before copying
* Type
* Type hints etc.
* test
* Test CR to config conversion
* comment
2021-02-04 10:30:03 -08:00
Kai Fricke
1e113d2e6e
[tune/xgboost] Update release test docs ( #13880 )
* Update release test docs
* Update
2021-02-04 13:10:56 +01:00
Richard Liaw
6c77aeb98a
[docs] ray slack remove banners ( #13898 )
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
2021-02-04 01:14:34 -08:00
Richard Liaw
0fc81e2393
[tune] fix gpu check ( #13825 )
Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com>
2021-02-04 01:13:58 -08:00
Eric Liang
e79a380a7e
Check in shuffle code as experimental ( #13899 )
2021-02-04 00:24:16 -08:00
Clark Zinzow
243f678ffd
Fall back to random port instead of default port for non-primary Redis shards; attempt to cluster Redis shard ports close to each other. ( #13847 )
2021-02-03 22:00:15 -08:00
Alex Wu
a13208f113
Scalability envelope readme typo ( #13874 )
2021-02-03 21:43:45 -08:00
Tao Wang
44aa9c173f
Rename timeout to period with heartbeat interval ( #13872 )
2021-02-04 10:37:28 +08:00
Tao Wang
e0d9c8f0a8
Always replace DEL with UNLINK ( #13832 )
2021-02-04 10:30:00 +08:00
Dmitri Gekhtman
1187d1dd3e
[autoscaler][kubernetes][operator] Rudimentary error handling, make "MODIFIED" -> update event work. ( #13756 )
2021-02-03 20:07:11 -06:00
Eric Liang
e8fce9f1f3
Check Ray client protocol version ( #13886 )
* wip
* wip
* fix tests
2021-02-03 16:44:09 -08:00
Clark Zinzow
407302f93a
[Core] Ownership-based Object Directory - Changed infinite short-poll location subscription to long-poll. ( #13841 )
2021-02-03 14:16:42 -08:00