7394 commits

Author SHA1 Message Date
Richard Liaw
864956f817
fix-skopt (#14116)
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
2021-02-16 14:36:19 +01:00
Eric Liang
e434ffe06c
[tune] Avoid crash in client mode when return results creating logdir (#14115) 2021-02-15 19:25:14 -08:00
Ian Rodney
350fb5b9d1
[autoscaler] Remove Hardcoded 8265 (#14112) 2021-02-15 18:04:00 -08:00
Patrick Ames
da0c2c99a0
[autoscaler] Fix bad reference error when specifying IamInstanceProfile by name in config. (#14083) 2021-02-15 16:29:36 -08:00
Jack Parker-Holder
ebb6e552d2
[tune] PB2 - add small constant (#14118) 2021-02-15 16:04:10 -08:00
Edward Oakes
5e763893ea
[serve] Don't overwrite self.handle in StarletteEndpoint (#14111) 2021-02-15 17:51:54 -06:00
SangBin Cho
4ad79ca963
[Object Spilling] Remove LRU eviction (#13977)
* done.

* formatting.

* done.

* done.
2021-02-15 14:24:53 -08:00
Eric Liang
e457872fe1
Revert "Revert "Unhandled exception handler based on local ref counti… (#14113)
* Revert "Revert "Unhandled exception handler based on local ref counting (#14049)" (#14099)"

This reverts commit b45ae76765.

* reomve test

* fix

* fix
2021-02-15 14:11:11 -08:00
Alex Wu
4846a6c2d0
Release process update (#13798) 2021-02-15 11:40:49 -08:00
architkulkarni
496dd297e5
skip test_basic_reconstruction_actor_task on win (#14110) 2021-02-15 10:17:33 -08:00
architkulkarni
0fb96a61fc
[Serve] Add support for variable routes (#13968) 2021-02-15 11:42:42 -06:00
Richard Liaw
4d727e4cdf
[tune] enable more tests (#13969)
* try-this

Signed-off-by: Richard Liaw <rliaw@berkeley.edu>

* fix

Signed-off-by: Richard Liaw <rliaw@berkeley.edu>

* test

Signed-off-by: Richard Liaw <rliaw@berkeley.edu>

* fix-tests

Signed-off-by: Richard Liaw <rliaw@berkeley.edu>

* address

Signed-off-by: Richard Liaw <rliaw@berkeley.edu>

* fix

Signed-off-by: Richard Liaw <rliaw@berkeley.edu>

* real-ray

Signed-off-by: Richard Liaw <rliaw@berkeley.edu>

* fix-client

Signed-off-by: Richard Liaw <rliaw@berkeley.edu>

* fix-race-condition

Signed-off-by: Richard Liaw <rliaw@berkeley.edu>

* revert-new-tune-tests

Signed-off-by: Richard Liaw <rliaw@berkeley.edu>

* Revert "revert-new-tune-tests"

This reverts commit 3866b920bc47ac4b5cb9dab8f7b9d50e4acdb27a.

* format

Signed-off-by: Richard Liaw <rliaw@berkeley.edu>

* update

Signed-off-by: Richard Liaw <rliaw@berkeley.edu>

* build

Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
2021-02-15 09:19:55 -08:00
architkulkarni
bcb51a27c6
[Serve] [Doc] Add version warning (#14001) 2021-02-15 11:16:01 -06:00
javi-redondo
b8b2d6410d
[docs] new Ray Cluster documentation (#13839)
Co-authored-by: Javier Redondo <javier@anyscale.com>
Co-authored-by: AmeerHajAli <ameerh@berkeley.edu>
2021-02-15 00:47:14 -08:00
Kathryn Zhou
82539f2da4
Export additional metrics to Prometheus (#14061) 2021-02-14 23:16:26 -08:00
SangBin Cho
b45ae76765
Revert "Unhandled exception handler based on local ref counting (#14049)" (#14099)
This reverts commit 9dc671ae02.
2021-02-14 22:08:32 -08:00
architkulkarni
75568f856c
skip restart and multi restart test on win (#14084) 2021-02-14 15:17:54 -08:00
Alex Wu
5636af8084
[hotfix] Fix mac build (#14075)
* .

* done?

* .

Co-authored-by: Alex Wu <alex@anyscale.com>
2021-02-14 14:26:51 -08:00
Eric Liang
9dc671ae02
Unhandled exception handler based on local ref counting (#14049) 2021-02-12 22:58:38 -08:00
Erik Erlandson
ff1b26274e
[operator] expose RAY_CONFIG_DIR env var (fix #14074) (#14076) 2021-02-12 17:47:00 -08:00
architkulkarni
20f6cc2cb2
skip test_basic_reconstruction_put on win (#14082) 2021-02-12 15:47:00 -08:00
Clark Zinzow
c9a9d422c7
[OBOD] Disable the ownership-based object directory for all tests that use ray.objects(). (#14065) 2021-02-12 12:12:57 -08:00
Clark Zinzow
c7ff69f4bf
[OBOD] Add support for ownership-based object directory object recovery. (#14066) 2021-02-12 11:58:31 -08:00
Sven Mika
936cb5929c
[RLlib] Issue #13646: Rewards still not available in loss/json-output in certain situations when using the traj. view API. (#14036) 2021-02-12 10:07:44 +01:00
Dmitri Gekhtman
6644a0fe50
[autoscaler][kubernetes][docs] Updated Kubernetes Documentation (#14016)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-02-11 23:00:25 -08:00
Alex Wu
02938f3a21
[hotfix] Disable dashboard agent windows (#14062) 2021-02-11 17:54:55 -08:00
Amog Kamsetty
24e020b062
[Doc] Add PTL and RAG to community integrations (#14064) 2021-02-11 15:48:19 -08:00
Amog Kamsetty
a430ac2334
[Tune] Revert Pinning Tune Dependencies (#14059)
* remove lockfiles

* docker

* remove constraint file

* fix
2021-02-11 15:43:09 -08:00
Jeroen Boeye
2af1f0616d
Fix broken link to Flow docs (#14058) 2021-02-11 13:20:34 -08:00
SangBin Cho
cb8523a5e6
Fix the wrong spark on ray link. (#14057) 2021-02-11 12:31:18 -08:00
Clark Zinzow
cd7e567a57
[Core] Ownership-based Object Directory - Added support for object spilling in the ownership-based object directory. (#13948)
* Add support for object spilling in the ownership-based object directory.

* Move owner address hashmap into pinned_objects_ and objects_pending_spill_.

* Update local object manager tests.

* Feedback and misc. fixes.

* Move spilled unpin callback lambda to std::binded private method.

* Skip test_delete_objects_multi_node test on MacOS for now.
2021-02-11 10:36:22 -08:00
Sven Mika
4db86404ad
[RLlib] Issue #13507: Fix MB-MPO CartPole Env's reward function as well as MB-MPO running into a traj. view API related issue. (#14037) 2021-02-11 18:58:46 +01:00
Sven Mika
a2f7998026
[RLlib] Issue #13342: Add validate_spaces to MB-MPO. (#14038) 2021-02-11 11:36:53 +01:00
Ian Rodney
f6cfc44dbd
[autoscaler] run setup commands with restart_only=True (#13836) 2021-02-10 20:17:20 -08:00
Ameer Haj Ali
d87a82e891
Revert "Revert "[Autoscaler] Monitor refactor for backward compatability. (#13970)" (#14046)" (#14050)
* prepare for head node

* move command runner interface outside _private

* remove space

* Eric

* flake

* min_workers in multi node type

* fixing edge cases

* eric not idle

* fix target_workers to consider min_workers of node types

* idle timeout

* minor

* minor fix

* test

* lint

* eric v2

* eric 3

* min_workers constraint before bin packing

* Update resource_demand_scheduler.py

* Revert "Update resource_demand_scheduler.py"

This reverts commit 818a63a2c86d8437b3ef21c5035d701c1d1127b5.

* reducing diff

* make get_nodes_to_launch return a dict

* merge

* weird merge fix

* auto fill instance types for AWS

* Alex/Eric

* Update doc/source/cluster/autoscaling.rst

* merge autofill and input from user

* logger.exception

* make the yaml use the default autofill

* docs Eric

* remove test_autoscaler_yaml from windows tests

* lets try changing the test a bit

* return test

* lets see

* edward

* Limit max launch concurrency

* commenting frac TODO

* move to resource demand scheduler

* use STATUS UP TO DATE

* Eric

* make logger of gc freed refs debug instead of info

* add cluster name to docker mount prefix directory

* grrR

* fix tests

* moving docker directory to sdk

* move the import to prevent circular dependency

* smallf fix

* ian

* fix max launch concurrency bug to assume failing nodes as pending and consider only load_metric's connected nodes as running

* small fix

* Revert "Revert "[Autoscaler] Monitor refactor for backward compatability. (#13970)" (#14046)"

This reverts commit 6f9d39fb3e.

* fake news

Co-authored-by: Ameer Haj Ali <ameerhajali@ameers-mbp.lan>
Co-authored-by: Alex Wu <alex@anyscale.io>
Co-authored-by: Alex Wu <itswu.alex@gmail.com>
Co-authored-by: Eric Liang <ekhliang@gmail.com>
Co-authored-by: Ameer Haj Ali <ameerhajali@Ameers-MacBook-Pro.local>
2021-02-10 17:59:08 -08:00
Clark Zinzow
c5574a33e4
[dask-on-ray] Add better Dask-on-Ray example, and detail custom shuffle optimization. (#13950)
* Add better Dask-on-Ray example, and detail custom shuffle optimization.

* Misc. updates and feedback.

* Update doc/source/dask-on-ray.rst

Co-authored-by: Stephanie Wang <swang@cs.berkeley.edu>

* Set max_branch to infinity in shuffle optimization example.

* Feedback

* Apply suggestions from code review

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>

* 80 col width

Co-authored-by: Stephanie Wang <swang@cs.berkeley.edu>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-02-10 14:24:09 -08:00
Crissman Loomis
05ab75fbe1
[docs] Add mode to Ray Tune quick start (#14023) 2021-02-10 12:41:45 -08:00
Thomas J. Fan
75fbd48edd
[doc] Minor fix to indentation (#14040) 2021-02-10 12:31:47 -08:00
Stephanie Wang
fc89984162
Subtract from num bytes in use (#13944) 2021-02-10 12:22:08 -08:00
architkulkarni
6f9d39fb3e
Revert "[Autoscaler] Monitor refactor for backward compatability. (#13970)" (#14046)
This reverts commit 7a6f8054d1.
2021-02-10 12:16:52 -08:00
Alex Wu
68e985ddcd
[hotfix][docs] RayDP tensorflow != pytorch (#14044) 2021-02-10 11:23:02 -08:00
Kai Fricke
1ef2a6790c
[tune] add scalability release tests (#13986)
* Add scalability tests

* Network overhead cluster

* Update xgboost tests

* Document release tests

* Don't raise on failed trial

* Update to multi node yamls

* Update yamls

* Revert xgboost test changes

* Fix import

* Update release/tune_tests/scalability_tests/workloads/test_bookkeeping_overhead.py

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>

* Pass aws credentials (WIP)

* Update durable trainable example

* Update xgboost sweep

* Change xgboost scope, fix durable trainable stop condition

* Fix max depth to limit total test length

* Add cluster information to test descriptions. Update release checklist/process docs

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-02-10 17:16:31 +01:00
Sven Mika
81e7434091
[RLlib] TFPolicy.export_model: Add timestep placeholder to model's signature, if needed. (#13988) 2021-02-10 15:21:46 +01:00
Sven Mika
37c7daa3c0
[RLlib] DDPG: Support simplex action space. (#14011) 2021-02-10 15:10:01 +01:00
fangfengbin
1754359281
[Core]Fix ray.kill doesn't cancel pending actor bug (#14025) 2021-02-10 15:30:21 +08:00
Alex Wu
ce80ef5aee
[Docs] RayDP Documentation (#14018)
* .

* done?

* Docs

* Docs

* Update raydp.rst

* Update raydp.rst

Co-authored-by: Alex Wu <alex@anyscale.com>
2021-02-09 23:05:18 -08:00
Dmitri Gekhtman
8ca0a32819
HotFix k8s autoscaling (#14024) 2021-02-09 22:34:24 -08:00
Eric Liang
8b7cf7cab9
Add tip on how to disable Ray OOM handler (#14017) 2021-02-09 21:52:22 -08:00
Ameer Haj Ali
7a6f8054d1
[Autoscaler] Monitor refactor for backward compatability. (#13970) 2021-02-09 21:41:50 -08:00
Eric Liang
7f342eb371
Update example shuffle script (#14021) 2021-02-09 20:47:41 -08:00