3623 commits

Author SHA1 Message Date
Amog Kamsetty
39755fdb20
Revert "[Serve] Refactor BackendState" (#13626)
This reverts commit 68038741ac.
2021-01-21 23:06:15 -08:00
Ameer Haj Ali
1fbb752f42
[autoscaler] remove worker_default_node_type that is useless. (#13588) 2021-01-21 17:04:38 -08:00
Nikita Vemuri
4e01a9ec38
[Autoscaler] Ensure ubuntu is owner of docker host mount folder (#13579)
* change ownership to ubuntu if root

* use ssh user in cluster config

* formatting

Co-authored-by: Nikita Vemuri <nikitavemuri@Nikitas-MacBook-Pro.local>
2021-01-21 17:01:55 -08:00
Stephanie Wang
0998d69968
[core] Admission control for pulling objects to the local node (#13514)
* Admission control, TODO: tests, object size

* Unit tests for admission control and some bug fixes

* Add object size to object table, only activate pull if object size is known

* Some fixes, reset timer on eviction

* doc

* update

* Trigger OOM from the pull manager

* don't spam

* doc

* Update src/ray/object_manager/pull_manager.cc

Co-authored-by: Eric Liang <ekhliang@gmail.com>

* Remove useless tests

* Fix test

* osx build

* Skip broken test

* tests

* Skip failing tests

Co-authored-by: Eric Liang <ekhliang@gmail.com>
2021-01-21 16:46:42 -08:00
Amog Kamsetty
ccc901f662
add 3.8 (#13608) 2021-01-21 16:38:51 -08:00
Amog Kamsetty
20acc3b05e
Revert "Inline small objects in GetObjectStatus response. (#13309)" (#13615)
This reverts commit a82fa80f7b.
2021-01-21 16:10:34 -08:00
Dmitri Gekhtman
87ca102c93
[Kubernetes] Unit test for cluster launch and teardown using K8s Operator (#13437) 2021-01-21 12:00:37 -06:00
Ian Rodney
68038741ac
[serve] Refactor BackendState to use ReplicaState classes (#13406) 2021-01-21 11:16:02 -06:00
Clark Zinzow
a82fa80f7b
Inline small objects in GetObjectStatus response. (#13309) 2021-01-21 09:15:18 -08:00
Alex Wu
b9ac3878ae
[Autoscaler] Display node status tag in autoscaler status (#13561)
* .

* .

* .

* .

* .

* lint

Co-authored-by: Alex Wu <alex@anyscale.com>
2021-01-20 19:20:54 -08:00
Edward Oakes
b796de4104
[metrics] Check that all tag_keys are set when recording (#13420) 2021-01-20 13:09:44 -06:00
dmatch01
fd6882176a
Fix for operator role definition to add raycluster/finalizer (#13567) 2021-01-20 13:02:02 -06:00
Eric Liang
e6412efdf5
Extra fix ray client newline (#13577) 2021-01-20 09:23:14 -08:00
Kai Fricke
6c23bef2a7
[tune] Allow actor reuse for new trials (#13549)
* Allow actor reuse for new trials

* Fix tests and update conf when starting new trial

* Move magic config to `reset_trial`
2021-01-20 11:25:33 +01:00
Daan Klijn
800304acfb
[tune] wandb - WandbLogger now also accepts wandb.data_types.Video (#13169) 2021-01-20 01:19:54 -08:00
Eric Liang
d0f224d5cf
Revert "Pipe monitor.err logs to driver" (#13574)
This reverts commit a0d08c2cc6.
2021-01-20 00:29:19 -08:00
Eric Liang
a0d08c2cc6
Pipe monitor.err logs to driver 2021-01-19 12:27:07 -08:00
Simon Mo
c963cbc038
Fix Docker Permission for Serve release test again (#13543) 2021-01-19 12:23:30 -08:00
Dmitri Gekhtman
7b4a97c610
Make AWSNodeProvider.create_node return nodes created (#13498)
* Make AWSNodeProvider.create_node return node config

* return-dict

* Node provider interface create node return type Any

* Type clarification.

* Delete debug code

* Oops reset example-full changes

* Return type specified. GCP create node returns None.

* Article
2021-01-19 12:17:46 -08:00
Amog Kamsetty
20016c983f
[Tune] MLflow Credentials (#13533) 2021-01-19 11:55:13 -08:00
Edward Oakes
9b071eb449
[metrics] Better validation for tags (#13421) 2021-01-19 13:26:51 -06:00
SangBin Cho
99375c4cfc
[Object Spilling] Remove retries and use a timer instead. (#13175) 2021-01-19 11:01:45 -08:00
Sven Mika
e74947cc94
[RLlib] Env directory cleanup and tests. (#13082) 2021-01-19 10:09:39 +01:00
Todd A. Anderson
2506a6cd0e
Remove PYTHON_MODE that is not defined in Ray so that import * will work from other packages. (#13544) 2021-01-18 23:07:01 -08:00
Richard Liaw
7a2997ea8c
[tune] support experiment checkpointing for grid search (#13357) 2021-01-18 19:24:36 -08:00
Ameer Haj Ali
1fbc3ddfac
Add ability to not start Monitor when calling ray start (#13505) 2021-01-18 18:31:53 -08:00
Simon Mo
6341f1fa2e
[Serve] Allow ObjectRef for Composition (#12592) 2021-01-18 15:26:35 -08:00
Kai Fricke
dc42abb2f5
[tune] placement group support (#13370) 2021-01-18 11:58:57 -08:00
Eric Liang
8c8af2616e
Minimal version of piping autoscaler events to driver logs (#13434) 2021-01-16 10:06:20 -08:00
Dmitri Gekhtman
7e54911093
move message to debug (#13472) 2021-01-16 10:04:41 -08:00
Amog Kamsetty
1d3941e41a
[Tests] Skip failing windows tests (#13495)
* skip failing windows tests

* skip more

* remove

* updates
2021-01-15 20:51:33 -08:00
Eric Liang
ee6332dbb0
Bump dev branch to 2.0 to avoid endless version bump toil (#13497)
* wip

* fix

* fix
2021-01-15 17:41:17 -08:00
Barak Michener
68e3a0e0e1
[ray_client]: fix wrong reference in server_pickler (#13474)
Change-Id: Ie3d219541b1875e986e72e3ae73ece145c715acf
2021-01-15 15:49:38 -08:00
Eric Liang
4aeb0ea550
Return version info from Ray client connect, to allow for discovering version mismatches 2021-01-15 14:27:26 -08:00
Ian Rodney
0ec9ddabc1
[docker/dashboard] Fix ray dashboard (#12899) 2021-01-15 10:03:01 -08:00
Barak Michener
84e110a949
[ray_client]: Support runtime_context as metadata (#13428) 2021-01-14 14:37:00 -08:00
Clark Zinzow
9a658b568f
[Core] Ownership-based Object Directory: Consolidate location table and reference table. (#13220)
* Added owned object reference before Plasma put on Create() + Seal() path.

* Consolidated location table and reference table in reference counter.

* Restore type in definition.

* Clean up owned reference on failed Seal().

* Added RemoveOwnedObject test for reference counter.

* Guard against ref going out of scope before location RPCs.

* Add 'owner must have ref in scope' precondition to documentation for object location methods.

* Move to separate Create() + Seal() methods for existing objects.

* Clearer distinction between Create() and Seal() methods.

* Make it clear that references will normally be cleaned up by reference counting.
2021-01-14 13:48:10 -08:00
Siyuan (Ryans) Zhuang
d1e9887be2
[Serialization] New custom serialization API (#13291)
* new serialization API with doc & test

* add more notes

* refine notes

* doc
2021-01-14 13:15:31 -08:00
Amog Kamsetty
07e97fe4c2
[xgb] re-enable xgboost_ray tests (#13416)
* re-enable

* fix

* update xgb_ray version
2021-01-14 22:14:44 +01:00
Edward Oakes
7ba87b8abe
Fix getting runtime context dict in driver (#13417) 2021-01-14 14:41:53 -06:00
Ian Rodney
411e37ce3f
[serve] Properly obey SERVE_LOG_DEBUG=0 (#13460) 2021-01-14 12:24:22 -08:00
Micah Yong
c89ebdd94a
[Core][CLI] ray status and ray memory no longer starts a new job (#13391)
* Access memory info in ray memory via GlobalStateAccessor rather than calling ray.init()

* Modify ray status cli so that it doesn't start a new job via ray.init()

* Remove local test file

* Make status and error args required in commands.py#debug.status

* Remove unnecessary imports

* Job 38482.1 should now pass

* Resolve merge conflict
2021-01-14 10:12:16 -08:00
Dmitri Gekhtman
2d772a5a6d
[kubernetes][minor] Operator garbage collection fix (#13392) 2021-01-14 10:40:15 -06:00
Barak Michener
9c6d892eec
[ray_client]: fix exceptions raised while executing on the server on behalf of the client (#13424) 2021-01-14 10:38:01 -06:00
Ameer Haj Ali
2f7ba25efb
[joblib] joblib strikes again but this time on windows (#13212) 2021-01-14 10:36:52 -06:00
fangfengbin
4a6c53da46
[Core]Fix raylet scheduling bug (#13452)
* [Core]Fix raylet scheduling bug

* fix lint error

* fix lint error

Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>
2021-01-14 14:50:32 +01:00
Amog Kamsetty
560299972c
Revert "Enable Ray client server by default (#13350)" (#13429)
This reverts commit 912d0cbbf9.
2021-01-13 21:28:54 -08:00
dHannasch
ad015cb7df
Split out the part of get_node_ip_address for which the docstring is correct (#12796) 2021-01-14 11:32:56 +08:00
Amog Kamsetty
3f42e6bafe
[Tune] Pin Transitive Dependencies (#13358) 2021-01-13 19:10:21 -08:00
Eric Liang
602c103eae
Make request_resources() use internal kv instead of redis pub sub (#13410) 2021-01-13 17:30:43 -08:00