
4818 commits

Author SHA1 Message Date
Jiao
e38db5875b
Add serve external kv store (#17622) 2021-08-11 12:06:14 -07:00
Amog Kamsetty
ed24bae644
[SGD] Fail if num_workers is not greater than 0 (#17723) 2021-08-11 10:05:19 -07:00
Ian Rodney
97f7ae5e06
[Cluster Launcher] Allow attach/exec on uninitialized head node (#17688) 2021-08-11 09:43:23 -07:00
chenk008
f0fc26960d
[sgd] Wait for placement_group deletion when shutdown worker_group (#17698)
* fix

* fix ut

* delete sleep

* fix according to comment

* fix according to comment

* use pg in test_resize

* fix
2021-08-11 08:47:49 -07:00
J K Terry
48e32555c8
[rllib] Update PettingZoo dependency versions (#17702)
* update pettingzoo dependency versions

* pettingzoo version

* fix tests
2021-08-11 01:19:19 -07:00
Shantanu
abc593561c
[client] fix ClientRemoteMethod error message (#17726)
Co-authored-by: hauntsaninja <>
2021-08-11 00:43:17 -07:00
Yi Cheng
bd4db53df2
[Observability] Export useful metrics (#17578)
* up

* up

* up

* up

* up

* up

* up

* up

* up

* up

* up

* up

* up

* checkpoint

* up

* up

* up

* up

* fix

* up

* up

* up

* up

* up

* up

* up

* up

* up

* up

* add comments

* up

* up

* up

* up

* add tests
2021-08-10 17:14:42 -07:00
SongGuyang
63c15d7ced
[core] make 'PopWorker' to be an async function (#17202)
* make 'PopWorker' to be an async function

* pop worker async works

* fix

* address comments

* bugfix

* fix cluster_task_manager_test

* fix

* bugfix of detached actor

* address comments

* fix

* address comments

* fix aioredis

* Revert "fix aioredis"

This reverts commit 041b983eac95b105ab0e853e84c4cf2647008431.

* bug fix

* fix

* fix test_step_resources test

* format

* add unit test

* fix

* add test case PopWorkerStatus

* address commit

* fix lint

* address comments

* add python test

* address comments

* make an independent function

* Update test_basic_3.py

Co-authored-by: Hao Chen <chenh1024@gmail.com>
2021-08-10 17:03:17 -07:00
xwjiang2010
932f038644
[tune] Type hint TrialExecutor. Use Abstract Base Class. (#17584) 2021-08-10 14:17:22 -07:00
Clark Zinzow
78d23434e6
[Datasets] Fix write_json so roundtrip writing + reading works. (#17691)
* Write out dataset blocks as newline-delimited JSON.

* Add roundtrip JSON reading + writing test.

* Formatting.
2021-08-10 13:24:33 -07:00
SangBin Cho
705a7192b3
Unflake multi node 3 (#17694) 2021-08-10 13:16:52 -07:00
SangBin Cho
6160c06c69
[Core] Fix a bug where get_actor crashes gcs if the actor is already killed. (#17670)
* Fix a bug where get_actor crashes gcs if the actor is already killed.

* Test the restart code path.

* Add an additional test

* Add a comment

* addressed code review.
2021-08-10 09:58:09 -07:00
yuduber
446ee1ad24
[autoscaler] Support Peloton node provider (#17312)
* modify updater to make it work with uber peloton node provider

* working solution for using NodeID as unique ID in peloton node provider but need to run ray.init

* working solution of using resource cmd to pass in node_id

* cleanup

* cleanup 2

* removed updater.py change to make sure of the disable_node_updaters flag

* add newline to end of updater.py to undo all the changes

* undo change in autoscaler.py

* use use_node_id_as_ip as field name in monitor

* lint

Signed-off-by: Richard Liaw <rliaw@berkeley.edu>

* fix-for-monitor-without-autoscaler

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
Co-authored-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>
2021-08-10 12:48:11 -04:00
Antoni Baum
13f39b2cb7
[SGD] v2 JSON logger callback & callback groundwork (#17619)
* finish session

* finish

* formatting

* tests

* wip

* remove pdb

* remove import

* add tests

* raise from None

* Address comments

* Exception

* remove from None

* fix test

* address comments

* SGDv2 JSON Logging Callback

* Revert testing change

* Prefix autofilled metrics

* Move env var check to local

* Fix exception

* Improve docstrings, default filename

* Add unit test

* Implement feedback

* SGDLoggingCallback to SGDSingleFileLoggingCallback

* Use env_integer

Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com>
2021-08-09 21:18:46 -07:00
Yi Cheng
473740b739
[gcs] Fix actor killing hang due to race condition (#17634)
* Revert "Revert "[gcs] Fix actor killing race condition (#17456)" (#17599)"

This reverts commit 381ffdb6d0.

* update

* format

* up
2021-08-09 21:11:26 -07:00
SangBin Cho
d05571af2d
Fix a progress bar issue and add it to the nightly (#17627)
* Fix a progress bar issue and add it to the nightly

* Trial

* in progress

* Fix issues.
2021-08-09 19:31:47 -07:00
Dmitri Gekhtman
c1b9f921a6
[autoscaler] Add option of returning node metadata from non_terminated_nodes. (#17273)
* Optional return from nonterminated

* format

* terminated node signature

* add return
2021-08-09 20:23:31 -04:00
Ian Rodney
6475fe1b82
[Autoscaler][Docker] Warn if a file is passed in Docker File Mounts (#16515)
Co-authored-by: Ian Rodney <ilr@anyscale.com>
2021-08-09 15:13:58 -07:00
Richard Liaw
bde14f2de6
[tune] add developer/stability annotations (#17442)
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
2021-08-09 14:50:59 -07:00
Dmitri Gekhtman
07a42a5bdb
resolve (#17495) 2021-08-09 17:33:16 -04:00
Siyuan (Ryans) Zhuang
68e884ee43
[workflow] Test fault tolerance with storage (#17641)
* new test

* update storage

* enhance test

* fix s3
2021-08-09 11:19:14 -07:00
wanxing
8312628c30
Remove unused Spill function (#17607) 2021-08-09 10:10:03 -07:00
Simon Mo
7a0b8982f3
[serve] Return Client on serve.start() when connecting (#17552) 2021-08-09 10:55:05 -05:00
architkulkarni
bbcb06d45b
[doc] [runtime_env] Remove "experimental" label, add beta stability annotation (#17651) 2021-08-09 10:54:28 -05:00
Tao Wang
5990b60f8b
[Core] Cache named actor in local in case of getting them from GCS frequently. (#17339)
* [Core] Cache named actor in local in case of getting them from GCS frequently

* lint

* fix nullptr

* typo

* add namespace to cache

* lint

* lock, reference and others

* lint

* fix comments and add test

* lint

* lint

* optimize test

* add necessary fields in pub for caching

* add removing test

* fix test
2021-08-09 14:01:57 +08:00
SangBin Cho
1bcab9a7bb
[Object Spilling] Better error message for nightly test debugging (#17645)
* Fix

* Addressed code review.

* Addressed code review.
2021-08-08 20:44:49 -07:00
Hao Chen
0858f0e4f2
Change core worker C++ namespace to ray::core (#17610) 2021-08-08 23:34:25 +08:00
Simon Mo
c315596ed2
[Buildkite] Migrate macOS wheel builds (#16913) 2021-08-07 21:54:34 -07:00
Qing Wang
4cc34588db
[Core] Support ConcurrentGroup part1 (#16795)
* Core change and Java change.

* Fix void call.

* Address comments and fix cases.

* Fix asyncio
2021-08-07 22:41:33 +08:00
architkulkarni
f4c70be7f7
[Serve] Add replica tag to request counter and error counter (#17613) 2021-08-06 15:35:34 -07:00
architkulkarni
6d975b821b
[Serve] [Dashboard] Initial PR for exporting Serve data to cluster snapshot (#17489) 2021-08-06 15:03:29 -07:00
Edward Oakes
57b190c987
[serve] Remove logic to automatically infer conda env name (#17639) 2021-08-06 13:27:23 -05:00
Amog Kamsetty
f0cca063ad
[SGD v2] Reduce time for HF smoke test (#17623)
* reduce

* switch back model

* Update python/ray/util/sgd/v2/BUILD
2021-08-05 21:04:34 -07:00
Stephanie Wang
a06d71477f
[core] Do not spill back tasks blocked on args to blocked nodes (#17550)
Co-authored-by: Eric Liang <ekhliang@gmail.com>
2021-08-05 20:43:32 -07:00
Amog Kamsetty
add6ceb3ec
[Dependencies] Fix missing dependency UX (#17420)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-08-05 20:18:42 -07:00
Amog Kamsetty
14b02c3341
Add ray.data symlink to setup-dev.py (#17624) 2021-08-05 19:51:15 -07:00
Chen Shen
0fd3f761b9
[ci][rfc] build debug wheels and run python test on debug build (#17399)
* enable debug mode

* add

* :upload debug wheels

* upload debug wheels

* add

* fix bug

* add dbg

* Update python/setup.py

Co-authored-by: Simon Mo <simon.mo@hey.com>

* skip windows

Co-authored-by: Simon Mo <simon.mo@hey.com>
2021-08-05 17:58:19 -07:00
SangBin Cho
8bc9286296
Remove an unused profile event code from object manager. (#17529)
* Remove an unused profile event code from object manager.

* Addressed code review.

* Temporarily skip a test

* lint
2021-08-05 17:13:16 -07:00
SangBin Cho
d59d6ad653
[RFC][Usability] Improve general Ray stacktrace including adding Actor repr (#17389)
* 1. Added a label to the stack trace. 2. Remove ray code from user stacktrace. Improve stacktrace message.

* Add a test to the build

* Fix the issue

* Addressed code review.

* Addressed code review and debugging

* fix

* Try fixing tests.

* Fixed the issue.

* Fixed a bug for real. Tests need to be re-written

* Try one test.

* Formatting

* Addressed code review.

* Addressed the last code review.
2021-08-05 17:12:24 -07:00
SangBin Cho
99b26b476d
Fix flaky windows reconstruction test (#17564) 2021-08-05 17:10:54 -07:00
Amog Kamsetty
e4cf26ea6e
[SGD] v2 Prototype sgd.report() implementation (#17536)
* finish session

* finish

* formatting

* tests

* wip

* remove pdb

* remove import

* add tests

* raise from None

* Address comments

* Exception

* remove from None

* fix test

* address comments

* Update python/ray/util/sgd/v2/constants.py

Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>

* add tests for session

Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
2021-08-05 16:03:21 -07:00
SangBin Cho
381ffdb6d0
Revert "[gcs] Fix actor killing race condition (#17456)" (#17599)
This reverts commit 521457b51b.
2021-08-05 15:54:03 -07:00
Edward Oakes
839ceba6db
[serve] Replace "backend" with "deployment" in metrics & logging (#17434) 2021-08-05 17:37:21 -05:00
architkulkarni
e84ae6caa5
[Core] [runtime env] Avoid spurious worker startup (#17422) 2021-08-05 15:46:23 -05:00
SangBin Cho
667851f0ad
Prototype done. (#17603) 2021-08-05 13:32:44 -07:00
Eric Liang
8ff3fce4ba
Add a warning if the number of queued tasks to an actor exceeds 5k (#17581) 2021-08-05 12:03:48 -07:00
Amog Kamsetty
be238e159d
[Tune] Update docs for with_parameters (#17441)
* with_parameters_doc

* update docstring

* address comments
2021-08-05 08:48:34 -07:00
architkulkarni
3ae5229b44
[core] Skip adding "script directory" to workers' sys.path when in interactive shell (#17556) 2021-08-05 10:05:19 -05:00
Siyuan (Ryans) Zhuang
ffe5b45cc1
[workflow] Enable test (#17585) 2021-08-04 21:18:50 -07:00
matthewdeng
1eca6ac154
[SGD] v2 alpha: Tensorflow Backend (#17532)
* [SGD] Implement Tensorflow Backend

* address comments

* address comments

* format
2021-08-04 16:49:50 -07:00