mirror of https://github.com/vale981/ray synced 2025-04-06 10:19:09 -04:00
Commit graph

13372 commits

Author SHA1 Message Date
SangBin Cho
0684531e22
[Test] Break down placement group tests () 2021-09-14 21:55:18 -07:00
SangBin Cho
b8c361d3fb
[Test] Mark app config failure as a infra failure () 2021-09-14 17:20:05 -07:00
Eric Liang
d1f348cd9d
[RFC] Split the list of libraries into ML vs production 2021-09-14 16:32:07 -07:00
Chris K. W
cc1d7b8174
[client] Refactors for Reconnect PR ()
* add refactors

* add worker annotation

* Regenerate credentials by default

* use self._secure

* infer secure if credentials provided

* separate _shutdown
2021-09-14 16:13:35 -07:00
Eric Liang
15512c27c2
Revert "Revert "Route core worker ERROR/FATAL logs to driver logs (#1… () 2021-09-14 13:32:07 -07:00
SangBin Cho
31e1638fb3
[CLI] Improve ray status for placement groups () 2021-09-14 11:29:13 -07:00
Stephanie Wang
344f2d9073
[core] Fix race condition in distributed ref counting () 2021-09-14 11:02:59 -07:00
Kai Fricke
c8188ea70e
[ci/rllib] wait for stress test cluster () 2021-09-14 19:01:22 +01:00
Kai Fricke
6777e24293
[ci] Add release test owner overview file () 2021-09-14 11:00:31 -07:00
Sven Mika
08c09737fa
[RLlib] Fix R2D2 (torch) multi-GPU issue. () 2021-09-14 19:58:10 +02:00
SangBin Cho
51d94ebee0
[Tests] Make nightly test work + Remove work stealing logs ()
* make tests work

* .
2021-09-14 09:52:58 -07:00
Edward Oakes
644f7bd7fa
[runtime_env] Remove no-longer-used mock setup function () 2021-09-14 11:35:09 -05:00
matthewdeng
380a653787
[SGD] update SGDv2 user guide docs ()
* [SGD] update SGDv2 user guide docs

* Update doc/source/raysgd/v2/user_guide.rst

Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>

* add new line

* update docs

* fix header line length

* lint

* lint

* lint

* lint

* fix remaining lint issues

* Update doc/source/raysgd/v2/user_guide.rst

Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>

* Update doc/source/raysgd/v2/user_guide.rst

Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>

* address comments

* address comments

* add TODO for iterator API

* Update doc/source/raysgd/v2/user_guide.rst

Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>

* address comments

* address comments

* add tune doc

* restructure table of contents

* add examples; rename example files to include example suffix

* add quick start, porting code

* address comments

Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>
2021-09-14 09:07:25 -07:00
mwtian
a3f399ef10
[Client] fix propagating errors to async calls during disconnect, and other cleanup ()
* cleanup tests and errors for clients

* Fix lock and async get

* rerun

* Avoid running callback under lock. Make lock non-reentrant

* Add all necessary apis

* Removed unused APIs
2021-09-14 18:48:27 +03:00
Edward Oakes
7f8cdce67d
Revert "Route core worker ERROR/FATAL logs to driver logs ()" ()
This reverts commit 3e0ae38e11.
2021-09-14 10:41:10 -05:00
Antoni Baum
65d5deae60
[tests] Increase golden notebook test timeout to 20 mins () 2021-09-14 16:27:56 +01:00
Jiao
d3734d803d
[serve] Change nightly test docker image and enable micro benchmark () 2021-09-14 09:41:21 -05:00
Jiao
18bbf044a7
[serve] Add reconfigure with exception test and ensure it can rollback () 2021-09-14 08:39:46 -05:00
Kai Fricke
e4754f1e19
[ci] wheel URLs - give some time for wheels to be built () 2021-09-14 09:56:34 +01:00
Ameer Haj Ali
e6807ecb43
Change tests owners for ml tests () 2021-09-14 01:04:52 -07:00
Eric Liang
3e0ae38e11
Route core worker ERROR/FATAL logs to driver logs () 2021-09-13 23:07:14 -07:00
Guyang Song
dee12be253
[Event] print event message to general log () 2021-09-14 12:24:49 +08:00
Guyang Song
beff857cc1
[release][C++ API] support sanity check C++ () 2021-09-14 11:39:08 +08:00
Tanmay Chordia
bf1176311f
[dashboard] add an endpoint to force kill an actor () 2021-09-13 20:03:15 -07:00
Yi Cheng
7d1f408de9
[workflow] Move experimental/workflow to workflow () 2021-09-13 17:45:18 -07:00
Stephanie Wang
284dee493e
[core][usability] Disambiguate ObjectLostErrors for better understandability ()
* Define error types, throw error for ObjectReleased

* x

* Disambiguate OBJECT_UNRECONSTRUCTABLE and OBJECT_LOST

* OwnerDiedError

* fix test

* x

* ObjectReconstructionFailed

* ObjectReconstructionFailed

* x

* x

* print owner addr

* str

* doc

* rename

* x
2021-09-13 16:16:17 -07:00
Alex Wu
6479a5fcfc
[Workflow] dedupe download on recovery ()
* maybe works

* ?

* .

* seems to work

* seems to work

* .

* .

* lint

* address comments

* .

* test

* test

* .

* works?

* cleanup

* cleanup

* cleanup

* cleanup

* fix test + cleanup

* lint

* .

* lint

* lint

* lint

* lint

* fix tests

* lint

Co-authored-by: Alex Wu <alex@anyscale.com>
2021-09-13 15:30:47 -07:00
gjoliver
2924afa41e
[Release] Create soft links for libcusolver.so.10 as a temporary fix. ()
Co-authored-by: Jun Gong <jungong@anyscale.com>
2021-09-13 14:37:12 -07:00
Edward Oakes
766d5526bb
[serve] Revert changes to serve::test_runtime_env () 2021-09-13 16:29:41 -05:00
Yi Cheng
e4d36f749d
[deflaky] Fix workflow storage test () 2021-09-13 12:47:30 -07:00
Jiajun Yao
ec6f5ae9ab
Upgrade serve_tests and runtime_env_tests base image to 1.6.0 () 2021-09-13 12:47:06 -07:00
Clark Zinzow
a0fcc311ec
[Datasets] URL-encode paths if they are URLs. () 2021-09-13 12:46:21 -07:00
Kai Fricke
b543c0e923
[ci] Do not use anyscale connect for xgboost_tests/train_small () 2021-09-13 20:38:00 +01:00
Chris K. W
5db1b2395e
ensure queue order matches req_id order () 2021-09-13 12:27:49 -07:00
Edward Oakes
111a31d6a1
[runtime_env] Make Ray client server setup go through the runtime_env agent () 2021-09-13 14:16:35 -05:00
Edward Oakes
3869af551a
[serve] Fix missing f-string in rollback message () 2021-09-13 14:16:04 -05:00
Jiajun Yao
f8ae2b2b62
Don't pass in TaskID to TaskManager::MarkPendingTaskFailed since it can ()
be got from TaskSpecification
2021-09-13 11:27:42 -07:00
Eric Liang
a0336578a9
Update API stability annotations () 2021-09-13 11:00:16 -07:00
Sven Mika
3803e796ff
[RLlib] Multi-GPU learner thread (IMPALA) error messages/comments/code-cleanup. () 2021-09-13 19:27:53 +02:00
Jiao
1e26ca83ed
[serve] Add ability to recover replica state from actor names () 2021-09-13 11:44:29 -05:00
Edward Oakes
c482779da2
[runtime_env] Improve file-not-found msg in deletion () 2021-09-13 11:32:22 -05:00
Kai Fricke
b6392aa6ea
[ci] upgrade microbenchmark base image to 1.6.0 () 2021-09-13 17:13:01 +01:00
Kai Fricke
092a8a92f2
[tune] Improve HyperOpt KeyError message when metric was not found () 2021-09-13 17:02:20 +01:00
qicosmos
ac0a153b06
[C++ Worker]Add some api of placement group () 2021-09-13 15:10:54 +08:00
Guyang Song
3bc5f0501f
fix WaitPlacementGroupReady API () 2021-09-13 14:07:40 +08:00
Lingxuan Zuo
a67b9ee8d7
Remove custom resource from streaming () 2021-09-12 12:20:59 -07:00
Jiajun Yao
ae10a80d5e
Fix async actor worker process leak after calling ray.actor.exit_actor() () 2021-09-12 11:09:12 -07:00
Eric Liang
53a2a47655
Polish workflows doc, add semantics and best practices for sub-workflows () 2021-09-12 11:08:06 -07:00
Sven Mika
ea4a22249c
[RLlib] Add simple action-masking example script/env/model (tf and torch). () 2021-09-11 23:08:09 +02:00
Yi Cheng
370473fc5f
[workflow] Update documentation () 2021-09-11 13:40:09 -07:00