Commit graph

9448 commits

Author SHA1 Message Date
Yi Cheng
7d1f408de9
[workflow] Move experimental/workflow to workflow (#18521) 2021-09-13 17:45:18 -07:00
Stephanie Wang
284dee493e
[core][usability] Disambiguate ObjectLostErrors for better understandability (#18292)
* Define error types, throw error for ObjectReleased

* x

* Disambiguate OBJECT_UNRECONSTRUCTABLE and OBJECT_LOST

* OwnerDiedError

* fix test

* x

* ObjectReconstructionFailed

* ObjectReconstructionFailed

* x

* x

* print owner addr

* str

* doc

* rename

* x
2021-09-13 16:16:17 -07:00
Alex Wu
6479a5fcfc
[Workflow] dedupe download on recovery (#18564)
* maybe works

* ?

* .

* seems to work

* seems to work

* .

* .

* lint

* address comments

* .

* test

* test

* .

* works?

* cleanup

* cleanup

* cleanup

* cleanup

* fix test + cleanup

* lint

* .

* lint

* lint

* lint

* lint

* fix tests

* lint

Co-authored-by: Alex Wu <alex@anyscale.com>
2021-09-13 15:30:47 -07:00
gjoliver
2924afa41e
[Release] Create soft links for libcusolver.so.10 as a temporary fix. (#18562)
Co-authored-by: Jun Gong <jungong@anyscale.com>
2021-09-13 14:37:12 -07:00
Edward Oakes
766d5526bb
[serve] Revert changes to serve::test_runtime_env (#18558) 2021-09-13 16:29:41 -05:00
Yi Cheng
e4d36f749d
[deflaky] Fix workflow storage test (#18536) 2021-09-13 12:47:30 -07:00
Jiajun Yao
ec6f5ae9ab
Upgrade serve_tests and runtime_env_tests base image to 1.6.0 (#18563) 2021-09-13 12:47:06 -07:00
Clark Zinzow
a0fcc311ec
[Datasets] URL-encode paths if they are URLs. (#18440) 2021-09-13 12:46:21 -07:00
Kai Fricke
b543c0e923
[ci] Do not use anyscale connect for xgboost_tests/train_small (#18569) 2021-09-13 20:38:00 +01:00
Chris K. W
5db1b2395e
ensure queue order matches req_id order (#18504) 2021-09-13 12:27:49 -07:00
Edward Oakes
111a31d6a1
[runtime_env] Make Ray client server setup go through the runtime_env agent (#18478) 2021-09-13 14:16:35 -05:00
Edward Oakes
3869af551a
[serve] Fix missing f-string in rollback message (#18561) 2021-09-13 14:16:04 -05:00
Jiajun Yao
f8ae2b2b62
Don't pass in TaskID to TaskManager::MarkPendingTaskFailed since it can be got from TaskSpecification (#18532) 2021-09-13 11:27:42 -07:00
Eric Liang
a0336578a9
Update API stability annotations (#18452) 2021-09-13 11:00:16 -07:00
Sven Mika
3803e796ff
[RLlib] Multi-GPU learner thread (IMPALA) error messages/comments/code-cleanup. (#18540) 2021-09-13 19:27:53 +02:00
Jiao
1e26ca83ed
[serve] Add ability to recover replica state from actor names (#18293) 2021-09-13 11:44:29 -05:00
Edward Oakes
c482779da2
[runtime_env] Improve file-not-found msg in deletion (#18496) 2021-09-13 11:32:22 -05:00
Kai Fricke
b6392aa6ea
[ci] upgrade microbenchmark base image to 1.6.0 (#18542) 2021-09-13 17:13:01 +01:00
Kai Fricke
092a8a92f2
[tune] Improve HyperOpt KeyError message when metric was not found (#18549) 2021-09-13 17:02:20 +01:00
qicosmos
ac0a153b06
[C++ Worker]Add some api of placement group (#18431) 2021-09-13 15:10:54 +08:00
Guyang Song
3bc5f0501f
fix WaitPlacementGroupReady API (#18464) 2021-09-13 14:07:40 +08:00
Lingxuan Zuo
a67b9ee8d7
Remove custom resource from streaming (#18490) 2021-09-12 12:20:59 -07:00
Jiajun Yao
ae10a80d5e
Fix async actor worker process leak after calling ray.actor.exit_actor() (#18526) 2021-09-12 11:09:12 -07:00
Eric Liang
53a2a47655
Polish workflows doc, add semantics and best practices for sub-workflows (#18525) 2021-09-12 11:08:06 -07:00
Sven Mika
ea4a22249c
[RLlib] Add simple action-masking example script/env/model (tf and torch). (#18494) 2021-09-11 23:08:09 +02:00
Yi Cheng
370473fc5f
[workflow] Update documentation (#18522) 2021-09-11 13:40:09 -07:00
Yi Cheng
15d67aa775
Support workflows step names via decorator (#18520) 2021-09-11 13:39:07 -07:00
Qing Wang
371f03fa48
Remove dynamic resource from client side. (#18514) 2021-09-11 10:39:59 -07:00
Chong-Li
d314d0c10e
[GCS] Fix the Windows build of GCS actor scheduling (#18012) 2021-09-10 17:17:25 -07:00
Lixin Wei
7e37d6e348
[Core] Add gRPC Server Backpressure Tests (#18500) 2021-09-10 17:17:09 -07:00
Alex Wu
1587eb22f0
[workflow] Dedupe object reference uploads (#18438)
* maybe works

* ?

* .

* seems to work

* seems to work

* .

* .

* lint

* address comments

* .

* test

* test

* .

* works?

* cleanup

* cleanup

* cleanup

* cleanup

* fix test + cleanup

* lint

Co-authored-by: Alex Wu <alex@anyscale.com>
2021-09-10 16:08:11 -07:00
dependabot[bot]
30012c990f
[tune](deps): Bump matplotlib in /python/requirements/tune (#18025)
Bumps [matplotlib](https://github.com/matplotlib/matplotlib) from 3.4.2 to 3.4.3.
- [Release notes](https://github.com/matplotlib/matplotlib/releases)
- [Commits](https://github.com/matplotlib/matplotlib/compare/v3.4.2...v3.4.3)

---
updated-dependencies:
- dependency-name: matplotlib
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-09-10 16:00:16 -07:00
dependabot[bot]
42794c7a3e
[tune](deps): Bump pytorch-lightning in /python/requirements/tune (#18359)
Bumps [pytorch-lightning](https://github.com/PyTorchLightning/pytorch-lightning) from 1.4.3 to 1.4.5.
- [Release notes](https://github.com/PyTorchLightning/pytorch-lightning/releases)
- [Changelog](https://github.com/PyTorchLightning/pytorch-lightning/blob/master/CHANGELOG.md)
- [Commits](https://github.com/PyTorchLightning/pytorch-lightning/compare/1.4.3...1.4.5)

---
updated-dependencies:
- dependency-name: pytorch-lightning
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-09-10 15:59:26 -07:00
Lixin Wei
88de723e62
Fix flaky test_gcs_fault_tolerance.py (#18493) 2021-09-10 11:34:07 -07:00
Kai Fricke
7d1e6d3129
[ci/release] Add sanity check for ray wheels hash to release tests (#18489) 2021-09-10 17:50:31 +01:00
Kai Fricke
be438fb600
[release] Also download Ray CPP wheels (#18383) 2021-09-10 17:49:37 +01:00
Chris K. W
6f94d0f3c9
[client] Use application specific error code to propagate ray errors (#18278)
* Raise decoded exception if generated by grpc lib

* Switch to missing client_id error to FAILED_PRECONDITION

* switch to ABORTED

* fix comment

* fix decode_exception comment
2021-09-10 09:49:03 -07:00
Sven Mika
3f89f35e52
[RLlib] Better error messages and hints; + failure-mode tests; (#18466) 2021-09-10 16:52:47 +02:00
Ameer Haj Ali
ead02b21b9
[client] Fix exception error message (#18485) 2021-09-10 14:34:31 +03:00
Guyang Song
03a2c69a8a
Don't add ray-cpp wheel to extras by default (#18251) 2021-09-10 09:56:51 +01:00
xwjiang2010
ae689ecc6b
[tune] Add optional Experiment to Searcher/SearchAlgo. (#17724) 2021-09-10 09:30:18 +01:00
Edward Oakes
2fcfea10b3
[runtime_env] Move URI deletion logic to the agent, remove util worker code (#18471) 2021-09-10 00:13:32 -07:00
Yi Cheng
f2d8f23fb6
[workflow] Define default __getstate__ and __setstate__ (#18459) 2021-09-09 23:04:00 -07:00
Yi Cheng
965c55fe1b
[workflow] set max retry to 3 (#18477) 2021-09-09 23:03:24 -07:00
qicosmos
dd096c8e73
[C++ Worker]Fix abi issue (#18273) 2021-09-10 11:53:05 +08:00
SangBin Cho
7b2ed4c1f8
[Placement group] Placement group scheduling hangs due to creation/removal race condition (#18419) 2021-09-09 20:39:01 -07:00
SangBin Cho
688dbeb4cb
Revert "[cpp] Upgrade cpp from 14 -> 17 (#18455)" (#18480)
This reverts commit ccc16a46bb.
2021-09-09 16:47:19 -07:00
matthewdeng
e66f154b14
[release] increase torch_tune_serve timeout to 20 min (#18481) 2021-09-09 16:31:14 -07:00
Chen Shen
5f57079041
use clang for C++ debug testing (#18343) 2021-09-09 15:48:36 -07:00
Amog Kamsetty
d3d8120db3
[SGD] Fix shutdown hang on macOS Python 3.7 (#18473) 2021-09-09 15:32:52 -07:00