matthewdeng
380a653787
[SGD] update SGDv2 user guide docs (#18270)
* [SGD] update SGDv2 user guide docs
* Update doc/source/raysgd/v2/user_guide.rst
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
* add new line
* update docs
* fix header line length
* lint
* lint
* lint
* lint
* fix remaining lint issues
* Update doc/source/raysgd/v2/user_guide.rst
Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>
* Update doc/source/raysgd/v2/user_guide.rst
Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>
* address comments
* address comments
* add TODO for iterator API
* Update doc/source/raysgd/v2/user_guide.rst
Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>
* address comments
* address comments
* add tune doc
* restructure table of contents
* add examples; rename example files to include example suffix
* add quick start, porting code
* address comments
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>
2021-09-14 09:07:25 -07:00
mwtian
a3f399ef10
[Client] fix propagating errors to async calls during disconnect, and other cleanup (#18539)
* cleanup tests and errors for clients
* Fix lock and async get
* rerun
* Avoid running callback under lock. Make lock non-reentrant
* Add all necessary apis
* Removed unused APIs
2021-09-14 18:48:27 +03:00
Edward Oakes
7f8cdce67d
Revert "Route core worker ERROR/FATAL logs to driver logs (#18577)" (#18602)
This reverts commit 3e0ae38e11.
2021-09-14 10:41:10 -05:00
Antoni Baum
65d5deae60
[tests] Increase golden notebook test timeout to 20 mins (#18554)
2021-09-14 16:27:56 +01:00
Jiao
d3734d803d
[serve] Change nightly test docker image and enable micro benchmark (#18566)
2021-09-14 09:41:21 -05:00
Jiao
18bbf044a7
[serve] Add reconfigure with exception test and ensure it can roll back (#18568)
2021-09-14 08:39:46 -05:00
Kai Fricke
e4754f1e19
[ci] wheel URLs - give some time for wheels to be built (#18505)
2021-09-14 09:56:34 +01:00
Ameer Haj Ali
e6807ecb43
Change test owners for ML tests (#18417)
2021-09-14 01:04:52 -07:00
Eric Liang
3e0ae38e11
Route core worker ERROR/FATAL logs to driver logs (#18577)
2021-09-13 23:07:14 -07:00
Guyang Song
dee12be253
[Event] print event message to general log (#18376)
2021-09-14 12:24:49 +08:00
Guyang Song
beff857cc1
[release][C++ API] support sanity check for C++ (#18545)
2021-09-14 11:39:08 +08:00
Tanmay Chordia
bf1176311f
[dashboard] add an endpoint to force kill an actor (#18508)
2021-09-13 20:03:15 -07:00
Yi Cheng
7d1f408de9
[workflow] Move experimental/workflow to workflow (#18521)
2021-09-13 17:45:18 -07:00
Stephanie Wang
284dee493e
[core][usability] Disambiguate ObjectLostErrors for better understandability (#18292)
* Define error types, throw error for ObjectReleased
* x
* Disambiguate OBJECT_UNRECONSTRUCTABLE and OBJECT_LOST
* OwnerDiedError
* fix test
* x
* ObjectReconstructionFailed
* ObjectReconstructionFailed
* x
* x
* print owner addr
* str
* doc
* rename
* x
2021-09-13 16:16:17 -07:00
Alex Wu
6479a5fcfc
[Workflow] dedupe download on recovery (#18564)
* maybe works
* ?
* .
* seems to work
* seems to work
* .
* .
* lint
* address comments
* .
* test
* test
* .
* works?
* cleanup
* cleanup
* cleanup
* cleanup
* fix test + cleanup
* lint
* .
* lint
* lint
* lint
* lint
* fix tests
* lint
Co-authored-by: Alex Wu <alex@anyscale.com>
2021-09-13 15:30:47 -07:00
gjoliver
2924afa41e
[Release] Create soft links for libcusolver.so.10 as a temporary fix. (#18562)
Co-authored-by: Jun Gong <jungong@anyscale.com>
2021-09-13 14:37:12 -07:00
Edward Oakes
766d5526bb
[serve] Revert changes to serve::test_runtime_env (#18558)
2021-09-13 16:29:41 -05:00
Yi Cheng
e4d36f749d
[deflaky] Fix workflow storage test (#18536)
2021-09-13 12:47:30 -07:00
Jiajun Yao
ec6f5ae9ab
Upgrade serve_tests and runtime_env_tests base image to 1.6.0 (#18563)
2021-09-13 12:47:06 -07:00
Clark Zinzow
a0fcc311ec
[Datasets] URL-encode paths if they are URLs. (#18440)
2021-09-13 12:46:21 -07:00
Kai Fricke
b543c0e923
[ci] Do not use anyscale connect for xgboost_tests/train_small (#18569)
2021-09-13 20:38:00 +01:00
Chris K. W
5db1b2395e
ensure queue order matches req_id order (#18504)
2021-09-13 12:27:49 -07:00
Edward Oakes
111a31d6a1
[runtime_env] Make Ray client server setup go through the runtime_env agent (#18478)
2021-09-13 14:16:35 -05:00
Edward Oakes
3869af551a
[serve] Fix missing f-string in rollback message (#18561)
2021-09-13 14:16:04 -05:00
Jiajun Yao
f8ae2b2b62
Don't pass in TaskID to TaskManager::MarkPendingTaskFailed since it can be obtained from TaskSpecification (#18532)
2021-09-13 11:27:42 -07:00
Eric Liang
a0336578a9
Update API stability annotations (#18452)
2021-09-13 11:00:16 -07:00
Sven Mika
3803e796ff
[RLlib] Multi-GPU learner thread (IMPALA) error messages/comments/code-cleanup. (#18540)
2021-09-13 19:27:53 +02:00
Jiao
1e26ca83ed
[serve] Add ability to recover replica state from actor names (#18293)
2021-09-13 11:44:29 -05:00
Edward Oakes
c482779da2
[runtime_env] Improve file-not-found msg in deletion (#18496)
2021-09-13 11:32:22 -05:00
Kai Fricke
b6392aa6ea
[ci] upgrade microbenchmark base image to 1.6.0 (#18542)
2021-09-13 17:13:01 +01:00
Kai Fricke
092a8a92f2
[tune] Improve HyperOpt KeyError message when metric was not found (#18549)
2021-09-13 17:02:20 +01:00
qicosmos
ac0a153b06
[C++ Worker] Add some APIs for placement groups (#18431)
2021-09-13 15:10:54 +08:00
Guyang Song
3bc5f0501f
fix WaitPlacementGroupReady API (#18464)
2021-09-13 14:07:40 +08:00
Lingxuan Zuo
a67b9ee8d7
Remove custom resource from streaming (#18490)
2021-09-12 12:20:59 -07:00
Jiajun Yao
ae10a80d5e
Fix async actor worker process leak after calling ray.actor.exit_actor() (#18526)
2021-09-12 11:09:12 -07:00
Eric Liang
53a2a47655
Polish workflows doc, add semantics and best practices for sub-workflows (#18525)
2021-09-12 11:08:06 -07:00
Sven Mika
ea4a22249c
[RLlib] Add simple action-masking example script/env/model (tf and torch). (#18494)
2021-09-11 23:08:09 +02:00
Yi Cheng
370473fc5f
[workflow] Update documentation (#18522)
2021-09-11 13:40:09 -07:00
Yi Cheng
15d67aa775
Support workflow step names via decorator (#18520)
2021-09-11 13:39:07 -07:00
Qing Wang
371f03fa48
Remove dynamic resource from client side. (#18514)
2021-09-11 10:39:59 -07:00
Chong-Li
d314d0c10e
[GCS] Fix the Windows build of GCS actor scheduling (#18012)
2021-09-10 17:17:25 -07:00
Lixin Wei
7e37d6e348
[Core] Add gRPC Server Backpressure Tests (#18500)
2021-09-10 17:17:09 -07:00
Alex Wu
1587eb22f0
[workflow] Dedupe object reference uploads (#18438)
* maybe works
* ?
* .
* seems to work
* seems to work
* .
* .
* lint
* address comments
* .
* test
* test
* .
* works?
* cleanup
* cleanup
* cleanup
* cleanup
* fix test + cleanup
* lint
Co-authored-by: Alex Wu <alex@anyscale.com>
2021-09-10 16:08:11 -07:00
dependabot[bot]
30012c990f
[tune](deps): Bump matplotlib in /python/requirements/tune (#18025)
Bumps [matplotlib](https://github.com/matplotlib/matplotlib) from 3.4.2 to 3.4.3.
- [Release notes](https://github.com/matplotlib/matplotlib/releases)
- [Commits](https://github.com/matplotlib/matplotlib/compare/v3.4.2...v3.4.3)
---
updated-dependencies:
- dependency-name: matplotlib
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-09-10 16:00:16 -07:00
dependabot[bot]
42794c7a3e
[tune](deps): Bump pytorch-lightning in /python/requirements/tune (#18359)
Bumps [pytorch-lightning](https://github.com/PyTorchLightning/pytorch-lightning) from 1.4.3 to 1.4.5.
- [Release notes](https://github.com/PyTorchLightning/pytorch-lightning/releases)
- [Changelog](https://github.com/PyTorchLightning/pytorch-lightning/blob/master/CHANGELOG.md)
- [Commits](https://github.com/PyTorchLightning/pytorch-lightning/compare/1.4.3...1.4.5)
---
updated-dependencies:
- dependency-name: pytorch-lightning
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-09-10 15:59:26 -07:00
Lixin Wei
88de723e62
Fix flaky test_gcs_fault_tolerance.py (#18493)
2021-09-10 11:34:07 -07:00
Kai Fricke
7d1e6d3129
[ci/release] Add sanity check for ray wheels hash to release tests (#18489)
2021-09-10 17:50:31 +01:00
Kai Fricke
be438fb600
[release] Also download Ray CPP wheels (#18383)
2021-09-10 17:49:37 +01:00
Chris K. W
6f94d0f3c9
[client] Use application specific error code to propagate ray errors (#18278)
* Raise decoded exception if generated by grpc lib
* Switch missing client_id error to FAILED_PRECONDITION
* switch to ABORTED
* fix comment
* fix decode_exception comment
2021-09-10 09:49:03 -07:00
Sven Mika
3f89f35e52
[RLlib] Better error messages and hints; add failure-mode tests (#18466)
2021-09-10 16:52:47 +02:00