Edward Oakes
50f2cf8a74
[job submission] Allow passing job_id, return DOES_NOT_EXIST when applicable ( #20164 )
2021-11-08 23:10:27 -08:00
Jiao
d46caa9856
[job submission] Remove test_utils dependency ( #20168 )
...
Co-authored-by: Jiao Dong <jiaodong@anyscale.com>
2021-11-08 23:08:43 -08:00
Lingxuan Zuo
97259e33b2
Relink grpc/absl for streaming.so ( #20136 )
...
To avoid exporting third-party library symbols globally, these absl/grpc libs have been applied in _streaming.so.
Side-effect:
Static variables might be uninitialized if core worker lib and streaming lib both use them.
2021-11-09 14:13:53 +08:00
SangBin Cho
5c4fb4dc91
[Core]Chaos testing nightly ( #20059 )
...
* Done initial stage.
* lint
* .
* Finished.
* Fix lint
2021-11-08 21:57:53 -08:00
Stephanie Wang
ffcc5935d7
[core] Evict lineage to bound memory usage ( #19946 )
...
* bound lineage
* Bound lineage in bytes
* test
* Lineage evicted error
* Lineage evicted
* lint
* test
* test
* comment
* doc
* x
* x
* x
* x
2021-11-08 21:53:40 -08:00
architkulkarni
e5e62d8991
[runtime env] Fix runtime env conda test and enable it in CI ( #20121 )
2021-11-08 18:33:19 -08:00
Lixin Wei
8e666ca1e9
[Core] Fix Used Memory Calculation ( #20127 )
...
* fix memory
* fix
2021-11-08 17:36:32 -08:00
Kai Fricke
9c2b8c8501
[tune] Deprecate DurableTrainable ( #19880 )
2021-11-08 20:56:07 +00:00
Amog Kamsetty
f8430e6eca
[CI] Pin shortuuid to fix CI ( #20153 )
2021-11-08 12:08:32 -08:00
gjoliver
d8a61f801f
[RLlib] Create a set of performance benchmark tests to run nightly. ( #19945 )
...
* Create a core set of algorithms tests to run nightly.
* Run release tests under tf, tf2, and torch frameworks.
* Fix
* Add eager_tracing option for tf2 framework.
* make sure core tests can run in parallel.
* cql
* Report progress while running nightly/weekly tests.
* Include SAC in nightly lineup.
* Revert changes to learning_tests
* rebrand to performance test.
* update build_pipeline.py with new performance_tests name.
* Record stats.
* bug fix, need to populate experiments dict.
* Alphabetize yaml files.
* Allow specifying frameworks. And do not run tf2 by default.
* remove some debugging code.
* fix
* Undo testing changes.
* Do not run CQL regression for now.
* LINT.
Co-authored-by: sven1977 <svenmika1977@gmail.com>
2021-11-08 18:15:13 +01:00
Amog Kamsetty
b1f24768a1
[Tune] More fixes to PTL Tutorial ( #20065 )
...
* ptl-fix-2
* improve
* fix
2021-11-08 09:13:44 -08:00
Gagandeep Singh
31812d026c
Bumped time limit for test_worker_startup_count in test_basic_3.py ( #20056 )
...
Co-authored-by: Philipp Moritz <pcmoritz@gmail.com>
2021-11-08 09:02:28 -08:00
Sven Mika
eea6b40a3e
[RLlib] Minor cleanups in Trainer; better tf/tf2 info messages about possible tracing speedups. ( #20109 )
2021-11-08 15:37:27 +01:00
Kai Yang
e84391d1d3
[Core] Encode job ID in randomized task IDs for user-created threads ( #19320 )
...
## Why are these changes needed?
Currently, `WorkerContext::GetCurrentTaskID()` returns a random task ID in user-created threads, and the returned task ID doesn't include the job ID. In this case, subsequent non-actor tasks, their return values, and objects created by `ray.put()` don't include the job ID either. This makes it hard to find the correct job ID from a task or object ID.
This PR updates the task ID generation code to always encode the job ID.
A side effect of this PR is a higher probability of task ID collision in user-created threads, because the job ID part is now fixed. Without this PR, a collision is expected after about `sqrt(pi * 256 ^ 12 / 2)` ~= 352 trillion tasks; with this PR, after about `sqrt(pi * 256 ^ 8 / 2)` ~= 5 billion tasks. This should be OK because the job ID part of task IDs in non-user-created threads is always fixed, so it won't be worse than in non-user-created threads.
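The two figures follow from the birthday-problem approximation: with N equally likely random values, the expected number of draws before the first collision is roughly `sqrt(pi * N / 2)`. A minimal Python sketch that reproduces them, assuming the exponents 12 and 8 are the number of random bytes left in the task ID (the function name is illustrative and not part of Ray):

```python
import math


def expected_draws_until_collision(random_bytes: int) -> float:
    """Birthday-problem estimate sqrt(pi * N / 2) with N = 256 ** random_bytes."""
    space = 256 ** random_bytes  # number of distinct random values
    return math.sqrt(math.pi * space / 2)


# Assumption: 12 random bytes before the change, 8 after (job ID bytes are fixed).
without_pr = expected_draws_until_collision(12)
with_pr = expected_draws_until_collision(8)

print(f"without this PR: {without_pr:.3e} tasks")
print(f"with this PR:    {with_pr:.3e} tasks")
```

The first value lands in the hundreds of trillions and the second in the low billions, matching the estimates quoted in the message.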
## Related issue number
## Checks
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/ .
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
- [ ] Unit tests
- [ ] Release tests
- [ ] This PR is not tested :(
2021-11-08 21:00:40 +08:00
qicosmos
547bfbc4a4
[core]Simplify filesystem ( #18941 )
2021-11-08 17:54:07 +08:00
Qing Wang
f9d94f51aa
Revert "[Java] Skip javadoc when deploying. ( #19428 )" ( #20137 )
...
This reverts commit 1047914ee0.
2021-11-08 15:53:31 +08:00
Qing Wang
6d8a7291ab
Add getNamespace API for Java worker ( #20057 )
...
[Java API] Add getNamespace API for Java worker.
2021-11-08 15:51:14 +08:00
Linsong Chu
e189d8d4bc
[workflow] fix s3 storage path ( #20115 )
...
## Why are these changes needed?
Fixes two path-related issues when S3 is used as the storage backend:
1. A leading slash is added to the path due to the behavior of `parse.urlparse`.
2. When `step_id=""`, double slashes are added to the path.
Details are explained in https://github.com/ray-project/ray/issues/20114
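Both issues can be reproduced with the standard library alone. The sketch below is only an illustration of the behavior described above, not the code changed in this PR; the bucket name, the file name, and the helper `s3_key` are hypothetical:

```python
from urllib.parse import urlparse


def s3_key(url: str, *parts: str) -> str:
    """Build an S3 object key from a storage URL and path components.

    Hypothetical helper: strips the leading slash that urlparse keeps on
    the path and skips empty components so no "//" appears in the key.
    """
    base = urlparse(url).path.strip("/")
    cleaned = [p.strip("/") for p in (base, *parts)]
    return "/".join(p for p in cleaned if p)


parsed = urlparse("s3://my-bucket/workflows")
print(parsed.netloc)  # the bucket name
print(parsed.path)    # note the leading slash

# Naive concatenation with an empty step_id produces a double slash.
naive = parsed.path + "/" + "" + "/" + "output.json"
print(naive)

print(s3_key("s3://my-bucket/workflows", "", "output.json"))
```

Filtering empty components before joining handles the `step_id=""` case without special-casing it at every call site.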
## Related issue number
https://github.com/ray-project/ray/issues/20114
https://github.com/ray-project/ray/issues/19027
2021-11-07 15:57:33 -08:00
xwjiang2010
99826d2ca6
[Release] Increase node memory by 2X in many_ppo test. ( #19591 )
2021-11-08 08:10:09 +09:00
Jiajun Yao
e110d958a1
Support different s3 url formats ( #20133 )
2021-11-07 14:58:51 -08:00
Jules S. Damji
e6343f0e69
Fixed a broken code snippet with a missing method ( #20130 )
...
Signed-off-by: Jules S.Damji <jules@anyscale.com>
Co-authored-by: Jules S.Damji <jules@anyscale.com>
2021-11-08 07:56:32 +09:00
dependabot[bot]
adf39941f4
[data](deps): Bump dask[complete] ( #20125 )
...
Bumps [dask[complete]](https://github.com/dask/dask ) from 2021.9.1 to 2021.11.0.
- [Release notes](https://github.com/dask/dask/releases )
- [Changelog](https://github.com/dask/dask/blob/main/docs/release-procedure.md )
- [Commits](https://github.com/dask/dask/compare/2021.09.1...2021.11.0 )
---
updated-dependencies:
- dependency-name: dask[complete]
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-11-07 11:55:39 -08:00
Alex Wu
45d7ef7c08
[windows][ci] Skip test_multi_node_failure_2 ( #20117 )
2021-11-07 09:17:46 -08:00
Sven Mika
76f8a9f125
[RLlib; testing] Increase size of two time-out'ing test cases from medium to large. ( #20128 )
2021-11-06 21:48:28 +01:00
Jiao
9ef75b27ac
[Job Submission] Add stop API to http & sdk, with better status code + stacktrace ( #20094 )
2021-11-06 12:37:54 -05:00
SangBin Cho
7a18d90a25
Revert "[scheduler] Update local object store usage ( #20026 )" ( #20118 )
...
This reverts commit 7e013366ac.
2021-11-06 07:34:27 -07:00
SangBin Cho
f65cc72b4c
Revert "Set default max_pending_lease_requests_per_scheduling_category to 10 ( #19924 )" ( #20124 )
...
This reverts commit 0d850f3302.
2021-11-05 23:35:30 -07:00
Yi Cheng
6a6cc434ba
[nightly] Remove grpc staging test since nightly is stable #20119 ( #20119 )
2021-11-05 21:36:58 -07:00
Amog Kamsetty
3408b60d2b
[Release] Refactor User Tests ( #20028 )
...
* wip
* add directory
* wip
* try again
* Revert "try again"
This reverts commit 82d33ccea6f92848df025e019b87df73cea49e5d.
* finish
* formatting
* fix merge
* fix path
* chmod
* check
* sudo
* wip
* update
* fix horovod
* try
* typo
* reduce num workers
2021-11-05 17:28:37 -07:00
Alex Wu
81194f5660
[workflow][docs] Fix api comparison formatting ( #20069 )
...
## Why are these changes needed?
The API comparison formatting uses \`code\`, which is rendered as italics, not as code. This PR puts the code in code blocks instead.
## Related issue number
## Checks
2021-11-05 17:05:35 -07:00
mwtian
4d70ce1c86
[Core][Pubsub] add worker failure message to gcs pubsub ( #20075 )
...
## Why are these changes needed?
This is to demonstrate the steps needed to add a GCS pubsub channel, with GCS publisher and C++ subscribers subscribing via GCS client. For new channels, a unit test exercising the publishing and subscribing logic should also be added to `gcs_client_test.cc`.
## Related issue number
2021-11-05 14:52:49 -07:00
Jiajun Yao
0d850f3302
Set default max_pending_lease_requests_per_scheduling_category to 10 ( #19924 )
2021-11-05 14:24:20 -07:00
xwjiang2010
866fa9590f
[tune] clean up legacy branch in update_avail_resources. ( #20071 )
2021-11-05 10:28:46 -07:00
matthewdeng
78e9ff7c91
[train][datasets] add example for big data training ( #20042 )
...
* [train][datasets] add example for big data training
* add title docstring
* lint and dependencies
* add dask_ml requirement
2021-11-05 09:28:48 -07:00
Chen Shen
320f9dc234
[Core][CoreWorker] increase the default port range ( #19541 )
...
* increase the port range
* Update doc/source/configure.rst
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
2021-11-05 09:25:44 -07:00
Alex Wu
146b3d6bcc
[scheduler] Include depth and function descriptor in scheduling class ( #20004 )
2021-11-05 08:19:48 -07:00
Simon Mo
3d5cbc6e62
[Serve] Fix HTTP error handling behavior and add tests ( #20093 )
2021-11-05 10:15:54 -05:00
Sven Mika
a931076f59
[RLlib] Tf2 + eager-tracing same speed as framework=tf; Add more test coverage for tf2+tracing. ( #19981 )
2021-11-05 16:10:00 +01:00
gjoliver
1341bb59bf
[RLlib; Release testing] long_running_tests should use RLlib's app_config. ( #20095 )
2021-11-05 15:18:56 +01:00
SangBin Cho
8299aae918
[Placement Group] Add stats to pg scheduling ( #19841 )
...
* Add an e2e stats to pg scheduling
* Fix bugs.
* fix a bug.
* Revert "fix a bug."
This reverts commit dd7e03d1346fa39e54898effaaf8a2771103176e.
* done except unit tests.
* done except unit tests.
* Add unit tests.
* Address code review.
* done
* Fix
* done
* Fixed the test
2021-11-05 06:51:42 -07:00
Sven Mika
f3397b6f48
[RLlib] Minor fixes/cleanups; chop_into_sequences now handles nested data. ( #19408 )
2021-11-05 14:39:28 +01:00
Amog Kamsetty
adb8d77b2b
[Deps] Bump tensorflow on Docker image and add Codeowners ( #20041 )
2021-11-05 00:58:34 -07:00
dependabot[bot]
60e9737679
[tune](deps): Bump mlflow in /python/requirements/ml ( #19913 )
...
Bumps [mlflow](https://github.com/mlflow/mlflow ) from 1.19.0 to 1.21.0.
- [Release notes](https://github.com/mlflow/mlflow/releases )
- [Changelog](https://github.com/mlflow/mlflow/blob/master/CHANGELOG.rst )
- [Commits](https://github.com/mlflow/mlflow/compare/v1.19.0...v1.21.0 )
---
updated-dependencies:
- dependency-name: mlflow
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-11-04 23:37:01 -07:00
dependabot[bot]
9897ee0eab
[tune](deps): Bump onnxruntime in /python/requirements/ml ( #19666 )
...
Bumps [onnxruntime](https://github.com/microsoft/onnxruntime ) from 1.8.0 to 1.9.0.
- [Release notes](https://github.com/microsoft/onnxruntime/releases )
- [Changelog](https://github.com/microsoft/onnxruntime/blob/master/docs/ReleaseManagement.md )
- [Commits](https://github.com/microsoft/onnxruntime/compare/v1.8.0...v1.9.0 )
---
updated-dependencies:
- dependency-name: onnxruntime
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-11-04 23:34:48 -07:00
dependabot[bot]
f214c4a4ab
[tune](deps): Bump datasets from 1.11.0 to 1.14.0 in /python/requirements/ml ( #19645 )
...
* [tune](deps): Bump datasets in /python/requirements/ml
Bumps [datasets](https://github.com/huggingface/datasets ) from 1.11.0 to 1.14.0.
- [Release notes](https://github.com/huggingface/datasets/releases )
- [Commits](https://github.com/huggingface/datasets/compare/1.11.0...1.14.0 )
---
updated-dependencies:
- dependency-name: datasets
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
* Update requirements_tune.txt
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>
2021-11-04 23:33:55 -07:00
Clark Zinzow
6ade6f0be6
[Datasets] Multi-aggregations [1/3]: Add basic support for groupby multi-aggregations. ( #20044 )
2021-11-04 22:48:49 -07:00
mwtian
fb0ede38ba
[CI] [macOS] avoid installing latest setuptools ( #20064 )
2021-11-04 21:35:03 -07:00
architkulkarni
c5175073b2
[runtime env] Add garbage collection for conda envs ( #20072 )
2021-11-04 23:13:34 -05:00
Edward Oakes
360993612c
[serve] Remove lingering backend references ( #20085 )
2021-11-04 20:32:13 -05:00
Eric Liang
6102912494
Dataset doc updates ( #19815 )
2021-11-04 18:13:40 -07:00