Commit graph

10564 commits

Author SHA1 Message Date
Alex Wu
88266a6fce
Revert "Revert "[Docs] More detailed M1 Mac installation instructions"" (#20549)
Reverts ray-project/ray#20547
2021-11-18 20:18:37 -08:00
Eric Liang
65a8698e82
Raise the dataset block size limit to 2GiB (#20551)
The default block size of 500MiB seems too low for some common workloads, e.g. shuffling 500GB. This creates 1000 blocks which means 1 million intermediate shuffle objects until we implement #20500.
2021-11-18 19:36:10 -08:00
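The block-count arithmetic in the message above can be checked with a short sketch. It assumes that "500GB" is read as 500 GiB and that an all-to-all shuffle creates one intermediate object per (input block, output block) pair. The helper name `shuffle_object_count` is hypothetical and not part of Ray.

```python
import math
from typing import Tuple

MIB = 1024 ** 2
GIB = 1024 ** 3


def shuffle_object_count(dataset_bytes: int, block_bytes: int) -> Tuple[int, int]:
    """Return (num_blocks, num_intermediate_objects) for an all-to-all shuffle.

    Assumption: every input block sends one piece to every output block,
    so the number of intermediate objects is num_blocks squared.
    """
    num_blocks = math.ceil(dataset_bytes / block_bytes)
    return num_blocks, num_blocks * num_blocks


# 500 GiB dataset with the old 500 MiB block size limit.
blocks_500mib = shuffle_object_count(500 * GIB, 500 * MIB)
# Same dataset with the raised 2 GiB block size limit.
blocks_2gib = shuffle_object_count(500 * GIB, 2 * GIB)
```

Under these assumptions the 500 MiB limit gives roughly 1000 blocks and about a million intermediate objects, while the 2 GiB limit cuts the block count by a factor of about four and the object count by about sixteen.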
Clark Zinzow
2d50bf1302
[Datasets] Bump NumPy version to >= 1.19.0 for Python 3.6. (#20542)
Datasets groupby boundary sampling requires `numpy>=1.19.0` otherwise it fails to concatenate the Arrow table columns.
2021-11-18 17:33:06 -08:00
Clark Zinzow
462e389791
[Datasets] Fix empty Dataset.iter_batches() when trying to prefetch more blocks than exist in the dataset (#20480)
Before this PR, `ds.iter_batches()` would yield no batches if `prefetch_blocks > ds.num_blocks()` was given, since the sliding window semantics were to return no windows if `window_size > len(iterable)`. This PR tweaks the sliding window implementation to always return at least one window, even if the one window is smaller than the given window size.
2021-11-18 17:02:54 -08:00
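The sliding window behaviour described in the message above might look roughly like the following. This is a minimal sketch, not the code from the PR: the function name and signature are assumptions, and an empty input is assumed to yield no windows at all.

```python
import collections
import itertools
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def sliding_window(iterable: Iterable[T], n: int) -> Iterator[List[T]]:
    """Yield sliding windows of size n over the iterable.

    If the iterable has fewer than n items, a single smaller window
    containing all of them is yielded instead of nothing.
    """
    if n < 1:
        raise ValueError("window size must be at least 1")
    it = iter(iterable)
    # Fill the first window with up to n items.
    window = collections.deque(itertools.islice(it, n), maxlen=n)
    if window:
        yield list(window)
    # Slide by one item at a time; the deque drops the oldest item.
    for item in it:
        window.append(item)
        yield list(window)


full_windows = list(sliding_window([1, 2, 3], 2))
short_window = list(sliding_window([1, 2], 5))
```

With a window size larger than the input, as in the `prefetch_blocks > ds.num_blocks()` case, the sketch still returns one window holding every item.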
Simon Mo
add2450b92
[CI] [Hotfix] Skip test_standalone (#20556)
2021-11-18 16:47:18 -08:00
Richard Liaw
c964455642
Revert "[Docs] More detailed M1 Mac installation instructions" (#20547)
Reverts ray-project/ray#20512 due to lint errors.
2021-11-18 12:06:57 -08:00
Alex Wu
a811b2b6d7
[hotfix] Fix stress_test_many_tasks cluster environment (#20519)
This should fix the long running release tests that are failing to build their app configs.

It seems like `pip install ray[all]` now downgrades the Ray version. It's unclear why, but most likely a dependency now pins the Ray version. This PR explicitly installs the version of Ray that we want after the `pip install ray[all]` to fix the problem.
2021-11-18 11:51:46 -08:00
Amog Kamsetty
3f1092fb3d
[Release] Revert impala app config (#20397)
2021-11-18 11:24:22 -08:00
Antoni Baum
0b14f38ac7
[tune] Multi-objective support for Optuna (#20489)
This PR adds multi-objective support for Optuna searchers, including a test and example.

Co-authored-by: gjoliver <jungong@anyscale.com>
2021-11-18 18:47:29 +00:00
Simon Mo
7143d5d494
[Serve] Bump timeout for test_standalone to fix windows (#20543)
2021-11-18 10:00:23 -08:00
Alex Wu
540c9e35d1
[Docs] More detailed M1 Mac installation instructions (#20512)
This PR adds more detail to the M1 Mac installation instructions following the bug bash.
2021-11-18 09:35:43 -08:00
Sven Mika
7a585fb275
[RLlib; Documentation] RLlib README overhaul. (#20249)
2021-11-18 18:08:40 +01:00
Edward Oakes
d26c9e67e8
[job submission] Add a message to the JobStatus to return more detailed errors (#20491)
2021-11-18 10:15:23 -06:00
shrekris-anyscale
a91ddbdeb9
Add smart_open dependency to ray[default] (#20420)
2021-11-18 10:00:30 -06:00
Chen Shen
2012b469f6
fix gcs client hang (#20531)
2021-11-18 07:28:15 -08:00
qicosmos
a49c1d5f55
[C++] Deprecated global named actor and global PGs. (#20468)
Why are these changes needed?
This PR removes global named actor and global PGs.

Related issue number
#20460
2021-11-18 23:21:59 +08:00
Simon Mo
d7f208dea4
[Release] Make e2e.py link clickable on buildkite (#20436)
Adds log formatting to output clickable links to buildkite console logs
2021-11-18 12:45:59 +00:00
SangBin Cho
140a180ebb
[xgboost] Fix flaky train_small test (#20529)
XGBoost's train_small timed out because of a CPU borrowing feature related to placement groups. The root bug will be fixed in the coming weeks, but this PR makes the release test pass consistently by requesting 0 CPUs for the remote wrapper script.
2021-11-18 10:20:08 +00:00
shrekris-anyscale
65a023ef71
[runtime_env][docs] Add documentation on using remote URIs for runtime environments (#20352)
2021-11-17 23:17:48 -06:00
Edward Oakes
eae523159f
[job submission] Prefix job ID with raysubmit_ and pass job_name metadata (#20490)
2021-11-17 21:48:22 -06:00
Amog Kamsetty
9796ae56d5
[Train][Data] Change usages of iter_datasets to iter_epochs (#20487)
2021-11-17 18:05:51 -08:00
Gagandeep Singh
33b4245df2
Fix race condition when starting redis (#19836)
Co-authored-by: Philipp Moritz <pcmoritz@gmail.com>
2021-11-17 17:43:35 -08:00
Simon Mo
c85e9e69b3
[Serve] Change multi_deployment_1k_noop_replica threshold (#20514)
2021-11-17 17:25:54 -08:00
Yi Cheng
cbf5826040
[workflow] Fix workflow event doc typo (#20465)
In the example, it says `after_checkpoint`, but this should be `event_checkpointed`
2021-11-17 16:18:20 -08:00
Amog Kamsetty
4cbcb11458
[Docker] Add commit as label (#20504)
Adds the Ray commit sha as a label for the docker image.
2021-11-17 15:20:41 -08:00
Richard Liaw
1cadd61917
Fix horovod failing tests by pinning down (#20484)
2021-11-17 13:54:25 -08:00
Sven Mika
56619b955e
[RLlib; Documentation] Some docstring cleanups; Rename RemoteVectorEnv into RemoteBaseEnv for clarity. (#20250)
2021-11-17 21:40:16 +01:00
gjoliver
724a140795
[rllib] Make sure json can serialize result dict (#20439)
We may have fields in the result dict that are or None.
Make sure our results are json serializable.
2021-11-17 10:27:00 -08:00
xwjiang2010
03aec4e04a
[Tune] Remove runner argument in start_trial. (#20464)
This internal legacy argument was not used by any code.
2021-11-17 16:59:57 +00:00
Alex Wu
d1c624901f
Add hiredis dependency on supported platforms (#20437)
## Why are these changes needed?

This PR adds the hiredis dependency for non-M1 machines.

This removes the `redis < 4.0` pin.

Since hiredis doesn't have M1 Mac wheels yet, users there will see extra warning messages in their output if they use redis 4.0.

Co-authored-by: Alex Wu <alex@anyscale.com>
2021-11-17 07:40:58 -08:00
Alex Wu
3d668768de
[docker] Upgrade numpy version (#20450)
## Why are these changes needed?

The change in #20374 was interpreted as a file redirect, not a "greater than" by docker (strangely enough, differently than bash interprets it locally). 

Co-authored-by: Alex <alex@anyscale.com>
2021-11-17 07:15:18 -08:00
Qing Wang
e01f14d7df
[DOC] Add namespace doc for Java part. (#20428)
Add namespace doc for Java part.
2021-11-17 23:02:47 +08:00
Devon Proctor
dba84546d9
[GCP] Filter GCP TPUs by cluster name, matching behavior for GCP compute nodes. (#20311)
Ray currently does not filter GCP TPU nodes based on the cluster name, resulting in conflicts when multiple ray clusters are running on the same GCP account.

This change updates the TPU behavior to match the GCP compute node behavior, i.e. filtering to TPU nodes for the current cluster.
2021-11-17 01:39:58 -08:00
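The filtering described in the message above can be illustrated with a small sketch. It is not the code from the PR: the node dictionaries, the `nodes_for_cluster` helper, and the `ray-cluster-name` label key are all assumptions chosen for illustration.

```python
from typing import Dict, List

# Hypothetical label key that marks which cluster a node belongs to.
CLUSTER_LABEL = "ray-cluster-name"


def nodes_for_cluster(nodes: List[Dict], cluster_name: str) -> List[Dict]:
    """Keep only the nodes whose cluster label matches cluster_name."""
    return [
        node
        for node in nodes
        if node.get("labels", {}).get(CLUSTER_LABEL) == cluster_name
    ]


# Example data: TPU nodes from two clusters plus one unlabeled node.
tpus = [
    {"name": "tpu-a", "labels": {CLUSTER_LABEL: "cluster-1"}},
    {"name": "tpu-b", "labels": {CLUSTER_LABEL: "cluster-2"}},
    {"name": "tpu-c", "labels": {}},
]
mine = nodes_for_cluster(tpus, "cluster-1")
```

Applying the same filter to TPU nodes and compute nodes keeps two clusters in one account from picking up each other's nodes.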
Simon Mo
18d605fa7c
[Serve] Add experimental CLI for serve deploy (#20371)
2021-11-16 20:22:09 -08:00
Larry
454db6902c
[Java] Add timeout parameter for Ray.get() API (#20282)
Why are these changes needed?

Add a timeout (ms) parameter for Java Ray.get. The docs have been updated with the API changes ([Ray Core Walkthrough] -> [Fetching Results]).

e.g.:
ObjectRef<Integer> objRef = Ray.put(1);
objRef.get(1000);
Ray.get(Ray.task(MyRayApp::slowFunction).remote(), 3000);

Related issue number
#20247
2021-11-17 11:02:17 +08:00
Simon Mo
2dc7a6c9f8
[CI] Pin manylinux image (#20451)
2021-11-16 17:52:51 -08:00
Antoni Baum
20fc9f907d
[CI] Fix tune dashboard, increase timeout for test_commands (#20453)
2021-11-16 17:52:17 -08:00
Avnish Narayan
dc17f0a241
Add error messages for missing tf and torch imports (#20205)
Co-authored-by: Sven Mika <sven@anyscale.io>
Co-authored-by: sven1977 <svenmika1977@gmail.com>
2021-11-16 16:30:53 -08:00
Simon Mo
5fccad4cc9
[Serve] Add experimental pipeline docs (#20292)
2021-11-16 16:13:55 -08:00
Simon Mo
32a4f48aa2
[CI] Don't test tune dashboard (#20452)
2021-11-16 15:07:56 -08:00
Richard Liaw
cf357f6bce
[docs] Add a talks section for ray.data (#20444)
2021-11-16 14:30:08 -08:00
Kai Fricke
05d21497db
[rllib/tune] Fix durable trainable in trainer template, add release test (#20422)
2021-11-16 20:52:42 +00:00
Edward Oakes
48b87d5830
[serve] Fix actor resources error in failure test (#20400)
2021-11-16 12:24:54 -08:00
Eric Liang
12a4489e30
Revert "[core] Nested task support via task depth + backpressure" (#20438)
Reverts ray-project/ray#17887

This is causing several tests to be flaky (test_multinode_failures, test_virtual_actor, test_component_failures_2).
2021-11-16 11:14:45 -08:00
gjoliver
6e787f70e0
[Rllib/release] Disable throughput check (#20387)
Throughput check was enabled by d8a61f801f prematurely.
E.g., see state before the commit:
a931076f59/rllib/utils/test_utils.py (L740-L741)
2021-11-16 11:05:51 -08:00
Chen Shen
33c1ee0e86
[Core][actor out-of-order execution 5/n] implement out-of-order scheduling queue #20176
This PR belongs to the stack that enables out of order execution. Previous PR: #20160, Next PR: #20177

In this PR specifically, we implemented a simple out_of_order_scheduling queue which queues the task for execution as soon as the dependency is ready.
2021-11-16 10:53:51 -08:00
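The idea of queueing a task for execution as soon as its dependencies are ready can be sketched as follows. This is an illustrative Python model under stated assumptions, not the C++ implementation from the PR: the class `OutOfOrderQueue`, its methods, and the string task and dependency IDs are all hypothetical.

```python
from typing import Dict, Iterable, List, Set


class OutOfOrderQueue:
    """Toy queue that releases tasks when their dependencies resolve,
    regardless of submission order."""

    def __init__(self) -> None:
        self._resolved: Set[str] = set()
        # Maps a waiting task ID to the dependencies it still needs.
        self._pending: Dict[str, Set[str]] = {}
        # Order in which tasks became runnable.
        self.run_order: List[str] = []

    def submit(self, task_id: str, deps: Iterable[str] = ()) -> None:
        missing = set(deps) - self._resolved
        if missing:
            self._pending[task_id] = missing
        else:
            # No unresolved dependencies: runnable immediately.
            self.run_order.append(task_id)

    def resolve(self, dep: str) -> None:
        self._resolved.add(dep)
        ready = []
        for task_id, missing in self._pending.items():
            missing.discard(dep)
            if not missing:
                ready.append(task_id)
        for task_id in ready:
            del self._pending[task_id]
            self.run_order.append(task_id)


queue = OutOfOrderQueue()
queue.submit("a", deps=["x"])  # submitted first, but waits on "x"
queue.submit("b")              # no dependencies
queue.submit("c", deps=["y"])
queue.resolve("y")
queue.resolve("x")
```

In this model, task "a" is submitted first but runs last because its dependency resolves last, which is the difference from a sequential queue that would hold "b" and "c" behind it.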
Chen Shen
f02b53a810
[Core][actor out-of-order execution 3/n] Introducing out-of-order actor submit queue (#20150)
Why are these changes needed?
This is the third PR in the stack that supports out-of-order execution for threaded/async actors. Previous PR: #20149, Next PR: #20160
At a high level, threaded and async actors already don't guarantee execution order, and the current "sequential" order implementation has caused some confusion and inconvenience. Please refer to #19822 for detailed discussion.

In this PR, we implemented an out-of-order queue that supports out-of-order execution. Conceptually it's very simple: it sends each request as soon as its dependencies are resolved.
2021-11-16 10:48:49 -08:00
Simon Mo
5f2b035bba
Pin Redis version to < 4.0.0 (#20430)
## Why are these changes needed?

This pin is needed to fix `test_output` on master, which broke when 4.0.0 was released. 

It may also fix the windows build (unsure). 

2021-11-16 10:48:36 -08:00
Alex Wu
8f21cdbddb
Revert "[dependencies] Use redis[hiredis] in setup.py" (#20435)
Reverts ray-project/ray#20423

`hiredis` will break our M1 support right now.
2021-11-16 10:46:22 -08:00
Kai Fricke
6ec256122c
[dependencies] Use redis[hiredis] in setup.py (#20423)
This is recommended by `redis-py` and as a side effect gets rid of a current error in `test_output` for the minimal dependency test (e.g. https://buildkite.com/ray-project/ray-builders-branch/builds/4746#7444b5d0-87c3-4998-b722-1cbc2d9fe7e3)
2021-11-16 10:25:36 -08:00