11578 commits

Author SHA1 Message Date
Edward Oakes
135cd121b9
[release tests] Fix minor bug in multi-deployment serve test (#22961) 2022-03-09 14:37:27 -06:00
mwtian
3ccc2aa17a
Revert "[Core] Update grpc to 1.44.0 (#22384)" (#22958)
This reverts commit 5ebc32d7c2.
2022-03-09 11:40:35 -08:00
Jiao
ea9069fef4
[6/X][Pipeline] Add HTTP ingress to serve pipeline (#22878) 2022-03-09 11:39:15 -08:00
Simon Mo
3c4827e0b2
[Serve] [2/3 Wrappers] Add Basic HTTP Adapters (#22914) 2022-03-09 11:36:46 -08:00
Antoni Baum
2ead945438
[datasets] Make label_column optional in to_tf (#22916)
Makes the `label_column` argument in `Dataset.to_tf` optional so that it can be used for prediction.
2022-03-09 11:34:18 -08:00
shrekris-anyscale
61e132b478
[serve] Split test_deploy (#22908)
`test_deploy` has become [flakey](https://flakey-tests.ray.io/#) due to timeout. Since `test_deploy` is already a "large" test, this change splits it into two testing files instead of simply increasing the timeout.
2022-03-09 12:22:51 -06:00
Kai Fricke
b267be4758
[ml] Add Ray ML / AIR checkpoint implementation (#22691)
This PR splits up the changes in #22393 and introduces an implementation of the ML Checkpoint interface used by Ray Tune.

This means the TuneCheckpoint class implements the to/from_[bytes|dict|directory|object_ref|uri] conversion functions, as well as higher-level functions to transition between the different TuneCheckpoint classes. It also includes test cases for Tune's main conversion modes, i.e. dict - intermediate - dict and fs - intermediate - fs.

These changes will be the basis for refactoring the tune interface to use TuneCheckpoint objects instead of TrialCheckpoints (externally) and instead of paths/objects (internally).
2022-03-09 10:02:59 -08:00
Eric Liang
79a3b56015
[ml] Improve the documentation of ml common classes; add kwargs to predictor (#22936) 2022-03-09 10:01:20 -08:00
Kai Fricke
ca87c37c61
[ci/release] Fix result output in Buildkite pipeline run (#22946)
The new buildkite pipeline prints out faulty results due to a confusion of -ge/-gt and -le/-lt in the retry script. This is a cosmetic error (so behavior was still correct) that is resolved with this PR.
2022-03-09 17:29:31 +00:00
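
The -ge/-gt confusion described in #22946 is a boundary error: the two comparisons differ only when the values are equal. The retry script itself is not shown in this log, so the following is only a Python analogue under assumed names (`status_line`, `attempt` and `max_retries` are hypothetical). It shows how a message can be wrong at the boundary while the retry behavior stays the same.

```python
def status_line(attempt: int, max_retries: int, strict: bool) -> str:
    """Return the message a retry wrapper might print after a failed attempt.

    strict=True  mirrors a shell `-gt` test (attempt >  max_retries).
    strict=False mirrors a shell `-ge` test (attempt >= max_retries).
    """
    if strict:
        exhausted = attempt > max_retries
    else:
        exhausted = attempt >= max_retries
    return "giving up" if exhausted else "retrying"


# The two variants only disagree when attempt == max_retries.
for attempt in (2, 3, 4):
    print(attempt, status_line(attempt, 3, strict=True), status_line(attempt, 3, strict=False))
```

Because only the printed message depends on the comparison in this sketch, swapping the operators changes the output but not which attempts are actually made.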
Simon Mo
77ead01b65
[Serve] [1/3 Wrappers] Allow @serve.batch to accept args and kwargs (#22913) 2022-03-09 09:15:57 -08:00
Kai Fricke
15601ed79b
Revert "[serve] Support working_dir in serve run (#22760)" (#22956)
This reverts commit ab2741d64b.

The PR breaks ray job submission for anyscale:// URLs
2022-03-09 17:04:46 +00:00
Jiajun Yao
069f5f467c
[Test] Fix and enable test_logging.py (#22904)
Fix and enable test_logging.py
2022-03-09 09:01:38 -08:00
ZhuSenlin
a15890be58
[GCS] refactor the resource related data structures on the GCS (#22924)
* refactor resource data structure in gcs

* fix comment

* fix lint error

* fix

* DISABLED_TestRejectedRequestWorkerLeaseReply as it depends on the update of normal task

Co-authored-by: 黑驰 <senlin.zsl@antgroup.com>
2022-03-09 08:22:02 -08:00
simonsays1980
8627f44d7f
[RLlib] Remove duplicate code block: Config deprecation check for metrics_smoothing_episodes (#22152) 2022-03-09 16:51:42 +01:00
Edward Oakes
2cac49e4b0
[serve][release tests] Mark long-running failure test as non-stable (#22922) 2022-03-09 09:42:47 -06:00
Kai Fricke
ac654dbb9d
[ci/release] Fix schema validation for single tests / add stable field (#22947)
Builds currently fail with schema validation errors after #22901 was merged, because the stable column had not been added to the schema.
2022-03-09 15:22:49 +00:00
matthewdeng
6b0169b23d
[ml] enable CI tests (#22926)
Follow-up to #22748, enabling tests in CI.

Conditions: A new RAY_CI_ML_AFFECTED condition is added for this test suite. The package currently depends on Ray Data, and will be triggered accordingly.

Dependencies: Adding DATA_PROCESSING_TESTING dependencies (set for install-dependencies.sh) for now.
2022-03-09 14:31:53 +00:00
Jialing He
795b5787dc
[runtime env][bug] Fix RuntimeEnv ignore eager_install when _validate is True (#22935)
When _validate is True, RuntimeEnv will ignore field eager_install.
2022-03-09 20:16:55 +08:00
Kai Fricke
cac9d30909
[ci/release] Add schema validation for release test config (#22919)
To avoid breakage like in #22905, this PR adds schema validation to the release test package.
In a follow-up PR, we'll likely switch this to use pydantic instead.
2022-03-09 09:50:51 +00:00
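
As a rough illustration of what schema validation for a test config can look like, here is a minimal Python sketch. The real release test schema is not shown in this log: the `SCHEMA` table, the `name` field and the `validate_test` helper are assumptions, and only `frequency` and `stable` are field names that appear in the commit messages above.

```python
from typing import Any, Dict, List

# Hypothetical schema: field name -> expected Python type.
SCHEMA: Dict[str, type] = {"name": str, "frequency": str, "stable": bool}


def validate_test(test: Dict[str, Any], schema: Dict[str, type] = SCHEMA) -> List[str]:
    """Return a list of human-readable schema errors (empty if the test is valid)."""
    errors: List[str] = []
    # Every schema field must be present and have the expected type.
    for field, expected in schema.items():
        if field not in test:
            errors.append(f"missing field: {field}")
        elif not isinstance(test[field], expected):
            errors.append(
                f"{field}: expected {expected.__name__}, got {type(test[field]).__name__}"
            )
    # Fields that the schema does not know about are rejected as well.
    for field in test:
        if field not in schema:
            errors.append(f"unknown field: {field}")
    return errors


print(validate_test({"name": "microbenchmark", "frequency": "nightly", "stable": True}))
print(validate_test({"name": "microbenchmark", "stable": True}))
```

Rejecting unknown fields is a design choice in this sketch: a field used in the config but missing from the schema then fails fast instead of being ignored silently.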
Siyuan (Ryans) Zhuang
b621dc099b
[DAG] Update the example in the doc (#22930)
* update doc
2022-03-08 20:09:45 -08:00
Guyang Song
56287d63e5
[runtime env] remove _rewrite_pip_list_ray_libraries (#22890)
We don't need this logic after using virtualenv.
2022-03-09 11:41:33 +08:00
Edward Oakes
aa907987bf
[serve][release tests] Use m5.8xlarge instance types for 1k replica tests (#22918) 2022-03-08 21:34:01 -06:00
Stephanie Wang
bf09f5071a
[core] Deflake test_plasma_unlimited (#22911)
test_plasma_unlimited::test_task_unlimited is flaky because one of the assertions is race-y and can trigger after the condition is no longer true (see #22883). This fixes the flake by:
- adding an assertion in between two object allocations to force the object store queue to flush
- keeping one of the ObjectRefs in scope to make sure that the object is still fallback-allocated by the time we reach the failing assertion
2022-03-08 22:00:04 -05:00
Chen Shen
bc3f7a7684
[scheduling policy 3/n][rfc] Refactor SchedulingPolicy into interface and implementations (#22907)
* scheduling policy

* update

Co-authored-by: Gagandeep Singh <gdp.1807@gmail.com>
2022-03-08 18:47:56 -08:00
Junwen Yao
0395d0987e
[Train] Add support for automatic pipelining of host to device transfer (#22716)
This PR adds support for concurrently transferring the input from host to device.
2022-03-08 18:37:23 -08:00
Balaji Veeramani
48af260aaf
[Train] Clarify shuffle documentation in prepare_data_loader (#22876)
We essentially use a hack to determine whether shuffling should be enabled in prepare_data_loader. I've clarified the documentation so the hack is easier to understand.
2022-03-08 18:13:29 -08:00
Alex Wu
b84aaef38a
Promote python 3.9 support to stable (#22923)
Remove the experimental note from python 3.9 since it and its core dependencies have been stable for quite some time now.

Co-authored-by: Alex Wu <alex@anyscale.com>
2022-03-08 17:24:54 -08:00
SangBin Cho
549527687f
Migrate scalability tests (#22901)
This PR migrates scalability tests to the new infra.

I had to copy the benchmarks folder to the release folder to make it work. I will remove some unnecessary files (e.g., benchmark.yaml or the wait_for_cluster file). Alternatively, we could support a path other than /release in the tool, but I think this way is cleaner. I am open to suggestions, though. cc @krfricke
2022-03-08 17:22:41 -08:00
Eric Liang
52491c87e2
Make a pass fixing Dataset API issues (#22886) 2022-03-08 13:07:55 -08:00
shrekris-anyscale
ab2741d64b
[serve] Support working_dir in serve run (#22760)
#22714 added `serve run` to the Serve CLI. This change allows the user to specify a local or remote `working_dir` in `serve run`.
2022-03-08 13:18:41 -06:00
Junwen Yao
d1009c8489
[Train] Add support for metrics aggregation (#22099)
This PR allows users to aggregate metrics returned from all workers.
2022-03-08 11:03:04 -08:00
Simon Mo
c8aa6cdf64
Fix Issue Severity Question to Bug Report Template (#22906) 2022-03-08 10:36:32 -08:00
Wendi-anyscale
dd8654fd85
Add Issue Severity Question to Bug Report Template (#22887) 2022-03-08 10:31:53 -08:00
Balaji Veeramani
37c6169027
[Train] Refactor and add Accelerator classes (#22009)
To support mixed precision (see #20643), we need to store a GradScaler instance that is accessible by both the prepare_optimizer and backward functions (these functions will be added later).

This PR introduces the Accelerator, an object that implements methods to perform backend-specific training optimizations.
2022-03-08 10:26:00 -08:00
Balaji Veeramani
04b10ff9e9
[Train] Tell user to specify cluster address if placement group times out (#22845)
If you don't add `ray.init("auto")` to your training script, then your training script might complain that there aren't enough resources, even if `ray status` shows that there are.

Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>
2022-03-08 10:24:12 -08:00
matthewdeng
7b5813e94f
[ml] add initial Dataset Preprocessors (#22748) 2022-03-08 09:59:03 -08:00
Kai Fricke
c57abb693b
[ci/release] Add frequency to core nightly test (#22905)
Breaks the scheduled build: https://buildkite.com/ray-project/release-tests-branch/builds/82#3994f5e1-6da3-4c70-8c30-bdcfb1fec851

We should enforce schema validation soon.
2022-03-08 17:44:20 +00:00
Gagandeep Singh
2899dc1bb5
Fixed MRO for DerivedActorClass (#22113)
Comments to be noted from the discussion below,

https://github.com/ray-project/ray/pull/22113#discussion_r802512907

> Problem - We cannot always delegate the call to cls.__init__ or modified_cls.__init__. If we always delegate to cls.__init__ from here, the user-defined class's __init__ method will be ignored, leading to issues like https://github.com/ray-project/ray/issues/21868. If we always delegate to modified_cls.__init__, it will allow inheriting from actor classes, causing test_actor_inheritance to fail. So I have added this if-else check to figure out which __init__ method should be called. If "__module__", "__qualname__" and "__init__" are present in args[-1], an actor class is being inherited, so cls.__init__ should be called. However, if no such signal is received in args, the user-defined class's __init__, i.e. modified_class.__init__, should be called.

https://github.com/ray-project/ray/pull/22113#discussion_r808696261

> So I noted that ActorClass.__init__ will anyway raise a TypeError whenever it will be inherited. To exactly figure out whether the exception is due to inheritance of ActorClass, I created a new class ActorClassInheritanceException(TypeError). Now, whenever this will be raised, then DerivedActorClass will get a clear signal about inheritance of ActorClass. In other cases, it will be safe to conclude (AFAICT) that user called __init__ method of their class and we will proceed normally. IMHO, this is a better and more robust solution which just depends on a simple signal i.e., raising a particular exception in a specific event. It doesn't matter how inheritance is prevented as in the end we just need to raise ActorClassInheritanceException and all other code will be able to detect that easily.

https://github.com/ray-project/ray/pull/22113#issuecomment-1048527387
2022-03-08 09:37:19 -08:00
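
The second quoted comment in #22113 describes a general pattern: raise a dedicated subclass of TypeError so that callers can tell one specific failure apart from any other TypeError. The sketch below illustrates that pattern only and is not Ray's implementation. `ActorClassInheritanceException` is the name given in the commit message, while `looks_like_class_creation`, `actor_class_init` and `classify` are hypothetical helpers. The namespace check is modelled on the first quoted comment.

```python
class ActorClassInheritanceException(TypeError):
    """Signals one specific failure: an attempt to inherit from an actor class."""


def looks_like_class_creation(args: tuple) -> bool:
    # Hypothetical check: a class statement passes a namespace dict
    # containing these keys as the last positional argument.
    namespace = args[-1] if args else None
    return isinstance(namespace, dict) and all(
        key in namespace for key in ("__module__", "__qualname__", "__init__")
    )


def actor_class_init(*args) -> None:
    """Stand-in for a guarded __init__ that raises only on inheritance."""
    if looks_like_class_creation(args):
        raise ActorClassInheritanceException(
            "Inheriting from actor classes is not allowed."
        )


def classify(call) -> str:
    """Tell the inheritance signal apart from any other TypeError."""
    try:
        call()
    except ActorClassInheritanceException:
        return "inheritance"
    except TypeError:
        return "other TypeError"
    return "ok"
```

Because the new exception subclasses TypeError, existing code that catches TypeError keeps working, while code that needs the precise signal can catch the subclass first.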
Chen Shen
cd0354e06d
[scheduling-policy 2/n] refactor scheduling policy API (#22885)
* add scheduling-options

* address comments
2022-03-08 09:29:00 -08:00
ZhuSenlin
1e4d7bc1f4
[Core] make StringIdMap thread safe (#22893)
* make StringIdMap thread safe

* fix comment

Co-authored-by: 黑驰 <senlin.zsl@antgroup.com>
2022-03-08 09:23:41 -08:00
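
The StringIdMap change itself is not shown in this log. As a loose illustration of what making a string-to-ID map thread safe can involve, here is a Python sketch that guards both directions of the mapping with one lock. The class name `StringIdMapSketch` and its methods are assumptions, not the actual interface.

```python
import threading
from typing import Dict


class StringIdMapSketch:
    """Bidirectional string <-> integer ID map guarded by a single lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._string_to_id: Dict[str, int] = {}
        self._id_to_string: Dict[int, str] = {}

    def insert(self, name: str) -> int:
        """Return the ID for name, assigning the next free ID on first use."""
        with self._lock:
            existing = self._string_to_id.get(name)
            if existing is not None:
                return existing
            new_id = len(self._string_to_id)
            # Both dictionaries are updated under the same lock so that
            # readers never observe one direction without the other.
            self._string_to_id[name] = new_id
            self._id_to_string[new_id] = name
            return new_id

    def get(self, id_: int) -> str:
        """Return the string registered for id_ (raises KeyError if unknown)."""
        with self._lock:
            return self._id_to_string[id_]

    def __len__(self) -> int:
        with self._lock:
            return len(self._string_to_id)
```

Holding the lock across the lookup and the insertion is what prevents two threads from assigning different IDs to the same string.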
xwjiang2010
f5995dccdf
[tune] Trainables will now know TUNE_ORIG_WORKING_DIR (#22803)
Also updated the docs.
2022-03-08 15:56:30 +00:00
Artur Niederfahrenhorst
37d129a965
[RLlib] ReplayBuffer API: Test cases. (#22390) 2022-03-08 16:54:12 +01:00
Max Pumperla
d6bff736f3
[docs] test ray.io snippets (#22822)
Tests all snippets we have on ray.io. There were some minor issues, which I'll fix upstream.

Signed-off-by: Max Pumperla <max.pumperla@googlemail.com>
2022-03-08 15:50:57 +00:00
SangBin Cho
0137fc8e23
[Tests] Add microbenchmark to the new infra test (#22861)
Verified it works. It also addresses the frequency comments from the previous PR
2022-03-08 05:58:49 -08:00
Artur Niederfahrenhorst
c0ade5f0b7
[RLlib] Issue 22625: MultiAgentBatch.timeslices() does not behave as expected. (#22657) 2022-03-08 14:25:48 +01:00
Tao Wang
4576f53fe3
[HOTFIX]fix some compilation failures in core worker test (#22855)
There are some compilation failures in the core worker test when building the project with `bazel build //:all`. The test seems to be broken and not integrated into CI.
2022-03-08 16:14:14 +08:00
Qing Wang
9aa0b4e89e
[Java] Add transient for cached hashcode of IDs to reduce serialized size. (#22766)
Use the `transient` keyword to reduce the serialized size of IDs in transit.
2022-03-08 14:36:08 +08:00
Jiajun Yao
7f57268bd0
Fix duplicate test bazel target (#22892) 2022-03-08 14:29:13 +09:00
Jiajun Yao
4801e57c77
[Test] Add missing tests to bazel BUILD (#22827) 2022-03-07 19:54:49 -08:00
Jian Xiao
c2908de401
For a dataset comprised of both empty and non-empty blocks, let the non-empty blocks determine the schema (#22834)
There is a bug in combining the results from map_batches: if we create two datasets from the same data but with different numbers of partitions, we may get different results when running the same map_batches() on them. That is, the number of partitions affects the map_batches() results, which it should not.
2022-03-07 18:17:49 -08:00
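
The idea in #22834 (let the non-empty blocks determine the schema) can be sketched as follows. This is an illustration only: blocks are modelled as plain lists of row dicts rather than the block types Ray Datasets actually uses, and `infer_schema` is a hypothetical helper.

```python
from typing import Dict, List, Optional

Block = List[Dict[str, object]]


def infer_schema(blocks: List[Block]) -> Optional[Dict[str, type]]:
    """Return column name -> type from the first non-empty block, or None."""
    for block in blocks:
        if block:  # skip empty blocks; they carry no column information
            first_row = block[0]
            return {column: type(value) for column, value in first_row.items()}
    return None


# The same rows split into different partitions yield the same schema,
# regardless of where the empty blocks fall.
rows = [{"id": 1, "score": 0.5}, {"id": 2, "score": 0.7}]
print(infer_schema([[], rows]))
print(infer_schema([rows[:1], [], rows[1:]]))
```

With this rule, the position or number of empty blocks no longer influences the result, which is the property the commit message asks for.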