11005 commits

Author SHA1 Message Date
Kai Fricke
976ece4bc4
[tune] Add test for heterogeneous resource request deadlocks (#21397)
This adds a test for potential resource deadlocks in experiments with heterogeneous PGFs. If the PGF of a later trial becomes ready before that of a previous trial, we could run into a deadlock. This is currently avoided, but untested, flagging the code path for removal in #21387.
2022-01-06 10:44:30 +00:00
Qing Wang
132e2b2a96
[Core] Remove unused flag put_small_object_in_memory_store (#21284)
Since the `put_small_object_in_memory_store` flag has not been used for a long time, it should be removed.
2022-01-06 14:46:58 +08:00
Archit Kulkarni
fd02065ce5
[CI] [docker] Fix docker image name regex matching (#21409) 2022-01-05 18:59:10 -08:00
Qing Wang
3c68370fcf
[Core] Cache job_configs instead of ray_namespace. (#21279)
We need more than just the ray_namespace config of a job. This PR caches the job_configs instead of the ray_namespaces so that other PRs can use them (for example, #21249 needs the num_java_worker_pre_process item).

Also, before this PR the ray_namespaces_ cache was never cleared; this PR clears the cache.
2022-01-05 17:48:06 -08:00
xwjiang2010
9528ac62cd
[tune] remove unused return_or_clean_cached_pg. (#21403)
Unused code path.
2022-01-05 23:20:43 +00:00
Clark Zinzow
da4cc26449
[CI] Disable Java log rotation test. (#21394) 2022-01-05 14:51:27 -08:00
Gagandeep Singh
62c9fc95ea
[CI] [Serve] Unskipped test and bumped wait time to avoid race condition in test_deploy.py (#21382) 2022-01-05 14:28:42 -08:00
Ian Rodney
1b42a49e71
[CI] [Docker Build] Allow Branches with Double digits in regex matching(#21401) 2022-01-05 14:19:19 -08:00
Simon Mo
f16b422062
[CI] Migrate Windows Wheels to Buildkite (#21388) 2022-01-05 12:49:19 -08:00
Jiajun Yao
76b91efd9b
Fix wrong many_nodes_actor_test app config (#21404)
RAY_GCS_ACTOR_SCHEDULING_ENABLED is wrong; it should be RAY_gcs_actor_scheduling_enabled. Since GCS-based actor scheduling is not enabled yet, I just removed this flag.
2022-01-05 11:52:13 -08:00
Yi Cheng
72c9fef5f3
[nightly] Enable GCS HA nightly test with bootstrap (#21389)
After https://github.com/ray-project/ray/pull/21232 we are able to start Ray without Redis. We need to bake the test for a while before turning on the flag by default.
This PR adds tests for this.
2022-01-05 10:53:07 -08:00
mwtian
24da654d90
[Test] Shard "Small & Large" tests (#21351) 2022-01-05 10:49:14 -08:00
Sven Mika
853d10871c
[RLlib] Issue 18499: PGTrainer with training_iteration fn does not support multi-GPU. (#21376) 2022-01-05 18:22:33 +01:00
Lixin Wei
64a2ba47d3
[Core] Rename PublisherService to SubscriberService (#20666)
`PublisherClient` is a more reasonable name than `SubscriberClient` since XClient means ‘client used to access X’, like GcsClient.

Besides, the current codebase already calls this client `publisher_client` (lines 329 and 333) while the actual class name is `SubscriberClient`, which is inconsistent.
a8d7897a56/src/ray/pubsub/subscriber.cc (L326-L339)
2022-01-05 05:40:45 -08:00
Sven Mika
9e6b871739
[RLlib] Better utils for flattening complex inputs and enable prev-actions for LSTM/attention for complex action spaces. (#21330) 2022-01-05 11:29:44 +01:00
SangBin Cho
94af7ccc92
[Actor exception message improvement] Unify the schema + improve error messages. (#21219)
This PR addresses this comment: https://github.com/ray-project/ray/pull/20903#discussion_r772635662

The PR:
- Unifies the multiple actor-died errors into a single schema (the runtime env and creation task exceptions cannot be unified).
- Improves each actor error message to include more metadata.
- Includes actor information in the actor death cause.
2022-01-04 23:22:57 -08:00
mwtian
70db5c5592
[GCS][Bootstrap n/n] Do not start Redis in GCS bootstrapping mode (#21232)
After this change, in GCS bootstrapping mode, Redis no longer starts and `address` is treated as the GCS address of the Ray cluster.

Co-authored-by: Yi Cheng <chengyidna@gmail.com>
Co-authored-by: Yi Cheng <74173148+iycheng@users.noreply.github.com>
2022-01-04 23:06:44 -08:00
Philip Pilgerstorfer
8884cf0f4f
[Java] Bump log4j 2.17.0 to 2.17.1 (#21373)
New log4j version fixes vulnerability:
* https://nvd.nist.gov/vuln/detail/CVE-2021-44832
2022-01-05 09:58:48 +08:00
Qing Wang
240e6efe21
[Java] Try to fix flaky NamespaceTest (#21370) 2022-01-05 09:01:34 +08:00
Gagandeep Singh
819e034023
Unskipped test_reconfigure_with_exception & test_deploy_handle_validation (#21374)
These two tests pass without issues on my Windows machine. The rest time out or fail.
2022-01-04 12:58:11 -08:00
Antoni Baum
3632494ce0
[train] Fix start_training in logging callbacks (#21357)
Fixes outdated `start_training` definitions and calls in Train logging callbacks & abstract classes.
2022-01-04 12:46:39 -08:00
xwjiang2010
fc22200af8
[tune] deflake pbt. (#21366)
We use `trial.checkpoint` to restore a perturbed trial. Currently, `trial.checkpoint` looks at both in-memory and persistent checkpoints to find the most recent one, where "the most recent one" is defined by iteration. This assumption may no longer hold in the PBT case, since `trial_low_quantile` may have an iter=2_persistent_checkpoint as well as an iter=1_in_memory_checkpoint (perturbed from `trial_upper_quantile`).
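
The selection problem described above can be illustrated with a toy sketch. Everything here is hypothetical (the `Checkpoint` class, its fields, and the `save_order` counter are not Ray Tune's API); it only shows why "highest iteration wins" can pick a stale checkpoint after a PBT perturbation, and it does not claim to reproduce how the PR fixes it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Checkpoint:
    """Hypothetical stand-in for a trial checkpoint (not Ray Tune's class)."""

    storage: str     # "memory" or "persistent"
    iteration: int   # training iteration recorded in the checkpoint
    save_order: int  # hypothetical counter: larger means saved later


# The low-quantile trial saved a persistent checkpoint at iteration 2 ...
persistent = Checkpoint(storage="persistent", iteration=2, save_order=0)
# ... and was then perturbed from an upper-quantile trial's iteration-1 state.
in_memory = Checkpoint(storage="memory", iteration=1, save_order=1)

candidates = [persistent, in_memory]

# Picking by iteration returns the stale persistent checkpoint.
by_iteration = max(candidates, key=lambda c: c.iteration)

# Picking by save order returns the checkpoint written by the perturbation.
by_save_order = max(candidates, key=lambda c: c.save_order)

print(by_iteration.storage)   # persistent
print(by_save_order.storage)  # memory
```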
2022-01-04 20:33:17 +00:00
shrekris-anyscale
e45383793f
[Serve] Clean up router.py (#21344) 2022-01-04 09:46:33 -08:00
Sven Mika
c01245763e
[RLlib] Revert "Revert "updated pettingzoo wrappers, env versions, urls"" (#21339) 2022-01-04 18:30:26 +01:00
Kai Fricke
94242e3e6e
[ci/repro] Add SYS_PTRACE to docker container, use unique name (#21377)
This will start repro docker containers with SYS_PTRACE capabilities to enable debugging e.g. via py-spy.
Additionally, default instance name tags for instance re-use will be generated using the buildkite build id and job id.
2022-01-04 16:59:12 +00:00
Jiajun Yao
5aa00ba5eb
[doc] Fix typos in serve documentation (#21379) 2022-01-04 10:56:07 -06:00
Kai Fricke
aa35045b6f
[ci/release] Update to recent anyscale API changes (#21149)
Recent changes in the anyscale API rendered the current e2e script incompatible. This PR adapts the script to these subtle API changes.
2022-01-04 11:21:47 +00:00
Sven Mika
abd3bef63b
[RLlib] QMIX better defaults + added to CI learning tests (#21332) 2022-01-04 08:54:41 +01:00
mwtian
8cc268096c
[GCS][Bootstrap 3/n] Refactor to support GCS bootstrap (#21295)
This PR refactors several components to support switching to GCS address bootstrapping later:
- Treat address from `ray.init()` and `ray` CLI as bootstrap address instead of assuming it is Redis address.
- Ray client servers support `--address` flag instead of `--redis-address`.
- A few other miscellaneous cleanups.

Also, add a test for starting non-head node with `ray start`.
2022-01-03 23:52:12 -08:00
Jiao
6e77b3945d
[Serve] [nit] Remove unreachable line in ActorReplicaWrapper(#21361) 2022-01-03 17:08:58 -08:00
Simon Mo
e60a5f52eb
[Serve] Fix iterator-and-mutate bug in FastAPI view (#21362) 2022-01-03 17:02:31 -08:00
Tao Wang
b9106483af
[Core]Clear the unnecessary fields before broadcasting (#20965)
Only `resource_available` and `resource_total` are used in raylet, so let's clear the rest before broadcasting.
2022-01-03 15:56:41 -08:00
Balaji Veeramani
7efe1bef11
[Train] Add PrintCallback (#21261)
Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com>
2022-01-03 14:03:04 -08:00
Chen Shen
704404d408
[BigDataTraining] Fix test script introduced by API change (#21347)
* fix

* fix test failure

* Update release/nightly_tests/dataset/ray_sgd_training.py

Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
2022-01-03 12:14:36 -08:00
Archit Kulkarni
4581baa7dc
Revert "WINDOWS: unskip passing runtime_env tests (#21252)" (#21352)
This reverts commit fcb952e1bc.
2022-01-03 11:07:17 -08:00
Balaji Veeramani
43a9e95dc0
[CI] Add support for Black formatting (#21281) 2022-01-03 10:06:41 -08:00
Balaji Veeramani
4e8f90aca2
[Train] Replace abc.ABCMeta with abc.ABC in callbacks (#21262)
Inheriting from `abc.ABC` is more readable than setting the meta class to `abc.ABCMeta`.

Relevant snippet from the Python 3.4 release notes:
> New class ABC has ABCMeta as its meta class. Using ABC as a base class has essentially the same effect as specifying metaclass=abc.ABCMeta, but is simpler to type and easier to read. (Contributed by Bruno Dupuis in bpo-16049.)
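
A minimal sketch of the two equivalent spellings, using only the standard library. The class names are made up for illustration and are not the actual Train callback classes.

```python
import abc


# Old spelling: set the metaclass explicitly.
class CallbackWithMeta(metaclass=abc.ABCMeta):
    @abc.abstractmethod
    def handle_result(self, result):
        ...


# New spelling: inherit from abc.ABC, whose metaclass is already ABCMeta.
class CallbackWithBase(abc.ABC):
    @abc.abstractmethod
    def handle_result(self, result):
        ...


# A concrete subclass must implement every abstract method.
class PrintingCallback(CallbackWithBase):
    def handle_result(self, result):
        return f"result: {result}"


# Both base classes end up with the same metaclass.
print(type(CallbackWithMeta) is type(CallbackWithBase))  # True
print(PrintingCallback().handle_result(1))               # result: 1
```

In both spellings, instantiating the abstract base directly raises `TypeError`.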

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
Co-authored-by: Matthew Deng <matthew.j.deng@gmail.com>
2022-01-03 09:25:44 -08:00
Balaji Veeramani
fa4e41c5b2
[Train] Monkeypatch environment variables in test_json (#21260)
If we use `os.environ` to set environment variables in tests, then our tests become coupled. By using `monkeypatch`, we can safely set environment variables while ensuring our tests remain decoupled. 

For more information, see the [monkeypatching documentation](https://docs.pytest.org/en/6.2.x/monkeypatch.html#monkeypatching-environment-variables).
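
As a rough illustration of the pattern, here is a sketch of two pytest tests that use the built-in `monkeypatch` fixture. The helper `results_dir` and the variable name `EXAMPLE_RESULTS_DIR` are hypothetical and do not come from `test_json`.

```python
import os


def results_dir():
    """Hypothetical helper that reads its configuration from the environment."""
    return os.environ.get("EXAMPLE_RESULTS_DIR", "/tmp/results")


def test_results_dir_override(monkeypatch):
    # setenv is undone automatically when the test finishes,
    # so the value cannot leak into other tests.
    monkeypatch.setenv("EXAMPLE_RESULTS_DIR", "/data/run1")
    assert results_dir() == "/data/run1"


def test_results_dir_default(monkeypatch):
    # raising=False makes delenv a no-op if the variable is not set.
    monkeypatch.delenv("EXAMPLE_RESULTS_DIR", raising=False)
    assert results_dir() == "/tmp/results"
```

Run under pytest, each test receives its own `monkeypatch` fixture, so the two tests pass in either order. Writing to `os.environ` directly in the first test would instead leave the variable set for every test that runs afterwards.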
2022-01-03 09:12:44 -08:00
Antoni Baum
7ce22b72ed
[datasets] Expand to_torch's functionality (#21117)
Expands the `to_torch` method for Datasets with:
* An ability to choose to output a list/dict of feature tensors instead of just one (through setting `feature_columns` to be a list of lists or a dict of lists)
* An ability to choose whether the label should be unsqueezed or not
* An ability to pass `None` as the label (for prediction).

Furthermore, this changes how the `feature_column_dtypes` argument works. Previously, it took a list of dtypes for each feature. However, as the tensor was concatenated in the end, only one dtype mattered (the biggest one). Now, this argument expects a single dtype which will be applied to the features tensor (or a list/dict if `feature_columns` is a list of list/dict of lists).

Unit tests for all cases are included.

Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
2022-01-03 09:03:50 -08:00
xwjiang2010
c18caa4db3
[tune] remove TrialExecutor.resume_trial. (#21225)
This removes unused code.
2022-01-03 16:38:40 +00:00
Antoni Baum
6a2dedb41d
[tune] Fix dtype coercion in tune.choice (#21270)
When a list with mixed types is passed to tune.choice, its elements are coerced to a single dtype during sampling (because `numpy.random.choice` converts the list to an array internally). This behaviour is unintentional and surprising. This PR fixes this issue.
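
The coercion can be reproduced with NumPy alone. The sketch below also shows one way to avoid it, by sampling an index instead of an element; this is an illustration of the general technique and is not necessarily how the PR implements the fix.

```python
import numpy as np

options = [1, 2.5, "three"]

# Converting a mixed list to an array forces a single dtype; with a string
# in the list, every element becomes a string.
as_array = np.array(options)
print(as_array.dtype.kind)  # U (unicode string)

# np.random.choice performs the same conversion internally, so the sampled
# value is a NumPy string even if the original element was an int or float.
coerced = np.random.choice(options)
print(type(coerced).__name__)

# Sampling an index and looking it up in the original list keeps the
# original Python type of the chosen element.
index = np.random.randint(len(options))
preserved = options[index]
print(type(preserved).__name__)
```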
2022-01-03 16:32:30 +00:00
Kai Fricke
10290eeb2f
[ci] Pin manylinux docker image (#21341) 2022-01-03 14:36:21 +00:00
Kai Fricke
489e6945a6
Revert "[RLlib] Updated pettingzoo wrappers, env versions, urls (#20113)" (#21338)
This reverts commit 327eb84154.
2022-01-03 10:21:25 +00:00
Benjamin Black
327eb84154
[RLlib] Updated pettingzoo wrappers, env versions, urls (#20113) 2022-01-02 21:29:09 +01:00
Ishant Mrinal
ec34185771
[RLlib] RE3 documentation (#21199) 2022-01-02 17:31:53 +01:00
Carlo Grisetti
ff768ea9d4
[RLlib] Change deprecated rllib/utils/tf_ops.py import (#20978) 2022-01-02 17:29:37 +01:00
Balaji Veeramani
c263008c07
[RLlib] Move __grouping_doc_end__ (#21321)
These changes are needed for two reasons.

**`__grouping_doc_end__` is in the wrong place**
If you look at the part of the Ray documentation where the tag is referenced, you'll read
> You can use the MultiAgentEnv.with_agent_groups() method to define these groups:

However, if you look at the code snippet below, you'll see the implementation of `to_base_env` in addition to the implementation of `with_agent_groups`.

To remove `to_base_env` from the code snippet, we need to move `__grouping_doc_end__`.

**Black cannot format `multi_agent_env.py`**
For some reason, Black errors while formatting `multi_agent_env.py`. However, if we move `__grouping_doc_end__` up, the issue is resolved.
2022-01-01 20:11:06 -08:00
Balaji Veeramani
fae5b9b1af
[Core] Disable formatting in test_add_min_workers_nodes (#21322)
Black errors while formatting `test_resource_demand_scheduler.py`. The issue is caused by the [assertions](https://github.com/ray-project/ray/blob/master/python/ray/tests/test_resource_demand_scheduler.py#L383-L428) at the end of `test_add_min_workers_nodes`.

To prevent `format.sh` from erroring once we switch to Black, I've disabled formatting around the assertions.
2022-01-01 18:16:33 -08:00
Balaji Veeramani
416bce6378
Ignore E731 in worker_set.py and sampler.py (#21320) 2022-01-01 18:05:14 -08:00
Qing Wang
340fbf53c0
[Java] Support actor handle reference counting. (#21249) 2022-01-01 10:26:22 +08:00