hiro/ray - Forgejo: Beyond coding. We Forge.

hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-06 18:41:40 -05:00

Author	SHA1	Message	Date
Amog Kamsetty	8b4cb45088	[Docs] Update Ray Lightning API (#21428 ) Update ray lightning api docs to reflect new changes in ray lightning master. Making this quick change to fix CI and unblock the release, but will follow up on a proper fix for this. Closes #21426	2022-01-06 12:14:33 -08:00
Avnish Narayan	f7a5fc36eb	[rllib] Give rnnsac_stateless cartpole gpu, increase timeout (#21407 ) Increase test_preprocessors runtimes.	2022-01-06 11:54:19 -08:00
Archit Kulkarni	c7b2d549e3	[runtime env] Fix "conda" field for M1 macs (#21229 ) Currently when the "conda" field of runtime_env is specified, we automatically insert the currently running Ray wheel in the conda dependencies (in the nested `pip` list). This Ray wheel is specified by a URL to Amazon S3, which is where we store our Ray wheels. Unfortunately, currently the M1 wheels are built manually and are uploaded directly to PyPI, and this only happens once for each stable release (in contrast to non-M1 wheels which are auto-built and uploaded to S3 for every commit on master and release branches.). So prior to this PR, if you tried to use the `"conda"` field on M1, it would fail with a message saying it couldn't find the appropriate wheel for the platform. To fix this, in the case of our Ray cluster running on M1 Mac the only thing we can do for now is to insert `"ray=={ray.__version__}` as our `pip` specifier, instead of the (nonexistent) S3 URL. The downside of this approach is (1) nightly wheels and wheels built from commits on master remain unsupported for M1, and (2) we cannot end-to-end test this codepath on a new stable version of Ray before that version is actually released to PyPI. However, this PR adds a unit test.	2022-01-06 09:48:59 -06:00
Jiajun Yao	48a5208645	Refactor ObjectManager wait logic to WaitManager (#21369 ) - This PR moves the `ObjectManager::Wait` related logic to a separate WaitManager class. - Fix the wait hang issue by not relying on the async object location notification, but checking if wait is complete when the local object is added, at that time the object is guaranteed to be local.	2022-01-06 10:42:31 -05:00
Kai Fricke	976ece4bc4	[tune] Add test for heterogeneous resource request deadlocks (#21397 ) This adds a test for potential resource deadlocks in experiments with heterogeneous PGFs. If the PGF of a later trial becomes ready before that of a previous trial, we could run into a deadlock. This is currently avoided, but untested, flagging the code path for removal in #21387.	2022-01-06 10:44:30 +00:00
Qing Wang	132e2b2a96	[Core] Remove unused flag put_small_object_in_memory_store (#21284 ) Since we have not been using `put_small_object_in_memory_store` flag for a long time, it's should be removed.	2022-01-06 14:46:58 +08:00
Archit Kulkarni	fd02065ce5	[CI] [docker] Fix docker image name regex matching (#21409 )	2022-01-05 18:59:10 -08:00
Qing Wang	3c68370fcf	[Core] Cache job_configs instead of ray_namespace. (#21279 ) We need to get not only ray_namespace config of a job. In this PR, we cache the job_configs instead of ray_namespaces, so that we can use it for other PR(For example, this PR #21249 needs the num_java_worker_pre_process item). Also, before this PR, ray_namespaces_ cache will not be cleared, and we clear the cache in this PR.	2022-01-05 17:48:06 -08:00
xwjiang2010	9528ac62cd	[tune] remove unused return_or_clean_cached_pg. (#21403 ) Unused code path.	2022-01-05 23:20:43 +00:00
Clark Zinzow	da4cc26449	[CI] Disable Java log rotation test. (#21394 )	2022-01-05 14:51:27 -08:00
Gagandeep Singh	62c9fc95ea	[CI] [Serve] Unskipped test and bumped wait time to avoid race condition in test_deploy.py (#21382 )	2022-01-05 14:28:42 -08:00
Ian Rodney	1b42a49e71	[CI] [Docker Build] Allow Branches with Double digits in regex matching(#21401 )	2022-01-05 14:19:19 -08:00
Simon Mo	f16b422062	[CI] Migrate Windows Wheels to Buildkite (#21388 )	2022-01-05 12:49:19 -08:00
Jiajun Yao	76b91efd9b	Fix wrong many_nodes_actor_test app config (#21404 ) RAY_GCS_ACTOR_SCHEDULING_ENABLED is wrong should be RAY_gcs_actor_scheduling_enabled. Since gcs based actor scheduling is not enabled yet so I just removed this flag.	2022-01-05 11:52:13 -08:00
Yi Cheng	72c9fef5f3	[nightly] Enable GCS HA nightly test with bootstrap (#21389 ) After https://github.com/ray-project/ray/pull/21232 we are able to start ray without redis. We need to bake the test for a while before turning on the flag by default. This PR add tests for this.	2022-01-05 10:53:07 -08:00
mwtian	24da654d90	[Test] Shard "Small & Large" tests (#21351 )	2022-01-05 10:49:14 -08:00
Sven Mika	853d10871c	[RLlib] Issue 18499: PGTrainer with training_iteration fn does not support multi-GPU. (#21376 )	2022-01-05 18:22:33 +01:00
Lixin Wei	64a2ba47d3	[Core] Rename PublisherService to SubscriberService (#20666 ) `PublisherClient` is a more reasonable name than `SubscriberClient` since XClient means ‘client used to access X’, like GcsClient. Besides, in the current codebase we already called this client `publisher_client`(line 329/333), while the actual class name is `SubscriberClient`, this is inconsistent. `a8d7897a56/src/ray/pubsub/subscriber.cc (L326-L339)`	2022-01-05 05:40:45 -08:00
Sven Mika	9e6b871739	[RLlib] Better utils for flattening complex inputs and enable prev-actions for LSTM/attention for complex action spaces. (#21330 )	2022-01-05 11:29:44 +01:00
SangBin Cho	94af7ccc92	[Actor exception message improvement] Unify the schema + improve error messages. (#21219 ) This PR is added to handle this comment; https://github.com/ray-project/ray/pull/20903#discussion_r772635662 The PR - Unifies the multiple actor died error to a single schema. (cannot unify runtime env or creation task exception) - Improve each of actor error message to include more metadata. - Include actor information to actor death cause.	2022-01-04 23:22:57 -08:00
mwtian	70db5c5592	[GCS][Bootstrap n/n] Do not start Redis in GCS bootstrapping mode (#21232 ) After this change in GCS bootstrapping mode, Redis no longer starts and `address` is treated as the GCS address of the Ray cluster. Co-authored-by: Yi Cheng <chengyidna@gmail.com> Co-authored-by: Yi Cheng <74173148+iycheng@users.noreply.github.com>	2022-01-04 23:06:44 -08:00
Philip Pilgerstorfer	8884cf0f4f	[Java] Bump log4j 2.17.0 to 2.17.1 (#21373 ) New log4j version fixes vulnerability: * https://nvd.nist.gov/vuln/detail/CVE-2021-44832	2022-01-05 09:58:48 +08:00
Qing Wang	240e6efe21	[Java] Try to fix flaky NamespaceTest (#21370 )	2022-01-05 09:01:34 +08:00
Gagandeep Singh	819e034023	Unskipped `test_reconfigure_with_exception` & `test_deploy_handle_validation` (#21374 ) These two tests pass without issues on my Windows machine. Rest time out or fail.	2022-01-04 12:58:11 -08:00
Antoni Baum	3632494ce0	[train] Fix `start_training` in logging callbacks (#21357 ) Fixes outdated `start_training` definitions and calls in Train logging callbacks & abstract classes.	2022-01-04 12:46:39 -08:00
xwjiang2010	fc22200af8	[tune] deflake pbt. (#21366 ) We use `trial.checkpoint` to restore a perturbed trial. Currently trial.checkpoint is looking at both in-memory and persistent checkpoints to find the most recent one. The definition of "the most recent one" is based on iteration. This may no longer be a valid assumption in PBT case, considering `trial_low_quantile` may have an iter=2_persistent_checkpoint as well as a iter=1_in_memory_checkpoint (perturbed from `trial_upper_quantile`).	2022-01-04 20:33:17 +00:00
shrekris-anyscale	e45383793f	[Serve] Clean up router.py (#21344 )	2022-01-04 09:46:33 -08:00
Sven Mika	c01245763e	[RLlib] Revert "Revert "updated pettingzoo wrappers, env versions, urls"" (#21339 )	2022-01-04 18:30:26 +01:00
Kai Fricke	94242e3e6e	[ci/repro] Add SYS_PTRACE to docker container, use unique name (#21377 ) This will start repro docker containers with SYS_PTRACE capabilities to enable debugging e.g. via py-spy. Additionally, default instance name tags for instance re-use will be generated using the buildkite build id and job id.	2022-01-04 16:59:12 +00:00
Jiajun Yao	5aa00ba5eb	[doc] Fix typos in serve documentation (#21379 )	2022-01-04 10:56:07 -06:00
Kai Fricke	aa35045b6f	[ci/release] Update to recent anyscale API changes (#21149 ) Recent changes in the anyscale API rendered the current e2e script incompatible. This PR resolves these subtle API changes.	2022-01-04 11:21:47 +00:00
Sven Mika	abd3bef63b	[RLlib] QMIX better defaults + added to CI learning tests (#21332 )	2022-01-04 08:54:41 +01:00
mwtian	8cc268096c	[GCS][Bootstrap 3/n] Refactor to support GCS bootstrap (#21295 ) This PR refactors several components to support switching to GCS address bootstrapping later: - Treat address from `ray.init()` and `ray` CLI as bootstrap address instead of assuming it is Redis address. - Ray client servers support `--address` flag instead of `--redis-address`. - A few other miscellaneous cleanup. Also, add a test for starting non-head node with `ray start`.	2022-01-03 23:52:12 -08:00
Jiao	6e77b3945d	[Serve] [nit] Remove unreachable line in `ActorReplicaWrapper`(#21361 )	2022-01-03 17:08:58 -08:00
Simon Mo	e60a5f52eb	[Serve] Fix iterator-and-mutate bug in FastAPI view (#21362 )	2022-01-03 17:02:31 -08:00
Tao Wang	b9106483af	[Core]Clear the unnecessary fields before broadcasting (#20965 ) Only `resource_avaialbe` and `resource_total` are used in raylet, so let's clear the rest before broadcasting.	2022-01-03 15:56:41 -08:00
Balaji Veeramani	7efe1bef11	[Train] Add `PrintCallback` (#21261 ) Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com>	2022-01-03 14:03:04 -08:00
Chen Shen	704404d408	[BigDataTraining] Fix test script introduced by API change (#21347 ) * fix * fix test failure * Update release/nightly_tests/dataset/ray_sgd_training.py Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>	2022-01-03 12:14:36 -08:00
Archit Kulkarni	4581baa7dc	Revert "WINDOWS: unskip passing runtime_env tests (#21252 )" (#21352 ) This reverts commit `fcb952e1bc`.	2022-01-03 11:07:17 -08:00
Balaji Veeramani	43a9e95dc0	[CI] Add support for Black formatting (#21281 )	2022-01-03 10:06:41 -08:00
Balaji Veeramani	4e8f90aca2	[Train] Replace `abc.ABCMeta` with `abc.ABC` in callbacks (#21262 ) Inheriting from `abc.ABC` is more readable than setting the meta class to `abc.ABCMeta`. Relevant snippet from the Python 3.4 release notes: > New class ABC has ABCMeta as its meta class. Using ABC as a base class has essentially the same effect as specifying metaclass=abc.ABCMeta, but is simpler to type and easier to read. (Contributed by Bruno Dupuis in bpo-16049.) Co-authored-by: Richard Liaw <rliaw@berkeley.edu> Co-authored-by: Matthew Deng <matthew.j.deng@gmail.com>	2022-01-03 09:25:44 -08:00
Balaji Veeramani	fa4e41c5b2	[Train] Monkeypatch environment variables in `test_json` (#21260 ) If we use `os.environ` to set environment variables in tests, then our tests become coupled. By using `monkeypatch`, we can safely set environment variables while ensuring our tests remain decoupled. For more information, see the [monkeypatching documentation](https://docs.pytest.org/en/6.2.x/monkeypatch.html#monkeypatching-environment-variables).	2022-01-03 09:12:44 -08:00
Antoni Baum	7ce22b72ed	[datasets] Expand `to_torch`'s functionality (#21117 ) Expands the `to_torch` method for Datasets with: * An ability to choose to output a list/dict of feature tensors instead of just one (through setting `feature_columns` to be a list of lists or a dict of lists) * An ability to choose whether the label should be unsqueezed or not * An ability to pass `None` as the label (for prediction). Furthermore, this changes how the `feature_column_dtypes` argument works. Previously, it took a list of dtypes for each feature. However, as the tensor was concatenated in the end, only one dtype mattered (the biggest one). Now, this argument expects a single dtype which will be applied to the features tensor (or a list/dict if `feature_columns` is a list of list/dict of lists). Unit tests for all cases are included. Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>	2022-01-03 09:03:50 -08:00
xwjiang2010	c18caa4db3	[tune] remove TrialExecutor.resume_trial. (#21225 ) This removes unused code.	2022-01-03 16:38:40 +00:00
Antoni Baum	6a2dedb41d	[tune] Fix dtype coercion in tune.choice (#21270 ) When a list with mixed types is passed to tune.choice, they will be coerced to a single dtype during sampling (due to numpy.choice converting to an array internally). This behaviour is unintentional and surprising. This PR fixes this issue.	2022-01-03 16:32:30 +00:00
Kai Fricke	10290eeb2f	[ci] Pin manylinux docker image (#21341 )	2022-01-03 14:36:21 +00:00
Kai Fricke	489e6945a6	Revert "[RLlib] Updated pettingzoo wrappers, env versions, urls (#20113 )" (#21338 ) This reverts commit `327eb84154`.	2022-01-03 10:21:25 +00:00
Benjamin Black	327eb84154	[RLlib] Updated pettingzoo wrappers, env versions, urls (#20113 )	2022-01-02 21:29:09 +01:00
Ishant Mrinal	ec34185771	[RLlib] RE3 documentation (#21199 )	2022-01-02 17:31:53 +01:00
Carlo Grisetti	ff768ea9d4	[RLlib] Change deprecated `rllib/utils/tf_ops.py` import (#20978 )	2022-01-02 17:29:37 +01:00

1 2 3 4 5 ...

10859 commits