Commit graph

5872 commits

Author SHA1 Message Date
Archit Kulkarni
c7b2d549e3
[runtime env] Fix "conda" field for M1 macs (#21229)
Currently when the "conda" field of runtime_env is specified, we automatically insert the currently running Ray wheel in the conda dependencies (in the nested `pip` list).  This Ray wheel is specified by a URL to Amazon S3, which is where we store our Ray wheels.  

Unfortunately, the M1 wheels are currently built manually and uploaded directly to PyPI, and this only happens once per stable release (in contrast to non-M1 wheels, which are auto-built and uploaded to S3 for every commit on master and release branches). So prior to this PR, if you tried to use the `"conda"` field on M1, it would fail with a message saying it couldn't find the appropriate wheel for the platform.

To fix this, when the Ray cluster is running on an M1 Mac, the only thing we can do for now is to insert `"ray=={ray.__version__}"` as our `pip` specifier, instead of the (nonexistent) S3 URL.

The downsides of this approach are that (1) nightly wheels and wheels built from commits on master remain unsupported for M1, and (2) we cannot end-to-end test this codepath on a new stable version of Ray before that version is actually released to PyPI. However, this PR adds a unit test.
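
A minimal sketch of the fallback described above. The helper name `ray_pip_requirement`, its parameters, the platform check, and the placeholder wheel URL are all assumptions made for illustration; none of them are taken from the Ray codebase.

```python
import platform
import sys


def ray_pip_requirement(ray_version, wheel_url, system=None, machine=None):
    """Return the pip requirement to put in the conda env's nested `pip` list.

    Hypothetical helper: on an M1 Mac there is no per-commit wheel to link to,
    so fall back to the PyPI release pinned to the running version.
    """
    system = system or sys.platform
    machine = machine or platform.machine()
    if system == "darwin" and machine == "arm64":
        return f"ray=={ray_version}"
    return wheel_url


# Placeholder URL for illustration only.
url = "https://example.com/ray-1.9.2-cp39-cp39-manylinux2014_x86_64.whl"
print(ray_pip_requirement("1.9.2", url, system="darwin", machine="arm64"))
print(ray_pip_requirement("1.9.2", url, system="linux", machine="x86_64"))
```

Passing `system` and `machine` explicitly keeps the branch testable on any machine, which matters here because the M1 path cannot be exercised end to end.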
2022-01-06 09:48:59 -06:00
Kai Fricke
976ece4bc4
[tune] Add test for heterogeneous resource request deadlocks (#21397)
This adds a test for potential resource deadlocks in experiments with heterogeneous PGFs. If the PGF of a later trial becomes ready before that of a previous trial, we could run into a deadlock. This is currently avoided, but untested, flagging the code path for removal in #21387.
2022-01-06 10:44:30 +00:00
Qing Wang
132e2b2a96
[Core] Remove unused flag put_small_object_in_memory_store (#21284)
Since we have not used the `put_small_object_in_memory_store` flag for a long time, it should be removed.
2022-01-06 14:46:58 +08:00
xwjiang2010
9528ac62cd
[tune] remove unused return_or_clean_cached_pg. (#21403)
Unused code path.
2022-01-05 23:20:43 +00:00
Gagandeep Singh
62c9fc95ea
[CI] [Serve] Unskipped test and bumped wait time to avoid race condition in test_deploy.py (#21382) 2022-01-05 14:28:42 -08:00
Simon Mo
f16b422062
[CI] Migrate Windows Wheels to Buildkite (#21388) 2022-01-05 12:49:19 -08:00
mwtian
24da654d90
[Test] Shard "Small & Large" tests (#21351) 2022-01-05 10:49:14 -08:00
mwtian
70db5c5592
[GCS][Bootstrap n/n] Do not start Redis in GCS bootstrapping mode (#21232)
After this change in GCS bootstrapping mode, Redis no longer starts and `address` is treated as the GCS address of the Ray cluster.

Co-authored-by: Yi Cheng <chengyidna@gmail.com>
Co-authored-by: Yi Cheng <74173148+iycheng@users.noreply.github.com>
2022-01-04 23:06:44 -08:00
Gagandeep Singh
819e034023
Unskipped test_reconfigure_with_exception & test_deploy_handle_validation (#21374)
These two tests pass without issues on my Windows machine. The rest time out or fail.
2022-01-04 12:58:11 -08:00
Antoni Baum
3632494ce0
[train] Fix start_training in logging callbacks (#21357)
Fixes outdated `start_training` definitions and calls in Train logging callbacks & abstract classes.
2022-01-04 12:46:39 -08:00
xwjiang2010
fc22200af8
[tune] deflake pbt. (#21366)
We use `trial.checkpoint` to restore a perturbed trial. Currently, `trial.checkpoint` looks at both in-memory and persistent checkpoints to find the most recent one, where "most recent" is defined by iteration. This may no longer be a valid assumption in the PBT case, because `trial_low_quantile` may have an iter=2_persistent_checkpoint as well as an iter=1_in_memory_checkpoint (perturbed from `trial_upper_quantile`).
2022-01-04 20:33:17 +00:00
shrekris-anyscale
e45383793f
[Serve] Clean up router.py (#21344) 2022-01-04 09:46:33 -08:00
Sven Mika
c01245763e
[RLlib] Revert "Revert "updated pettingzoo wrappers, env versions, urls"" (#21339) 2022-01-04 18:30:26 +01:00
mwtian
8cc268096c
[GCS][Bootstrap 3/n] Refactor to support GCS bootstrap (#21295)
This PR refactors several components to support switching to GCS address bootstrapping later:
- Treat address from `ray.init()` and `ray` CLI as bootstrap address instead of assuming it is Redis address.
- Ray client servers support `--address` flag instead of `--redis-address`.
- A few other miscellaneous cleanups.

Also, add a test for starting a non-head node with `ray start`.
2022-01-03 23:52:12 -08:00
Jiao
6e77b3945d
[Serve] [nit] Remove unreachable line in ActorReplicaWrapper(#21361) 2022-01-03 17:08:58 -08:00
Simon Mo
e60a5f52eb
[Serve] Fix iterator-and-mutate bug in FastAPI view (#21362) 2022-01-03 17:02:31 -08:00
Balaji Veeramani
7efe1bef11
[Train] Add PrintCallback (#21261)
Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com>
2022-01-03 14:03:04 -08:00
Archit Kulkarni
4581baa7dc
Revert "WINDOWS: unskip passing runtime_env tests (#21252)" (#21352)
This reverts commit fcb952e1bc.
2022-01-03 11:07:17 -08:00
Balaji Veeramani
43a9e95dc0
[CI] Add support for Black formatting (#21281) 2022-01-03 10:06:41 -08:00
Balaji Veeramani
4e8f90aca2
[Train] Replace abc.ABCMeta with abc.ABC in callbacks (#21262)
Inheriting from `abc.ABC` is more readable than setting the meta class to `abc.ABCMeta`.

Relevant snippet from the Python 3.4 release notes:
> New class ABC has ABCMeta as its meta class. Using ABC as a base class has essentially the same effect as specifying metaclass=abc.ABCMeta, but is simpler to type and easier to read. (Contributed by Bruno Dupuis in bpo-16049.)
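
As a small illustration of the equivalence the release note describes, the sketch below defines the same abstract class both ways. The class and method names are hypothetical and are not the actual Ray Train callback classes.

```python
import abc


class MetaStyleCallback(metaclass=abc.ABCMeta):
    """Older spelling: set the metaclass explicitly."""

    @abc.abstractmethod
    def handle_result(self, results):
        """Process results reported by workers."""


class BaseStyleCallback(abc.ABC):
    """Newer spelling: inherit from abc.ABC."""

    @abc.abstractmethod
    def handle_result(self, results):
        """Process results reported by workers."""


# Both spellings produce a class whose metaclass is ABCMeta ...
assert type(MetaStyleCallback) is type(BaseStyleCallback) is abc.ABCMeta

# ... and both refuse instantiation until the abstract method is overridden.
for cls in (MetaStyleCallback, BaseStyleCallback):
    try:
        cls()
    except TypeError:
        pass
    else:
        raise AssertionError("abstract class was instantiated")
```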

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
Co-authored-by: Matthew Deng <matthew.j.deng@gmail.com>
2022-01-03 09:25:44 -08:00
Balaji Veeramani
fa4e41c5b2
[Train] Monkeypatch environment variables in test_json (#21260)
If we use `os.environ` to set environment variables in tests, then our tests become coupled. By using `monkeypatch`, we can safely set environment variables while ensuring our tests remain decoupled. 

For more information, see the [monkeypatching documentation](https://docs.pytest.org/en/6.2.x/monkeypatch.html#monkeypatching-environment-variables).
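
A minimal sketch of the pattern, assuming pytest's built-in `monkeypatch` fixture. The function `results_dir` and the variable name `EXAMPLE_RESULTS_DIR` are hypothetical and do not come from `test_json`.

```python
import os


def results_dir():
    # Hypothetical function under test: reads its setting from the environment.
    return os.environ.get("EXAMPLE_RESULTS_DIR", "/tmp/results")


def test_results_dir_override(monkeypatch):
    # pytest injects `monkeypatch` and restores the environment after the test,
    # so this value cannot leak into other tests.
    monkeypatch.setenv("EXAMPLE_RESULTS_DIR", "/tmp/custom")
    assert results_dir() == "/tmp/custom"


def test_results_dir_default(monkeypatch):
    # raising=False makes delenv a no-op when the variable is not set.
    monkeypatch.delenv("EXAMPLE_RESULTS_DIR", raising=False)
    assert results_dir() == "/tmp/results"
```

With a direct `os.environ[...] = ...` assignment instead, the second test would pass or fail depending on whether the first one ran before it.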
2022-01-03 09:12:44 -08:00
Antoni Baum
7ce22b72ed
[datasets] Expand to_torch's functionality (#21117)
Expands the `to_torch` method for Datasets with:
* An ability to choose to output a list/dict of feature tensors instead of just one (through setting `feature_columns` to be a list of lists or a dict of lists)
* An ability to choose whether the label should be unsqueezed or not
* An ability to pass `None` as the label (for prediction).

Furthermore, this changes how the `feature_column_dtypes` argument works. Previously, it took a list of dtypes, one for each feature. However, as the tensor was concatenated in the end, only one dtype mattered (the biggest one). Now, this argument expects a single dtype which will be applied to the features tensor (or a list/dict of dtypes if `feature_columns` is a list of lists or a dict of lists).

Unit tests for all cases are included.

Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
2022-01-03 09:03:50 -08:00
xwjiang2010
c18caa4db3
[tune] remove TrialExecutor.resume_trial. (#21225)
This removes unused code.
2022-01-03 16:38:40 +00:00
Antoni Baum
6a2dedb41d
[tune] Fix dtype coercion in tune.choice (#21270)
When a list with mixed types is passed to `tune.choice`, the values are coerced to a single dtype during sampling (because `numpy.random.choice` converts its input to an array internally). This behaviour is unintentional and surprising. This PR fixes the issue.
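
The coercion can be reproduced with NumPy alone. The sketch below shows the symptom and one possible way to avoid it by sampling an index instead of the values; the index-based approach is an assumption for illustration and is not necessarily how this PR fixes `tune.choice`.

```python
import random

import numpy as np

categories = [1, "a", 2.5]

# Converting the mixed list to an array promotes every element to a string.
as_array = np.array(categories)
print(as_array.dtype.kind)  # "U" means a unicode string dtype

# Drawing through NumPy therefore yields a string, even for 1 and 2.5.
drawn = np.random.choice(categories)
print(isinstance(drawn, str))

# Sketch of a workaround: sample an index and look up the original object,
# so the sampled value keeps its original Python type.
picked = categories[random.randrange(len(categories))]
print(type(picked).__name__)
```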
2022-01-03 16:32:30 +00:00
Kai Fricke
489e6945a6
Revert "[RLlib] Updated pettingzoo wrappers, env versions, urls (#20113)" (#21338)
This reverts commit 327eb84154.
2022-01-03 10:21:25 +00:00
Benjamin Black
327eb84154
[RLlib] Updated pettingzoo wrappers, env versions, urls (#20113) 2022-01-02 21:29:09 +01:00
Balaji Veeramani
fae5b9b1af
[Core] Disable formatting in test_add_min_workers_nodes (#21322)
Black errors while formatting `test_resource_demand_scheduler.py`. The issue is caused by the [assertions](https://github.com/ray-project/ray/blob/master/python/ray/tests/test_resource_demand_scheduler.py#L383-L428) at the end of `test_add_min_workers_nodes`. 



To prevent `format.sh` from erroring once we switch to Black, I've disabled formatting around the assertions.
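
For reference, Black skips any region wrapped in `# fmt: off` and `# fmt: on` comments. The commit message does not say which mechanism was used, so the sketch below is an assumption about the general technique, with a hypothetical function rather than the real test.

```python
def expected_launches():
    # fmt: off
    launches = {
        "small":  1,
        "medium": 2,
        "large":  4,
    }
    # fmt: on
    return launches


assert expected_launches() == {"small": 1, "medium": 2, "large": 4}
```

The hand-aligned values inside the markers are left untouched by Black, while the rest of the file is still formatted.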
2022-01-01 18:16:33 -08:00
WanXing Wang
412cd6be76
[Core]Add RAY_REDIS_ADDRESS environment to specify external address. (#20966)
Support the `RAY_REDIS_ADDRESS` environment variable as an option for `ray start`.
2021-12-31 16:12:56 +08:00
mwtian
20ca1d85c2
[GCS][Bootstrap 2/n] Fix tests to enable using GCS address for bootstrapping (#21288)
This PR contains most of the fixes @iycheng made in #21232 to make tests pass with GCS bootstrapping, by supporting both the Redis and the GCS address as the bootstrap address. The main change is to use `address_info["address"]` to obtain the bootstrap address to pass to `ray.init()`, instead of using `address_info["redis_address"]`. In a subsequent PR, `address_info["address"]` will return the Redis or GCS address depending on whether GCS is used to bootstrap.
2021-12-29 19:25:51 -07:00
Jiajun Yao
9776e21842
Revert "Round robin during spread scheduling (#19968)" (#21293)
This reverts commit 60388b2834.
2021-12-30 10:33:06 +09:00
mwtian
5377832383
[GCS][Bootstrap 1/n] Support bootstrapping with GCS in node.py (#21267) 2021-12-28 08:14:38 -07:00
Philipp Moritz
4b9e865fd7
Unskip remaining tests in test_basic.py on Windows (#21273) 2021-12-27 21:20:45 -08:00
Matti Picus
3de18d2ada
WINDOWS: enable passing/skipping tests (#21136) 2021-12-27 11:59:00 -08:00
Israël Hallé
59209d695b
Includes .pyi files in package data. (#21247) 2021-12-27 11:50:02 -08:00
Matti Picus
fcb952e1bc
WINDOWS: unskip passing runtime_env tests (#21252) 2021-12-26 20:49:02 -08:00
Akash Patel
cbcd03b779
Upgrade cython to 0.29.26 for py310 (#21244) 2021-12-26 20:26:08 -08:00
xwjiang2010
0b9cdb1eae
[tune] Have one canonical way of stopping trial. (#21021)
This PR introduces a canonical implementation for stopping trials by collecting scattered logic from `process_trial_result` back into `stop_trial`. This way, we know what is expected (e.g. which callbacks are invoked and when).
This PR also corrects the current, wrong behavior in which the `on_trial_complete` callback is invoked before `on_trial_checkpoint`, which is the source of Syncer cleanup issues.
2021-12-25 10:13:30 +01:00
Gagandeep Singh
c5c5fec22b
Unskip test_standalone from ci.sh (#21235) 2021-12-25 00:21:58 -08:00
Yi Cheng
0d537c5d70
[5/gcs] Bootstrap default worker and update pubsub unit test (#21211)
This PR passes the GCS address to the worker and also updates the pubsub unit test.

Co-authored-by: mwtian <81660174+mwtian@users.noreply.github.com>
Co-authored-by: Mingwei Tian <mwtian@anyscale.com>
2021-12-23 07:57:14 -07:00
Jiajun Yao
60388b2834
Round robin during spread scheduling (#19968) 2021-12-22 20:27:34 -08:00
Yi Cheng
11ab412db1
[4/gcs] Bootstrap global accessor from gcs (#21195)
This is part of Redis removal. This PR enables the global accessor to start from GCS.

Co-authored-by: mwtian <81660174+mwtian@users.noreply.github.com>
Co-authored-by: Mingwei Tian <mwtian@anyscale.com>
2021-12-22 01:27:25 -08:00
Gagandeep Singh
92bf609a08
Unskip tests in `test_basic_3.py` (#20433) 2021-12-22 00:09:32 -08:00
Yi Cheng
0c786b1109
[3/gcs] Bootstrap log monitor and monitor from gcs (#21194)
This is part of Redis removal. This PR enables the log monitor and monitor to bootstrap from GCS.

Co-authored-by: mwtian <81660174+mwtian@users.noreply.github.com>
Co-authored-by: Mingwei Tian <mwtian@anyscale.com>
2021-12-21 23:15:55 -08:00
Sidhartha Parhi
5d6409fe2e
[Train] Remove run_dir param from BackendExecutor (#21231)
The `run_dir` argument in `ray.train.backend.BackendExecutor.start_training` isn't used, but it causes the following error: if your host computer and job cluster use different operating systems, you get a pathlib error because, for example, you can't instantiate a `pathlib.WindowsPath` on a Linux system.
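
The underlying pathlib behavior can be reproduced with the standard library alone. The sketch below is OS-agnostic and uses a made-up path; it does not reproduce the Ray Train code path itself.

```python
import os
import pathlib

# Pick the concrete path class that does NOT belong to the current OS.
foreign_cls = pathlib.PosixPath if os.name == "nt" else pathlib.WindowsPath

try:
    foreign_cls("ray_results/run_001")
    instantiated = True
except NotImplementedError:
    # pathlib refuses to build a concrete path for another operating system.
    instantiated = False

print(instantiated)

# Pure paths do no I/O, so they can be constructed on any OS.
run_dir = pathlib.PureWindowsPath(r"C:\ray_results\run_001")
print(run_dir.name)
```

This is why a concrete path object created on a Windows host cannot simply be recreated on a Linux cluster, whereas a plain string or a pure path can.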

Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com>
2021-12-21 19:54:43 -08:00
Amog Kamsetty
57db4640ca
[Train] [Tune] Refactor MLflow (#20802)
Pulls out Tune's MLflow logging logic to a shared MLflow util.
Adds an MLflow logger callback to Ray Train.

Closes #20642
2021-12-21 17:17:52 -08:00
Yi Cheng
09421a4ca6
[2/gcs] Bootstrap dashboard for gcs ha (#21179)
This is part of the GCS HA project. This PR tries to bootstrap the dashboard with the GCS address instead of Redis.

Co-authored-by: mwtian <81660174+mwtian@users.noreply.github.com>
2021-12-21 16:58:03 -08:00
Eric Liang
1db03862a7
Isolate function exports by job in separate queues (#20882) 2021-12-21 16:19:00 -08:00
Gagandeep Singh
5dc0f90ada
[Windows] Unskipped tests in test_standalone.py (#21213) 2021-12-21 11:37:23 -08:00
Yi Cheng
f62faca04c
[1/gcs] gcs ha bootstrap for raylet (#21174)
This is part of #21129

This PR tries to cover the cpp/ray part of the bootstrap. Some updates there:

- Remove the unused function/tests.
- Some API updates.

Co-authored-by: mwtian <81660174+mwtian@users.noreply.github.com>
2021-12-21 08:50:42 -08:00
SangBin Cho
5d3042ed9d
[Internal Observability] Record Raylet Gauge (#21049)
* Revert "[Please revert] Remove new metrics temporarily"

This reverts commit baf7846daa3d1dad50dbedac19b7afbae3e197fc.

* Addressed code review.

* [Please revert] Revert plasma stats for the next PR

* improve grammar

* Addressed code review v1.

* Addressed code review.

* Add code owner.

* Fix tests.

* Add code owner to metric_defs.cc
2021-12-21 00:34:48 -08:00