Follow-up to #22748, enabling tests in CI.
Conditions: A new RAY_CI_ML_AFFECTED condition is added for this test suite. The package currently depends on Ray Data, so the tests will be triggered accordingly.
Dependencies: Adding DATA_PROCESSING_TESTING dependencies (set for install-dependencies.sh) for now.
There are problems running C++ tests on macOS 10.15 Catalina when upgrading to the newest grpc, due to dynamic linking: #22384 (comment). The problem does not exist for Python tests on Catalina, or for C++ tests on other systems.
Upgrading the macOS CI from Catalina is also blocked in the short term: ray-project/buildkite-ci-stack#24 (comment)
So we work around the issue by using static linking for C++ tests on Mac.
This updates the GPU image to run on the same Ubuntu version as the regular (non-GPU) image. This implicitly updates cmake etc for compatibility with newer versions of downstream libraries, e.g. Horovod.
Adds a unit-tested and restructured ray_release package for running release tests.
Relevant changes in behavior:
By default, Buildkite will wait for the wheels of the current commit to be available. Alternatively, users can a) specify a different commit hash, b) specify a wheels URL (which we will also wait for to become available), or c) specify a branch (or user/branch combination), in which case the latest available wheels will be used (e.g. if master is passed, the behavior matches the old default behavior).
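To make the precedence concrete, here is a hedged sketch of that resolution order. The URL pattern and all function names (`resolve_wheels`, `wait_for_url`, `WHEEL_URL_PATTERN`) are placeholders for illustration, not the actual ray_release API.

```python
import time
import urllib.request
from typing import Optional

# Placeholder URL pattern; the real wheel locations differ.
WHEEL_URL_PATTERN = "https://example.com/ray-wheels/{ref}/ray-latest.whl"


def wait_for_url(url: str, timeout_s: float = 7200.0, poll_s: float = 30.0) -> str:
    """Poll until the wheel URL is downloadable, or raise after the timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            urllib.request.urlopen(url).close()
            return url
        except Exception:
            time.sleep(poll_s)
    raise TimeoutError(f"Wheels not available after {timeout_s}s: {url}")


def resolve_wheels(commit: Optional[str] = None,
                   wheels_url: Optional[str] = None,
                   branch: Optional[str] = None,
                   current_commit: str = "<current buildkite commit>") -> str:
    if wheels_url:
        # b) explicit wheels URL: also wait until it is available.
        return wait_for_url(wheels_url)
    if branch:
        # c) branch (or user/branch): use the latest wheels already built for it.
        return WHEEL_URL_PATTERN.format(ref=f"{branch}/latest")
    # a) explicit commit hash, or the default: wait for the current commit's wheels.
    return wait_for_url(WHEEL_URL_PATTERN.format(ref=commit or current_commit))
```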
The main subpackages are (a simplified interface sketch follows the list):
Cluster manager: Creates cluster envs/computes, starts cluster, terminates cluster
Command runner: Runs commands, e.g. as client command or sdk command
File manager: Uploads/downloads files to/from session
Reporter: Reports results (e.g. to database)
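A rough sketch of how these pieces map to interfaces. The class and method names below are simplified and hypothetical; the actual ray_release classes differ in names and signatures.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict

# Simplified, hypothetical interfaces mirroring the subpackages listed above.
class ClusterManager(ABC):
    @abstractmethod
    def create_cluster_env_and_compute(self, env: Dict, compute: Dict) -> None: ...
    @abstractmethod
    def start_cluster(self) -> None: ...
    @abstractmethod
    def terminate_cluster(self) -> None: ...

class CommandRunner(ABC):
    @abstractmethod
    def run_command(self, command: str, timeout: float) -> int: ...

class FileManager(ABC):
    @abstractmethod
    def upload(self, local_path: str, remote_path: str) -> None: ...
    @abstractmethod
    def download(self, remote_path: str, local_path: str) -> None: ...

class Reporter(ABC):
    @abstractmethod
    def report_result(self, test: str, result: Dict[str, Any]) -> None: ...
```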
Much of the code base is unit tested, but there are probably some pieces missing.
Example build (waited for wheels to be built): https://buildkite.com/ray-project/kf-dev/builds/51#_
Wheel build: https://buildkite.com/ray-project/ray-builders-branch/builds/6023
Adding a minimal test suite to catch any regressions from accidentally adding backend imports (e.g. `torch`, `tensorflow`, `horovod`) to the main import path.
**Example:** If I'm running Ray Train with `tensorflow`, I should not be required to have `torch` installed.
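A minimal regression test along those lines might look like this. It is a hedged sketch, not the exact test added in the PR: each check runs in a fresh interpreter so that previously imported modules cannot mask an accidental backend import.

```python
import subprocess
import sys

import pytest


@pytest.mark.parametrize("backend", ["torch", "tensorflow", "horovod"])
def test_train_import_does_not_pull_in_backends(backend):
    # Run the check in a clean subprocess so sys.modules starts empty.
    code = (
        "import sys\n"
        "import ray.train\n"
        f"assert '{backend}' not in sys.modules, "
        f"'{backend} was unexpectedly imported by ray.train'\n"
    )
    subprocess.run([sys.executable, "-c", code], check=True)
```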
Instead of installing dependencies in each Buildkite job, let's move this into the Dockerfile.
This will update GPU tests to always use Python 3.7.
This is the second part of https://docs.google.com/document/d/12qP3x5uaqZSKS-A_kK0ylPOp0E02_l-deAbmm8YtdFw/edit#. After this PR, dashboard agents will fully work with minimal ray installation.
Note that this PR requires introducing "aioredis", "frozenlist", and "aiosignal" to the minimal installation. These dependencies are very small (or will be removed soon), and including them in the minimal installation makes things much easier. Please see below for the reasoning.
In test_client_reconnect.py, each test case starts a Ray cluster via the client server's default_connect_handler(). The Ray cluster shuts down implicitly when start_middleman_server() ends and Python GCs the client server. After turning on GCS pubsub, the time when the client server is GC'ed changes. Sometimes the Ray cluster from a previous test case stays alive after the next test case starts and shuts down later, leading to test failures due to lost data or crashes (a race during worker shutdown, which will be investigated separately).
This PR makes sure each test case shuts down its Ray cluster.
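For illustration, an explicit per-test teardown could look like the sketch below; the actual change in test_client_reconnect.py may be structured differently.

```python
import pytest
import ray


@pytest.fixture
def shutdown_only():
    yield
    # Tear down the Ray cluster started by this test case before the next
    # test case starts, instead of relying on garbage collection of the
    # client server to do it implicitly.
    ray.shutdown()
```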
This PR fixes and re-enables the following tests in HA mode:
- //python/ray/tests:test_healthcheck
- //python/ray/tests:test_autoscaler_drain_node_api
- //python/ray/tests:test_ray_debugger
External Redis should still be supported with GCS bootstrapping, to avoid breaking users.
In GCS mode, some logic is removed for external Redis:
- Printing external Redis addresses to terminal: hard to implement across `ray start`, `ray.init()` and Ray cluster util.
- Starting local Redis if external Redis is unavailable: failing loudly here seems more appropriate.
Also, re-enable a few tests which restart GCS in GCS bootstrapping mode, by using external Redis for KV storage.
Currently we install OpenSSH on the fly in fake multinode Docker testing. Instead, we can speed testing up a fair bit by building a Docker image which includes OpenSSH first, and then running tests with this image.
(Comment from the PR:)
If a gRPC call exceeds its timeout, the call is cancelled at the client side but the server may still reply to it, leading to missed messages and test failures. Using a sequence number to ensure no message is dropped could be the long-term solution,
but its complexity and the fact that Ray subscribers do not use deadlines in production make it less preferred.
Therefore, a simpler workaround is used instead: a different subscriber is used for each get_error_message() call.
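Conceptually, the workaround looks like the sketch below. The subscriber class and the get_error_message() signature in Ray's test utilities differ; the stub here only illustrates why a fresh subscriber per call sidesteps late replies to a previous, timed-out poll.

```python
from contextlib import contextmanager


class _ErrorSubscriber:
    """Stand-in for a GCS error-info subscriber (hypothetical)."""

    def __init__(self, gcs_address): self.gcs_address = gcs_address
    def subscribe(self): ...
    def poll(self, timeout): ...
    def close(self): ...


@contextmanager
def fresh_error_subscriber(gcs_address):
    # A brand-new subscriber per call: a late server reply to a previous,
    # timed-out poll can no longer consume messages meant for this call.
    sub = _ErrorSubscriber(gcs_address)
    sub.subscribe()
    try:
        yield sub
    finally:
        sub.close()


def get_error_message(gcs_address, num, timeout=20):
    with fresh_error_subscriber(gcs_address) as sub:
        return [sub.poll(timeout=timeout) for _ in range(num)]
```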
Also, re-enable some additional tests in GCS HA mode.
Following #18987, this PR adds a docker-compose-based local multi-node cluster.
The fake multinode Docker setup comprises two parts: the docker_monitor.py script is a watch script that calls `docker compose up` whenever the docker-compose.yaml changes, and the node provider creates and updates the docker-compose.yaml according to the autoscaling requirements.
This mode fully supports autoscaling and comes with test utilities to start and connect to docker-compose autoscaling environments. There's also a sample test case showing how this can be used.
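As a rough sketch of the watch side of this, the loop below re-applies the compose file whenever its contents change; the real docker_monitor.py has more error handling, logging, and shutdown handling.

```python
import hashlib
import subprocess
import time


def monitor(compose_file: str, poll_interval_s: float = 1.0) -> None:
    last_digest = None
    while True:
        try:
            with open(compose_file, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
        except FileNotFoundError:
            digest = None
        if digest is not None and digest != last_digest:
            # The node provider rewrote docker-compose.yaml (e.g. the
            # autoscaler added or removed a node): re-apply it.
            subprocess.run(
                ["docker", "compose", "-f", compose_file, "up", "-d"],
                check=False,
            )
            last_digest = digest
        time.sleep(poll_interval_s)


if __name__ == "__main__":
    monitor("docker-compose.yaml")
```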
CoreWorker hangs before exiting if GCS exits first, due to incorrect destruction ordering. This PR fixes this: it stops the GCS client first and then joins the thread.
After this change in GCS bootstrapping mode, Redis no longer starts and `address` is treated as the GCS address of the Ray cluster.
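For example, a driver connecting to a bootstrapped cluster passes the GCS address directly; the host and port below are placeholders.

```python
import ray

# With GCS bootstrapping, `address` points at the head node's GCS server
# rather than Redis.
ray.init(address="10.0.0.1:6379")
```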
Co-authored-by: Yi Cheng <chengyidna@gmail.com>
Co-authored-by: Yi Cheng <74173148+iycheng@users.noreply.github.com>
This is part of the GCS HA project. This PR tries to bootstrap the dashboard with the GCS address instead of Redis.
Co-authored-by: mwtian <81660174+mwtian@users.noreply.github.com>
This is part of #21129
This PR tries to cover the cpp/ray part of the bootstrap. Updates include:
- removing unused functions/tests
- some API updates
Co-authored-by: mwtian <81660174+mwtian@users.noreply.github.com>
`//python/ray/tests:test_client_reconnect` seems to only flake under the GCS HA build. The client server starts to shut down under injected failures, unlike the behavior without GCS KV or pubsub.
`//python/ray/tests:test_multi_node_3` seems to flake more often under the GCS HA build, although it is still flaky without GCS HA feature flags. It seems raylet termination does not notify other processes properly.
Disable these two tests until they are fixed.