Automatically enable GPU prediction for Predictors if `num_gpus` is set for the `PredictorDeployment`.
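A minimal sketch of the intended usage, assuming the Serve AIR integration exposes `PredictorDeployment` as below; the import paths, the `bind` arguments, and the checkpoint location are illustrative assumptions, not the confirmed API:
```
from ray import serve
from ray.air.checkpoint import Checkpoint
# Import paths are assumptions; the integration module may live elsewhere.
from ray.serve.air_integrations import PredictorDeployment
from ray.train.torch import TorchPredictor

# Stand-in for an AIR checkpoint produced by a training run (hypothetical URI).
model_checkpoint = Checkpoint.from_uri("s3://my-bucket/torch-checkpoint")

# Requesting num_gpus on the deployment now enables GPU prediction automatically;
# there is no need to pass use_gpu to the predictor by hand.
serve.run(
    PredictorDeployment.options(
        name="torch_predictor",
        ray_actor_options={"num_gpus": 1},
    ).bind(TorchPredictor, model_checkpoint)
)
```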
Signed-off-by: Amog Kamsetty <amogkamsetty@yahoo.com>
Integration between Ray Serve and Gradio. Users of Gradio can wrap their Gradio app in a Serve deployment by using `GradioIngress`, and scale it up through more replicas or more CPU/GPU resources.
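A rough sketch of what this looks like from the user's side; the import path and the exact constructor shape of `GradioIngress` are assumptions here, not the confirmed API:
```
import gradio as gr
from ray import serve
# Import path is an assumption; the integration may live in a different module.
from ray.serve.gradio_integrations import GradioIngress

def build_app():
    # A plain Gradio app; nothing Serve-specific in here.
    return gr.Interface(fn=lambda text: text.upper(), inputs="text", outputs="text")

# Turn the ingress into a Serve deployment and scale it out with more replicas
# or more CPU/GPU resources per replica.
GradioApp = serve.deployment(GradioIngress).options(
    num_replicas=2,
    ray_actor_options={"num_cpus": 2},
)
serve.run(GradioApp.bind(build_app))
```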
Root cause:
https://www.shell-tips.com/bash/source-dot-command/#gsc.tab=0
Using `.` executes a command in the current shell in a bash script. It looks like removing the `.` command from `ci.sh init` means we will lose the `set -eo` configured within `ci.sh init` for the subsequent test commands, because `set -eo` would then be called in a child process, not the current shell (so future commands won't have `set -eo` configured).
Recently there have been a number of CI test failures due to direct or transitive dependency version upgrades. Printing out environment information for each test suite allows us to quickly check the diff between failed and successful runs.
**Notes:**
1. In this PR I just manually added `./ci/env/env_info.sh` to each test suite. We may want to generalize this in the future.
2. This is just for CI now, but is applicable to release tests as well.
Signed-off-by: Matthew Deng <matt@anyscale.com>
Removes all ML-related code from `ray.util`
Removes:
- `ray.util.xgboost`
- `ray.util.lightgbm`
- `ray.util.horovod`
- `ray.util.ray_lightning`
Moves `ray.util.ml_utils` to other locations
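For users of the removed wrappers, the migration should just be a direct import from the standalone packages; for example (the replacement import below is an assumption, shown for `ray.util.xgboost` only):
```
# Before (removed in this PR):
#   from ray.util.xgboost import RayDMatrix, RayParams, train
# After (assuming the wrapper simply re-exported the standalone xgboost_ray package):
from xgboost_ray import RayDMatrix, RayParams, train
```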
Closes #23900
Signed-off-by: Amog Kamsetty <amogkamsetty@yahoo.com>
Signed-off-by: Kai Fricke <kai@anyscale.com>
Co-authored-by: Kai Fricke <kai@anyscale.com>
When cleaning up after the k8s operator tests, we should always delete the k8s cluster even if something went wrong (in fact, it's not clear we even need to clean up the resources within the cluster).
Signed-off-by: Alex Wu <itswu.alex@gmail.com>
Signed-off-by: Amog Kamsetty <amogkamsetty@yahoo.com>
The latest PyTorch version has wheels for CUDA 11.6. Per user request, adding a CUDA 11.6 image as part of our build pipeline.
## Why are these changes needed?
When GCS restarts, the raylet sometimes needs a while to reconnect to the GCS; for example, in a k8s env, it takes a while to move the GCS to the service. This PR tries to fix this by allowing a longer timeout for the first ping when GCS restarts.
Once GCS gets the first ping, it'll just use the regular timeout instead.
The previously observed Python grpc warning / logspam seems to have been fixed for grpcio >= 1.48. And users would like to upgrade beyond grpcio 1.43 for better M1 support. However, grpcio 1.48 has not been released yet, so there is still a risk this change needs to be reverted if any problem is discovered later with Ray nightly + grpcio 1.48.
- Stop using the dot command to run the ci.sh script: it doesn't fail the build if the command fails on Windows and is generally dangerous since it makes unexpected changes to the current shell.
- Fix the Windows build issues this uncovered.
This PR adds GPU support for the PyTorch and TensorFlow predictors, as well as automatically setting the `use_gpu` flag in `BatchPredictor`.
Notable changes:
- Added a `use_gpu` flag to the constructors of `TorchPredictor` and `TensorflowPredictor` (note this is slightly different from our latest design doc, which puts the flag on the `predict()` call); see the sketch after this list
- Added a `use_gpu` flag to `SklearnPredictor` so its interface is compatible with `BatchPredictor`
- Code to move both the model weights and the input tensors to the default visible GPU at index 0 if the flag is set
- Parametrized existing predictor tests to use GPU, for both CPU & GPU coverage
- Changed BUILD CI tests with an added `gpu` tag (I'm not 100% sure that's the right way, though)
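A minimal sketch of the constructor-level flag, assuming the import path below; the path and the `BatchPredictor` argument name mentioned in the comment are assumptions:
```
import torch

# Import path is an assumption; the predictor may live under a different AIR module.
from ray.train.torch import TorchPredictor

model = torch.nn.Linear(2, 1)

# With use_gpu=True, the model weights (and the input tensors at predict time)
# are moved to the default visible GPU at index 0, so this needs a GPU to run.
predictor = TorchPredictor(model=model, use_gpu=True)
output = predictor.predict(torch.randn(4, 2).numpy())

# For batch inference, requesting GPUs per worker is expected to set use_gpu on
# the wrapped predictor automatically (the argument name here is an assumption):
#   BatchPredictor.from_checkpoint(ckpt, TorchPredictor).predict(ds, num_gpus_per_worker=1)
```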
Follow ups:
https://github.com/ray-project/ray/issues/26249 was created for the case where our host has multiple GPU devices. It's a bit out of scope for this PR, but for GPU batch inference we should ideally be able to use all GPU devices on the host evenly while the CPU & DRAM are busy with pre-fetching + data movement to the GPU. We might approximate this by scheduling the same # of Predictor instances on the host, but that's worth verifying once benchmarks are set up.
These tests have been running for 1-2 months, and the overall observation is that they are not very useful for catching actual regressions. Basically, we didn't notice any regressions. Stop this test for now to save some resources.
Simplify the isort filters and move them into the isort cfg file.
With this change, isort will no longer apply to diffs other than files that are in a whitelisted directory (isort only supports a blacklist, so we implement that instead). This is much simpler than building our own whitelist logic, since our formatter runs multiple codepaths depending on whether it is formatting a single file, a PR, or the entire repo in CI.
From the message:
```
[ OK ] SyncerTest.TestMToN (13132 ms)
[----------] 5 tests from SyncerTest (43175 ms total)
[----------] Global test environment tear-down
[==========] 8 tests from 2 test suites ran. (43176 ms total)
[ PASSED ] 8 tests.
external/com_github_grpc_grpc/src/core/lib/iomgr/ev_posix.cc:314:19: runtime error: member access within null pointer of type 'const struct grpc_event_engine_vtable'
```
So far this can only be reproduced by running with Bazel test; it won't reproduce under gdb. It seems like some issue with grpc, maybe the reactor API.
Given that the ASAN test, which is supposed to catch this issue, runs well, and considerable time has been spent investigating this one without progress, skip this test for now.
The package "ml" should be renamed to "air".
Main question: Keep an `ml.py` with `from ray.air import *` for some level of backwards compatibility?
I'd go for no, to force people to use the new structure.
In this PR we simulate the case where Serve can continue to function even when the GCS is down, and reconfiguration continues to work once the GCS is back.
To make it close to the real-world case, Docker is used for isolation:
- It starts a head node (0 CPUs) and a worker node
- It tries the basic functionality and makes sure it's working
- It kills the GCS and makes sure everything keeps working
- It restarts the GCS and makes sure reconfiguration continues to work
These are the basic cases for Serve HA. We'll add more once we get better integrations.
The AIR CI build has been failing on master since #25022.
#25022 moved the tests that require credentials, but we still left the bazel command in the build pipeline. So even though all the tests are passing, the buildkite stage itself was failing since it tries to run tests that require credentials, but those tests no longer exist in the directory. This is only a problem for the master build, since we don't run this command for PR builds.
Since Ray supports Redis as a storage backend, we should ensure the code path with Redis as storage is still covered end-to-end.
These tests haven't run for a while since we switched to memory mode by default. This PR fixes that and makes them run with every commit.
In the future, if we support more and more storage backends, this should be revised to be more efficient and selective. But for now I think the cost should be ok.
This PR is part of GCS HA testing-related work.
Currently, we are not running doc notebooks in CI due to a bazel misconfiguration - we are using `glob` in a top level package in order to get the paths for the notebooks, but those are contained inside subpackages, which glob purposefully ignores. Therefore, the lists of notebooks to run are empty. This PR fixes that by:
* Running the `py_test_run_all_notebooks` macro inside the relevant subpackages
* Editing the `test_myst_doc.py` script to allow for a recursive search for the target file, allowing us to deal with mismatches between the `name` and `data` arguments in `py_test_run_all_notebooks` (see the sketch after this list)
* Setting the `allow_empty=False` flag inside `glob` calls in our macros to ensure that this oversight is caught early
* Enabling detection of changes in doc folder for `*.ipynb` and `BUILD` files
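A hypothetical sketch of the recursive lookup added to `test_myst_doc.py`; the real script's argument handling may differ:
```
from pathlib import Path

def find_notebook(search_root: str, target: str) -> Path:
    """Recursively search `search_root` for a file matching `target`'s filename.

    This tolerates mismatches between the `name` and `data` arguments in
    `py_test_run_all_notebooks`, where the path prefix may not line up.
    """
    matches = sorted(Path(search_root).rglob(Path(target).name))
    if not matches:
        raise FileNotFoundError(f"{target} not found under {search_root}")
    return matches[0]
```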
This PR also adds a GPU runner for doc tests, allowing one of our examples to pass - and setting the infra for more to come. Finally, a misconfigured path for one set of doc tests is also fixed.