hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-05 10:01:43 -05:00

Author	SHA1	Message	Date
Akash Patel	96d579a4fe	Add support for Python 3.10 (#21221) Signed-off-by: acxz <17132214+acxz@users.noreply.github.com>	2022-08-26 11:01:12 -07:00
Amog Kamsetty	aa8a7dcb48	[Docker] Add Cuda 11.6 support (#26695) Signed-off-by: Amog Kamsetty amogkamsetty@yahoo.com The latest PyTorch version has wheels for CUDA 11.6. Per user request, adding an 11.6 image as part of our build pipeline.	2022-07-26 10:15:53 -07:00
Dmitri Gekhtman	0c1b6df368	Fix redis dependency (#26459) Fix the specification of the Redis dependency for the Ray image.	2022-07-12 16:07:09 -07:00
Dmitri Gekhtman	aa182b1941	Add Redis dependency to ray-deps	2022-07-11 17:56:02 -07:00
Amog Kamsetty	b01e11d721	[Docker] Add support for Cuda 11.3 (#26233) Start building Ray Docker images with CUDA 11.3.	2022-07-10 21:50:42 -07:00
mwtian	513881584d	[Core] install jemalloc in Ray docker and use jemalloc in `benchmark` release tests (#26112) There is mysterious memory usage growth in Ray clusters that disappears when running with jemalloc. Until we are able to figure out the root cause, using jemalloc by default seems like a good workaround. Because of its efficiency, using jemalloc by default may be beneficial in general, but we need to run more benchmarks to verify.	2022-06-27 23:26:56 -07:00
mwtian	b2d41fc427	[Doc] update docker readme files to include Python versions (#25099) Similar to #25053, update the documentation on the Docker site.	2022-05-25 19:42:24 -07:00
Chen Shen	1325cf7876	[python3.10] Build py310 images (#24859) Build Python 3.10 images so we can run release tests.	2022-05-18 08:48:20 -07:00
Ian Rodney	46a9574c84	[Docker] Explain how to update tagging lambda (#24862) Explains how to replace the existing lambda when the code changes.	2022-05-16 16:16:49 -07:00
Kai Fricke	6282090401	[ci] Fix GPU docker builds (#24336) NVIDIA Docker builds are currently broken, e.g.: https://buildkite.com/ray-project/ray-builders-branch/builds/7239#e9dea1d6-7dea-4323-801c-b7efe917be03 Following this workaround: https://forums.developer.nvidia.com/t/invalid-public-key-for-cuda-apt-repository/212901/11 to hopefully fix this for now.	2022-04-29 17:10:18 +01:00
Amog Kamsetty	60ded3ef79	[Docker] Start building `ray-ml` CPU Docker image again (#24266)	2022-04-28 15:29:23 -07:00
Akash Patel	8eb99428ce	remove unmaintained blist (#23957) This PR removes the unused `blist` dependency, which was causing issues during the `py310` upgrade.	2022-04-17 16:06:04 -07:00
Kai Fricke	65d9a410f7	[ci] Clean up ci/ directory (refactor ci/travis) (#23866) Clean up the ci/ directory. This means getting rid of the travis/ path completely and moving the files into sensible subdirectories. Details: - Moves everything under ci/travis into subdirectories, e.g. ci/build, ci/lint, etc. - Minor adjustments to some scripts (variable renames) - Removes the outdated (unused) asan tests	2022-04-13 18:11:30 +01:00
ddelange	e109c13b83	[ci] Clean up ray-ml requirements (#23325) In https://github.com/ray-project/ray/blob/ray-1.11.0/docker/ray-ml/Dockerfile, the order of pip install commands currently matters (potentially a lot). It would be good to run one big pip install command to avoid ending up with a broken env. Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>	2022-03-25 15:59:54 +00:00
Amog Kamsetty	d31d6bc9bb	[Docker] Add Train requirements to ray-ml docker image (#22645)	2022-03-17 15:07:32 -07:00
Dmitri Gekhtman	413fe08f87	Move KubeRay autoscaler files into Ray autoscaler directory, add an entry-point. (#22847) This PR consists of the following clean-up items for KubeRay autoscaler integration: remove the docker/kuberay directory; move the Python files formerly in docker/kuberay to the autoscaler directory; use a rayproject/ray image for the autoscaler; add an entry point for the KubeRay autoscaler to scripts.py; use the entry point in the example config; slightly simplify the code that starts the autoscaler. Ray versions are updated to Ray 1.11.0, which will be officially released within the next couple of days. By default, Ray >= 1.11.0 runs without Redis. References to Redis are removed from the example config. Add the autoscaler configuration test to the CI. Update development documentation to reflect the changes in this PR.	2022-03-09 18:26:57 -08:00
Kai Fricke	84a163a2c4	[RLlib] Remove atari rom install script (#22797)	2022-03-03 16:55:56 +01:00
Jiaxin Shan	32829ff9ad	[KubeRay] Provide a new Dockerfile for fast build (#22689) Adds a new Dockerfile for fast build and development of KubeRay.	2022-02-28 17:09:16 -08:00
Dmitri Gekhtman	b2b442297e	[autoscaler] Fix initialization artifacts (#22570) This PR fixes initialization artifacts related to the load metric summary and autoscaler summary. Load metrics summaries are defined to be Falsey if the autoscaler has never received a resource message from the GCS. We skip most autoscaler actions if load metrics is Falsey, because it doesn't make sense to autoscale without load metrics. This also allows us to execute the TODO here: #22348 (comment) and remove the time.wait(). As for the autoscaler summary, it is possible for autoscaler.summary() to error outside of an autoscaler update in this scenario: the very first call to NodeProvider.non_terminated_nodes fails, self.non_terminated_nodes remains a None object, and autoscaler.summary() fails trying to get an attribute of this None object. The result is a confusing error message, as in #22515. This PR fixes that. Closes #22515	2022-02-24 20:05:44 -08:00
Dmitri Gekhtman	a402e956a4	[KubeRay] Format autoscaling config based on RayCluster CR (#22348) Closes #21655. At the start of each autoscaler iteration, we read the RayCluster CR from K8s and use it to extract the autoscaling config.	2022-02-22 11:06:37 -08:00
Ian Rodney	3fca295871	[Docker] Update echo in fix-docker-latest.sh (#22123)	2022-02-07 08:50:36 -08:00
Balaji Veeramani	7f1bacc7dc	[CI] Format Python code with Black (#21975) See #21316 and #21311 for the motivation behind these changes.	2022-01-29 18:41:57 -08:00
Philipp Moritz	fbc51d6d0e	[Kuberay] Ray Autoscaler integration with Kuberay (MVP) (#21086) This is a minimum viable product for Ray Autoscaler integration with Kuberay. It is not ready for prime time/general use, but should be enough for interested parties to get started (see the documentation in kuberay.md).	2022-01-19 19:42:17 -08:00
Akash Patel	cbcd03b779	Upgrade cython to 0.29.26 for py310 (#21244)	2021-12-26 20:26:08 -08:00
Scott Graham	7153d58cbd	Updates to azure autoscaler for authentication and dependency updates (#19603) * updating azure autoscaler versions and backwards compatibility, and moving to azure-identity based authentication * adding azure sdk rqmts for tests * updating azure test requirements and adding wrapper function for azure sdk function resolution * adding docstring to get_azure_sdk_function Co-authored-by: Scott Graham <scgraham@microsoft.com>	2021-12-16 09:23:32 -08:00
Alex Wu	3d668768de	[docker] Upgrade numpy version (#20450) Why are these changes needed? The change in #20374 was interpreted by Docker as a file redirect, not a "greater than" (strangely enough, differently from how bash interprets it locally). Co-authored-by: Alex <alex@anyscale.com>	2021-11-17 07:15:18 -08:00
Alex Wu	884bb3de33	[Dataset] Bump `numpy >=1.20` dependency (#20374) Co-authored-by: Alex Wu <alex@anyscale.com>	2021-11-15 14:10:00 -08:00
Philipp Moritz	a64e32c53b	[docs] Fix broken links in documentation and add linkcheck to documentation (#20030) Co-authored-by: Richard Liaw <rliaw@berkeley.edu>	2021-11-04 13:19:43 -07:00
Sven Mika	4cb23d1c95	[Tune; Testing] Revert to 3.7 (undone by accident by previous PR); + some minor comment cleanups. (#20031)	2021-11-04 10:58:34 +01:00
Avnish Narayan	026bf01071	[RLlib] Upgrade gym version to 0.21 and deprecate pendulum-v0. (#19535) * Fix QMix, SAC, and MADDPG too. * Unpin gym and deprecate pendulum v0. Many tests in RLlib depended on pendulum v0; however, in gym 0.21, pendulum v0 was deprecated in favor of pendulum v1. This may change reward thresholds, so we may have to rerun all of the pendulum v1 benchmarks or use another environment instead. The same applies to frozen lake v0 and frozen lake v1. Lastly, all of the RLlib tests have been moved to Python 3.7. * Add gym installation based on Python version. Pin Python <= 3.6 to gym 0.19 due to install issues with Atari ROMs in gym 0.20. * Reformatting * Fixing tests * Move atari-py install conditional to req.txt * Migrate to new ALE install method * Make parametric_actions_cartpole return float32 actions/obs * Add type conversions if obs/actions don't match space * Add utils to make elements match gym space dtypes Co-authored-by: Jun Gong <jungong@anyscale.com> Co-authored-by: sven1977 <svenmika1977@gmail.com>	2021-11-03 16:24:00 +01:00
Amog Kamsetty	5d54412f1c	[Docker] Alias `ray-ml:nightly` to `ray-ml:nightly-gpu` (#19726) * wip * wip * update * finish * deprecate * debug * fix and address comments * try catch * fix * split tests * force * merge * docs * wip * fix and check * update readme * fix * fix * fix sanity checking * format * alias * fix * comment	2021-10-27 11:30:49 -07:00
Amog Kamsetty	db863aafc0	Revert "Revert "[Docker] Support multiple CUDA Versions (#19505)" (#19756)" (#19763) This reverts commit `e58fcca404`.	2021-10-26 17:32:56 -07:00
Amog Kamsetty	e58fcca404	Revert "[Docker] Support multiple CUDA Versions (#19505)" (#19756) This reverts commit `f0053d405b`.	2021-10-26 12:55:20 -07:00
Amog Kamsetty	f0053d405b	[Docker] Support multiple CUDA Versions (#19505) * wip * wip * update * finish * deprecate * debug * fix and address comments * try catch * fix * split tests * force * merge * docs * wip * fix and check * update readme * fix * fix * fix sanity checking * format	2021-10-25 18:57:05 -07:00
Ian Rodney	02090afc26	[Docker] Re-Tag Docker Images with a lambda (#19081) * lil lambda * Better Credential Handling * use a script for this :) * better timeout and link & echo messages	2021-10-19 14:06:31 -07:00
Amog Kamsetty	84e958f330	[ML] Consolidate and upgrade Deep Learning Dependencies (#18574) * wip * upgrade requirements * add file * fix * fixes * Apply suggestions from code review Try mlagents==0.21.0 for now (works with torch 1.9). * Apply suggestions from code review * wip * wip * fix * fix * upgrade lightning bolts * address comment Co-authored-by: Sven Mika <sven@anyscale.io>	2021-09-16 20:16:40 -07:00
Sven Mika	8a00154038	[RLlib] Bump tf version in ML docker to tf==2.5.0; add tfp to ML-docker. (#18544)	2021-09-15 08:46:37 +02:00
Simon Mo	497c5f56fa	[CI] Temporary disable worker-in-container test (#18606) * revert again * disable tmp	2021-09-14 22:38:20 -07:00
Kai Fricke	fb38d06cfb	Move RLLib GPU release test dependencies to ml docker (#18208)	2021-09-03 09:35:18 +01:00
Kai Fricke	ff68251f89	[release] Add python 3.9 to fix-docker-latest.sh (#18037)	2021-08-24 10:07:03 +02:00
mwtian	b8e71f641c	[Build] Ray Docker image for Python 3.9. (#16571)	2021-07-22 13:38:57 -07:00
Vince Jankovics	05c9dfbbda	[RLlib] CV2 to Skimage dependency change (#16841)	2021-07-21 22:24:18 -04:00
chenk008	afd59be8ca	[Core] Add worker resource limit (#17179) * add resource restricted * fix test * lint * lint	2021-07-21 22:00:34 +08:00
SongGuyang	a57de0e224	support build different python wheel in setup.py (#16998)	2021-07-16 13:01:48 +08:00
Scott Graham	3334357c58	[autoscaler] [azure] Fix Azure Autoscaling Failures (#16640) Co-authored-by: Scott Graham <scgraham@microsoft.com>	2021-07-10 11:55:00 -07:00
chenk008	06c7db7dca	[Core] Rename container option and ray-nest-container (#16771) * rename container_option to container * rename ray-nest-container to ray-worker-container * lint Co-authored-by: wuhua.ck <wuhua.ck@alibaba-inc.com>	2021-07-01 13:12:26 +08:00
chenk008	c318293d9f	[Core] start worker in container (#16671)	2021-06-29 10:12:47 -07:00
Travis Addair	7802ff66d4	[docker] Updated GPU Dockerfiles to CUDA 11.2 (#16269)	2021-06-07 16:15:19 -07:00
Travis Addair	050a076de9	[k8s] Refactored k8s operator to use kopf for controller logic (#15787) Co-authored-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>	2021-06-01 12:00:55 -07:00
Ian Rodney	ca43e61949	[docker] Make Docker Build More Human Friendly (#15543)	2021-05-05 11:22:17 -07:00

193 commits