Commit graph

193 commits

Author SHA1 Message Date
Akash Patel
96d579a4fe
Add support for Python 3.10 (#21221)
Signed-off-by: acxz <17132214+acxz@users.noreply.github.com>
2022-08-26 11:01:12 -07:00
Amog Kamsetty
aa8a7dcb48
[Docker] Add Cuda 11.6 support (#26695)
Signed-off-by: Amog Kamsetty amogkamsetty@yahoo.com

Latest Pytorch version has wheels for CUDA 11.6. Per user request, adding a 11.6 image as part of our build pipeline.
2022-07-26 10:15:53 -07:00
Dmitri Gekhtman
0c1b6df368
Fix redis dependency (#26459)
Fix the specification of the Redis dependency for the Ray image.
2022-07-12 16:07:09 -07:00
Dmitri Gekhtman
aa182b1941
Add Redis dependency to ray-deps 2022-07-11 17:56:02 -07:00
Amog Kamsetty
b01e11d721
[Docker] Add support for Cuda 11.3 (#26233)
Start building Ray docker images with cuda 11.3
2022-07-10 21:50:42 -07:00
mwtian
513881584d
[Core] install jemalloc in Ray docker and use jemalloc in benchmark release tests (#26112)
There are mysterious memory usage growth in Ray clusters that disappear when running with jemalloc. Before we are able to figure out the root cause, it seems using jemalloc by default can be a good walkaround. Because of its efficiency, using jemalloc by default can be beneficial, but we need to run more benchmarks to verify.
2022-06-27 23:26:56 -07:00
mwtian
b2d41fc427
[Doc] update docker readme files to include Python versions (#25099)
Similar to #25053, update the documentations on the docker site.
2022-05-25 19:42:24 -07:00
Chen Shen
1325cf7876
[python3.10] Build py310 images (#24859)
Build python 3.10 images so we can run release tests.
2022-05-18 08:48:20 -07:00
Ian Rodney
46a9574c84
[Docker] Explain how to update tagging lambda (#24862)
Explains how to replace the existing lambda when the code changes.
2022-05-16 16:16:49 -07:00
Kai Fricke
6282090401
[ci] Fix GPU docker builds (#24336)
NVIDIA Docker builds are currently broken, e.g.: https://buildkite.com/ray-project/ray-builders-branch/builds/7239#e9dea1d6-7dea-4323-801c-b7efe917be03

Following this workaround: https://forums.developer.nvidia.com/t/invalid-public-key-for-cuda-apt-repository/212901/11
to hopefully fix this for now.
2022-04-29 17:10:18 +01:00
Amog Kamsetty
60ded3ef79
[Docker] Start building ray-ml CPU Docker image again (#24266) 2022-04-28 15:29:23 -07:00
Akash Patel
8eb99428ce
remove unmaintained blist (#23957)
This PR removes the unused `blist` dep. Causing issues during `py310` upgrade path.
2022-04-17 16:06:04 -07:00
Kai Fricke
65d9a410f7
[ci] Clean up ci/ directory (refactor ci/travis) (#23866)
Clean up the ci/ directory. This means getting rid of the travis/ path completely and moving the files into sensible subdirectories.

Details:

- Moves everything under ci/travis into subdirectories, e.g. ci/build, ci/lint, etc.
- Minor adjustments to some scripts (variable renames)
- Removes the outdated (unused) asan tests
2022-04-13 18:11:30 +01:00
ddelange
e109c13b83
[ci] Clean up ray-ml requirements (#23325)
In https://github.com/ray-project/ray/blob/ray-1.11.0/docker/ray-ml/Dockerfile, the order of pip install commands currently matters (potentially a lot). It would be good to run one big pip install command to avoid ending up with a broken env.

Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>
2022-03-25 15:59:54 +00:00
Amog Kamsetty
d31d6bc9bb
[Docker] Add Train requirements to ray-ml docker image (#22645) 2022-03-17 15:07:32 -07:00
Dmitri Gekhtman
413fe08f87
Move KubeRay autoscaler files into Ray autoscaler directory, add an entry-point. (#22847)
This PR consists of the following clean-up items for KubeRay autoscaler integration:

Remove the docker/kuberay directory

Move the Python files formerly in docker/kuberay to the autoscaler directory.

Use a rayproject/ray image for the autoscaler.

Add an entry point for the kuberay autoscaler to scripts.py. Use the entry point in the example config.

Slightly simplify the code that starts the autoscaler.

Ray versions are updated to Ray 1.11.0, which will be officially released within the next couple of days.

By default, Ray >= 1.11.0 runs without Redis. References to Redis are removed from the example config.

Add the autoscaler configuration test to the CI.

Update development documentation to reflect the changes in this PR.
2022-03-09 18:26:57 -08:00
Kai Fricke
84a163a2c4
[RLlib] Remove atari rom install script (#22797) 2022-03-03 16:55:56 +01:00
Jiaxin Shan
32829ff9ad
[KubeRay] Provide a new Dockerfile for fast build (#22689)
Adds a new Dockerfile for fast build and development of KubeRay.
2022-02-28 17:09:16 -08:00
Dmitri Gekhtman
b2b442297e
[autoscaler] Fix initialization artifacts (#22570)
This PR fixes initializations artifacts related to the load metric summary and autoscaler summary.

Load metrics summaries are defined to be Falsey if the autoscaler has never received a resource message from the GCS.
We skip most autoscaler actions if load metrics is Falsey, because it doesn't makes sense to autoscale without load metrics. This also allows us to execute the TODO here: #22348 (comment) and remove the time.wait().

As for the autoscaler summary, it is possible for autoscaler.summary() to error outside of an autoscaler update in this scenario:
The very first call to NodeProvider.non_terminated_nodes fails, self.non_terminated_nodes remains a None object, and autoscaler.summary() fails trying to get an attribute of this None object.
The result is a confusing error message, as in #22515. This PR fixes that.

Closes #22515
2022-02-24 20:05:44 -08:00
Dmitri Gekhtman
a402e956a4
[KubeRay] Format autoscaling config based on RayCluster CR (#22348)
Closes #21655. At the start of each autoscaler iteration, we read the Ray Cluster CR from K8s and use it to extract the autoscaling config.
2022-02-22 11:06:37 -08:00
Ian Rodney
3fca295871
[Docker] Update echo in fix-docker-latest.sh (#22123) 2022-02-07 08:50:36 -08:00
Balaji Veeramani
7f1bacc7dc
[CI] Format Python code with Black (#21975)
See #21316 and #21311 for the motivation behind these changes.
2022-01-29 18:41:57 -08:00
Philipp Moritz
fbc51d6d0e
[Kuberay] Ray Autoscaler integration with Kuberay (MVP) (#21086)
This is a minimum viable product for Ray Autoscaler integration with Kuberay. It is not ready for prime time/general use, but should be enough for interested parties to get started (see the documentation in kuberay.md).
2022-01-19 19:42:17 -08:00
Akash Patel
cbcd03b779
Upgrade cython to 0.29.26 for py310 (#21244) 2021-12-26 20:26:08 -08:00
Scott Graham
7153d58cbd
Updates to azure autoscaler for authentication and dependency updates (#19603)
* updating azure autoscaler versions and backwards compatibility, and moving to azure-identity based authentication

* adding azure sdk rqmts for tests

* updating azure test requirements and adding wrapper function for azure sdk function resolution

* adding docstring to get_azure_sdk_function

Co-authored-by: Scott Graham <scgraham@microsoft.com>
2021-12-16 09:23:32 -08:00
Alex Wu
3d668768de
[docker] Upgrade numpy version (#20450)
<!-- Please add a reviewer to the assignee section when you create a PR. If you don't have the access to it, we will shortly find a reviewer and assign them to your PR. -->

## Why are these changes needed?

The change in #20374 was interpreted as a file redirect, not a "greater than" by docker (strangely enough, differently than bash interprets it locally). 

<!-- Please give a short summary of the change and the problem this solves. -->

## Related issue number

<!-- For example: "Closes #1234" -->

## Checks

- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(


Co-authored-by: Alex <alex@anyscale.com>
2021-11-17 07:15:18 -08:00
Alex Wu
884bb3de33
[Dataset] Bump numpy >=1.20 dependency (#20374)
* done?

* .

Co-authored-by: Alex Wu <alex@anyscale.com>
2021-11-15 14:10:00 -08:00
Philipp Moritz
a64e32c53b
[docs] Fix broken links in documentation and add linkcheck to documentation (#20030)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-11-04 13:19:43 -07:00
Sven Mika
4cb23d1c95
[Tune; Testing] Revert to 3.7 (undone by accident by previous PR); + some minor comment cleanups. (#20031) 2021-11-04 10:58:34 +01:00
Avnish Narayan
026bf01071
[RLlib] Upgrade gym version to 0.21 and deprecate pendulum-v0. (#19535)
* Fix QMix, SAC, and MADDPA too.

* Unpin gym and deprecate pendulum v0

Many tests in rllib depended on pendulum v0,
however in gym 0.21, pendulum v0 was deprecated
in favor of pendulum v1. This may change reward
thresholds, so will have to potentially rerun
all of the pendulum v1 benchmarks, or use another
environment in favor. The same applies to frozen
lake v0 and frozen lake v1

Lastly, all of the RLlib tests and have
been moved to python 3.7

* Add gym installation based on python version.

Pin python<= 3.6 to gym 0.19 due to install
issues with atari roms in gym 0.20

* Reformatting

* Fixing tests

* Move atari-py install conditional to req.txt

* migrate to new ale install method

* Fix QMix, SAC, and MADDPA too.

* Unpin gym and deprecate pendulum v0

Many tests in rllib depended on pendulum v0,
however in gym 0.21, pendulum v0 was deprecated
in favor of pendulum v1. This may change reward
thresholds, so will have to potentially rerun
all of the pendulum v1 benchmarks, or use another
environment in favor. The same applies to frozen
lake v0 and frozen lake v1

Lastly, all of the RLlib tests and have
been moved to python 3.7
* Add gym installation based on python version.

Pin python<= 3.6 to gym 0.19 due to install
issues with atari roms in gym 0.20

Move atari-py install conditional to req.txt

migrate to new ale install method

Make parametric_actions_cartpole return float32 actions/obs

Adding type conversions if obs/actions don't match space

Add utils to make elements match gym space dtypes

Co-authored-by: Jun Gong <jungong@anyscale.com>
Co-authored-by: sven1977 <svenmika1977@gmail.com>
2021-11-03 16:24:00 +01:00
Amog Kamsetty
5d54412f1c
[Docker] Alias ray-ml:nightly to ray-ml:nightly-gpu (#19726)
* wip

* wip

* update

* finish

* deprecate

* debug

* fix and address comments

* try catch

* fix

* split tests

* force

* merge

* docs

* wip

* fix and check

* update readme

* fix

* fix

* fix sanity checking

* format

* alias

* fix

* comment
2021-10-27 11:30:49 -07:00
Amog Kamsetty
db863aafc0
Revert "Revert "[Docker] Support multiple CUDA Versions (#19505)" (#19756)" (#19763)
This reverts commit e58fcca404.
2021-10-26 17:32:56 -07:00
Amog Kamsetty
e58fcca404
Revert "[Docker] Support multiple CUDA Versions (#19505)" (#19756)
This reverts commit f0053d405b.
2021-10-26 12:55:20 -07:00
Amog Kamsetty
f0053d405b
[Docker] Support multiple CUDA Versions (#19505)
* wip

* wip

* update

* finish

* deprecate

* debug

* fix and address comments

* try catch

* fix

* split tests

* force

* merge

* docs

* wip

* fix and check

* update readme

* fix

* fix

* fix sanity checking

* format
2021-10-25 18:57:05 -07:00
Ian Rodney
02090afc26
[Docker] Re-Tag Docker Images with a lambda (#19081)
* lil lambda

* Better Credential Handling

* use a script for this :)

* better timeout and link & echo messages
2021-10-19 14:06:31 -07:00
Amog Kamsetty
84e958f330
[ML] Consolidate and upgrade Deep Learning Dependencies (#18574)
* wip
'

* upgrade requirements

* add file

* fix

* fixes

* Apply suggestions from code review

Try mlagents==0.21.0 for now (works with torch 1.9).

* Apply suggestions from code review

* wip

* wip

* fix

* fix

* upgrade lightning bolts

* address comment

Co-authored-by: Sven Mika <sven@anyscale.io>
2021-09-16 20:16:40 -07:00
Sven Mika
8a00154038
[RLlib] Bump tf version in ML docker to tf==2.5.0; add tfp to ML-docker. (#18544) 2021-09-15 08:46:37 +02:00
Simon Mo
497c5f56fa
[CI] Temporary disable worker-in-container test (#18606)
* revert again

* disable tmp
2021-09-14 22:38:20 -07:00
Kai Fricke
fb38d06cfb
Move RLLib GPU release test dependencies to ml docker (#18208) 2021-09-03 09:35:18 +01:00
Kai Fricke
ff68251f89
[release] Add python 3.9 to fix-docker-latest.sh (#18037) 2021-08-24 10:07:03 +02:00
mwtian
b8e71f641c
[Build] Ray Docker image for Python 3.9. (#16571) 2021-07-22 13:38:57 -07:00
Vince Jankovics
05c9dfbbda
[RLlib] CV2 to Skimage dependency change (#16841) 2021-07-21 22:24:18 -04:00
chenk008
afd59be8ca
[Core] Add worker resource limit (#17179)
* add resource restricted

* fix test

* lint

* lint
2021-07-21 22:00:34 +08:00
SongGuyang
a57de0e224
support build different python wheel in setup.py (#16998) 2021-07-16 13:01:48 +08:00
Scott Graham
3334357c58
[autoscaler] [azure] Fix Azure Autoscaling Failures (#16640)
Co-authored-by: Scott Graham <scgraham@microsoft.com>
2021-07-10 11:55:00 -07:00
chenk008
06c7db7dca
[Core] Rename container option and ray-nest-container (#16771)
* rename container_option to container

* rename ray-nest-container to ray-worker-container

* lint

Co-authored-by: wuhua.ck <wuhua.ck@alibaba-inc.com>
2021-07-01 13:12:26 +08:00
chenk008
c318293d9f
[Core] start worker in container (#16671) 2021-06-29 10:12:47 -07:00
Travis Addair
7802ff66d4
[docker] Updated GPU Dockerfiles to CUDA 11.2 (#16269) 2021-06-07 16:15:19 -07:00
Travis Addair
050a076de9
[k8s] Refactored k8s operator to use kopf for controller logic (#15787)
Co-authored-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>
2021-06-01 12:00:55 -07:00
Ian Rodney
ca43e61949
[docker] Make Docker Build More Human Friendly (#15543) 2021-05-05 11:22:17 -07:00