Commit graph

15 commits

Author SHA1 Message Date
Amog Kamsetty
adb8d77b2b
[Deps] Bump tensorflow on Docker image and add Codeowners (#20041)
2021-11-05 00:58:34 -07:00
gjoliver
2c1fa459d4
[RLlib] Add an RLlib Tune experiment to UserTest suite. (#19807)
* Add an RLlib Tune experiment to UserTest suite.

* Add ray.init()

* Move example script to example/tune/, so it can be imported as module.

* add __init__.py so our new module will get included in python wheel.

* Add block device to RLlib test instances.

* Reduce disk size a little bit.

* Add metrics reporting

* Allow max of 5 workers to accommodate all the worker tasks.

* revert disk size change.

* Minor updates

* Trigger build

* set max num workers

* Add a compute cfg for autoscaled cpu and gpu nodes.

* use 1gpu instance.

* install tblib for debugging worker crashes.

* Manually upgrade to pytorch 1.9.0

* -y

* torch=1.9.0

* install torch on driver

* bump timeout

* Write a more informational result dict.

* Revert changes to compute config files that are not used.

* add smoke test

* update

* reduce timeout

* Reduce the number of envs per worker to 1.

* Small fix for getting trial_states

* Trigger build

* simplify result dict

* lint

* more lint

* fix smoke test

Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com>
2021-11-03 17:04:27 -07:00
Amog Kamsetty
3a52187da8
[Release/Lightning] Add Ray lightning user test (#19812)
* wip

* wip

* add ray lightning test

* fix

* update

* merge and add

* fix

* fix

* rename

* autoscale

* add tblib

* gloo backend

* typo

* upgrade torch

* latest and master
2021-11-01 18:29:48 -07:00
Amog Kamsetty
84e958f330
[ML] Consolidate and upgrade Deep Learning Dependencies (#18574)
* wip

* upgrade requirements

* add file

* fix

* fixes

* Apply suggestions from code review

Try mlagents==0.21.0 for now (works with torch 1.9).

* Apply suggestions from code review

* wip

* wip

* fix

* fix

* upgrade lightning bolts

* address comment

Co-authored-by: Sven Mika <sven@anyscale.io>
2021-09-16 20:16:40 -07:00
Sven Mika
8a00154038
[RLlib] Bump tf version in ML docker to tf==2.5.0; add tfp to ML-docker. (#18544)
2021-09-15 08:46:37 +02:00
Kai Fricke
fb38d06cfb
Move RLLib GPU release test dependencies to ml docker (#18208)
2021-09-03 09:35:18 +01:00
Sven Mika
0bc0e17712
CUDA 11.2 in docker images
2021-08-16 12:31:19 +02:00
Amog Kamsetty
9f5dc5ec9f
[Docker] Downgrade to CUDA 11.0 (#17806)
2021-08-13 20:39:06 +02:00
Amog Kamsetty
c0560dadef
[Docker] Pin Tensorflow (#16741)
2021-06-29 11:14:46 -07:00
Amog Kamsetty
544dff80fa
[Docker] Fix torch GPU install on Ray Docker images (#15473)
Co-authored-by: Ian Rodney <ian.rodney@gmail.com>
2021-04-26 16:22:25 -07:00
Ian Rodney
813a7ab0e2
[docker] Build Python3.6 & Python3.8 Docker Images (#13548)
2021-01-28 15:24:50 -08:00
Ian Rodney
b4bcb9b60a
[Docker] Use Cuda 11 (#13691)
2021-01-27 13:45:30 -08:00
Amog Kamsetty
3f42e6bafe
[Tune] Pin Transitive Dependencies (#13358)
2021-01-13 19:10:21 -08:00
Ian Rodney
47d7d83b6f
[docker] Fix GPU support for tensorflow (#10779)
2020-09-17 10:56:58 -07:00
Ian Rodney
4324dd5929
[docker] Refactor "autoscaler" image into "-autoscaler" tag and "ray-ml" image. (#10351)
2020-09-02 13:03:35 -07:00