hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-06 10:31:39 -05:00

Author	SHA1	Message	Date
Amog Kamsetty	732175e245	[AIR] Add distributed `torch_geometric` example (#23580) Add example for distributed pytorch geometric (graph learning) with Ray AIR. This only showcases distributed training, but with data small enough that it can be loaded in by each training worker individually. Distributed data ingest is out of scope for this PR. Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>	2022-04-21 09:48:43 -07:00
ddelange	e109c13b83	[ci] Clean up ray-ml requirements (#23325) In https://github.com/ray-project/ray/blob/ray-1.11.0/docker/ray-ml/Dockerfile, the order of pip install commands currently matters (potentially a lot). It would be good to run one big pip install command to avoid ending up with a broken env. Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>	2022-03-25 15:59:54 +00:00
Amog Kamsetty	adb8d77b2b	[Deps] Bump tensorflow on Docker image and add Codeowners (#20041)	2021-11-05 00:58:34 -07:00
gjoliver	2c1fa459d4	[RLlib] Add an RLlib Tune experiment to UserTest suite. (#19807) * Add an RLlib Tune experiment to UserTest suite. * Add ray.init() * Move example script to example/tune/, so it can be imported as module. * add __init__.py so our new module will get included in python wheel. * Add block device to RLlib test instances. * Reduce disk size a little bit. * Add metrics reporting * Allow max of 5 workers to accommodate all the worker tasks. * revert disk size change. * Minor updates * Trigger build * set max num workers * Add a compute cfg for autoscaled cpu and gpu nodes. * use 1gpu instance. * install tblib for debugging worker crashes. * Manually upgrade to pytorch 1.9.0 * -y * torch=1.9.0 * install torch on driver * bump timeout * Write a more informational result dict. * Revert changes to compute config files that are not used. * add smoke test * update * reduce timeout * Reduce the # of env per worker to 1. * Small fix for getting trial_states * Trigger build * simply result dict * lint * more lint * fix smoke test Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com>	2021-11-03 17:04:27 -07:00
Amog Kamsetty	3a52187da8	[Release/Lightning] Add Ray lightning user test (#19812) * wip * wip * add ray lightning test * fix * update * merge and add * fix * fix * rename * autoscale * add tblib * gloo backend * typo * upgrade torch * latest and master	2021-11-01 18:29:48 -07:00
Amog Kamsetty	84e958f330	[ML] Consolidate and upgrade Deep Learning Dependencies (#18574) * wip ' * upgrade requirements * add file * fix * fixes * Apply suggestions from code review Try mlagents==0.21.0 for now (works with torch 1.9). * Apply suggestions from code review * wip * wip * fix * fix * upgrade lightning bolts * address comment Co-authored-by: Sven Mika <sven@anyscale.io>	2021-09-16 20:16:40 -07:00
Sven Mika	8a00154038	[RLlib] Bump tf version in ML docker to tf==2.5.0; add tfp to ML-docker. (#18544)	2021-09-15 08:46:37 +02:00
Kai Fricke	fb38d06cfb	Move RLLib GPU release test dependencies to ml docker (#18208)	2021-09-03 09:35:18 +01:00
Sven Mika	0bc0e17712	CUDA 11.2 in docker images	2021-08-16 12:31:19 +02:00
Amog Kamsetty	9f5dc5ec9f	[Docker] Downgrade to CUDA 11.0 (#17806)	2021-08-13 20:39:06 +02:00
Amog Kamsetty	c0560dadef	[Docker] Pin Tensorflow (#16741)	2021-06-29 11:14:46 -07:00
Amog Kamsetty	544dff80fa	[Docker] Fix torch GPU install on Ray Docker images (#15473) Co-authored-by: Ian Rodney <ian.rodney@gmail.com>	2021-04-26 16:22:25 -07:00
Ian Rodney	813a7ab0e2	[docker] Build Python3.6 & Python3.8 Docker Images (#13548)	2021-01-28 15:24:50 -08:00
Ian Rodney	b4bcb9b60a	[Docker] Use Cuda 11 (#13691)	2021-01-27 13:45:30 -08:00
Amog Kamsetty	3f42e6bafe	[Tune] Pin Transitive Dependencies (#13358)	2021-01-13 19:10:21 -08:00
Ian Rodney	47d7d83b6f	[docker] Fix GPU support for tensorflow (#10779)	2020-09-17 10:56:58 -07:00
Ian Rodney	4324dd5929	[docker] Refactor "autoscaler" image into "-autoscaler" tag and "ray-ml" image. (#10351)	2020-09-02 13:03:35 -07:00

17 commits