Amog Kamsetty
e58fcca404
Revert "[Docker] Support multiple CUDA Versions ( #19505 )" ( #19756 )
...
This reverts commit f0053d405b.
2021-10-26 12:55:20 -07:00
Avnish Narayan
ad87ddf93e
[rllib] Add deterministic test to gpu ( #19306 )
...
Co-authored-by: sven1977 <svenmika1977@gmail.com>
2021-10-26 10:11:39 -07:00
Amog Kamsetty
f0053d405b
[Docker] Support multiple CUDA Versions ( #19505 )
...
* wip
* wip
* update
* finish
* deprecate
* debug
* fix and address comments
* try catch
* fix
* split tests
* force
* merge
* docs
* wip
* fix and check
* update readme
* fix
* fix
* fix sanity checking
* format
2021-10-25 18:57:05 -07:00
Jiajun Yao
256bf0bf3a
[Release] Bump up dask to latest compatible version 2021.9.1 ( #19592 )
...
* Bump up dask to latest compatible version 2021.9.1
* Bump up dask to latest compatible version 2021.9.1
2021-10-22 09:16:28 -07:00
Simon Mo
03805d4064
[Serve] Good error message when Serve not installed and ensure Serve installs ray[default] ( #19570 )
2021-10-21 13:47:29 -07:00
architkulkarni
b8941338d3
[runtime env] Raise error when creating runtime env when ray[default] is not installed ( #19491 )
2021-10-19 09:16:04 -05:00
matthewdeng
4674c78050
[Train] Rename Ray SGD v2 to Ray Train ( #19436 )
2021-10-18 22:27:46 -07:00
Kai Fricke
d8d8901192
[ci/tune] Remove deprecated jenkins_only tag from test tags ( #19287 )
2021-10-12 10:05:46 +01:00
SangBin Cho
0ef0d9a77d
Revert "[core] Assign tasks to the first available worker ( #18167 )" ( #19180 )
...
This reverts commit 545db13800.
2021-10-07 10:38:37 -07:00
Stephanie Wang
545db13800
[core] Assign tasks to the first available worker ( #18167 )
...
* Convert worker pool to queue
* Start up to backlog size more workers
* fixes
* Prestart workers according to num available CPUs
* lint
* x
* Update src/ray/raylet/worker_pool.h
Co-authored-by: Eric Liang <ekhliang@gmail.com>
* Update src/ray/raylet/worker_pool.h
Co-authored-by: Eric Liang <ekhliang@gmail.com>
* dedicated workers
* Fix tests
* x
* fix
* asan
* asan
* Workers can only exec tasks with same job ID
* size_t for runtime env hash, fix unit tests
* include job ID in runtime env hash, remove from worker registration msg
* x
* conflict
* debug
* Schedule and dispatch periodically, skip if no new tasks
* Update src/ray/common/task/task_spec.h
Co-authored-by: Eric Liang <ekhliang@gmail.com>
* Update src/ray/raylet/scheduling/cluster_task_manager.h
Co-authored-by: Eric Liang <ekhliang@gmail.com>
* Update src/ray/raylet/worker_pool.h
Co-authored-by: Eric Liang <ekhliang@gmail.com>
Co-authored-by: Eric Liang <ekhliang@gmail.com>
2021-10-05 13:45:50 -07:00
Kai Fricke
3dc176c42e
[ci/tune] Add SGD and Tune GPU pipeline step to CI ( #18469 )
...
* [ci/tune] Add Tune GPU pipeline step to CI
* cont.
* add sgd gpu tests
* format yaml, fix imports
* install horovod; fix line wrapping
* set GPU per worker to 0.5
* fix import
* move test to 4gpu machine
* fix lint
* lint
* set visible devices
* pull in tf gpu fix
* Fix Tune GPU pipeline step
* nit
* Disable GPU tests until we have some
* Re-add empty rllib tests
Co-authored-by: Matthew Deng <matthew.j.deng@gmail.com>
2021-10-01 18:34:05 -07:00
architkulkarni
0f0b161ea1
Revert "Revert "[Serve] [doc] Improve runtime env doc"" ( #18943 )
...
* Revert "Revert "[Serve] [doc] Improve runtime env doc (#18782 )" (#18935 )"
This reverts commit e4f4c79252.
2021-09-30 13:28:44 -05:00
Yi Cheng
e4f4c79252
Revert "[Serve] [doc] Improve runtime env doc ( #18782 )" ( #18935 )
...
This reverts commit d4d71985d5.
2021-09-27 21:52:13 -07:00
architkulkarni
d4d71985d5
[Serve] [doc] Improve runtime env doc ( #18782 )
2021-09-27 16:12:03 -05:00
Chen Shen
35aa944ef4
Fix thread-safety in global state accessor ( #18746 )
2021-09-19 12:01:31 -07:00
mwtian
efdbfcfdfb
[Build] Generate Bazel config for compiling with clang and libc++ in CI ( #18622 )
...
* Add Bazel config for building with llvm. Upgrade C++ std to 17.
* Fix redis. Try fixing asan and tsan
* Fix asan and format
* Update comments.
Co-authored-by: Chen Shen <scv119@gmail.com>
2021-09-17 19:01:07 -07:00
Sven Mika
8a72824c63
[RLlib Testing] Split and unflake more CI tests (make sure all jobs are < 30min). ( #18591 )
2021-09-15 22:16:48 +02:00
Edward Oakes
7736cdd91d
[dashboard] Rename "new_dashboard" -> "dashboard" ( #18214 )
2021-09-15 11:17:15 -05:00
Simon Mo
497c5f56fa
[CI] Temporary disable worker-in-container test ( #18606 )
...
* revert again
* disable tmp
2021-09-14 22:38:20 -07:00
mwtian
a3f399ef10
[Client] fix propagating errors to async calls during disconnect, and other cleanup ( #18539 )
...
* cleanup tests and errors for clients
* Fix lock and async get
* rerun
* Avoid running callback under lock. Make lock non-reentrant
* Add all necessary apis
* Removed unused APIs
2021-09-14 18:48:27 +03:00
Yi Cheng
7d1f408de9
[workflow] Move experimental/workflow to workflow ( #18521 )
2021-09-13 17:45:18 -07:00
Chen Shen
5f57079041
use clang for C++ debug testing ( #18343 )
2021-09-09 15:48:36 -07:00
Simon Mo
a29da81cfc
Revert "Revert "Fix tracing bug when actors are defined before connecting to …" ( #16122 )
2021-09-07 16:19:49 -07:00
matthewdeng
a3123b6860
[SGD] v2 Horovod backend ( #18047 )
...
* [SGD] add Horovod backend
* address comments: set CUDA_VISIBLE_DEVICES, refactor code
* fix gpu test
* fix lint/test import
* address comments, add example cluster config
* delay horovod imports
2021-08-31 12:54:59 -07:00
Kai Fricke
a8dbc44f9a
[ci] minimal dependency install test ( #18071 )
2021-08-31 15:26:25 +02:00
Sven Mika
599e589481
[RLlib] Move existing fake multi-GPU learning tests into separate buildkite job. ( #18065 )
2021-08-31 14:56:53 +02:00
Antoni Baum
5be6bda4cf
[tests] Add Ludwig CI test ( #18126 )
2021-08-30 12:27:39 -07:00
Amog Kamsetty
3b77840c1b
PyTorch Lightning Updates ( #17876 )
2021-08-27 23:15:51 -07:00
Chen Shen
7e3e0d1535
[Test] Add C++ tsan test ( #17875 )
2021-08-24 00:57:32 -07:00
Chen Shen
880797d5c2
[Core][Test] Add ubsan support for C++ tests ( #17812 )
...
* support ubsan
* update
2021-08-17 10:22:03 -07:00
SangBin Cho
4971e13941
[Build] Asan wheel test ( #17685 )
...
* in progerss
* ASAN tests.
* d
* in progress
* in progress without the asan wheel
* Support the asan wheel.
* Support the asan wheels
* Not build a binary for asan
* Fix issues
* Remove a wrong build
* Separate out asan wheel build
* Try preparing more deps.
* ip
* Try different version
* done
* d
* Trial
* Another try
* Another try
* skip cpp build to see what happens
* add more des
* ip
* abc
* Try next
* completed
* try
* Try without static libasan
* dbg
* Try static link
* Fix issues
* abc
2021-08-17 10:21:41 -07:00
Sven Mika
f3bbe4ea44
[RLlib] Test cases/BUILD cleanup; split "everything else" (longest running one rn) tests in 2. ( #17640 )
2021-08-16 22:01:01 +02:00
Clark Zinzow
d6eeb5dc70
[Datasets] Add local and S3 filesystem test coverage for file-based datasources. ( #17158 )
2021-08-12 08:39:31 -07:00
Chen Shen
0fd3f761b9
[ci][rfc] build debug wheels and run python test on debug build ( #17399 )
...
* enable debug mode
* add
* :upload debug wheels
* upload debug wheels
* add
* fix bug
* add dbg
* Update python/setup.py
Co-authored-by: Simon Mo <simon.mo@hey.com>
* skip windows
Co-authored-by: Simon Mo <simon.mo@hey.com>
2021-08-05 17:58:19 -07:00
Eric Liang
d4f9d3620e
Move ray.data out of experimental ( #17560 )
2021-08-04 13:31:10 -07:00
Sven Mika
5231fdd996
[Testing] Split RLlib example scripts CI tests into 4 jobs (from 2). ( #17331 )
2021-07-26 10:52:55 -04:00
matthewdeng
fdbeef6046
[SGD] RaySGD v2 skeleton code ( #17300 )
...
* [SGD] RaySGD v2 skeleton code
* add build file
* move file
* empty
* rename
* address comments
* add method interfaces
* move BUILD file out of tests dir
Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com>
2021-07-25 17:39:24 -07:00
mwtian
b8e71f641c
[Build] Ray Docker image for Python 3.9. ( #16571 )
2021-07-22 13:38:57 -07:00
Amog Kamsetty
5c589debfa
Revert "Set runs_per_test_detect_flakes for core tests on master ( #16863 )" ( #16936 )
...
This reverts commit 44042519af.
2021-07-07 10:25:46 -07:00
Eric Liang
44042519af
Set runs_per_test_detect_flakes for core tests on master ( #16863 )
2021-07-06 18:46:48 -07:00
Clark Zinzow
52da2cce68
[Dataset] Adds JSON, CSV, Pandas, and Dask IO layers, and adds the write side of the Parquet IO layer. ( #16724 )
2021-07-01 11:57:40 -07:00
chenk008
06c7db7dca
[Core] Rename container option and ray-nest-container ( #16771 )
...
* rename container_option to container
* rename ray-nest-container to ray-worker-container
* lint
Co-authored-by: wuhua.ck <wuhua.ck@alibaba-inc.com>
2021-07-01 13:12:26 +08:00
Amog Kamsetty
69507f53db
[Horovod] Add Horovod example ( #16742 )
...
* wip
* updates
* updates
* update
* formatting
* updates
* updates
* update
* fix
* add timeout
2021-06-29 19:15:15 -07:00
chenk008
c318293d9f
[Core] start worker in container ( #16671 )
2021-06-29 10:12:47 -07:00
Eric Liang
6bfa97eed7
Check in the first iteration of an Arrow-based dataset api ( #16648 )
2021-06-25 18:45:13 -07:00
Amog Kamsetty
e6d9f0b393
[Dask] Support Dask 2021.06.1 ( #16547 )
2021-06-19 18:22:23 -07:00
Antoni Baum
ec7d7c8630
[Tune] Add soft imports test ( #16450 )
2021-06-15 18:50:21 -07:00
architkulkarni
412085dea7
[Runtime Env] filter out post wheel tests from doc tests ( #16439 )
2021-06-15 15:34:45 -07:00
Amog Kamsetty
f9936c4252
[Dask] Dask Example Tests ( #16346 )
...
* add examples
* update dask docs
* add build file
* formatting
* fix ci command
* fix
* Update python/ray/util/dask/BUILD
* newline
* fix pytest fixtures
* fixes
* formatting
* fix shuffle example
2021-06-12 20:25:45 -07:00
Chris K. W
3fa9f2e5d6
[Modin] Add tests for modin ( #16260 )
...
Adds modin tests that run with and without ray client.
2021-06-11 12:23:33 -07:00