Commit graph

172 commits

Author SHA1 Message Date
xwjiang2010
c48d86e469
[CI] change git protocol to use https. (#19964) 2021-11-01 19:38:58 -07:00
mwtian
7afdfdc6dd
[CI] narrow down tests that run when files change (#19656) 2021-10-29 16:47:54 -07:00
matthewdeng
bfb0ef1b08
move jsonschema to core dependencies and update default AutoscalerPrometheusMetrics (#19831) 2021-10-28 13:04:22 -07:00
Simon Mo
5e927b01ad
Revert "[CI] Remove config that disables Bazel test result cache" (#19818)
* Revert "[CI] Remove config that disables Bazel test result cache (#18701)"

This reverts commit 098ff36faa.

* Remove all RLlib tests from BUILD that currently fail.

Co-authored-by: sven1977 <svenmika1977@gmail.com>
2021-10-28 15:54:53 +02:00
Amog Kamsetty
db863aafc0
Revert "Revert "[Docker] Support multiple CUDA Versions (#19505)" (#19756)" (#19763)
This reverts commit e58fcca404.
2021-10-26 17:32:56 -07:00
Amog Kamsetty
e58fcca404
Revert "[Docker] Support multiple CUDA Versions (#19505)" (#19756)
This reverts commit f0053d405b.
2021-10-26 12:55:20 -07:00
Avnish Narayan
ad87ddf93e
[rllib] Add deterministic test to gpu (#19306)
Co-authored-by: sven1977 <svenmika1977@gmail.com>
2021-10-26 10:11:39 -07:00
Amog Kamsetty
f0053d405b
[Docker] Support multiple CUDA Versions (#19505)
* wip

* wip

* update

* finish

* deprecate

* debug

* fix and address comments

* try catch

* fix

* split tests

* force

* merge

* docs

* wip

* fix and check

* update readme

* fix

* fix

* fix sanity checking

* format
2021-10-25 18:57:05 -07:00
Jiajun Yao
256bf0bf3a
[Release] Bump up dask to latest compatible version 2021.9.1 (#19592)
* Bump up dask to latest compatible version 2021.9.1

* Bump up dask to latest compatible version 2021.9.1
2021-10-22 09:16:28 -07:00
Simon Mo
03805d4064
[Serve] Good error message when Serve not installed and ensure Serve installs ray[default] (#19570) 2021-10-21 13:47:29 -07:00
mwtian
098ff36faa
[CI] Remove config that disables Bazel test result cache (#18701) 2021-10-19 13:31:42 -07:00
architkulkarni
b8941338d3
[runtime env] Raise error when creating runtime env when ray[default] is not installed (#19491) 2021-10-19 09:16:04 -05:00
matthewdeng
4674c78050
[Train] Rename Ray SGD v2 to Ray Train (#19436) 2021-10-18 22:27:46 -07:00
Antoni Baum
e9df253f5d
[CI/docs] Remove [default] from xgboost-ray (#19186)
Co-authored-by: Kai Fricke <kai@anyscale.com>
2021-10-14 16:29:55 +01:00
Kai Fricke
d8d8901192
[ci/tune] Remove deprecated jenkins_only tag from test tags (#19287) 2021-10-12 10:05:46 +01:00
Matti Picus
9ca34c7192
add dependencies to BUILD.bazel and update windows bazel to 4.2.1 (#19132)
* add dependencies to BUILD.bazel and update windows bazel to 4.2.1

* fixes from review
2021-10-11 10:25:19 -07:00
SangBin Cho
0ef0d9a77d
Revert "[core] Assign tasks to the first available worker (#18167)" (#19180)
This reverts commit 545db13800.
2021-10-07 10:38:37 -07:00
Stephanie Wang
545db13800
[core] Assign tasks to the first available worker (#18167)
* Convert worker pool to queue

* Start up to backlog size more workers

* fixes

* Prestart workers according to num available CPUs

* lint

* x

* Update src/ray/raylet/worker_pool.h

Co-authored-by: Eric Liang <ekhliang@gmail.com>

* Update src/ray/raylet/worker_pool.h

Co-authored-by: Eric Liang <ekhliang@gmail.com>

* dedicated workers

* Fix tests

* x

* fix

* asan

* asan

* Workers can only exec tasks with same job ID

* size_t for runtime env hash, fix unit tests

* include job ID in runtime env hash, remove from worker registration msg

* x

* conflict

* debug

* Schedule and dispatch periodically, skip if no new tasks

* Update src/ray/common/task/task_spec.h

Co-authored-by: Eric Liang <ekhliang@gmail.com>

* Update src/ray/raylet/scheduling/cluster_task_manager.h

Co-authored-by: Eric Liang <ekhliang@gmail.com>

* Update src/ray/raylet/worker_pool.h

Co-authored-by: Eric Liang <ekhliang@gmail.com>

Co-authored-by: Eric Liang <ekhliang@gmail.com>
2021-10-05 13:45:50 -07:00
Kai Fricke
3dc176c42e
[ci/tune] Add SGD and Tune GPU pipeline step to CI (#18469)
* [ci/tune] Add Tune GPU pipeline step to CI

* cont.

* add sgd gpu tests

* format yaml, fix imports

* install horovod; fix line wrapping

* set GPU per worker to 0.5

* fix import

* move test to 4gpu machine

* fix lint

* lint

* set visible devices

* pull in tf gpu fix

* Fix Tune GPU pipeline step

* nit

* Disable GPU tests until we have some

* Re-add empty rllib tests

Co-authored-by: Matthew Deng <matthew.j.deng@gmail.com>
2021-10-01 18:34:05 -07:00
architkulkarni
0f0b161ea1
Revert "Revert "[Serve] [doc] Improve runtime env doc"" (#18943)
* Revert "Revert "[Serve] [doc] Improve runtime env doc (#18782)" (#18935)"

This reverts commit e4f4c79252.
2021-09-30 13:28:44 -05:00
Yi Cheng
e4f4c79252
Revert "[Serve] [doc] Improve runtime env doc (#18782)" (#18935)
This reverts commit d4d71985d5.
2021-09-27 21:52:13 -07:00
architkulkarni
d4d71985d5
[Serve] [doc] Improve runtime env doc (#18782) 2021-09-27 16:12:03 -05:00
mwtian
43ac18bbc0
[Build] include minimal debug info in C++ build; upgrade clang-format to 12 (#18888)
* Revert "Revert "[Build] include minimal debug info in C++ build; upgrade clang-format to 12 (#18840)" (#18886)"

This reverts commit f851a072f3.

* use gcc 8
2021-09-24 17:59:05 -07:00
Chen Shen
f851a072f3
Revert "[Build] include minimal debug info in C++ build; upgrade clang-format to 12 (#18840)" (#18886)
This reverts commit 07e1366383.
2021-09-24 12:55:08 -07:00
mwtian
07e1366383
[Build] include minimal debug info in C++ build; upgrade clang-format to 12 (#18840)
* debug info and clang-format

* doc

* fix

* no clang-format on all files

* gcc

* keep gcc 7
2021-09-24 12:26:33 -07:00
Chen Shen
35aa944ef4
Fix thread-safety in global state accessor (#18746) 2021-09-19 12:01:31 -07:00
mwtian
efdbfcfdfb
[Build] Generate Bazel config for compiling with clang and libc++ in CI (#18622)
* Add Bazel config for building with llvm. Upgrade C++ std to 17.

* Fix redis. Try fixing asan and tsan

* Fix asan and format

* Update comments.

Co-authored-by: Chen Shen <scv119@gmail.com>
2021-09-17 19:01:07 -07:00
Sven Mika
8a72824c63
[RLlib Testing] Split and unflake more CI tests (make sure all jobs are < 30min). (#18591) 2021-09-15 22:16:48 +02:00
Antoni Baum
7e95f330d5
[ci] Fix xgboost_ray install from git (#18640) 2021-09-15 18:07:15 +01:00
Edward Oakes
7736cdd91d
[dashboard] Rename "new_dashboard" -> "dashboard" (#18214) 2021-09-15 11:17:15 -05:00
Antoni Baum
eeb67a42cc
pip install xgboost_ray -> xgboost_ray[default] (#18607)
Co-authored-by: Kai Fricke <kai@anyscale.com>
2021-09-15 14:45:56 +01:00
Simon Mo
497c5f56fa
[CI] Temporarily disable worker-in-container test (#18606)
* revert again

* disable tmp
2021-09-14 22:38:20 -07:00
SangBin Cho
0684531e22
[Test] Break down placement group tests (#18612) 2021-09-14 21:55:18 -07:00
mwtian
a3f399ef10
[Client] fix propagating errors to async calls during disconnect, and other cleanup (#18539)
* cleanup tests and errors for clients

* Fix lock and async get

* rerun

* Avoid running callback under lock. Make lock non-reentrant

* Add all necessary apis

* Removed unused APIs
2021-09-14 18:48:27 +03:00
Yi Cheng
7d1f408de9
[workflow] Move experimental/workflow to workflow (#18521) 2021-09-13 17:45:18 -07:00
Chen Shen
5f57079041
use clang for C++ debug testing (#18343) 2021-09-09 15:48:36 -07:00
mwtian
26fd10c9e8
[CI] Add clang-tidy to lint (#18124)
* clang-tidy

* fix

* fix script

* test clang compiler

* fix clang-tidy rules

* Fix windows and other issues.

* Fix

* Improve information when running check-git-clang-tidy-output.sh on different OS
2021-09-09 00:41:53 -07:00
Simon Mo
a29da81cfc
Revert "Revert "Fix tracing bug when actors are defined before connecting to …" (#16122) 2021-09-07 16:19:49 -07:00
ellimac54
772d25cc38
Add Initial Windows Dockerfile (#17474) 2021-09-03 11:41:06 -07:00
Kai Fricke
fb38d06cfb
Move RLLib GPU release test dependencies to ml docker (#18208) 2021-09-03 09:35:18 +01:00
matthewdeng
a3123b6860
[SGD] v2 Horovod backend (#18047)
* [SGD] add Horovod backend

* address comments: set CUDA_VISIBLE_DEVICES, refactor code

* fix gpu test

* fix lint/test import

* address comments, add example cluster config

* delay horovod imports
2021-08-31 12:54:59 -07:00
Kai Fricke
a8dbc44f9a
[ci] minimal dependency install test (#18071) 2021-08-31 15:26:25 +02:00
Sven Mika
599e589481
[RLlib] Move existing fake multi-GPU learning tests into separate buildkite job. (#18065) 2021-08-31 14:56:53 +02:00
Kai Fricke
012f9eb687
[buildkite] Fix jar upload directory (#18253) 2021-08-31 11:18:34 +02:00
Simon Mo
2e0b816d64
[Buildkite] Upload jars to os specific dir (#18229) 2021-08-31 09:32:01 +02:00
Antoni Baum
5be6bda4cf
[tests] Add Ludwig CI test (#18126) 2021-08-30 12:27:39 -07:00
Amog Kamsetty
3b77840c1b
PyTorch Lightning Updates (#17876) 2021-08-27 23:15:51 -07:00
Chen Shen
7e3e0d1535
[Test] Add C++ tsan test (#17875) 2021-08-24 00:57:32 -07:00
Kai Fricke
d058f98546
[RLlib] Add GPU tests to CI (run per-PR). (#17891)
Co-authored-by: simon-mo <simon.mo@hey.com>
2021-08-24 09:20:45 +02:00
Chen Shen
0f894e9cbd
revert ebs cold start (#18010) 2021-08-23 13:40:31 -07:00