Commit graph

177 commits

Author SHA1 Message Date
SangBin Cho
0ef0d9a77d
Revert "[core] Assign tasks to the first available worker (#18167)" (#19180)
This reverts commit 545db13800.
2021-10-07 10:38:37 -07:00
Stephanie Wang
545db13800
[core] Assign tasks to the first available worker (#18167)
* Convert worker pool to queue

* Start up to backlog size more workers

* fixes

* Prestart workers according to num available CPUs

* lint

* x

* Update src/ray/raylet/worker_pool.h

Co-authored-by: Eric Liang <ekhliang@gmail.com>

* Update src/ray/raylet/worker_pool.h

Co-authored-by: Eric Liang <ekhliang@gmail.com>

* dedicated workers

* Fix tests

* x

* fix

* asan

* asan

* Workers can only exec tasks with same job ID

* size_t for runtime env hash, fix unit tests

* include job ID in runtime env hash, remove from worker registration msg

* x

* conflict

* debug

* Schedule and dispatch periodically, skip if no new tasks

* Update src/ray/common/task/task_spec.h

Co-authored-by: Eric Liang <ekhliang@gmail.com>

* Update src/ray/raylet/scheduling/cluster_task_manager.h

Co-authored-by: Eric Liang <ekhliang@gmail.com>

* Update src/ray/raylet/worker_pool.h

Co-authored-by: Eric Liang <ekhliang@gmail.com>

Co-authored-by: Eric Liang <ekhliang@gmail.com>
2021-10-05 13:45:50 -07:00
Kai Fricke
3dc176c42e
[ci/tune] Add SGD and Tune GPU pipeline step to CI (#18469)
* [ci/tune] Add Tune GPU pipeline step to CI

* cont.

* add sgd gpu tests

* format yaml, fix imports

* install horovod; fix line wrapping

* set GPU per worker to 0.5

* fix import

* move test to 4gpu machine

* fix lint

* lint

* set visible devices

* pull in tf gpu fix

* Fix Tune GPU pipeline step

* nit

* Disable GPU tests until we have some

* Re-add empty rllib tests

Co-authored-by: Matthew Deng <matthew.j.deng@gmail.com>
2021-10-01 18:34:05 -07:00
architkulkarni
0f0b161ea1
Revert "Revert "[Serve] [doc] Improve runtime env doc"" (#18943)
* Revert "Revert "[Serve] [doc] Improve runtime env doc (#18782)" (#18935)"

This reverts commit e4f4c79252.
2021-09-30 13:28:44 -05:00
Yi Cheng
e4f4c79252
Revert "[Serve] [doc] Improve runtime env doc (#18782)" (#18935)
This reverts commit d4d71985d5.
2021-09-27 21:52:13 -07:00
architkulkarni
d4d71985d5
[Serve] [doc] Improve runtime env doc (#18782)
2021-09-27 16:12:03 -05:00
Chen Shen
35aa944ef4
Fix thread-safety in global state accessor (#18746)
2021-09-19 12:01:31 -07:00
mwtian
efdbfcfdfb
[Build] Generate Bazel config for compiling with clang and libc++ in CI (#18622)
* Add Bazel config for building with llvm. Upgrade C++ std to 17.

* Fix redis. Try fixing asan and tsan

* Fix asan and format

* Update comments.

Co-authored-by: Chen Shen <scv119@gmail.com>
2021-09-17 19:01:07 -07:00
Sven Mika
8a72824c63
[RLlib Testig] Split and unflake more CI tests (make sure all jobs are < 30min). (#18591)
2021-09-15 22:16:48 +02:00
Edward Oakes
7736cdd91d
[dashboard] Rename "new_dashboard" -> "dashboard" (#18214)
2021-09-15 11:17:15 -05:00
Simon Mo
497c5f56fa
[CI] Temporary disable worker-in-container test (#18606)
* revert again

* disable tmp
2021-09-14 22:38:20 -07:00
mwtian
a3f399ef10
[Client] fix propagating errors to async calls during disconnect, and other cleanup (#18539)
* cleanup tests and errors for clients

* Fix lock and async get

* rerun

* Avoid running callback under lock. Make lock non-reentrant

* Add all necessary apis

* Removed unused APIs
2021-09-14 18:48:27 +03:00
Yi Cheng
7d1f408de9
[workflow] Move experimental/workflow to workflow (#18521)
2021-09-13 17:45:18 -07:00
Chen Shen
5f57079041
use clang for C++ debug testing (#18343)
2021-09-09 15:48:36 -07:00
Simon Mo
a29da81cfc
Revert "Revert "Fix tracing bug when actors are defined before connecting to …" (#16122)
2021-09-07 16:19:49 -07:00
matthewdeng
a3123b6860
[SGD] v2 Horovod backend (#18047)
* [SGD] add Horovod backend

* address comments: set CUDA_VISIBLE_DEVICES, refactor code

* fix gpu test

* fix lint/test import

* address comments, add example cluster config

* delay horovod imports
2021-08-31 12:54:59 -07:00
Kai Fricke
a8dbc44f9a
[ci] minimal dependency install test (#18071)
2021-08-31 15:26:25 +02:00
Sven Mika
599e589481
[RLlib] Move existing fake multi-GPU learning tests into separate buildkite job. (#18065)
2021-08-31 14:56:53 +02:00
Antoni Baum
5be6bda4cf
[tests] Add Ludwig CI test (#18126)
2021-08-30 12:27:39 -07:00
Amog Kamsetty
3b77840c1b
PyTorch Lightning Updates (#17876)
2021-08-27 23:15:51 -07:00
Chen Shen
7e3e0d1535
[Test] Add C++ tsan test (#17875)
2021-08-24 00:57:32 -07:00
Chen Shen
880797d5c2
[Core][Test] Add ubsan support for C++ tests (#17812)
* support ubsan

* update
2021-08-17 10:22:03 -07:00
SangBin Cho
4971e13941
[Build] Asan wheel test (#17685)
* in progerss

* ASAN tests.

* d

* in progress

* in progress without the asan wheel

* Support the asan wheel.

* Support the asan wheels

* Not build a binary for asan

* Fix issues

* Remove a wrong build

* Separate out asan wheel build

* Try preparing more deps.

* ip

* Try different version

* done

* d

* Trial

* Another try

* Another try

* skip cpp build to see what happens

* add more des

* ip

* abc

* Try next

* completed

* try

* Try without static libasan

* dbg

* Try static link

* Fix issues

* abc
2021-08-17 10:21:41 -07:00
Sven Mika
f3bbe4ea44
[RLlib] Test cases/BUILD cleanup; split "everything else" (longest running one rn) tests in 2. (#17640)
2021-08-16 22:01:01 +02:00
Clark Zinzow
d6eeb5dc70
[Datasets] Add local and S3 filesystem test coverage for file-based datasources. (#17158)
2021-08-12 08:39:31 -07:00
Chen Shen
0fd3f761b9
[ci][rfc] build debug wheels and run python test on debug build (#17399)
* enable debug mode

* add

* :upload debug wheels

* upload debug wheels

* add

* fix bug

* add dbg

* Update python/setup.py

Co-authored-by: Simon Mo <simon.mo@hey.com>

* skip windows

Co-authored-by: Simon Mo <simon.mo@hey.com>
2021-08-05 17:58:19 -07:00
Eric Liang
d4f9d3620e
Move ray.data out of experimental (#17560)
2021-08-04 13:31:10 -07:00
Sven Mika
5231fdd996
[Testing] Split RLlib example scripts CI tests into 4 jobs (from 2). (#17331)
2021-07-26 10:52:55 -04:00
matthewdeng
fdbeef6046
[SGD] RaySGD v2 skeleton code (#17300)
* [SGD] RaySGD v2 skeleton code

* add build file

* move file

* empty

* rename

* address comments

* add method interfaces

* move BUILD file out of tests dir

Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com>
2021-07-25 17:39:24 -07:00
mwtian
b8e71f641c
[Build] Ray Docker image for Python 3.9. (#16571)
2021-07-22 13:38:57 -07:00
Amog Kamsetty
5c589debfa
Revert "Set runs_per_test_detect_flakes for core tests on master (#16863)" (#16936)
This reverts commit 44042519af.
2021-07-07 10:25:46 -07:00
Eric Liang
44042519af
Set runs_per_test_detect_flakes for core tests on master (#16863)
2021-07-06 18:46:48 -07:00
Clark Zinzow
52da2cce68
[Dataset] Adds JSON, CSV, Pandas, and Dask IO layers, and adds the write side of the Parquet IO layer. (#16724)
2021-07-01 11:57:40 -07:00
chenk008
06c7db7dca
[Core] Rename container option and ray-nest-container (#16771)
* rename container_option to container

* rename ray-nest-container to ray-worker-container

* lint

Co-authored-by: wuhua.ck <wuhua.ck@alibaba-inc.com>
2021-07-01 13:12:26 +08:00
Amog Kamsetty
69507f53db
[Horovod] Add Horovod example (#16742)
* wip

* updates

* updates

* update

* formatting

* updates

* updates

* update

* fix

* add timeout
2021-06-29 19:15:15 -07:00
chenk008
c318293d9f
[Core] start worker in container (#16671)
2021-06-29 10:12:47 -07:00
Eric Liang
6bfa97eed7
Check in the first iteration of an Arrow-based dataset api (#16648)
2021-06-25 18:45:13 -07:00
Amog Kamsetty
e6d9f0b393
[Dask] Support Dask 2021.06.1 (#16547)
2021-06-19 18:22:23 -07:00
Antoni Baum
ec7d7c8630
[Tune] Add soft imports test (#16450)
2021-06-15 18:50:21 -07:00
architkulkarni
412085dea7
[Runtime Env] filter out post wheel tests from doc tests (#16439)
2021-06-15 15:34:45 -07:00
Amog Kamsetty
f9936c4252
[Dask] Dask Example Tests (#16346)
* add examples

* update dask docs

* add build file

* formatting

* fix ci command

* fix

* Update python/ray/util/dask/BUILD

* newline

* fix pytest fixtures

* fixes

* formatting

* fix shuffle example
2021-06-12 20:25:45 -07:00
Chris K. W
3fa9f2e5d6
[Modin] Add tests for modin (#16260)
Adds modin tests that run with and without ray client.
2021-06-11 12:23:33 -07:00
architkulkarni
7d029f8e71
[Doc] [Core] [runtime env] Add runtime env doc (#16290)
2021-06-09 20:02:16 -05:00
Amog Kamsetty
de4045703d
[SGD] Fix SGD Client CI (#16301)
2021-06-08 10:08:14 -07:00
Siyuan (Ryans) Zhuang
480e5e822e
Inital workflow API implementation (#16174)
2021-06-07 10:00:15 -07:00
Simon Mo
d6b3050632
[Buildkite] Wheels and Docker fixup (#16241)
2021-06-04 00:48:12 -07:00
Simon Mo
ab93a3a64a
[CI] Buildkite upload wheels to S3 and docker (#16138)
2021-06-03 20:10:31 -07:00
Eric Liang
43c97c2afb
Disable timeline events collection in Ray by default (#15989)
2021-06-02 18:04:29 -07:00
Amog Kamsetty
65f1d67e9c
[SGD] Ray Client Support and tests (#16111)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-06-01 13:21:26 -07:00
Amog Kamsetty
cfa2997b86
[XGBoost] Add test with Ray Client (#16103)
2021-05-28 16:13:06 -07:00