Sven Mika
|
2357bbc0c8
|
[RLlib] Issue 18231: Better (earlier) env validation and error message improvement. (#18249)
|
2021-09-02 09:28:16 +02:00 |
|
gjoliver
|
6621bb5611
|
[RLlib] Minor renaming and cleanups related to last rollout worker seed fix. (#18155)
|
2021-09-02 06:57:46 +02:00 |
|
xwjiang2010
|
9fa7951171
|
[core] Log once when get_gpu_ids is called on driver. (#18282)
|
2021-09-01 16:47:00 -07:00 |
|
Stephanie Wang
|
d43d297d9a
|
[core] Attach call site to ObjectRefs, print on error (#17971)
* Attach call site to ObjectRef
* flag
* Fix build
* build
* build
* build
* x
* x
* skip on windows
* lint
|
2021-09-01 15:29:05 -07:00 |
|
Yi Cheng
|
d470e679df
|
[core] Add some mock headers for ray core (#18265)
* up
* up
* up
* format
* up
* up
* format
|
2021-09-01 13:04:35 -07:00 |
|
Chris K. W
|
1a10108765
|
[core] Release function actor lock while waiting for actor class to be loaded by import thread (#18175)
|
2021-09-01 12:59:48 -07:00 |
|
Sven Mika
|
a7670d9fab
|
[RLlib; Testing] Fix smoke-test settings for nightly learning_tests and stress_test ; Add pybullet_envs to app-config. (#18274)
|
2021-09-01 21:46:06 +02:00 |
|
Amog Kamsetty
|
9c2e7ffd97
|
[SGD] v2 Fault Tolerance (#18090)
* wip
* wip
* wip
* wip
* update
* finish
* remove
* fix
* update
* update
* update comment
* handle backend failures
* bump test timeout
* address comments
* fix
* fix
* address comments
* formatting
* add comment
* address comment
* fix failing test
* update error message
* Update python/ray/util/sgd/v2/trainer.py
* wip
* fix failing test
* formatting
* fix
|
2021-09-01 12:43:10 -07:00 |
|
Edward Oakes
|
0326bbb30a
|
[serve] Skip test_standalone namespace test on windows (#18277)
|
2021-09-01 12:58:59 -05:00 |
|
Jiajun Yao
|
fbb3ac6a86
|
Retry application-level errors (#18176)
* Retry application-level errors
* Retry application-level errors
* Push retry message to the driver
|
2021-09-01 10:53:06 -07:00 |
|
Edward Oakes
|
673bf35c1f
|
Refactor BackendState to be per-backend instead of global (#18255)
|
2021-09-01 09:46:22 -05:00 |
|
mwtian
|
be50c13251
|
[Client] Use a single RPC to fetch ClientObjectRefs passed in a list (#16944)
|
2021-08-31 16:31:13 -07:00 |
|
Edward Oakes
|
5d122cf7b7
|
[runtime_env] Move working dir setup to the agent (#18170)
|
2021-08-31 17:22:49 -05:00 |
|
Guyang Song
|
be772df4dc
|
[Event] Add some error level events (#18118)
* add event 'RAY_WORKER_FAILURE' and 'RAY_DRIVER_FAILURE'
* add some events
* move event 'EL_RAY_NODE_REMOVED' to 'RemoveNode()'
|
2021-08-31 14:15:13 -07:00 |
|
Sven Mika
|
82465f9342
|
[RLlib] Better PolicyServer example (w/ or w/o tune) and add printing out actual listen port address in log-level=INFO. (#18254)
|
2021-08-31 22:03:23 +02:00 |
|
matthewdeng
|
a3123b6860
|
[SGD] v2 Horovod backend (#18047)
* [SGD] add Horovod backend
* address comments: set CUDA_VISIBLE_DEVICES, refactor code
* fix gpu test
* fix lint/test import
* address comments, add example cluster config
* delay horovod imports
|
2021-08-31 12:54:59 -07:00 |
|
Wesley Gifford
|
6133a561e9
|
Dataset from modin (#18122)
|
2021-08-31 11:19:35 -07:00 |
|
Nikita Vemuri
|
c5b99ab590
|
[serve] Start RayInternalKVStore in controller namespace (#18164)
|
2021-08-31 13:09:33 -05:00 |
|
Edward Oakes
|
17dded543c
|
Support passing gcs_client to internal_kv (#18235)
|
2021-08-31 12:46:41 -05:00 |
|
xwjiang2010
|
63f00843f3
|
[Tune] Inform users of the setup needed for uploading results to cloud. (#18220)
|
2021-08-31 10:27:50 -07:00 |
|
mwtian
|
134ac0ef55
|
[CI] Fix clang-format to always compare against master (#18140)
|
2021-08-31 10:16:33 -07:00 |
|
SangBin Cho
|
34026a7bd5
|
Change instance type for some tests (#18248)
|
2021-08-31 10:10:46 -07:00 |
|
SangBin Cho
|
d240d26525
|
[Object Spilling] Fix a bug where object url is empty. (#18193)
* Fix a bug
* Addressed code review.
* Fix a test
|
2021-08-31 10:10:28 -07:00 |
|
Antoni Baum
|
2c0dcec18f
|
[test] Fix golden notebook tests always failing (#17873)
|
2021-08-31 17:07:47 +02:00 |
|
Ryan L. Melvin
|
c081c68de7
|
[tune] Conditional search space example using hyperopt (#18130)
Co-authored-by: Ryan Melvin <rmelvin@uabmc.edu>
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
|
2021-08-31 17:06:22 +02:00 |
|
Kai Fricke
|
a8dbc44f9a
|
[ci] minimal dependency install test (#18071)
|
2021-08-31 15:26:25 +02:00 |
|
Sven Mika
|
599e589481
|
[RLlib] Move existing fake multi-GPU learning tests into separate buildkite job. (#18065)
|
2021-08-31 14:56:53 +02:00 |
|
Sven Mika
|
4888d7c9af
|
[RLlib] Replay buffers: Add config option to store contents in checkpoints. (#17999)
|
2021-08-31 12:21:49 +02:00 |
|
Kai Fricke
|
012f9eb687
|
[buildkite] Fix jar upload directory (#18253)
|
2021-08-31 11:18:34 +02:00 |
|
Simon Mo
|
2e0b816d64
|
[Buildkite] Upload jars to os specific dir (#18229)
|
2021-08-31 09:32:01 +02:00 |
|
SangBin Cho
|
eab506cc37
|
[Test] Disable non streaming shuffle 5000 partitions (#18224)
* Disable non streaming shuffle 5000 partitions
* increase timeout for 5000 partition shuffle
|
2021-08-31 00:28:15 -07:00 |
|
Chen Shen
|
5f3ec7634b
|
Fix off by one test bug (#18239)
|
2021-08-31 00:07:03 -07:00 |
|
Clark Zinzow
|
e154f87cab
|
Added split_at_indices to DatasetPipeline. (#18243)
|
2021-08-31 00:06:35 -07:00 |
|
Eric Liang
|
db9b5f142d
|
Disable worker logs temporarily during driver breakpoints (#18192)
|
2021-08-30 20:26:16 -07:00 |
|
Stephanie Wang
|
8e06db7280
|
Revert "[Core] revert: revert Unified worker starter (#18008)" (#18228)
This reverts commit b9978dd02b.
|
2021-08-30 17:28:41 -07:00 |
|
Tim Hopper
|
fd2a8a6b9c
|
[docs] Fix broken urls (#18206)
|
2021-08-30 17:24:06 -07:00 |
|
Yi Cheng
|
7a65815108
|
[workflow] Defer input preparation until run (#18225)
|
2021-08-30 16:37:34 -07:00 |
|
Antoni Baum
|
5be6bda4cf
|
[tests] Add Ludwig CI test (#18126)
|
2021-08-30 12:27:39 -07:00 |
|
SangBin Cho
|
2ee1b90c17
|
[Core] Batch obod location updates (#18016)
* Batch impl
* done
* Remove a client pool
* in progress
* Added unit tests.
* Handle owner failure case.
* Fix unit tests
* Addressed code review.
|
2021-08-30 11:04:08 -07:00 |
|
SangBin Cho
|
dfbad8668a
|
Support better infra failure detection + stable flag (#18202)
|
2021-08-30 10:51:03 -07:00 |
|
Eric Liang
|
1adce7da4e
|
Revert "Auto discover dashboard agent port (#17855)" (#18217)
This reverts commit 53ddb551d5.
|
2021-08-30 10:46:37 -07:00 |
|
Yi Cheng
|
f579822790
|
[workflow] Workflow inside virtual actor (#18066)
|
2021-08-30 10:40:22 -07:00 |
|
Alex Wu
|
ca86098680
|
Revert "[core] Refactor test_many_tasks (#18169)" (#18216)
This reverts commit eb6fd20d53.
|
2021-08-30 10:35:23 -07:00 |
|
Stephanie Wang
|
eb6fd20d53
|
[core] Refactor test_many_tasks (#18169)
* Improve test
test
* lint
|
2021-08-30 10:33:23 -07:00 |
|
Chen Shen
|
7631d042bb
|
[Test] increase timeout for object spilling test caused by EBS cold storage issue (#18200)
|
2021-08-30 00:28:26 -07:00 |
|
SangBin Cho
|
0e968c1e82
|
[Core] Reduce spilling threshold (#17910)
* Lower the threshold
* ip
* Handle test failure
* lint
* last fix
* .
* Retry
|
2021-08-30 00:09:35 -07:00 |
|
Zhi Lin
|
d3786ac131
|
Bump Java version to 2.0.0-SNAPSHOT (#15394)
* bump java version to 2.0.0-SNAPSHOT
* update
|
2021-08-30 12:25:30 +08:00 |
|
fyrestone
|
53ddb551d5
|
Auto discover dashboard agent port (#17855)
|
2021-08-30 12:06:28 +08:00 |
|
Stephanie Wang
|
7bc1ef0dd9
|
[core] Prestart workers up to available CPU limit (#18166)
* Prestart workers according to num available CPUs
* lint
* Prestart min(available CPU, backlog)
* Fix test, adjust policy
* debug
* retry
* lint
|
2021-08-29 14:11:53 -07:00 |
|
Yi Cheng
|
d5cd95364b
|
[workflow] Some usability issues fixing (#18133)
|
2021-08-28 16:51:00 -07:00 |
|