Eric Liang
8dded14798
Refactor LazyBlockList to simplify union of lists ( #19214 )
2021-10-07 22:07:52 -07:00
SangBin Cho
afaee05e1e
[Placement Group] Fix placement group removal leak ( #19138 )
2021-10-07 22:04:12 -07:00
Simon Mo
46e80348ad
[Serve] Make long poll wait for non-existent keys ( #19205 )
2021-10-07 19:10:22 -07:00
mwtian
9f066485a3
Tweak clang-tidy rules ( #19210 )
2021-10-07 18:53:18 -07:00
Kai Fricke
8d89e2d546
[tune] Prevent errors with retained trainables in global registry ( #19184 )
...
This PR fixes #19183 by introducing three improvements:
* String trainables are prefixed with Durable, e.g. DurablePPO
* Durable trainables cannot be wrapped twice with tune.durable()
* MRO resolution in _WrappedDurableTrainables indicates we already have a DurableTrainable, so we catch this with a try/except block
2021-10-07 17:17:01 -07:00
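The double-wrap guard described in the [tune] entry above ( #19184 ) can be illustrated with a minimal sketch. This is not Ray's implementation: `Trainable`, `DurableMarker`, and `make_durable` are hypothetical stand-ins, and the only behavior taken from the commit message is the "Durable" name prefix and the idea of catching the MRO failure with try/except. It relies on standard Python behavior: building a class whose bases cannot be linearized raises `TypeError`.

```python
class Trainable:
    """Stand-in base class (hypothetical, not Ray's Trainable)."""


class DurableMarker:
    """Hypothetical mixin that marks a class as already durable."""


def make_durable(trainable):
    """Return a durable variant of `trainable`, refusing to wrap twice."""
    if isinstance(trainable, str):
        # String trainables get a "Durable" prefix, e.g. "PPO" -> "DurablePPO".
        return "Durable" + trainable
    try:
        # Mixing the marker in front of a class that already inherits from it
        # makes the method resolution order inconsistent, so type() raises.
        return type("Durable" + trainable.__name__, (DurableMarker, trainable), {})
    except TypeError as exc:
        raise ValueError(f"{trainable.__name__} is already durable") from exc
```

Wrapping a plain `Trainable` subclass once succeeds; passing the result back into `make_durable` raises `ValueError` instead of silently producing a doubly wrapped class.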
Clark Zinzow
ca731d7c86
[Datasets] Fix API breakage in Datasets nightly test.
2021-10-07 15:07:19 -07:00
Sven Mika
c3e3fc7637
[RLlib] Issue 18280: A3C/IMPALA multi-agent not working. ( #19100 )
2021-10-07 23:57:53 +02:00
Sven Mika
fd438d5630
[RLlib] Issue 18104: Cannot set remote_worker_envs=True for non local-mode and MultiAgentEnv. ( #19133 )
2021-10-07 22:39:21 +02:00
Edward Oakes
454163912f
Revert "[serve] Delete kv store local path after unit tests ( #19165 )" ( #19188 )
...
This reverts commit b90af4dae5.
2021-10-07 14:26:18 -05:00
Edward Oakes
1fa81673bd
[runtime_env] Clean up validation logic ( #18984 )
...
Splits the runtime_env parsing/validation and overriding into two separate codepaths. Adds unit testing for both.
2021-10-07 14:24:41 -05:00
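The split described in the [runtime_env] entry above ( #18984 ) separates checking a runtime_env from merging one into another. A rough sketch of that separation follows. The function names `validate_runtime_env` and `override_runtime_env`, the set of accepted fields, and the "child replaces parent per field" merge rule are all assumptions made for illustration, not Ray's actual code.

```python
# Hypothetical schema: field name -> expected Python type.
ALLOWED_FIELDS = {"working_dir": str, "env_vars": dict, "pip": list}


def validate_runtime_env(raw):
    """Validation path: check field names and types, return a clean copy."""
    parsed = {}
    for key, value in raw.items():
        if key not in ALLOWED_FIELDS:
            raise ValueError(f"unknown runtime_env field: {key!r}")
        if not isinstance(value, ALLOWED_FIELDS[key]):
            raise TypeError(f"{key!r} must be {ALLOWED_FIELDS[key].__name__}")
        parsed[key] = value
    return parsed


def override_runtime_env(parent, child):
    """Override path: fields set in `child` replace those in `parent`.

    Assumes both inputs were already validated; no checks happen here.
    """
    merged = dict(parent)
    merged.update(child)
    return merged
```

Keeping the two paths apart means each can be unit tested on its own: validation with malformed inputs, overriding with already-valid dictionaries.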
Kai Fricke
45aad4ee9a
[tune] Add resume="AUTO" and enhance resume error messages ( #19181 )
2021-10-07 19:00:56 +01:00
Stephanie Wang
940f84cedb
[core] Remove unused plasma promotion path ( #19122 )
...
* remove unused
* lint
* lint
* lint
2021-10-07 10:55:50 -07:00
SangBin Cho
0ef0d9a77d
Revert "[core] Assign tasks to the first available worker ( #18167 )" ( #19180 )
...
This reverts commit 545db13800.
2021-10-07 10:38:37 -07:00
xwjiang2010
7ffd9cbed1
[Tune] Fix column width in doc. ( #19159 )
2021-10-07 18:16:21 +01:00
Antoni Baum
f1587c06fd
[tune] Ensure loc in progress reporter is filled ( #19182 )
2021-10-07 15:43:49 +01:00
Antoni Baum
27b8633198
[docs] Remove outdated note in Tune docs ( #19110 )
2021-10-07 15:42:11 +01:00
Edward Oakes
0f33aaf933
Revert "[Doc] Document existing runtime env's container support ( #19076 )" ( #19160 )
...
This reverts commit 4beba3f727.
2021-10-07 08:55:30 -05:00
Edward Oakes
b90af4dae5
[serve] Delete kv store local path after unit tests ( #19165 )
2021-10-07 08:55:22 -05:00
Jiajun Yao
5045f0a293
Use bash to run sanity_check_cpp.sh ( #19179 )
2021-10-07 13:34:25 +01:00
SangBin Cho
22f4ffed08
Disable cpu-only-nodes preferred scheduling that breaks placement groups. ( #19129 )
...
* Add a regression test for the short term
* done
* address code review
* lint
2021-10-07 05:34:04 -07:00
Kai Fricke
a8cf8c648c
[tune] track and print elapsed time in reporters ( #19139 )
2021-10-07 10:56:17 +01:00
Avnish Narayan
bbc64a7c3d
[RLlib] Pin Gym to 0.19 ( #19170 )
...
Gym appears to have cut a release, 0.21.
It isn't clear what changes were made
between 0.19/0.20 and 0.21, as there is
no change log available for the 0.21 release,
so for now we'll pin gym to 0.19 until we
can fully understand the breaking changes
in gym 0.21. I suspect some things have
just been removed from the regular gym installation
that rllib has previously relied on. Will address
later.
2021-10-07 07:59:02 +02:00
mwtian
fe413c3c5e
[Client] disable auto init for get_runtime_context() ( #19127 )
2021-10-06 20:20:47 -07:00
Eric Liang
86cbe3e833
[data] Add support for repeating and re-windowing a DatasetPipeline ( #19091 )
2021-10-06 20:13:43 -07:00
Chen Shen
1ed5f622c2
[Core] QuickExit CoreWorker when GetCoreWorker is called after shutdown
2021-10-06 15:07:57 -07:00
Edward Oakes
0f915820e1
[serve] Rename backend_worker -> replica ( #19150 )
2021-10-06 16:39:17 -05:00
Chris K. W
d1517c33ab
[client] deflake test_object_ref_cleanup ( #19153 )
2021-10-06 14:06:43 -07:00
Kai Fricke
9f77cd8d28
[tune] Deflake PBT Async test ( #19135 )
2021-10-06 12:24:22 -07:00
Edward Oakes
9316a9977f
[serve] Support kwargs to deployment constructor ( #19023 )
2021-10-06 14:16:23 -05:00
Frank Luan
77d0a08c38
[docker] Fix missing space in docker.py warning ( #19128 )
2021-10-06 12:09:26 -07:00
Ian Rodney
8cab8d3ae9
[Datasets] Clean Up docs around pipelining -> windowing rename ( #19142 )
2021-10-06 11:07:55 -07:00
Chris K. W
db1105fa83
[client] Skip test_valid_actor_state tests on windows ( #19114 )
...
* skip test_wrapped_actor_creation on windows
* rerun windows ci
* mark test_valid_actor_state_2 as flaky
* mark test_valid_actor_state
* rerun
2021-10-06 09:17:59 -07:00
Simon Mo
4beba3f727
[Doc] Document existing runtime env's container support ( #19076 )
2021-10-06 10:25:57 -05:00
architkulkarni
281fcaa91a
[Serve] [Doc] Add note about serving multiple deployments defined by the same class ( #19118 )
2021-10-06 10:24:42 -05:00
Kai Fricke
234b015b42
[ci] Clean wheels directory before build, validate wheel commit strings ( #19097 )
2021-10-06 13:48:24 +01:00
Sven Mika
1f0646f658
[RLlib] Issue 18418: SAC w/ dict space not working. ( #19101 )
2021-10-06 09:05:50 +02:00
Eric Liang
f8a91c7fad
Revert "[Lint] run clang-tidy in scripts/format.h, update clang-tidy rules ( #19055 )" ( #19119 )
...
This reverts commit 5d9e3a0121.
2021-10-05 16:33:12 -07:00
Eric Liang
0702974f21
Add CODEOWNERS for format.sh script ( #19121 )
2021-10-05 16:31:08 -07:00
Amog Kamsetty
db0483a29a
[SGD] SGD Namespace Consistency ( #19048 )
...
* wip
* update
* add callbacks
* fix
* fix
* update
* add
* address comments
2021-10-05 15:56:42 -07:00
Philipp Moritz
53f1d5de61
Fix C++17 support on some windows machines ( #19088 )
2021-10-05 15:15:59 -07:00
Matti Picus
63dd22c7c2
add msvcp140.dll to the wheel on windows ( #19062 )
...
* add msvcp140.dll to the wheel on windows
* fixes from review
* be more verbose
* Update setup.py
Co-authored-by: Philipp Moritz <pcmoritz@gmail.com>
2021-10-05 15:12:46 -07:00
mwtian
5d9e3a0121
[Lint] run clang-tidy in scripts/format.h, update clang-tidy rules ( #19055 )
2021-10-05 14:03:27 -07:00
Stephanie Wang
545db13800
[core] Assign tasks to the first available worker ( #18167 )
...
* Convert worker pool to queue
* Start up to backlog size more workers
* fixes
* Prestart workers according to num available CPUs
* lint
* x
* Update src/ray/raylet/worker_pool.h
Co-authored-by: Eric Liang <ekhliang@gmail.com>
* Update src/ray/raylet/worker_pool.h
Co-authored-by: Eric Liang <ekhliang@gmail.com>
* dedicated workers
* Fix tests
* x
* fix
* asan
* asan
* Workers can only exec tasks with same job ID
* size_t for runtime env hash, fix unit tests
* include job ID in runtime env hash, remove from worker registration msg
* x
* conflict
* debug
* Schedule and dispatch periodically, skip if no new tasks
* Update src/ray/common/task/task_spec.h
Co-authored-by: Eric Liang <ekhliang@gmail.com>
* Update src/ray/raylet/scheduling/cluster_task_manager.h
Co-authored-by: Eric Liang <ekhliang@gmail.com>
* Update src/ray/raylet/worker_pool.h
Co-authored-by: Eric Liang <ekhliang@gmail.com>
Co-authored-by: Eric Liang <ekhliang@gmail.com>
2021-10-05 13:45:50 -07:00
Yi Cheng
ecf7b86585
[workflow] Avoid running workflow step multiple times. ( #19090 )
...
When a workflow recovers, it tries to reconstruct the DAG. However, reconstruction is step-scoped, which means that if a workflow is passed to multiple steps, it is executed multiple times, which breaks the exactly-once semantics.
For ObjectRef this is fine, since it is cached in the serialization context, but Workflow inputs need a similar mechanism.
This logic lives in the workflow layer rather than the serialization layer because the deduplication happens at the application layer.
Issue #18997 has race conditions and is related to this one: multiple steps try to issue writes to virtual actors at the same time, which is not allowed right now and can lead to a race condition.
2021-10-05 13:43:27 -07:00
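The deduplication idea in the [workflow] entry above ( #19090 ) can be sketched as memoizing results by workflow identity, so that a workflow shared by several steps runs only once. The `Workflow` dataclass, its `workflow_id` field, and `run_once` are hypothetical names for illustration; the real change lives in Ray's workflow layer and may look quite different.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass(frozen=True)
class Workflow:
    """Hypothetical workflow node: an id, a function, and upstream inputs."""
    workflow_id: str
    func: Callable[..., Any]
    inputs: Tuple["Workflow", ...] = ()


def run_once(workflow: Workflow, results: Dict[str, Any]) -> Any:
    """Execute `workflow`, reusing any result already stored in `results`."""
    if workflow.workflow_id in results:
        # Already executed for another step: reuse instead of re-running.
        return results[workflow.workflow_id]
    args = [run_once(dep, results) for dep in workflow.inputs]
    value = workflow.func(*args)
    results[workflow.workflow_id] = value
    return value
```

The cache is keyed on the workflow id rather than on the consuming step, which is what lets two steps that receive the same workflow share a single execution.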
Kai Fricke
42116badba
[ci/release] Check test result alerts after test finished ( #19105 )
2021-10-05 21:27:27 +01:00
Kai Fricke
957f9e9d99
[client] Undo PySpark's monkey patching of namedtuples for PickleStub ( #19034 )
2021-10-05 10:43:50 -07:00
matthewdeng
3fbe135a24
[docs] add modin_xgboost and dask_xgboost notebook tutorials ( #18775 )
...
* Add xgboost-dask golden notebook
* [examples] add modin-xgboost Jupyter notebook
* Add xgboost dast gn
* update modin notebook to sphinx-gallery compatible python file
* fix build file
* fix test
* fix test
* Add modin notebook anyscale connect test
* Add missing file
* add dask_xgboost notebook
* Add the new modin golden notebook to CI
* fix lint and filter out tests with py37
* Update release/golden_notebook_tests_new/golden_notebook_tests.yaml
Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
* Add dask, wait for cluster client, remove pytest
* Replace folder
* Fix
* Update dask_xgboost_app_config.yaml
* Update modin_xgboost_app_config.yaml
* comment on filtered out tests
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
2021-10-05 09:17:33 -07:00
Chen Shen
1efcf5c3d5
[Core][CoreWorker ThreadSafety 1/n] Ensure global_worker_ is protected by mutex #19073
2021-10-05 05:32:28 -07:00
Yi Cheng
2cff293810
fix ( #19094 )
2021-10-05 01:53:05 -07:00
Yi Cheng
1eecb7d80b
up ( #19092 )
2021-10-04 23:54:31 -07:00