Commit graph

9841 commits

Author SHA1 Message Date
Sven Mika
bd2d2079d2
[RLlib] Support >1 loss terms and optimizers for framework=tf2 (already supported for framework=[tf|torch]) (#19269) 2021-10-10 12:19:47 +02:00
gjoliver
635010d460
Update build rules and patches for darwin_arm64 platform. (#19037)
* Update build rules and patches for darwin_arm64 platform.

Changes include:

Update nelhage/rules_boost package from current version (08/5/2020) to 5/27/2021 version.
Remove rules_boost-undefine-boost_fallthrough.patch, since BOOST_FALLTHROUGH seems to be defined now.
Minor changes to rules_boost-windows-linkopts.patch to use default condition to add -lpthread flag for all platforms.
Add darwin_arm64 config to BUILD files for lib civetweb pulled in via prometheus dependency.

* upgrade boost to 1.74.0 from 1.71.0 to match the updated build file for windows.

* Fix ray_cpp_pkg

* Use boost/bind/bind.hpp

boost/bind.hpp and global namespace placeholders are deprecated.

* lint

* Use absl::bind_front when possible. Otherwise, NOLINT

* lint

* lint

* lint

* lint

* more lint

* final lint

* trigger build
2021-10-09 18:48:35 -07:00
Chen Shen
eff4f694cb
[Core] loosen some clang-tidy check (#19246)
Some of the clang-tidy rules are too strict, loosen them.
2021-10-09 12:22:13 -07:00
Guyang Song
bae543c956
[runtime env] support eager_install in runtime env (#17949) 2021-10-09 17:59:57 +08:00
Eric Liang
a92f1fedf4
Revert "[tune/wip] Exclude trial checkpoints in experiment sync (#19185)" (#19245)
This reverts commit 44b0b6eb20.
2021-10-08 19:47:12 -07:00
Eric Liang
7e84cd9b67
Add CODEOWNERS for data/ and workflow/ libraries (#19244) 2021-10-08 19:21:06 -07:00
Eric Liang
b59317520d
Revert "[Workflow] workflow.delete (#19178)" (#19247)
This reverts commit 7ea512f343.
2021-10-08 19:12:55 -07:00
Alex Wu
7ea512f343
[Workflow] workflow.delete (#19178)
Why are these changes needed?
This PR implements workflow.delete which allows users to delete the information in storage related to a workflow. (This assumes the workflow isn't currently running).

Related issue number
Closes #18848
2021-10-08 16:11:59 -07:00
Jiajun Yao
c31f0e17e6
Replace ray.__commit__ with the actual commit SHA when we build the windows wheel (#19213)
2021-10-08 16:06:52 -07:00
Sven Mika
d439fd7f17
[RLlib] TF2/eager memory leak fixes. (#19198) 2021-10-09 00:11:53 +02:00
Edward Oakes
47447c71e0
[serve] Remove excessive backend_state.update() calls in unit tests (#19225)
These extra update cycles are no longer needed now that we removed the SHOULD_START and SHOULD_STOP states.
2021-10-08 16:36:44 -05:00
mwtian
b066627539
[Object manager] don't abort entire pull request on race condition from concurrent chunk receive - #2 (#19216) 2021-10-08 12:58:18 -07:00
Patrick Ames
fa047c050b
[data] Make directory creation in dataset output path optional. (#19202) 2021-10-08 12:36:10 -07:00
Carlo Grisetti
d6dbc6dc97
Fix warning message spacing (#19164) 2021-10-08 11:46:02 -07:00
chenk008
3780a73b45
[Core] Add worker resource info to runtime env (#18804) 2021-10-08 10:37:29 -07:00
Edward Oakes
9cf19b67cc
[serve] Remove long poll client from replicas (#19145)
In general, broadcasting changes to the replicas via the LongPollClient is hard to reason about (it circumvents our versioning semantics as there's no rolling update). Ideally we would only be using the LongPollClient to broadcast replica membership and nothing else.
2021-10-08 12:32:42 -05:00
Edward Oakes
86d1a5bfc6
[serve] Catch ConnectionError during shutdown in LongPollClient (#19224) 2021-10-08 12:31:35 -05:00
Edward Oakes
93bcea7bdd
[serve] Clean up kv store file, skip on windows (#19194) 2021-10-08 12:30:48 -05:00
Kai Fricke
44b0b6eb20
[tune/wip] Exclude trial checkpoints in experiment sync (#19185) 2021-10-08 18:26:03 +01:00
Kai Fricke
e5e1ba93d9
[tune] Use queue to display JupyterNotebookReporter updates in Ray client (#19137) 2021-10-08 18:23:20 +01:00
Antoni Baum
c7d6f838f6
[tune] Optional forcible trial cleanup, return default autofilled metrics even if Trainable doesn't report at least once (#19144) 2021-10-08 18:16:26 +01:00
Eric Liang
8beabb283b
Force disable placement_group for all dataset tasks (#19208) 2021-10-08 10:16:09 -07:00
Kai Fricke
f1606acc2b
[tune] Fix durable(str) name for class trainables, preventing trial recovery (#19223) 2021-10-08 17:32:05 +01:00
SangBin Cho
dd1c1f9787
[Nightly test] remove env vars from tests (#19221)
When testing it we should minimize unnecessary env vars (and it's better working with the default config). This PR removes unnecessary env vars that are set.
2021-10-08 06:53:23 -07:00
Guyang Song
c4bc05bbab
set event_log_reporter_enabled True by default (#18112) 2021-10-07 23:09:36 -07:00
architkulkarni
1aab892623
[Runtime Env] add excludes to known fields for runtime env (#19206) 2021-10-07 22:47:49 -07:00
Eric Liang
8dded14798
Refactor LazyBlockList to simplify union of lists (#19214) 2021-10-07 22:07:52 -07:00
SangBin Cho
afaee05e1e
[Placement Group] Fix placement group removal leak (#19138) 2021-10-07 22:04:12 -07:00
Simon Mo
46e80348ad
[Serve] Make long poll wait for non-existent keys (#19205) 2021-10-07 19:10:22 -07:00
mwtian
9f066485a3
Tweak clang-tidy rules (#19210) 2021-10-07 18:53:18 -07:00
Kai Fricke
8d89e2d546
[tune] Prevent errors with retained trainables in global registry (#19184)
This PR fixes #19183 by introducing three improvements:

String trainables are prefixed with Durable, e.g. DurablePPO
Durable trainables cannot be wrapped twice with tune.durable()
MRO resolution in _WrappedDurableTrainables indicates we already have a DurableTrainable - thus we catch this with a try/except block
2021-10-07 17:17:01 -07:00
Clark Zinzow
ca731d7c86
[Datasets] Fix API breakage in Datasets nightly test. 2021-10-07 15:07:19 -07:00
Sven Mika
c3e3fc7637
[RLlib] Issue 18280: A3C/IMPALA multi-agent not working. (#19100) 2021-10-07 23:57:53 +02:00
Sven Mika
fd438d5630
[RLlib] Issue 18104: Cannot set remote_worker_envs=True for non local-mode and MultiAgentEnv. (#19133) 2021-10-07 22:39:21 +02:00
Edward Oakes
454163912f
Revert "[serve] Delete kv store local path after unit tests (#19165)" (#19188)
This reverts commit b90af4dae5.
2021-10-07 14:26:18 -05:00
Edward Oakes
1fa81673bd
[runtime_env] Clean up validation logic (#18984)
Splits the runtime_env parsing/validation and overriding into two separate codepaths. Adds unit testing for both.
2021-10-07 14:24:41 -05:00
Kai Fricke
45aad4ee9a
[tune] Add resume="AUTO" and enhance resume error messages (#19181) 2021-10-07 19:00:56 +01:00
Stephanie Wang
940f84cedb
[core] Remove unused plasma promotion path (#19122)
* remove unused

* lint

* lint

* lint
2021-10-07 10:55:50 -07:00
SangBin Cho
0ef0d9a77d
Revert "[core] Assign tasks to the first available worker (#18167)" (#19180)
This reverts commit 545db13800.
2021-10-07 10:38:37 -07:00
xwjiang2010
7ffd9cbed1
[Tune] Fix column width in doc. (#19159) 2021-10-07 18:16:21 +01:00
Antoni Baum
f1587c06fd
[tune] Ensure loc in progress reporter is filled (#19182) 2021-10-07 15:43:49 +01:00
Antoni Baum
27b8633198
[docs] Remove outdated note in Tune docs (#19110) 2021-10-07 15:42:11 +01:00
Edward Oakes
0f33aaf933
Revert "[Doc] Document existing runtime env's container support (#19076)" (#19160)
This reverts commit 4beba3f727.
2021-10-07 08:55:30 -05:00
Edward Oakes
b90af4dae5
[serve] Delete kv store local path after unit tests (#19165) 2021-10-07 08:55:22 -05:00
Jiajun Yao
5045f0a293
Use bash to run sanity_check_cpp.sh (#19179) 2021-10-07 13:34:25 +01:00
SangBin Cho
22f4ffed08
Disable cpu-only-nodes preferred scheduling that breaks placement groups. (#19129)
* Add a regression test for the short term

* done

* address code review

* lint
2021-10-07 05:34:04 -07:00
Kai Fricke
a8cf8c648c
[tune] track and print elapsed time in reporters (#19139) 2021-10-07 10:56:17 +01:00
Avnish Narayan
bbc64a7c3d
[RLlib] Pin Gym to 0.19 (#19170)
Gym appears to have cut a release, 0.21.
It isn't clear what changes were made
between 0.19/0.20 and 0.21, as there is
no change log available for the 0.21 release,
so for now we'll pin gym to 0.19 until we
can fully understand the breaking changes
in gym 0.21. I suspect some things have
just been removed from the regular gym installation
that rllib has previously relied on. Will address
later.
2021-10-07 07:59:02 +02:00
mwtian
fe413c3c5e
[Client] disable auto init for get_runtime_context() (#19127) 2021-10-06 20:20:47 -07:00
Eric Liang
86cbe3e833
[data] Add support for repeating and re-windowing a DatasetPipeline (#19091) 2021-10-06 20:13:43 -07:00