Commit graph

9876 commits

Author SHA1 Message Date
gjoliver
e61f2c72d7
Upgrade bazel version to 4.2.1 (#18996) 2021-09-30 10:50:54 -07:00
mwtian
d12e35ce53
[Object manager] don't abort entire pull request on race condition in concurrent chunk receive (#18955) 2021-09-30 10:19:54 -07:00
Simon Mo
910553c3bb
[Core] Add private method to retrieve current task queue length (#18964) 2021-09-30 09:20:04 -07:00
Amog Kamsetty
98ac3f601c
[SGD] v1 to v2 Migration Guide (#18887)
* wip

* add guide

* fix test

* address comments

* add to docs

* fix

* remove markdown

* add warning to all pages

* formatting

* fix

* links

* Update doc/source/raysgd/v2/migration-guide.rst

Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>

* Update doc/source/raysgd/v2/migration-guide.rst

Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>

* Update doc/source/raysgd/v2/migration-guide.rst

Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>

* Update doc/source/raysgd/v2/migration-guide.rst

Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>

* Update doc/source/raysgd/v2/migration-guide.rst

Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>

* address comments

* address comments

* fix

* address comments

Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
2021-09-30 09:15:21 -07:00
architkulkarni
bf6e50813c
[runtime env] Parse local pip/conda requirements files locally upon task/actor definition (#18988) 2021-09-30 09:47:15 -05:00
Sven Mika
ac3371a148
[RLlib] Discussion 3644: Fix bug for complex obs spaces containing Box([2D shape]) and discrete component. (#18917) 2021-09-30 16:39:38 +02:00
Sven Mika
ed85f59194
[RLlib] Unify all RLlib Trainer.train() -> results[info][learner][policy ID][learner_stats] and add structure tests. (#18879) 2021-09-30 16:39:05 +02:00
Sven Mika
828f5d26b7
[RLlib] Custom view requirements (e.g. for prev-n-obs) work with compute_single_action and compute_actions_from_input_dict. (#18921) 2021-09-30 15:03:37 +02:00
Avnish Narayan
6dc1a6b72f
[RLlib] Raise error for KL penalty in DDPPO (#18959)
* [RLlib] Raise error for KL penalty in DDPPO

DDPPO doesn't support KL penalties like PPO-1.
To support KL penalties, DDPPO would need to
become centralized, which defeats the purpose of the
algorithm. Users can still tune the entropy coefficient to
control the policy entropy (similar to controlling the KL
penalty).

* Update rllib/agents/ppo/ddppo.py

Co-authored-by: avnishn <avnishnarayan@gmail.com>
Co-authored-by: Sven Mika <sven@anyscale.io>
2021-09-30 10:56:22 +02:00
Chris K. W
291fd36dee
_ray_trace_ctx fix follow-up (#18950)
* sanity check

* add test case

* fix assert

* refactor

* check kwargs instead of _kwargs

* format
2021-09-29 23:53:04 -07:00
Sven Mika
05a55a9335
[RLlib] Issue 18668: Unity3D env client/server example not working (fix + add to test cases). (#18942) 2021-09-30 08:30:20 +02:00
SangBin Cho
55227a15b9
Handle retry to avoid statement timeout exception (#18968) 2021-09-29 23:04:35 -07:00
Clark Zinzow
74b5d3d8f7
[Datasets] Minimize truncation on balanced splits. (#18953)
* Minimize truncation on balanced splits.

* Refactor into subroutines.

* Feedback and fixes.
2021-09-29 21:57:08 -07:00
Alex Wu
5709c6501b
[dataset][usability] Dataset dependencies (#18346) 2021-09-29 17:29:31 -07:00
Yi Cheng
a993f3a262
[nightly] update nightly test for many node test 2021-09-29 17:28:44 -07:00
Clark Zinzow
73a6cda812
Handle empty datasets properly in most Dataset transformations. (#18983) 2021-09-29 17:27:03 -07:00
Eric Liang
aa985e1a9c
Fix false positive error message from autoscaler events (#18981) 2021-09-29 15:51:18 -07:00
Jiajun Yao
be29d27e8a
[Scalability Envelope] Include broadcast time in test_object_store result json (#18974) 2021-09-29 13:49:16 -07:00
Stephanie Wang
5eddaabd11
[core] Fix bug in dependency resolution for actor handles (#18862)
* x

* lint
2021-09-29 13:25:31 -07:00
Antoni Baum
573c66a755
[GCP] Update GCP TPU config (#18634)
* [autoscaler] Update GCP TPU config

* Preemptible by default

* Remove libtpu link from head node

* Workaround
2021-09-29 12:41:26 -07:00
Sven Mika
9c9b482661
[RLlib] Allow n-step > 1 and prio. replay for R2D2 and RNNSAC. (#18939) 2021-09-29 21:31:34 +02:00
Sven Mika
b99943806e
[RLlib] Add support for IMPALA to handle more than one loss/optimizer (analogous to recent enhancement for APPO). (#18971) 2021-09-29 21:30:04 +02:00
Jiajun Yao
ed9118393c
Listen to 127.0.0.1 by default on mac osx (#18904) 2021-09-29 11:40:19 -07:00
Eric Liang
3665c99896
Deflake test_failure_2.py::test_warning_for_infeasible_zero_cpu_actor 2021-09-29 11:39:16 -07:00
Dmitri Gekhtman
944309c017
Revert "[nightly] Deflaky nightly test many_nodes_actor_test (#18582)" (#18954)
* Revert "[nightly] Deflaky nightly test many_nodes_actor_test (#18582)"

This reverts commit fc6a739e4b.

* move to large test

Co-authored-by: Yi Cheng <chengyidna@gmail.com>
2021-09-29 11:02:14 -04:00
Jiajun Yao
35774fd399
[CI] Print out the mismatched commit in ci (#18956) 2021-09-29 15:48:57 +01:00
Chong-Li
42744f29ee
[GCS] Make Gcs-based actor scheduler's bookkeeping consistent (#18546)
* Make Gcs-based scheduler's bookkeeping consistent

* Remove this from lambda function

* Fix lambda function

* Trigger SchedulePendingActors

* Test for acquiring/releasing resources

* Reorganize structure

* Avoid overloading post

* Fix gcs_actor_manager_test

* Fix post counter and rename some func

* Fix unique_ptr

* Fix unique_ptr

* Fix book lint error

* Lint

Co-authored-by: Chong-Li <lc300133@antgroup.com>
2021-09-29 05:53:34 -07:00
Lixin Wei
a6a02779fe
[Core] remove verbose log from task execution (#18736) 2021-09-29 00:31:33 -07:00
matthewdeng
91a5f67261
[SGD] add share_cuda_visible_devices config flag (#18958) 2021-09-29 00:21:46 -07:00
Chu Xiangyang
505aa89d12
[Dashboard] Add start/end time for job (#18901) 2021-09-28 20:57:13 -07:00
Yi Cheng
16cf719aff
[core] hot fix of build failure (#18963) 2021-09-28 20:29:28 -07:00
Yi Cheng
96dff6e46d
[core] fix implicit merge conflict (#18961) 2021-09-28 19:18:54 -07:00
matthewdeng
83a28cc901
[client] update documentation with AWS security_group (#18905)
* [client] update documentation with AWS security_group

* Update doc/source/cluster/ray-client.rst

Co-authored-by: Ian Rodney <ian.rodney@gmail.com>

* lint

Co-authored-by: Ian Rodney <ian.rodney@gmail.com>
2021-09-28 16:47:46 -07:00
Eric Liang
4d763d3ffd
Increase metrics fetch timeout in autoscaler for large clusters 2021-09-28 15:24:44 -07:00
Edward Oakes
73b8936aa8
[runtime_env] Unify rpc::RuntimeEnv with serialized_runtime_env field (#18641) 2021-09-28 15:13:15 -05:00
Yi Cheng
4af07a8917
[rpc] cpu improvement of protobuf in gcs (#17933) 2021-09-28 11:47:19 -07:00
SangBin Cho
a0a02f4982
[Placement Group] Fix placement group high cpu usage part 1 (#18652) 2021-09-28 11:14:59 -07:00
Edward Oakes
96054953cc
[serve] Remove python_methods logic and raise an error dynamically instead (#18927) 2021-09-28 09:51:46 -07:00
Chris K. W
191af472ac
[client] remove ray_trace_ctx from kwargs if tracing disabled (#18926) 2021-09-28 09:47:43 -07:00
Ian Rodney
0d3544588e
[AutoRun] Fix Auto-Run for Client (#18457)
* remove old version

* auto init first attempt

* arg for fn decorator

* default to True

* ray.method should not autostart

* comments

* no auto init on global state fns

* tiny test fix

* quick tests

* respond to comments

* explain func

* fix comments

* forgot to save

* fix again

* fix reconnect tests

* fix medium tests

* fix workflows test

* Better fix for workflows
2021-09-28 08:00:26 -07:00
Yi Cheng
e3dd1e3751
Revert "Revert "[test] add unit test for PR #17634 (#18585)" (#18830)" (#18871)
* Revert "Revert "[test] add unit test for PR #17634 (#18585)" (#18830)"

This reverts commit 8dd3057644.

* up
2021-09-28 05:53:52 -07:00
Richard Liaw
227aa9e89b
[tune] change delimiter for results (#16573)
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
Co-authored-by: Kai Fricke <kai@anyscale.com>
2021-09-28 10:03:00 +01:00
Kai Fricke
6be87a3453
[tune] fix tune list-trials metric (#18914) 2021-09-28 09:59:32 +01:00
Chen Shen
057c425122
[Core][CoreWorker] call shutdown in the correct thread (#18910) 2021-09-28 01:29:47 -07:00
Chen Shen
62a73f4ce8
[nightly test][event] enable event logs in nightly tests (#18936) 2021-09-28 01:29:26 -07:00
Yi Cheng
e4f4c79252
Revert "[Serve] [doc] Improve runtime env doc (#18782)" (#18935)
This reverts commit d4d71985d5.
2021-09-27 21:52:13 -07:00
matthewdeng
d2caa00be8
[SGD] add SGDv2 survey link to docs (#18934) 2021-09-27 19:15:37 -07:00
Chen Shen
25d14cb4de
Ensure task_execution_service_ is destructed first (#18913) 2021-09-27 18:31:56 -07:00
Eric Liang
caf34a452c
Unify ArrowTensorType tables and Tensor blocks (#18867) 2021-09-27 16:24:09 -07:00
Maxim Egorushkin
be0133da1d
[Autoscaler][GCP] Allow Google Compute Engine instance templates. (#18620)
Co-authored-by: Maxim Egorushkin <maxim.egorushkin@gmail.com>
Co-authored-by: Ian <ian.rodney@gmail.com>
2021-09-27 16:08:41 -07:00