4826 commits

Author SHA1 Message Date
Stephanie Wang
bd169749e0
Option to retry failed actor tasks (#8330)
* Python

* Consolidate state in the direct actor transport, set the caller starts at

* todo

* Remove unused

* Update and unit tests

* Doc

* Remove unused

* doc

* Remove debug

* Update src/ray/core_worker/transport/direct_actor_transport.h

Co-authored-by: Eric Liang <ekhliang@gmail.com>

* Update src/ray/core_worker/transport/direct_actor_transport.cc

Co-authored-by: Eric Liang <ekhliang@gmail.com>

* lint and fix build

* Update

* Fix build

* Fix tests

* Unit test for max_task_retries=0

* Fix java?

* Fix bad test

* Cross language fix

* fix java

Co-authored-by: Eric Liang <ekhliang@gmail.com>
2020-05-15 20:15:15 -07:00
Robert Nishihara
41d8c2bd0a
[CI] Don't uninstall ruby in Travis. (#8463) 2020-05-15 18:10:55 -07:00
SangBin Cho
1b734ba045
Pin sklearn version (#8465) 2020-05-15 16:54:54 -07:00
Edward Oakes
ef498e8aa5
[serve] Add basic session affinity via shard key (#8449) 2020-05-15 16:18:52 -05:00
Sven Mika
c9435cad43
WIP. (#8456)
Fix multi-GPU histogram metrics for > 0D tensors.
2020-05-15 21:43:27 +02:00
krfricke
4633d81c39
[tune] added average scope to experiment analysis (#8445) 2020-05-14 15:20:43 -07:00
Edward Oakes
ef20564d8e
[serve] Pin http proxy to the node that serve.init() is run on (#8436) 2020-05-14 16:38:29 -05:00
Max Fitton
00325eb2b2
Rename max_reconstructions to max_restarts and use -1 for infinite (#8274)
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
2020-05-14 10:30:29 -05:00
Sven Mika
5f4c196fed
[RLlib] Make PyTorch Model forward pass faster in vf-case. (#8422) 2020-05-14 10:15:50 +02:00
Hao Chen
212f78f735
Small improvements on build.sh (#8418) 2020-05-14 15:30:09 +08:00
fangfengbin
08b612052b
Add redis store client AsyncGetAll/AsyncBatchDelete/AsyncDeleteByIndex API (#8390) 2020-05-14 14:38:25 +08:00
Eric Liang
eabb801a40
less important (#8439) 2020-05-13 22:52:38 -07:00
Simon Mo
122353c392
[Serve] Fix SKLearn example against newest version (#8428) 2020-05-13 14:09:57 -07:00
Eric Liang
6bf1dc0888
[rllib] [hotfix] Build broken due to merge conflict: MixInReplay has no attribute buffer 2020-05-13 12:21:04 -07:00
mehrdadn
cd0037064c
Windows wheels for multiple Python versions (#8369)
* Upload wheels to latest directory as well on GitHub Actions

* Fix bug in install-dependencies.sh

* Move out bazel build //:* from install_ray, since it isn't really necessary for that purpose

* Build wheels for different versions of Python on Windows

* Compile Windows in opt mode

Co-authored-by: Mehrdad <noreply@github.com>
2020-05-12 22:06:04 -07:00
Siyuan (Ryans) Zhuang
ab278071ac
Update serialization doc (#8381)
* update serialization doc
2020-05-12 16:47:00 -07:00
Eric Liang
96f4d82cc3
[rllib] Qmix replay ratio is wrong 2020-05-12 13:07:19 -07:00
Edward Oakes
bb494a3be8
[serve] Refactor router policies to remove inheritance (#8372) 2020-05-12 12:34:01 -05:00
Eric Liang
7ce138a6dc
[rllib] Support free_log_std in ModelV2 (#8380)
* update

* factor

* update

* fix test failures

* fix torch net
2020-05-12 10:14:05 -07:00
mehrdadn
ac1ed293e3
Patch redis-py bug for Windows (#8386) 2020-05-12 10:41:45 -05:00
mehrdadn
a3b95d4664
Make Travis clone the full repo and the exact commit requested (#8331)
Co-authored-by: Mehrdad <noreply@github.com>
2020-05-12 10:40:45 -05:00
Edward Oakes
b84fe56bed
Split test_basic to avoid timeouts in CI (#8405) 2020-05-12 10:18:21 -05:00
Hao Chen
a593fde606
Fix core dumps in ExitActor (#8382) 2020-05-12 20:06:04 +08:00
Sven Mika
57544b1ff9
[RLlib] Examples folder restructuring (Model examples; final part). (#8278)
- This PR adds the previously missing PyTorch Model counterparts to the TFModels in examples/models.
- It also makes sure that all example scripts in the rllib/examples folder are tested for both frameworks and learn the given task (which is often not checked currently), using the --as-test flag together with --stop-reward.
2020-05-12 08:23:10 +02:00
Eric Liang
9d012626e5
[rllib] Distributed exec workflow for impala (#8321) 2020-05-11 20:24:43 -07:00
Sven Mika
c7cb2f5416
[RLlib] IMPALA PyTorch GPU fixes (#8397) 2020-05-11 22:03:27 +02:00
Edward Oakes
fdf0e5ceb1
Update README to say that python 2 is deprecated (#8404) 2020-05-11 14:49:49 -05:00
Jason McGhee
24ced808cd
Fix config key in docs for using PyTorch (#8300)
The docs incorrectly suggest using "torch" when the actual flag is called "use_pytorch".
2020-05-11 12:41:21 -07:00
Stephanie Wang
f97f466cec
Fix test (#8391) 2020-05-11 10:15:53 -07:00
mehrdadn
66b3edccb9
Prefer built-in system compilers over Clang download (#8355)
Co-authored-by: Mehrdad <noreply@github.com>
2020-05-11 11:53:35 -05:00
fangfengbin
515afa6809
Fix AsyncGetAll missing override bug (#8402)
Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-05-11 11:08:16 -05:00
fangfengbin
8d0c1b5e06
GCS adapts to actor table pub sub (#8347) 2020-05-11 13:53:53 +08:00
Simon Mo
501b936114
[Serve] Improve error message when result is not a list (#8378) 2020-05-10 17:18:06 -07:00
Stephanie Wang
3a25f5f5b4
Clean up actor state from the GCS (#8261)
* parametrize test

* Regression test and logging

* Test no restart after actor deletion

* Unit tests

* Refactor to subscribe to and lookup from worker failure table

* Refactor ActorManager to remove dependencies

* Revert "Regression test and logging"

This reverts commit 835e1a9091b51ca8efb00392d4cc4a665145de24.

* Revert "parametrize test"

This reverts commit f31272082831ba1a494816dd5511d87b24eca4c9.

* Revert "Test no restart after actor deletion"

This reverts commit 114a83de14329aa6ab787c80cd5757cf074a9072.

* doc

* merge

* Revert "Refactor to subscribe to and lookup from worker failure table"

This reverts commit 6aa13a05178d0b9aa1db9dee5c978c911b74fa3a.

* Revert "Revert "Test no restart after actor deletion""

This reverts commit 1bd92d09172aa8ab42632551cf9c56463f9598fe.

* Revert "Revert "parametrize test""

This reverts commit 639ba4d3b02167fb2b05e9878f9aa600bcec95b3.

* Revert "Revert "Regression test and logging""

This reverts commit f18b5f0db699a23cbccde32789e3639425e99ca4.

* Clean up actors that have gone out of scope

* Use actor ID instead of shared_ptr

* Clean up actors owned by dead workers

* Use actor ID instead of shared_ptr

* TODO and lint

* Fix unit tests

* Add unit tests for supervision and docs

* xx

* Fix tests

* Fix tests

* fix build
2020-05-09 18:43:49 -07:00
Thomas Lecat
4421f3a000
[tune] Close loggers after updating trial (#8307) (#8366) 2020-05-09 13:26:59 -07:00
Edward Oakes
2677b71003
Implement named actors using the GCS service (#8328) 2020-05-09 08:58:10 -05:00
Hao Chen
93138e617a
Fix a bad usage of std::move (#8364) 2020-05-09 14:24:24 +08:00
Eric Liang
1126fe4d23
[tune] Add UUID back to trial names (#8377) 2020-05-08 20:20:36 -07:00
fangfengbin
7fec602f2e
GCS adapts to node resource table pub sub (#8305) 2020-05-09 10:31:35 +08:00
A Kharitonov
304e31b7e5
Fixed: contrib/MADDPG MADDPGTFPolicy missing self.config assignment (#8343) 2020-05-08 12:05:06 -07:00
Sven Mika
754290daad
[RLlib] Add light-weight Trainer.compute_action() tests for all Algos. (#8356) 2020-05-08 16:31:31 +02:00
Sven Mika
d946f58fd0
LINT fixes. (#8370) 2020-05-08 16:24:20 +02:00
gehring
7f14fb577d
[RLlib] Added TransformerXL and "stabilized for RL" variant, GTrXL (#6470) 2020-05-08 14:10:23 +02:00
Eric Liang
2c599dbf05
[rllib] Port QMIX, MADDPG to new execution API (#8344) 2020-05-07 23:41:10 -07:00
Eric Liang
9f04a65922
[rllib] Add PPO+DQN two trainer multiagent workflow example (#8334) 2020-05-07 23:40:29 -07:00
Sven Mika
d7eaacb5fe
[RLlib] Issue 8319 DDPG (MA or num_envs_per_worker > 1) broken. (#8324) 2020-05-08 08:26:32 +02:00
Sven Mika
5f278c6411
[RLlib] Examples folder restructuring (models) part 1 (#8353) 2020-05-08 08:20:18 +02:00
Eric Liang
413db0902d
Trigger global GC when resources may be occupied by deleted actors 2020-05-07 14:57:21 -07:00
Edward Oakes
f2f118df9e
[serve] Clear serve cluster state between tests. (#8357) 2020-05-07 16:45:20 -05:00
Eric Liang
30db920787
[rllib] Fix centralized critic example to use right policy (#8341)
* update

* update
2020-05-07 10:47:55 -07:00