2395 commits

Author SHA1 Message Date
Luca Cappelletti
2ff26f13d2
[tune] Added EarlyStopping and relative test suite (#8459) 2020-05-17 12:18:59 -07:00
Joseph Lucas
42c9fa19d1
[autoscaler] Ray Up url-arg (#8279) 2020-05-17 12:18:00 -07:00
Edward Oakes
16f48078d9
Remove use of ObjectID transport flag (#7699) 2020-05-17 11:29:49 -05:00
Edward Oakes
fb23bd6fc0
[serve] Optionally namespace serve clusters (#8447) 2020-05-17 00:14:42 -05:00
Richard Liaw
67c01455fe
[tune] tune.track -> tune.report (#8388) 2020-05-16 12:55:08 -07:00
Stephanie Wang
bd169749e0
Option to retry failed actor tasks (#8330)
* Python

* Consolidate state in the direct actor transport, set the caller starts at

* todo

* Remove unused

* Update and unit tests

* Doc

* Remove unused

* doc

* Remove debug

* Update src/ray/core_worker/transport/direct_actor_transport.h

Co-authored-by: Eric Liang <ekhliang@gmail.com>

* Update src/ray/core_worker/transport/direct_actor_transport.cc

Co-authored-by: Eric Liang <ekhliang@gmail.com>

* lint and fix build

* Update

* Fix build

* Fix tests

* Unit test for max_task_retries=0

* Fix java?

* Fix bad test

* Cross language fix

* fix java

Co-authored-by: Eric Liang <ekhliang@gmail.com>
2020-05-15 20:15:15 -07:00
Edward Oakes
ef498e8aa5
[serve] Add basic session affinity via shard key (#8449) 2020-05-15 16:18:52 -05:00
Sven Mika
c9435cad43
WIP. (#8456)
Fix multi-GPU histogram metrics for > 0D tensors.
2020-05-15 21:43:27 +02:00
krfricke
4633d81c39
[tune] added average scope to experiment analysis (#8445) 2020-05-14 15:20:43 -07:00
Edward Oakes
ef20564d8e
[serve] Pin http proxy to the node that serve.init() is run on (#8436) 2020-05-14 16:38:29 -05:00
Max Fitton
00325eb2b2
Rename max_reconstructions to max_restarts and use -1 for infinite (#8274)
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
2020-05-14 10:30:29 -05:00
Simon Mo
122353c392
[Serve] Fix SKLearn example against newest version (#8428) 2020-05-13 14:09:57 -07:00
Edward Oakes
bb494a3be8
[serve] Refactor router policies to remove inheritance (#8372) 2020-05-12 12:34:01 -05:00
mehrdadn
ac1ed293e3
Patch redis-py bug for Windows (#8386) 2020-05-12 10:41:45 -05:00
Edward Oakes
b84fe56bed
Split test_basic to avoid timeouts in CI (#8405) 2020-05-12 10:18:21 -05:00
Eric Liang
9d012626e5
[rllib] Distributed exec workflow for impala (#8321) 2020-05-11 20:24:43 -07:00
Simon Mo
501b936114
[Serve] Improve error message when result is not a list (#8378) 2020-05-10 17:18:06 -07:00
Stephanie Wang
3a25f5f5b4
Clean up actor state from the GCS (#8261)
* parametrize test

* Regression test and logging

* Test no restart after actor deletion

* Unit tests

* Refactor to subscribe to and lookup from worker failure table

* Refactor ActorManager to remove dependencies

* Revert "Regression test and logging"

This reverts commit 835e1a9091b51ca8efb00392d4cc4a665145de24.

* Revert "parametrize test"

This reverts commit f31272082831ba1a494816dd5511d87b24eca4c9.

* Revert "Test no restart after actor deletion"

This reverts commit 114a83de14329aa6ab787c80cd5757cf074a9072.

* doc

* merge

* Revert "Refactor to subscribe to and lookup from worker failure table"

This reverts commit 6aa13a05178d0b9aa1db9dee5c978c911b74fa3a.

* Revert "Revert "Test no restart after actor deletion""

This reverts commit 1bd92d09172aa8ab42632551cf9c56463f9598fe.

* Revert "Revert "parametrize test""

This reverts commit 639ba4d3b02167fb2b05e9878f9aa600bcec95b3.

* Revert "Revert "Regression test and logging""

This reverts commit f18b5f0db699a23cbccde32789e3639425e99ca4.

* Clean up actors that have gone out of scope

* Use actor ID instead of shared_ptr

* Clean up actors owned by dead workers

* Use actor ID instead of shared_ptr

* TODO and lint

* Fix unit tests

* Add unit tests for supervision and docs

* xx

* Fix tests

* Fix tests

* fix build
2020-05-09 18:43:49 -07:00
Thomas Lecat
4421f3a000
[tune] Close loggers after updating trial (#8307) (#8366) 2020-05-09 13:26:59 -07:00
Edward Oakes
2677b71003
Implement named actors using the GCS service (#8328) 2020-05-09 08:58:10 -05:00
Eric Liang
1126fe4d23
[tune] Add UUID back to trial names (#8377) 2020-05-08 20:20:36 -07:00
Eric Liang
9f04a65922
[rllib] Add PPO+DQN two trainer multiagent workflow example (#8334) 2020-05-07 23:40:29 -07:00
Eric Liang
413db0902d
Trigger global GC when resources may be occupied by deleted actors 2020-05-07 14:57:21 -07:00
Edward Oakes
f2f118df9e
[serve] Clear serve cluster state between tests. (#8357) 2020-05-07 16:45:20 -05:00
Philipp Moritz
325aec81bd
Hide aliased autoscaler commands (#8348) 2020-05-07 10:17:59 -07:00
Simon Mo
c5a5a5de89
[Serve] Refactor Metric System: Counter + Measure Support (#8114) 2020-05-06 17:44:02 -07:00
Eric Liang
1f312debbe
Document all ray commands. (#8340) 2020-05-06 16:49:37 -07:00
SangBin Cho
e631827a9f
[Core] Show_webui segfault fix. (#8323) 2020-05-06 11:45:07 -05:00
Alex Wu
04813c2ef5
[Parallel Iterator] Foreach concur (#8140) 2020-05-06 10:00:01 -05:00
Thomas Desrosiers
ec9357b486
[autoscaler] Fix filesystem permission race conditions (#8327) 2020-05-05 17:22:03 -07:00
mehrdadn
4bdef78e2e
Various CI fixes and cleanup (#8289) 2020-05-05 10:47:49 -07:00
fangfengbin
97430b2d0f
GCS adapts to node table pub sub (#8209) 2020-05-05 18:34:41 +08:00
Eric Liang
ee0eb44a32
Rename async_queue_depth -> num_async (#8207)
* rename

* lint
2020-05-05 01:38:10 -07:00
Simon Mo
1480bf4295
[Serve] Improve batch size inconsistency error (#8315) 2020-05-04 20:32:12 -07:00
Simon Mo
ca929671b6
[Serve] Simplify Validation (#8316) 2020-05-04 20:31:23 -07:00
ijrsvt
cc7bd6650a
[core] Enabling Remote Task Cancelation (#8225) 2020-05-04 15:24:22 -07:00
Eric Liang
1228369a87
Remove "This tab is experimental" (#8281) 2020-05-02 22:41:28 -07:00
Simon Mo
ec6631ae58
Pin redis-py version (#8290) 2020-05-02 22:09:02 -07:00
SangBin Cho
0f54d5ab65
Async actor microbenchmark Script (#8275) 2020-05-02 21:51:00 -07:00
Richard Liaw
40dfb337bf
[tune] Hotfix Ax breakage when fixing backwards-compat (#8285) 2020-05-02 20:42:50 -07:00
Xianyang Liu
eda526c154
[SGD] Support multiple input model (#8246) 2020-05-02 16:49:09 -07:00
Maksim Smolin
c2acb7ffe2
[SGD] Add imagenet example CI (#8150) 2020-05-02 16:48:35 -07:00
Edward Oakes
518ef4c0b3
[serve] Increase timeout waiting for HTTP server (#8286) 2020-05-02 16:55:13 -05:00
Edward Oakes
8d3236f1d0
Lower test_utils.wait_for_condition default timeout to 30s (#8283) 2020-05-02 10:19:00 -05:00
Edward Oakes
d4e64709ba
Shorten test_joblib (#8273) 2020-05-01 17:11:32 -05:00
Edward Oakes
13f718846d
[serve] Always use internal KV store (#8270) 2020-05-01 14:18:18 -05:00
Richard Liaw
07daff8794
[tune] Avoid breakage - soft deprecation warning for search algs (#8258) 2020-05-01 10:36:43 -07:00
Edward Oakes
3aec683f61
Avoid fate sharing with owner for detached actors (#8267) 2020-05-01 11:58:47 -05:00
Edward Oakes
63bc7dc522
service -> endpoint in router (#8269) 2020-05-01 11:55:34 -05:00
Edward Oakes
421b3c9d8b
Fix serve long running test (#8268) 2020-05-01 11:54:27 -05:00