Commit graph

2625 commits

Author SHA1 Message Date
Edward Oakes
137519e19d
[serve] Remove start_server flag (#8620) 2020-05-26 14:34:18 -05:00
Amog Kamsetty
ae2e1f0883
[Parallel Iterators] Batching + Pipelining optimizations (#7931)
* batching + get_shard pipelining

* duplicate fix

* formatting

* adding performance benchmark

* minor changes

* turn batching off by default
2020-05-26 00:37:57 -07:00
fyrestone
f39760a4d3
Use uuid4() for actor creation function id hash (#8589) 2020-05-26 15:20:03 +08:00
fangfengbin
765d470c40
Add gcs object manager (#8298) 2020-05-25 17:21:35 +08:00
Edward Oakes
860eb6f13a
Update named actor API (#8559) 2020-05-24 20:08:03 -05:00
Tao Wang
92c2e41dfd
[GCS]profile info getting implementation based gcs service (#8536) 2020-05-24 22:23:01 +08:00
Luca Cappelletti
822de1b7f7
[Tune] Introduced preliminary random search to BayesOpt (#8541) 2020-05-23 12:20:43 -07:00
Kai Yang
2e5e789294
Allow enabling logging in core worker with empty log_dir (#8529) 2020-05-22 18:02:37 +08:00
Siyuan (Ryans) Zhuang
83a819572b
Update the pickle5 revision to match the upstream candidate (#8493) 2020-05-21 18:21:37 -07:00
SangBin Cho
aa1cbe8abc
[Dashboard] Ray memory dashboard backend (#8461) 2020-05-21 12:22:28 -07:00
Eric Liang
9a83908c46
[rllib] Deprecate policy optimizers (#8345) 2020-05-21 10:16:18 -07:00
Hao Chen
d27e6da1b2
Fix a lint issue (#8530) 2020-05-21 16:12:44 +08:00
Sven Mika
3a234ed9e3
[RLlib] Error: "Unknown trainable [some rllib algo name]" (#8525) 2020-05-21 08:59:32 +02:00
fangfengbin
e261b4778e
Adjust the state initialization sequence and put it after core worker google logging initialization (#8511) 2020-05-21 11:30:28 +08:00
Simon Mo
ed2f434593
[Serve] Start Replicas in Parallel (#8433) 2020-05-20 19:46:03 -07:00
Edward Oakes
a76434ccde
Add ability to specify worker and driver ports (#8071) 2020-05-20 15:31:13 -05:00
mehrdadn
ebf060d484
Make more tests run on Windows (#8446)
* Remove worker Wait() call due to SIGCHLD being ignored

* Port _pid_alive to Windows

* Show PID as well as TID in glog

* Update TensorFlow version for Python 3.8 on Windows

* Handle missing Pillow on Windows

* Work around dm-tree PermissionError on Windows

* Fix some lint errors on Windows with Python 3.8

* Simplify torch requirements

* Quiet git clean

* Handle finalizer issues

* Exit with the signal number

* Get rid of wget

* Fix some Windows compatibility issues with tests

Co-authored-by: Mehrdad <noreply@github.com>
2020-05-20 12:25:04 -07:00
Eric Liang
aa7a58e92f
[rllib] Support training intensity for dqn / apex (#8396) 2020-05-20 11:22:30 -07:00
Luca Cappelletti
c9898eff24
[Tune] Added method to integrate previous analysis in BO (#8486) 2020-05-19 23:26:43 -07:00
Edward Oakes
85cb721f19
[serve] Fix worker replica leak (#8506) 2020-05-19 20:51:50 -05:00
Ian Rodney
1163ddbe45
Remove timeouts in test_cancel (#8272) 2020-05-19 12:35:16 -05:00
internetcoffeephone
a73c488c74
Change tf_utils.py get_weights to evaluate all tensors at once rather than calling tensor.eval per-tensor. (#8491) 2020-05-18 22:06:03 -07:00
Luca Cappelletti
5b330de182
[Tune] Introduced patience to early stopping (#8484) 2020-05-18 13:12:16 -07:00
Luca Cappelletti
d1ef70da16
[Tune] Added default values for utility kwargs (#8488) 2020-05-18 13:10:43 -07:00
Robert Nishihara
14aeb30473
[Serve] Require traffic weights to sum more closely to 1. (#8476) 2020-05-18 11:46:34 -07:00
Max Fitton
0fadc11437
[dashboard] Only show workers from the correct cluster (#8434) 2020-05-18 13:30:41 -05:00
Max Fitton
13231ba63b
Rename redis-port to port and add default (#8406) 2020-05-18 13:25:34 -05:00
Robert Nishihara
2cff471d2c
Don't print Redis connection warning in ray.init(). (#8475) 2020-05-18 11:19:13 -07:00
fangfengbin
9347a5d10c
Add global state accessor of jobs (#8401) 2020-05-18 20:32:05 +08:00
Richard Liaw
87cbf2aedd
[docs][tune] Make search algorithm, scheduler docs better! (#8179) 2020-05-17 12:19:44 -07:00
Luca Cappelletti
2ff26f13d2
[tune] Added EarlyStopping and relative test suite (#8459) 2020-05-17 12:18:59 -07:00
Joseph Lucas
42c9fa19d1
[autoscaler] Ray Up url-arg (#8279) 2020-05-17 12:18:00 -07:00
Edward Oakes
16f48078d9
Remove use of ObjectID transport flag (#7699) 2020-05-17 11:29:49 -05:00
Edward Oakes
fb23bd6fc0
[serve] Optionally namespace serve clusters (#8447) 2020-05-17 00:14:42 -05:00
Richard Liaw
67c01455fe
[tune] tune.track -> tune.report (#8388) 2020-05-16 12:55:08 -07:00
Stephanie Wang
bd169749e0
Option to retry failed actor tasks (#8330)
* Python

* Consolidate state in the direct actor transport, set the caller starts at

* todo

* Remove unused

* Update and unit tests

* Doc

* Remove unused

* doc

* Remove debug

* Update src/ray/core_worker/transport/direct_actor_transport.h

Co-authored-by: Eric Liang <ekhliang@gmail.com>

* Update src/ray/core_worker/transport/direct_actor_transport.cc

Co-authored-by: Eric Liang <ekhliang@gmail.com>

* lint and fix build

* Update

* Fix build

* Fix tests

* Unit test for max_task_retries=0

* Fix java?

* Fix bad test

* Cross language fix

* fix java

Co-authored-by: Eric Liang <ekhliang@gmail.com>
2020-05-15 20:15:15 -07:00
Edward Oakes
ef498e8aa5
[serve] Add basic session affinity via shard key (#8449) 2020-05-15 16:18:52 -05:00
Sven Mika
c9435cad43
WIP. (#8456)
Fix multi-GPU histogram metrics for > 0D tensors.
2020-05-15 21:43:27 +02:00
krfricke
4633d81c39
[tune] added average scope to experiment analysis (#8445) 2020-05-14 15:20:43 -07:00
Edward Oakes
ef20564d8e
[serve] Pin http proxy to the node that serve.init() is run on (#8436) 2020-05-14 16:38:29 -05:00
Max Fitton
00325eb2b2
Rename max_reconstructions to max_restarts and use -1 for infinite (#8274)
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
2020-05-14 10:30:29 -05:00
Simon Mo
122353c392
[Serve] Fix SKLearn example against newest version (#8428) 2020-05-13 14:09:57 -07:00
Edward Oakes
bb494a3be8
[serve] Refactor router policies to remove inheritance (#8372) 2020-05-12 12:34:01 -05:00
mehrdadn
ac1ed293e3
Patch redis-py bug for Windows (#8386) 2020-05-12 10:41:45 -05:00
Edward Oakes
b84fe56bed
Split test_basic to avoid timeouts in CI (#8405) 2020-05-12 10:18:21 -05:00
Eric Liang
9d012626e5
[rllib] Distributed exec workflow for impala (#8321) 2020-05-11 20:24:43 -07:00
Simon Mo
501b936114
[Serve] Improve error message when result is not a list (#8378) 2020-05-10 17:18:06 -07:00
Stephanie Wang
3a25f5f5b4
Clean up actor state from the GCS (#8261)
* parametrize test

* Regression test and logging

* Test no restart after actor deletion

* Unit tests

* Refactor to subscribe to and lookup from worker failure table

* Refactor ActorManager to remove dependencies

* Revert "Regression test and logging"

This reverts commit 835e1a9091b51ca8efb00392d4cc4a665145de24.

* Revert "parametrize test"

This reverts commit f31272082831ba1a494816dd5511d87b24eca4c9.

* Revert "Test no restart after actor deletion"

This reverts commit 114a83de14329aa6ab787c80cd5757cf074a9072.

* doc

* merge

* Revert "Refactor to subscribe to and lookup from worker failure table"

This reverts commit 6aa13a05178d0b9aa1db9dee5c978c911b74fa3a.

* Revert "Revert "Test no restart after actor deletion""

This reverts commit 1bd92d09172aa8ab42632551cf9c56463f9598fe.

* Revert "Revert "parametrize test""

This reverts commit 639ba4d3b02167fb2b05e9878f9aa600bcec95b3.

* Revert "Revert "Regression test and logging""

This reverts commit f18b5f0db699a23cbccde32789e3639425e99ca4.

* Clean up actors that have gone out of scope

* Use actor ID instead of shared_ptr

* Clean up actors owned by dead workers

* Use actor ID instead of shared_ptr

* TODO and lint

* Fix unit tests

* Add unit tests for supervision and docs

* xx

* Fix tests

* Fix tests

* fix build
2020-05-09 18:43:49 -07:00
Thomas Lecat
4421f3a000
[tune] Close loggers after updating trial (#8307) (#8366) 2020-05-09 13:26:59 -07:00
Edward Oakes
2677b71003
Implement named actors using the GCS service (#8328) 2020-05-09 08:58:10 -05:00