4686 commits

Author SHA1 Message Date
Amog Kamsetty
ae2e1f0883
[Parallel Iterators] Batching + Pipelining optimizations (#7931)
* batching + get_shard pipelining

* duplicate fix

* formatting

* adding performance benchmark

* minor changes

* turn batching off by default
2020-05-26 00:37:57 -07:00
Kai Yang
26cffb9c7c
Fix shutdown hook in worker mode (#8098) 2020-05-26 15:23:44 +08:00
fyrestone
f39760a4d3
Use uuid4() for actor creation function id hash (#8589) 2020-05-26 15:20:03 +08:00
fangfengbin
c41976938d
Add node table subscribe retry when gcs service restart (#8591) 2020-05-26 14:42:48 +08:00
Tao Wang
7e5b3dc0d9
GCS server task info handler use storage instead of redis accessor (#8584) 2020-05-26 10:38:31 +08:00
Eric Liang
90b05983d6
Lower ASAN build parallelism to avoid OOMs (#8592)
* fix it

* Update .travis.yml
2020-05-25 12:20:01 -07:00
fangfengbin
765d470c40
Add gcs object manager (#8298) 2020-05-25 17:21:35 +08:00
fangfengbin
f22d12d2fc
fix TestGetUncommittedLineage npe bug (#8585) 2020-05-25 15:48:58 +08:00
fangfengbin
229af662c6
Add job table&actor table subscribe retry when gcs service restart (#8442) 2020-05-25 14:38:25 +08:00
Edward Oakes
860eb6f13a
Update named actor API (#8559) 2020-05-24 20:08:03 -05:00
Tao Wang
92c2e41dfd
[GCS]profile info getting implementation based gcs service (#8536) 2020-05-24 22:23:01 +08:00
Luca Cappelletti
822de1b7f7
[Tune] Introduced preliminary random search to BayesOpt (#8541) 2020-05-23 12:20:43 -07:00
Jan Blumenkamp
d6f78f58dc
Fix missing learning rate and entropy coeff schedule for torch PPO (#8572) 2020-05-23 10:54:18 -07:00
fangfengbin
2ab1b773d4
GCS server worker info handler use storage instead of redis accessor (#8543) 2020-05-23 23:17:36 +08:00
Eric Liang
351839bf69
Revert "GCS server task info handler use storage instead of redis accessor (#8531)" (#8562)
This reverts commit 9823e15311.
2020-05-22 19:16:43 -07:00
Kai Yang
2e5e789294
Allow enabling logging in core worker with empty log_dir (#8529) 2020-05-22 18:02:37 +08:00
Sven Mika
8870270164
[RLlib] Add QMIX support for complex obs spaces (Issue 8523). (#8533) 2020-05-22 10:17:51 +02:00
fangfengbin
9823e15311
GCS server task info handler use storage instead of redis accessor (#8531) 2020-05-22 12:04:03 +08:00
Siyuan (Ryans) Zhuang
83a819572b
Update the pickle5 revision to match the upstream candidate (#8493) 2020-05-21 18:21:37 -07:00
Eric Liang
bb8d3c5cd0
ASAN build for ray core tests (#8431) 2020-05-21 15:11:03 -07:00
SangBin Cho
aa1cbe8abc
[Dashboard] Ray memory dashboard backend (#8461) 2020-05-21 12:22:28 -07:00
Eric Liang
9a83908c46
[rllib] Deprecate policy optimizers (#8345) 2020-05-21 10:16:18 -07:00
Hao Chen
d27e6da1b2
Fix a lint issue (#8530) 2020-05-21 16:12:44 +08:00
Sven Mika
3a234ed9e3
[RLlib] Error: "Unknown trainable [some rllib algo name]" (#8525) 2020-05-21 08:59:32 +02:00
fangfengbin
e261b4778e
Adjust the state initialization sequence and put it after core worker google logging initialization (#8511) 2020-05-21 11:30:28 +08:00
Simon Mo
ed2f434593
[Serve] Start Replicas in Parallel (#8433) 2020-05-20 19:46:03 -07:00
Edward Oakes
a76434ccde
Add ability to specify worker and driver ports (#8071) 2020-05-20 15:31:13 -05:00
Sven Mika
d76578700d
[RLlib] Policy.compute_single_action() broken for nested actions (Issue 8411). (#8514) 2020-05-20 22:29:08 +02:00
mehrdadn
ebf060d484
Make more tests run on Windows (#8446)
* Remove worker Wait() call due to SIGCHLD being ignored

* Port _pid_alive to Windows

* Show PID as well as TID in glog

* Update TensorFlow version for Python 3.8 on Windows

* Handle missing Pillow on Windows

* Work around dm-tree PermissionError on Windows

* Fix some lint errors on Windows with Python 3.8

* Simplify torch requirements

* Quiet git clean

* Handle finalizer issues

* Exit with the signal number

* Get rid of wget

* Fix some Windows compatibility issues with tests

Co-authored-by: Mehrdad <noreply@github.com>
2020-05-20 12:25:04 -07:00
Eric Liang
aa7a58e92f
[rllib] Support training intensity for dqn / apex (#8396) 2020-05-20 11:22:30 -07:00
Ian Rodney
f56b3be916
[Docs] Add Cancelation to main docs. (#8508)
* Update walkthrough.rst

* Adding example

* Better example

* Better example

* Adding Ray Kill Info
2020-05-20 10:31:57 -07:00
Lingxuan Zuo
cd706f40c4
[Stats] add nodeaddress tag for stats test (#8423) 2020-05-20 12:30:01 -05:00
Luca Cappelletti
c9898eff24
[Tune] Added method to integrate previous analysis in BO (#8486) 2020-05-19 23:26:43 -07:00
Bill Chambers
f8f7efc24f
[Serve] Rename RayServe -> "Ray Serve" in Documentation (#8504) 2020-05-19 19:13:54 -07:00
Edward Oakes
85cb721f19
[serve] Fix worker replica leak (#8506) 2020-05-19 20:51:50 -05:00
Simon Mo
c9c84c87f4
[Serve] Add Instructions for GPU (#8495) 2020-05-19 18:33:58 -07:00
Ian Rodney
1163ddbe45
Remove timeouts in test_cancel (#8272) 2020-05-19 12:35:16 -05:00
mehrdadn
8da084bc54
Try to address linting issues (#8485) 2020-05-19 10:29:17 -05:00
internetcoffeephone
a73c488c74
Change tf_utils.py get_weights to evaluate all tensors at once rather than calling tensor.eval per-tensor. (#8491) 2020-05-18 22:06:03 -07:00
Hao Chen
6c5ea32857
Fix installing pickle5-backport for Python 3.8.2 (#8453) 2020-05-18 17:03:13 -07:00
Luca Cappelletti
5b330de182
[Tune] Introduced patience to early stopping (#8484) 2020-05-18 13:12:16 -07:00
Luca Cappelletti
d1ef70da16
[Tune] Added default values for utility kwargs (#8488) 2020-05-18 13:10:43 -07:00
Robert Nishihara
14aeb30473
[Serve] Require traffic weights to sum more closely to 1. (#8476) 2020-05-18 11:46:34 -07:00
Max Fitton
0fadc11437
[dashboard] Only show workers from the correct cluster (#8434) 2020-05-18 13:30:41 -05:00
Max Fitton
13231ba63b
Rename redis-port to port and add default (#8406) 2020-05-18 13:25:34 -05:00
Robert Nishihara
2cff471d2c
Don't print Redis connection warning in ray.init(). (#8475) 2020-05-18 11:19:13 -07:00
Richard Liaw
b6c4f45ae0
[tune] Fix links (#8477) 2020-05-18 10:08:29 -07:00
Edward Oakes
9a721ed71a
Link to serve in tune overview (#8487) 2020-05-18 11:29:38 -05:00
Sven Mika
796a834c48
[RLlib] Attention Net integration into ModelV2 and learning RL example. (#8371) 2020-05-18 17:26:40 +02:00
fangfengbin
9347a5d10c
Add global state accessor of jobs (#8401) 2020-05-18 20:32:05 +08:00