Commit graph

1281 commits

Author SHA1 Message Date
Leon Sievers
6b93ec3034 Fixed calculation of num_steps_trained for multi_gpu_optimizer (#4364) 2019-03-14 19:46:02 -07:00
Eric Liang
2c1131e8b2 [tune] Add warnings if tune event loop gets clogged (#4353)
* add guards

* comments
2019-03-14 19:44:01 -07:00
Yuhong Guo
becffc6cef Fix checkpoint crash for actor creation task. (#4327)
* Fix checkpoint crash for actor creation task.

* Lint

* Move test to test_actor.py

* Revert unused code in test_failure.py

* Refine test according to Raul's suggestion.
2019-03-14 23:42:57 +08:00
Philipp Moritz
2f37cd7e27 fix wheel building doc (#4360) 2019-03-13 23:11:30 -07:00
Philipp Moritz
b0c4e60ffb Build wheels for Linux with Bazel (#4281) 2019-03-13 15:57:33 -07:00
Ameer Haj Ali
8a6403c26e [rllib] bug fix: merging --config params with params.pkl (#4336) 2019-03-13 11:26:55 -07:00
Andrew Tan
87bfa1cf82 [tune] add output flag for Tune CLI (#4322) 2019-03-12 23:56:59 -07:00
Eric Liang
d5f4698305 [tune] Avoid scheduler blocking, add reuse_actors optimization (#4218) 2019-03-12 23:49:31 -07:00
Stefan Pantic
2202a81773 Fix multi discrete (#4338)
* Revert "Revert "[wingman -> rllib] IMPALA MultiDiscrete changes (#3967)" (#4332)"

This reverts commit 3c41cb9b60.

* Fix a bug with log rhos for vtrace

* Reformat

* lint
2019-03-12 20:32:11 -07:00
Eric Liang
3c41cb9b60 Revert "[wingman -> rllib] IMPALA MultiDiscrete changes (#3967)" (#4332)
This reverts commit 962b17f567.
2019-03-11 22:51:26 -07:00
Kai Yang
7ff56ce826 Introduce set data structure in GCS (#4199)
* Introduce set data structure in GCS. Change object table to Set instance.

* Fix a logic bug. Update python code.

* lint

* lint again

* Remove CURRENT_VALUE mode

* Remove 'CURRENT_VALUE'

* Add more test cases

* rename has_been_created to subscribed.

* Make the `changed` parameter of type `bool *`

* Rename mode to notification_mode

* fix build

* RAY.SET_REMOVE returns an error if the entry doesn't exist

* lint

* Address comments

* lint and fix build
2019-03-11 14:42:58 -07:00
Andrew Tan
c435013b27 [tune] add-note command for Tune CLI (#4321)
Co-Authored-By: andrewztan <andrewztan12@gmail.com>
2019-03-11 14:16:44 -07:00
Stefan Pantic
36cbde651a Add action space to model (#4210) 2019-03-09 19:23:12 -08:00
justinwyang
5adb4a6941 Set _remote() function args and kwargs as optional (#4305) 2019-03-09 16:40:14 -08:00
Richard Liaw
6630a35353 [tune] Initial Commit for Tune CLI (#3983)
This introduces a light CLI for Tune.
2019-03-08 16:46:05 -08:00
Simon Mo
3064fad96b Add ray.experimental.serve Module (#4095) 2019-03-08 16:22:05 -08:00
Eric Liang
c7f74dbdc7 [rllib] Add async remote workers (#4253) 2019-03-08 15:39:48 -08:00
Robert Nishihara
fd2d8c2c06 Remove Jenkins backend tests and add new long running stress test. (#4288) 2019-03-08 15:29:39 -08:00
Richard Liaw
c3a3360a4a [tune] Add custom field for serializations (#4237) 2019-03-08 11:00:25 -08:00
Kristian Hartikainen
7e4b4822cf [tune] Fix worker recovery by setting force=False when calling logger sync_now (#4302)
## What do these changes do?
Fixes a Tune autoscaling problem where worker recovery causes things to stall.
2019-03-08 10:59:31 -08:00
Philipp Moritz
95254b3d71 Remove the old web UI (#4301) 2019-03-07 23:15:11 -08:00
Yuhong Guo
b9ea821d16 Use strongly typed IDs in C++. (#4185)
* Use strongly typed IDs for C++.

* Avoid heap allocation in cython.

* Fix JNI part

* Fix rebase conflict

* Refine

* Remove type check from __init__

* Remove unused constructor declarations.
2019-03-07 21:43:01 +08:00
Eric Liang
b0332551dd [rllib] Fix APPO + continuous spaces, feed prev_rew/act to A3C properly (#4286) 2019-03-06 21:36:26 -08:00
Hao Chen
f0465bc68c [Java] Refine tests and fix single-process mode (#4265) 2019-03-07 09:59:13 +08:00
Philipp Moritz
39eed24d47 update version from 0.7.0.dev0 to 0.7.0.dev1 (#4282) 2019-03-06 14:43:09 -08:00
Eric Liang
2781d74680 [rllib] Reserve CPUs for replay actors in apex (#4217) 2019-03-06 10:22:12 -08:00
Eric Liang
6d705036f3 [rllib] Add callback accessor for raw observation, fix prev actions (#4212) 2019-03-06 10:21:05 -08:00
Eric Liang
0e77a8f8c0 [rllib] Add end-to-end tests for RNN sequencing (#4258) 2019-03-06 09:55:07 -08:00
Philipp Moritz
ff5e3384ce Update version to 0.7.0.dev1 and update docs 0.6.3 -> 0.6.4 (#4276) 2019-03-05 22:22:29 -08:00
Stephanie Wang
b7ebf17650 Fix test (#4264) 2019-03-05 18:37:00 -08:00
Eric Liang
78ad9c4cbb Add "ray timeline" command to auto-dump Chrome trace for the current Ray instance (#4239) 2019-03-05 16:28:00 -08:00
Richard Liaw
a5441a3381 [tune] Fix testTrialNoSave (#4262)
Left a `last_result == None` check after changing last_result to always be a
dict.

Fixes https://github.com/ray-project/ray/issues/4259.
2019-03-05 09:28:33 -08:00
Robert Nishihara
fa8c07dd19 Sleep for half a second at exit in order to avoid losing log messages… (#4254) 2019-03-04 20:39:09 -08:00
Eric Liang
30bf8e46c7 [rllib] Use nested scope in custom loss example 2019-03-04 18:29:22 -08:00
Kristian Hartikainen
df9beb7123 [tune] Fix trial result fetching (#4219)
* Fix trial results wait in RayTrialExecutor.get_next_available_trial

* Add comment for the results shuffling

* Remove timeout from the wait

* Change random.sample to random.shuffle
2019-03-04 14:26:10 -08:00
Eric Liang
6e3384a719 [rllib] Add three new long-running stress tests {APEX, IMPALA, PBT} (#4215) 2019-03-04 14:05:42 -08:00
Stephanie Wang
8b871af555 Fix ray.wait bug for tasks on remote nodes and timeout=0 (#4242)
* Regression test

* Fix

* cleaner code
2019-03-04 11:46:06 -08:00
Hao Chen
a22d6ef955 Fix RemoteFunction._last_export_session (#4243) 2019-03-04 19:57:42 +08:00
Yuhong Guo
5866fd7005 Add type check in free and change Exception to TypeError (#4221) 2019-03-04 16:40:04 +08:00
Philipp Moritz
e96e06e031 bump version to 0.6.4 (#4226) 2019-03-03 14:39:05 -08:00
Adi Zimmerman
9551f2a92e [tune] Properly handle closing files in Trainable (#4232)
Fixes #3965.

Using a `with` block closes the file immediately after the block ends.
2019-03-03 14:23:05 -08:00
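For reference, a minimal Python sketch of the behavior described in #4232. It uses a generic file handle, not the actual Trainable checkpoint code, and the path and file name are hypothetical: the file is closed as soon as the `with` block exits, with no explicit `close()` call.

```python
import os
import tempfile

# Hypothetical checkpoint path in a fresh temporary directory (illustration only).
path = os.path.join(tempfile.mkdtemp(), "checkpoint.txt")

with open(path, "w") as f:
    f.write("state")
    inside = f.closed  # False: the handle is still open inside the block

after = f.closed  # True: leaving the block closed the file automatically

# The data was flushed on close, so a fresh handle reads it back.
with open(path) as g:
    content = g.read()

print(inside, after, content)  # False True state
```

The same guarantee holds if the block exits through an exception, which is why `with` is preferred over a manual `open()`/`close()` pair.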
Richard Liaw
3483282254 [tune] Local Mode support (#4138) 2019-03-03 14:05:59 -08:00
Peiren Yang
e2e6ef198b [autoscaler] Make commands bash -i to support newer bash (#4181)
The generated command in autoscaler/updater.py throws non-zero exit status 127 on Ubuntu 18.04.

## Related issue number
Closes #4155, Closes #1444.
2019-03-03 13:46:07 -08:00
Richard Liaw
fb1369d96f [tune] Dynamic Resources for Trials (#3974)
## What do these changes do?

Provides a small helper function for modifying the resource requirements of a trial.

Also implements the following:
 - Setting `last_result` to `{}` instead of `None`
 - Adding a shuffle to the BasicVariantGenerator
2019-03-03 11:38:36 -08:00
Eric Liang
ba03048254 [rllib] TF model custom_loss() should actually allow access to full rollout data (#4220) 2019-03-02 22:57:51 -08:00
Eric Liang
ff6dd8459a [autoscaler] Timeout ssh master connection after 5 minutes 2019-03-02 22:57:22 -08:00
Richard Liaw
a27cb225b6 Modularize Tune tests from multi-node tests (#4204) 2019-03-02 19:21:08 -08:00
Robert Nishihara
4b89eebfc7 Move test folders under rllib/tune from test -> tests. (#4214) 2019-03-02 13:37:16 -08:00
Yuhong Guo
6f46edca51 Skip dead nodes to avoid connection timeout. (#4154) 2019-03-02 13:11:19 -08:00
Eric Liang
9950f63e8c Send task error instead of raw exception for signal (#4150) 2019-03-01 23:59:29 -08:00