1211 commits

Author SHA1 Message Date
Philipp Moritz
95254b3d71 Remove the old web UI (#4301) 2019-03-07 23:15:11 -08:00
Yuhong Guo
b9ea821d16
Use strongly typed IDs in C++. (#4185)
* Use strongly typed IDs for C++.

* Avoid heap allocation in cython.

* Fix JNI part

* Fix rebase conflict

* Refine

* Remove type check from __init__

* Remove unused constructor declarations.
2019-03-07 21:43:01 +08:00
Eric Liang
b0332551dd
[rllib] Fix APPO + continuous spaces, feed prev_rew/act to A3C properly (#4286) 2019-03-06 21:36:26 -08:00
Hao Chen
f0465bc68c
[Java] Refine tests and fix single-process mode (#4265) 2019-03-07 09:59:13 +08:00
Philipp Moritz
39eed24d47 update version from 0.7.0.dev0 to 0.7.0.dev1 (#4282) 2019-03-06 14:43:09 -08:00
Eric Liang
2781d74680
[rllib] Reserve CPUs for replay actors in apex (#4217) 2019-03-06 10:22:12 -08:00
Eric Liang
6d705036f3
[rllib] Add callback accessor for raw observation, fix prev actions (#4212) 2019-03-06 10:21:05 -08:00
Eric Liang
0e77a8f8c0
[rllib] Add end-to-end tests for RNN sequencing (#4258) 2019-03-06 09:55:07 -08:00
Philipp Moritz
ff5e3384ce Update version to 0.7.0.dev1 and update docs 0.6.3 -> 0.6.4 (#4276) 2019-03-05 22:22:29 -08:00
Stephanie Wang
b7ebf17650 Fix test (#4264) 2019-03-05 18:37:00 -08:00
Eric Liang
78ad9c4cbb Add "ray timeline" command to auto-dump Chrome trace for the current Ray instance (#4239) 2019-03-05 16:28:00 -08:00
Richard Liaw
a5441a3381
[tune] Fix testTrialNoSave (#4262)
A `last_result == None` check was left behind after changing last_result to always be a dict.

Fixes https://github.com/ray-project/ray/issues/4259.
2019-03-05 09:28:33 -08:00
Robert Nishihara
fa8c07dd19 Sleep for half a second at exit in order to avoid losing log messages… (#4254) 2019-03-04 20:39:09 -08:00
Eric Liang
30bf8e46c7
[rllib] Use nested scope in custom loss example 2019-03-04 18:29:22 -08:00
Kristian Hartikainen
df9beb7123 [tune] Fix trial result fetching (#4219)
* Fix trial results wait in RayTrialExecutor.get_next_available_trial

* Add comment for the results shuffling

* Remove timeout from the wait

* Change random.sample to random.shuffle
2019-03-04 14:26:10 -08:00
Eric Liang
6e3384a719
[rllib] Add three new long-running stress tests {APEX, IMPALA, PBT} (#4215) 2019-03-04 14:05:42 -08:00
Stephanie Wang
8b871af555
Fix ray.wait bug for tasks on remote nodes and timeout=0 (#4242)
* Regression test

* Fix

* cleaner code
2019-03-04 11:46:06 -08:00
Hao Chen
a22d6ef955
Fix RemoteFunction._last_export_session (#4243) 2019-03-04 19:57:42 +08:00
Yuhong Guo
5866fd7005
Add type check in free and change Exception to TypeError (#4221) 2019-03-04 16:40:04 +08:00
Philipp Moritz
e96e06e031 bump version to 0.6.4 (#4226) 2019-03-03 14:39:05 -08:00
Adi Zimmerman
9551f2a92e [tune] Properly handle closing files in Trainable (#4232)
Fixes #3965.

Using the with keyword/block will close the file immediately after the block ends.
2019-03-03 14:23:05 -08:00
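
The `with` behavior described in the commit above can be illustrated with a minimal standard-library sketch. This is not the Trainable code from the commit; `write_result` and the temporary file path are hypothetical names used only for illustration.

```python
import os
import tempfile


def write_result(path, text):
    # Hypothetical helper: the with block closes the file as soon as the
    # block exits, even if write() raises an exception.
    with open(path, "w") as f:
        f.write(text)
    # The handle is returned only so the caller can inspect f.closed.
    return f


tmp_dir = tempfile.mkdtemp()
handle = write_result(os.path.join(tmp_dir, "result.txt"), "done")
print(handle.closed)
```

Without the `with` block, the file would stay open until `close()` is called explicitly or the object is garbage collected.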
Richard Liaw
3483282254
[tune] Local Mode support (#4138) 2019-03-03 14:05:59 -08:00
Peiren Yang
e2e6ef198b [autoscaler] Make commands bash -i to support newer bash (#4181)
The generated command in autoscaler/updater.py throws non-zero exit status 127 on Ubuntu 18.04.

## Related issue number
Closes #4155, Closes #1444.
2019-03-03 13:46:07 -08:00
Richard Liaw
fb1369d96f
[tune] Dynamic Resources for Trials (#3974)
## What do these changes do?

Provides a small helper function for modifying the resource requirements of a trial.

Also implements the following:
 - setting the last_result to be {} instead of None
 - Adding a shuffle to the BasicVariantGenerator
2019-03-03 11:38:36 -08:00
Eric Liang
ba03048254
[rllib] TF model custom_loss() should actually allow access to full rollout data (#4220) 2019-03-02 22:57:51 -08:00
Eric Liang
ff6dd8459a
[autoscaler] Timeout ssh master connection after 5 minutes 2019-03-02 22:57:22 -08:00
Richard Liaw
a27cb225b6
Modularize Tune tests from multi-node tests (#4204) 2019-03-02 19:21:08 -08:00
Robert Nishihara
4b89eebfc7 Move test folders under rllib/tune from test -> tests. (#4214) 2019-03-02 13:37:16 -08:00
Yuhong Guo
6f46edca51 Skip dead nodes to avoid connection timeout. (#4154) 2019-03-02 13:11:19 -08:00
Eric Liang
9950f63e8c Send task error instead of raw exception for signal (#4150) 2019-03-01 23:59:29 -08:00
Robert Nishihara
f21e6a2cff Update documentation regarding UI and timeline. (#4189) 2019-03-01 19:54:33 -08:00
bjg2
962b17f567 [wingman -> rllib] IMPALA MultiDiscrete changes (#3967) 2019-03-01 19:47:06 -08:00
Antoine Galataud
8288deb92d Add multi agent support in rollout.py (#4114) 2019-03-01 19:45:39 -08:00
Hao Chen
48f6cd3e5d Release GIL in prepare_actor_checkpoint (#4208) 2019-03-01 19:43:28 -08:00
Hao Chen
6f1a29ad3f Consolidate CI Python tests and fix bug about multiple ray.init (#4195) 2019-03-01 14:38:28 -08:00
bjg2
9c48cc27aa [wingman -> rllib] Removed remote evaluators assert (#4165) 2019-03-01 13:27:27 -08:00
Eric Liang
b5799b5286
[rllib] Set PPO observation filter to NoFilter by default (#4191) 2019-03-01 13:19:33 -08:00
adoda
11a28834fa [tune] Reduce the times for flushing json object to file (#4198)
## What do these changes do?

Writing a single result with JsonLogger calls 'flush' many times, which can take a long time when writing to a remote distributed filesystem.

## Related issue number
#4197 
2019-03-01 02:15:48 -08:00
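
The idea behind the commit above, flushing once per result rather than once per small write, can be sketched as follows. This is a rough illustration under stated assumptions, not Tune's JsonLogger: `CountingFile` and `log_result` are hypothetical names, and an in-memory buffer stands in for the remote file.

```python
import io
import json


class CountingFile(io.StringIO):
    """Hypothetical in-memory file that counts flush() calls."""

    def __init__(self):
        self.flush_count = 0
        super().__init__()

    def flush(self):
        self.flush_count += 1
        super().flush()


def log_result(out, result):
    # Serialize the whole record first, then write and flush once,
    # instead of flushing after every small write.
    out.write(json.dumps(result) + "\n")
    out.flush()


out = CountingFile()
log_result(out, {"training_iteration": 1, "mean_loss": 0.5})
print(out.flush_count)
```

On a remote distributed filesystem each flush may involve a network round trip, so keeping it to one per result keeps the cost proportional to the number of results.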
Hao Chen
14ff402d70
Make ray stop command also kill Java workers (#4179) 2019-03-01 11:05:19 +08:00
Robert Nishihara
d9bcaa20b5 Turn UI off by default. (#4188) 2019-02-28 17:29:52 -08:00
Richard Liaw
c695402dc3
[tune] Introduce ability to turn off default logging. (#4104) 2019-02-28 17:02:41 -08:00
Eric Liang
b809ef0107
[rllib] Silent tests (#4151) 2019-02-28 16:32:22 -08:00
Ion
88e14feb53 Reset signal counters when a task finishes (#4173) 2019-02-28 15:15:03 -08:00
Robert Nishihara
9c5fdbb63c Release gil when doing ray.wait. (#4190) 2019-02-28 00:32:07 -08:00
Robert Nishihara
387c98cf01 Make sure dashboard is packaged with wheels. (#4175) 2019-02-27 18:36:49 -08:00
Ion
7395c86a50 A few fixes in receive() signal. (#4142) 2019-02-27 18:00:59 -08:00
Philipp Moritz
9ca9691cdc Fix mnist sgd jenkins tests on master (#4168) 2019-02-27 16:02:18 -08:00
Robert Nishihara
75504b9586 Add script for running infinitely long stress tests. (#4163)
Running `./ci/long_running_tests/start_workloads.sh` will start several workloads running (each in their own EC2 instance).
- The workloads run forever.
- The workloads all simulate multiple nodes but use a single machine.
- You can get the tail of each workload by running `./ci/long_running_tests/check_workloads.sh`.
- You have to manually shut down the instances.

As discussed with @ericl @richardliaw, the idea here is to optimize for the debuggability of the tests. If one of them fails, you can ssh to the relevant instance and see all of the logs.
2019-02-27 14:33:06 -08:00
Yuhong Guo
41b81af11b Downgrade six to 1.0.0 (#4180) 2019-02-27 13:05:25 -08:00
Yuhong Guo
0a11b27971
Fix the case of applying the decorator directly to a raw class and add a test case (#4177) 2019-02-28 00:09:42 +08:00