Eric Liang
07d8cbf414
[rllib] Support batch norm layers ( #3369 )
...
* batch norm
* lint
* fix dqn/ddpg update ops
* bn model
* Update tf_policy_graph.py
* Update multi_gpu_impl.py
* Apply suggestions from code review
Co-Authored-By: ericl <ekhliang@gmail.com>
2018-11-29 13:33:39 -08:00
Devin Petersohn
4d2010a852
Ship Modin with Ray. ( #3109 )
2018-11-29 20:05:24 +01:00
Chunyang Wen
fd7e494344
Remove: duplicate feed_dict constructing ( #3431 )
2018-11-29 10:21:46 -08:00
Kristian Hartikainen
7e319dbf0c
Automatically indent tune logger params ( #3399 )
2018-11-29 00:15:50 -08:00
Eric Liang
c46ea2ff4b
Click 7.0 changes the naming convention for commands; fix this
2018-11-28 14:59:58 -08:00
Robert Nishihara
82863b5251
[autoscaler] Update autoscaler to use heartbeat batches. ( #3409 )
2018-11-27 23:46:27 -08:00
Eric Liang
f0df97db6f
[rllib] example and docs on how to use parametric actions with DQN / PG algorithms ( #3384 )
2018-11-27 23:35:19 -08:00
Eric Liang
0d56fc10cc
Move setproctitle to ray[debug] package ( #3415 )
2018-11-27 09:50:59 -08:00
Eric Liang
e3c088fa1e
[rllib] PPO doesn't work with fractional num gpus ( #3396 )
...
* frac ppo
* gpu test
2018-11-27 01:14:10 -08:00
Eric Liang
aa94d3dd50
[autoscaler] Allow more than 5s from node creation to first heartbeat ( #3385 )
2018-11-26 17:25:05 -08:00
Robert Nishihara
0f0099fb90
UI changes, fix the task timeline and add the object transfer timeline to UI. ( #3397 )
...
* Saving
* Fix cmake and remove object/task search boxes.
* Add comment
2018-11-25 10:16:49 -08:00
Eric Liang
b85e7b43f3
[rllib] Refactor the sampler ( #3387 )
...
* refactor
* fix test
* add perf test
* Update sampler.py
2018-11-24 18:16:54 -08:00
Robert Nishihara
3856533065
Fix incompatibility with most recent version of Redis. ( #3379 )
...
* Fix incompatibility with most recent version of Redis.
* Fix
* Fixes.
2018-11-24 16:36:38 -08:00
Eric Liang
18a8dbfcfb
[rllib] Clip DDPG ou-noise to avoid exceeding action bounds ( #3386 )
...
Closes #2965
2018-11-24 00:56:50 -08:00
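A minimal sketch of the idea in the DDPG commit above: add Ornstein-Uhlenbeck noise to an action, then clip the result so it cannot leave the action bounds. The function names and the `theta`/`sigma` defaults are illustrative assumptions, not taken from the RLlib source.

```python
import random


def ou_step(prev_noise, theta=0.15, sigma=0.2, mu=0.0, rng=random):
    """One Ornstein-Uhlenbeck step: decay toward mu, then add Gaussian noise.

    theta and sigma are illustrative defaults (assumption).
    """
    return prev_noise + theta * (mu - prev_noise) + sigma * rng.gauss(0.0, 1.0)


def noisy_action(action, noise, low, high):
    """Add exploration noise per dimension, then clip into [low, high]."""
    return [
        min(max(a + n, lo), hi)
        for a, n, lo, hi in zip(action, noise, low, high)
    ]


# Without the clip, 0.9 + 0.5 would exceed the upper bound of 1.0.
clipped = noisy_action([0.9, -0.9], [0.5, -0.5], [-1.0, -1.0], [1.0, 1.0])
```

Clipping after the noise is added keeps the exploration behaviour unchanged inside the bounds and only affects actions that would otherwise be invalid for the environment.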
Eric Liang
55fca828ce
[rllib] Fix use_lstm option when using custom model with dict space ( #3368 )
...
## What do these changes do?
This passes in the right obs space to the lstm model wrapper, so that it doesn't attempt to un-flatten the already processed dict observation.
## Related issue number
Closes https://github.com/ray-project/ray/issues/3367
2018-11-23 22:51:08 -08:00
Eric Liang
8b76bab25c
[rllib] docs for td3 ( #3381 )
...
* td3 doc
* Update rllib-env.rst
2018-11-22 13:36:47 -08:00
Eric Liang
41b6b50d09
fix py3 ( #3382 )
2018-11-22 11:43:52 -08:00
GiliR4t1qbit
b9ae5edf74
When getting a role/profile, catch only the exception that indicates the role/profile already exists; allow others to be raised ( #3383 )
2018-11-22 09:42:58 -08:00
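The pattern named in the commit above can be sketched as follows. The exception classes and `ensure_role` are hypothetical stand-ins for whatever the cloud SDK raises; the point is only that a narrow `except` lets unrelated failures surface.

```python
class AlreadyExistsError(Exception):
    """Hypothetical: raised when the role/profile is already present."""


class PermissionDeniedError(Exception):
    """Hypothetical: any other failure that should not be swallowed."""


def ensure_role(create_role, name):
    """Create a role, tolerating only the 'already exists' case.

    create_role is a callable supplied by the caller (assumption).
    """
    try:
        create_role(name)
        return "created"
    except AlreadyExistsError:
        # Expected when the role was created on an earlier run.
        return "exists"
    # Any other exception propagates to the caller unchanged.
```

A bare `except Exception` in the same place would hide permission or network errors and report success for a role that was never created.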
Jones Wong
24bfe8ab76
Enable Twin Delayed DDPG for RLlib DDPG agent ( #3353 )
2018-11-21 20:03:20 -08:00
Richard Liaw
784a6399b0
[tune] Node Fault Tolerance ( #3238 )
...
This PR introduces single-node fault tolerance for Tune.
## Previous behavior:
- Actors will be restarted without checking if resources are available. This can lead to problems if we lose resources.
## New behavior:
- RUNNING trials will be resumed on another node on a best-effort basis (meaning they will run if resources are available).
- If the cluster is saturated, RUNNING trials on that failed node will become PENDING and queued.
- During recovery, TrialSchedulers and SearchAlgorithms should receive notification of this (via `trial_runner.stop_trial`) so that they don’t wait/block for a trial that isn’t running.
Remaining questions:
- Should `last_result` be consistent during restore?
Yes; but not for earlier trials (trials that are yet to be checkpointed).
- Waiting for some PRs to merge first (#3239 )
Closes #2851 .
2018-11-21 12:38:16 -08:00
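The new behavior described in the Tune commit above can be illustrated with a toy state transition. This is a sketch under stated assumptions: trials are plain dicts, `free_slots` stands in for available cluster resources, and `recover_trials` is a hypothetical helper, not Tune's API.

```python
RUNNING, PENDING = "RUNNING", "PENDING"


def recover_trials(trials, failed_node, free_slots):
    """Return updated copies of trials after failed_node is lost.

    Trials that were RUNNING on the failed node are resumed elsewhere
    while free_slots remain; the rest are queued as PENDING.
    """
    recovered = []
    for trial in trials:
        trial = dict(trial)  # do not mutate the caller's data
        if trial["status"] == RUNNING and trial["node"] == failed_node:
            if free_slots > 0:
                free_slots -= 1
                trial["node"] = "rescheduled"  # placeholder for a new node
            else:
                trial["status"] = PENDING
                trial["node"] = None
        recovered.append(trial)
    return recovered


trials = [
    {"id": 1, "status": RUNNING, "node": "a"},
    {"id": 2, "status": RUNNING, "node": "a"},
    {"id": 3, "status": RUNNING, "node": "b"},
]
after = recover_trials(trials, failed_node="a", free_slots=1)
```

With one free slot, the first affected trial keeps running on another node and the second is queued, while the trial on the healthy node is untouched.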
Richard Liaw
c24d87b4d1
[autoscaler] Submit command ( #3312 )
2018-11-20 14:03:34 -08:00
Eric Liang
abdc3b592e
[rllib] Update multi-gpu impala numbers ( #3327 )
2018-11-19 20:55:27 -08:00
Eric Liang
5972c29d28
[rllib] Set ape-x local exploration to 0, also load explorations before training steps ( #3349 )
...
## What do these changes do?
This should fix high explorations being used after restore / for rollouts.
## Related issue number
(dev list issue)
2018-11-19 20:36:25 -08:00
Eric Liang
afc48d7b77
Don't setpgid() on actors ( #3347 )
2018-11-19 17:35:26 -08:00
Eric Liang
e4bb5d8d16
Fix logging when ray cluster utils is used
2018-11-18 21:49:27 -08:00
Wenting Shen
ab1e0f5c2f
support home path and relative path for temp-dir ( #3329 )
2018-11-16 17:41:10 -08:00
Eric Liang
e0bf9d7305
Add debug string to raylet ( #3317 )
...
* initial debug string
* format
* wip debug string
* fix compile
* fix
* update
* finished
* to file
* logs dir
* use temp root
* fix
* override
2018-11-15 21:47:50 -08:00
Robert Nishihara
d10cb570ab
Rename _submit -> _remote. ( #3321 )
2018-11-15 15:30:18 -08:00
Eric Liang
5723291db6
Raise exception if the node is nearly out of memory ( #3323 )
...
* wip
* add
* comment
* escape hatch
* update
* object store too
* .2
2018-11-15 12:55:25 -08:00
Lewis Belcher
5319fd044c
Update redis version in setup.py ( #3333 )
...
* `redis` has released a new version (https://github.com/andymccurdy/redis-py/releases/tag/3.0.0 )
* `ray` is not compatible with this version
* This PR adds the "compatible release" operator for `redis` version 2.10.6.
2018-11-15 10:40:08 -08:00
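For reference, the "compatible release" operator in the commit above is `~=`, so the pin is `redis~=2.10.6`, which is equivalent to `>= 2.10.6, == 2.10.*`. The helper below is a simplified sketch of that rule for plain dotted numeric versions only; `is_compatible_release` is a hypothetical name, and real tooling should rely on a proper version parser.

```python
def parse(version):
    """Split a plain dotted version such as '2.10.6' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def is_compatible_release(candidate, base):
    """Return True if candidate satisfies '~=base'.

    Simplification (assumption): no pre-release, post-release, or
    epoch segments are handled.
    """
    base_parts = parse(base)
    cand_parts = parse(candidate)
    if len(base_parts) < 2:
        raise ValueError("~= needs at least two release segments")
    prefix = base_parts[:-1]  # everything but the last segment must match
    return cand_parts >= base_parts and cand_parts[:len(prefix)] == prefix


allowed = is_compatible_release("2.10.7", "2.10.6")
blocked = is_compatible_release("3.0.0", "2.10.6")
```

Under this rule later 2.10.x patch releases are still accepted, while the incompatible 3.0.0 release is excluded.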
Eric Liang
706dc1d473
[rllib] Add test for multi-agent support and fix IMPALA multi-agent ( #3289 )
...
IMPALA support for multi-agent was broken since IMPALA has a requirement that batch sizes be of a certain length. However, multi-agent envs can create variable-length batches.
Fix this by adding zero-padding as needed (similar to the RNN case).
2018-11-14 14:14:07 -08:00
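The zero-padding fix in the commit above can be sketched roughly as follows. It assumes the length requirement is "a multiple of `seq_len`", which the commit message does not state; `pad_to_multiple` and the mask it returns are hypothetical, not RLlib code.

```python
def pad_to_multiple(batch, seq_len, pad_value=0):
    """Pad batch with pad_value so its length is a multiple of seq_len.

    Returns (padded, mask), where mask marks real entries with True so
    that padded entries can be ignored later (for example in the loss).
    """
    remainder = len(batch) % seq_len
    pad_count = 0 if remainder == 0 else seq_len - remainder
    padded = list(batch) + [pad_value] * pad_count
    mask = [True] * len(batch) + [False] * pad_count
    return padded, mask


# A variable-length batch of 3 rewards padded up to a length of 4.
padded, mask = pad_to_multiple([1.0, 0.5, 2.0], seq_len=4)
```

Returning a mask alongside the padded data is one way to keep the padding from influencing training, similar in spirit to how padded RNN sequences are usually handled.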
andrewztan
57c7b4238e
KL Divergence Metrics ( #3300 )
...
* added KL divergence metrics
* fix
2018-11-13 23:12:35 -08:00
Eric Liang
1660c9d627
Kill actor child processes on shutdown ( #3297 )
...
* example
* add env
* test pg
* change to test
* add atexit test
* Update rllib-env.rst
* comment
* revert unnecessary file
* fix title when actor is idle
* Update python/ray/actor.py
Co-Authored-By: ericl <ekhliang@gmail.com>
2018-11-13 19:16:42 -08:00
Eric Liang
65c27c70cf
[rllib] Clean up agent resource configurations ( #3296 )
...
Closes #3284
2018-11-13 18:00:03 -08:00
Philipp Moritz
d4fad222e1
Update profiling instructions for raylet ( #3311 )
2018-11-13 17:48:33 -05:00
Richard Liaw
97f423781b
Clean up Ray processes after cluster util exits ( #3278 )
2018-11-13 13:18:12 -08:00
Richard Liaw
c3a2c7ebed
[tune] Doc: Autofilled, StatusReporter ( #3294 )
...
* autofill and revise doc page for things
* lint
* comments
2018-11-13 13:15:56 -08:00
Eric Liang
6ee7a3b571
[rllib] Raise worker TF intra_op threads to 2, lower driver intra_op threads to 8 ( #3299 )
2018-11-13 11:41:58 -08:00
Richard Liaw
c0423db05c
[core] Add Global State Test for multi-node setting ( #3239 )
...
* add test for adding node
* multinode test fixes
* First pass at allowing updatable values
* Fix compilation issues
* Add config file parsing
* Full initialization
* Wrote a good test
* configuration parsing and stuff
* docs
* write some tests, make it good
* fixed init
* Add all config options and bring back stress tests.
* Update python/ray/worker.py
* Update python/ray/worker.py
* Fix internalization
* some last changes
* Linting and Java fix
* add docstring
* Fix test, add assertions
* pytest ext
* lint
* lint
2018-11-13 10:35:24 -08:00
Eric Liang
d90f365394
[rllib] Add self-supervised loss to model ( #3291 )
...
# What do these changes do?
Allow self-supervised losses to be easily defined in custom models. Add this to the reference policy graphs.
2018-11-12 18:55:24 -08:00
Eric Liang
bd0dbde149
[rllib] Rename ServingEnv => ExternalEnv ( #3302 )
2018-11-12 16:31:27 -08:00
Richard Liaw
e37891d79d
[tune] Fix default handling for timesteps ( #3293 )
...
This PR fixes an issue where `timesteps_this_iter = 0` would previously
render as "None".
Closes #3057 .
2018-11-12 15:52:17 -08:00
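One common way a zero value ends up rendered as "None" is a truthiness check used where an explicit `None` check was intended. The sketch below illustrates that failure mode only; whether this was the actual cause in Tune is an assumption, and both function names are hypothetical.

```python
def render_buggy(timesteps_this_iter):
    """Treats 0 like a missing value because 0 is falsy."""
    return str(timesteps_this_iter or None)


def render_fixed(timesteps_this_iter):
    """Only a genuinely missing value is rendered as 'None'."""
    if timesteps_this_iter is None:
        return "None"
    return str(timesteps_this_iter)


buggy = render_buggy(0)
fixed = render_fixed(0)
```

Comparing with `is None` keeps legitimate zero counts visible while still handling results that omit the field.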
Eric Liang
49e2085d78
[rllib] Don't reset envs when possible ( #3290 )
...
* laz
* better errors
2018-11-11 01:45:37 -08:00
Eric Liang
463511f8a6
[tune] Track and warn on low memory ( #3298 )
2018-11-11 00:29:45 -08:00
Eric Liang
53489d2f85
[sgd] Document and add simple MNIST example ( #3236 )
2018-11-10 21:52:20 -08:00
Richard Liaw
29c182d449
[tune] Support "None" for upload_dir
2018-11-09 22:02:08 -08:00
Eric Liang
a51d618d88
[autoscaler] missing example-full.yaml file in the latest wheel for provider type "local"
2018-11-09 21:25:15 -08:00
Eric Liang
9dd3eedbac
[rllib] rollout.py should reduce num workers ( #3263 )
...
## What do these changes do?
Don't create an excessive number of workers for rollout.py, and also fix up the env wrapping to be consistent with the internal agent wrapper.
## Related issue number
Closes #3260 .
2018-11-09 12:29:16 -08:00
Richard Liaw
22113be04c
[tune] Annotated Example Page and showcase Tutorials ( #3267 )
...
Adds an example page and link in codebase.
Closes #2728 .
2018-11-08 23:45:05 -08:00
Eric Liang
588705b6fa
[autoscaler] Add option to allow private ips only ( #3270 )
...
* merge
* update
* upd
* Update python/ray/autoscaler/autoscaler.py
Co-Authored-By: ericl <ekhliang@gmail.com>
* Update python/ray/autoscaler/autoscaler.py
Co-Authored-By: ericl <ekhliang@gmail.com>
* Update python/ray/autoscaler/aws/config.py
Co-Authored-By: ericl <ekhliang@gmail.com>
* fix
2018-11-08 17:07:31 -08:00