Stephanie Wang
3e33f6f71b
Fix failure handling for actor death ( #3359 )
...
* Broadcast actor death, clean up dummy objects
* Reduce logging and clean up state when failing a task
* lint
* Make actor failure test nicer, reduce node timeout
2018-11-21 12:26:22 -08:00
Philipp Moritz
1a926c9b7c
Fix $MACOSX_DEPLOYMENT_TARGET ( #3337 )
2018-11-21 10:56:17 -08:00
Eric Liang
686cf20951
Remove uses of std::list::size ( #3358 )
...
* worker pool and client conn
* Fix linting
* unordered set
* move
2018-11-20 14:47:55 -08:00
Richard Liaw
c24d87b4d1
[autoscaler] Submit command ( #3312 )
2018-11-20 14:03:34 -08:00
Philipp Moritz
d3697ce4e1
Ready queue refactor to make dispatching tasks more efficient ( #3324 )
...
* put queues outside
* working version, still needs to be optimized
* implement round robin
* proper round robin
* fix spillback
* update
* fix
* cleanup
* more cleanups
* fix
* fix
* add documentation
* explanation for hash combiner
* speed it up
* cleanup and linting
* linting
* comments
* Update scheduling_queue.h
* temp commit
* fixes
* update
* fix
* cleanup
* cleanup
* lint
* more prints
* more prints
* increase sleep
* documentation
* sleep
* fix
* fix
* sleep longer
* update
* fix
* fix
* fix
* Add ordered_set container.
* Fix
* Linting
* Constructors
* Remove O(n) call to list.size().
* fixes
* use ordered set
* Fix.
* Add documentation.
* Add iterators to ordered_set container implementation.
* iterator_type -> iterator
* Make typedefs private
* Add const_iterator
* fix
* fix test
* linting
* lint
* update
* add documentation
* linting
2018-11-20 13:14:12 -08:00
Ujval Misra
b0bfd104f2
Batch heartbeats from node manager together in the monitor. ( #3011 )
2018-11-20 09:52:27 -08:00
Eric Liang
abdc3b592e
[rllib] Update multi-gpu impala numbers ( #3327 )
2018-11-19 20:55:27 -08:00
Eric Liang
5972c29d28
[rllib] Set ape-x local exploration to 0, also load explorations before training steps ( #3349 )
...
## What do these changes do?
This should fix high explorations being used after restore / for rollouts.
## Related issue number
(dev list issue)
2018-11-19 20:36:25 -08:00
Eric Liang
afc48d7b77
Don't setpgid() on actors ( #3347 )
2018-11-19 17:35:26 -08:00
Robert Nishihara
f2b5500642
Add ordered_set container. ( #3352 )
...
* Add ordered_set container.
* Fix
* Linting
* Constructors
* Remove O(n) call to list.size().
* Fix.
* Add documentation.
* Add iterators to ordered_set container implementation.
* iterator_type -> iterator
* Make typedefs private
* Add const_iterator
2018-11-19 17:01:18 -08:00
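The commit above (f2b5500642) does not spell out the container's interface, and the real ordered_set is C++. As a rough illustration only, assuming "ordered set" means a set that preserves insertion order with constant-time membership checks and size queries, a Python analogue could look like the sketch below. The class and method names are hypothetical and are not taken from the Ray codebase.

```python
class OrderedSet:
    """Illustrative insertion-ordered set (hypothetical; not Ray's C++ ordered_set)."""

    def __init__(self, items=()):
        # A dict keeps insertion order and gives O(1) average lookup.
        self._items = {}
        for item in items:
            self.add(item)

    def add(self, item):
        # Re-adding an existing item keeps its original position.
        self._items.setdefault(item, None)

    def discard(self, item):
        # Removing a missing item is a no-op.
        self._items.pop(item, None)

    def __contains__(self, item):
        return item in self._items

    def __len__(self):
        # O(1): no traversal is needed to count the elements.
        return len(self._items)

    def __iter__(self):
        return iter(self._items)


queue = OrderedSet(["b", "a", "b", "c"])
queue.discard("a")
print(list(queue), len(queue))
```

The point of the sketch is the combination of properties: iteration follows insertion order, while membership and size stay cheap.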
Eric Liang
d4dbd27e0d
Don't retry IPC connect an absurd number of times ( #3355 )
2018-11-19 16:23:59 -08:00
Eric Liang
e4bb5d8d16
Fix logging when ray cluster utils is used
2018-11-18 21:49:27 -08:00
Eric Liang
61e3bbbfee
Update stale example links
2018-11-17 15:40:38 -08:00
Robert Nishihara
5cbc597494
Suppress duplicate pre-emptive object pushes. ( #3276 )
...
* Suppress duplicate pre-emptive object pushes.
* Add test.
* Fix linting
* Remove timer and inline recent_pushes_ into local_objects_.
* Improve test.
* Fix
* Fix linting
* Enable retrying pull from same object manager. Randomize object manager.
* Speed up test
* Linting
* Add test.
* Minor
* Lengthen pull timeout and reissue pull every time a new object becomes available.
* Increase pull timeout in test.
* Wait for nodes to start in object manager test.
* Wait longer for nodes to start up in test.
* Small fixes.
* _submit -> _remote
* Change assert to warning.
2018-11-16 23:02:45 -08:00
Wenting Shen
ab1e0f5c2f
support home path and relative path for temp-dir ( #3329 )
2018-11-16 17:41:10 -08:00
Robert Nishihara
60b22d9a72
Don't unsubscribe dependencies for infeasible tasks. ( #3338 )
...
* Make scheduling queues RemoveTasks return task states as well.
* Add test
* Don't unsubscribe for infeasible tasks when spilling over.
* Linting
* Address comments.
2018-11-16 11:33:00 -08:00
Eric Liang
e0bf9d7305
Add debug string to raylet ( #3317 )
...
* initial debug string
* format
* wip debug string
* fix compile
* fix
* update
* finished
* to file
* logs dir
* use temp root
* fix
* override
2018-11-15 21:47:50 -08:00
Robert Nishihara
d10cb570ab
Rename _submit -> _remote. ( #3321 )
2018-11-15 15:30:18 -08:00
Robert Nishihara
98edf752a9
Note requirement cython==0.27.3 in installation instructions. ( #3322 )
2018-11-15 15:27:19 -08:00
Philipp Moritz
1be1455d86
Fix redis crash when duplicate messages are appended to log. ( #3316 )
2018-11-15 15:09:39 -08:00
Eric Liang
5723291db6
Raise exception if the node is nearly out of memory ( #3323 )
...
* wip
* add
* comment
* escape hatch
* update
* object store too
* .2
2018-11-15 12:55:25 -08:00
Philipp Moritz
b6a12d1f97
Fix socket retry message ( #3325 )
2018-11-15 12:14:19 -08:00
Lewis Belcher
5319fd044c
Update redis version in setup.py ( #3333 )
...
* `redis` has released a new version (https://github.com/andymccurdy/redis-py/releases/tag/3.0.0)
* `ray` is not compatible with this version
* This PR adds the "compatible release" operator for `redis` version 2.10.6.
2018-11-15 10:40:08 -08:00
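For readers unfamiliar with the "compatible release" operator mentioned in 5319fd044c: in PEP 440, a specifier such as `redis~=2.10.6` is equivalent to `>=2.10.6, ==2.10.*`, so 3.0.0 is excluded. The sketch below mimics that rule for plain dotted numeric versions only. The helper names are hypothetical, and real tooling (pip, setuptools) handles many more version forms.

```python
def parse(version):
    """Split a plain dotted version such as '2.10.6' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def compatible_release(candidate, base):
    """Return True if `candidate` satisfies `~=base` (numeric versions only)."""
    base_parts = parse(base)
    cand_parts = parse(candidate)
    if len(base_parts) < 2:
        raise ValueError("~= needs at least two version components")
    # Everything except the last component of `base` must match exactly ...
    prefix = base_parts[:-1]
    # ... and the candidate must not be older than `base`.
    return cand_parts >= base_parts and cand_parts[:len(prefix)] == prefix


for version in ["2.10.5", "2.10.6", "2.10.7", "2.11.0", "3.0.0"]:
    print(version, compatible_release(version, "2.10.6"))
```

Under this rule only later 2.10.x releases are accepted, which is what keeps the incompatible 3.0.0 release from being installed.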
Eric Liang
706dc1d473
[rllib] Add test for multi-agent support and fix IMPALA multi-agent ( #3289 )
...
IMPALA support for multi-agent was broken because IMPALA requires batches of a fixed length, whereas multi-agent envs can create variable-length batches.
Fix this by adding zero-padding as needed (similar to the RNN case).
2018-11-14 14:14:07 -08:00
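The zero-padding idea in 706dc1d473 can be illustrated generically. This is a minimal sketch, not RLlib's implementation: it assumes the batch is a flat list and that the required length is a multiple of some chunk size. The function name, the `chunk_len` parameter, and the returned mask are all hypothetical.

```python
def pad_to_multiple(values, chunk_len, pad_value=0):
    """Pad `values` so its length is a multiple of `chunk_len`.

    Returns the padded list and a mask with 1 for real entries and
    0 for padding, so downstream code can ignore the padded slots.
    """
    remainder = len(values) % chunk_len
    if remainder == 0:
        return list(values), [1] * len(values)
    n_pad = chunk_len - remainder
    padded = list(values) + [pad_value] * n_pad
    mask = [1] * len(values) + [0] * n_pad
    return padded, mask


rewards = [1, 2, 3, 4, 5]  # a variable-length batch from one agent
padded, mask = pad_to_multiple(rewards, chunk_len=4)
print(padded)
print(mask)
```

Returning a mask alongside the padded data is a design choice of this sketch; it lets padded entries be excluded from a loss or metric later.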
andrewztan
57c7b4238e
KL Divergence Metrics ( #3300 )
...
* added KL divergence metrics
* fix
2018-11-13 23:12:35 -08:00
Eric Liang
1660c9d627
Kill actor child processes on shutdown ( #3297 )
...
* example
* add env
* test pg
* change to test
* add atexit test
* Update rllib-env.rst
* comment
* revert unnecessary file
* fix title when actor is idle
* Update python/ray/actor.py
Co-Authored-By: ericl <ekhliang@gmail.com>
2018-11-13 19:16:42 -08:00
Stephanie Wang
577c1dda74
Release sender connections as soon as WriteMessageAsync completes ( #3313 )
2018-11-13 21:32:24 -05:00
Wang Qing
9d4847ad2d
[hot-fix] Fix error when calling Ray.init() twice. ( #3314 )
2018-11-13 21:21:54 -05:00
Eric Liang
65c27c70cf
[rllib] Clean up agent resource configurations ( #3296 )
...
Closes #3284
2018-11-13 18:00:03 -08:00
Philipp Moritz
d4fad222e1
Update profiling instructions for raylet ( #3311 )
2018-11-13 17:48:33 -05:00
Richard Liaw
97f423781b
Clean up Ray processes after cluster util exits ( #3278 )
2018-11-13 13:18:12 -08:00
Richard Liaw
c3a2c7ebed
[tune] Doc: Autofilled, StatusReporter ( #3294 )
...
* autofill and revise doc page for things
* lint
* comments
2018-11-13 13:15:56 -08:00
Eric Liang
6ee7a3b571
[rllib] Raise worker TF intra_op threads to 2, lower driver intra_op threads to 8 ( #3299 )
2018-11-13 11:41:58 -08:00
Richard Liaw
c0423db05c
[core] Add Global State Test for multi-node setting ( #3239 )
...
* add test for adding node
* multinode test fixes
* First pass at allowing updatable values
* Fix compilation issues
* Add config file parsing
* Full initialization
* Wrote a good test
* configuration parsing and stuff
* docs
* write some tests, make it good
* fixed init
* Add all config options and bring back stress tests.
* Update python/ray/worker.py
* Update python/ray/worker.py
* Fix internalization
* some last changes
* Linting and Java fix
* add docstring
* Fix test, add assertions
* pytest ext
* lint
* lint
2018-11-13 10:35:24 -08:00
Eric Liang
d90f365394
[rllib] Add self-supervised loss to model ( #3291 )
...
# What do these changes do?
Allow self-supervised losses to be easily defined in custom models. Add this to the reference policy graphs.
2018-11-12 18:55:24 -08:00
Philipp Moritz
ce6e01b988
enable incremental builds ( #3292 )
2018-11-12 21:49:09 -05:00
Eric Liang
bd0dbde149
[rllib] Rename ServingEnv => ExternalEnv ( #3302 )
2018-11-12 16:31:27 -08:00
Richard Liaw
e37891d79d
[tune] Fix default handling for timesteps ( #3293 )
...
This PR fixes an issue where timesteps_this_iter = 0 would render as "None".
Closes #3057 .
2018-11-12 15:52:17 -08:00
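The commit message for e37891d79d does not say how the bug arose. One common way a zero ends up displayed as "None" in Python is a truthiness check that treats 0 like a missing value. The sketch below shows that pattern and an explicit `is None` check that avoids it; both function names are hypothetical and this is not the Tune code.

```python
def render_buggy(timesteps_this_iter):
    # `or` falls through for every falsy value, including a legitimate 0.
    return str(timesteps_this_iter or None)


def render_fixed(timesteps_this_iter):
    # Only a genuinely missing value is shown as "None".
    if timesteps_this_iter is None:
        return "None"
    return str(timesteps_this_iter)


for value in (None, 0, 100):
    print(repr(value), render_buggy(value), render_fixed(value))
```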
Eric Liang
49e2085d78
[rllib] Don't reset envs when possible ( #3290 )
...
* laz
* better errors
2018-11-11 01:45:37 -08:00
Eric Liang
463511f8a6
[tune] Track and warn on low memory ( #3298 )
2018-11-11 00:29:45 -08:00
Eric Liang
53489d2f85
[sgd] Document and add simple MNIST example ( #3236 )
2018-11-10 21:52:20 -08:00
Ion
d681893b0f
Speed up task dispatch. ( #3234 )
...
* speed up task dispatch
* minor changes
* improved comments
* improved comments
* change argument of DispatchTasks to list of tasks
* dispatch only tasks whose dependencies have been fulfilled
* some updated comments
* refactored DispatchQueue() and AssignTask() to avoid the copy of the ready list
* minor fixes
* some more minor fixes
* some more minor fixes
* added more comments
* better comments?
* fixed all feedback comments, minus making the argument of AssignTask() const
* AssignTask() now takes a const argument
* Do the task copy outside of the callback
* fix linting
2018-11-10 09:55:12 -08:00
Richard Liaw
29c182d449
[tune] Support "None" for upload_dir
2018-11-09 22:02:08 -08:00
Eric Liang
a51d618d88
[autoscaler] missing example-full.yaml file in the latest wheel for provider type "local"
2018-11-09 21:25:15 -08:00
Eric Liang
9dd3eedbac
[rllib] rollout.py should reduce num workers ( #3263 )
...
## What do these changes do?
Don't create an excessive number of workers for rollout.py, and also fix up the env wrapping to be consistent with the internal agent wrapper.
## Related issue number
Closes #3260 .
2018-11-09 12:29:16 -08:00
Richard Liaw
22113be04c
[tune] Annotated Example Page and showcase Tutorials ( #3267 )
...
Adds an example page and link in codebase.
Closes #2728 .
2018-11-08 23:45:05 -08:00
Eric Liang
588705b6fa
[autoscaler] Add option to allow private ips only ( #3270 )
...
* merge
* update
* upd
* Update python/ray/autoscaler/autoscaler.py
Co-Authored-By: ericl <ekhliang@gmail.com>
* Update python/ray/autoscaler/autoscaler.py
Co-Authored-By: ericl <ekhliang@gmail.com>
* Update python/ray/autoscaler/aws/config.py
Co-Authored-By: ericl <ekhliang@gmail.com>
* fix
2018-11-08 17:07:31 -08:00
Philipp Moritz
8894883153
Force kill web UI in ray stop ( #3257 )
2018-11-08 00:05:32 -08:00
Eric Liang
9b2794101d
[minor] Change chunk already exists to DEBUG, add flags for rllib multi node testing ( #3228 )
2018-11-08 00:04:20 -08:00
Stephanie Wang
d950e92f63
Allow multiple threads to call ray.get and ray.wait ( #3244 )
...
* Handle multiple threads calling ray.get
* Multithreaded ray.wait
* Pass in current task ID in java backend
* Add multithreaded actor to tests, add warning messages to worker for multithreaded ray.get
* Fix test
* Some cleanups
* Improve error message
* Add assertion
* Cleanup, throw error in HandleTaskUnblocked if task not actually blocked
* lint
* Fix python worker reset
* Fix references to reconstruct_objects
* Linting
* java lint
* Fix java
* Fix iterator
2018-11-07 22:39:28 -08:00