Eric Liang
9dd3eedbac
[rllib] rollout.py should reduce num workers ( #3263 )
...
## What do these changes do?
Don't create an excessive number of workers for rollout.py, and also fix up the env wrapping to be consistent with the internal agent wrapper.
## Related issue number
Closes #3260 .
2018-11-09 12:29:16 -08:00
Richard Liaw
22113be04c
[tune] Annotated Example Page and showcase Tutorials ( #3267 )
...
Adds an example page and link in codebase.
Closes #2728 .
2018-11-08 23:45:05 -08:00
Eric Liang
588705b6fa
[autoscaler] Add option to allow private ips only ( #3270 )
...
* merge
* update
* upd
* Update python/ray/autoscaler/autoscaler.py
Co-Authored-By: ericl <ekhliang@gmail.com>
* Update python/ray/autoscaler/autoscaler.py
Co-Authored-By: ericl <ekhliang@gmail.com>
* Update python/ray/autoscaler/aws/config.py
Co-Authored-By: ericl <ekhliang@gmail.com>
* fix
2018-11-08 17:07:31 -08:00
Philipp Moritz
8894883153
Force kill web UI in ray stop ( #3257 )
2018-11-08 00:05:32 -08:00
Eric Liang
9b2794101d
[minor] Change chunk already exists to DEBUG, add flags for rllib multi node testing ( #3228 )
2018-11-08 00:04:20 -08:00
Stephanie Wang
d950e92f63
Allow multiple threads to call ray.get and ray.wait ( #3244 )
...
* Handle multiple threads calling ray.get
* Multithreaded ray.wait
* Pass in current task ID in java backend
* Add multithreaded actor to tests, add warning messages to worker for multithreaded ray.get
* Fix test
* Some cleanups
* Improve error message
* Add assertion
* Cleanup, throw error in HandleTaskUnblocked if task not actually blocked
* lint
* Fix python worker reset
* Fix references to reconstruct_objects
* Linting
* java lint
* Fix java
* Fix iterator
2018-11-07 22:39:28 -08:00
Richard Liaw
0bab8ed95c
Expose internal config parameters for starting Ray ( #3246 )
...
## What do these changes do?
This PR exposes a command-line option for setting internal config parameters. This is important for letting certain tests (i.e., FT tests that remove nodes) run quickly.
Note that this is bad practice and should be replaced with GFLAGS or some equivalent as soon as possible.
#3239 depends on this.
TODO:
- [x] Add documentation to method arguments before merging.
- [x] Add test to verify this works?
## Related issue number
2018-11-07 21:46:02 -08:00
Eric Liang
43df405d07
[rllib] Add some debug logs during agent setup ( #3247 )
2018-11-07 14:54:28 -08:00
Richard Liaw
cf9e838326
[tune] Raise Error when overstepping ( #3235 )
2018-11-07 14:27:09 -08:00
Eric Liang
29e3362905
Better errors on process deaths ( #3252 )
2018-11-07 14:08:16 -08:00
Robert Nishihara
1dd5d92789
Enable timeline visualizations of object transfers. ( #3255 )
...
* Plot object transfers.
* Linting
2018-11-07 12:45:59 -08:00
Philipp Moritz
4182b85611
Cache resources in SchedulingQueue ( #3232 )
...
* cache resources
* fix
* documentation and remove old code
* fix PR
* update documentation
* linting
2018-11-06 21:23:31 -08:00
Eric Liang
2e04ffe00c
Change dict serialization warning to debug ( #3230 )
2018-11-06 21:23:07 -08:00
Stephanie Wang
ca585703b2
Refactor ObjectDirectory to reduce and fix callback usage ( #3227 )
2018-11-06 20:33:10 -08:00
eugenevinitsky
344b4ef0ff
[rllib] Fix filter sync for ES and ARS ( #2918 )
2018-11-06 19:09:34 -08:00
Eric Liang
725df3a485
Set the process title in workers and actors ( #3219 )
2018-11-06 14:59:22 -08:00
Peter Schafhalter
f3efcd2342
Fix password authentication in worker ( #3124 )
2018-11-06 13:40:03 -08:00
Eric Liang
8356a01dd6
Remove suppressing duplicate error message (missed a couple)
2018-11-05 23:37:14 -08:00
Eric Liang
80f63696ac
Cap object store memory to 20GB when size is None ( #3243 )
...
* Update services.py
* Update services.py
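A minimal sketch of the capping behavior named in this commit's title, assuming the default size is derived from some measure of available memory. The function name `resolve_object_store_memory`, its parameters, and the reading of "20GB" as 20 * 10**9 bytes are illustrative assumptions, not the actual code in services.py.

```python
from typing import Optional

# Assumption: "20GB" is taken as 20 * 10**9 bytes; the real constant may differ.
MAX_DEFAULT_OBJECT_STORE_BYTES = 20 * 10**9


def resolve_object_store_memory(requested: Optional[int], default_bytes: int) -> int:
    """Return the object store size in bytes (hypothetical helper).

    An explicit request is returned unchanged. Only the automatically
    chosen default, used when `requested` is None, is capped.
    """
    if requested is not None:
        return requested
    return min(default_bytes, MAX_DEFAULT_OBJECT_STORE_BYTES)
```

In this sketch the cap applies only to the implicit default, so a caller who passes a size explicitly still gets that size.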
2018-11-05 18:34:19 -08:00
Wang Qing
4968cc5d70
Fix a small typo ( #3240 )
2018-11-05 18:30:53 -08:00
Stephanie Wang
bf88aa5013
Increase timeout before reconstruction is triggered ( #3217 )
...
* Increase timeout to 10s
* Skip eviction reconstruction tests
* Add stress test for many actors to one
* Fix test by shortening it.
* lower number of processes in stress test
* Skip slow test
2018-11-05 18:03:50 -08:00
Ion
d8ae9de99c
Caching task resource requirements. ( #3231 )
...
* caching resource requirements
* small fixes
* avoid copying the resource map
2018-11-05 15:14:09 -08:00
Eric Liang
813f51769f
[rllib] Fix rllib rollouts script and add test ( #3211 )
...
## What do these changes do?
Clean up the checkpointing to handle the new checkpoint dirs. Add a test for rollout.py.
## Related issue number
https://github.com/ray-project/ray/issues/3206
https://github.com/ray-project/ray/issues/3204
2018-11-05 00:33:25 -08:00
Philipp Moritz
99bac44375
Update CMake to support Mac OS X 10.14 ( #3218 )
2018-11-04 16:32:58 -08:00
xutianming
fb6ac28b44
single sourcing the package version ( #3220 )
2018-11-04 13:53:55 -08:00
Eric Liang
369cb833fe
[rllib] Implement custom metrics ( #3144 )
2018-11-03 18:48:32 -07:00
Eric Liang
7d69c77a19
[rllib] Decouple ape-x sampling and learning speed
2018-11-03 18:07:39 -07:00
Philipp Moritz
0da15b1c1f
Fix build system dependency for local_scheduler_client ( #3215 )
2018-11-03 13:19:02 -07:00
Eric Liang
9a0f0db070
Add `ray stack` tool for debugging ( #3213 )
2018-11-03 13:13:02 -07:00
Wang Qing
ca7d4c2cf5
Enable to specify driver id by user. ( #3084 )
2018-11-02 19:01:50 -07:00
Si-Yuan
5ce7ed7dad
Fix 'tempfile' docs ( #3180 )
...
* Fix docs.
* Update doc/source/tempfile.rst
Co-Authored-By: suquark <suquark@gmail.com>
* Remove doc for raylet socket.
2018-11-02 16:50:55 -07:00
Eric Liang
8c03683573
Add warning about using latest wheels ( #3207 )
2018-11-02 15:41:10 -07:00
Robert Nishihara
e495ab5e7c
Fix some paths /tmp/raylogs -> /tmp/ray. ( #3189 )
2018-11-02 12:10:53 -07:00
Robert Nishihara
5822aa2388
Rename get_task -> worker_idle in timeline. ( #3179 )
...
* Rename get_task -> worker_idle in timeline.
* Fix test.
2018-11-02 12:08:46 -07:00
Eric Liang
2bef9844bf
Revert "[autoscaler] Also grant roles to worker nodes" ( #3199 )
...
This reverts commit 55d161b49f.
2018-11-01 23:23:06 -07:00
Robert Nishihara
e612e26103
Add use_raylet option for backwards compatibility. ( #3176 )
...
* Add use_raylet option for backwards compatibility.
* Update message.
2018-11-01 14:16:04 -07:00
Robert Nishihara
57d6e98302
Update actor fault tolerance documentation. ( #3175 )
2018-11-01 11:52:05 -07:00
Robert Nishihara
60f28040ea
Document fractional resources. ( #3174 )
2018-11-01 10:50:56 -07:00
Eric Liang
b2caed9651
[minor] fix a3c pytorch example dim 80 => 84
2018-10-31 22:00:14 -07:00
Eric Liang
cd284bb487
[rllib] Document env compatibility, Ape-X support for multi-agent ( #3147 )
2018-10-31 21:59:34 -07:00
Richard Liaw
2086a57e61
[tune] Add Fractional GPU example/docs ( #3169 )
...
* Add example for fractional GPU support
* Update tune_mnist_keras.py
* Update doc/source/tune-usage.rst
2018-10-31 18:53:16 -07:00
Robert Nishihara
1f29a960f4
Update task_table and object_table API. ( #3161 )
...
* Update task_table and object_table API.
* Fix
2018-10-31 12:52:50 -07:00
Dennis Chung
9df2e6e6f4
[tune] Modify stop criteria in hyperopt example ( #3102 )
...
Change `training_iteration` to `timesteps_total` because only `timesteps_total` is inside the reporter.
2018-10-30 13:26:40 -07:00
Stephanie Wang
aacbd007a0
[xray] Implement faster flush policy for lineage cache ( #3071 )
...
* Policy that flushes the lineage stash immediately
* Fix bug where remote tasks in uncommitted lineage weren't getting subscribed to, add reg test
* test
* Fix bug where waiting task was getting subscribed
* Cleanup
* Update src/ray/raylet/lineage_cache.cc
Co-Authored-By: stephanie-wang <swang@cs.berkeley.edu>
* Update src/ray/raylet/lineage_cache.cc
Co-Authored-By: stephanie-wang <swang@cs.berkeley.edu>
* cleanup
* cleanup
* Add another test for task with many parents
* fix, unsubscribe to new waiting tasks
* Unsubscribe as soon as the commit notification is handled
2018-10-30 09:59:50 -07:00
Eric Liang
a221f55b0d
[rllib] Add custom value functions, fix up and document multi-agent variable sharing ( #3151 )
2018-10-29 19:37:27 -07:00
Robert Nishihara
e49839c73f
Fix linting. ( #3155 )
2018-10-28 20:43:29 -07:00
Robert Nishihara
32f0d6b77e
Deprecate num_workers argument to ray.init and ray start. ( #3114 )
...
* Remove num_workers argument.
* Fix
* Fix
2018-10-28 20:12:49 -07:00
Robert Nishihara
9868af4c7c
Use /tmp instead of /dev/shm for object store on Linux if /dev/shm is too small. ( #3149 )
...
* Use /tmp instead of /dev/shm for object store on Linux if /dev/shm is too small.
* Add logging statement and address comments.
* Fix
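The fallback described in this commit can be sketched roughly as follows. This is an illustration only: the helper name `choose_object_store_directory`, its parameters, and the free-space comparison are assumptions, not the logic Ray actually uses.

```python
import shutil


def choose_object_store_directory(required_bytes, shm_dir="/dev/shm",
                                  fallback_dir="/tmp",
                                  disk_usage=shutil.disk_usage):
    """Pick shm_dir when it has enough free space, else fallback_dir.

    Hypothetical helper; `disk_usage` is injectable so the choice can be
    exercised without depending on the real filesystem.
    """
    try:
        free_bytes = disk_usage(shm_dir).free
    except OSError:
        # shm_dir is missing or unreadable, so use the fallback.
        return fallback_dir
    return shm_dir if free_bytes >= required_bytes else fallback_dir
```

For example, `choose_object_store_directory(10 * 10**9)` returns `"/tmp"` in this sketch whenever `/dev/shm` reports less than 10 * 10**9 bytes free.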
2018-10-28 20:09:06 -07:00
Robert Nishihara
08fc9e5bcd
Add more description to setup.py. ( #3153 )
2018-10-28 19:49:52 -07:00
Robert Nishihara
fd854ff090
Allow the node manager port and object manager port to be set through… ( #3130 )
...
* Allow the node manager port and object manager port to be set through ray start.
* Linting
* Fix Java test
* Address comments.
2018-10-28 17:28:41 -07:00