Eric Liang
2e04ffe00c
Change dict serialization warning to debug (#3230)
2018-11-06 21:23:07 -08:00
Stephanie Wang
ca585703b2
Refactor ObjectDirectory to reduce and fix callback usage (#3227)
2018-11-06 20:33:10 -08:00
eugenevinitsky
344b4ef0ff
[rllib] Fix filter sync for ES and ARS (#2918)
2018-11-06 19:09:34 -08:00
Eric Liang
725df3a485
Set the process title in workers and actors (#3219)
2018-11-06 14:59:22 -08:00
Peter Schafhalter
f3efcd2342
Fix password authentication in worker (#3124)
2018-11-06 13:40:03 -08:00
Eric Liang
8356a01dd6
Remove suppression of duplicate error messages (missed a couple)
2018-11-05 23:37:14 -08:00
Eric Liang
80f63696ac
Cap object store memory to 20GB when size is None (#3243)
* Update services.py
* Update services.py
2018-11-05 18:34:19 -08:00
Wang Qing
4968cc5d70
Fix a small typo (#3240)
2018-11-05 18:30:53 -08:00
Stephanie Wang
bf88aa5013
Increase timeout before reconstruction is triggered (#3217)
* Increase timeout to 10s
* Skip eviction reconstruction tests
* Add stress test for many actors to one
* Fix test by shortening it.
* lower number of processes in stress test
* Skip slow test
2018-11-05 18:03:50 -08:00
Ion
d8ae9de99c
Caching task resource requirements. (#3231)
* caching resource requirements
* small fixes
* avoid copying the resource map
2018-11-05 15:14:09 -08:00
Eric Liang
813f51769f
[rllib] Fix rllib rollouts script and add test (#3211)
## What do these changes do?
Clean up the checkpointing to handle the new checkpoint dirs. Add a test for rollout.py.
## Related issue number
https://github.com/ray-project/ray/issues/3206
https://github.com/ray-project/ray/issues/3204
2018-11-05 00:33:25 -08:00
Philipp Moritz
99bac44375
Update CMake to support Mac OS X 10.14 (#3218)
2018-11-04 16:32:58 -08:00
xutianming
fb6ac28b44
Single-source the package version (#3220)
2018-11-04 13:53:55 -08:00
Eric Liang
369cb833fe
[rllib] Implement custom metrics (#3144)
2018-11-03 18:48:32 -07:00
Eric Liang
7d69c77a19
[rllib] Decouple ape-x sampling and learning speed
2018-11-03 18:07:39 -07:00
Philipp Moritz
0da15b1c1f
Fix build system dependency for local_scheduler_client (#3215)
2018-11-03 13:19:02 -07:00
Eric Liang
9a0f0db070
Add ray stack tool for debugging (#3213)
2018-11-03 13:13:02 -07:00
Wang Qing
ca7d4c2cf5
Enable the user to specify the driver ID. (#3084)
2018-11-02 19:01:50 -07:00
Si-Yuan
5ce7ed7dad
Fix 'tempfile' docs (#3180)
* Fix docs.
* Update doc/source/tempfile.rst
Co-Authored-By: suquark <suquark@gmail.com>
* Remove doc for raylet socket.
2018-11-02 16:50:55 -07:00
Eric Liang
8c03683573
Add warning about using latest wheels (#3207)
2018-11-02 15:41:10 -07:00
Robert Nishihara
e495ab5e7c
Fix some paths /tmp/raylogs -> /tmp/ray. (#3189)
2018-11-02 12:10:53 -07:00
Robert Nishihara
5822aa2388
Rename get_task -> worker_idle in timeline. (#3179)
* Rename get_task -> worker_idle in timeline.
* Fix test.
2018-11-02 12:08:46 -07:00
Eric Liang
2bef9844bf
Revert "[autoscaler] Also grant roles to worker nodes" (#3199)
This reverts commit 55d161b49f.
2018-11-01 23:23:06 -07:00
Robert Nishihara
e612e26103
Add use_raylet option for backwards compatibility. (#3176)
* Add use_raylet option for backwards compatibility.
* Update message.
2018-11-01 14:16:04 -07:00
Robert Nishihara
57d6e98302
Update actor fault tolerance documentation. (#3175)
2018-11-01 11:52:05 -07:00
Robert Nishihara
60f28040ea
Document fractional resources. (#3174)
2018-11-01 10:50:56 -07:00
Eric Liang
b2caed9651
[minor] fix a3c pytorch example dim 80 => 84
2018-10-31 22:00:14 -07:00
Eric Liang
cd284bb487
[rllib] Document env compatibility, Ape-X support for multi-agent (#3147)
2018-10-31 21:59:34 -07:00
Richard Liaw
2086a57e61
[tune] Add Fractional GPU example/docs (#3169)
* Add example for fractional GPU support
* Update tune_mnist_keras.py
* Update doc/source/tune-usage.rst
2018-10-31 18:53:16 -07:00
Robert Nishihara
1f29a960f4
Update task_table and object_table API. (#3161)
* Update task_table and object_table API.
* Fix
2018-10-31 12:52:50 -07:00
Dennis Chung
9df2e6e6f4
[tune] Modify stop criteria in hyperopt example (#3102)
Modify `training_iteration` to `timesteps_total` because only `timesteps_total` is available in the reporter.
2018-10-30 13:26:40 -07:00
Stephanie Wang
aacbd007a0
[xray] Implement faster flush policy for lineage cache (#3071)
* Policy that flushes the lineage stash immediately
* Fix bug where remote tasks in uncommitted lineage weren't getting subscribed to; add regression test
* test
* Fix bug where waiting task was getting subscribed
* Cleanup
* Update src/ray/raylet/lineage_cache.cc
Co-Authored-By: stephanie-wang <swang@cs.berkeley.edu>
* Update src/ray/raylet/lineage_cache.cc
Co-Authored-By: stephanie-wang <swang@cs.berkeley.edu>
* cleanup
* cleanup
* Add another test for task with many parents
* Fix: unsubscribe from new waiting tasks
* Unsubscribe as soon as the commit notification is handled
2018-10-30 09:59:50 -07:00
Eric Liang
a221f55b0d
[rllib] Add custom value functions, fix up and document multi-agent variable sharing (#3151)
2018-10-29 19:37:27 -07:00
Robert Nishihara
e49839c73f
Fix linting. (#3155)
2018-10-28 20:43:29 -07:00
Robert Nishihara
32f0d6b77e
Deprecate num_workers argument to ray.init and ray start. (#3114)
* Remove num_workers argument.
* Fix
* Fix
2018-10-28 20:12:49 -07:00
Robert Nishihara
9868af4c7c
Use /tmp instead of /dev/shm for object store on Linux if /dev/shm is too small. (#3149)
* Use /tmp instead of /dev/shm for object store on Linux if /dev/shm is too small.
* Add logging statement and address comments.
* Fix
2018-10-28 20:09:06 -07:00
Robert Nishihara
08fc9e5bcd
Add more description to setup.py. (#3153)
2018-10-28 19:49:52 -07:00
Robert Nishihara
fd854ff090
Allow the node manager port and object manager port to be set through… (#3130)
* Allow the node manager port and object manager port to be set through ray start.
* Linting
* Fix Java test
* Address comments.
2018-10-28 17:28:41 -07:00
Eric Liang
a404401dc6
Update agent.py to fix lint error
2018-10-28 15:28:08 -07:00
Jones Wong
d6bf890648
Solve hang caused by ray.get in collect_metrics (#3096)
2018-10-28 11:52:18 -07:00
Eric Liang
af0c1174cd
[sgd] Merge sharded param server based SGD implementation (#3033)
This includes most of the TF code used for the OSDI experiment. Perf sanity check on p3.16xl instances: Overall scaling looks ok, with the multi-node results within 5% of OSDI final numbers. This seems reasonable given that hugepages are not enabled here, and the param server shards are placed randomly.
$ RAY_USE_XRAY=1 ./test_sgd.py --gpu --batch-size=64 --num-workers=N \
--devices-per-worker=M --strategy=<simple|ps> \
--warmup --object-store-memory=10000000000
Images per second total
gpus total              | simple | ps
========================================
1                       | 218
2 (1 worker)            | 388
4 (1 worker)            | 759
4 (2 workers)           | 176    | 623
8 (1 worker)            | 985
8 (2 workers)           | 349    | 1031
16 (2 nodes, 2 workers) | 600    | 1661
16 (2 nodes, 4 workers) | 468    | 1712   <--- OSDI perf was 1817
2018-10-27 21:25:02 -07:00
Yuhong Guo
befbf78048
Delete empty pubsub keys (#3146)
We found a large number of pub-sub keys with no content (this is worse when wait-id is used in the key name).
Legacy Ray deleted empty pub-sub keys from the GCS, but the raylet did not.
2018-10-27 11:58:39 -07:00
Eric Liang
6531eed2d0
[rllib] Better error message when action space dim too high (#3119)
2018-10-26 16:55:00 -07:00
Robert Nishihara
658c14282c
Remove legacy Ray code. (#3121)
* Remove legacy Ray code.
* Fix cmake and simplify monitor.
* Fix linting
* Updates
* Fix
* Implement some methods.
* Remove more plasma manager references.
* Fix
* Linting
* Fix
* Fix
* Make sure class IDs are strings.
* Some path fixes
* Fix
* Path fixes and update arrow
* Fixes.
* linting
* Fixes
* Java fixes
* Some java fixes
* TaskLanguage -> Language
* Minor
* Fix python test and remove unused method signature.
* Fix java tests
* Fix jenkins tests
* Remove commented out code.
2018-10-26 13:36:58 -07:00
Eric Liang
055daf17a0
[autoscaler] better message if there are more than 10 key pairs
2018-10-26 12:42:11 -07:00
bibabolynn
b4614ae69a
[java] customize path of ray.conf (#3100)
Users can specify a custom path to ray.conf with -Dray.config=/path/to/ray.conf.
2018-10-26 13:36:34 +08:00
Philipp Moritz
d3148cc3ab
[SGD] Provide better error message if model graphs have different numbers of variables (#3139)
2018-10-25 22:18:10 -07:00
Philipp Moritz
d34516f1f8
Update Gemfile Jekyll version (#3140)
2018-10-25 21:43:08 -07:00
Robert Nishihara
5aa29613db
Fix linting errors. (#3127)
2018-10-24 16:30:00 -07:00
Eric Liang
55d161b49f
[autoscaler] Also grant roles to worker nodes
2018-10-24 13:57:36 -07:00