Wang Qing
ca7d4c2cf5
Enable the user to specify the driver ID. (#3084)
2018-11-02 19:01:50 -07:00
Robert Nishihara
e495ab5e7c
Fix some paths /tmp/raylogs -> /tmp/ray. (#3189)
2018-11-02 12:10:53 -07:00
Robert Nishihara
5822aa2388
Rename get_task -> worker_idle in timeline. (#3179)
* Rename get_task -> worker_idle in timeline.
* Fix test.
2018-11-02 12:08:46 -07:00
Eric Liang
2bef9844bf
Revert "[autoscaler] Also grant roles to worker nodes" (#3199)
This reverts commit 55d161b49f.
2018-11-01 23:23:06 -07:00
Robert Nishihara
e612e26103
Add use_raylet option for backwards compatibility. (#3176)
* Add use_raylet option for backwards compatibility.
* Update message.
2018-11-01 14:16:04 -07:00
Eric Liang
b2caed9651
[minor] fix a3c pytorch example dim 80 => 84
2018-10-31 22:00:14 -07:00
Eric Liang
cd284bb487
[rllib] Document env compatibility, Ape-X support for multi-agent (#3147)
2018-10-31 21:59:34 -07:00
Richard Liaw
2086a57e61
[tune] Add Fractional GPU example/docs (#3169)
* Add example for fractional GPU support
* Update tune_mnist_keras.py
* Update doc/source/tune-usage.rst
2018-10-31 18:53:16 -07:00
Robert Nishihara
1f29a960f4
Update task_table and object_table API. (#3161)
* Update task_table and object_table API.
* Fix
2018-10-31 12:52:50 -07:00
Dennis Chung
9df2e6e6f4
[tune] Modify stop criteria in hyperopt example (#3102)
Change the stop criterion from `training_iteration` to `timesteps_total`, because only `timesteps_total` is reported by the reporter.
2018-10-30 13:26:40 -07:00
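Why the stop key has to match a reported metric can be sketched with a tiny stop check. This is a hypothetical helper, not Tune's implementation: in this sketch a criterion whose key never appears in the reported result simply never fires, while Tune itself may handle a missing key differently.

```python
from typing import Dict


def should_stop(result: Dict[str, float], stop: Dict[str, float]) -> bool:
    """Hypothetical stop check: True once any reported metric reaches its threshold.

    In this sketch a criterion whose key is missing from `result` is skipped,
    so it can never trigger a stop.
    """
    for key, threshold in stop.items():
        if key in result and result[key] >= threshold:
            return True
    return False


stop = {"timesteps_total": 100}
print(should_stop({"timesteps_total": 50}, stop))   # False
print(should_stop({"timesteps_total": 120}, stop))  # True
# A criterion on a key the reporter never emits has no effect here.
print(should_stop({"timesteps_total": 120}, {"training_iteration": 1}))  # False
```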
Eric Liang
a221f55b0d
[rllib] Add custom value functions, fix up and document multi-agent variable sharing (#3151)
2018-10-29 19:37:27 -07:00
Robert Nishihara
e49839c73f
Fix linting. (#3155)
2018-10-28 20:43:29 -07:00
Robert Nishihara
32f0d6b77e
Deprecate num_workers argument to ray.init and ray start. (#3114)
* Remove num_workers argument.
* Fix
* Fix
2018-10-28 20:12:49 -07:00
Robert Nishihara
9868af4c7c
Use /tmp instead of /dev/shm for object store on Linux if /dev/shm is too small. (#3149)
* Use /tmp instead of /dev/shm for object store on Linux if /dev/shm is too small.
* Add logging statement and address comments.
* Fix
2018-10-28 20:09:06 -07:00
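The fallback described above can be sketched as a small directory chooser. The helper name, its parameters, and the use of free space as the "too small" test are assumptions for illustration, not Ray's actual check.

```python
import logging
import os
import shutil
import sys


def choose_object_store_dir(required_bytes: int,
                            shm_dir: str = "/dev/shm",
                            fallback_dir: str = "/tmp") -> str:
    """Hypothetical helper: pick where the object store's memory-mapped files live.

    Prefer shared memory on Linux when it has enough free space; otherwise
    fall back to a regular directory (usually slower) and log why.
    """
    if sys.platform.startswith("linux") and os.path.isdir(shm_dir):
        free = shutil.disk_usage(shm_dir).free
        if free >= required_bytes:
            return shm_dir
        logging.warning("%s has only %d bytes free (need %d); using %s instead.",
                        shm_dir, free, required_bytes, fallback_dir)
    return fallback_dir


print(choose_object_store_dir(10 * 1024 ** 3))
```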
Robert Nishihara
08fc9e5bcd
Add more description to setup.py. (#3153)
2018-10-28 19:49:52 -07:00
Robert Nishihara
fd854ff090
Allow the node manager port and object manager port to be set through… (#3130)
* Allow the node manager port and object manager port to be set through ray start.
* Linting
* Fix Java test
* Address comments.
2018-10-28 17:28:41 -07:00
Eric Liang
a404401dc6
Update agent.py to fix lint error
2018-10-28 15:28:08 -07:00
Jones Wong
d6bf890648
Solve hang caused by ray.get in collect_metrics (#3096)
2018-10-28 11:52:18 -07:00
Eric Liang
af0c1174cd
[sgd] Merge sharded param server based SGD implementation (#3033)
This includes most of the TF code used for the OSDI experiment. Perf sanity check on p3.16xl instances: overall scaling looks OK, with the best multi-node result (1712 images/s) within about 6% of the final OSDI number (1817). This seems reasonable given that hugepages are not enabled here and the param server shards are placed randomly.
$ RAY_USE_XRAY=1 ./test_sgd.py --gpu --batch-size=64 --num-workers=N \
    --devices-per-worker=M --strategy=<simple|ps> \
    --warmup --object-store-memory=10000000000
Images per second (total)
GPUs total              | simple | ps
========================================
1                       | 218    |
2 (1 worker)            | 388    |
4 (1 worker)            | 759    |
4 (2 workers)           | 176    | 623
8 (1 worker)            | 985    |
8 (2 workers)           | 349    | 1031
16 (2 nodes, 2 workers) | 600    | 1661
16 (2 nodes, 4 workers) | 468    | 1712  <--- OSDI perf was 1817
2018-10-27 21:25:02 -07:00
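As a worked check of the table above, the sketch below computes how far the best 16-GPU ps run falls short of the OSDI figure, and its efficiency relative to perfect linear scaling. Using the 1-GPU row as the linear-scaling baseline is an assumption made for illustration.

```python
def relative_gap(measured: float, reference: float) -> float:
    """Fraction by which `measured` falls short of `reference`."""
    return (reference - measured) / reference


def scaling_efficiency(throughput: float, gpus: int, single_gpu: float) -> float:
    """Throughput as a fraction of perfect linear scaling from one GPU."""
    return throughput / (gpus * single_gpu)


ONE_GPU = 218      # images/s, 1 GPU
PS_16_GPU = 1712   # images/s, 16 GPUs (2 nodes, 4 workers), ps strategy
OSDI_16_GPU = 1817

print(f"gap to OSDI: {relative_gap(PS_16_GPU, OSDI_16_GPU):.1%}")               # gap to OSDI: 5.8%
print(f"scaling efficiency: {scaling_efficiency(PS_16_GPU, 16, ONE_GPU):.1%}")  # scaling efficiency: 49.1%
```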
Eric Liang
6531eed2d0
[rllib] Better error message when action space dim too high (#3119)
2018-10-26 16:55:00 -07:00
Robert Nishihara
658c14282c
Remove legacy Ray code. (#3121)
* Remove legacy Ray code.
* Fix cmake and simplify monitor.
* Fix linting
* Updates
* Fix
* Implement some methods.
* Remove more plasma manager references.
* Fix
* Linting
* Fix
* Fix
* Make sure class IDs are strings.
* Some path fixes
* Fix
* Path fixes and update arrow
* Fixes.
* linting
* Fixes
* Java fixes
* Some java fixes
* TaskLanguage -> Language
* Minor
* Fix python test and remove unused method signature.
* Fix java tests
* Fix jenkins tests
* Remove commented out code.
2018-10-26 13:36:58 -07:00
Eric Liang
055daf17a0
[autoscaler] better message if there are more than 10 key pairs
2018-10-26 12:42:11 -07:00
Philipp Moritz
d3148cc3ab
[SGD] Provide better error message if model graphs have different numbers of variables (#3139)
2018-10-25 22:18:10 -07:00
Robert Nishihara
5aa29613db
Fix linting errors. (#3127)
2018-10-24 16:30:00 -07:00
Eric Liang
55d161b49f
[autoscaler] Also grant roles to worker nodes
2018-10-24 13:57:36 -07:00
Robert Nishihara
9c1826ed69
Use XRay backend by default. (#3020)
* Use XRay backend by default.
* Remove irrelevant valgrind tests.
* Fix
* Move tests around.
* Fix
* Fix test
* Fix test.
* String/unicode fix.
* Fix test
* Fix unicode issue.
* Minor changes
* Fix bug in test_global_state.py.
* Fix test.
* Linting
* Try arrow change and other object manager changes.
* Use newer plasma client API
* Small updates
* Revert plasma client api change.
* Update
* Update arrow and allow SendObjectHeaders to fail.
* Update arrow
* Update python/ray/experimental/state.py
Co-Authored-By: robertnishihara <robertnishihara@gmail.com>
* Address comments.
2018-10-23 12:46:39 -07:00
Robert Nishihara
9d2e864caf
Fix Python linting error. (#3113)
2018-10-22 23:41:42 -07:00
Eric Liang
73a092e08c
update (#3112)
2018-10-22 22:55:43 -07:00
Richard Liaw
eff7cb4458
[tune] Fix SearchAlg finishing early (#3081)
* Fix trial search alg finishing early
* Fix lint
* fix lint
* nit fix
2018-10-22 12:17:13 -07:00
Eric Liang
221d1663c1
[rllib] switch to python logger (#3098)
* logging
* set rllib logger
* comment
* info
* rllib
* comment
* add format
* fix lint
* add file info
* update
* add ts
* lint
* better docs
* fix value error
* soft log level
2018-10-21 23:43:57 -07:00
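The bullets above mention a log format, file info, a timestamp, and a "soft" log level. A minimal sketch of such a setup with the standard `logging` module follows; the logger name and the reading of "soft" as "only set a level if none is configured yet" are assumptions, not RLlib's actual code.

```python
import logging


def get_logger(name: str = "ray.rllib", default_level: str = "INFO") -> logging.Logger:
    """Hypothetical setup: a module logger with timestamp and file:line info."""
    logger = logging.getLogger(name)
    # "Soft" default: only apply a level if nobody has configured one yet.
    if logger.level == logging.NOTSET:
        logger.setLevel(default_level)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter(
            "%(asctime)s\t%(levelname)s %(filename)s:%(lineno)d -- %(message)s"))
        logger.addHandler(handler)
    return logger


logger = get_logger()
logger.info("starting training")
logger.debug("hidden at the default INFO level")
```

Keeping the level "soft" lets a user who already called `setLevel` on the logger keep their choice across repeated setup calls.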
Richard Liaw
40c4148d4f
Cluster Utilities for Fault Tolerance Tests (#3008)
2018-10-20 22:56:29 -07:00
Eric Liang
59901a88a0
[rllib] Native support for Dict and Tuple spaces; fix Tuple action spaces; add prev a, r to LSTM (#3051)
2018-10-20 15:21:22 -07:00
Peter Schafhalter
fa469783d8
Fix bug when connecting to password-secured cluster (#3083)
2018-10-18 21:43:03 -07:00
Devin Petersohn
8fcdafc6ea
Adding Python3.7 wheels support (#2546)
* Adding Python3.7 wheels support
* Adding Mac wheels update
* fix
* numpy version
* choose different numpy versions depending on python version
* fix
2018-10-18 17:58:39 -07:00
Peter Schafhalter
b82fd157a7
Remove Redis protected mode (#3073)
Follow-up to #2925 and #2952. Removes the Redis protected mode implementation from Ray, which has been replaced by Redis port authentication.
2018-10-17 22:48:14 -07:00
Philipp Moritz
2c52d9dfa0
Fix actor handle id creation when actor handle was pickled (#3074)
2018-10-17 18:00:52 -07:00
Richard Liu
3c0803e7e9
[rllib] use ray.wait to get next worker result in async sample optimizer (#2993)
2018-10-17 17:44:51 -07:00
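The "take whichever worker finishes first" pattern can be sketched without a Ray cluster. This uses the standard library's `concurrent.futures.wait(..., return_when=FIRST_COMPLETED)` as a stand-in for `ray.wait(..., num_returns=1)`; it is an analogue for illustration, not the optimizer's code, and `sample` is a hypothetical rollout function.

```python
import concurrent.futures as cf
import random
import time
from typing import List, Tuple


def sample(worker_id: int) -> Tuple[int, int]:
    """Hypothetical stand-in for a remote rollout; runtime varies per call."""
    time.sleep(random.uniform(0.01, 0.05))
    return worker_id, random.randint(0, 100)


def collect(num_workers: int, num_results: int) -> List[Tuple[int, int]]:
    """Consume results in completion order, re-tasking each worker as it finishes."""
    results: List[Tuple[int, int]] = []
    with cf.ThreadPoolExecutor(max_workers=num_workers) as pool:
        pending = {pool.submit(sample, w): w for w in range(num_workers)}
        while len(results) < num_results:
            # Like ray.wait: block until at least one result is ready.
            done, _ = cf.wait(pending, return_when=cf.FIRST_COMPLETED)
            for fut in done:
                worker = pending.pop(fut)
                results.append(fut.result())
                if len(results) == num_results:
                    break
                # Immediately hand the idle worker its next task.
                pending[pool.submit(sample, worker)] = worker
    return results


print(len(collect(num_workers=4, num_results=10)))
```

Waiting on the first completed result, rather than on workers in a fixed order, keeps a slow worker from stalling the fast ones.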
Peter Schafhalter
a41bbc10ef
Add password authentication to Redis ports (#2952)
* Implement Redis authentication
* Throw exception for legacy Ray
* Add test
* Formatting
* Fix bugs in CLI
* Fix bugs in Raylet
* Move default password to constants.h
* Use pytest.fixture
* Fix bug
* Authenticate using formatted strings
* Add missing passwords
* Add test
* Improve authentication of async contexts
* Disable Redis authentication for credis
* Update test for credis
* Fix rebase artifacts
* Fix formatting
* Add workaround for issue #3045
* Increase timeout for test
* Improve C++ readability
* Fixes for CLI
* Add security docs
* Address comments
* Address comments
* Address comments
* Use ray.get
* Fix lint
2018-10-16 22:48:30 -07:00
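With password authentication enabled, a Redis client sends `AUTH <password>` before any other command. The sketch below shows how that command is framed on the wire in the Redis protocol (RESP, an array of bulk strings). It is not Ray's client code, and the password is a made-up placeholder.

```python
def encode_command(*args: str) -> bytes:
    """Encode a Redis command as a RESP array of bulk strings."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode("utf-8")
        # Bulk string: "$<byte length>\r\n<bytes>\r\n"
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)


# Hypothetical password; the client must send this before any other command.
print(encode_command("AUTH", "example-password"))
# b'*2\r\n$4\r\nAUTH\r\n$16\r\nexample-password\r\n'
```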
Eric Liang
a9e454f6fd
[rllib] Include config dicts in the sphinx docs (#3064)
2018-10-16 15:55:11 -07:00
Praveen Palanisamy
4d8cfc0bf5
[tune] Fix (some more) misleading comments in tune/results.py (#3068)
## What do these changes do?
Fix the misleading comments in code for:
- `EPISODES_THIS_ITER`
- `EPISODES_TOTAL`
I had noted this earlier and planned to fix it along with some other changes, but it seemed closely related to #3058, so I am sending it now.
2018-10-16 11:07:53 -07:00
Eric Liang
6240ccbc6e
[rllib] Add more warnings when multi-agent envs might not be set up right (#3061)
2018-10-15 13:42:56 -07:00
Eric Liang
3c891c6ece
[rllib] Parallel-data loading and multi-gpu support for IMPALA (#2766)
2018-10-15 11:02:50 -07:00
Marlon
4dc78b735b
[tune] Fix misleading comment (#3058)
2018-10-14 22:25:39 -07:00
Eric Liang
866c7a574c
[rllib] Don't crash printing out error message (#3054)
* fix er
* update
2018-10-13 19:50:23 -07:00
Eric Liang
473ee4eb3f
[rllib] Add unit test and some better error messages for custom policy states (#3032)
2018-10-13 00:03:52 -07:00
Richard Liaw
f9b58d7b02
[tune] Tweaks to Trainable and Verbosity (#2889)
2018-10-11 23:42:13 -07:00
Kristian Hartikainen
2d35a97a76
Bug/log syncer fails with parentheses (#2653)
* Update rsync command
* Escape rsync locations
* Fix the accidental variable move
* Update rsync to use -s flag
2018-10-06 00:34:53 -07:00
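The fix above (escaping the locations and adding `-s`, i.e. rsync's `--protect-args`) can be sketched as a command builder. The helper names, the `-avz` flags, and the example paths are assumptions for illustration; only `-s` comes from the commit.

```python
import shlex
from typing import List


def rsync_argv(source: str, target: str) -> List[str]:
    """Argument list for syncing logs; paths may contain spaces or parentheses."""
    # -s (--protect-args) keeps the remote shell from re-splitting the paths.
    return ["rsync", "-savz", source, target]


def rsync_shell_command(source: str, target: str) -> str:
    """Same command as one shell string, with each location quoted."""
    return " ".join(shlex.quote(a) for a in rsync_argv(source, target))


src = "/tmp/ray/exp (1)/"
dst = "ubuntu@10.0.0.2:/home/ubuntu/ray_results/exp (1)/"
print(rsync_shell_command(src, dst))
```

Where no local shell is needed, passing `rsync_argv(...)` directly to `subprocess.run` sidesteps local quoting altogether; `-s` still matters for the remote side.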
Richard Liaw
ecd8f39580
[core] Improve logging message when plasma store is started. (#3029)
2018-10-05 15:24:24 -07:00
Richard Liaw
0651d3b629
[tune/core] Use Global State API for resources (#3004)
2018-10-04 17:23:17 -07:00
Robert Nishihara
faa31ae018
Introduce concept of resources required for placing a task. (#2837)
* Introduce concept of resources required for placement.
* Add placement resources to task spec
* Update java worker
* Update taskinfo.java
2018-10-04 10:35:39 -07:00
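The split between the resources needed to place a task and the resources needed to run it can be sketched with a toy task spec. The field names and the fall-back rule are hypothetical, not Ray's task-spec layout; the example assumes a task that needs a CPU to be scheduled but holds none while running.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class TaskSpec:
    """Toy task spec; field names are illustrative only."""
    required_resources: Dict[str, float]
    placement_resources: Dict[str, float] = field(default_factory=dict)

    def resources_for_placement(self) -> Dict[str, float]:
        # With no separate placement set, placing a task needs what running it needs.
        return self.placement_resources or self.required_resources


def can_place(task: TaskSpec, available: Dict[str, float]) -> bool:
    """A node is feasible if it has every placement resource in sufficient quantity."""
    needed = task.resources_for_placement()
    return all(available.get(name, 0.0) >= qty for name, qty in needed.items())


# Needs 1 CPU to be placed, but holds 0 CPUs once running.
actor_like = TaskSpec(required_resources={"CPU": 0}, placement_resources={"CPU": 1})
print(can_place(actor_like, {"CPU": 0}))  # False
print(can_place(actor_like, {"CPU": 2}))  # True
```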