481 commits

Author SHA1 Message Date
Eric Liang
abdc3b592e
[rllib] Update multi-gpu impala numbers (#3327) 2018-11-19 20:55:27 -08:00
Eric Liang
61e3bbbfee
Update stale example links 2018-11-17 15:40:38 -08:00
Robert Nishihara
98edf752a9 Note requirement cython==0.27.3 in installation instructions. (#3322) 2018-11-15 15:27:19 -08:00
Eric Liang
706dc1d473
[rllib] Add test for multi-agent support and fix IMPALA multi-agent (#3289)
IMPALA support for multi-agent was broken because IMPALA requires that batch sizes be of a certain length. However, multi-agent envs can create variable-length batches.

Fix this by adding zero-padding as needed (similar to the RNN case).
2018-11-14 14:14:07 -08:00
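The zero-padding fix described in the commit above can be illustrated with a minimal sketch. It assumes the length requirement means "a multiple of a fixed chunk length"; the helper names `pad_to_multiple` and `pad_mask` are hypothetical and are not taken from the RLlib code.

```python
def pad_to_multiple(values, chunk_len, pad_value=0):
    """Right-pad `values` so its length is a multiple of `chunk_len`.

    Hypothetical helper: the real fix lives in RLlib, this only shows the idea.
    """
    if chunk_len <= 0:
        raise ValueError("chunk_len must be positive")
    remainder = len(values) % chunk_len
    if remainder == 0:
        # Already a valid length, return an unpadded copy.
        return list(values)
    return list(values) + [pad_value] * (chunk_len - remainder)


def pad_mask(orig_len, padded_len):
    """Return 1 for real entries and 0 for padding, so a loss can ignore padding."""
    return [1] * orig_len + [0] * (padded_len - orig_len)


# A variable-length batch of 5 rewards padded to a chunk length of 4.
rewards = [1.0, 0.5, 0.0, 2.0, 1.5]
padded = pad_to_multiple(rewards, chunk_len=4)
mask = pad_mask(len(rewards), len(padded))
```

A mask like this is a common companion to padding, since padded entries should not contribute to the training loss.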
Eric Liang
65c27c70cf [rllib] Clean up agent resource configurations (#3296)
Closes #3284
2018-11-13 18:00:03 -08:00
Philipp Moritz
d4fad222e1 Update profiling instructions for raylet (#3311) 2018-11-13 17:48:33 -05:00
Richard Liaw
c3a2c7ebed [tune] Doc: Autofilled, StatusReporter (#3294)
* autofill and revise doc page for things

* lint

* comments
2018-11-13 13:15:56 -08:00
Eric Liang
d90f365394 [rllib] Add self-supervised loss to model (#3291)
# What do these changes do?

Allow self-supervised losses to be easily defined in custom models. Add this to the reference policy graphs.
2018-11-12 18:55:24 -08:00
Eric Liang
bd0dbde149
[rllib] Rename ServingEnv => ExternalEnv (#3302) 2018-11-12 16:31:27 -08:00
Eric Liang
53489d2f85
[sgd] Document and add simple MNIST example (#3236) 2018-11-10 21:52:20 -08:00
Eric Liang
9dd3eedbac [rllib] rollout.py should reduce num workers (#3263)
## What do these changes do?

Don't create an excessive number of workers for rollout.py, and also fix up the env wrapping to be consistent with the internal agent wrapper.

## Related issue number

Closes #3260.
2018-11-09 12:29:16 -08:00
Richard Liaw
22113be04c
[tune] Annotated Example Page and showcase Tutorials (#3267)
Adds an example page and link in codebase.

Closes #2728.
2018-11-08 23:45:05 -08:00
eugenevinitsky
344b4ef0ff [rllib] Fix filter sync for ES and ARS (#2918) 2018-11-06 19:09:34 -08:00
Eric Liang
725df3a485 Set the process title in workers and actors (#3219) 2018-11-06 14:59:22 -08:00
Eric Liang
369cb833fe
[rllib] Implement custom metrics (#3144) 2018-11-03 18:48:32 -07:00
Eric Liang
9a0f0db070 Add ray stack tool for debugging (#3213) 2018-11-03 13:13:02 -07:00
Si-Yuan
5ce7ed7dad Fix 'tempfile' docs (#3180)
* Fix docs.

* Update doc/source/tempfile.rst

Co-Authored-By: suquark <suquark@gmail.com>

* Remove doc for raylet socket.
2018-11-02 16:50:55 -07:00
Eric Liang
8c03683573 Add warning about using latest wheels (#3207) 2018-11-02 15:41:10 -07:00
Robert Nishihara
e495ab5e7c Fix some paths /tmp/raylogs -> /tmp/ray. (#3189) 2018-11-02 12:10:53 -07:00
Robert Nishihara
57d6e98302 Update actor fault tolerance documentation. (#3175) 2018-11-01 11:52:05 -07:00
Robert Nishihara
60f28040ea Document fractional resources. (#3174) 2018-11-01 10:50:56 -07:00
Eric Liang
cd284bb487
[rllib] Document env compatibility, Ape-X support for multi-agent (#3147) 2018-10-31 21:59:34 -07:00
Richard Liaw
2086a57e61
[tune] Add Fractional GPU example/docs (#3169)
* Add example for fractional GPU support

* Update tune_mnist_keras.py

* Update doc/source/tune-usage.rst
2018-10-31 18:53:16 -07:00
Eric Liang
a221f55b0d
[rllib] Add custom value functions, fix up and document multi-agent variable sharing (#3151) 2018-10-29 19:37:27 -07:00
Eric Liang
af0c1174cd
[sgd] Merge sharded param server based SGD implementation (#3033)
This includes most of the TF code used for the OSDI experiment. Perf sanity check on p3.16xl instances: Overall scaling looks ok, with the multi-node results within 5% of OSDI final numbers. This seems reasonable given that hugepages are not enabled here, and the param server shards are placed randomly.

$ RAY_USE_XRAY=1 ./test_sgd.py --gpu --batch-size=64 --num-workers=N \
  --devices-per-worker=M --strategy=<simple|ps> \
  --warmup --object-store-memory=10000000000

Images per second total
gpus total              | simple | ps
========================================
1                       | 218
2 (1 worker)            | 388
4 (1 worker)            | 759
4 (2 workers)           | 176    | 623
8 (1 worker)            | 985
8 (2 workers)           | 349    | 1031
16 (2 nodes, 2 workers) | 600    | 1661
16 (2 nodes, 4 workers) | 468    | 1712   <--- OSDI perf was 1817
2018-10-27 21:25:02 -07:00
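The comparison with the OSDI number in the commit above can be checked with a little arithmetic. The sketch below uses only figures copied from the table (images per second); the function names are hypothetical. On these figures, the 16-GPU, 4-worker `ps` result comes out at roughly 94% of the quoted OSDI figure.

```python
# Figures copied from the table in the commit message (images per second).
SINGLE_GPU = 218             # 1 GPU, simple strategy
PS_16_GPU_4_WORKERS = 1712   # 16 GPUs, 2 nodes, 4 workers, ps strategy
OSDI_REFERENCE = 1817        # "OSDI perf" quoted next to the last row


def ratio_to_reference(measured, reference):
    """Fraction of the reference throughput that was reached."""
    return measured / reference


def scaling_efficiency(measured, num_gpus, single_gpu):
    """Measured throughput relative to perfect linear scaling from one GPU."""
    return measured / (num_gpus * single_gpu)


vs_osdi = ratio_to_reference(PS_16_GPU_4_WORKERS, OSDI_REFERENCE)
efficiency = scaling_efficiency(PS_16_GPU_4_WORKERS, 16, SINGLE_GPU)
print(f"fraction of OSDI throughput: {vs_osdi:.3f}")
print(f"scaling efficiency at 16 GPUs: {efficiency:.3f}")
```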
Robert Nishihara
658c14282c Remove legacy Ray code. (#3121)
* Remove legacy Ray code.

* Fix cmake and simplify monitor.

* Fix linting

* Updates

* Fix

* Implement some methods.

* Remove more plasma manager references.

* Fix

* Linting

* Fix

* Fix

* Make sure class IDs are strings.

* Some path fixes

* Fix

* Path fixes and update arrow

* Fixes.

* linting

* Fixes

* Java fixes

* Some java fixes

* TaskLanguage -> Language

* Minor

* Fix python test and remove unused method signature.

* Fix java tests

* Fix jenkins tests

* Remove commented out code.
2018-10-26 13:36:58 -07:00
Richard Liaw
eff7cb4458 [tune] Fix SearchAlg finishing early (#3081)
* Fix trial search alg finishing early

* Fix lint

* fix lint

* nit fix
2018-10-22 12:17:13 -07:00
Eric Liang
59901a88a0
[rllib] Native support for Dict and Tuple spaces; fix Tuple action spaces; add prev a, r to LSTM (#3051) 2018-10-20 15:21:22 -07:00
Robert Nishihara
9a2b5333ef Add links for latest Python 3.7 wheels to documentation. (#3091) 2018-10-19 12:15:22 -07:00
Peter Schafhalter
a41bbc10ef Add password authentication to Redis ports (#2952)
* Implement Redis authentication

* Throw exception for legacy Ray

* Add test

* Formatting

* Fix bugs in CLI

* Fix bugs in Raylet

* Move default password to constants.h

* Use pytest.fixture

* Fix bug

* Authenticate using formatted strings

* Add missing passwords

* Add test

* Improve authentication of async contexts

* Disable Redis authentication for credis

* Update test for credis

* Fix rebase artifacts

* Fix formatting

* Add workaround for issue #3045

* Increase timeout for test

* Improve C++ readability

* Fixes for CLI

* Add security docs

* Address comments

* Address comments

* Address comments

* Use ray.get

* Fix lint
2018-10-16 22:48:30 -07:00
Eric Liang
a9e454f6fd
[rllib] Include config dicts in the sphinx docs (#3064) 2018-10-16 15:55:11 -07:00
Eric Liang
3c891c6ece
[rllib] Parallel-data loading and multi-gpu support for IMPALA (#2766) 2018-10-15 11:02:50 -07:00
Richard Liaw
f9b58d7b02
[tune] Tweaks to Trainable and Verbosity (#2889) 2018-10-11 23:42:13 -07:00
Robert Nishihara
d73ee36e60 Update links to use latest 0.5.3 wheels instead of 0.5.2. (#3018) 2018-10-03 13:43:40 -07:00
Si-Yuan
cc7e2ecdd5 Change logfile names and also allow plasma store socket to be passed in. (#2862) 2018-10-03 10:03:53 -07:00
Eric Liang
b45bed4bce
[rllib] Propagate model options correctly in ARS / ES, to action dist of PPO (#2974)
* fix

* fix

* fix it

* propagate conf to action dist

* move carla example too

* rr

* Update policies.py

* wip

* lint
2018-10-01 12:49:39 -07:00
Eric Liang
814c35b7d7
[rllib] Simplify sample batch size and num envs config, n_step adjustment (#2995)
* simplify vec batch requirements

* Update rllib-training.rst

* Update rllib-training.rst

* Update rllib-training.rst

* Update rllib-training.rst

* Update rllib-training.rst

* Update rllib-models.rst
2018-09-30 18:36:22 -07:00
Eric Liang
b06c604a51
[rllib] Add some more tuned atari results to documentation (#2991)
* dqn results ++

* add scale

* hour

* fix

* small dqn table

* update

* steps

* upd

* apex

* up

* add apex results

* tip
2018-09-29 23:13:36 -07:00
Richard Liaw
1c9617bc1c
[autoscaler] Add tmux support for attach and exec (#2907)
Adds a tmux flag that can be used to support background execution of experiments. It cannot be used together with screen. This seems to be a useful feature that has come up with different users.
2018-09-26 23:22:45 -07:00
Eric Liang
3cde5957b3
[rllib] Better document APIs to access policy state (#2932)
* fix

* doc

* example

* up
2018-09-24 19:08:32 -07:00
Robert Nishihara
ea9d1cc887 Remove dependence on psutil. Add utility functions for getting system memory. (#2892) 2018-09-18 15:03:29 +08:00
Joerg Schad
a1b8e79c30 Fixed Typo. (#2865) 2018-09-13 13:32:56 +08:00
Robert Nishihara
3f6ed537a4 Add ray.is_initialized() function. (#2818)
* Add ray.is_initialized() function.

* Add assert.
2018-09-06 21:20:59 -07:00
Eric Liang
995ac24a2c
[rllib] clarify train batch size for PPO (#2793)
It's possible to configure PPO in a way that ends up discarding most of the samples (they are treated as "stragglers"). Add a warning when this happens, and raise an exception if the waste is particularly egregious.
2018-09-05 12:06:13 -07:00
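The warn-then-raise behaviour described in the commit above could look roughly like the sketch below. The function name `check_sample_waste` and both thresholds are assumptions chosen for illustration; the actual RLlib check and its limits are not shown in the commit message.

```python
import warnings


def check_sample_waste(collected, train_batch_size,
                       warn_frac=0.1, error_frac=0.5):
    """Warn or raise when many collected samples would be discarded.

    Hypothetical helper: `warn_frac` and `error_frac` are illustrative
    thresholds, not values taken from RLlib.
    """
    if collected <= 0:
        raise ValueError("collected must be positive")
    # Samples beyond the train batch size are treated as discarded stragglers.
    wasted = max(collected - train_batch_size, 0)
    frac = wasted / collected
    if frac > error_frac:
        raise ValueError(
            f"{frac:.0%} of collected samples would be discarded; "
            "adjust the train batch size or the number of samples collected.")
    if frac > warn_frac:
        warnings.warn(f"{frac:.0%} of collected samples will be discarded.")
    return frac
```

Returning the wasted fraction keeps the check easy to log and to test, while the two thresholds separate a nuisance from a misconfiguration.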
Eric Liang
df4788e501
[rllib/tune] Add test for fractional gpu support in xray mode; add rllib support for fractional gpu (#2768)
* frac gpu

* doc

* Update rllib-training.rst

* yapf

* remove xray
2018-09-03 11:12:23 -07:00
Philipp Moritz
4db196438b fix 'from ray.rllib import ppo' in doc (#2794) 2018-08-31 23:34:47 -07:00
Robert Nishihara
5021795190 Update documents to replace 0.5.0 with 0.5.2. (#2761)
* Update documents to replace 0.5.0 with 0.5.1.

* Update documentation from 0.5.1 -> 0.5.2.
2018-08-29 21:05:09 -07:00
Praveen Palanisamy
357c0d6156 [tune] Adds option to checkpoint at end of trials (#2754)
* Added checkpoint_at_end option. To fix #2740

* Added ability to checkpoint at the end of trials if the option is set to True

* checkpoint_at_end option added; Consistent with Experience and Trial runner

* checkpoint_at_end option mentioned in the tune usage guide

* Moved the redundant checkpoint criteria check out of the if-elif

* Added note that checkpoint_at_end is enabled only when checkpoint_freq is not 0

* Added test case for checkpoint_at_end

* Made checkpoint_at_end have an effect regardless of checkpoint_freq

* Removed comment from the test case

* Fixed the indentation

* Fixed pep8 E231

* Handled cases when trainable does not have _save implemented

* Constrained test case to a particular exp using the MockAgent

* Revert "Constrained test case to a particular exp using the MockAgent"

This reverts commit e965a9358ec7859b99a3aabb681286d6ba3c3906.

* Revert "Handled cases when trainable does not have _save implemented"

This reverts commit 0f5382f996ff0cbf3d054742db866c33494d173a.

* Simpler test case for checkpoint_at_end

* Preserved bools from losing their actual value

* Revert "Moved the redundant checkpoint criteria check out of the if-elif"

This reverts commit 783005122902240b0ee177e9e206e397356af9c5.

* Fix linting error.
2018-08-29 13:14:17 -07:00
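The interaction between `checkpoint_freq` and `checkpoint_at_end` described in the commit above can be sketched as a single decision function. The function name `should_checkpoint` and its signature are hypothetical, and iterations are assumed to count from 1; the sketch only reflects the stated rule that `checkpoint_at_end` takes effect regardless of `checkpoint_freq`.

```python
def should_checkpoint(iteration, checkpoint_freq, checkpoint_at_end,
                      trial_finished):
    """Decide whether to save a checkpoint after this training iteration.

    Hypothetical helper, not the Tune implementation. `iteration` is
    assumed to start at 1.
    """
    # A final checkpoint is taken even when periodic checkpointing is off.
    if trial_finished and checkpoint_at_end:
        return True
    # Periodic checkpointing: a frequency of 0 disables it.
    return checkpoint_freq > 0 and iteration % checkpoint_freq == 0
```

For example, with `checkpoint_freq=0` and `checkpoint_at_end=True`, only the final iteration of a trial produces a checkpoint.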
Eric Liang
69d1354016
[rllib] Document ARS & rainbow (#2744)
* wip

* rainbow doc too

* e not used

* fix ppo doc

* clean list

* use same title
2018-08-28 18:13:36 -07:00
Robert Nishihara
5fd44afb8a Add note about huge pages using up memory. (#2733)
* Add note about huge pages using up memory.

* Update doc

* Update
2018-08-24 17:02:54 -07:00