Commit graph

1494 commits

Author SHA1 Message Date
Philipp Moritz
c1e8f9477a add Ray paper (#1387) 2018-01-02 16:33:07 -08:00
Eric Liang
1bc55e182d Update the pip wheel in example.yaml and add docs (#1381) 2018-01-01 13:02:05 -08:00
Eric Liang
6e6674a824
[rllib] Split docs into user and development guide (#1377)
* docs

* Update README.rst

* Sat Dec 30 15:23:49 PST 2017

* comments

* Sun Dec 31 23:33:30 PST 2017

* Sun Dec 31 23:33:38 PST 2017

* Sun Dec 31 23:37:46 PST 2017

* Sun Dec 31 23:39:28 PST 2017

* Sun Dec 31 23:43:05 PST 2017

* Sun Dec 31 23:51:55 PST 2017

* Sun Dec 31 23:52:51 PST 2017
2018-01-01 11:10:44 -08:00
Eric Liang
b6c42f96be
Auto-scale ray clusters based on GCS load metrics (#1348)
This adds (experimental) auto-scaling support for Ray clusters based on GCS load metrics. The auto-scaling algorithm is as follows:

1. Based on current (instantaneous) load information, we compute the approximate number of "used workers". This is based on the bottleneck resource, e.g. if 8/8 GPUs are used in an 8-node cluster but all the CPUs are idle, the number of used nodes is still counted as 8. This number can also be fractional.
2. We scale that number by 1 / target_utilization_fraction and round up to determine the target cluster size (subject to the max_workers constraint). The autoscaler control loop takes care of launching new nodes until the target cluster size is met.
3. When a node is idle for more than idle_timeout_minutes, we remove it from the cluster if that would not drop the cluster size below min_workers.

Note that we'll need to update the wheel in the example yaml file after this PR is merged.
2017-12-31 14:39:57 -08:00
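
The sizing rule described in b6c42f96be (#1348) above can be sketched roughly as follows. This is an illustrative reading of the commit message, not the code from the PR: the function names (approx_used_workers, target_cluster_size, nodes_to_remove) and the input shapes are hypothetical, and applying min_workers as a floor on the target and removing the longest-idle nodes first are assumptions. Only target_utilization_fraction, min_workers, max_workers and idle_timeout_minutes come from the message itself.

```python
import math


def approx_used_workers(num_workers, usage):
    """Estimate "used workers" from the bottleneck resource.

    usage maps a resource name to (used, total) summed over the cluster
    (hypothetical input shape). The result may be fractional.
    """
    fractions = [used / total for used, total in usage.values() if total > 0]
    return num_workers * max(fractions, default=0.0)


def target_cluster_size(num_workers, usage, target_utilization_fraction,
                        min_workers, max_workers):
    """Scale used workers by 1 / target_utilization_fraction and round up."""
    used = approx_used_workers(num_workers, usage)
    target = math.ceil(used / target_utilization_fraction)
    # max_workers is the cap named in the commit message; using
    # min_workers as a floor here is an assumption of this sketch.
    return max(min_workers, min(max_workers, target))


def nodes_to_remove(idle_minutes, idle_timeout_minutes, min_workers):
    """Pick idle nodes to remove without dropping below min_workers.

    idle_minutes maps a node id to how long it has been idle
    (hypothetical input shape). Longest-idle nodes are chosen first,
    which is an assumption of this sketch.
    """
    removable = max(len(idle_minutes) - min_workers, 0)
    expired = [n for n, m in idle_minutes.items() if m > idle_timeout_minutes]
    expired.sort(key=lambda n: idle_minutes[n], reverse=True)
    return expired[:removable]
```

With the example from the commit message (8/8 GPUs busy and all CPUs idle on an 8-node cluster), approx_used_workers returns 8.0, so a target_utilization_fraction below 1 asks for more than 8 nodes until max_workers caps the result.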
Robert Nishihara
e970e24ea5 Update arrow, and pass memcopy_threads into put. (#1374) 2017-12-31 13:32:06 -08:00
Richard Liaw
3304099cc4
[rllib] Evaluators and Optimizers Refactoring (#1339) 2017-12-30 00:24:54 -08:00
Eric Liang
22c7c87e14 [rllib] [tune] Custom preprocessors and models, various fixes (#1372) 2017-12-28 13:19:04 -08:00
Philipp Moritz
3d224c4edf Second Part of Internal API Refactor (#1326) 2017-12-26 16:22:04 -08:00
Richard Liaw
4bb5b6bd5b [rllib] A3C Configurations (#1370)
* initial introduction of a3c configs

* fix sample batch

* flake but need to check save

* save, restore

* fix

* pickles

* entropy

* fix

* moving ppo

* results

* jenkins
2017-12-24 12:25:13 -08:00
Richard Liaw
b217a5ef14
[rllib] Fix Pong-PPO tuned example Config (#1369) 2017-12-23 01:36:33 -08:00
Eric Liang
715737cc06 [docs] Add backlinks from hyperopt / rl algorithm examples to the built-on Ray libraries (#1356) 2017-12-23 00:31:33 -08:00
Eric Liang
43e78217f8 Thu Dec 21 23:19:24 PST 2017 (#1367) 2017-12-22 17:29:45 -08:00
Robert Nishihara
22460ff7af Use Anaconda for autoscaling example and add example config for devel… (#1361)
* Use Anaconda for autoscaling example and add example config for development.

* Install Python2 for building the web ui.
2017-12-22 01:59:02 -08:00
Eric Liang
0ae660ce4e [carla] In carla example, save all images and measurements to local disk (#1350)
* revamp saving

* smaller jpgs

* hide verbose

* Tue Dec 19 22:25:01 PST 2017

* make sure temp dirs sort lexicographically

* save total reward too

* zero pad i

* 160x160 dqn

* ever higher res dqn
2017-12-21 15:19:55 -08:00
Philipp Moritz
3a301c3d56 Fix pyarrow version check (#1360) 2017-12-21 13:00:36 -08:00
Melih Elibol
4a2d62e7ef fix thirdparty install bug. (#1354) 2017-12-20 23:08:53 -08:00
Eric Liang
fa3a41366c
[minor] Use a better timeline pic in the documentation 2017-12-20 12:54:25 -08:00
Devin Petersohn
a75a473d7f Add a distributed Dataframe API to Ray (#1330)
* Adding dataframe object and minor APIs

* Adding reduce functionality

* Adding some print and making reduce work on current Ray

* Cleanup

* Added new functionality and docs.

* Adding more functionality.

* New functionality with older cleanup

* Complying with flake8 formatting

* Added tests and addressed reviewer comments

* Complying with flake8.

* Adding pandas to travis and requirements doc

* Fixing flake8 failures

* Fixing flake8 errors from imports

* Fixing import error

* Fixing import errors

* Addressing reviewer comments

* Addressing lint error
2017-12-20 09:31:22 -08:00
Philipp Moritz
3c4408cf51 Rebase Ray on Arrow 0.8 (#1323)
* rebase Ray on Arrow 0.8

* rebase on apache repo
2017-12-19 14:24:21 -08:00
Cathy Wu
772527caa4 [rllib] Support 1-dimensional action spaces (PPO) (#1347)
* Small fix for supporting custom preprocessors

* PEP8

* Remove squeeze from actions
2017-12-19 14:17:06 -08:00
Eric Liang
6724f57b03 [Examples] Add Carla test env (#1343)
* add carla example

* add reward

* set obs

* Sun Dec 17 16:06:00 PST 2017

* add spec

* fix measurement

* add train script

* resize to 80x80

* null

* initial small training run

* robustify env, clean up action space

* clean up vars

* switch to town2 which is faster

* tunify train.py

* add discrete mode

* update

* fix excessive braking

* fix the weather

* rename

* redirect output and from future import

* doc

* update

* fix rebase

* allow dqn gpu growth

* adjust dqn hyperparams

* better ppo parameters
2017-12-19 12:57:58 -08:00
Melih Elibol
24b93b1123 fixes default type for product of empty shape. (#1341) 2017-12-18 17:41:44 -08:00
Eric Liang
47b1f02d3e [rllib] Pull out multi-gpu optimizer as a generic class (#1313) 2017-12-17 15:59:57 -08:00
Cathy Wu
53e736fe01 [rllib] Small fix for supporting custom preprocessors (#1334)
* Small fix for supporting custom preprocessors

* PEP8

* fix test
2017-12-17 04:37:29 -08:00
Eric Liang
bab44837e0
[tune] Tensorboard logger incorrectly reports training iteration as cur timestep value 2017-12-16 23:30:15 -08:00
Eric Liang
d21ea0ca45 Switch EC2 example config to use AWS deep learning AMI + latest Ray wheel (#1331)
* update

* install --user
2017-12-16 17:39:46 -08:00
Eric Liang
f5ea44338e EC2 cluster setup scripts and initial version of auto-scaler (#1311) 2017-12-15 23:56:39 -08:00
Robert Nishihara
76b6b4a2d3 When killing worker, release resources before dispatching tasks. (#1327) 2017-12-15 18:12:03 -08:00
Eric Liang
fbf1806b8a
[tune] Clean up result logging: move out of /tmp, add timestamp (#1297) 2017-12-15 14:19:08 -08:00
Stephanie Wang
12fdb3f53a Convert actor dummy objects to task execution edges. (#1281)
* Define execution dependencies flatbuffer and add to Redis commands

* Convert TaskSpec to TaskExecutionSpec

* Add execution dependencies to Python bindings

* Submitting actor tasks uses execution dependency API instead of dummy argument

* Fix dependency getters and some cleanup for fetching missing dependencies

* C++ convention

* Make TaskExecutionSpec a C++ class

* Convert local scheduler to use TaskExecutionSpec class

* Convert some pointers to references

* Finish conversion to TaskExecutionSpec class

* fix

* Fix

* Fix memory errors?

* Cast flatbuffers GetSize to size_t

* Fixes

* add more retries in global scheduler unit test

* fix linting and cast fbb.GetSize to size_t

* Style and doc

* Fix linting and simplify from_flatbuf.
2017-12-14 20:47:54 -08:00
Philipp Moritz
cac5f47600 First Part of Internal Ray API Refactor (#1173)
* add Ray status class

* add C++ util files

* add ID types

* more APIs

* build system integration

* add test infrastructure and implement some APIs

* add more tests

* fix bugs

* add task table tests

* update

* add toolchain file

* fix

* test

* link with pthread

* update

* fix

* more fixes

* fixes

* always vendor gtest and gflags

* linting

* fixes

* add constants file

* comments

* more fixes

* fix linting
2017-12-14 14:54:09 -08:00
Richard Liaw
c5c83a4465
[rllib] PPO and A3C unification (#1253) 2017-12-14 01:08:23 -08:00
Robert Nishihara
2f750e9ba7 Add parentheses around one-line if statement. (#1318) 2017-12-13 23:48:53 -08:00
Robert Nishihara
60d4f92d43 Add --user to instructions for building ray from source. (#1319) 2017-12-13 23:48:03 -08:00
Richard Liaw
cabbd27c56
[rllib] Support Nested Configuration Merging (#1268) 2017-12-13 14:39:01 -08:00
Robert Nishihara
f75b51d178 Register Common.error with local scheduler extension module. (#1316)
* Register Common.error with local scheduler extension module.

* Add test.
2017-12-13 11:55:54 -08:00
Richard Liaw
b6a35e0395 [rllib] Introduce pip install rllib (#1310)
* update setup

* more dependencies
2017-12-12 13:58:28 -08:00
Robert Nishihara
b1d89026cd Make ActorMethod fields private to fix tab completion. (#1312) 2017-12-12 10:07:33 -08:00
Peter Schafhalter
20d6b74aa6 [rllib] Added evaluation script to RLLib (#1295) 2017-12-11 11:59:44 -08:00
Robert Nishihara
96c46d35ff Tell Ray how to serialize FunctionSignature objects. (#1308) 2017-12-10 22:40:28 -08:00
Eric Liang
7009538321 Autodetect the number of GPUs when starting Ray. (#1293)
* autodetect

* Wed Dec  6 12:46:52 PST 2017

* Wed Dec  6 12:47:54 PST 2017

* Move GPU autodetection into services.py.

* Fix capitalization of Nvidia.

* Update documentation.
2017-12-09 15:30:16 -08:00
Robert Nishihara
6aae9a12fb Improve version checking at startup. (#1307)
* Check pyarrow version at startup.

* For version check, use absolute path to ray module.
2017-12-09 14:20:56 -08:00
Robert Nishihara
96463c680c Allow actor methods to return multiple object IDs. (#1296)
* Allow actor methods to return multiple object IDs.

* Add test.

* Fixes

* Remove outdated comment.

* Add comment and assert
2017-12-09 10:37:57 -08:00
Zongheng Yang
7e4a28f933 [rllib] Add tuned_examples/pong-ppo.yaml (#1302)
* Add tuned_examples/pong-ppo.yaml: 21 rew in ~3380sec

* Header comments
2017-12-09 01:20:22 -08:00
John Schulman
2606001a36 allow users to disable the webui (#1306)
* allow users to disable the webui

* Remove trailing whitespace.
2017-12-09 00:35:55 -08:00
Stephanie Wang
bac39a134e
Define a wrapper class for callback_data.data (#1301) 2017-12-08 11:48:21 -08:00
Robert Nishihara
5adbdfecd0 Raise exception if pyarrow is imported before ray. (#1283)
* Raise exception if pyarrow is imported before ray.

* Pip install pyarrow when building doc so we don't have to mock it.

* Raise ImportError instead of Exception.
2017-12-08 03:34:54 -08:00
Richard Liaw
2e0eb0e4c7
[rllib] Adding dependencies (#1298) 2017-12-08 01:57:19 -08:00
Philipp Moritz
26125e1547 Fixing the jenkins tests (#1299)
* trying to fix jenkins tests

* comment out more tests

* remove pytorch stuff

* use non-monotonic clock (monotonic not supported on python 2.7)

* whitespace
2017-12-07 17:03:58 -08:00
Eric Liang
35f7398666
[rllib] Update RLlib docs and README (#1288)
Updates the rllib docs and README.
2017-12-06 18:17:51 -08:00