Commit graph

282 commits

Author SHA1 Message Date
Robert Nishihara
5fd44afb8a Add note about huge pages using up memory. (#2733)
* Add note about huge pages using up memory.

* Update doc

* Update
2018-08-24 17:02:54 -07:00
Michael Tu
d16b6f6a32 [tune] Rename 'repeat' to 'num_samples' (#2698)
Deprecates the `repeat` argument and introduces `num_samples`. Also updates docs accordingly.
2018-08-24 15:05:24 -07:00
Eric Liang
aa014af85b
[rllib] Fix atari reward calculations, add LR annealing, explained var stat for A2C / impala (#2700)
Changes needed to reproduce Atari plots in IMPALA / A2C: https://github.com/ray-project/rl-experiments
2018-08-23 17:49:10 -07:00
Eric Liang
fbe6c59f72
[rllib] Misc fixes, A2C (#2679)
A bunch of minor rllib fixes:

pull in latest baselines atari wrapper changes (and use deepmind wrapper by default)
move reward clipping to policy evaluator
add a2c variant of a3c
reduce vision network fc layer size to 256 units
switch to 84x84 images
doc tweaks
print timesteps in tune status
2018-08-20 15:28:03 -07:00
Eric Liang
9473da69bd
[autoscaler] Experimental support for local / on-prem clusters (#2678)
This adds some experimental (undocumented) support for launching Ray on existing nodes. You have to provide the head ip, and the list of worker ips.

There are also a couple additional utils added for rsyncing files and port-forward.
2018-08-19 12:43:04 -07:00
Richard Liaw
62d0698097
[tune] Tune Facelift (#2472)
This PR introduces the following changes:

 * Ray Tune -> Tune 
 * [breaking] Creation of `schedulers/`, moving PBT, HyperBand into a submodule
 * [breaking] Search Algorithms now must take in experiment configurations via `add_configurations` rather through initialization
 * Support `"run": (function | class | str)` with automatic registering of trainable
 * Documentation Changes
2018-08-19 11:00:55 -07:00
Eric Liang
5f430da180
[rllib] Provide internal access to episode state in compute_actions() and allow returning extra batches (#2559)
The goal of this PR is to allow custom policies to perform model-based rollouts. In the multi-agent setting, this requires access to not only policies of other agents, but also their current observations.
Also, you might want to return the model-based trajectories as part of the rollout for efficiency.

  compute_actions() now takes a new keyword arg episodes
  pull out internal episode class into a top-level file
  add function to return extra trajectories from an episode that will be appended to the sample batch
  documentation
2018-08-16 14:37:21 -07:00
Eric Liang
079c4e482a
ray exec and ray attach commands (#2560)
ray exec CLUSTER CMD [--screen] [--start] [--stop]
ray attach CLUSTER [--start]

Example:
ray exec sgd.yaml 'source activate tensorflow_p27 && cd ~/ray/python/ray/rllib && ./train.py --run=PPO --env=CartPole-v0' --screen --start --stop

This will in one command create a cluster and run the command on it in a screen session. The screen can later be attached to via ray attach. After the command finishes, the cluster workers will be terminated and the head node stopped.
2018-08-15 14:31:50 -07:00
Eric Liang
53f9755594
[rllib] Fix support for mixed discrete and continuous action spaces, add to regression test (#2655)
* fix

* lint

* fix
2018-08-15 10:19:41 -07:00
Jones Wong
007208d2bb Support older version TF and Support RMSProp in Impala (#2590)
to support TF version < 1.5
to support rmsprop optimizer in Impala

Before TF1.5, tf.reduce_sum() and tf.reduce_max() has an argument keep_dims which has been renamed as keepdims in later versions.

In the original paper of Impala, they use rmsprop algorithm to optimize the model. We'd better also support it so that users can reproduce their experiments. Without any tuning, say that using the same hyper-parameters as AdamOptimizer, it reaches "episode_reward_mean": 19.083333333333332 in Pong after consume 3,610,350 samples.
2018-08-09 19:51:32 -07:00
Melih Elibol
8ae82180b4 [xray] Adds a driver table. (#2289)
This PR adds a driver table for the new GCS, which enables cleanup functionality associated with monitoring driver death.

Some testing in `monitor_test.py` is restored, but redis sharding for xray is needed to enable remaining tests.
2018-08-08 23:41:40 -07:00
Eric Liang
64053278aa
[tune] Support lambda functions in hyperparameters / tune rllib multiagent support (#2568)
* update

* func

* Update registry.py

* revert
2018-08-07 16:29:21 -07:00
Richard Liaw
bb44456f6f
[rllib, tune] TrainingResult -> Dict, Removes C408 from flake8 (#2565) 2018-08-07 12:17:44 -07:00
Eric Liang
981d9818c1
[rllib] Support the timesteps_per_batch in simple optimizer PPO mode (#2558)
* support ts

* doc

* Update sync_samples_optimizer.py
2018-08-06 12:10:59 -07:00
Mitar
9015e742c4 Update installation instructions with psmisc to enable 'ray stop' (#2550) 2018-08-05 23:58:58 -07:00
Richard Liaw
914a433e3f
[tune] Split Search from Scheduling (#2452)
Introduces SearchAlgorithm concept, separate from schedulers in Tune. Moves HyperOpt under this concept.
2018-08-04 21:27:39 -07:00
Eric Liang
9449d07eca
[rllib] Fix crash when setting horizon in multiagent
If a horizon is set, an env terminates without done=True.
2018-08-03 16:37:56 -07:00
Eric Liang
f7ec292360
[rllib] Support agent.get_action in multiagent (#2543)
* support get action on policy id

* comment

* grammar fixes

* Update rllib-algorithms.rst
2018-08-02 13:35:53 -07:00
Eric Liang
9ea57c2a93
[rllib] Basic IMPALA implementation (using deepmind's reference vtrace.py) (#2504)
Rename AsyncSamplesOptimizer -> AsyncReplayOptimizer
  Add AsyncSamplesOptimizer that implements the IMPALA architecture
  integrate V-trace with a3c policy graph
  audit V-trace integration
  benchmark compare vs A3C and with V-trace on/off
PongNoFrameskip-v4 on IMPALA scaling from 16 to 128 workers, solving Pong in <10 min. For reference, solving this env takes ~40 minutes for Ape-X and several hours for A3C.
2018-08-01 20:53:53 -07:00
Eric Liang
9a479b3a63
[rllib] Document creating an ensemble of envs; also add vector_index attribute to env config (#2513)
This also removes the async resetting code in VectorEnv. While that improves benchmark performance slightly, it substantially complicates env configuration and probably isn't worth it for most envs.

This makes it easy to efficiently support setups like Joint PPO: https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/retro-contest/gotta_learn_fast_report.pdf

For example, for 188 envs, you could do something like num_envs: 10, num_envs_per_worker: 19.
2018-08-01 16:29:27 -07:00
Eric Liang
d9a36c4e39
[rllib] Document auto-concat in a3c (#2533)
* docs

* update hyperparm docs
2018-08-01 15:11:30 -07:00
Sergey Kolesnikov
05490b8cb9 [rllib] dqn/ddpg policy customization (#2445)
* dqn policy update - more customization

* docs for custom DQN graph

* Update rllib-training.rst

* Update rllib-models.rst

* Update rllib.rst

* Update rllib-training.rst

* Update rllib-concepts.rst

* yapf codestyle
2018-07-22 14:47:14 -07:00
Eric Liang
68660453e4
[rllib] Better support and add two-trainer example for multiagent (#2443)
This adds a simple DQN+PPO example for multi-agent. We don't do anything fancy here, just syncing weights between two separate trainers. This potentially is wasting some compute, but is very simple to set up.

It might be nice to share experience collection between the top-level trainers in the future.
2018-07-22 05:09:25 -07:00
Robert Nishihara
4b6157ed09 Remove link to install Linux Python 3.3 wheel. (#2434) 2018-07-20 15:15:43 -07:00
Richard Liaw
8e8c733696
[tune] Fix Categorical Space + Add Keras Example (#2401)
Previously did not properly resolve categorical variables for HyperOpt.
2018-07-17 23:52:52 +02:00
Crystal
ebf4070d88 Documentation- Basic Profiling for Ray Users (#2326)
* Ray documentation - created new section 'Profiling for Ray Users', opposed to current Profiling section for Ray developers. Completed three sections 'A Basic Profiling Example', 'Timing Performance Using Python's Timestamps', and 'Profiling Using An External Profiler (Line_Profiler).' Left to-do two sections on CProfile and Ray Timeline Visualization.'

* Ray documentation - Fixed rst codeblock linebreaks in 'User Profiling'

* Ray documentation - For User Profiling, added section on cProfile

* Ray documentation - For User Profiling, completed Ray Timeline Visualization section, including graphical images

* Ray documentation - made User Profiling timeline image larger, minor wording edits

* Ray documentation - minor wording edits to User Profiling

* Ray documentation - User Profiling- fixed broken link

* Minor wording changes requested by Philipp Moritz addressed. Still need to address (1) compressing the image files, (2) correcting ex 3 to not be remote, and (3) using cProfile on an actor

* Ray documentation - For user-profiling.rst, revised example 3 to show a semi-parallelized example. Compressed timeline example image to be under 50 KB, removed view timeline GUI image. Updated timeline example image to reflect revised example 3. cProfile actor example left

* Ray documentation - in user-profiling.rst, added a new example including actors in the cProfile section

* Ray documentation - For user-profiling.rst, added section header for the Ray actor cProfile example

* Update user-profiling.rst

* Update user-profiling.rst

* 4 space indentation

* Update user-profiling.rst

* Update user-profiling.rst

* Update user-profiling.rst

* corrections
2018-07-12 16:57:39 -07:00
Robert Nishihara
515da7721a Change ray.worker.cleanup -> ray.shutdown and improve API documentation. (#2374)
* Change ray.worker.cleanup -> ray.shutdown and improve API documentation.

* Deprecate ray.worker.cleanup() gracefully.

* Fix linting
2018-07-12 12:00:00 -07:00
Eric Liang
b316afeb43 [rllib] Add debug info back to PPO and fix optimizer compatibility (#2366) 2018-07-12 19:22:46 +02:00
Eric Liang
4ef9d15315
[rllib] Add concepts section of docs (#2373)
This fills in the rllib concepts documentation.
2018-07-08 18:46:52 -07:00
Robert Nishihara
35f4a3070c Update 0.4.0 to 0.5.0 in autoscaler and installation examples. (#2352) 2018-07-07 14:34:20 -07:00
Eric Liang
d24f19fd1e
[rllib] Fix stats collection and some docs bugs since the refactoring (#2361)
* fix

* fix pbt example

* fix

* fix

* single thread by default

* vec

* fix

* fix
2018-07-07 13:29:20 -07:00
Devin Petersohn
4185aaed10 Dataframe deprecation (#2353) 2018-07-06 00:16:22 -07:00
Zongheng Yang
23a98a223f Doc: redis memory management / automatic flushing. (#2344)
* Doc: redis memory management / automatic flushing.

* Address comments

* Update redis-memory-management.rst

* Change cross ref style
2018-07-05 23:44:37 -07:00
Robert Nishihara
b90e551b41 [xray] Implement timeline and profiling API. (#2306)
* Add profile table and store profiling information there.

* Code for dumping timeline.

* Improve color scheme.

* Push timeline events on driver only for raylet.

* Improvements to profiling and timeline visualization

* Some linting

* Small fix.

* Linting

* Propagate node IP address through profiling events.

* Fix test.

* object_id.hex() should return byte string in python 2.

* Include gcs.fbs in node_manager.fbs.

* Remove flatbuffer definition duplication.

* Decode to unicode in Python 3 and bytes in Python 2.

* Minor

* Submit profile events in a batch. Revert some CMake changes.

* Fix

* Workaround test failure.

* Fix linting

* Linting

* Don't return anything from chrome_tracing_dump when filename is provided.

* Remove some redundancy from profile table.

* Linting

* Move TODOs out of docstring.

* Minor
2018-07-04 23:23:48 -07:00
Eric Liang
8aa56c12e6
[rllib] Document "v2" APIs (#2316)
* re

* wip

* wip

* a3c working

* torch support

* pg works

* lint

* rm v2

* consumer id

* clean up pg

* clean up more

* fix python 2.7

* tf session management

* docs

* dqn wip

* fix compile

* dqn

* apex runs

* up

* impotrs

* ddpg

* quotes

* fix tests

* fix last r

* fix tests

* lint

* pass checkpoint restore

* kwar

* nits

* policy graph

* fix yapf

* com

* class

* pyt

* vectorization

* update

* test cpe

* unit test

* fix ddpg2

* changes

* wip

* args

* faster test

* common

* fix

* add alg option

* batch mode and policy serving

* multi serving test

* todo

* wip

* serving test

* doc async env

* num envs

* comments

* thread

* remove init hook

* update

* fix ppo

* comments1

* fix

* updates

* add jenkins tests

* fix

* fix pytorch

* fix

* fixes

* fix a3c policy

* fix squeeze

* fix trunc on apex

* fix squeezing for real

* update

* remove horizon test for now

* multiagent wip

* update

* fix race condition

* fix ma

* t

* doc

* st

* wip

* example

* wip

* working

* cartpole

* wip

* batch wip

* fix bug

* make other_batches None default

* working

* debug

* nit

* warn

* comments

* fix ppo

* fix obs filter

* update

* wip

* tf

* update

* fix

* cleanup

* cleanup

* spacing

* model

* fix

* dqn

* fix ddpg

* doc

* keep names

* update

* fix

* com

* docs

* clarify model outputs

* Update torch_policy_graph.py

* fix obs filter

* pass thru worker index

* fix

* rename

* vlad torch comments

* fix log action

* debug name

* fix lstm

* remove unused ddpg net

* remove conv net

* revert lstm

* wip

* wip

* cast

* wip

* works

* fix a3c

* works

* lstm util test

* doc

* clean up

* update

* fix lstm check

* move to end

* fix sphinx

* fix cmd

* remove bad doc

* envs

* vec

* doc prep

* models

* rl

* alg

* up

* clarify

* copy

* async sa

* fix

* comments

* fix a3c conf

* tune lstm

* fix reshape

* fix

* back to 16

* tuned a3c update

* update

* tuned

* optional

* merge

* wip

* fix up

* move pg class

* rename env

* wip

* update

* tip

* alg

* readme

* fix catalog

* readme

* doc

* context

* remove prep

* comma

* add env

* link to paper

* paper

* update

* rnn

* update

* wip

* clean up ev creation

* fix

* fix

* fix

* fix lint

* up

* no comma

* ma

* Update run_multi_node_tests.sh

* fix

* sphinx is stupid

* sphinx is stupid

* clarify torch graph

* no horizon

* fix config

* sb

* Update test_optimizers.py
2018-07-01 00:05:08 -07:00
Eric Liang
1251abf0d1
[rllib] Modularize Torch and TF policy graphs (#2294)
* wip

* cls

* re

* wip

* wip

* a3c working

* torch support

* pg works

* lint

* rm v2

* consumer id

* clean up pg

* clean up more

* fix python 2.7

* tf session management

* docs

* dqn wip

* fix compile

* dqn

* apex runs

* up

* impotrs

* ddpg

* quotes

* fix tests

* fix last r

* fix tests

* lint

* pass checkpoint restore

* kwar

* nits

* policy graph

* fix yapf

* com

* class

* pyt

* vectorization

* update

* test cpe

* unit test

* fix ddpg2

* changes

* wip

* args

* faster test

* common

* fix

* add alg option

* batch mode and policy serving

* multi serving test

* todo

* wip

* serving test

* doc async env

* num envs

* comments

* thread

* remove init hook

* update

* fix ppo

* comments1

* fix

* updates

* add jenkins tests

* fix

* fix pytorch

* fix

* fixes

* fix a3c policy

* fix squeeze

* fix trunc on apex

* fix squeezing for real

* update

* remove horizon test for now

* multiagent wip

* update

* fix race condition

* fix ma

* t

* doc

* st

* wip

* example

* wip

* working

* cartpole

* wip

* batch wip

* fix bug

* make other_batches None default

* working

* debug

* nit

* warn

* comments

* fix ppo

* fix obs filter

* update

* wip

* tf

* update

* fix

* cleanup

* cleanup

* spacing

* model

* fix

* dqn

* fix ddpg

* doc

* keep names

* update

* fix

* com

* docs

* clarify model outputs

* Update torch_policy_graph.py

* fix obs filter

* pass thru worker index

* fix

* rename

* vlad torch comments

* fix log action

* debug name

* fix lstm

* remove unused ddpg net

* remove conv net

* revert lstm

* cast

* clean up

* fix lstm check

* move to end

* fix sphinx

* fix cmd

* remove bad doc

* clarify

* copy

* async sa

* fix
2018-06-26 13:17:15 -07:00
Eric Liang
9c3bab5c42
[tune] Support all serializable objects in config (#2287)
* wip

* order

* lint
2018-06-23 16:13:46 -07:00
Robert Nishihara
ff2217251f [xray] Add error table and push error messages to driver through node manager. (#2256)
* Fix documentation indentation.

* Add error table to GCS and push error messages through node manager.

* Add type to error data.

* Linting

* Fix failure_test bug.

* Linting.

* Enable one more test.

* Attempt to fix doc building.

* Restructuring

* Fixes

* More fixes.

* Move current_time_ms function into util.h.
2018-06-20 21:29:28 -07:00
Richard Liaw
4acb77a5c3
[tune] Update Trainable doc to expose interface (#2272) 2018-06-20 13:40:45 -07:00
Eric Liang
be178ae031 [autoscaler] GCP docs (#2235) 2018-06-12 12:40:12 -07:00
Richard Liaw
f19decb848
[docs] Update RLlib install to not include Tensorflow (#2178) 2018-06-10 10:29:12 -07:00
andrewztan
1475600c81 [rllib] Merge DDPG and DDPG2 implementations (#2202)
* removed ddpg2

* removed ddpg2 from codebase

* added tests used in ddpg vs ddpg2 comparison

* added notes about training timesteps to yaml files

* removed ddpg2 yaml files

* removed unnecessary configs from yaml files

* removed unnecessary configs from yaml files

* moved pendulum, mountaincarcontinuous, and halfcheetah tests to tuned_examples

* moved pendulum, mountaincarcontinuous, and halfcheetah tests to tuned_examples

* added more configuration details to yaml files

* removed random starts from halfcheetah
2018-06-09 16:46:23 -07:00
Eric Liang
32b9a4d3f1
Fix yapf excludes, print diff in --all mode (#2211)
* fix

* travis
2018-06-08 02:25:55 -07:00
Alok Singh
42a9233e1d Improve yapf speed and document its usage (#2160)
* Allow yapf to lint individual files

* Add tip for using yapf

* Update doc

* Update script to autoformat changed py files

The new default is for the script to only updated changed files to encourage
using it as a pre-push hook. Travis still checks all since it's not that big an
increase to runtime.

* Exclude formatting thirdparty/autogen py files

* Symlink .travis -> scripts

Hidden directories may get glossed over otherwise.

* .travis -> scripts in docs

They are symlinks to the same thing, but `scripts` is more dev-friendly, while
`.travis` is really only for Travis CI.

* Document different yapf format functions

Most devs will only need `format_changed`, and this is run by default.
`format_changed` should be fast enough in most cases to work as a pre-commit
hook.

* Speed up yapf by only formatting changed files

* Update docs

1. Mention how yapf can be used a pre-commit hook
2. rm `bash`, script is executable

* Update yapf.sh

* Update development.rst

* Update yapf.sh

* Use bash arrays for correct argument splitting

Playing fast and loose with whitespace in bash is a terrible idea.

* Only format non-excluded by default

* Check changes against master

Normally, the remote is called `origin`, but naming it explicit

* Adding missing directory to `format_all`

* Cleanup YAPF code

Remove unused function and move around code to make clearer and adding lines
give cleaner diffs.

* Ensure correct files are autoformatted

* Fix cmd line arg splitting

Each arg has to be in its own set of quotes.

* Diff against mergebase

TIL there's a clean syntax for doing that, but it's too clever to belong in a
shell script.

We use `mapfile -t` to ensure no problems down the line with weird filenames.
2018-06-05 20:22:11 -07:00
songqing
4dd4698564 unify build dir for Python and Java (#2171)
* unify build dir for Python and Java

* enable executables auto installed when just running 'make'

* fix plasma_store copy error

* fix cmake error about copying executables

* lint fix

* recover python/setup.py

* enable to copy optional file automatically

* a small fix of path

* lint fix

* lint fix

* lint fix

* Add comment.
2018-06-01 16:28:27 -07:00
Robert Nishihara
6172f94c04 Implement Python global state API for xray. (#2125)
* Implement global state API for xray.

* Fix object table.

* Fixes for log structure.

* Implement cluster_resources.

* Add driver task to task table.

* Remove python flatbuffers code

* Get some global state API tests running.

* Python linting.

* Fix linting.

* Fix mock modules for doc

* Copy over flatbuffer bindings.

* Fix for tests.

* Linting

* Fix monitor crash.
2018-05-29 16:25:54 -07:00
Robert Nishihara
dc03506108 Update resource documentation (remove outdated limitations). (#2022) 2018-05-25 22:19:47 -07:00
Eric Liang
f37e2e5d2f
[rllib] [doc] Broken link in ddpg doc 2018-05-20 00:10:59 -07:00
Ken Fehling
19b743c84b Fixed attribute name in code example (#2054)
hyperparam_mutations
2018-05-14 01:05:06 -07:00
Ken Fehling
4ff900e131 Added missing comma to code example (#2050) 2018-05-13 19:01:01 -07:00