849 commits

Author SHA1 Message Date
Eric Liang
bc5e259264
[rllib] Add a doc section on computing actions (#6326)
* options doc

* add note

* hint shr

* doc update
2019-12-03 00:10:50 -08:00
Shital Shah
670cb6374e Doc enhancement: use build.sh for ray, clarification on how rllib selects VisionNetwork, note on setup-dev.py for rllib. (#6092) 2019-12-02 22:19:01 -08:00
Richard Liaw
0b3d5d989b
[docs] Add public materials (#6331)
* startup

* update tune readme

* usingrah
2019-12-02 19:59:23 -08:00
Eric Liang
0b0a16982a [doc] Use .options() (#6323)
* options doc

* add note

* hint shr
2019-12-01 17:24:00 -08:00
Philipp Moritz
a4437813eb
[Projects] Unify hyphen vs underscore handling for arguments (#6208) 2019-11-20 23:52:41 -08:00
Richard Liaw
d3c7a8fda5
[docs] yarn update (#6173) 2019-11-19 16:15:08 -08:00
Yuhao Yang
d3ff2252c4 [doc] Fix link to getting involved 2019-11-18 12:59:14 -08:00
Eric Liang
8fc2272f43
[rllib] Reorganize trainer config, add warnings about high VF loss magnitude for PPO (#6181) 2019-11-18 10:39:07 -08:00
Ujval Misra
2965dc1b72 [tune] Fault tolerance improvements (#5877)
* Precede ray.get with ray.wait.

* Trigger checkpoint deletes locally in Trainable

* Clean-up code.

* Minor changes.

* Track best checkpoint so far again

* Pulled checkpoint GC out of Trainable.

* Added comments, error logging.

* Immediate pull after checkpoint taken; rsync source delete on pull

* Minor doc fixes

* Fix checkpoint manager bug

* Fix bugs, tests, formatting

* Fix bugs, feature flag for force sync.

* Fix test.

* Fix minor bugs: clear proc and less verbose sync_on_checkpoint warnings.

* Fix bug: update IP of last_result.

* Fixed message.

* Added a lot of logging.

* Changes to ray trial executor.

* More bug fixes (logging after failure), better logging.

* Fix Richard's bug and logging

* Add comments.

* try-except

* Fix heapq bug.

* .

* Move handling of no available trials to ray_trial_executor (#1)

* Fix formatting bug, lint.

* Addressed Richard's comments

* Revert tests.

* fix rebase

* Fix trial location reporting.

* Fix test

* Fix lint

* Rebase, use ray.get w/ timeout, lint.

* lint

* fix rebase

* Address richard's comments
2019-11-18 01:14:41 -08:00
Richard Liaw
62cbc043b4
[tune] tbx logger (#6133)
* tbx

* add_hparams

* fix_hparams

* ok

* ok

* fix

* ok

* fix
2019-11-15 08:45:44 -08:00
Edward Oakes
385783fcec
Ray on YARN + Skein Documentation (#6119) 2019-11-14 15:06:05 -08:00
Eric Liang
243b1b7281
[rllib] Add microbatch optimizer with A2C example (#6161) 2019-11-14 12:14:00 -08:00
Ujval Misra
e3e3ad4b25 Add timeout param to ray.get (#6107) 2019-11-14 00:50:04 -08:00
Eric Liang
e4565c9cc6
Reduce RLlib log verbosity (#6154) 2019-11-13 18:50:45 -08:00
Eric Liang
b924299833
Add large scale regression test for RLlib (#6093) 2019-11-13 12:22:55 -08:00
Edward Oakes
5780ec1b62
Refresh ObjectIDs in raylet for stopgap GC (#6109) 2019-11-10 23:12:59 -08:00
Adam Gleave
c157e93ba1 [tune] Retry failed tasks with checkpointing disabled (#6126)
* Allow recovery for failed tasks without checkpointing

* Update docs
2019-11-09 19:35:27 -08:00
Simon Mo
fcb6bdbc39
[Doc] Document Actor.options API (#6099)
* Document Actor.options API

* Undocument _remote
2019-11-06 23:12:23 -08:00
David Bignell
3f83b2daa9 [rllib] Rollout extensions (#6065)
* Rollout improvements

* Make info-saving optional, to avoid breaking change.

* Store generating ray version in checkpoint metadata

* Keep the linter happy

* Add small rollout test

* Terse.

* Update test_io.py
2019-11-05 20:34:18 -08:00
daiyaanarfeen
8f6d73a93a [sgd] Extend distributed pytorch functionality (#5675)
* raysgd

* apply fn

* double quotes

* removed duplicate TimerStat

* removed duplicate find_free_port

* imports in pytorch_trainer

* init doc

* ray.experimental

* remove resize example

* resnet example

* cifar

* Fix up after kwargs

* data_dir and dataloader_workers args

* formatting

* loss

* init

* update code

* lint

* smoketest

* better_configs

* fix

* fix

* fix

* train_loader

* fixdocs

* ok

* ok

* fix

* fix_update

* fix

* fix

* done

* fix

* fix

* fix

* small

* lint

* fix

* fix

* fix_test

* fix

* validate

* fix

* fi
2019-11-05 11:16:46 -08:00
Simon Mo
7f5b3502da
Implement Detached Actor (#6036)
* Arg propagation works

* Implement persistent actor

* Add doc

* Initialize is_persistent_

* Rename persistent->detached

* Address comment

* Make tests pass

* Address comment

* Python2 compatibility

* Fix naming, py2

* Lint
2019-11-01 10:28:23 -07:00
Simon Mo
56f3e96887
[Serve] Use ray's cloudpickle (#6051)
* Revert "Add cloudpickle as doc requirements (#6037)"

This reverts commit 03ce3b7c5b.

* Use ray's vendored cloudpickle
2019-10-30 15:21:09 -07:00
Simon Mo
03ce3b7c5b
Add cloudpickle as doc requirements (#6037) 2019-10-28 18:25:02 -07:00
Richard Liaw
085a6713a0
[docs] Add documentation for Dynamic Custom Resources (#6000) 2019-10-27 17:58:04 -07:00
Eric Liang
a0dcb45dc3
[rllib] Fix APEX priorities returning zero all the time (#5980)
* fix

* move example tests to end

* level err

* guard against none

* no trace test

* ignore thumbs

* np

* fix multi node

* fix
2019-10-26 13:23:42 -07:00
Edward Oakes
436dd936d2
Update profiling numbers (#5989) 2019-10-24 18:02:44 -07:00
Edward Oakes
c69e9aafdc
Update release doc (#5988)
* Update release doc

* Add comment about get_contributors.py
2019-10-24 11:13:37 -07:00
Leo Sklyut
832b5ce1f6 [docs] fix code block display (#5967) 2019-10-22 00:45:38 -07:00
Zhuohan Li
f286356e06 [docs] add pages about examples on training language models with fairseq (#5755)
* add pages about examples on training language models with fairseq and ray autoscaler

* better format

* update ray_train.sh

* Move EFS to the autoscaler file

* nits

* add comments to the code & use a new way to implement checkpoint hook

* small bug fix

* polish the doc

* fix formatting

* yaml

* update docs

* fix the bugs and add preprocess.sh

* fix lint

* Reduce batch size & fix lint

* shorttitle
2019-10-20 20:28:16 -07:00
Alexander Scammon
4d08d3c188 Add dependencies for dashboard to installation.rst (#5942)
Updating the docs to include pip installing `aiohttp` and `psutil`, both of which the dashboard requires. Since the whole dashboard section is optional, I thought I'd just add it in the docs rather than make it an explicit requirement of the project. Tell me if you'd prefer them as requirements in the `setup.py`, though.
2019-10-17 00:39:56 -07:00
Richard Liaw
d52a4983af
Update TF documentation (#5918) 2019-10-16 01:31:27 -07:00
Richard Liaw
9f23620412
[tune] tf2.0 mnist example (#5898)
* tfmnistexample

* tfmnist

* add_to_ci

* format

* exampledownlaod

* fix
2019-10-15 22:25:01 -07:00
Richard Liaw
7f4141df4e
[docs] Pictures for all the Examples (#5859)
* image

* plot resnet

* hyperparam

* fixup_pictures

* custom_direct
2019-10-14 14:18:52 -07:00
Edward Oakes
abbfe7392f
Bump dev version to 0.8.0.dev6 (#5906) 2019-10-14 11:36:13 +01:00
Richard Liaw
1650f7b174
[tune] Remove TF MNIST example + add TrialRunner hook to execut… (#5868)
* remove test

* add trial runner

* remvoerestore

* Remove other mnist examples

* tunetest

* revert

* v1

* Revert "v1"

This reverts commit c8bddaf2db7a8270c43c02021cac0e75df15ed20.

* Revert "revert"

This reverts commit b58f56884a0c288d3a6f997d149ab4d496ddd7a3.

* errors

* format
2019-10-13 20:33:56 -07:00
Richard Liaw
898652837c
[minor][docs] Remove example link (#5880) 2019-10-11 11:49:18 -07:00
Robert Nishihara
523c764c25
Python 2 compatibility. (#5887) 2019-10-10 19:09:25 -07:00
Richard Liaw
1181924077 [tune][minor] formatting examples, fix travis (#5869)
* formatting

* formatting
2019-10-08 17:58:43 -07:00
Ujval Misra
a851d7eb87 [tune] Readable trial progress output (#5822)
* Cleaner, tabulated progress output.

* Minor HTML changes, trial ID instead of name

* Revert basic variant changes

* Cleanup, address richard's comments, add progress_reporter.py

* Add tabulate dependency

* Added more info to table, auto-hide columns with no data.

* lint

* Address comments

* Replace experiment tag w/ trial ID

* Fixed tests.

* Fixed test

* Added requirement

* Fix formatting
2019-10-08 16:38:39 -07:00
zhu-eric
3845c97dd0 [doc] Hyperparameter Tuning Gallery Entry (#5786)
* mod_table

* Example fix for gallery

* lint

* nit

* nit

* fix

* gallery

* remove table for now

* training, object store, tune, actors, advanced

* start tf code

* first cut tf

* yapf

* pytorch

* add torch example

* torch

* parallel

* tune

* tuning

* reviewsready

* finetune

* fix

* move_code

* update conf

* compile

* init hyperparameter

* Start images

* overview

* extra

* fix

* works

* update-ps-example

* param_actor

* fix

* examples

* simple

* simplify_pong

* flake8 and run hyperopt

* add comments

* add comments

* add suggestion

* add suggestion

* suggestions

* add suggestion

* add suggestions

* fixed in wrong area

* last edit

* finish changes

* add line

* hyperparameter
2019-10-08 14:13:17 -07:00
Edward Oakes
486abedcdf
Link to kubernetes config files in docs (#5865) 2019-10-08 11:06:25 -07:00
Simon Mo
e8570874b6
[Serve] Implement flask_request and named python request (#5849)
* Implement flask_request and named python request

* Forgot to include missing files

* Address comment

* Add flask to requirements for doc (lint failed)

* Update doc requirement so lint will build

* Install flask in CI

* Fix typo in .travis.yml
2019-10-06 15:12:30 -07:00
Anthony Yu
b99cdf4e39 [tune] PBT + Memnn example (#5723)
* Add example file

* Move into train function

* Somewhat working example of MemNN, still has some failed trials

* Reorganize into a class

* Small fixes

* Iteration decrease and fix hyperparam_mutations

* Add example file

* Move into train function

* Somewhat working example of MemNN, still has some failed trials

* Reorganize into a class

* Small fixes

* Iteration decrease and fix hyperparam_mutations

* Some style edits

* Address PR changes without modifying learning rate

* Add configs and hyperparameter mutations

* Add tune test

* Modify import locations

* Some parameter changes for testing

* Update memnn example

* Add tensorboard support and address PR comment

* Final changes

* lint

* generator
2019-10-05 09:22:37 -07:00
Edward Oakes
8ca7fab581
Improve manual Kubernetes deployment documentation (#5582)
* Add ray-cluster, modify submit

* Add comments

* Job submission working

* Write docs

* Add link to autoscaling

* Fix wget link in job

* Use namespace file

* match tense

* fix tab

* Improve job documentation

* comments

* Fix link

* Fix links

* comments

* add overview paragraph

* Update imagePullPolicy

* Warning if no cluster running

* better check
2019-10-03 15:47:49 -07:00
Simon Mo
fa1214c44a
[Serve] First iteration of the serve doc (#5834)
* Address comments

* Lint

* Add py3 warning
2019-10-03 15:14:09 -07:00
Philipp Moritz
0dee225ce1
Make it possible to run ray examples as projects (#5816) 2019-10-03 14:52:37 -07:00
Edward Oakes
972dddd776
[autoscaler] Kubernetes autoscaler backend (#5492)
* Add Kubernetes NodeProvider to autoscaler

* Split off SSHCommandRunner

* Add KubernetesCommandRunner

* Cleanup

* More config options

* Check if auth present

* More auth checks

* Better output

* Always bootstrap config

* All working

* Add k8s-rsync comment

* Clean up manual k8s examples

* Fix up submit.yaml

* Automatically configure permissions

* Fix get_node_provider arg

* Fix permissions

* Fill in empty auth

* Remove ray-cluster from this PR

* No hard dep on kubernetes library

* Move permissions into autoscaler config

* lint

* Fix indentation

* namespace validation

* Use cluster name tag

* Remove kubernetes from setup.py

* Comment in example configs

* Same default autoscaling config as aws

* Add Kubernetes quickstart

* lint

* Revert changes to submit.yaml (other PR)

* Install kubernetes in travis

* address comments

* Improve autoscaling doc

* kubectl command in setup

* Force use_internal_ips

* comments

* backend env in docs

* Change namespace config

* comments

* comments

* Fix yaml test
2019-10-03 10:17:00 -07:00
Wenjie Wu
ccd88c9e20 [doc] fix typo in ASHA blog url (#5801)
This fixes issue #5800.
2019-09-29 17:41:18 -07:00
Eric Liang
b5da32df78 Bump Ray version in documentation to dev5 (#5794) 2019-09-27 00:19:17 -07:00
Richard Liaw
5c549fd84b
[docs] Make slack more prominent (#5792)
Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>
2019-09-26 15:36:56 -07:00