789 commits

Author SHA1 Message Date
Frithjof
872a3522aa Add machinable to list of projects using Tune (#6737) 2020-01-07 15:10:17 -08:00
Eric Liang
a6c8c342b7
Better document guarantees provided by par iter API (#6726)
* update

* Update doc/source/iter.rst

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

* Update doc/source/iter.rst

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
2020-01-07 14:41:50 -08:00
Robert Nishihara
5e43b25e8c
Document fault tolerance behavior. (#6698) 2020-01-06 22:34:06 -08:00
Simon Mo
6285851743
Add sphinx copy button (#6694)
* Add sphinx copy button

* Update requirements-doc.txt

Co-authored-by: Robert Nishihara <robertnishihara@gmail.com>
2020-01-04 19:31:49 -06:00
Ujval Misra
ca651af1d7 [tune] Async restores and S3/GCP-capable trial FT (#6376)
* Initial commit for asynchronous save/restore

* Set stage for cloud checkpointable trainable.

* Refactor log_sync and sync_client.

* Add durable trainable impl.

* Support delete in cmd based client

* Fix some tests and such

* Cleanup, comments.

* Use upload_dir instead.

* Revert files belonging to other PR in split.

* Pass upload_dir into trainable init.

* Pickle checkpoint at driver, more robust checkpoint_dir discovery.

* Cleanup trainable helper functions, fix tests.

* Addressed comments.

* Fix bugs from cluster testing, add parameterized cluster tests.

* Add trainable util test

* package_ref

* pbt_address

* Fix bug after running pbt example (_save returning dir).

* get cluster tests running, other bug fixes.

* raise_errors

* Fix deleter bug, add durable trainable example.

* Fix cluster test bugs.

* filelock

* save/restore bug fixes

* .

* Working cluster tests.

* Lint, revert to tracking memory checkpoints.

* Documentation, cleanup

* fixinitialsync

* fix_one_test

* Fix cluster test bug

* nit

* lint

* Revert tune md change

* Fix basename bug for directories.

* lint

* fix_tests

* nit_fix

* Add __init__ file.

* Move to utils package

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-01-02 20:40:53 -08:00
Harrison Feng
57061a15cf [docs] configure.rst with --num-cpus (#6678)
--num-cpus -> --num-gpus

Signed-off-by: Harrison Feng <feng.harrison@gmail.com>
2020-01-02 20:33:41 -08:00
Eric Liang
895f2727fb
Add experimental parallel iterators API (#6644) 2020-01-02 13:45:26 -08:00
Robert Nishihara
d2c6457832
Remove public facing references to --redis-address. (#6631) 2019-12-31 13:21:53 -08:00
Michael Luo
1cb335487e SAC for Mujoco Environments (#6642) 2019-12-31 00:16:54 -08:00
Philipp Moritz
735f282494
Use 0.9.0.dev0 as the version tag (#6630) 2019-12-30 10:14:07 -08:00
Richard Liaw
646643a588
[doc] remove redundant PS example (#6634) 2019-12-29 20:54:42 -08:00
Edward Oakes
2a66529fb7
Add multiprocessing.Pool API (#6194) 2019-12-29 21:40:58 -06:00
Eric Liang
677004ee3d
Add 'ray stat' command for debugging (#6622)
* wip

* wip

* wip

* iterate

* move

* fix thread safety
2019-12-28 14:40:32 -08:00
Robert Nishihara
96f2f8ff10 Stop testing Python 2.7 and building Python 2.7 wheels. (#6601) 2019-12-27 20:47:49 -08:00
Robert Nishihara
8724e5ffd5 Start WebUI by default. (#6493) 2019-12-27 13:49:07 -08:00
zhu-eric
65297e65f0 Experimental Actor Pool (#6055)
* mod_table

* Example fix for gallery

* lint

* nit

* nit

* fix

* gallery

* remove table for now

* training, object store, tune, actors, advanced

* start tf code

* first cut tf

* yapf

* pytorch

* add torch example

* torch

* parallel

* tune

* tuning

* reviewsready

* finetune

* fix

* move_code

* update conf

* compile

* init hyperparameter

* Start images

* overview

* extra

* fix

* works

* update-ps-example

* param_actor

* fix

* examples

* simple

* simplify_pong

* flake8 and run hyperopt

* add comments

* add comments

* add suggestion

* add suggestion

* suggestions

* add suggestion

* add suggestions

* fixed in wrong area

* last edit

* finish changes

* add line

* format

* reset

* tests and docs

* fix tests

* bazelify

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2019-12-26 14:35:10 -08:00
micafan
b98b288ffd [GCS] Change GCS Test to cc_test (#6596) 2019-12-26 14:34:35 +08:00
Yuhao Yang
df4533c649 [tune] demo exporting trained models in pbt examples (#6533) 2019-12-26 02:14:49 +01:00
Edward Oakes
6b1a57542e
Add actor.__ray_kill__() to terminate actors immediately (#6523) 2019-12-23 23:12:57 -06:00
Edward Oakes
1b14fbe179 Hotfix ray download links in documentation (#6572) 2019-12-21 23:00:29 +01:00
Rong Rong
3af5fe60e7 modify developer tips page to add PR and CI tips (#6530) 2019-12-18 16:20:05 -08:00
Simon Mo
e530c37b0e
Use localhost and set redis password by default (#6481) 2019-12-17 19:41:19 -08:00
Eric Liang
6725a61bda
Release 0.8.0 test logs (#6512) 2019-12-17 15:56:50 -08:00
Eric Liang
2530eb90dc
Move tf.test.is_gpu_available() to after session init (#6515)
* move to after session init

* script fixes
2019-12-17 14:55:39 -08:00
Eric Liang
1a1324d2a2
Bump version from 0.8.0.dev6 -> 0.9.0.dev (#6508) 2019-12-16 23:57:42 -08:00
Edward Oakes
8636d67b72 Improve release docs and add results from 0.7.7 (#6506)
* Improve docs, add logs

* add logs

* microbenchmark

* lint
2019-12-16 15:51:39 -08:00
Richard Liaw
5719a05757
[sgd] Add support for multi-model multi-optimizer training (#6317) 2019-12-15 15:19:45 -08:00
Philipp Moritz
f5d10eea0b
[Projects] Refactor cluster specification (#6488) 2019-12-14 22:43:06 -08:00
Yuhao Yang
ad4da17899 [Tune] Add example and tutorial for DCGAN (#6400) 2019-12-13 14:15:44 -08:00
Richard Liaw
4ff6ca89f4
[docs] slight doc modifications (#6466) 2019-12-13 10:38:17 -08:00
alindkhare
76e678d775 [Serve] Added deadline awareness (#6442)
* [Serve] Added deadline awareness

Added deadline awareness while enqueuing a query
Using Blist sorted-list implementation (ascending order) to get queries according to their specified deadlines. [buffer_queues]
Exposed slo_ms via handle/http request
Added slo example
The queries in the example will be executed in almost the reverse of the order in which they are fired
Added slo pytest
Added check for slo_ms to not be negative
Included the changes suggested

* Linting Corrections

* Adding the code changes suggested by format.sh

* Added the suggested changes

Added justification for blist
Added blist in travis/ci/install-dependencies.sh

* Fixed linting issues

* Added blist to ray/doc/requirements-doc.txt
2019-12-11 16:41:54 -08:00
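
The deadline-aware enqueueing described in the commit above (#6442) can be illustrated with a small sketch. This is not the Ray Serve implementation: the commit uses a blist sorted list, whereas this sketch substitutes the standard-library `heapq` to get the same earliest-deadline-first ordering. The names `DeadlineQueue` and `drain_in_deadline_order` and the explicit `enqueue_time_ms` argument are hypothetical, introduced only for illustration; only `slo_ms` and the non-negativity check come from the commit message.

```python
import heapq
import itertools


class DeadlineQueue:
    """Toy earliest-deadline-first queue (hypothetical; not Ray Serve's code)."""

    def __init__(self):
        self._heap = []
        # Tie-breaker: queries with equal deadlines come out in enqueue order.
        self._counter = itertools.count()

    def enqueue(self, query, enqueue_time_ms, slo_ms):
        # Mirrors the commit's check that slo_ms must not be negative.
        if slo_ms < 0:
            raise ValueError("slo_ms must not be negative")
        deadline = enqueue_time_ms + slo_ms
        heapq.heappush(self._heap, (deadline, next(self._counter), query))

    def dequeue(self):
        # Pops the query with the smallest (earliest) deadline.
        _deadline, _order, query = heapq.heappop(self._heap)
        return query

    def __len__(self):
        return len(self._heap)


def drain_in_deadline_order(requests):
    """requests: iterable of (query, enqueue_time_ms, slo_ms) tuples."""
    queue = DeadlineQueue()
    for query, enqueue_time_ms, slo_ms in requests:
        queue.enqueue(query, enqueue_time_ms, slo_ms)
    return [queue.dequeue() for _ in range(len(queue))]
```

If queries are fired in sequence with progressively tighter SLOs, for example `("q0", 0, 1000)`, `("q1", 1, 500)`, `("q2", 2, 100)`, their deadlines are 1000, 501, and 102 ms, so they drain in the reverse of the order in which they were fired, which matches the behaviour the commit message describes for its example.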
Maltimore
0ec613c95a [rllib] doc: fix typo: on_postprocess_batch -> on_postprocess_traj (#6438) 2019-12-11 15:00:53 -08:00
Dean Wampler
abb4fb3f8e Added small section on installation when using Anaconda. (#6427) 2019-12-11 10:23:41 -08:00
Ameer Haj Ali
1a9948eef9 Update rllib-examples.rst (#6396) 2019-12-08 16:21:50 -08:00
Dean Wampler
65694cdc4c [bug] Attempt to fix links not working. (#6390) 2019-12-07 14:31:50 -08:00
Victor Le
4e24c805ee AlphaZero and Ranked reward implementation (#6385) 2019-12-07 12:08:40 -08:00
Yuhao Yang
c327ae152f [doc] Update the test command in getting-involved. (#6347) 2019-12-07 11:03:52 -08:00
Dean Wampler
53d62d3eec Expanded with new pages for getting started, etc. Blog links unchanged. (#6388) 2019-12-06 15:18:47 -08:00
Edward Oakes
f63b64310a
Bump version to 0.8.0.dev7 (#6303) 2019-12-05 18:33:54 -08:00
Philipp Moritz
dd27bfbb75
Rename .rayproject to ray-project (#6278) 2019-12-05 16:15:42 -08:00
Eric Liang
bc5e259264
[rllib] Add a doc section on computing actions (#6326)
* options doc

* add note

* hint shr

* doc update
2019-12-03 00:10:50 -08:00
Shital Shah
670cb6374e Doc enhancement: use build.sh for ray, clarification on how rllib selects VisionNetwork, note on setup-dev.py for rllib. (#6092) 2019-12-02 22:19:01 -08:00
Richard Liaw
0b3d5d989b
[docs] Add public materials (#6331)
* startup

* update tune readme

* usingrah
2019-12-02 19:59:23 -08:00
Eric Liang
0b0a16982a [doc] Use .options() (#6323)
* options doc

* add note

* hint shr
2019-12-01 17:24:00 -08:00
Philipp Moritz
a4437813eb
[Projects] Unify hyphen vs underscore handling for arguments (#6208) 2019-11-20 23:52:41 -08:00
Richard Liaw
d3c7a8fda5
[docs] yarn update (#6173) 2019-11-19 16:15:08 -08:00
Yuhao Yang
d3ff2252c4 [doc] Fix link to getting involved 2019-11-18 12:59:14 -08:00
Eric Liang
8fc2272f43
[rllib] Reorganize trainer config, add warnings about high VF loss magnitude for PPO (#6181) 2019-11-18 10:39:07 -08:00
Ujval Misra
2965dc1b72 [tune] Fault tolerance improvements (#5877)
* Precede ray.get with ray.wait.

* Trigger checkpoint deletes locally in Trainable

* Clean-up code.

* Minor changes.

* Track best checkpoint so far again

* Pulled checkpoint GC out of Trainable.

* Added comments, error logging.

* Immediate pull after checkpoint taken; rsync source delete on pull

* Minor doc fixes

* Fix checkpoint manager bug

* Fix bugs, tests, formatting

* Fix bugs, feature flag for force sync.

* Fix test.

* Fix minor bugs: clear proc and less verbose sync_on_checkpoint warnings.

* Fix bug: update IP of last_result.

* Fixed message.

* Added a lot of logging.

* Changes to ray trial executor.

* More bug fixes (logging after failure), better logging.

* Fix richards bug and logging

* Add comments.

* try-except

* Fix heapq bug.

* .

* Move handling of no available trials to ray_trial_executor (#1)

* Fix formatting bug, lint.

* Addressed Richard's comments

* Revert tests.

* fix rebase

* Fix trial location reporting.

* Fix test

* Fix lint

* Rebase, use ray.get w/ timeout, lint.

* lint

* fix rebase

* Address richard's comments
2019-11-18 01:14:41 -08:00
Richard Liaw
62cbc043b4
[tune] tbx logger (#6133)
* tbx

* add_hparams

* fix_hparams

* ok

* ok

* fix

* ok

* fix
2019-11-15 08:45:44 -08:00