Commit graph

734 commits

Author SHA1 Message Date
hubcity
3d0a8662b3
#7246 - Fixing broken links (#7247)
* #7246 - Fixing broken links

* Apply suggestions from code review

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-03-25 21:46:13 -07:00
Richard Liaw
3fa2e4a346
[docs] Fix import breaking docs build (#7715)
* psutil missing

* ok
2020-03-23 13:21:39 -07:00
Richard Liaw
e311013afd
[tune] Reformat Sections of API Reference (#7706)
* moveit

* moveit

* docstrings to ref

* Update tune-usage.rst

Co-authored-by: Sven Mika <sven@anyscale.io>
2020-03-23 12:23:21 -07:00
Sven Mika
1138f2ebed
[RLlib] Issue 7046 cannot restore keras model from h5 file. (#7482) 2020-03-23 12:19:30 -07:00
Robert Nishihara
1a0c9228d0
Remove pytest from setup.py and other minor changes. (#7700) 2020-03-23 08:46:56 -07:00
SangBin Cho
79767fe425
Fix wording in dashboard documentation. (#7703) 2020-03-22 22:16:40 -07:00
Robert Nishihara
8b4c2b7e88
Remove unnecessary handling of setproctitle and psutil. (#7702) 2020-03-22 22:06:42 -07:00
Robert Nishihara
4d722bf003
Remove dependence on funcsigs. (#7701) 2020-03-22 21:37:24 -07:00
Richard Liaw
81d311031b
[tune] Update API Reference Page (#7671)
* widerdocs

* init

* docs

* fix

* moveit

* mix

* better_docs

* remove

* Apply suggestions from code review

Co-Authored-By: Sven Mika <sven@anyscale.io>

Co-authored-by: Sven Mika <sven@anyscale.io>
2020-03-22 16:42:20 -07:00
Eric Liang
288933ec6b
[rllib] Fix shared metrics context in parallel iterators (#7666)
* debug

* build

* update

* wip

* wpi

* update

* recurisve sync

* comment

* stream

* fix

* Update .travis.yml
2020-03-22 14:15:01 -07:00
SangBin Cho
1b90196bef
[doc] Dashboard documentation (#7304)
* Completed the first half of dashboard documentation.

* Dashboard document initial versions.

* Formatting.

* Fixed tune note is not visible.

* Half of comments from code reivew are handled.

* Fixed based on code review.

* Improved memory usage page.

* Addressed code review.

* Fixed image not found issue.

* Add gitkeep again.

* Refactored document.

* Addressed Robert's feedback.

* Addressed code reviews.

* Addressed last comments.

* Update doc/source/ray-dashboard.rst

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-03-20 22:00:33 -07:00
Edward Oakes
31845f17a5
[docs] Add documentation for reference counting and 'ray memory' (#7661) 2020-03-20 15:47:00 -05:00
Eric Liang
9392cdbf74
[rllib] Add high-performance external application connector (#7641) 2020-03-20 12:43:57 -07:00
Eric Liang
745b9d643d
First pass at ray memory command for memory debugging (#7589) 2020-03-17 20:45:07 -07:00
Landcold7
e6a045df48
Fix typo in asyncio documentation (#7602) 2020-03-17 10:37:37 -05:00
Scott Graham
37e4d29f87
[autoscaler] Adding Azure Support (#7080)
* adding directory and node_provider entry for azure autoscaler

* adding initial cut at azure autoscaler functionality, needs testing and node_provider methods need updating

* adding todos and switching to auth file for service principal authentication

* adding role / scope to service principal

* resolving issues with app credentials

* adding retry for setting service principal role

* typo and adding retry to nic creation

* adding nsg to config, moving nic/public ip to node provider, cleanup node_provider, leaving in NodeProvider stub for testing

* linting

* updating cleanup and fixing bugs

* adding directory and node_provider entry for azure autoscaler

* adding initial cut at azure autoscaler functionality, needs testing and node_provider methods need updating

* adding todos and switching to auth file for service principal authentication

* adding role / scope to service principal

* resolving issues with app credentials

* adding retry for setting service principal role

* typo and adding retry to nic creation

* adding nsg to config, moving nic/public ip to node provider, cleanup node_provider, leaving in NodeProvider stub for testing

* linting

* updating cleanup and fixing bugs

* minor fixes

* first working version :)

* added tag support

* added msi identity intermediate

* enable MSI through user managed identity

* updated schema

* extend yaml schema
remove service principal code
add re-use of managed user identity

* fix rg_id

* fix logging

* replace manual cluster yaml validation with json schema
- improved error message
- support for intellisense in VSCode (or other IDEs)

* run linting

* updating yaml configs and formatting

* updating yaml configs and formatting

* typo in example config

* pulling default config from example-full

* resetting min, init worker prop

* adding docs for azure autoscaler and fixing status

* add azure to docs, fix config for spot instances, update azure provider to avoid caching issues during deployment

* fix for default subscription in azure node provider

* vm dev image build

* minor change

* keeping example-full.yaml in autoscaler/azure, updating azure example config

* linting azure config

* extending retries on azure config

* lint

* support for internal ips, fix to azure docs, and new azure gpu example config

* linting

* Update python/ray/autoscaler/azure/node_provider.py

Co-Authored-By: Richard Liaw <rliaw@berkeley.edu>

* revert_this

* remove_schema

* updating configs and removing ssh keygen, tweak azure node provider terminate

* minor tweaks

Co-authored-by: Markus Cozowicz <marcozo@microsoft.com>
Co-authored-by: Ubuntu <marcozo@mc-ray-jumpbox.chcbtljllnieveqhw3e4c1ducc.xx.internal.cloudapp.net>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-03-15 14:48:27 -07:00
Eric Liang
dd70720578
[rllib] Rename sample_batch_size => rollout_fragment_length (#7503)
* bulk rename

* deprecation warn

* update doc

* update fig

* line length

* rename

* make pytest comptaible

* fix test

* fi sys

* rename

* wip

* fix more

* lint

* update svg

* comments

* lint

* fix use of batch steps
2020-03-14 12:05:04 -07:00
Eric Liang
52cf77f5a9
[rllib] SAC no_done_at_end should default to False (#7594)
* update

* update doc

* stochastic

* cleanu
2020-03-14 11:16:54 -07:00
Markus Cozowicz
ba1b081477
Azure Portal cluster deployment | Support spot instances (#7558)
* added priority option

* added head node priority

* upgrade api version
2020-03-11 18:46:11 -07:00
Richard Liaw
b38ed4be71
[raysgd] Fix More Docs (#7565) 2020-03-11 14:17:47 -07:00
Richard Liaw
d046faeb9c
[sgd] Readme fix (#7564)
* readme fix

* replicas
2020-03-11 13:40:18 -07:00
Richard Liaw
b70f31339c
[sgd] Benchmark Fixes (#7553)
* fix

* fix
2020-03-11 13:08:27 -07:00
Richard Liaw
fbac256982
[sgd] Add benchmarks (#7454)
* Init fp16

* fp16 and schedulers

* scheduler linking and fp16

* to fp16

* loss scaling and documentation

* more documentation

* add tests, refactor config

* moredocs

* more docs

* fix logo, add test mode, add fp16 flag

* fix tests

* fix scheduler

* fix apex

* improve safety

* fix tests

* fix tests

* remove pin memory default

* rm

* fix

* Update doc/examples/doc_code/raysgd_torch_signatures.py

* fix

* migrate changes from other PR

* ok thanks

* pass

* signatures

* lint'

* Update python/ray/experimental/sgd/pytorch/utils.py

* Apply suggestions from code review

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

* should address most comments

* comments

* fix this ci

* first_pass

* add overrides

* override

* fixing up operators

* format

* sgd

* constants

* rm

* revert

* save

* failures

* fixes

* trainer

* run test

* operator

* code

* op

* ok done

* operator

* sgd test fixes

* ok

* trainer

* format

* Apply suggestions from code review

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

* Update doc/source/raysgd/raysgd_pytorch.rst

* docstring

* dcgan

* doc

* commits

* nit

* testing

* revert

* Start renaming pytorch to torch

* Rename PyTorchTrainer to TorchTrainer

* Rename PyTorch runners to Torch runners

* Finish renaming API

* Rename to torch in tests

* Finish renaming docs + tests

* Run format + fix DeprecationWarning

* fix

* move tests up

* benchmarks

* rename

* remove some args

* better metrics output

* fix up the benchmark

* benchmark-yaml

* horovod-benchmark

* benchmarks

* Remove benchmark code for cleanups

* benchmark-code

* nits

* benchmark yamls

* benchmark yaml

* ok

* ok

* ok

* benchmark

* nit

* finish_bench

* makedatacreator

* relax

* metrics

* autosetsampler

* profile

* movements

* OK

* smoothen

* fix

* nitdocs

* loss

* envflag

* comments

* nit

* format

* visible

* images

* move_images

* fix

* rernder

* rrender

* rest

* multgpu

* fix

* nit

* finish

* extrra

* setup

* revert

Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
Co-authored-by: Maksim Smolin <maximsmol@gmail.com>
2020-03-11 01:09:08 -07:00
Richard Liaw
d192ef0611
[raysgd] Cleanup User API (#7384)
* Init fp16

* fp16 and schedulers

* scheduler linking and fp16

* to fp16

* loss scaling and documentation

* more documentation

* add tests, refactor config

* moredocs

* more docs

* fix logo, add test mode, add fp16 flag

* fix tests

* fix scheduler

* fix apex

* improve safety

* fix tests

* fix tests

* remove pin memory default

* rm

* fix

* Update doc/examples/doc_code/raysgd_torch_signatures.py

* fix

* migrate changes from other PR

* ok thanks

* pass

* signatures

* lint'

* Update python/ray/experimental/sgd/pytorch/utils.py

* Apply suggestions from code review

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

* should address most comments

* comments

* fix this ci

* first_pass

* add overrides

* override

* fixing up operators

* format

* sgd

* constants

* rm

* revert

* save

* failures

* fixes

* trainer

* run test

* operator

* code

* op

* ok done

* operator

* sgd test fixes

* ok

* trainer

* format

* Apply suggestions from code review

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

* Update doc/source/raysgd/raysgd_pytorch.rst

* docstring

* dcgan

* doc

* commits

* nit

* testing

* revert

* Start renaming pytorch to torch

* Rename PyTorchTrainer to TorchTrainer

* Rename PyTorch runners to Torch runners

* Finish renaming API

* Rename to torch in tests

* Finish renaming docs + tests

* Run format + fix DeprecationWarning

* fix

* move tests up

* benchmarks

* rename

* remove some args

* better metrics output

* fix up the benchmark

* benchmark-yaml

* horovod-benchmark

* benchmarks

* Remove benchmark code for cleanups

* makedatacreator

* relax

* metrics

* autosetsampler

* profile

* movements

* OK

* smoothen

* fix

* nitdocs

* loss

* comments

* fix

* fix

* runner_tests

* codes

* example

* fix_test

* fix

* tests

Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
Co-authored-by: Maksim Smolin <maximsmol@gmail.com>
2020-03-10 08:41:42 -07:00
Anthony Yu
89ec4adb72
[tune] Dragonfly Optimizer (#5955)
* Add sample example

* Copy relevant lines of ask from inherited Optimizer

* Ignore strategy

* Additional changes

* Add DragonflySearch for tune connector for Dragonfly

* Add example and fix small errors

* lint

* Remove skopt references

* Update example based off of Dragonfly changes

* Edit example for final Dragonfly edits

* Formatting and documentation edits

* Add documentation and add to test pipeline

* Address PR comments

* Fix Jenkins test

* Adjust Dragonfly to PR#7366

* Lint

* fix_tests

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-03-10 08:40:36 -07:00
Edward Oakes
4ab80eafb9
Deprecate use_pickle flag (#7474) 2020-03-09 16:03:56 -07:00
Markus Cozowicz
e03259455f
[autoscaler] azure init script path (#7515) 2020-03-09 09:49:07 -07:00
Markus Cozowicz
145ebe14c7
added Azure Resource Manager (ARM) template (#7494)
* added Azure Resource Manager (ARM) template

* removed Azure doc (moved to separate PR)

* nit

* fixpaths

* nit

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-03-08 22:29:10 -07:00
Richard Liaw
115468de2c
[tune] Repeated evals (#7366)
* easyrepeat

* done

* suggest

* doc

* ok

* commit

* Apply suggestions from code review

Co-Authored-By: Ujval Misra <misraujval@gmail.com>

* Apply suggestions from code review

Co-Authored-By: Ujval Misra <misraujval@gmail.com>

* Apply suggestions from code review

* ok

* docs

Co-authored-by: Ujval Misra <misraujval@gmail.com>
2020-03-07 11:08:23 -08:00
Sven Mika
510c850651
[RLlib] SAC add discrete action support. (#7320)
* Exploration API (+EpsilonGreedy sub-class).

* Exploration API (+EpsilonGreedy sub-class).

* Cleanup/LINT.

* Add `deterministic` to generic Trainer config (NOTE: this is still ignored by most Agents).

* Add `error` option to deprecation_warning().

* WIP.

* Bug fix: Get exploration-info for tf framework.
Bug fix: Properly deprecate some DQN config keys.

* WIP.

* LINT.

* WIP.

* Split PerWorkerEpsilonGreedy out of EpsilonGreedy.
Docstrings.

* Fix bug in sampler.py in case Policy has self.exploration = None

* Update rllib/agents/dqn/dqn.py

Co-Authored-By: Eric Liang <ekhliang@gmail.com>

* WIP.

* Update rllib/agents/trainer.py

Co-Authored-By: Eric Liang <ekhliang@gmail.com>

* WIP.

* Change requests.

* LINT

* In tune/utils/util.py::deep_update() Only keep deep_updat'ing if both original and value are dicts. If value is not a dict, set

* Completely obsolete syn_replay_optimizer.py's parameters schedule_max_timesteps AND beta_annealing_fraction (replaced with prioritized_replay_beta_annealing_timesteps).

* Update rllib/evaluation/worker_set.py

Co-Authored-By: Eric Liang <ekhliang@gmail.com>

* Review fixes.

* Fix default value for DQN's exploration spec.

* LINT

* Fix recursion bug (wrong parent c'tor).

* Do not pass timestep to get_exploration_info.

* Update tf_policy.py

* Fix some remaining issues with test cases and remove more deprecated DQN/APEX exploration configs.

* Bug fix tf-action-dist

* DDPG incompatibility bug fix with new DQN exploration handling (which is imported by DDPG).

* Switch off exploration when getting action probs from off-policy-estimator's policy.

* LINT

* Fix test_checkpoint_restore.py.

* Deprecate all SAC exploration (unused) configs.

* Properly use `model.last_output()` everywhere. Instead of `model._last_output`.

* WIP.

* Take out set_epsilon from multi-agent-env test (not needed, decays anyway).

* WIP.

* Trigger re-test (flaky checkpoint-restore test).

* WIP.

* WIP.

* Add test case for deterministic action sampling in PPO.

* bug fix.

* Added deterministic test cases for different Agents.

* Fix problem with TupleActions in dynamic-tf-policy.

* Separate supported_spaces tests so they can be run separately for easier debugging.

* LINT.

* Fix autoregressive_action_dist.py test case.

* Re-test.

* Fix.

* Remove duplicate py_test rule from bazel.

* LINT.

* WIP.

* WIP.

* SAC fix.

* SAC fix.

* WIP.

* WIP.

* WIP.

* FIX 2 examples tests.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* Fix.

* LINT.

* Renamed test file.

* WIP.

* Add unittest.main.

* Make action_dist_class mandatory.

* fix

* FIX.

* WIP.

* WIP.

* Fix.

* Fix.

* Fix explorations test case (contextlib cannot find its own nullcontext??).

* Force torch to be installed for QMIX.

* LINT.

* Fix determine_tests_to_run.py.

* Fix determine_tests_to_run.py.

* WIP

* Add Random exploration component to tests (fixed issue with "static-graph randomness" via py_function).

* Add Random exploration component to tests (fixed issue with "static-graph randomness" via py_function).

* Rename some stuff.

* Rename some stuff.

* WIP.

* update.

* WIP.

* Gumbel Softmax Dist.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP

* WIP.

* WIP.

* Hypertune.

* Hypertune.

* Hypertune.

* Lock-in.

* Cleanup.

* LINT.

* Fix.

* Update rllib/policy/eager_tf_policy.py

Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com>

* Update rllib/agents/sac/sac_policy.py

Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com>

* Update rllib/agents/sac/sac_policy.py

Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com>

* Update rllib/models/tf/tf_action_dist.py

Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com>

* Update rllib/models/tf/tf_action_dist.py

Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com>

* Fix items from review comments.

* Add dm_tree to RLlib dependencies.

* Add dm_tree to RLlib dependencies.

* Fix DQN test cases ((Torch)Categorical).

* Fix wrong pip install.

Co-authored-by: Eric Liang <ekhliang@gmail.com>
Co-authored-by: Kristian Hartikainen <kristian.hartikainen@gmail.com>
2020-03-06 10:37:12 -08:00
Eric Liang
476b5c6196
[Parallel Iterators] Allow for operator chaining after repartition (#7268)
* bug fix repartition

* change add_transform from private to inner

* formatting

* addressing comments

* formatting
2020-03-04 14:42:52 -08:00
Richard Liaw
c7f0b303f3
Mention that calling some_function.remote() is non-blocking (#7417)
* Mention that calling some_function.remote() is non-blocking.

* Apply suggestions from code review

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-03-04 13:35:46 -08:00
Richard Liaw
beddaf65b4
Small correction in documentation (#7453)
* corrected import statement in docs

* Update doc/source/tune-usage.rst

Co-Authored-By: Richard Liaw <rliaw@berkeley.edu>

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-03-04 13:28:28 -08:00
Philipp Moritz
0d7ef46c83
Bazel improvements (#7427)
* Make wget quiet

* Make sphinx-build quiet

* Remove -q from pip install in CI script as config already takes care of it

* Add documentation on custom dependencies

* formatting

* python
2020-03-04 13:13:21 -08:00
Maksim Smolin
3a134c7224
[RaySGD] Rename PyTorch API endpoints to start with Torch (#7425)
* Start renaming pytorch to torch

* Rename PyTorchTrainer to TorchTrainer

* Rename PyTorch runners to Torch runners

* Finish renaming API

* Rename to torch in tests

* Finish renaming docs + tests

* Run format + fix DeprecationWarning

* fix

* move tests up

* rename

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-03-03 16:44:42 -08:00
Philipp Moritz
c2c6d96490
Fix install documentation on readthedocs (#7423) 2020-03-03 11:03:18 -08:00
Richard Liaw
48cdca843f
[raysgd] Custom training operator (#7211) 2020-03-01 21:22:48 -08:00
Sven Mika
2d97650b1e
[RLlib] Add Exploration API documentation. (#7373)
* Add Exploration API documentation.

* Add Exploration API documentation.

* Add Exploration API documentation.

* Update exporation docs.
2020-03-01 16:55:41 -08:00
Sven Mika
83e06cd30a
[RLlib] DDPG refactor and Exploration API action noise classes. (#7314)
* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* Fix

* WIP.

* Add TD3 quick Pendulum regresison.

* Cleanup.

* Fix.

* LINT.

* Fix.

* Sort quick_learning test cases, add TD3.

* Sort quick_learning test cases, add TD3.

* Revert test_checkpoint_restore.py (debugging) changes.

* Fix old soft_q settings in documentation and test configs.

* More doc fixes.

* Fix test case.

* Fix test case.

* Lower test load.

* WIP.
2020-03-01 11:53:35 -08:00
SangBin Cho
50145e668d
Fix the problem that ray.remote reference is not visible at a document. (#7311) 2020-02-28 14:03:08 -08:00
Edward Oakes
93fe4b0b58
Change actor.__ray_kill__() to ray.kill(actor) (#7360) 2020-02-28 11:55:13 -06:00
Simon Mo
29b08ddc09
Improve release process from 0.8.2 (#7303) 2020-02-24 21:18:53 -08:00
ijrsvt
325fc24afa
Removing unused Pyarrow Info (#7207) 2020-02-21 17:07:26 -08:00
Simon Mo
b804d40c04
Stop vendoring pyarrow (#7233) 2020-02-19 19:01:26 -08:00
Simon Mo
7bef7031c2
Revert "Revert "Revert "Removing Pyarrow dependency (#7146)" (#7209) (#7214)" (#7232) 2020-02-19 13:35:29 -08:00
Simon Mo
e8941b1b79
Revert "Revert "Removing Pyarrow dependency (#7146)" (#7209) (#7214) 2020-02-19 10:08:52 -08:00
Eric Liang
0aa9373d62
Revert "Removing Pyarrow dependency (#7146)" (#7209)
This reverts commit 2116fd3bca.
2020-02-18 14:12:06 -08:00
Eric Liang
5df801605e
Add ray.util package and move libraries from experimental (#7100) 2020-02-18 13:43:19 -08:00
ijrsvt
2116fd3bca
Removing Pyarrow dependency (#7146) 2020-02-17 18:00:13 -08:00
Richard Liaw
94e2fcea2e
[sgd] fp16 (apex) and scheduler support + move examples page (#7061)
* Init fp16

* fp16 and schedulers

* scheduler linking and fp16

* to fp16

* loss scaling and documentation

* more documentation

* add tests, refactor config

* moredocs

* more docs

* fix logo, add test mode, add fp16 flag

* fix tests

* fix scheduler

* fix apex

* improve safety

* fix tests

* fix tests

* remove pin memory default

* rm

* fix

* Update doc/examples/doc_code/raysgd_torch_signatures.py

* fix

* migrate changes from other PR

* ok thanks

* pass

* signatures

* lint'

* Update python/ray/experimental/sgd/pytorch/utils.py

* Apply suggestions from code review

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

* should address most comments

* comments

* fix this ci

* fix tests'

* testmode

Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
2020-02-16 19:04:08 -08:00