* adding directory and node_provider entry for azure autoscaler
* adding initial cut at azure autoscaler functionality, needs testing and node_provider methods need updating
* adding todos and switching to auth file for service principal authentication
* adding role / scope to service principal
* resolving issues with app credentials
* adding retry for setting service principal role
* typo and adding retry to nic creation
* adding nsg to config, moving nic/public ip to node provider, cleanup node_provider, leaving in NodeProvider stub for testing
* linting
* updating cleanup and fixing bugs
* adding directory and node_provider entry for azure autoscaler
* adding initial cut at azure autoscaler functionality, needs testing and node_provider methods need updating
* adding todos and switching to auth file for service principal authentication
* adding role / scope to service principal
* resolving issues with app credentials
* adding retry for setting service principal role
* typo and adding retry to nic creation
* adding nsg to config, moving nic/public ip to node provider, cleanup node_provider, leaving in NodeProvider stub for testing
* linting
* updating cleanup and fixing bugs
* minor fixes
* first working version :)
* added tag support
* added msi identity intermediate
* enable MSI through user managed identity
* updated schema
* extend yaml schema
remove service principal code
add re-use of managed user identity
* fix rg_id
* fix logging
* replace manual cluster yaml validation with json schema
- improved error message
- support for intellisense in VSCode (or other IDEs)
* run linting
* updating yaml configs and formatting
* updating yaml configs and formatting
* typo in example config
* pulling default config from example-full
* resetting min, init worker prop
* adding docs for azure autoscaler and fixing status
* add azure to docs, fix config for spot instances, update azure provider to avoid caching issues during deployment
* fix for default subscription in azure node provider
* vm dev image build
* minor change
* keeping example-full.yaml in autoscaler/azure, updating azure example config
* linting azure config
* extending retries on azure config
* lint
* support for internal ips, fix to azure docs, and new azure gpu example config
* linting
* Update python/ray/autoscaler/azure/node_provider.py
Co-Authored-By: Richard Liaw <rliaw@berkeley.edu>
* revert_this
* remove_schema
* updating configs and removing ssh keygen, tweak azure node provider terminate
* minor tweaks
Co-authored-by: Markus Cozowicz <marcozo@microsoft.com>
Co-authored-by: Ubuntu <marcozo@mc-ray-jumpbox.chcbtljllnieveqhw3e4c1ducc.xx.internal.cloudapp.net>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
* make starting ray a separate page
* concept
* Apply suggestions from code review
Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>
* more fics
* Apply suggestions from code review
Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
- Surfaces local cluster usage
- Increases visability of these instructions
- Removes some docker docs (that are really out of scope for Ray
documentation IMO)
Closes#3517.
ray exec CLUSTER CMD [--screen] [--start] [--stop]
ray attach CLUSTER [--start]
Example:
ray exec sgd.yaml 'source activate tensorflow_p27 && cd ~/ray/python/ray/rllib && ./train.py --run=PPO --env=CartPole-v0' --screen --start --stop
This will in one command create a cluster and run the command on it in a screen session. The screen can later be attached to via ray attach. After the command finishes, the cluster workers will be terminated and the head node stopped.
* Ray doc - changed python indentation to 4 spaces in documentation files actors.rst, api.rst, and example-*.rst
* Ray documentation - changed Python to 4 space indentation for files install-*.rst, installation-troubleshooting.rst, internals-overview.rst, serialization.rst, troubleshootin.rst, tutorial.rst, using-ray-*.rst