* add example pages on training language models with fairseq and the Ray autoscaler
* better format
* update ray_train.sh
* Move EFS to the autoscaler file
* nits
* add comments to the code & use a new way to implement checkpoint hook
* small bug fix
* polish the doc
* fix formatting
* yaml
* update docs
* fix the bugs and add preprocess.sh
* fix lint
* Reduce batch size & fix lint
* shorttitle
- Surfaces local cluster usage
- Increases visibility of these instructions
- Removes some Docker docs (that are really out of scope for Ray
documentation IMO)
Closes #3517.
Adds a tmux flag that can be used to support background execution of experiments. It cannot be used together with screen. This seems to be a useful feature that several users have asked for.
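The mutual exclusion between screen and tmux can be sketched as below. This is only an illustration, not the code in this change: `build_exec_command` is a hypothetical helper, and the flag spelling `--tmux` is assumed by analogy with `--screen`.

```python
import shlex


def build_exec_command(cluster, cmd, screen=False, tmux=False,
                       start=False, stop=False):
    """Compose a `ray exec` command line (hypothetical helper).

    Rejects the combination of screen and tmux, since the two
    background-execution modes cannot be used together.
    """
    if screen and tmux:
        raise ValueError("--screen and --tmux cannot be used together")

    args = ["ray", "exec", cluster, cmd]
    # Append each optional flag only when it was requested.
    # NOTE: "--tmux" is an assumed spelling, mirroring "--screen".
    for flag, enabled in (("--screen", screen), ("--tmux", tmux),
                          ("--start", start), ("--stop", stop)):
        if enabled:
            args.append(flag)

    # Quote every argument so the remote command survives the shell.
    return " ".join(shlex.quote(a) for a in args)


if __name__ == "__main__":
    print(build_exec_command("sgd.yaml", "echo hi", tmux=True, start=True))
```

Failing early on the conflicting combination keeps the error close to the user's input instead of surfacing later as a confusing remote-session failure.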
This adds some experimental (undocumented) support for launching Ray on existing nodes. You have to provide the head IP and the list of worker IPs.
There are also a couple of additional utils added for rsyncing files and port forwarding.
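As a rough sketch of what a configuration for existing nodes might carry, the snippet below builds a config dictionary from a head IP and a list of worker IPs. Because this support is undocumented, every key name here (`provider`, `type: local`, `head_ip`, `worker_ips`, `auth`, `ssh_user`) is an assumption, and `make_local_cluster_config` is a hypothetical helper.

```python
import json


def make_local_cluster_config(head_ip, worker_ips, ssh_user):
    """Build a config dict for already-running machines (hypothetical).

    All key names below are assumptions for illustration; check the
    autoscaler's own example configs for the real schema.
    """
    if not worker_ips:
        raise ValueError("at least one worker IP is required")
    if head_ip in worker_ips:
        raise ValueError("head IP must not also be listed as a worker")

    return {
        "cluster_name": "existing-nodes",
        "provider": {
            "type": "local",            # assumed provider name
            "head_ip": head_ip,          # assumed key
            "worker_ips": list(worker_ips),  # assumed key
        },
        "auth": {"ssh_user": ssh_user},  # assumed key
    }


if __name__ == "__main__":
    config = make_local_cluster_config(
        "192.168.0.10", ["192.168.0.11", "192.168.0.12"], "ubuntu")
    print(json.dumps(config, indent=2))
```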
ray exec CLUSTER CMD [--screen] [--start] [--stop]
ray attach CLUSTER [--start]
Example:
ray exec sgd.yaml 'source activate tensorflow_p27 && cd ~/ray/python/ray/rllib && ./train.py --run=PPO --env=CartPole-v0' --screen --start --stop
In a single command, this creates a cluster and runs the command on it in a screen session. The screen session can later be attached to via ray attach. After the command finishes, the cluster workers will be terminated and the head node stopped.
* Fri Feb 16 13:53:50 PST 2018
* Sat Feb 17 15:32:08 PST 2018
* Sat Feb 17 15:44:59 PST 2018
* fix
* Sun Feb 18 14:46:24 PST 2018
* Sun Feb 18 14:46:37 PST 2018
* Sun Feb 18 14:55:52 PST 2018
* Sun Feb 18 15:14:32 PST 2018
* Wed Feb 21 17:34:17 PST 2018
* Sun Feb 25 17:51:17 PST 2018
* Sun Feb 25 22:18:40 PST 2018
* Wed Feb 28 13:19:05 PST 2018
* Wed Feb 28 13:22:13 PST 2018
* Wed Feb 28 13:33:29 PST 2018
* Wed Feb 28 13:35:33 PST 2018
* add ex
* Fri Mar 2 12:50:17 PST 2018
* Fri Mar 2 12:54:31 PST 2018
* some autoscaling config tweaks
* Sun Jan 14 13:56:55 PST 2018
* Mon Jan 15 14:21:09 PST 2018
* increase backoff
* Mon Jan 15 14:40:47 PST 2018
* check boto version