ray/doc/source/projects.rst

Ray Projects (Experimental)
===========================

Ray projects make it easy to package a Ray application so it can be
rerun later in the same environment. They allow for the sharing and
reliable reuse of existing code.

Quick start (CLI)
-----------------

.. code-block:: bash

    # Creates a project in the current directory. It will create a
    # project.yaml defining the code and environment and a cluster.yaml
    # describing the cluster configuration. Both will be created in the
    # ray-project subdirectory of the current directory.
    $ ray project create <project-name>

    # Create a new session from the given project.  Launch a cluster and run
    # the command, which must be specified in the project.yaml file. If no
    # command is specified, the "default" command in ray-project/project.yaml
    # will be used. Alternatively, use --shell to run a raw shell command.
    $ ray session start <command-name> [arguments] [--shell]

    # Open a console for the given session.
    $ ray session attach

    # Stop the given session and terminate all of its worker nodes.
    $ ray session stop

Examples
--------
See `the readme <https://github.com/ray-project/ray/blob/master/python/ray/projects/examples/README.md>`__
for instructions on how to run these examples:

- `Open Tacotron <https://github.com/ray-project/ray/blob/master/python/ray/projects/examples/open-tacotron/ray-project/project.yaml>`__:
  A TensorFlow implementation of Google's Tacotron speech synthesis with pre-trained model (unofficial)
- `PyTorch Transformers <https://github.com/ray-project/ray/blob/master/python/ray/projects/examples/pytorch-transformers/ray-project/project.yaml>`__:
  A library of state-of-the-art pretrained models for Natural Language Processing (NLP)

Tutorial
--------

We will walk through how to use projects by executing the `streaming MapReduce example <auto_examples/plot_streaming.html>`_.
Commands always apply to the project in the current directory.
Let us switch into the project directory with

.. code-block:: bash

    cd ray/doc/examples/streaming


A session represents a running instance of a project. Let's start one with

.. code-block:: bash

    ray session start


The ``ray session start`` command
will bring up a new cluster and initialize the environment of the cluster
according to the `environment` section of the `project.yaml`, installing all
dependencies of the project.

Now we can execute a command in the session. To see a list of all available
commands of the project, run

.. code-block:: bash

    ray session commands


which produces the following output:

.. code-block::

  Active project: ray-example-streaming

  Command "run":
    usage: run [--num-mappers NUM_MAPPERS] [--num-reducers NUM_REDUCERS]

    Start the streaming example.

    optional arguments:
      --num-mappers NUM_MAPPERS
                          Number of mapper actors used
      --num-reducers NUM_REDUCERS
                          Number of reducer actors used


As you see, in this project there is only a single ``run`` command which has arguments
``--num-mappers`` and ``--num-reducers``. We can execute the streaming
wordcount with the default parameters by running

.. code-block:: bash

    ray session execute run


You can interrupt the command with ``<Control>-c`` and attach to the running session by executing

.. code-block:: bash

    ray session attach --tmux


Inside the session you can for example edit the streaming applications with

.. code-block:: bash

    cd ray-example-streaming
    emacs streaming.py


Try for example to add the following lines after the ``for count in counts:`` loop:

.. code-block:: python

    if "million" in wordcounts:
      print("Found the word!")


and re-run the application from outside the session with

.. code-block:: bash

    ray session execute run


The session can be terminated from outside the session with

.. code-block:: bash

    ray session stop


Project file format (project.yaml)
----------------------------------

A project file contains everything required to run a project.
This includes a cluster configuration, the environment and dependencies
for the application, and the specific inputs used to run the project.

Here is an example for a minimal project format:

.. code-block:: yaml

    name: test-project
    description: "This is a simple test project"
    repo: https://github.com/ray-project/ray

    # Cluster to be instantiated by default when starting the project.
    cluster:
      config: ray-project/cluster.yaml

    # Commands/information to build the environment, once the cluster is
    # instantiated. This can include the versions of python libraries etc.
    # It can be specified as a Python requirements.txt, a conda environment,
    # a Dockerfile, or a shell script to run to set up the libraries.
    environment:
      requirements: requirements.txt

    # List of commands that can be executed once the cluster is instantiated
    # and the environment is set up.
    # A command can also specify a cluster that overwrites the default cluster.
    commands:
      - name: default
        command: python default.py
        help: "The command that will be executed if no command name is specified"
      - name: test
        command: python test.py --param1={{param1}} --param2={{param2}}
        help: "A test command"
        params:
          - name: "param1"
            help: "The first parameter"
            # The following line indicates possible values this parameter can take.
            choices: ["1", "2"]
          - name: "param2"
            help: "The second parameter"

Project files have to adhere to the following schema:

.. jsonschema:: ../../python/ray/projects/schema.json

Cluster file format (cluster.yaml)
----------------------------------

This is the same as for the autoscaler, see
`Cluster Launch page <autoscaling.html>`_.
[projects] Project examples and documentation (#5407) 2019-08-20 20:49:15 -07:00			`Ray Projects (Experimental)`
			`===========================`

			`Ray projects make it easy to package a Ray application so it can be`
			`rerun later in the same environment. They allow for the sharing and`
			`reliable reuse of existing code.`

			`Quick start (CLI)`
			`-----------------`

			`.. code-block:: bash`

			`# Creates a project in the current directory. It will create a`
			`# project.yaml defining the code and environment and a cluster.yaml`
			`# describing the cluster configuration. Both will be created in the`
Rename .rayproject to ray-project (#6278) 2019-12-05 16:15:42 -08:00			`# ray-project subdirectory of the current directory.`
[projects] Project examples and documentation (#5407) 2019-08-20 20:49:15 -07:00			`$ ray project create <project-name>`

Project fixes and cleanups (#5632) 2019-09-05 11:55:42 -07:00			`# Create a new session from the given project. Launch a cluster and run`
			`# the command, which must be specified in the project.yaml file. If no`
Rename .rayproject to ray-project (#6278) 2019-12-05 16:15:42 -08:00			`# command is specified, the "default" command in ray-project/project.yaml`
Project fixes and cleanups (#5632) 2019-09-05 11:55:42 -07:00			`# will be used. Alternatively, use --shell to run a raw shell command.`
			`$ ray session start <command-name> [arguments] [--shell]`
[projects] Project examples and documentation (#5407) 2019-08-20 20:49:15 -07:00
			`# Open a console for the given session.`
			`$ ray session attach`

Project fixes and cleanups (#5632) 2019-09-05 11:55:42 -07:00			`# Stop the given session and terminate all of its worker nodes.`
[projects] Project examples and documentation (#5407) 2019-08-20 20:49:15 -07:00			`$ ray session stop`

			`Examples`
			`--------`
Project fixes and cleanups (#5632) 2019-09-05 11:55:42 -07:00			See `the readme <https://github.com/ray-project/ray/blob/master/python/ray/projects/examples/README.md>`__
			`for instructions on how to run these examples:`

Rename .rayproject to ray-project (#6278) 2019-12-05 16:15:42 -08:00			- `Open Tacotron <https://github.com/ray-project/ray/blob/master/python/ray/projects/examples/open-tacotron/ray-project/project.yaml>`__:
[projects] Project examples and documentation (#5407) 2019-08-20 20:49:15 -07:00			`A TensorFlow implementation of Google's Tacotron speech synthesis with pre-trained model (unofficial)`
Rename .rayproject to ray-project (#6278) 2019-12-05 16:15:42 -08:00			- `PyTorch Transformers <https://github.com/ray-project/ray/blob/master/python/ray/projects/examples/pytorch-transformers/ray-project/project.yaml>`__:
[projects] Project examples and documentation (#5407) 2019-08-20 20:49:15 -07:00			`A library of state-of-the-art pretrained models for Natural Language Processing (NLP)`

[Projects] Add small tutorial for projects (#6641) 2020-01-20 09:33:41 -08:00			`Tutorial`
			`--------`

			We will walk through how to use projects by executing the `streaming MapReduce example <auto_examples/plot_streaming.html>`_.
			`Commands always apply to the project in the current directory.`
			`Let us switch into the project directory with`

			`.. code-block:: bash`

			`cd ray/doc/examples/streaming`


			`A session represents a running instance of a project. Let's start one with`

			`.. code-block:: bash`

			`ray session start`


			The ``ray session start`` command
			`will bring up a new cluster and initialize the environment of the cluster`
			according to the `environment` section of the `project.yaml`, installing all
			`dependencies of the project.`

			`Now we can execute a command in the session. To see a list of all available`
			`commands of the project, run`

			`.. code-block:: bash`

			`ray session commands`


			`which produces the following output:`

			`.. code-block::`

			`Active project: ray-example-streaming`

			`Command "run":`
			`usage: run [--num-mappers NUM_MAPPERS] [--num-reducers NUM_REDUCERS]`

			`Start the streaming example.`

			`optional arguments:`
			`--num-mappers NUM_MAPPERS`
			`Number of mapper actors used`
			`--num-reducers NUM_REDUCERS`
			`Number of reducer actors used`


			As you see, in this project there is only a single ``run`` command which has arguments
			``--num-mappers`` and ``--num-reducers``. We can execute the streaming
			`wordcount with the default parameters by running`

			`.. code-block:: bash`

			`ray session execute run`


			You can interrupt the command with ``<Control>-c`` and attach to the running session by executing

			`.. code-block:: bash`

			`ray session attach --tmux`


			`Inside the session you can for example edit the streaming applications with`

			`.. code-block:: bash`

			`cd ray-example-streaming`
			`emacs streaming.py`


			Try for example to add the following lines after the ``for count in counts:`` loop:

			`.. code-block:: python`

			`if "million" in wordcounts:`
			`print("Found the word!")`


			`and re-run the application from outside the session with`

			`.. code-block:: bash`

			`ray session execute run`


			`The session can be terminated from outside the session with`

			`.. code-block:: bash`

			`ray session stop`


[projects] Project examples and documentation (#5407) 2019-08-20 20:49:15 -07:00			`Project file format (project.yaml)`
			`----------------------------------`

			`A project file contains everything required to run a project.`
			`This includes a cluster configuration, the environment and dependencies`
			`for the application, and the specific inputs used to run the project.`

			`Here is an example for a minimal project format:`

			`.. code-block:: yaml`

			`name: test-project`
			`description: "This is a simple test project"`
			`repo: https://github.com/ray-project/ray`

			`# Cluster to be instantiated by default when starting the project.`
[Projects] Refactor cluster specification (#6488) 2019-12-14 22:43:06 -08:00			`cluster:`
			`config: ray-project/cluster.yaml`
[projects] Project examples and documentation (#5407) 2019-08-20 20:49:15 -07:00
			`# Commands/information to build the environment, once the cluster is`
			`# instantiated. This can include the versions of python libraries etc.`
			`# It can be specified as a Python requirements.txt, a conda environment,`
			`# a Dockerfile, or a shell script to run to set up the libraries.`
			`environment:`
			`requirements: requirements.txt`

			`# List of commands that can be executed once the cluster is instantiated`
			`# and the environment is set up.`
			`# A command can also specify a cluster that overwrites the default cluster.`
			`commands:`
[projects] Add named commands to sessions (#5525) 2019-08-26 14:16:17 -07:00			`- name: default`
			`command: python default.py`
			`help: "The command that will be executed if no command name is specified"`
[projects] Project examples and documentation (#5407) 2019-08-20 20:49:15 -07:00			`- name: test`
[projects] Add named commands to sessions (#5525) 2019-08-26 14:16:17 -07:00			`command: python test.py --param1={{param1}} --param2={{param2}}`
			`help: "A test command"`
			`params:`
			`- name: "param1"`
			`help: "The first parameter"`
			`# The following line indicates possible values this parameter can take.`
			`choices: ["1", "2"]`
			`- name: "param2"`
			`help: "The second parameter"`
[projects] Project examples and documentation (#5407) 2019-08-20 20:49:15 -07:00
			`Project files have to adhere to the following schema:`

			`.. jsonschema:: ../../python/ray/projects/schema.json`

			`Cluster file format (cluster.yaml)`
			`----------------------------------`

			`This is the same as for the autoscaler, see`
			`Cluster Launch page <autoscaling.html>`_.