Ray Projects (Experimental)
===========================

Ray projects make it easy to package a Ray application so it can be
rerun later in the same environment. They allow for the sharing and
reliable reuse of existing code.

Quick start (CLI)
-----------------

.. code-block:: bash

    # Creates a project in the current directory. It will create a
    # project.yaml defining the code and environment and a cluster.yaml
    # describing the cluster configuration. Both will be created in the
    # ray-project subdirectory of the current directory.
    $ ray project create

    # Create a new session from the given project. Launch a cluster and run
    # the command, which must be specified in the project.yaml file. If no
    # command is specified, the "default" command in ray-project/project.yaml
    # will be used. Alternatively, use --shell to run a raw shell command.
    $ ray session start [arguments] [--shell]

    # Open a console for the given session.
    $ ray session attach

    # Stop the given session and terminate all of its worker nodes.
    $ ray session stop

Examples
--------

See `the readme `__ for instructions on how to run these examples:

- `Open Tacotron `__: A TensorFlow implementation of Google's Tacotron
  speech synthesis with pre-trained model (unofficial)
- `PyTorch Transformers `__: A library of state-of-the-art pretrained
  models for Natural Language Processing (NLP)

Tutorial
--------

We will walk through how to use projects by executing the
`streaming MapReduce example `_. Commands always apply to the project in
the current directory. Let us switch into the project directory with

.. code-block:: bash

    cd ray/doc/examples/streaming

A session represents a running instance of a project. Let's start one with

.. code-block:: bash

    ray session start

The ``ray session start`` command will bring up a new cluster and initialize
the environment of the cluster according to the ``environment`` section of
the ``project.yaml``, installing all dependencies of the project.

Now we can execute a command in the session.
To see a list of all available commands of the project, run

.. code-block:: bash

    ray session commands

which produces the following output:

.. code-block::

    Active project: ray-example-streaming

    Command "run":
      usage: run [--num-mappers NUM_MAPPERS] [--num-reducers NUM_REDUCERS]

      Start the streaming example.

      optional arguments:
        --num-mappers NUM_MAPPERS
                              Number of mapper actors used
        --num-reducers NUM_REDUCERS
                              Number of reducer actors used

As you can see, this project has only a single ``run`` command, which takes
the arguments ``--num-mappers`` and ``--num-reducers``. We can execute the
streaming wordcount with the default parameters by running

.. code-block:: bash

    ray session execute run

You can interrupt the command with ``Ctrl-C`` and attach to the running
session by executing

.. code-block:: bash

    ray session attach --tmux

Inside the session you can, for example, edit the streaming application with

.. code-block:: bash

    cd ray-example-streaming
    emacs streaming.py

Try, for example, adding the following lines after the
``for count in counts:`` loop:

.. code-block:: python

    if "million" in wordcounts:
        print("Found the word!")

and re-run the application from outside the session with

.. code-block:: bash

    ray session execute run

The session can be terminated from outside the session with

.. code-block:: bash

    ray session stop

Project file format (project.yaml)
----------------------------------

A project file contains everything required to run a project.
This includes a cluster configuration, the environment and dependencies
for the application, and the specific inputs used to run the project.

Here is an example of a minimal project file:

.. code-block:: yaml

    name: test-project
    description: "This is a simple test project"
    repo: https://github.com/ray-project/ray

    # Cluster to be instantiated by default when starting the project.
    cluster:
      config: ray-project/cluster.yaml

    # Commands/information to build the environment, once the cluster is
    # instantiated. This can include the versions of Python libraries etc.
    # It can be specified as a Python requirements.txt, a conda environment,
    # a Dockerfile, or a shell script to run to set up the libraries.
    environment:
      requirements: requirements.txt

    # List of commands that can be executed once the cluster is instantiated
    # and the environment is set up.
    # A command can also specify a cluster that overrides the default cluster.
    commands:
      - name: default
        command: python default.py
        help: "The command that will be executed if no command name is specified"
      - name: test
        command: python test.py --param1={{param1}} --param2={{param2}}
        help: "A test command"
        params:
          - name: "param1"
            help: "The first parameter"
            # The following line indicates possible values this parameter can take.
            choices: ["1", "2"]
          - name: "param2"
            help: "The second parameter"

Project files have to adhere to the following schema:

.. jsonschema:: ../../python/ray/projects/schema.json

Cluster file format (cluster.yaml)
----------------------------------

This is the same as for the autoscaler; see :ref:`Cluster Launch page `.
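
As a rough illustration of what such a file can contain, here is a minimal
sketch of a ``cluster.yaml``, assuming an AWS cluster. The cluster name,
worker counts, region, and SSH user are placeholder values chosen for this
example and are not taken from any project in this document; the autoscaler
documentation remains the authoritative reference for the available fields.

.. code-block:: yaml

    # Hypothetical name used to identify the cluster.
    cluster_name: ray-example-project

    # Bounds on the number of worker nodes, in addition to the head node
    # (assumed values).
    min_workers: 0
    max_workers: 2

    # Cloud provider settings; AWS in us-west-2 is an assumption here.
    provider:
        type: aws
        region: us-west-2
        availability_zone: us-west-2a

    # How to authenticate with newly launched nodes.
    auth:
        ssh_user: ubuntu

A project would point at a file like this through the ``cluster.config``
entry of its ``project.yaml``.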