.. _configuring-ray:
Configuring Ray
===============
This page discusses the various ways to configure Ray, both from the Python API
and from the command line. Take a look at the ``ray.init`` `documentation
<package-ref.html#ray.init>`__ for a complete overview of the configurations.
.. important:: In a multi-node setting, you must first run ``ray start`` on the command line to start the Ray cluster services on the machine, and then call ``ray.init`` in Python to connect to the cluster services. On a single machine, you can call ``ray.init()`` without ``ray start``; this both starts the Ray cluster services and connects to them.
Cluster Resources
-----------------
By default, Ray detects the available resources.

.. code-block:: python

  # This automatically detects the resources available on a single machine.
  ray.init()

If you are not connecting to an existing cluster, you can override the detected resources through ``ray.init`` as follows.

.. code-block:: python

  # If not connecting to an existing cluster, you can specify resource overrides:
  ray.init(num_cpus=8, num_gpus=1)

  # Specifying custom resources
  ray.init(num_gpus=1, resources={'Resource1': 4, 'Resource2': 16})

When starting Ray from the command line, pass the ``--num-cpus`` and ``--num-gpus`` flags to ``ray start``. You can also specify custom resources.

.. code-block:: bash

  # To start a head node.
  $ ray start --head --num-cpus=<NUM_CPUS> --num-gpus=<NUM_GPUS>

  # To start a non-head node.
  $ ray start --address=<address> --num-cpus=<NUM_CPUS> --num-gpus=<NUM_GPUS>

  # Specifying custom resources
  $ ray start [--head] --num-cpus=<NUM_CPUS> --resources='{"Resource1": 4, "Resource2": 16}'

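The ``--resources`` value is a JSON object, so quoting it by hand is easy to get wrong. As a minimal sketch, the helper below assembles the ``ray start`` arguments from Python values. ``build_ray_start_args`` is a hypothetical name and not part of Ray; it uses only the flags shown above.

.. code-block:: python

  import json
  import shlex


  def build_ray_start_args(num_cpus, num_gpus=None, resources=None,
                           head=False, address=None):
      """Hypothetical helper: build the argument list for ``ray start``."""
      args = ["ray", "start"]
      if head:
          args.append("--head")
      elif address is not None:
          args.append(f"--address={address}")
      args.append(f"--num-cpus={num_cpus}")
      if num_gpus is not None:
          args.append(f"--num-gpus={num_gpus}")
      if resources:
          # json.dumps produces the JSON object that --resources expects.
          args.append(f"--resources={json.dumps(resources)}")
      return args


  args = build_ray_start_args(
      8, resources={"Resource1": 4, "Resource2": 16}, head=True
  )
  # shlex.quote protects the JSON braces and quotes from the shell.
  print(" ".join(shlex.quote(arg) for arg in args))

Passing the list directly to ``subprocess.run`` (without ``shell=True``) avoids shell quoting altogether.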
If you started Ray from the command line, connect to the Ray cluster as follows:

.. code-block:: python

  # Connect to Ray. When connecting to an existing cluster, do not specify resources.
  ray.init(address=<address>)

.. _omp-num-thread-note:
.. note::

  Ray sets the environment variable ``OMP_NUM_THREADS=1`` by default. This is done
  to avoid performance degradation with many workers (issue #6998). You can
  override this by explicitly setting ``OMP_NUM_THREADS``. ``OMP_NUM_THREADS`` is commonly
  used in NumPy, PyTorch, and TensorFlow to perform multi-threaded linear algebra.
  In a multi-worker setting, we want one thread per worker instead of many threads
  per worker to avoid contention.

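As a minimal sketch of the override, the snippet below sets ``OMP_NUM_THREADS`` explicitly in the driver's environment. It assumes the variable has to be set before Ray is started and before any numerical library that reads it is imported; the value ``4`` is only an illustration.

.. code-block:: python

  import os

  # Assumption: this runs before ``ray.init()`` and before importing
  # libraries that read OMP_NUM_THREADS when they start up.
  os.environ["OMP_NUM_THREADS"] = "4"  # illustrative value

  print(os.environ["OMP_NUM_THREADS"])
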
Logging and Debugging
---------------------
Each Ray session has a unique name. By default, the name is
``session_{timestamp}_{pid}``. The format of ``timestamp`` is
``%Y-%m-%d_%H-%M-%S_%f`` (see `Python time format <https://strftime.org>`__ for details);
the pid belongs to the startup process (the process calling ``ray.init()`` or
the Ray process executed by a shell in ``ray start``).

For each session, Ray places all its temporary files under the
*session directory*. A *session directory* is a subdirectory of the
*root temporary path* (``/tmp/ray`` by default),
so the default session directory is ``/tmp/ray/{ray_session_name}``.
Because session names begin with the timestamp, you can sort the session
directories by name to find the latest session.

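The name-based lookup described above can be sketched in Python. The helper names ``parse_session_name`` and ``latest_session_dir`` are hypothetical and not part of Ray; the sketch assumes only the ``session_{timestamp}_{pid}`` naming scheme and skips any entry that does not match it.

.. code-block:: python

  import os
  from datetime import datetime

  TIMESTAMP_FORMAT = "%Y-%m-%d_%H-%M-%S_%f"


  def parse_session_name(name):
      """Split ``session_{timestamp}_{pid}`` into (datetime, pid)."""
      prefix, _, rest = name.partition("_")
      timestamp, _, pid = rest.rpartition("_")
      if prefix != "session" or not pid.isdigit():
          raise ValueError(f"not a session name: {name!r}")
      return datetime.strptime(timestamp, TIMESTAMP_FORMAT), int(pid)


  def latest_session_dir(root="/tmp/ray"):
      """Return the newest session directory under ``root``, or None."""
      candidates = []
      for entry in os.listdir(root):
          try:
              started, _pid = parse_session_name(entry)
          except ValueError:
              continue  # not a session directory
          candidates.append((started, entry))
      if not candidates:
          return None
      return os.path.join(root, max(candidates)[1])


  started, pid = parse_session_name("session_2020-08-27_12-00-16_000123_4242")
  print(started.isoformat(), pid)

Comparing parsed timestamps rather than raw strings keeps the lookup correct even if other entries share the ``session_`` prefix.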
Change the *root temporary directory* in one of these ways:

* Pass ``--temp-dir={your temp path}`` to ``ray start``
* Specify ``temp_dir`` when calling ``ray.init()``

You can also use ``default_worker.py --temp-dir={your temp path}`` to
start a new worker with the given *root temporary directory*.

**Layout of logs**:

.. code-block:: text

  /tmp
  └── ray
      └── session_{datetime}_{pid}
          ├── logs  # for logging
          │   ├── log_monitor.err
          │   ├── log_monitor.out
          │   ├── monitor.err
          │   ├── monitor.out
          │   ├── plasma_store.err  # outputs of the plasma store
          │   ├── plasma_store.out
          │   ├── raylet.err  # outputs of the raylet process
          │   ├── raylet.out
          │   ├── redis-shard_0.err   # outputs of redis shards
          │   ├── redis-shard_0.out
          │   ├── redis.err  # redis
          │   ├── redis.out
          │   ├── webui.err  # ipython notebook web ui
          │   ├── webui.out
          │   ├── worker-{worker_id}.err  # redirected output of workers
          │   ├── worker-{worker_id}.out
          │   └── {other workers}
          └── sockets  # for sockets
              ├── plasma_store
              └── raylet  # this could be deleted by Ray's shutdown cleanup.

Ports configurations
--------------------
Ray requires bi-directional communication among the nodes in a cluster. Each node must open specific ports to receive incoming network requests.

All Nodes
~~~~~~~~~
- ``--node-manager-port``: Raylet port for the node manager. Default: Random value.
- ``--object-manager-port``: Raylet port for the object manager. Default: Random value.

The following options specify the range of ports used by worker processes across machines. All ports in the range should be open.

- ``--min-worker-port``: Minimum port number a worker can be bound to. Default: 10000.
- ``--max-worker-port``: Maximum port number a worker can be bound to. Default: 10999.

Head Node
~~~~~~~~~~~
In addition to the ports specified above, the head node needs to open several more ports.

- ``--port``: Port of GCS. Default: 6379.
- ``--dashboard-port``: Port for accessing the dashboard. Default: 8265.
- ``--gcs-server-port``: GCS server port. The GCS server is a stateless service that is in charge of communicating with the GCS. Default: Random value.

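Because several of these ports default to random values, it can help to pin them so that firewall rules can be written in advance. As a minimal sketch, the hypothetical helper ``head_node_ports`` (not part of Ray) returns both the ``ray start`` flags and the set of ports to open on a head node. The values ``8076``, ``8077``, and ``6380`` are arbitrary illustrations.

.. code-block:: python

  def head_node_ports(node_manager_port, object_manager_port, gcs_server_port,
                      port=6379, dashboard_port=8265,
                      min_worker_port=10000, max_worker_port=10999):
      """Hypothetical helper: return (ray start flags, ports to open)."""
      if min_worker_port > max_worker_port:
          raise ValueError("min_worker_port must not exceed max_worker_port")
      flags = [
          "--head",
          f"--port={port}",
          f"--dashboard-port={dashboard_port}",
          f"--gcs-server-port={gcs_server_port}",
          f"--node-manager-port={node_manager_port}",
          f"--object-manager-port={object_manager_port}",
          f"--min-worker-port={min_worker_port}",
          f"--max-worker-port={max_worker_port}",
      ]
      # Every port in the worker range has to be reachable.
      open_ports = set(range(min_worker_port, max_worker_port + 1))
      open_ports.update({port, dashboard_port, gcs_server_port,
                         node_manager_port, object_manager_port})
      return flags, open_ports


  # The three pinned values below are illustrative, not Ray defaults.
  flags, open_ports = head_node_ports(8076, 8077, 6380)
  print("ray start " + " ".join(flags))
  print(len(open_ports), "ports to open")
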
Redis Port Authentication
-------------------------
Ray instances should run on a secure network without public-facing ports.
The most common threat for Ray instances is unauthorized access to Redis,
which can be exploited to gain shell access and run arbitrary code.
The best fix is to run Ray instances on a secure, trusted network.

Running Ray on a secured network is not always feasible.
To prevent exploits via unauthorized Redis access, Ray provides the option to
password-protect Redis ports. While this is not a replacement for running Ray
behind a firewall, this feature is useful for instances exposed to the internet
where configuring a firewall is not possible. Because Redis is
very fast at serving queries, an attacker can try many passwords in a short
time, so the chosen password should be long.

.. note:: The Redis passwords provided below may not contain spaces.

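One way to obtain a long password without spaces is Python's standard ``secrets`` module. This is a minimal sketch and only one option; the variable name ``redis_password`` and the length of 32 random bytes are illustrative choices.

.. code-block:: python

  import secrets

  # token_urlsafe(32) encodes 32 random bytes as URL-safe Base64 text,
  # which uses only letters, digits, "-" and "_" (so it has no spaces).
  redis_password = secrets.token_urlsafe(32)

  print(len(redis_password), "characters generated")
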
Redis authentication is only supported on the raylet code path.

To add authentication via the Python API, start Ray using:

.. code-block:: python

  ray.init(_redis_password="password")

To add authentication via the CLI or to connect to an existing Ray instance with
password-protected Redis ports:

.. code-block:: bash

  ray start [--head] --redis-password="password"

While Redis port authentication may protect against external attackers,
Ray does not encrypt traffic between nodes, so man-in-the-middle attacks are
possible for clusters on untrusted networks.

One of the most common attacks on Redis is the port-scanning attack: an attacker
scans for open ports with unprotected Redis instances and executes arbitrary code.
Ray enables a default password for Redis. Even though this does not prevent
brute-force password cracking, the default password should mitigate most
port-scanning attacks. Furthermore, Redis and other Ray services are bound
to localhost when Ray is started using ``ray.init``.

See the `Redis security documentation <https://redis.io/topics/security>`__
for more information.
.. _`Apache Arrow`: https://arrow.apache.org/