---
layout: post
title: "Ray: 0.4 Release"
excerpt: "This post announces the release of Ray 0.4."
date: 2018-03-27 14:00:00
---

We are pleased to announce the 0.4 release of [Ray][1]. This release introduces improvements to Ray's scheduling, substantial backend improvements, and the start of [Pandas on Ray][2], as well as many improvements to [RLlib][3] and [Tune][4] (you can read more about the improvements in RLlib in [this blog post][5]).

To upgrade to the latest version, run

```
pip install -U ray
```

## Scheduling

This release includes two major changes to the scheduling behavior: spillback scheduling and support for custom resources.

### Spillback Scheduling

Because Ray takes a bottom-up hierarchical approach to scheduling in which scheduling decisions are made by local schedulers on each machine (in order to avoid a centralized scheduling bottleneck), scheduling decisions are often made with a slightly stale view of the system state. As a consequence, it is possible for race conditions to occur and for too many tasks to be assigned to a single machine. For example, consider the case in which there are 100 GPUs scattered around the cluster and workers on the various machines submit a total of exactly 100 GPU tasks. If these tasks take a while to execute, then the desired behavior is for one task to be assigned to each GPU. However, without a single scheduling bottleneck like a centralized scheduler, too many tasks may be assigned to a single machine, resulting in delays.

Spillback scheduling provides a mechanism for correcting bad scheduling decisions. At a high level, if a local scheduler decides that it does not want to execute a task, it can spill the task back to the global scheduler (or in principle to another local scheduler). This mechanism allows us to achieve perfect load balancing along with high task throughput.
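
To make the 100-GPU example concrete, here is a minimal hypothetical sketch of the workload (it assumes a running cluster with 100 GPUs in total; the head node address is a placeholder). Spillback happens transparently inside the schedulers, so no user-side changes are required.

```python
import time

import ray

# Hypothetical: connect to an existing cluster that has 100 GPUs in
# total. The address below is a placeholder.
ray.init(redis_address='<head-node-address>:6379')

@ray.remote(num_gpus=1)
def gpu_task():
    # A task that takes a while to execute, so the desired behavior is
    # one task per GPU rather than many tasks queued on one machine.
    time.sleep(10)
    return ray.get_gpu_ids()

# Submit exactly 100 GPU tasks. If a local scheduler receives more
# tasks than it has free GPUs, it can spill the excess back to the
# global scheduler rather than queueing them locally.
results = ray.get([gpu_task.remote() for _ in range(100)])
```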

### Custom Resources

Remote functions and actors now support scheduling with [arbitrary custom resource requirements][6]. We can specify that a remote function requires 1 CPU, 2 GPUs, and 3 of some custom resource with syntax like the following.

```python
@ray.remote(num_cpus=1, num_gpus=2, resources={'Custom': 3})
def f():
    pass
```
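
Invoking the task is unchanged; the resource request only affects where the task can be scheduled. A brief usage sketch:

```python
# f runs only on a node that currently has 1 CPU, 2 GPUs, and 3 units
# of the "Custom" resource available.
result_id = f.remote()
ray.get(result_id)  # Blocks until the task has been executed.
```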

To tell Ray that a node has 6 of the custom resource, start Ray on that machine with a command like the following.

```
ray start ... --resources='{"Custom": 6}'
```

Custom resources can be used for a variety of different purposes. For example, they can be used to do bookkeeping of a concrete resource like memory, or to indicate that a particular dataset lives on a particular machine, or to give machines certain roles (such as a "parameter server" machine or a "worker" machine).
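
As a hedged sketch of the "roles" use case, suppose one machine is started with a custom `ParameterServer` resource; the resource name and the actor below are hypothetical, for illustration only. An actor that requests that resource can then only be placed on that machine.

```python
import ray

# Hypothetical: connect to a cluster in which one machine was started
# with `ray start ... --resources='{"ParameterServer": 1}'`.
ray.init(redis_address='<head-node-address>:6379')

# Requesting the custom resource pins this actor to that machine.
@ray.remote(resources={'ParameterServer': 1})
class ParameterServer(object):
    def __init__(self):
        self.params = {}

    def set(self, key, value):
        self.params[key] = value

    def get(self, key):
        return self.params.get(key)

ps = ParameterServer.remote()
ray.get(ps.set.remote('weights', [0.0] * 10))
weights = ray.get(ps.get.remote('weights'))
```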

## Libraries

This release also includes the start of [Pandas on Ray][2], a project aimed at speeding up [Pandas][7] DataFrames. It also includes substantial improvements to [RLlib][3], such as high-quality implementations of algorithms like [Ape-X][8], and substantial improvements to [Tune][4], such as implementations of state-of-the-art algorithms like [Population Based Training][9].
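
As a rough sketch of the idea behind Pandas on Ray, only the import changes relative to ordinary Pandas code; note that the `ray.dataframe` module path and the input file are assumptions here, not something this post specifies (see the [Pandas on Ray][2] post for details).

```python
# A minimal sketch: swap the Pandas import for Pandas on Ray, and the
# DataFrame operations are executed in parallel with Ray underneath.
# (The `ray.dataframe` module path is an assumption; 'example.csv' is
# a hypothetical input file.)
import ray.dataframe as pd

df = pd.read_csv('example.csv')
print(df.head())
```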

[1]: https://github.com/ray-project/ray
[2]: https://rise.cs.berkeley.edu/blog/pandas-on-ray/
[3]: http://ray.readthedocs.io/en/latest/rllib.html
[4]: http://ray.readthedocs.io/en/latest/tune.html
[5]: https://rise.cs.berkeley.edu/blog/distributed-policy-optimizers-for-scalable-and-reproducible-deep-rl/
[6]: http://ray.readthedocs.io/en/latest/resources.html
[7]: https://pandas.pydata.org/
[8]: https://arxiv.org/abs/1803.00933
[9]: http://ray.readthedocs.io/en/latest/pbt.html