ray/site/_posts/2018-03-27-ray-0.4-release.markdown
2018-04-02 00:23:56 -07:00

---
layout: post
title: "Ray: 0.4 Release"
excerpt: "This post announces the release of Ray 0.4."
date: 2018-03-27 14:00:00
---
We are pleased to announce the 0.4 release of [Ray][1]. This release introduces
improvements to Ray's scheduling, substantial backend improvements, and the
start of [Pandas on Ray][2], as well as many improvements to [RLlib][3] and
[Tune][4] (you can read more about the improvements in RLlib in [this blog
post][5]).

To upgrade to the latest version, run

```
pip install -U ray
```

## Scheduling

This release includes two major changes to the scheduling behavior: spillback
scheduling and support for custom resources.

### Spillback Scheduling

Because Ray takes a bottom-up hierarchical approach to scheduling in which
scheduling decisions are made by local schedulers on each machine (in order to
avoid a centralized scheduling bottleneck), scheduling decisions are often made
with a slightly stale view of the system state. As a consequence, it is possible
for race conditions to occur and for too many tasks to be assigned to a single
machine. For example, consider the case in which there are 100 GPUs scattered
around the cluster and workers on the various machines submit a total of exactly
100 GPU tasks. If these tasks take a while to execute, then the desired behavior
is for one task to be assigned to each GPU. However, because no centralized
scheduler coordinates these decisions, too many tasks may be assigned to a
single machine, resulting in delays.

Spillback scheduling provides a mechanism for correcting bad scheduling
decisions. At a high level, if a local scheduler decides that it does not want
to execute a task, it can spill the task back to the global scheduler (or in
principle to another local scheduler). This mechanism lets an overloaded
machine hand work off, so that we can balance load across the cluster while
keeping task throughput high.
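
To make the idea concrete, here is a minimal toy simulation of spillback. It is a sketch under stated assumptions, not Ray's actual scheduler: the `LocalScheduler`, `GlobalScheduler`, and `simulate` names are hypothetical, and the global scheduler is assumed to see up-to-date free GPU counts, which a real system would not guarantee.

```python
class LocalScheduler:
    """Toy per-node scheduler (hypothetical, not Ray's implementation)."""

    def __init__(self, node_id, num_gpus):
        self.node_id = node_id
        self.free_gpus = num_gpus
        self.queued = 0  # tasks waiting here because every local GPU is busy

    def submit(self, global_scheduler=None):
        """Place one GPU task and return the id of the node that received it."""
        if self.free_gpus > 0:
            self.free_gpus -= 1  # a GPU is free, so run the task locally
            return self.node_id
        if global_scheduler is not None:
            return global_scheduler.place()  # spill the task back
        self.queued += 1  # no spillback: the task waits behind others
        return self.node_id


class GlobalScheduler:
    """Toy global scheduler that, by assumption, sees current free GPU counts."""

    def __init__(self, nodes):
        self.nodes = nodes

    def place(self):
        # Forward the spilled task to the node with the most free GPUs.
        target = max(self.nodes, key=lambda node: node.free_gpus)
        return target.submit()  # queues there only if the whole cluster is busy


def simulate(num_nodes, gpus_per_node, submissions, spillback):
    """Return the number of waiting tasks per node.

    `submissions` lists, for each task, the node whose local scheduler sees it first.
    """
    nodes = [LocalScheduler(i, gpus_per_node) for i in range(num_nodes)]
    global_scheduler = GlobalScheduler(nodes) if spillback else None
    for node_id in submissions:
        nodes[node_id].submit(global_scheduler)
    return [node.queued for node in nodes]


# 4 nodes with 2 GPUs each and 8 GPU tasks, most of them submitted on node 0.
submissions = [0, 0, 0, 0, 0, 1, 1, 2]
without_spillback = simulate(4, 2, submissions, spillback=False)
with_spillback = simulate(4, 2, submissions, spillback=True)
print(without_spillback)  # [3, 0, 0, 0]
print(with_spillback)     # [0, 0, 0, 0]
```

In this toy setup, three tasks wait on node 0 when local decisions are final, while with spillback every task ends up on a free GPU.
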

### Custom Resources

Remote functions and actors now support scheduling with [arbitrary custom
resource requirements][6]. We can specify that a remote function requires 1 CPU, 2
GPUs, and 3 of some custom resource with syntax like the following.

```python
import ray

# Each invocation of f reserves 1 CPU, 2 GPUs, and 3 units of "Custom".
@ray.remote(num_cpus=1, num_gpus=2, resources={'Custom': 3})
def f():
    pass
```

To tell Ray that a node has 6 units of the custom resource, start Ray on that
machine with a command like the following.

```
ray start ... --resources='{"Custom": 6}'
```

Custom resources can be used for a variety of different purposes. For example,
they can be used to do bookkeeping of a concrete resource like memory, or to
indicate that a particular dataset lives on a particular machine, or to give
machines certain roles (such as a "parameter server" machine or a "worker"
machine).
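
The bookkeeping behind resource requirements can be sketched in a few lines of plain Python. This is only an illustration of the arithmetic, not Ray's implementation: the `fits`, `acquire`, and `release` helpers, the CPU and GPU counts on the node, and the `ParameterServer` role name are all assumptions made for the example.

```python
def fits(available, demand):
    """Return True if every requested quantity is currently available."""
    return all(available.get(name, 0) >= amount for name, amount in demand.items())


def acquire(available, demand):
    """Reserve `demand` in place; return False and change nothing if it does not fit."""
    if not fits(available, demand):
        return False
    for name, amount in demand.items():
        available[name] -= amount
    return True


def release(available, demand):
    """Give back resources that were reserved with `acquire`."""
    for name, amount in demand.items():
        available[name] += amount


# A node with 6 units of "Custom" (the CPU and GPU counts are made up).
node = {"CPU": 4, "GPU": 4, "Custom": 6}
# The requirements of f above: 1 CPU, 2 GPUs, and 3 units of "Custom".
task = {"CPU": 1, "GPU": 2, "Custom": 3}

# Only two copies of the task fit at once, since 2 * 3 = 6 units of "Custom".
started = [acquire(node, task) for _ in range(3)]
print(started)  # [True, True, False]

# Once one task finishes, the third can start.
release(node, task)
print(acquire(node, task))  # True

# Roles work the same way: a machine that does not advertise the
# hypothetical "ParameterServer" resource can never run a task requiring it.
worker_node = {"CPU": 8}
print(fits(worker_node, {"ParameterServer": 1}))  # False
```

A resource name that a node does not list counts as zero, which is what makes role-style resources act as placement constraints.
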

## Libraries

This release also includes the start of [Pandas on Ray][2], which is a project
aimed at speeding up [Pandas][7] DataFrames. It also includes substantial
improvements to [RLlib][3], such as high-quality implementations of algorithms
like [Ape-X][8], and substantial improvements to [Tune][4], such as
implementations of state-of-the-art algorithms like [Population Based
Training][9].

[1]: https://github.com/ray-project/ray
[2]: https://rise.cs.berkeley.edu/blog/pandas-on-ray/
[3]: http://ray.readthedocs.io/en/latest/rllib.html
[4]: http://ray.readthedocs.io/en/latest/tune.html
[5]: https://rise.cs.berkeley.edu/blog/distributed-policy-optimizers-for-scalable-and-reproducible-deep-rl/
[6]: http://ray.readthedocs.io/en/latest/resources.html
[7]: https://pandas.pydata.org/
[8]: https://arxiv.org/abs/1803.00933
[9]: http://ray.readthedocs.io/en/latest/pbt.html