---
layout: post
title: "Ray: 0.2 Release"
excerpt: "This post announces the release of Ray 0.2."
date: 2017-09-30 14:00:00
---
We are pleased to announce the Ray 0.2 release. This release includes the
following:

- substantial [performance improvements to the Plasma object store][1]
- an initial [Jupyter notebook based web UI][2]
- the start of a [scalable reinforcement learning library][3]
- [fault tolerance for actors][4]

## Plasma
Since the last release, the Plasma object store has moved out of the Ray
codebase and is **now being developed as part of [Apache Arrow][5]** (see the
[relevant documentation][10]), so that it can be used as a standalone component
by other projects to leverage high-performance shared memory. In addition, our
Arrow-based serialization libraries have been moved into pyarrow (see the
[relevant documentation][9]).
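
As a quick illustration of this standalone serialization layer, the snippet
below round-trips a Python object through pyarrow. This is a minimal sketch
assuming the `pyarrow.serialize` / `pyarrow.deserialize` helpers described in
the linked pyarrow documentation; it is independent of Ray.

```python
import numpy as np
import pyarrow

# Serialize a Python object (here a dict containing a NumPy array) into
# an Arrow buffer and read it back. These helpers now live in pyarrow
# rather than in the Ray codebase.
data = {"weights": np.ones(5), "step": 3}
buf = pyarrow.serialize(data).to_buffer()
restored = pyarrow.deserialize(buf)
```
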
In 0.2, we've increased the write throughput of the object store to around
15GB/s for large objects (when writing from a single client). Achieving this
performance requires enabling huge pages (to minimize the number of TLB cache
misses). Instructions for doing so are [here][1].
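
Concretely, this amounts to mounting a hugetlbfs filesystem and pointing the
object store at it when initializing Ray. The sketch below is an assumption
based on the linked instructions (the `plasma_directory` and `huge_pages`
arguments may be named differently depending on your Ray version), so treat
[the documentation][1] as authoritative.

```python
import ray

# Assumes a hugetlbfs filesystem has already been mounted at
# /mnt/hugepages as described in the linked instructions. The argument
# names below may vary across Ray versions.
ray.init(plasma_directory="/mnt/hugepages", huge_pages=True)
```
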
The speed at which objects can be written into the object store is a key
performance metric. For example, it is the bottleneck for [A3C][6] and many
other algorithms.
You can benchmark write throughput as follows.

```python
import numpy as np
import ray

ray.init()

x = np.ones(10 ** 9, dtype=np.uint8)

# Measure the time required to write 1GB to the Plasma store.
%time x_id = ray.put(x)
```
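
The `%time` magic only works inside IPython or Jupyter. If you are running a
plain Python script instead, a rough equivalent (a small sketch, not part of
the release itself) is to time `ray.put` directly:

```python
import time

import numpy as np
import ray

ray.init()

x = np.ones(10 ** 9, dtype=np.uint8)

# Time a single 1 GB write into the object store.
start = time.time()
x_id = ray.put(x)
print("Wrote 1 GB in {:.3f} seconds".format(time.time() - start))
```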

## Web UI

We've built an initial Jupyter-notebook-based web UI for understanding and
debugging application performance. See the [instructions for using the UI][2].

The UI includes a task timeline visualization based on Chrome tracing to see
where tasks were scheduled, how long they took, and what the dependencies
between the tasks were. An example visualization is shown below.
<div align="center">
<img src="{{ site.base-url }}/assets/ray_0.2_release/timeline_visualization.png">
</div>
<div><i>A visualization of the task timeline. Boxes indicate tasks and arrows
indicate data dependencies between tasks.</i></div>
<br />

This type of visualization can immediately expose problems with performance,
scheduling, and load balancing.

The above visualization can be generated on a single machine by the following
script.

```python
import ray
import time

ray.init()

@ray.remote
def f(x):
    time.sleep(0.001)
    return 1

@ray.remote
def g(*ys):
    return 1

time.sleep(1)

x = 1
for _ in range(3):
    ys = [f.remote(x) for _ in range(8)]
    x = g.remote(*ys)
```
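
The tasks above are submitted asynchronously, so nothing in the script waits
for them to finish. If you want every task to have completed before you open
the timeline, you can block on the final object ID; this line is an addition
for illustration, not part of the original snippet.

```python
# Block until the last aggregation task, and therefore every task it
# depends on, has finished.
ray.get(x)
```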

## RLlib

We've begun implementing a [scalable reinforcement learning library][3] based on
Ray. So far it includes implementations of the following algorithms.

- Proximal policy optimization (PPO)
- Deep Q-learning (DQN)
- Asynchronous advantage actor critic (A3C)
- Evolution Strategies (ES)

The DQN, A3C, and ES implementations are based on the [OpenAI baselines][7].
Example code for training is available and can be used as follows.

```bash
# On a single machine.
python ray/python/ray/rllib/train.py --alg=PPO \
    --env=CartPole-v0

# On a cluster.
python ray/python/ray/rllib/train.py --alg=PPO \
    --env=CartPole-v0 \
    --redis-address=<head-node-ip>:6379
```
This uses [proximal policy optimization][11] to train a policy to control an
agent in the CartPole environment.
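
To give a sense of why this kind of workload maps naturally onto Ray, here is
a deliberately simplified sketch of parallel rollout collection using plain
Ray tasks. It is not RLlib's implementation; `collect_rollout` and
`update_policy` are hypothetical stubs.

```python
import random

import ray

ray.init()

@ray.remote
def collect_rollout(params):
    # Illustrative stub: a real rollout would step an environment with
    # the current policy. Each call runs as an independent Ray task, so
    # the rollouts for one iteration execute in parallel.
    return {"reward": random.random()}

def update_policy(params, rollouts):
    # Illustrative stub: a real implementation would compute a policy
    # update (e.g. a PPO step) from the collected trajectories.
    return params

params = {"weights": 0.0}
for _ in range(3):
    rollouts = ray.get([collect_rollout.remote(params) for _ in range(8)])
    params = update_policy(params, rollouts)
```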

Running this (on the Humanoid-v1 environment to train a walking humanoid robot)
on AWS with a cluster of fifteen m4.16xlarge instances and one p2.16xlarge
instance, we achieve a reward of over 6000 in around 35 minutes. The rollouts
are parallelized over 512 physical cores and the policy optimization is
parallelized over 6 GPUs. Relevant hyperparameters for this experiment are
[here][8].

This RL library is under development, and we are looking for contributions
including implementations of more algorithms.

## Actor fault tolerance
We've enabled fault tolerance for actors as follows. If a machine fails, the
actors that were running on that machine are recreated on other machines, and
the tasks that previously executed on those actors are replayed to recreate the
state of the actor. We are working on improving the speed of recovery by
enabling actor state to be restored from checkpoints. See [an overview of fault
tolerance in Ray][4].
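
For context, here is a minimal sketch of what this means for a toy actor (the
creation syntax shown may differ slightly across Ray versions). If the machine
running the actor fails, the actor is recreated elsewhere and the method calls
that had already executed are replayed, so its state is rebuilt.

```python
import ray

ray.init()

@ray.remote
class Counter(object):
    def __init__(self):
        self.value = 0

    def increment(self):
        self.value += 1
        return self.value

counter = Counter.remote()

# If the node hosting the actor dies partway through, these calls are
# replayed on the recreated actor to restore its state.
print(ray.get([counter.increment.remote() for _ in range(5)]))  # [1, 2, 3, 4, 5]
```
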
[1]: http://docs.ray.io/en/master/plasma-object-store.html
[2]: http://docs.ray.io/en/master/webui.html
[3]: http://docs.ray.io/en/master/rllib.html
[4]: http://docs.ray.io/en/master/fault-tolerance.html
[5]: https://github.com/apache/arrow
[6]: http://docs.ray.io/en/master/example-a3c.html
[7]: https://github.com/openai/baselines
[8]: https://github.com/ray-project/ray/blob/b020e6bf1fb00d0745371d8674146d4a5b75d9f0/python/ray/rllib/test/tuned_examples.sh#L11
[9]: https://arrow.apache.org/docs/python/ipc.html#arbitrary-object-serialization
[10]: https://arrow.apache.org/docs/python/plasma.html
[11]: https://arxiv.org/abs/1707.06347