mirror of
https://github.com/vale981/ray
synced 2025-03-06 10:31:39 -05:00
147 lines
5.4 KiB
ReStructuredText
147 lines
5.4 KiB
ReStructuredText
Web UI
|
|
======
|
|
|
|
The Ray web UI includes tools for debugging Ray jobs. The following
|
|
image shows an example of using the task timeline for performance debugging:
|
|
|
|
.. image:: timeline.png
|
|
|
|
Dependencies
|
|
------------
|
|
|
|
To use the UI, you will need to install the following.
|
|
|
|
.. code-block:: bash
|
|
|
|
pip install jupyter ipywidgets bokeh
|
|
|
|
If you see an error like
|
|
|
|
.. code-block:: bash
|
|
|
|
Widget Javascript not detected. It may not be installed properly.
|
|
|
|
Then you may need to run the following.
|
|
|
|
.. code-block:: bash
|
|
|
|
jupyter nbextension enable --py --sys-prefix widgetsnbextension
|
|
|
|
**Note:** If you are building Ray from source, then you will also need a
|
|
``python2`` executable.
|
|
|
|
Running the Web UI
|
|
------------------
|
|
|
|
Currently, the web UI is launched automatically when ``ray.init`` is called. The
|
|
command will print a URL of the form:
|
|
|
|
.. code-block:: text
|
|
|
|
=============================================================================
|
|
View the web UI at http://localhost:8889/notebooks/ray_ui92131.ipynb?token=89354a314e5a81bf56e023ad18bda3a3d272ee216f342938
|
|
=============================================================================
|
|
|
|
If you are running Ray on your local machine, then you can head directly to that
|
|
URL with your browser to see the Jupyter notebook. Otherwise, if you are using
|
|
Ray remotely, such as on EC2, you will need to ensure that port is open on that
|
|
machine. Typically, when you ssh into the machine, you can also port forward
|
|
with the ``-L`` option as such:
|
|
|
|
.. code-block:: bash
|
|
|
|
ssh -L <local_port>:localhost:<remote_port> <user>@<ip-address>
|
|
|
|
So for the above URL, you would use the port 8889. The Jupyter notebook attempts
|
|
to run on port 8888, but if that fails it tries successive ports until it finds
|
|
an open port.
|
|
|
|
You can also open the port on the machine as well, which is not recommended for
|
|
security as the machine would be open to the Internet. In this case, you would
|
|
need to replace localhost by the public IP the remote machine is using.
|
|
|
|
Once you have navigated to the URL, start the UI by clicking on the following.
|
|
|
|
.. code-block:: text
|
|
|
|
Kernel -> Restart and Run all
|
|
|
|
Features
|
|
--------
|
|
|
|
The UI supports a search for additional details on Task IDs and Object IDs, a
|
|
task timeline, a distribution of task completion times, and time series for CPU
|
|
utilization and cluster usage.
|
|
|
|
Task and Object IDs
|
|
~~~~~~~~~~~~~~~~~~~
|
|
|
|
These widgets show additional details about an object or task given the ID. If
|
|
you have the object in Python, the ID can be found by simply calling ``.hex`` on
|
|
an Object ID as below:
|
|
|
|
.. code-block:: python
|
|
|
|
# This will return a hex string of the ID.
|
|
objectid = ray.put(1)
|
|
literal_id = objectid.hex()
|
|
|
|
and pasting in the returned string with no quotes. Otherwise, they can be found
|
|
in the task timeline in the output area below the timeline when you select a
|
|
task.
|
|
|
|
For Task IDs, they can be found by searching for an object ID the task created,
|
|
or via the task timeline in the output area.
|
|
|
|
The additional details for tasks here can also be found in the task timeline;
|
|
the search just provides an easier method to find a specific task when you have
|
|
millions.
|
|
|
|
Task Timeline
|
|
~~~~~~~~~~~~~
|
|
|
|
There are three components to this widget: the controls for the widget at the
|
|
top, the timeline itself, and the details area at the bottom. In the controls,
|
|
you first select whether you want to select a subset of tasks via the time they
|
|
were completed or by the number of tasks. You can control the percentages either
|
|
via a double sided slider, or by setting specific values in the text boxes. If
|
|
you choose to select by the number of tasks, then entering a negative number N
|
|
in the text field denotes the last N tasks run, while a positive value N denotes
|
|
the first N tasks run. If there are ten tasks and you enter -1 into the field,
|
|
then the slider will show 90% to 100%, where 1 would show 0% to 10%. Finally,
|
|
you can choose if you want edges for task submission (if a task invokes another
|
|
task) or object dependencies (if the result from a task is passed to another
|
|
task) to be added, and if you want the different phases of a task broken up into
|
|
separate tasks in the timeline.
|
|
|
|
For the timeline, each node has its own dropdown with a timeline, and each row
|
|
in the dropdown is a worker. Moving and zooming are handled by selecting the
|
|
appropiate icons on the floating taskbar. The first is selection, the second
|
|
panning, the third zooming, and the fourth timing. To shown edges, you can
|
|
enable Flow Events in View Options.
|
|
|
|
If you have selection enabled in the floating taskbar and select a task, then
|
|
the details area at the bottom will fill up with information such as task ID,
|
|
function ID, and the duration in seconds of each phase of the task.
|
|
|
|
Time Distributions and Time Series
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
The completion time distribution, CPU utilization, and cluster usage all have
|
|
the same task selection controls as the task timeline.
|
|
|
|
The task completion time distribution tracks the histogram of completion tasks
|
|
for all tasks selected.
|
|
|
|
CPU utilization gives you a count of how many CPU cores are being used at a
|
|
given time. As typically each core has a worker assigned to it, this is
|
|
equivalent to utilization of the workers running in Ray.
|
|
|
|
Cluster Usage gives you a heat-map with time on the x-axis, node IP addresses on
|
|
the y-axis, and coloring based on how many tasks were running on that node at
|
|
that given time.
|
|
|
|
Troubleshooting
|
|
---------------
|
|
|
|
The Ray timeline visualization may not work in Firefox or Safari.
|