ray/python
Alex Wu c2abfdb2f7
[autoscaler][observability] Observability into when/why nodes fail to launch (#27697)
This change adds launch failures to the recent failures section of ray status when a node provider provides structured error information. For node providers which don't provide this optional information, there is no change in behavior.

For reference, when trying to launch a node type with a quota issue, it looks like the following. InsufficientInstanceCapacity is the standard term for this issue.

```
======== Autoscaler status: 2022-08-11 22:22:10.735647 ========
Node status
---------------------------------------------------------------
Healthy:
 1 cpu_4_ondemand
Pending:
 quota, 1 launching
Recent failures:
 quota: InsufficientInstanceCapacity (last_attempt: 22:22:00)

Resources
---------------------------------------------------------------
Usage:
 0.0/4.0 CPU
 0.00/9.079 GiB memory
 0.00/4.539 GiB object_store_memory

Demands:
 (no resource demands)
```

The node types in the status output above correspond to the following cluster config:

```
available_node_types:
    cpu_4_ondemand:
        node_config:
            InstanceType: m4.xlarge
            ImageId: latest_dlami
        resources: {}
        min_workers: 0
        max_workers: 0
    quota:
        node_config:
            InstanceType: p4d.24xlarge
            ImageId: latest_dlami
        resources: {}
        min_workers: 1
        max_workers: 1
```
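
As a rough illustration of the behavior described above, the following sketch shows one way a structured launch failure could be turned into a `Recent failures` line like the one in the status output. The `LaunchFailure` class and `format_recent_failures` function are hypothetical names made up for this example. They are not the autoscaler's actual types or API. The sketch assumes a provider without structured error information reports a category of `None`.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class LaunchFailure:
    """Hypothetical record of one failed launch attempt (not Ray's actual type)."""

    node_type: str
    category: Optional[str]  # None when the provider gives no structured error
    last_attempt: datetime


def format_recent_failures(failures: List[LaunchFailure]) -> List[str]:
    """Render one status line per failure that carries a structured category."""
    lines = []
    for failure in failures:
        if failure.category is None:
            # No structured error information: nothing is reported,
            # so the output is unchanged for such providers.
            continue
        attempt = failure.last_attempt.strftime("%H:%M:%S")
        lines.append(
            f" {failure.node_type}: {failure.category} (last_attempt: {attempt})"
        )
    return lines


failures = [
    LaunchFailure("quota", "InsufficientInstanceCapacity", datetime(2022, 8, 11, 22, 22, 0)),
    LaunchFailure("cpu_4_ondemand", None, datetime(2022, 8, 11, 22, 21, 0)),
]
print("\n".join(format_recent_failures(failures)))
```

Only the `quota` entry is printed here, because the second record has no structured category.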
Co-authored-by: Alex <alex@anyscale.com>
2022-08-15 18:14:29 -07:00
ray [autoscaler][observability] Observability into when/why nodes fail to launch (#27697) 2022-08-15 18:14:29 -07:00
requirements [tune] pin pymoo (#27311) 2022-07-31 01:06:27 -07:00
asv.conf.json [docs] Move all /latest links to /master (#11897) 2020-11-10 10:53:28 -08:00
build-wheel-macos-arm64.sh [python3.10] build python310 wheels (#24829) 2022-05-16 12:36:33 -07:00
build-wheel-macos.sh [python3.10] build python310 wheels (#24829) 2022-05-16 12:36:33 -07:00
build-wheel-manylinux2014.sh [python3.10] build python310 wheels (#24829) 2022-05-16 12:36:33 -07:00
build-wheel-windows.sh [python3.10] build python310 wheels (#24829) 2022-05-16 12:36:33 -07:00
MANIFEST.in [hotfix] Revert "Exclude Bazel build files from Ray wheels (#25679)" (#25950) 2022-06-20 20:59:48 -07:00
README-building-wheels.md [build] Build wheels with manylinux2014 (#11621) 2020-11-03 19:36:32 -08:00
requirements.txt Revert "Revert "[serve] Integrate and Document Bring-Your-Own Gradio Applications"" (#27662) 2022-08-12 15:12:20 -07:00
requirements_linters.txt Add import sorting to format.sh (#25678) 2022-06-13 14:08:51 -07:00
requirements_ml_docker.txt [AIR] Add distributed torch_geometric example (#23580) 2022-04-21 09:48:43 -07:00
setup.py Force grpcio to be >= 1.42.0 for python 3.10 (#27269) 2022-08-08 17:37:18 -07:00