ray/docker
Dmitri Gekhtman b2b442297e
[autoscaler] Fix initialization artifacts (#22570)
This PR fixes initialization artifacts related to the load metrics summary and the autoscaler summary.

Load metrics summaries are defined to be Falsey if the autoscaler has never received a resource message from the GCS.
We skip most autoscaler actions if the load metrics are Falsey, because it doesn't make sense to autoscale without load metrics. This also allows us to execute the TODO here: #22348 (comment) and remove the time.wait().
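
The falsy-until-first-message behavior described above can be sketched as follows. This is a minimal illustration, not the actual Ray code: `LoadMetricsSketch`, `autoscaler_step`, and the shape of the resource message are hypothetical names chosen for the example.

```python
class LoadMetricsSketch:
    """Hypothetical stand-in for the autoscaler's load metrics object."""

    def __init__(self):
        # No resource message has been received from the GCS yet.
        self.last_resource_message = None

    def update(self, resource_message):
        # Record the most recent resource message.
        self.last_resource_message = resource_message

    def __bool__(self):
        # Falsy until the first resource message has arrived.
        return self.last_resource_message is not None


def autoscaler_step(load_metrics):
    """Return which branch one (hypothetical) monitor iteration would take."""
    if not load_metrics:
        # Without load metrics there is nothing to base scaling decisions on.
        return "skipped"
    return "updated"


metrics = LoadMetricsSketch()
first = autoscaler_step(metrics)   # no message yet
metrics.update({"CPU": 4})         # assumed message shape, for illustration only
second = autoscaler_step(metrics)  # message received
```

Because the object itself reports readiness, the caller can poll in its normal loop instead of sleeping for a fixed interval before the first update.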

As for the autoscaler summary, it is possible for autoscaler.summary() to error outside of an autoscaler update in this scenario:
The very first call to NodeProvider.non_terminated_nodes fails, self.non_terminated_nodes remains a None object, and autoscaler.summary() fails trying to get an attribute of this None object.
The result is a confusing error message, as in #22515. This PR fixes that.
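
The failure mode above, and one way to guard against it, can be sketched like this. The class below is a hypothetical simplification, not the real autoscaler: `AutoscalerSketch`, `list_nodes`, and the returned dictionary are assumptions made for the example.

```python
from typing import Callable, Dict, List, Optional


class AutoscalerSketch:
    """Hypothetical simplification of an autoscaler that caches node state."""

    def __init__(self, list_nodes: Callable[[], List[str]]):
        # list_nodes stands in for NodeProvider.non_terminated_nodes.
        self.list_nodes = list_nodes
        # Stays None until the first successful call to the node provider.
        self.non_terminated_nodes: Optional[List[str]] = None

    def update(self) -> None:
        try:
            self.non_terminated_nodes = self.list_nodes()
        except Exception:
            # If the very first call fails, the cached value remains None.
            pass

    def summary(self) -> Optional[Dict[str, int]]:
        # Guard: report "no summary yet" instead of touching attributes of None.
        if self.non_terminated_nodes is None:
            return None
        return {"active_nodes": len(self.non_terminated_nodes)}


def failing_provider() -> List[str]:
    raise RuntimeError("provider unavailable")


broken = AutoscalerSketch(failing_provider)
broken.update()
no_summary = broken.summary()

healthy = AutoscalerSketch(lambda: ["node-1", "node-2"])
healthy.update()
ok_summary = healthy.summary()
```

Without the `is None` check, `summary()` would raise an unrelated-looking `TypeError` or `AttributeError`, which is the kind of confusing message reported in #22515.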

Closes #22515
2022-02-24 20:05:44 -08:00
autoscaler Revert "Revert "[Docker] Support multiple CUDA Versions (#19505)" (#19756)" (#19763) 2021-10-26 17:32:56 -07:00
base-deps Upgrade cython to 0.29.26 for py310 (#21244) 2021-12-26 20:26:08 -08:00
development [docker] Fix docker 'development' build failure (#13289) 2021-02-25 14:57:30 -08:00
examples [RLlib] Upgrade gym version to 0.21 and deprecate pendulum-v0. (#19535) 2021-11-03 16:24:00 +01:00
kuberay-autoscaler [autoscaler] Fix initialization artifacts (#22570) 2022-02-24 20:05:44 -08:00
ray [docs] Fix broken links in documentation and add linkcheck to documentation (#20030) 2021-11-04 13:19:43 -07:00
ray-deps Updates to azure autoscaler for authentication and dependency updates (#19603) 2021-12-16 09:23:32 -08:00
ray-ml [Tune; Testing] Revert to 3.7 (undone by accident by previous PR); + some minor comment cleanups. (#20031) 2021-11-04 10:58:34 +01:00
ray-worker-container Upgrade cython to 0.29.26 for py310 (#21244) 2021-12-26 20:26:08 -08:00
retag-lambda [CI] Format Python code with Black (#21975) 2022-01-29 18:41:57 -08:00
fix-docker-latest.sh [Docker] Update echo in fix-docker-latest.sh (#22123) 2022-02-07 08:50:36 -08:00
README.md [CI] Fix-Up Docker Build (Use Python) (#11139) 2020-10-12 14:22:51 -07:00

Overview of how the ray images are built:

Images without a "-cpu" or "-gpu" tag are built on ubuntu/focal. Each one is an alias for the corresponding -cpu image (e.g. ray:latest is the same as ray:latest-cpu).

ubuntu/focal
└── base-deps:cpu
    └── ray-deps:cpu
        └── ray:cpu
            └── ray-ml:cpu

nvidia/cuda
└── base-deps:gpu
    └── ray-deps:gpu
        └── ray:gpu
            └── ray-ml:gpu
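
The tag aliasing and the two build chains above can be expressed as a small lookup. This is only an illustrative sketch: `BUILD_ORDER`, `ROOT_IMAGE`, `normalize_tag`, and `build_chain` are hypothetical names and are not part of the Ray build scripts.

```python
from typing import List

# Order in which the images are layered, taken from the trees above.
BUILD_ORDER = ["base-deps", "ray-deps", "ray", "ray-ml"]

# Root image for each variant, taken from the trees above.
ROOT_IMAGE = {"cpu": "ubuntu/focal", "gpu": "nvidia/cuda"}


def normalize_tag(tag: str) -> str:
    """Treat a tag without a -cpu or -gpu suffix as an alias for its -cpu tag."""
    if tag.endswith("-cpu") or tag.endswith("-gpu"):
        return tag
    return tag + "-cpu"


def build_chain(image: str, variant: str) -> List[str]:
    """List the root image and every layer up to and including `image`."""
    index = BUILD_ORDER.index(image)
    layers = [f"{name}:{variant}" for name in BUILD_ORDER[: index + 1]]
    return [ROOT_IMAGE[variant]] + layers


alias = normalize_tag("latest")
gpu_chain = build_chain("ray", "gpu")
```

Here `alias` is `"latest-cpu"` and `gpu_chain` is `["nvidia/cuda", "base-deps:gpu", "ray-deps:gpu", "ray:gpu"]`.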