Add comments to clarify purpose of new scheduler queues (#12730)

* update

* clarify

* update
Eric Liang 2020-12-11 11:53:09 -08:00 committed by GitHub
parent 9ded69fdaa
commit 4ad4463be6
2 changed files with 5 additions and 1 deletions

@@ -10,7 +10,7 @@ Basics
The Ray Cluster Launcher will automatically enable a load-based autoscaler. The scheduler looks at the task, actor, and placement group resource demands from the cluster, and tries to add the minimum set of nodes that can fulfill these demands. When nodes are idle for more than a timeout, they will be removed, down to the ``min_workers`` limit. The head node is never removed.
-To avoid launching too many nodes at once, the number of nodes allowed to be pending is limited by the ``upscaling_speed`` setting. By default it is set to ``1.0``, which means the cluster can grow in size by at most ``100%`` at a time (doubling in size each time). This fraction can be set to as high as needed, e.g., ``99999`` to allow the cluster to quickly grow to its max size.
+To avoid launching too many nodes at once, the number of nodes allowed to be pending is limited by the ``upscaling_speed`` setting. By default it is set to ``1.0``, which means the cluster can be growing in size by at most ``100%`` at any time (e.g., if the cluster currently has 20 nodes, at most 20 pending launches are allowed). This fraction can be set to as high as needed, e.g., ``99999`` to allow the cluster to quickly grow to its max size.
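The pending-launch limit described above can be sketched as a small calculation. This is an illustration only, not the autoscaler's actual code: ``MaxPendingLaunches`` and its parameters are hypothetical names, and rounding down and capping by the remaining headroom to a maximum cluster size are assumptions made for this sketch.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical helper, not part of Ray: an upper bound on pending node
// launches implied by the documented rule
//   pending launches <= upscaling_speed * current node count.
// Assumptions of this sketch: the product is rounded down, and the result is
// capped by the headroom left before a maximum cluster size is reached.
int64_t MaxPendingLaunches(int64_t current_nodes, double upscaling_speed,
                           int64_t max_nodes) {
  const double raw =
      std::floor(upscaling_speed * static_cast<double>(current_nodes));
  const int64_t headroom = std::max<int64_t>(0, max_nodes - current_nodes);
  // Clamp while still in floating point so a very large speed (e.g. 99999)
  // cannot overflow the integer conversion.
  const double capped = std::min(raw, static_cast<double>(headroom));
  return static_cast<int64_t>(std::max(capped, 0.0));
}
```

Under these assumptions, a 20-node cluster with ``upscaling_speed`` of ``1.0`` and a 1000-node cap allows 20 pending launches, matching the example in the text, while a speed of ``99999`` with a 100-node cap allows all 80 remaining nodes to be launched at once.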
In more detail, the autoscaler implements the following control loop:

@@ -129,11 +129,15 @@ class ClusterTaskManager {
NodeInfoGetter get_node_info_;
/// Queue of lease requests that are waiting for resources to become available.
/// Tasks move from scheduled -> dispatch | waiting.
std::unordered_map<SchedulingClass, std::deque<Work>> tasks_to_schedule_;
/// Queue of lease requests that should be scheduled onto workers.
/// Tasks move from scheduled | waiting -> dispatch.
std::unordered_map<SchedulingClass, std::deque<Work>> tasks_to_dispatch_;
/// Tasks waiting for arguments to be transferred locally.
/// Tasks move from waiting -> dispatch.
absl::flat_hash_map<TaskID, Work> waiting_tasks_;
/// Determine whether a task should be immediately dispatched,