Add comments to clarify purpose of new scheduler queues (#12730)

* update
* clarify
* update
2025-03-06 10:31:39 -05:00 · 2020-12-11 11:53:09 -08:00
commit 4ad4463be6
parent 9ded69fdaa
2 changed files with 5 additions and 1 deletions
--- a/doc/source/cluster/autoscaling.rst
+++ b/doc/source/cluster/autoscaling.rst
@@ -10,7 +10,7 @@ Basics

 The Ray Cluster Launcher will automatically enable a load-based autoscaler. The scheduler will look at the task, actor, and placement group resource demands from the cluster, and tries to add the minimum set of nodes that can fulfill these demands. When nodes are idle for more than a timeout, they will be removed, down to the ``min_workers`` limit. The head node is never removed.

-To avoid launching too many nodes at once, the number of nodes allowed to be pending is limited by the ``upscaling_speed`` setting. By default it is set to ``1.0``, which means the cluster can grow in size by at most ``100%`` at a time (doubling in size each time). This fraction can be set to as high as needed, e.g., ``99999`` to allow the cluster to quickly grow to its max size.
+To avoid launching too many nodes at once, the number of nodes allowed to be pending is limited by the ``upscaling_speed`` setting. By default it is set to ``1.0``, which means the cluster can be growing in size by at most ``100%`` at any time (e.g., if the cluster currently has 20 nodes, at most 20 pending launches are allowed). This fraction can be set to as high as needed, e.g., ``99999`` to allow the cluster to quickly grow to its max size.

 In more detail, the autoscaler implements the following control loop:

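Note on the ``upscaling_speed`` wording changed in the hunk above: the arithmetic behind the "20 nodes, at most 20 pending launches" example can be sketched as below. This is only an illustration of the documented rule, not the autoscaler's actual code. The function name ``max_pending_launches`` and its parameters are hypothetical, and the sketch ignores any minimum, rounding policy, or ``max_workers`` cap the real implementation may apply.

```python
import math


def max_pending_launches(current_nodes: int, upscaling_speed: float = 1.0) -> int:
    """Illustrative upper bound on nodes allowed to be pending at once.

    Hypothetical helper: models only the rule stated in the docs, i.e. the
    cluster may be growing by at most `upscaling_speed` times its current size.
    """
    if current_nodes < 0 or upscaling_speed < 0:
        raise ValueError("current_nodes and upscaling_speed must be non-negative")
    # Round down so the bound is never exceeded by a fractional node.
    return math.floor(upscaling_speed * current_nodes)


print(max_pending_launches(20))       # 20 (the default of 1.0 allows growth by 100%)
print(max_pending_launches(20, 0.5))  # 10
```

Under this reading, a very large value such as ``99999`` makes the bound so high that it is effectively no limit, which matches the documented advice for growing quickly to the maximum size.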
--- a/src/ray/raylet/scheduling/cluster_task_manager.h
+++ b/src/ray/raylet/scheduling/cluster_task_manager.h
@@ -129,11 +129,15 @@ class ClusterTaskManager {
  NodeInfoGetter get_node_info_;

  /// Queue of lease requests that are waiting for resources to become available.
+  /// Tasks move from scheduled -> dispatch | waiting.
  std::unordered_map<SchedulingClass, std::deque<Work>> tasks_to_schedule_;

  /// Queue of lease requests that should be scheduled onto workers.
+  /// Tasks move from scheduled | waiting -> dispatch.
  std::unordered_map<SchedulingClass, std::deque<Work>> tasks_to_dispatch_;
+
  /// Tasks waiting for arguments to be transferred locally.
+  /// Tasks move from waiting -> dispatch.
  absl::flat_hash_map<TaskID, Work> waiting_tasks_;

  /// Determine whether a task should be immediately dispatched,
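
Note on the queue comments added in the hunk above: the transitions they describe (schedule -> dispatch | waiting, waiting -> dispatch) can be modeled with a small toy. This is a sketch in Python for brevity, not the C++ ``ClusterTaskManager``. The class ``ToyTaskQueues``, the ``Task`` type, and the ``args_local`` flag are all hypothetical. The toy also assumes the only reason a task goes to the waiting state is that its arguments are not yet local, and it omits resource checks and the per-``SchedulingClass`` grouping used by the real queues.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Task:
    task_id: str
    args_local: bool  # hypothetical flag: are the task's arguments already local?


class ToyTaskQueues:
    """Toy model of the three queues named in the header comments."""

    def __init__(self) -> None:
        self.to_schedule = deque()  # stands in for tasks_to_schedule_
        self.to_dispatch = deque()  # stands in for tasks_to_dispatch_
        self.waiting = {}           # stands in for waiting_tasks_, keyed by task id

    def submit(self, task: Task) -> None:
        self.to_schedule.append(task)

    def schedule(self) -> None:
        # schedule -> dispatch | waiting
        while self.to_schedule:
            task = self.to_schedule.popleft()
            if task.args_local:
                self.to_dispatch.append(task)
            else:
                self.waiting[task.task_id] = task

    def args_ready(self, task_id: str) -> None:
        # waiting -> dispatch, once the arguments have been transferred locally
        task = self.waiting.pop(task_id, None)
        if task is not None:
            task.args_local = True
            self.to_dispatch.append(task)


queues = ToyTaskQueues()
queues.submit(Task("a", args_local=True))
queues.submit(Task("b", args_local=False))
queues.schedule()        # "a" is ready to dispatch, "b" waits for its arguments
queues.args_ready("b")   # "b" joins the dispatch queue behind "a"
```

In this model the dispatch queue is fed from two places, which is what the comment on ``tasks_to_dispatch_`` (scheduled | waiting -> dispatch) expresses.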