Commit graph

6 commits

Author SHA1 Message Date
Christian Barra
070e27ea7a Add external module as a node scaler. (#1703)
* WIP: add external module as a node scaler.

* Fix style.

* Add tests, fix style issues.

* Fix typos.

* Fix test error.

* Fix node provider path.

* Add function to spli pkg from class.

* Add doc.

* Correct documentation.

* Debugging....

* Debugging....

* Add __init__.py to tests.

* add more output for debugging

* Add more test, fix error with import.

* Add a small detail to the documentation.

* Update autoscaler.py
2018-03-17 16:59:13 -07:00
Richard Liaw
162d063f0d
[autoscaler/tune] Optional YAML Fields + Fix Pretty Printing for Tune (#1541) 2018-03-04 23:35:58 -08:00
Richard Liaw
1cd2703cac
[autoscaler] Docker Support (#1505) 2018-02-20 00:24:01 -08:00
Philipp Moritz
44792530a9 fix autoscaler test (#1411) 2018-01-10 13:18:34 -08:00
Eric Liang
b6c42f96be
Auto-scale ray clusters based on GCS load metrics (#1348)
This adds (experimental) auto-scaling support for Ray clusters based on GCS load metrics. The auto-scaling algorithm is as follows:

Based on current (instantaneous) load information, we compute the approximate number of "used workers". This is based on the bottleneck resource, e.g. if 8/8 GPUs are used in a 8-node cluster but all the CPUs are idle, the number of used nodes is still counted as 8. This number can also be fractional.
We scale that number by 1 / target_utilization_fraction and round up to determine the target cluster size (subject to the max_workers constraint). The autoscaler control loop takes care of launching new nodes until the target cluster size is met.
When a node is idle for more than idle_timeout_minutes, we remove it from the cluster if that would not drop the cluster size below min_workers.
Note that we'll need to update the wheel in the example yaml file after this PR is merged.
2017-12-31 14:39:57 -08:00
Eric Liang
f5ea44338e EC2 cluster setup scripts and initial version of auto-scaler (#1311) 2017-12-15 23:56:39 -08:00