4 commits

Author SHA1 Message Date
matthewdeng
86718071fe
[tune] Increase volume size for long running pbt failure (#27163) (#27247)
The test currently fails at cluster startup with the following error:

Cluster startup Failed. Error: RuntimeError: botocore.exceptions.ClientError: An error occurred (InvalidBlockDeviceMapping) when calling the RunInstances operation: Volume of size 202GB is smaller than  snapshot 'snap-02c4e6a0ad06cf3d6', expect size >= 400GB

Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>
2022-07-29 01:16:40 -07:00
Kai Fricke
a0bba30153
[tune/release] Make long running distributed PBT cheaper (#24782)
The test uses only 6 of the 8 available GPUs, so one instance can be removed without reducing the GPUs the test needs.

Running 3 instances instead of 4 saves 25% of the instance cost.
2022-05-17 18:23:31 +01:00
Jiajun Yao
cc84f18176
Increase disk for long running distributed tests (#18855)
2021-09-23 17:52:35 +01:00
Kai Fricke
153a8b8fec
[release] convert tune release tests (#15913)
2021-06-01 11:19:15 -07:00