Please review **e2e.py and test_suite belonging to your team**!
This is the first part of https://docs.google.com/document/d/16IrwerYi2oJugnRf5hvzukgpJ6FAVEpB6stH_CiNMjY/edit#
This PR adds a team name to each test suite.
If the name is not specified, it will be reported as unspecified.
If you run a test locally and the new test suite doesn't have a team name specified, it will raise an exception (this way, we avoid missing team names in the future).
Note that we will aggregate all test configs into a single file, nightly_test.yaml.
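The team name rule above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `resolve_team` function, the `"team"` config key, and the `local` flag are hypothetical names chosen for the example.

```python
from typing import Any, Dict

UNSPECIFIED_TEAM = "unspecified"


def resolve_team(suite_name: str, suite_config: Dict[str, Any], local: bool) -> str:
    """Return the team that owns a test suite.

    The "team" key and this function are hypothetical. Only the behavior
    (report "unspecified", raise on local runs) comes from the PR text.
    """
    team = suite_config.get("team")
    if team:
        return str(team)
    if local:
        # Fail fast so a new suite cannot be added without an owner.
        raise ValueError(f"Test suite {suite_name!r} has no team name specified.")
    # Non-local runs still report the suite, just without an owner.
    return UNSPECIFIED_TEAM
```

Raising only on local runs keeps existing nightly reports working while forcing authors of new suites to name an owner before they open a PR.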
This adds memory monitoring to scalability envelope tests so that we can compare the peak memory usage for both nonHA & HA.
NOTE: the current way of adding the memory monitor is not great. We should implement a fixture to support this better, but that work is not in progress yet.
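One possible shape for such a fixture is a context manager that samples memory on a background thread and keeps the peak. This is only a sketch under assumptions: `PeakMonitor` and its `sample` callable are hypothetical and are not part of this PR. The sampler is injected so the same helper could wrap whatever memory reading the nonHA and HA tests use.

```python
import threading
from typing import Callable


class PeakMonitor:
    """Samples a metric on a background thread and records the peak value seen.

    Hypothetical helper: the sampler is any zero-argument callable that
    returns the current memory usage as a number.
    """

    def __init__(self, sample: Callable[[], float], interval_s: float = 0.1):
        self._sample = sample
        self._interval_s = interval_s
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self.peak = float("-inf")

    def _run(self) -> None:
        while not self._stop.is_set():
            self.peak = max(self.peak, self._sample())
            # Event.wait doubles as an interruptible sleep.
            self._stop.wait(self._interval_s)

    def __enter__(self) -> "PeakMonitor":
        self._thread.start()
        return self

    def __exit__(self, *exc) -> None:
        self._stop.set()
        self._thread.join()
        # One final sample so a very short workload is never missed.
        self.peak = max(self.peak, self._sample())
```

A test would wrap its workload in `with PeakMonitor(sampler) as monitor:` and report `monitor.peak` afterwards, which makes the nonHA and HA numbers directly comparable.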
## Why are these changes needed?
In the past, there was a regression where placement group creation got slower over time. I believe the issue is fixed in master, but this PR verifies that it is actually fixed.
This PR adds a long running test for the placement group. The test has 2 purposes:
- Make sure placement group creation / removal doesn't get slower over time. The test measures the P50 creation time over the first 20 iterations, then runs a very large number of iterations. After all iterations, it checks that the P50 creation time is not too slow compared to the initial round.
- Make sure placement group removal / creation works consistently for a long time without issues.
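The slowdown check described above could look roughly like this. It is a sketch, not the test in this PR: the function names are hypothetical, comparing against the P50 of the last 20 iterations is an assumption, and the `max_ratio` threshold of 5.0 is an illustrative value, since the text only says "not too slow".

```python
import statistics
from typing import Sequence


def p50_slowdown(creation_ms: Sequence[float], window: int = 20) -> float:
    """Return the ratio of the final-window P50 to the initial-window P50.

    Assumes creation_ms holds one creation latency per iteration, in order.
    """
    if len(creation_ms) < 2 * window:
        raise ValueError("need at least two full windows of samples")
    initial_p50 = statistics.median(creation_ms[:window])
    final_p50 = statistics.median(creation_ms[-window:])
    return final_p50 / initial_p50


def assert_no_slowdown(
    creation_ms: Sequence[float], window: int = 20, max_ratio: float = 5.0
) -> None:
    """Fail if creation got more than max_ratio times slower (threshold is assumed)."""
    ratio = p50_slowdown(creation_ms, window)
    assert ratio <= max_ratio, f"P50 creation time grew {ratio:.1f}x"
```

Using the median rather than the mean keeps a single slow iteration (for example, one affected by a scheduling hiccup) from failing the run.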
Q: Should we make it a real long running test? (that runs for a day?)
## Why are these changes needed?
We are concerned that gRPC-based broadcasting might have a negative impact on placement group (pg) related workloads. This test ensures those workloads run well before merging.
## Related issue number
#19438
## Why are these changes needed?
There are two issues fixed in this PR:
- Make sure the wait for the session counts alive nodes.
- Upgrade the machine to match what's tested in OSS Ray.
## Related issue number
https://github.com/ray-project/ray/issues/19084
* Revert "[nightly] Deflaky nightly test many_nodes_actor_test (#18582)"
This reverts commit fc6a739e4b.
* move to large test
Co-authored-by: Yi Cheng <chengyidna@gmail.com>
* in progress
* in progress
* almost done
* Lint
* almost done
* All tests are available now
* Make the test a little more stressful
* Modify parameters to make tests a little more stressful