35 commits

Author SHA1 Message Date
Amog Kamsetty
6d776976c1
[Train] Fix multi node horovod bug (#22564)
Closes #20956
2022-03-22 16:22:53 -07:00
Jiajun Yao
bab19e8e68
Add perf metrics for test_many_tasks.py (#23318)
Add perf metrics for test_many_tasks.py
Use the new smoke test structure
2022-03-22 16:16:42 -07:00
Kai Fricke
e48c407b13
[release] long running many drivers: Use SDK file manager (#23379)
This will make the test pass again: https://buildkite.com/ray-project/release-tests-branch/builds/226#_
2022-03-21 09:56:59 -07:00
Stephanie Wang
5ab634f285
[core] Disable threaded_actors_stress_test (#23292)
* disable

* smoke
2022-03-18 15:57:53 -07:00
Eric Liang
015181ab9a
Add random access support for Datasets (experimental feature) (#22749)
This PR adds experimental support for random access to datasets. A Dataset can be random access enabled by calling `ds.to_random_access_dataset(key, num_workers=N)`. This creates a RandomAccessDataset.

RandomAccessDataset partitions the dataset across the cluster by the given sort key, providing efficient random access to records via binary search. A number of worker actors are created, each of which has zero-copy access to the underlying sorted data blocks of the Dataset.

Performance-wise, you can expect each worker to provide ~3000 records / second via `get_async()`, and ~10000 records / second via `multiget()`.

Since Ray actor calls go direct from worker->worker, throughput scales linearly with the number of workers.
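
The lookup idea described above (records partitioned by a sort key, then located by binary search) can be sketched in plain Python. This is only an illustration and not Ray's implementation: `SortedBlockIndex` and its methods are hypothetical names, the blocks live in one process rather than in worker actors, and keys are assumed to be unique.

```python
from bisect import bisect_left


class SortedBlockIndex:
    """Toy stand-in for key-partitioned, sorted blocks (hypothetical, not Ray's API)."""

    def __init__(self, records, key, num_blocks):
        ordered = sorted(records, key=lambda r: r[key])
        size = max(1, -(-len(ordered) // num_blocks))  # ceiling division
        self.blocks = [ordered[i:i + size] for i in range(0, len(ordered), size)]
        # Largest key in each block, used to route a lookup to a single block.
        self.upper_bounds = [block[-1][key] for block in self.blocks]
        # Per-block key lists so the in-block search can also use bisect.
        self.block_keys = [[r[key] for r in block] for block in self.blocks]

    def get(self, value):
        # First binary search: pick the block whose key range can hold `value`.
        b = bisect_left(self.upper_bounds, value)
        if b == len(self.blocks):
            return None
        # Second binary search: locate the record inside that block.
        keys = self.block_keys[b]
        i = bisect_left(keys, value)
        if i < len(keys) and keys[i] == value:
            return self.blocks[b][i]
        return None

    def multiget(self, values):
        # Batched lookup; here it simply loops over get().
        return [self.get(v) for v in values]
```

In the real feature each block range would be served by a worker actor, so the first search corresponds to choosing which worker to call.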
2022-03-17 15:01:12 -07:00
SangBin Cho
b350fe9ee8
[Nightly test] Fix additional k8s issues + add new tests (#23231)
Fix bug from the previous fixes.
Add more tests
Stop using m5.xlarge (not supported now)
There are 2 hard blockers from the infra: (1) large disks are not supported, and (2) m5.xlarge is not supported. Both are considered high priority and should be fixed soon.
2022-03-16 16:37:29 -07:00
Stephanie Wang
ce71c5bbbd
[core][tests] Mark threaded_actors_stress_test as unstable
2022-03-16 15:31:19 -07:00
Kai Fricke
e3987d85c3
[tune] Mark cloud OSS release tests as unstable (#23240)
These tests have been flaky for a while. Until this is addressed, mark them as unstable.
2022-03-16 17:37:58 +00:00
Kai Fricke
830238cce2
[ci/release] Migrate ML user tests (#22953)
Most recent tests:

https://buildkite.com/ray-project/release-tests-branch/builds/156
https://buildkite.com/ray-project/release-tests-branch/builds/158
2022-03-14 11:50:16 +00:00
Kai Fricke
430ea3e636
[ci/release] Migrate golden notebook tests (#22949)
Migrating golden notebook tests to new release test package.
Tests are passing: https://buildkite.com/ray-project/release-tests-branch/builds/155
2022-03-13 21:39:41 +00:00
Kai Fricke
956ad95d67
[ci/release] Fix release test config (#23122)
Currently the test is failing due to an invalid config (merged before validation was properly enforced).
2022-03-13 19:48:34 +00:00
Kai Fricke
76a939c820
[ci/release] Migrate long running (+distributed) tests (#22955)
Migrating to new release package.

https://buildkite.com/ray-project/release-tests-branch/builds/103
Tests pass: https://buildkite.com/ray-project/release-tests-branch/builds/143#_
2022-03-13 18:47:17 +00:00
SangBin Cho
8c1a6f9138
[Nightly Test] Fix a dataset test (#23106)
Fix a broken dataset test (due to incorrect working dir)
2022-03-12 08:16:08 -08:00
SangBin Cho
c0f8de9c3c
[Nightly tests] Run benchmark tests on k8s as well (#23100)
Run benchmark tests on k8s as well.

Note that until k8s cluster stability is confirmed, we will run the same tests twice, once on AWS and once on k8s. Once all benchmark tests look stable, we will start the full migration.
2022-03-11 19:40:37 -08:00
SangBin Cho
97383e4c1b
[Nightly test] Fix a broken nightly test due to the wrong config (#23097)
2022-03-11 16:47:06 -08:00
SangBin Cho
2b38fe89e2
[Nightly tests] Migrate rest of core tests (#23085)
Migrate the rest of the core tests.
2022-03-11 10:41:14 -08:00
Kai Fricke
a8bed94ed6
[ci/release] Always use full cluster address (#23067)
Not using the full cluster address is deprecated and breaks Job usage for uploads/downloads: https://buildkite.com/ray-project/release-tests-branch/builds/135#2a03e47b-6a9a-42ff-9346-905725eb8d09
2022-03-11 16:31:21 +00:00
SangBin Cho
965d609627
[Nightly test] Fix a minor syntax issue for core nightly tests (#23069)
Add frequency to smoke tests
Remove unnecessary alerts
2022-03-11 04:58:40 -08:00
Kai Fricke
5b2d58674b
[ci/release] Migrate horovod tests (#22951)
Migrating horovod tests to new release package.

https://buildkite.com/ray-project/release-tests-branch/builds/125
2022-03-11 09:53:29 +00:00
SangBin Cho
ebac18d163
[Nightly test] Support Job based file manager + runner (#22860)
This PR supports the job-based file manager and runner. It will be the backbone of k8s migration.

The PR handles edge cases that originally existed in the old e2e.py job-based runners.
2022-03-10 15:03:50 -08:00
SangBin Cho
92b50ff5da
Migrate multi nightly tests (#23005)
2022-03-11 01:32:10 +09:00
SangBin Cho
4fa294ca49
[Nightly tests] Stop running broken tests (#22993)
2022-03-10 06:59:51 -08:00
SangBin Cho
e88abe4c8e
[Nightly tests] migrated most of daily tests (#22960)
* migrated most of daily tests

* Addressed code review.
2022-03-10 05:49:16 -08:00
Kai Fricke
007cf03d7a
[ci/release] Migrate RLLib tests (#22967)
Migrate to new release package.

https://buildkite.com/ray-project/release-tests-branch/builds/111
2022-03-10 10:26:03 +00:00
Kai Fricke
fee4065daf
[ci/release] Migrate SGD tests (#22966)
Migrate to new release package.

https://buildkite.com/ray-project/release-tests-branch/builds/110
2022-03-10 10:23:50 +00:00
Kai Fricke
614dc6b511
[ci/release] Migrate Serve tests (#22965)
Migrate to new release package.

https://buildkite.com/ray-project/release-tests-branch/builds/109
2022-03-10 10:23:25 +00:00
Kai Fricke
ccda1555cc
[ci/release] Migrate Runtime Env tests (#22963)
Migrating to new release test package.

https://buildkite.com/ray-project/release-tests-branch/builds/108
2022-03-10 10:22:57 +00:00
Kai Fricke
18d535f290
[ci/release] Migrate LightGBM tests (#22952)
Note that LightGBM release tests were previously not enabled.
https://buildkite.com/ray-project/release-tests-branch/builds/113
https://buildkite.com/ray-project/release-tests-branch/builds/114

Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
2022-03-10 08:14:31 +00:00
SangBin Cho
549527687f
Migrate scalability tests (#22901)
This PR migrates scalability tests to the new infra.

I had to copy the benchmarks folder to the release folder to make it work. I will remove some unnecessary files (e.g., benchmark.yaml or the wait_for_cluster file). Alternatively, we can support a path other than /release in the tool, but I think this way is cleaner. I am open to suggestions, though. cc @krfricke
2022-03-08 17:22:41 -08:00
Kai Fricke
c57abb693b
[ci/release] Add frequency to core nightly test (#22905)
Breaks the scheduled build: https://buildkite.com/ray-project/release-tests-branch/builds/82#3994f5e1-6da3-4c70-8c30-bdcfb1fec851

We should enforce schema validation soon.
2022-03-08 17:44:20 +00:00
SangBin Cho
0137fc8e23
[Tests] Add microbenchmark to the new infra test (#22861)
Verified it works. It also addresses the frequency comments from the previous PR
2022-03-08 05:58:49 -08:00
SangBin Cho
9d0148dbbe
[Test] Migrate the first test to the new infra (#22770)
This migrates the simplest nightly test to the new infra. I will also explore k8s migration with this test.
2022-03-06 18:24:54 -08:00
Kai Fricke
d06c3ffd6f
[release] Migrate Tune + XGBoost tests to new infrastructure (#22705)
Migrate XGBoost and Tune tests to new release testing infrastructure.

https://buildkite.com/ray-project/release-tests-branch/builds/50
2022-03-01 08:10:06 +01:00
Kai Fricke
3695408a85
[release] Fix special cases in release test package (e.g. smoke test) (#22442)
Fixing special cases (e.g. smoke tests, long running tests) in the release test package infrastructure. Prepare migration of Tune and XGBoost tests.
2022-02-28 21:05:01 +01:00
Kai Fricke
331b71ea8d
[ci/release] Refactor release test e2e into package (#22351)
Adds a unit-tested and restructured ray_release package for running release tests.

Relevant changes in behavior:

By default, Buildkite will wait for the wheels of the current commit to be available. Alternatively, users can specify a) a different commit hash, b) a wheels URL (which we will also wait for to be available), or c) a branch (or user/branch combination), in which case the latest available wheels will be used (e.g. if master is passed, behavior matches the old default behavior).
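
The wheel selection rules above can be read as a small decision function. The sketch below is hypothetical and is not code from the ray_release package: `WheelSource`, `resolve_wheel_source`, and their parameters are made-up names. It also assumes that an explicitly given commit hash is waited for in the same way as the default commit, which the message does not state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WheelSource:
    kind: str   # "commit", "url", or "branch"
    value: str
    wait: bool  # whether to wait until the wheels are available


def resolve_wheel_source(
    current_commit: str,
    commit: Optional[str] = None,
    wheels_url: Optional[str] = None,
    branch: Optional[str] = None,
) -> WheelSource:
    """Pick where wheels come from (hypothetical helper, not the real API)."""
    given = [opt for opt in (commit, wheels_url, branch) if opt is not None]
    if len(given) > 1:
        raise ValueError("specify at most one of commit, wheels_url, branch")
    if wheels_url is not None:
        # b) an explicit wheels URL, which is also waited for
        return WheelSource("url", wheels_url, wait=True)
    if branch is not None:
        # c) a branch: use the latest wheels that already exist
        return WheelSource("branch", branch, wait=False)
    # a) an explicit commit, or the default: wheels of the current commit.
    # Assumption: an explicit commit is waited for like the default one.
    return WheelSource("commit", commit or current_commit, wait=True)
```

For example, `resolve_wheel_source("abc123", branch="master")` selects the branch source without waiting, which corresponds to the old default behavior mentioned above.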

The main subpackages are:

    Cluster manager: Creates cluster envs/computes, starts cluster, terminates cluster
    Command runner: Runs commands, e.g. as client command or sdk command
    File manager: Uploads/downloads files to/from session
    Reporter: Reports results (e.g. to database)

Much of the code base is unit tested, but there are probably some pieces missing.

Example build (waited for wheels to be built): https://buildkite.com/ray-project/kf-dev/builds/51#_
Wheel build: https://buildkite.com/ray-project/ray-builders-branch/builds/6023
2022-02-16 17:35:02 +00:00