mwtian
48599aef9e
Roll forward to run train_small in client mode. ( #16610 )
2021-06-23 08:52:08 +01:00
mwtian
f5f23448fc
Support downloading and testing wheels for Python 3.9. ( #16586 )
2021-06-21 12:02:22 -07:00
Chen Shen
853caea146
[tests] migrate test-many-tasks/test-dead-actors to nightly tests ( #16469 )
* init commit
* Update release/nightly_tests/nightly_tests.yaml
Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>
* Update release/nightly_tests/nightly_tests.yaml
Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>
Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>
2021-06-18 18:43:25 -07:00
Kai Fricke
aecc4c8d28
[release] fix sgd base image, microbenchmark timeout, revert xgboost train_small to not use connect ( #16532 )
2021-06-18 11:40:04 +01:00
SangBin Cho
6dc4032d19
Set the 500GB block device for a single node test ( #16493 )
2021-06-16 22:37:30 -07:00
Kai Fricke
9352cb781c
[release tests] Fix microbenchmark base image, network overhead cluster wait time, add long running tests ( #16355 )
2021-06-16 21:37:17 +01:00
mwtian
2f7d535253
[Test] Use Ray client in XGBoost train_small release test ( #16319 )
2021-06-16 14:39:32 +01:00
Antoni Baum
2fb10e6730
[SGD] Add support for native Torch AMP in SGD ( #16382 )
* SGD native AMP initial commit
* SGD native amp second pass
* Update docs
* Update TorchTrainer doc
* Temp fix release test
* Update release/sgd_tests/sgd_gpu/sgd_gpu_app_config.yaml
Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>
2021-06-15 17:48:21 -07:00
Amog Kamsetty
f3ad50fe6a
[SGD] Rename release tests ( #16410 )
Test failures unrelated
2021-06-15 17:16:40 +01:00
SangBin Cho
f3ab162c5e
Fix nightly release test issues. ( #16419 )
2021-06-15 00:43:08 -07:00
Eric Liang
f93ca2b673
Make it much simpler to turn on event stats ( #16401 )
2021-06-14 09:51:24 -07:00
SangBin Cho
eb7344069b
[Test] Improving tests ( #16368 )
* Improve testing
* Fix tests.
2021-06-11 18:29:22 -07:00
matthewdeng
9c36ff81fa
[release] add golden notebook tests for dask/xgboost and modin/xgboost ( #16231 )
Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>
2021-06-11 10:03:04 +01:00
Eric Liang
ae0e38b86d
Remove legacy feature flags / features ( #16349 )
2021-06-10 09:31:38 -07:00
SangBin Cho
c8a5d7ba85
[TEST] Additional data processing nightly test ( #16078 )
* in progress
* in progress
* almost done
* Lint
* almost done
* All tests are available now
* Make the test a little more stressful
* Modify parameter to make tests a little more stressful
2021-06-09 22:38:53 -07:00
Clark Zinzow
ca68bf1e93
[Release] Update release test configs for 1.4 release. ( #16292 )
* Updated scalability envelope tests for 1.4.
* Update data processing release test for 1.4.
2021-06-08 00:15:25 -07:00
mwtian
c2a2a6f7c3
Make it easier to run asan and wheel release tests ( #16242 )
2021-06-07 22:54:22 -07:00
SangBin Cho
3572d0837e
[Test] Dask on ray sort nightly ( #16213 )
* Make dask on ray sort work
* lint
* revert unrelated change
2021-06-06 15:58:48 -07:00
SangBin Cho
03c33cf443
add a streaming shuffle test ( #16258 )
2021-06-06 15:58:14 -07:00
Clark Zinzow
227f252c39
[Release] Release 1.4.0 stress tests, scalability envelope, and microbenchmark release logs ( #16228 )
2021-06-04 16:36:41 -07:00
Kai Fricke
153a8b8fec
[release] convert tune release tests ( #15913 )
2021-06-01 11:19:15 -07:00
Sven Mika
c9d220bcda
[RLlib] Upgrade RLlib regression test scripts to new testing tool - RLlib release logs for 1.4. ( #16080 )
2021-06-01 17:39:18 +02:00
Amog Kamsetty
da6f28d777
[Release] Add multi-node, multi-GPU SGD release test ( #16046 )
2021-05-31 16:23:04 -07:00
SangBin Cho
9fa3b9f6f3
[Nightly test] Test non streaming shuffle ( #16150 )
2021-05-31 15:28:02 -07:00
SangBin Cho
94dc06d852
[Nightly test] improve error detection ( #16102 )
* improve error detection
* improve gitignore
* fix
2021-05-27 00:33:21 -07:00
SangBin Cho
ee1ccb569d
[Test] Nightly shuffle test ( #15998 )
* shuffle daily test update.
* lint
* Improve testing.
* Download the real nightly.
* Addressed code review.
* fix typo
* fix issue
* fix the broken release test
* Updated the test.
2021-05-24 15:33:31 -07:00
mwtian
5462c6e7de
Fix link to release checklist from release process doc. ( #15793 )
2021-05-13 13:34:54 -07:00
SangBin Cho
259fcbd5bd
[Pubsub] Generalize the pubsub interface and adapt it for ref counting protocol ( #15446 )
* Add mock code first
* In the initial progress.
* Fix the number error
* In progress.
* in more progress.
* in progress.
* lint.
* Prototype done.
* Fix compilation bug.
* Now it is working with reference counting.
* Remove template.
* lint.
* Fixed issues.
* Fix reference count test.
* Reference count test passes now.
* Fixed the test array problem
* Addressed code review.
* lint.
* Addressed half of code review.
* Fix tests.
* Addressed the most critical issue.
* Make subscriber thread-safe.
* Revert "Make subscriber thread-safe."
This reverts commit 9a6a52197cfa8463ab60dfaae9530ad3c0ed8790.
* Fixed test failures. The only failure now is the asan failure.
* Reset test suites and see if it fixes the issue.
* Fix a flaky test
* Addressed code review.
2021-05-13 09:29:02 -07:00
Eric Liang
0dfd43c61b
Add nightly release test directory and add shuffle release test ( #15671 )
* update
* update
* update
* update
* update
* Adjust script/release test json
* remove
* update
* lint
Co-authored-by: Kai Fricke <kai@anyscale.com>
2021-05-08 14:21:55 -07:00
Kai Fricke
8db2e5c23a
[release] Move xgboost tune small + microbenchmark release test to new release automation ( #15619 )
2021-05-08 20:38:39 +01:00
Kai Fricke
1d52ab819f
[release] release 1.3.0 results and test updates ( #15366 )
Convert a number of release tests and add logs for release 1.3.0
2021-05-04 22:10:04 +01:00
Jenna Kwon
15da948214
Support object spilling mode and data load failure mode in dask_on_ra… ( #15601 )
* Support object spilling mode and data load failure mode in dask_on_ray_large_scale_test.py
* Remove freq and time decimation
Co-authored-by: Jenna Kwon <jkkwon@amazon.com>
2021-05-04 10:57:49 -07:00
Amog Kamsetty
ebc44c3d76
[CI] Upgrade flake8 to 3.9.1 ( #15527 )
* formatting
* format util
* format release
* format rllib/agents
* format rllib/env
* format rllib/execution
* format rllib/evaluation
* format rllib/examples
* format rllib/policy
* format rllib utils and tests
* format streaming
* more formatting
* update requirements files
* fix rllib type checking
* updates
* update
* fix circular import
* Update python/ray/tests/test_runtime_env.py
* noqa
2021-05-03 14:23:28 -07:00
SangBin Cho
df9329160e
[Tests] Dask on ray release test ( #15256 )
* done.
* Linting.
* Update readme
* Update.
* Fix issues.
2021-04-15 10:30:17 -07:00
SangBin Cho
d0e83c43ca
[Release Test] Modify parameter to reduce stress ( #15048 )
* Fix.
* Fix.
2021-04-14 18:27:20 -07:00
Richard Liaw
59bf3a7b22
ray[cluster] -> ray[default] ( #15251 )
2021-04-14 09:37:04 -07:00
Edward Oakes
0f9d1bb223
Serve failure release test fix ( #15276 )
This test is currently not tested in CI
2021-04-13 17:49:29 +01:00
Edward Oakes
e4ca337e16
[serve] Change remaining tests to use deployment API ( #15167 )
2021-04-08 08:15:38 -05:00
Richard Liaw
e72f6b0377
Fix ray[full] -> ray[cluster] #15112
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
2021-04-05 09:55:00 -07:00
Kai Fricke
b366500938
[tune] fix long running release test WIP ( #14866 )
- Use placement groups
- Introduce time between checks for failure testing
- Use gloo instead of nccl
2021-03-25 11:03:22 +01:00
Amog Kamsetty
233f174984
Update release instructions ( #14882 )
2021-03-24 12:41:50 -07:00
SangBin Cho
5f7ce293fe
[Test] Large scale dask on ray test ( #14340 )
* Add a test.
* Add a test.
* d
* Modify the release doc.
* Addressed code review.
2021-03-23 11:00:35 -07:00
Kai Fricke
7364a7a327
[tune] Move Optuna to ask(fixed_distributions) interface ( #14731 )
Adjusting to changes in Optuna 2.6.0. Old interface was marked as deprecated.
2021-03-22 12:25:37 +01:00
Ian Rodney
eb12033612
[Code Cleanup] Switch to use ray.util.get_node_ip_address() ( #14741 )
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-03-18 13:10:57 -07:00
Kai Fricke
4014168928
[tune] Introduce `durable()` wrapper to convert trainables into durable trainables ( #14306 )
* [tune] Introduce `durable()` wrapper to convert trainables into durable trainables
* Fix wrong check
* Improve docs, add FAQ for tackling overhead
* Fix bugs in `tune.with_parameters`
* Update doc/source/tune/api_docs/trainable.rst
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
* Update doc/source/tune/_tutorials/_faq.rst
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-02-26 13:59:28 +01:00
SangBin Cho
5740b2391e
Add multi node data processing cluster.yaml ( #14198 )
2021-02-19 16:16:55 -08:00
Kai Fricke
a0f73cf3f7
[xgboost] Update XGBoost release test configs ( #13941 )
* Update XGBoost release test configs
* Use GPU containers
* Fix elastic check
* Use spot instances for GPU
* Add debugging output
* Fix success check, failure checking, outputs, sync behavior
* Update release checklist, rename mounts
2021-02-17 23:00:49 +01:00
Alex Wu
4846a6c2d0
Release process update ( #13798 )
2021-02-15 11:40:49 -08:00
Kai Fricke
1ef2a6790c
[tune] add scalability release tests ( #13986 )
* Add scalability tests
* Network overhead cluster
* Update xgboost tests
* Document release tests
* Don't raise on failed trial
* Update to multi node yamls
* Update yamls
* Revert xgboost test changes
* Fix import
* Update release/tune_tests/scalability_tests/workloads/test_bookkeeping_overhead.py
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
* Pass aws credentials (WIP)
* Update durable trainable example
* Update xgboost sweep
* Change xgboost scope, fix durable trainable stop condition
* Fix max depth to limit total test length
* Add cluster information to test descriptions. Update release checklist/process docs
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-02-10 17:16:31 +01:00
Kai Fricke
1e113d2e6e
[tune/xgboost] Update release test docs ( #13880 )
* Update release test docs
* Update
2021-02-04 13:10:56 +01:00