* Create a core set of algorithms tests to run nightly.
* Run release tests under tf, tf2, and torch frameworks.
* Fix
* Add eager_tracing option for tf2 framework.
* make sure core tests can run in parallel.
* cql
* Report progress while running nightly/weekly tests.
* Include SAC in nightly lineup.
* Revert changes to learning_tests
* rebrand to performance test.
* update build_pipeline.py with new performance_tests name.
* Record stats.
* bug fix, need to populate experiments dict.
* Alphabetize yaml files.
* Allow specifying frameworks, and do not run tf2 by default.
* remove some debugging code.
* fix
* Undo testing changes.
* Do not run CQL regression for now.
* LINT.
Co-authored-by: sven1977 <svenmika1977@gmail.com>
* Add an RLlib Tune experiment to UserTest suite.
* Add ray.init()
* Move example script to example/tune/, so it can be imported as module.
* add __init__.py so our new module will get included in python wheel.
* Add block device to RLlib test instances.
* Reduce disk size a little bit.
* Add metrics reporting
* Allow max of 5 workers to accommodate all the worker tasks.
* revert disk size change.
* Minor updates
* Trigger build
* set max num workers
* Add a compute cfg for autoscaled cpu and gpu nodes.
* use 1gpu instance.
* install tblib for debugging worker crashes.
* Manually upgrade to pytorch 1.9.0
* -y
* torch=1.9.0
* install torch on driver
* bump timeout
* Write a more informational result dict.
* Revert changes to compute config files that are not used.
* add smoke test
* update
* reduce timeout
* Reduce the # of env per worker to 1.
* Small fix for getting trial_states
* Trigger build
* simplify result dict
* lint
* more lint
* fix smoke test
Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com>
* [xgboost/release] Add GPU connect user test
* Use scaling cluster
* typo
* Increase xgboost placement group timeout
* Much higher timeout
* Move os environment timeout
* Move os environ
* [dev] install xgboost-ray from master
* GPU xgboost master
* Remove master install after new xgboost release
* Install latest
* Add master test
## Why are these changes needed?
We are concerned that gRPC-based broadcasting might negatively impact placement group (pg) related workloads. This test verifies that those workloads still run well before the change is merged.
## Related issue number
#19438