ray/release/ml_user_tests/tune_rllib/run_connect_tests.py

"""Connect tests for Tune & RLlib.

Runs a couple of hard learning tests using Anyscale connect.
"""

import json
import os
import time

import ray
from ray.rllib.examples.tune.framework import run

if __name__ == "__main__":
    addr = os.environ.get("RAY_ADDRESS")
    job_name = os.environ.get("RAY_JOB_NAME", "rllib_connect_tests")
    if addr is not None and addr.startswith("anyscale://"):
        ray.init(address=addr, job_name=job_name)
    else:
        ray.init(address="auto")

    start_time = time.time()
    exp_analysis = run()
    end_time = time.time()

    result = {
        "time_taken": end_time - start_time,
        "trial_states": {t.config["framework"]: t.status for t in exp_analysis.trials},
    }

    test_output_json = os.environ.get("TEST_OUTPUT_JSON", "/tmp/release_test_out.json")
    with open(test_output_json, "wt") as f:
        json.dump(result, f)

    print("Ok.")
[RLlib] Add an RLlib Tune experiment to UserTest suite. (#19807) * Add an RLlib Tune experiment to UserTest suite. * Add ray.init() * Move example script to example/tune/, so it can be imported as module. * add __init__.py so our new module will get included in python wheel. * Add block device to RLlib test instances. * Reduce disk size a little bit. * Add metrics reporting * Allow max of 5 workers to accomodate all the worker tasks. * revert disk size change. * Minor updates * Trigger build * set max num workers * Add a compute cfg for autoscaled cpu and gpu nodes. * use 1gpu instance. * install tblib for debugging worker crashes. * Manually upgrade to pytorch 1.9.0 * -y * torch=1.9.0 * install torch on driver * Add an RLlib Tune experiment to UserTest suite. * Add ray.init() * Move example script to example/tune/, so it can be imported as module. * add __init__.py so our new module will get included in python wheel. * Add block device to RLlib test instances. * Reduce disk size a little bit. * Add metrics reporting * Allow max of 5 workers to accomodate all the worker tasks. * revert disk size change. * Minor updates * Trigger build * set max num workers * Add a compute cfg for autoscaled cpu and gpu nodes. * use 1gpu instance. * install tblib for debugging worker crashes. * Manually upgrade to pytorch 1.9.0 * -y * torch=1.9.0 * install torch on driver * bump timeout * Write a more informational result dict. * Revert changes to compute config files that are not used. * add smoke test * update * reduce timeout * Reduce the # of env per worker to 1. * Small fix for getting trial_states * Trigger build * simply result dict * lint * more lint * fix smoke test Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com> 2021-11-03 17:04:27 -07:00			`"""Connect tests for Tune & RLlib.`

			`Runs a couple of hard learning tests using Anyscale connect.`
			`"""`

			`import json`
			`import os`
			`import time`

			`import ray`
			`from ray.rllib.examples.tune.framework import run`

			`if __name__ == "__main__":`
			`addr = os.environ.get("RAY_ADDRESS")`
			`job_name = os.environ.get("RAY_JOB_NAME", "rllib_connect_tests")`
			`if addr is not None and addr.startswith("anyscale://"):`
			`ray.init(address=addr, job_name=job_name)`
			`else:`
			`ray.init(address="auto")`

			`start_time = time.time()`
			`exp_analysis = run()`
			`end_time = time.time()`

			`result = {`
			`"time_taken": end_time - start_time,`
[CI] Format Python code with Black (#21975) See #21316 and #21311 for the motivation behind these changes. 2022-01-29 18:41:57 -08:00			`"trial_states": {t.config["framework"]: t.status for t in exp_analysis.trials},`
[RLlib] Add an RLlib Tune experiment to UserTest suite. (#19807) * Add an RLlib Tune experiment to UserTest suite. * Add ray.init() * Move example script to example/tune/, so it can be imported as module. * add __init__.py so our new module will get included in python wheel. * Add block device to RLlib test instances. * Reduce disk size a little bit. * Add metrics reporting * Allow max of 5 workers to accomodate all the worker tasks. * revert disk size change. * Minor updates * Trigger build * set max num workers * Add a compute cfg for autoscaled cpu and gpu nodes. * use 1gpu instance. * install tblib for debugging worker crashes. * Manually upgrade to pytorch 1.9.0 * -y * torch=1.9.0 * install torch on driver * Add an RLlib Tune experiment to UserTest suite. * Add ray.init() * Move example script to example/tune/, so it can be imported as module. * add __init__.py so our new module will get included in python wheel. * Add block device to RLlib test instances. * Reduce disk size a little bit. * Add metrics reporting * Allow max of 5 workers to accomodate all the worker tasks. * revert disk size change. * Minor updates * Trigger build * set max num workers * Add a compute cfg for autoscaled cpu and gpu nodes. * use 1gpu instance. * install tblib for debugging worker crashes. * Manually upgrade to pytorch 1.9.0 * -y * torch=1.9.0 * install torch on driver * bump timeout * Write a more informational result dict. * Revert changes to compute config files that are not used. * add smoke test * update * reduce timeout * Reduce the # of env per worker to 1. * Small fix for getting trial_states * Trigger build * simply result dict * lint * more lint * fix smoke test Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com> 2021-11-03 17:04:27 -07:00			`}`

[CI] Format Python code with Black (#21975) See #21316 and #21311 for the motivation behind these changes. 2022-01-29 18:41:57 -08:00			`test_output_json = os.environ.get("TEST_OUTPUT_JSON", "/tmp/release_test_out.json")`
[RLlib] Add an RLlib Tune experiment to UserTest suite. (#19807) * Add an RLlib Tune experiment to UserTest suite. * Add ray.init() * Move example script to example/tune/, so it can be imported as module. * add __init__.py so our new module will get included in python wheel. * Add block device to RLlib test instances. * Reduce disk size a little bit. * Add metrics reporting * Allow max of 5 workers to accomodate all the worker tasks. * revert disk size change. * Minor updates * Trigger build * set max num workers * Add a compute cfg for autoscaled cpu and gpu nodes. * use 1gpu instance. * install tblib for debugging worker crashes. * Manually upgrade to pytorch 1.9.0 * -y * torch=1.9.0 * install torch on driver * Add an RLlib Tune experiment to UserTest suite. * Add ray.init() * Move example script to example/tune/, so it can be imported as module. * add __init__.py so our new module will get included in python wheel. * Add block device to RLlib test instances. * Reduce disk size a little bit. * Add metrics reporting * Allow max of 5 workers to accomodate all the worker tasks. * revert disk size change. * Minor updates * Trigger build * set max num workers * Add a compute cfg for autoscaled cpu and gpu nodes. * use 1gpu instance. * install tblib for debugging worker crashes. * Manually upgrade to pytorch 1.9.0 * -y * torch=1.9.0 * install torch on driver * bump timeout * Write a more informational result dict. * Revert changes to compute config files that are not used. * add smoke test * update * reduce timeout * Reduce the # of env per worker to 1. * Small fix for getting trial_states * Trigger build * simply result dict * lint * more lint * fix smoke test Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com> 2021-11-03 17:04:27 -07:00			`with open(test_output_json, "wt") as f:`
			`json.dump(result, f)`

			`print("Ok.")`