## Why are these changes needed?
This change adds Python publisher and subscriber in `gcs_utils.py`, and GRPC handler on GCS for publishing iva GCS. Error info is migrated to use the GCS-based pubsub, if feature flag `RAY_gcs_grpc_based_pubsub=true`.
Also, add a `--gcs-address` flag to some Python processes. It is not set anywhere yet, but will be set aftering Redis-less bootstrapping work.
Unit tests are added for the Python publisher and subscriber. Migrated error info publishers and subscribers are tested with existing unit tests, e.g. tests calling `ray._private.test_utils.get_error_message()` to ensure error info is published.
GCS based pubsub has gaps in handling deadline, cancelled requests and GCS restarts. So 3 more unit tests are disabled in the `HA GCS` mode. They will be addressed in a separate change.
## Related issue number
* Fix QMix, SAC, and MADDPA too.
* Unpin gym and deprecate pendulum v0
Many tests in rllib depended on pendulum v0,
however in gym 0.21, pendulum v0 was deprecated
in favor of pendulum v1. This may change reward
thresholds, so will have to potentially rerun
all of the pendulum v1 benchmarks, or use another
environment in favor. The same applies to frozen
lake v0 and frozen lake v1
Lastly, all of the RLlib tests and have
been moved to python 3.7
* Add gym installation based on python version.
Pin python<= 3.6 to gym 0.19 due to install
issues with atari roms in gym 0.20
* Reformatting
* Fixing tests
* Move atari-py install conditional to req.txt
* migrate to new ale install method
* Fix QMix, SAC, and MADDPA too.
* Unpin gym and deprecate pendulum v0
Many tests in rllib depended on pendulum v0,
however in gym 0.21, pendulum v0 was deprecated
in favor of pendulum v1. This may change reward
thresholds, so will have to potentially rerun
all of the pendulum v1 benchmarks, or use another
environment in favor. The same applies to frozen
lake v0 and frozen lake v1
Lastly, all of the RLlib tests and have
been moved to python 3.7
* Add gym installation based on python version.
Pin python<= 3.6 to gym 0.19 due to install
issues with atari roms in gym 0.20
Move atari-py install conditional to req.txt
migrate to new ale install method
Make parametric_actions_cartpole return float32 actions/obs
Adding type conversions if obs/actions don't match space
Add utils to make elements match gym space dtypes
Co-authored-by: Jun Gong <jungong@anyscale.com>
Co-authored-by: sven1977 <svenmika1977@gmail.com>
* Unpin gym and deprecate pendulum v0
Many tests in rllib depended on pendulum v0,
however in gym 0.21, pendulum v0 was deprecated
in favor of pendulum v1. This may change reward
thresholds, so will have to potentially rerun
all of the pendulum v1 benchmarks, or use another
environment in favor. The same applies to frozen
lake v0 and frozen lake v1
Lastly, all of the RLlib tests and Tune tests have
been moved to python 3.7
* fix tune test_sampler::testSampleBoundsAx
* fix re-install ray for py3.7 tests
Co-authored-by: avnishn <avnishn@uw.edu>
* Revert "[CI] Remove config that disables Bazel test result cache (#18701)"
This reverts commit 098ff36faa.
* Remove all RLlib tests from BUILD that currently fail.
Co-authored-by: sven1977 <svenmika1977@gmail.com>
* Convert worker pool to queue
* Start up to backlog size more workers
* fixes
* Prestart workers according to num available CPUs
* lint
* x
* Update src/ray/raylet/worker_pool.h
Co-authored-by: Eric Liang <ekhliang@gmail.com>
* Update src/ray/raylet/worker_pool.h
Co-authored-by: Eric Liang <ekhliang@gmail.com>
* dedicated workers
* Fix tests
* x
* fix
* asan
* asan
* Workers can only exec tasks with same job ID
* size_t for runtime env hash, fix unit tests
* include job ID in runtime env hash, remove from worker registration msg
* x
* conflict
* debug
* Schedule and dispatch periodically, skip if no new tasks
* Update src/ray/common/task/task_spec.h
Co-authored-by: Eric Liang <ekhliang@gmail.com>
* Update src/ray/raylet/scheduling/cluster_task_manager.h
Co-authored-by: Eric Liang <ekhliang@gmail.com>
* Update src/ray/raylet/worker_pool.h
Co-authored-by: Eric Liang <ekhliang@gmail.com>
Co-authored-by: Eric Liang <ekhliang@gmail.com>
* [ci/tune] Add Tune GPU pipeline step to CI
* cont.
* add sgd gpu tests
* format yaml, fix imports
* install horovod; fix line wrapping
* set GPU per worker to 0.5
* fix import
* move test to 4gpu machine
* fix lint
* lint
* set visible devices
* pull in tf gpu fix
* Fix Tune GPU pipeline step
* nit
* Disable GPU tests until we have some
* Re-add empty rllib tests
Co-authored-by: Matthew Deng <matthew.j.deng@gmail.com>
* Revert "Revert "[Build] include minimal debug info in C++ build; upgrade clang-format to 12 (#18840)" (#18886)"
This reverts commit f851a072f3.
* use gcc 8
* Add Bazel config for building with llvm. Upgrade C++ std to 17.
* Fix redis. Try fixing asan and tsan
* Fix asan and format
* Update comments.
Co-authored-by: Chen Shen <scv119@gmail.com>
* cleanup tests and errors for clients
* Fix lock and async get
* rerun
* Avoid running callback under lock. Make lock non-reentrant
* Add all necessary apis
* Removed unused APIs