* Reply to the owner only after the actor has been successfully created.
* Reply immediately if the actor has already been created.
* fix comment
* add test_actor_creation_task provided by @Stephanie Wang
Co-authored-by: senlin.zsl <senlin.zsl@antfin.com>
SAC (both the torch and tf versions) was crashing due to numeric instabilities in the SquashedGaussian distribution (sampling and logp computation after extreme NN outputs).
This PR fixes these instabilities. Stable MuJoCo learning (HalfCheetah) has been confirmed for both the tf and torch versions. A distribution stability test (using extreme NN outputs) has been added for SquashedGaussian; it can be reused for any other distribution type as well.
* Added a small section on installing Ray with Anaconda. Also fixed an obsolete link to Anaconda.
* Delete more temporary directories when running "make clean" for the docs.
* Fine-tuning the core Ray API documentation
* Fix doc lines that were too long
Co-authored-by: Dean Wampler <dean@concurrentthought.com>
* Checkpoint the image-models example
* Update cluster definition
* Fix copyright info
* Use original args
* Checkpoint fixes
* Add README
* Add some missing features
* Format
* Get rid of the unused Namespace class
* Address comments
* Link the imagenet example in docs
* Cleanup
* Fix lint
* Policy-classes cleanup and torch/tf unification.
- Make Policy abstract.
- Pass `action_dist` to `extra_action_out_fn` (needed for PPO torch).
- Move some methods and vars from TFPolicy to the base Policy: num_state_tensors, ACTION_PROB, ACTION_LOGP and a few more.
* Fix `clip_action` import from Policy (should probably be moved into utils altogether).
* - Move `is_recurrent()` and `num_state_tensors()` into TFPolicy (from DynamicTFPolicy).
- Add `config` to all Policy c'tor calls (as the 3rd arg, after the obs and action spaces).
* Add `config` to c'tor call to TFPolicy.
* Add missing `config` to the c'tor call to TFPolicy in marwil_policy.py.
* Fix test_rollout_worker.py::MockPolicy and BadPolicy classes (Policy base class is now abstract).
* Fix LINT errors in Policy classes.
* Implement StatefulPolicy abstract methods in test cases: test_multi_agent_env.py.
* Fix LINT errors in policy.py.
* Create a simple TestPolicy to sub-class from when testing Policies (reduces code in some test cases).
* policy.py
- Remove abstractmethod from `apply_gradients` and `compute_gradients` (these are not required if `learn_on_batch` is implemented).
- Fix docstring of `num_state_tensors`.
* Make QMIX torch Policy a child of TorchPolicy (instead of Policy).
* QMixPolicy: add empty implementations of the abstract Policy methods.
* Store Policy's config in self.config in base Policy c'tor.
* - Make only `compute_actions` an abstractmethod in the base Policy and provide a `pass`
implementation for all other methods not defined by a subclass.
- Fix state_batches=None (most Policies don't have internal states).
* Cartpole tf learning.
* Cartpole tf AND torch learning (in ~ same ts).
* Cartpole tf AND torch learning (in ~ same ts). 2
* Cartpole tf (torch syntax-broken) learning (in ~ same ts). 3
* Cartpole tf AND torch learning (in ~ same ts). 4
* Cartpole tf AND torch learning (in ~ same ts). 5
* Cartpole tf AND torch learning (in ~ same ts). 6
* Cartpole tf AND torch learning (in ~ same ts). Pendulum tf learning.
* WIP.
* WIP.
* SAC torch learning Pendulum.
* WIP.
* SAC torch and tf learning Pendulum and Cartpole after cleanup.
* WIP.
* LINT.
* LINT.
* SAC: Move policy.target_model to policy.device as well.
* Fixes and cleanup.
* Fix data-format of tf keras Conv2d layers (broken for some tf-versions which have data_format="channels_first" as default).
* Fixes and LINT.
* Fixes and LINT.
* Fix and LINT.
* WIP.
* Test fixes and LINT.
* Fixes and LINT.
Co-authored-by: Sven Mika <sven@Svens-MacBook-Pro.local>
* docs
* Apply suggestions from code review
Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com>
* ok
Co-authored-by: Kristian Hartikainen <kristian.hartikainen@gmail.com>
* Add a lineage_ref_count to References
* Refactor TaskManager to store TaskEntry as a struct
* Refactor to fix deadlock between TaskManager and ReferenceCounter
Add references to task specs
* Pin TaskEntries and References in the lineage of any ObjectIDs in scope
* Fix deadlock, convert num_plasma_returns to a set of object IDs
* fix unit tests
* Feature flag
* Do not release lineage for objects that were promoted to plasma
* fix build
* fix build
* Remove num executions
* Remove num executions
* Add pinned locations to ReferenceCounter, empty handler for node death
* Fix num returns for actor tasks, fix Put return value
* Add regression test
* Clear pinned locations and callbacks on node removal
* Clear pinned locations and callbacks on node removal
* Simplify num return values
* Remove unused
* doc
* tmp
* Set num returns
* Move lineage pinning flag to ReferenceCounter
* comments
* Recover from plasma failures by pinning a new copy
* Basic object reconstruction, no concurrent reqs yet
* reconstruction test suite and a few fixes:
- fix for disabling lineage
- fix for updating submitted task refs
* Handle concurrent attempts to recover the same object
* Fix deadlock in DrainAndShutdown
* Revert "[core] Revert lineage pinning (#7499) (#7692)"
This reverts commit ba86a02b37.
* debug rllib
* debug rllib
* turn on all rllib tests again
* debug rllib
* Fix drain bug, check number of pending tasks
* revert rllib debug
* remove todo
* Trigger rllib tests
* revert rllib debug commit
* Split out logic into ObjectRecoveryManager
* Fix python tests
* Refactor to remove dependency on gcs client
* Unit tests
* Move pinned at node ID to direct memory store
* Unit test fixes and lint
* simplify and more tests
* Add ResubmitTask test for TaskManager
* Doc
* fix build
* comments
* Fix
* debug
* Update
* fix
* Fix
* Fix bad status handling, unit test
* Fix build
* regression test
* Cancel lease requests
* unit tests
* update
* fix build
* Move unit test
* Set success
* Ref to shared_ptr
* debug
* Revert "debug"
This reverts commit 6b2c25805a8223b41ffcc2d88d903e16ea415089.
* Bad move
* Fix bad status handling
* Subset and Errors
* fixup! Subset and Errors
* fixup! Subset and Errors
* fixup! Subset and Errors
* fixup! Subset and Errors
* fixup! Subset and Errors
* fixup! Subset and Errors
* fixup! Subset and Errors