* Rollback.
* Fix import tree error by adding meaningful error and replacing by tf.nest wherever possible.
* LINT.
* Fix.
* Fix log-likelihood test case failing on travis.
* Revert "fix (#7681)"
This reverts commit 6a12a31b2e.
* Revert "[core] Pin lineage of plasma objects that are still in scope (#7499)"
This reverts commit 014929e658.
* Add a lineage_ref_count to References
* Refactor TaskManager to store TaskEntry as a struct
* Refactor to fix deadlock between TaskManager and ReferenceCounter
Add references to task specs
* Pin TaskEntries and References in the lineage of any ObjectIDs in scope
* Fix deadlock, convert num_plasma_returns to a set of object IDs
* fix unit tests
* Feature flag
* Do not release lineage for objects that were promoted to plasma
* fix build
* Remove num executions
* Simplify num return values
* Remove unused
* doc
* Set num returns
* Move lineage pinning flag to ReferenceCounter
* comments
* Fixes
* Remove irrelevant test (replaced by ref counting tests)
* Windows compatibility bug fixes
* Use WSASend/WSARecv as WSASendMsg/WSARecvMsg do not work with TCP sockets
* Clean up some TODOs
* Fix duplicate compilations
* RedisAsioClient boost::asio::error::connection_reset
Co-authored-by: Mehrdad <noreply@github.com>
* enable
* Turn on eager eviction
* Shorten tests and drain ReferenceCounter
* Don't force kill actor handles that have gone out of scope, lint
* Fix locks
* Cleanup Plasma Async Callback (#7452)
* [rllib][tune] fix some nans (#7611)
* Change /tmp to platform-specific temporary directory (#7529)
* [Serve] UI Improvements (#7569)
* Fix bug in test_dynres.py (#7615)
Co-authored-by: senlin.zsl <senlin.zsl@antfin.com>
* [Java] Call Python actor methods using actor.call (#7614)
* Fix usage of absl::flat_hash_map::erase and absl::flat_hash_set::erase (#7633)
Co-authored-by: senlin.zsl <senlin.zsl@antfin.com>
* [Java] Make both `RayActor` and `RayPyActor` inherit from `BaseActor` (#7462)
* [Java] Fix the issue that the cached value in `RayObject` is serialized (#7613)
* Add failure tests to test_reference_counting (#7400)
* Fix typo in asyncio documentation (#7602)
* Fix segfault
* debug
* Force kill actor
* Fix test
* adding directory and node_provider entry for azure autoscaler
* adding initial cut at azure autoscaler functionality, needs testing and node_provider methods need updating
* adding todos and switching to auth file for service principal authentication
* adding role / scope to service principal
* resolving issues with app credentials
* adding retry for setting service principal role
* typo and adding retry to nic creation
* adding nsg to config, moving nic/public ip to node provider, cleanup node_provider, leaving in NodeProvider stub for testing
* linting
* updating cleanup and fixing bugs
* minor fixes
* first working version :)
* added tag support
* added msi identity intermediate
* enable MSI through user managed identity
* updated schema
* extend yaml schema
remove service principal code
add re-use of managed user identity
* fix rg_id
* fix logging
* replace manual cluster yaml validation with json schema
- improved error message
- support for intellisense in VSCode (or other IDEs)
* run linting
* updating yaml configs and formatting
* typo in example config
* pulling default config from example-full
* resetting min, init worker prop
* adding docs for azure autoscaler and fixing status
* add azure to docs, fix config for spot instances, update azure provider to avoid caching issues during deployment
* fix for default subscription in azure node provider
* vm dev image build
* minor change
* keeping example-full.yaml in autoscaler/azure, updating azure example config
* linting azure config
* extending retries on azure config
* lint
* support for internal ips, fix to azure docs, and new azure gpu example config
* linting
* Update python/ray/autoscaler/azure/node_provider.py
Co-Authored-By: Richard Liaw <rliaw@berkeley.edu>
* revert_this
* remove_schema
* updating configs and removing ssh keygen, tweak azure node provider terminate
* minor tweaks
Co-authored-by: Markus Cozowicz <marcozo@microsoft.com>
Co-authored-by: Ubuntu <marcozo@mc-ray-jumpbox.chcbtljllnieveqhw3e4c1ducc.xx.internal.cloudapp.net>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>