* Revert "Revert "Unhandled exception handler based on local ref counting (#14049)" (#14099)"
This reverts commit b45ae76765.
* remove test
* fix
* fix
* Add support for object spilling in the ownership-based object directory.
* Move owner address hashmap into pinned_objects_ and objects_pending_spill_.
* Update local object manager tests.
* Feedback and misc. fixes.
* Move spilled unpin callback lambda to a private method bound with std::bind.
* Skip test_delete_objects_multi_node test on macOS for now.
* prepare for head node
* move command runner interface outside _private
* remove space
* Eric
* flake
* min_workers in multi node type
* fixing edge cases
* eric not idle
* fix target_workers to consider min_workers of node types
* idle timeout
* minor
* minor fix
* test
* lint
* eric v2
* eric 3
* min_workers constraint before bin packing
* Update resource_demand_scheduler.py
* Revert "Update resource_demand_scheduler.py"
This reverts commit 818a63a2c86d8437b3ef21c5035d701c1d1127b5.
* reducing diff
* make get_nodes_to_launch return a dict
* merge
* weird merge fix
* auto fill instance types for AWS
* Alex/Eric
* Update doc/source/cluster/autoscaling.rst
* merge autofill and input from user
* logger.exception
* make the yaml use the default autofill
* docs Eric
* remove test_autoscaler_yaml from windows tests
* let's try changing the test a bit
* return test
* let's see
* edward
* Limit max launch concurrency
* commenting frac TODO
* move to resource demand scheduler
* use STATUS UP TO DATE
* Eric
* log gc freed refs at debug level instead of info
* add cluster name to docker mount prefix directory
* grrR
* fix tests
* moving docker directory to sdk
* move the import to prevent circular dependency
* small fix
* ian
* fix max launch concurrency bug: treat failing nodes as pending and count only load_metric's connected nodes as running
* small fix
* Revert "Revert "[Autoscaler] Monitor refactor for backward compatability. (#13970)" (#14046)"
This reverts commit 6f9d39fb3e.
* fake news
Co-authored-by: Ameer Haj Ali <ameerhajali@ameers-mbp.lan>
Co-authored-by: Alex Wu <alex@anyscale.io>
Co-authored-by: Alex Wu <itswu.alex@gmail.com>
Co-authored-by: Eric Liang <ekhliang@gmail.com>
Co-authored-by: Ameer Haj Ali <ameerhajali@Ameers-MacBook-Pro.local>
* wrote code to enable cancellation of queued non-actor tasks
* minor changes
* bug fixes
* added comments
* rev1
* linting
* making ActorSchedulingQueue::CancelTaskIfFound raise a fatal error
* bug fix
* added two unit tests
* linting
* iterating through pending_normal_tasks starting from end
* fixup! iterating through pending_normal_tasks starting from end
* fixup! fixup! iterating through pending_normal_tasks starting from end
* post merge fixes
* added debugging instructions, pulled Accept() out of guarded loop
* removed debugging instructions, linting
* first commit
* lint
* lint
* added hack to avoid race condition in test stress
* moved hack
* fix test cancel
* removed hack (hopefully no longer needed)
* Revert "removed hack (hopefully no longer needed)"
This reverts commit 99d0e7c91539f290700f50aaaed805dcde04a5ee.
* added sleep in mock_worker.cc
* sleep function fixup to work on windows
* sleep in test_fast both for force=true and force=false
* linting
Co-authored-by: Ian <ian.rodney@gmail.com>
* Admission control, TODO: tests, object size
* Unit tests for admission control and some bug fixes
* Add object size to object table, only activate pull if object size is known
* Some fixes, reset timer on eviction
* doc
* update
* Trigger OOM from the pull manager
* don't spam
* doc
* Update src/ray/object_manager/pull_manager.cc
Co-authored-by: Eric Liang <ekhliang@gmail.com>
* Remove useless tests
* Fix test
* osx build
* Skip broken test
* tests
* Skip failing tests
Co-authored-by: Eric Liang <ekhliang@gmail.com>