* Add placement group scheduler and some resource scheduler APIs.
Merge fix cv hang in multithread variables race (#8984).
* change the bundle id and delete unit count in bundle
change vector<bundle_spec> to vector<shared_ptr<bundle_spec>>
remove CheckIfSchedulable()
add comments and fix the bug in resource
* fix placement group schedule
* add placement group scheduler and change some api in resource scheduler
* fix per review comments
* fix conflict
* fix lint
* fix lint
* fix bug in merge
* fix lint
Co-authored-by: Lingxuan Zuo <skyzlxuan@gmail.com>
* Get rid of system() calls
* Work around '/usr/share/mini' showing up on GitHub Actions (probably due to psutil truncation)
https://github.com/ray-project/ray/runs/722480047?check_suite_focus=true
* Don't check for socket max path length on Windows
* Don't check for socket existence on Windows
* Fix race condition in Windows fate-sharing
* Work around missing .exe extension for Redis tests
* Add more tests to GitHub Actions
Co-authored-by: Mehrdad <noreply@github.com>
* integrate plasma store as a thread (C++)
* integrate plasma store as a thread (Python)
* fix config issues
* remove plasma component fail tests
* don't forcefully kill the plasma store thread
* Move some Java tests into ci.sh
* Move C++ worker tests into ci.sh
* Define run()
* Prepare to move Python tests into ci.sh
* Fix issues in install-dependencies.sh
* Reload environment for GitHub Actions
* Move wheels to ci.sh and fix related issues
* Don't bypass failures in install-ray.sh anymore
* Make CI a little quieter
* Move linting into ci.sh
* Add vitals test right after build
* Fix os.uname() unavailability on Windows
Co-authored-by: Mehrdad <noreply@github.com>
* Command-line parsing functions
* Work around bug in MSVCRT for passing command-lines to programs
* Polishing
* Fix std::regex_replace() overload compatibility issue with GCC 4.8.x
* Try to work around linker error
* Implement ScanToken()
* Parse command-lines via ScanToken
* Merge src/ray/util.cc and src/ray/url.cc
Co-authored-by: Mehrdad <noreply@github.com>
* Add a lineage_ref_count to References
* Refactor TaskManager to store TaskEntry as a struct
* Refactor to fix deadlock between TaskManager and ReferenceCounter
Add references to task specs
* Pin TaskEntries and References in the lineage of any ObjectIDs in scope
* Fix deadlock, convert num_plasma_returns to a set of object IDs
* fix unit tests
* Feature flag
* Do not release lineage for objects that were promoted to plasma
* fix build
* fix build
* Remove num executions
* Remove num executions
* Add pinned locations to ReferenceCounter, empty handler for node death
* Fix num returns for actor tasks, fix Put return value
* Add regression test
* Clear pinned locations and callbacks on node removal
* Clear pinned locations and callbacks on node removal
* Simplify num return values
* Remove unused
* doc
* tmp
* Set num returns
* Move lineage pinning flag to ReferenceCounter
* comments
* Recover from plasma failures by pinning a new copy
* Basic object reconstruction, no concurrent requests yet
* reconstruction test suite and a few fixes:
- fix for disabling lineage
- fix for updating submitted task refs
* Handle concurrent attempts to recover the same object
* Fix deadlock in DrainAndShutdown
* Revert "[core] Revert lineage pinning (#7499) (#7692)"
This reverts commit ba86a02b37.
* debug rllib
* debug rllib
* turn on all rllib tests again
* debug rllib
* Fix drain bug, check number of pending tasks
* revert rllib debug
* remove todo
* Trigger rllib tests
* revert rllib debug commit
* Split out logic into ObjectRecoveryManager
* Fix python tests
* Refactor to remove dependency on gcs client
* Unit tests
* Move pinned at node ID to direct memory store
* Unit test fixes and lint
* simplify and more tests
* Add ResubmitTask test for TaskManager
* Doc
* fix build
* comments
* Fix
* debug
* Update
* fix
* Fix
* Fix bad status handling, unit test
* Fix build
* Windows compatibility bug fixes
* Use WSASend/WSARecv as WSASendMsg/WSARecvMsg do not work with TCP sockets
* Clean up some TODOs
* Fix duplicate compilations
* RedisAsioClient boost::asio::error::connection_reset
Co-authored-by: Mehrdad <noreply@github.com>