* Command-line parsing functions
* Work around bug in MSVCRT for passing command-lines to programs
* Polishing
* Fix std::regex_replace() overload compatibility issue with GCC 4.8.x
* Try to work around linker error
* Implement ScanToken()
* Parse command-lines via ScanToken
* Merge src/ray/util.cc and src/ray/url.cc
Co-authored-by: Mehrdad <noreply@github.com>
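Illustration for the ScanToken-based command-line parsing above: a minimal sketch of scanning one token at a time and building an argument list from it. The names `scan_token` and `parse_command_line` and the simplified POSIX-like quoting rules are assumptions made for this example; they are not taken from the Ray sources, and the actual ScanToken() may treat Windows quoting differently.

```python
def scan_token(s, pos=0):
    """Scan one whitespace-delimited token starting at pos.

    Returns (token, next_pos), or (None, len(s)) if only whitespace remains.
    Hypothetical helper: supports single quotes, double quotes, and
    backslash escapes outside single quotes (simplified POSIX-like rules).
    """
    n = len(s)
    # Skip leading whitespace.
    while pos < n and s[pos].isspace():
        pos += 1
    if pos >= n:
        return None, n
    out = []
    quote = None  # the currently open quote character, if any
    while pos < n:
        ch = s[pos]
        if quote is None and ch.isspace():
            break  # unquoted whitespace ends the token
        if quote is None and ch in "'\"":
            quote = ch  # open a quoted run
        elif quote is not None and ch == quote:
            quote = None  # close the quoted run
        elif ch == "\\" and quote != "'" and pos + 1 < n:
            pos += 1
            out.append(s[pos])  # take the escaped character literally
        else:
            out.append(ch)
        pos += 1
    if quote is not None:
        raise ValueError("unterminated quote")
    return "".join(out), pos


def parse_command_line(s):
    """Split a command line into arguments by calling scan_token repeatedly."""
    args = []
    pos = 0
    while True:
        token, pos = scan_token(s, pos)
        if token is None:
            return args
        args.append(token)
```

Keeping the scanner separate from the loop lets callers pull a single token (for example, the program name) without parsing the rest of the line.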
* Add a lineage_ref_count to References
* Refactor TaskManager to store TaskEntry as a struct
* Refactor to fix deadlock between TaskManager and ReferenceCounter
Add references to task specs
* Pin TaskEntries and References in the lineage of any ObjectIDs in scope
* Fix deadlock, convert num_plasma_returns to a set of object IDs
* fix unit tests
* Feature flag
* Do not release lineage for objects that were promoted to plasma
* fix build
* Remove num executions
* Add pinned locations to ReferenceCounter, empty handler for node death
* Fix num returns for actor tasks, fix Put return value
* Add regression test
* Clear pinned locations and callbacks on node removal
* Simplify num return values
* Remove unused
* doc
* tmp
* Set num returns
* Move lineage pinning flag to ReferenceCounter
* comments
* Recover from plasma failures by pinning a new copy
* Basic object reconstruction, no concurrent reqs yet
* reconstruction test suite and a few fixes:
- fix for disabling lineage
- fix for updating submitted task refs
* Handle concurrent attempts to recover the same object
* Fix deadlock in DrainAndShutdown
* Revert "[core] Revert lineage pinning (#7499) (#7692)"
This reverts commit ba86a02b37.
* debug rllib
* debug rllib
* turn on all rllib tests again
* debug rllib
* Fix drain bug, check number of pending tasks
* revert rllib debug
* remove todo
* Trigger rllib tests
* revert rllib debug commit
* Split out logic into ObjectRecoveryManager
* Fix python tests
* Refactor to remove dependency on gcs client
* Unit tests
* Move pinned at node ID to direct memory store
* Unit test fixes and lint
* simplify and more tests
* Add ResubmitTask test for TaskManager
* Doc
* fix build
* comments
* Fix
* debug
* Update
* fix
* Fix
* Fix bad status handling, unit test
* Fix build
* Windows compatibility bug fixes
* Use WSASend/WSARecv as WSASendMsg/WSARecvMsg do not work with TCP sockets
* Clean up some TODOs
* Fix duplicate compilations
* RedisAsioClient boost::asio::error::connection_reset
Co-authored-by: Mehrdad <noreply@github.com>
* adding directory and node_provider entry for azure autoscaler
* adding initial cut at azure autoscaler functionality; needs testing, and node_provider methods need updating
* adding todos and switching to auth file for service principal authentication
* adding role / scope to service principal
* resolving issues with app credentials
* adding retry for setting service principal role
* typo and adding retry to nic creation
* adding nsg to config, moving nic/public ip to node provider, cleanup node_provider, leaving in NodeProvider stub for testing
* linting
* updating cleanup and fixing bugs
* minor fixes
* first working version :)
* added tag support
* added msi identity intermediate
* enable MSI through user managed identity
* updated schema
* extend yaml schema
remove service principal code
add re-use of managed user identity
* fix rg_id
* fix logging
* replace manual cluster yaml validation with json schema
- improved error message
- support for intellisense in VSCode (or other IDEs)
* run linting
* updating yaml configs and formatting
* typo in example config
* pulling default config from example-full
* resetting min, init worker prop
* adding docs for azure autoscaler and fixing status
* add azure to docs, fix config for spot instances, update azure provider to avoid caching issues during deployment
* fix for default subscription in azure node provider
* vm dev image build
* minor change
* keeping example-full.yaml in autoscaler/azure, updating azure example config
* linting azure config
* extending retries on azure config
* lint
* support for internal ips, fix to azure docs, and new azure gpu example config
* linting
* Update python/ray/autoscaler/azure/node_provider.py
Co-Authored-By: Richard Liaw <rliaw@berkeley.edu>
* revert_this
* remove_schema
* updating configs and removing ssh keygen, tweak azure node provider terminate
* minor tweaks
Co-authored-by: Markus Cozowicz <marcozo@microsoft.com>
Co-authored-by: Ubuntu <marcozo@mc-ray-jumpbox.chcbtljllnieveqhw3e4c1ducc.xx.internal.cloudapp.net>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
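Illustration for the retry commits above (service principal role, NIC creation): a minimal sketch of retrying a transiently failing call with exponential backoff. The `retry` helper, its parameters, and its defaults are hypothetical and are not taken from the Azure node provider.

```python
import time


def retry(fn, attempts=5, base_delay=1.0, exceptions=(Exception,), sleep=time.sleep):
    """Call fn() until it succeeds or the attempts are exhausted.

    Hypothetical helper. Waits base_delay * 2**i seconds after the i-th
    failure (i starts at 0). If every attempt fails, the last exception
    is re-raised. The sleep function is injectable so tests need not wait.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for i in range(attempts):
        try:
            return fn()
        except exceptions:
            if i == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** i))
```

A caller would wrap only the call that is known to fail transiently, for example `retry(lambda: create_nic(name))`, where `create_nic` stands in for whatever provisioning call is being retried.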
* Fix cyclic dependency
Headers in ray/util should not depend on those in ray/common
* Move random generations to ray/common/test_util.h
* Add license header
Co-authored-by: Mehrdad <noreply@github.com>
Co-authored-by: Philipp Moritz <pcmoritz@gmail.com>
* Switch hiredis on Windows to that of the Windows port of Redis
* Use boost::asio::ip::tcp::socket::native_handle_type
* Use normal hiredis instead of Windows-specific one
* Finish up using normal hiredis
Co-authored-by: Mehrdad <noreply@github.com>
* Fix common.fbs rename (due to apache/arrow/commit/bef9a1c251397311a6415d3dc362ef419d154caa)
* Add missing COPTS
* Use socketpair(AF_INET) if boost::asio::local is unavailable (e.g. on Windows)
* Fix compile bug in service_based_gcs_client_test.cc (fix build breakage in #6686)
* Work around googletest/gmock inability to specify override to avoid -Werror,-Winconsistent-missing-override
* Fix missing override on IsPlasmaBuffer()
* Fix missing libraries for streaming
* Factor out install-toolchains.sh
* Put some Bazel flags into .bazelrc
* Fix jni_md.h missing inclusion
* Add ~/bin to PATH for Bazel
* Change echo $$(date) > $@ to date > $@
* Fix lots of unquoted paths
* Add system() call checks for Windows
Co-authored-by: GitHub Web Flow <noreply@github.com>
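Illustration for the socketpair(AF_INET) fallback above: a minimal sketch of building a connected socket pair over loopback TCP when local (Unix-domain) sockets are unavailable. The function name `tcp_socketpair` is an assumption for this example, and the sketch does not reproduce the C++ code in the commit.

```python
import socket


def tcp_socketpair():
    """Return two connected TCP sockets on the loopback interface.

    Hypothetical helper: a temporary listener is bound to an OS-chosen
    port, one socket connects to it, and the accepted socket is the peer.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        listener.listen(1)
        client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            client.connect(listener.getsockname())
            server, _ = listener.accept()
        except OSError:
            client.close()
            raise
    finally:
        listener.close()  # the listener is only needed to make the pair
    return client, server
```

Unlike a Unix-domain pair, the listener is briefly reachable by other local processes, so code that relies on this fallback may want to verify the peer address returned by `accept()`.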
* Use Boost.Process instead of pid_t
This will let us handle child processes (mostly) uniformly across platforms.
TODO: There is no SIGTERM on Windows; achieving something equivalent is fairly involved.
* Polish Bazel build scripts
* Remove glog references from streaming_logging.cc
* Move out COPTS and reference them
* Disable streaming on Windows
* Remove -fno-gnu-unique
* Platform shims for Windows
* Tentative workaround for some forks and signals on Windows
* Rewrite WorkerPool::StartProcess by moving spawnvp wrapper to a separate function
* Separate the spawnvp wrappers for POSIX and Windows
* Fix rv use
* Update .bazelrc for Windows compatibility
* Block inclusion of (legacy) WinSock.h to avoid errors
* Suppress warnings for Windows code
* Include boost::asio in includes so that it is passed as -isystem to avoid warnings
* Link with -lpthread only on non-Windows
* Undefine BOOST_FALLTHROUGH, which is unnecessary and causes macro redefinition warnings
* Define RAY_STATIC and ARROW_STATIC to compile for Windows
* Add WinSock import library for Arrow
* TaskStateManager to track and complete pending tasks
* Convert actor transport to use task state manager
* Refactor direct actor transport to use TaskStateManager
* rename
* Unit test
* doc
* IsTaskPending
* Fix?
* Shared ptr
* HUH?
* Update src/ray/core_worker/task_manager.cc
Co-Authored-By: Zhijun Fu <37800433+zhijunfu@users.noreply.github.com>
* Revert "HUH?"
This reverts commit f80f0ba204ff4da5e0b03191fa0d5a4d9f552434.
* Fix memory issue
* oops
* move more unit tests to bazel
* move to avoid conflict
* fix lint
* fix deps
* separate
* fix failing tests
* show tests
* ignore mismatch
* try combining bazel runs
* build lint
* remove tests from install
* fix test utils
* better config
* split up
* exclusive
* fix verbosity
* fix tests class
* cleanup
* remove flaky
* fix metrics test
* Update .travis.yml
* no retry flaky
* split up actor
* split basic test
* split up trial runner test
* split stress
* fix basic test
* fix tests
* switch to pytest runner for main
* make microbench not fail
* move load code to py3
* test is no longer package
* bazel to end
* Start trying to figure out where to put fibers
* Pass is_async flag from python to context
* Just running things in fiber works
* Yield implemented, need some debugging to make it work
* It worked!
* Remove debug prints
* Lint
* Revert the clang-format
* Remove unnecessary log
* Remove unnecessary import
* Add attribution
* Address comment
* Add test
* Missed a merge conflict
* Make test pass and compile
* Address comment
* Rename async -> asyncio
* Move async test to py3 only
* Fix ignore path
* Priority queue in direct actor transport by task number
* Move LocalDependencyResolver out to separate file, share with direct actor transport
* works
* Test case for ordering
* Cleanups
* Remove priority queue
* comment
* Share ClientFactoryFn with direct actor transport
* Unit test
* fix