* Initial commit for the async plasma client
* Create an eventloop model for ray/plasma
* Implement a poll-like selector based on `ray.wait`. Huge improvements.
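A minimal sketch of the polling primitive this selector is built on, assuming only the public `ray.wait` API; the helper name below is hypothetical and the exact timeout semantics depend on the Ray version:

```python
import ray

# Hypothetical helper (not the actual selector class): ask `ray.wait` which of the
# given object IDs are ready without blocking, the primitive a poll-like selector
# can loop over. `ray.wait` returns a tuple of lists: (ready IDs, not-ready IDs).
def poll_ready(object_ids):
    ready, pending = ray.wait(object_ids, num_returns=len(object_ids), timeout=0)
    return ready, pending
```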
* Allow choosing workers & selectors
* remove original design
* initial implementation of epoll-like selector for plasma
* Add a param for `worker` used in `PlasmaSelectorEventLoop`
* Allow accepting a `Future` which returns object_id
* Do not need `io.py` anymore
* Create a basic testing model
* fix: `ray.wait` returns a tuple of lists
* fix a few bugs
* improving performance & bug fixing
* add test
* several improvements & fixes
* fix relative import
* [async] change code format, remove old files
* [async] Create context wrapper for the eventloop
* [async] fix: context should return a value
* [async] Implement futures grouping
* [async] Fix bugs & replace old functions
* [async] Fix bugs found in tests
* [async] Implement `PlasmaEpoll`
* [async] Make test faster, add tests for epoll
* [async] Fix code format
* [async] Add comments for main code.
* [async] Fix import path.
* [async] Fix test.
* [async] Compatibility.
* [async] Less verbose logging so as not to annoy the CI.
* [async] Add test for new API
* [async] Allow showing debug info in some of the test.
* [async] Fix test.
* [async] Proper shutdown.
* [async] Lint~
* [async] Move files to experimental and create API
* [async] Use async/await syntax
* [async] Fix names & styles
* [async] comments
* [async] bug fixing & use pytest
* [async] bug fixing & change tests
* [async] use logger
* [async] add tests
* [async] lint
* [async] type checking
* [async] add more tests
* [async] Fix bugs when waiting on a future with a timeout. Add more docs.
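For reference, the general standard-library pattern for waiting on a future with a timeout (an illustration only, not Ray's internal selector code):

```python
import asyncio

# Standard-library illustration: await a future, but give up after `seconds`.
async def wait_with_timeout(fut, seconds):
    try:
        return await asyncio.wait_for(fut, timeout=seconds)
    except asyncio.TimeoutError:
        # The wait timed out; asyncio.wait_for cancels the wrapped future for us.
        return None
```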
* [async] Formal docs.
* [async] Add typing info since this code is compatible with py3.5+.
* [async] Documents.
* [async] Lint.
* [async] Fix deprecated call.
* [async] Fix deprecated call.
* [async] Implement a more reasonable way for dealing with pending inputs.
* [async] Fix docs
* [async] Lint
* [async] Fix bug: Type for time
* [async] Set our eventloop as the default eventloop so that we can get it through `asyncio.get_event_loop()`.
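The commit above relies on the standard asyncio mechanism for registering a default loop; a minimal standard-library sketch (not Ray's actual code):

```python
import asyncio

# Once a loop is registered as the default for the current thread, any code in the
# process can retrieve the same loop again via asyncio.get_event_loop().
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
assert asyncio.get_event_loop() is loop
```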
* [async] Update test & docs.
* [async] Lint.
* [async] Temporarily print more debug info.
* [async] Use `Poll` as a default option.
* [async] Limit resources.
* new async implementation for Ray
* implement linked list
* bug fix
* update
* support seamless async operations
* update
* update API
* fix tests
* lint
* bug fix
* refactor names
* improve doc
* properly shutdown async_api
* doc
* Change the table on the index page.
* Adjust table size.
* Only keeps `as_future`.
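A hedged usage sketch of `as_future`, assuming the experimental module path suggested by this changelog (`ray.experimental.async_api`); exact imports and call signatures may differ by Ray version:

```python
import asyncio
import ray
from ray.experimental import async_api  # module path assumed from this changelog

@ray.remote
def f():
    return 1

ray.init()
# Wrap a Ray object ID in an asyncio future and drive it on the event loop.
future = async_api.as_future(f.remote())
result = asyncio.get_event_loop().run_until_complete(future)
assert result == 1
```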
* change how we init connection
* init connection in `ray.worker.connect`
* doc
* fix
* Move initialization code into the module.
* Fix docs & code
* Update pyarrow version.
* lint
* Restore index.rst
* Add known issues.
* Apply suggestions from code review
Co-Authored-By: suquark <suquark@gmail.com>
* rename
* Update async_api.rst
* Update async_api.py
* Update async_api.rst
* Update async_api.py
* Update worker.py
* Update async_api.rst
* fix tests
* lint
* lint
* replace the magic number
* bugfix: env existence check error
* support avoiding rebuilding pyarrow in the project
* bugfix: adapt gtest for centos lib64
* bugfix: check that the gtest lib exists in the directory
* bugfix: find gtest by checking that all libs exist
* prefix RAY_ to thirdparty env variables to avoid conflicts with other modules
* arrow uses glog from ray
* change the glog and gtest install dir
This tests the case in which a worker is blocked in a call to ray.get or ray.wait, and then the worker dies. Then later, the object that the worker was waiting for becomes available. We need to make sure not to try to send a message to the dead worker and then die. Related to #2790.
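A hypothetical sketch of that scenario (illustrative only, not the actual regression test; the pid-file handshake below is an assumption made for the sketch):

```python
import os
import signal
import tempfile
import time
import ray

# Hypothetical sketch, not the real test: one worker blocks in ray.get on a slow
# object, we kill that worker, and the object becomes available afterwards. The
# backend must not crash when it later tries to notify the dead worker.
ray.init()

@ray.remote
def produce():
    time.sleep(10)
    return 1

@ray.remote
def consume(pid_file, dep):
    # Record this worker's pid so the driver can kill it while it blocks below.
    with open(pid_file, "w") as f:
        f.write(str(os.getpid()))
    # Object IDs nested inside a list are not resolved automatically, so this blocks.
    return ray.get(dep[0])

pid_file = os.path.join(tempfile.mkdtemp(), "pid")
slow = produce.remote()
blocked = consume.remote(pid_file, [slow])

# Wait for the worker to report its pid, then kill it while it is blocked in ray.get.
while not os.path.exists(pid_file):
    time.sleep(0.1)
time.sleep(0.5)  # give the worker a moment to finish writing the pid file
with open(pid_file) as f:
    os.kill(int(f.read()), signal.SIGKILL)

# The object the dead worker was waiting for now becomes available.
assert ray.get(slow) == 1
```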
* enable using thirdparty env variables to find installed dependencies, to speed up the build process
* fix target dependencies in cmake. :-) too chaotic in each CMakeLists
* check env variable defined directory exists
* Update Arrow to Plasma with glog and update the building process
* Remove ParquetExternalProject.cmake
* Fix Mac building error in CI
* Use find_package(BISON) instead of hard code
* Revert BISON binary to hard code.
* Remove build_parquet.sh
* Update setup.sh
* use cmake to build ray project, no need to apply build.sh before cmake, fix some abuse of cmake, improve the build performance
* support boost external project, avoid using the system or build.sh boost
* keep compatibility with build.sh; remove the boost and arrow builds from it.
* bugfix: parquet bison version control, plasma_java lib install problem
* bugfix: cmake, do not compile the plasma java client if not needed
* bugfix: the component failures test timeout mechanism has a problem in the plasma manager failure case
* bugfix: arrow uses lib64 on centos; travis check-git-clang-format-output.sh does not support branches other than master
* revert some fix
* set arrow python executable, fix format error in component_failures_test.py
* make clean arrow python build directory
* update cmake code style, go back to supporting cmake minimum version 3.4
* [WIP] Support different backend log lib
* Refine code, unify log levels, address comments
* Address comment and change formatter
* Fix linux building failure.
* Fix lint
* Remove log4cplus.
* Add log init to raylet main and add test to travis.
* Address comment and refine.
* Update logging_test.cc
* directory for raylet
* some initial class scaffolding -- in progress
* node_manager build code and test stub files.
* class scaffolding for resources, workers, and the worker pool
* Node manager server loop
* raylet policy and queue - wip checkpoint
* fix dependencies
* add gen_nm_fbs as target.
* object manager build, stub, and test code.
* Start integrating WorkerPool into node manager
* fix build on mac
* tmp
* adding LsResources boilerplate
* add/build Task spec boilerplate
* checkpoint ActorInformation and LsQueue
* Worker pool maintains started and removed workers
* todos for e2e task assignment
* fix build on mac
* build/add lsqueue interface
* channel resource config through from NodeServer to LsResources; prep LsResources to replace/provide worker_pool
* progress on LsResources class: resource availability check implementation
* Read task submission messages from a client
* Submit tasks from the client to the local scheduler
* Assign a task to a worker from the WorkerPool
* change the way node_manager is built to prevent build issues for object_manager.
* add namespaces. fix build.
* Move ClientConnection message handling into server, remove reference to WorkerPool
* Add raw constructors for TaskSpecification
* Define TaskArgument by reference and by value
* Flatbuffer serialization for TaskSpec
* expand resource implementation
* Start integrating TaskExecutionSpecification into Task
* Separate WorkerPool from LsResources, give ownership to NodeServer
* checkpoint queue and resource code
* resolving merge conflicts
* lspolicy::schedule ; adding lsqueue and lspolicy to the nodeserver
* Implement LsQueue RemoveTasks and QueueReadyTasks
* Fill in some LsQueue code for assigning a task
* added support for test_asio
* Implement LsQueue queue tasks methods, queue running tasks
* calling into policy from nodeserver; adding cluster resource map
* Feedback and Testing.
Incorporate Alexey's feedback. Actually test some code. Clean up callback implementation.
* end to end task assignment
* Decouple local scheduler from node server
* move TODO
* Move local scheduler to separate file
* Add scaffolding for reconstruction policy, task dependency manager, and object manager
* fix
* asio for store client notifications.
added asio for plasma store connection.
added tests for store notifications.
encapsulate store interaction under store_messenger.
* Move Worker inside of ClientConnection
* Set the assigned task ID in the worker
* Several changes toward object manager implementation.
Store client integration with asio.
Complete OM/OD scaffolding.
* simple simulator to estimate number of retry timeouts
* changing dbclientid --> clientid
* fix build (include sandbox after it's fixed).
* changes to object manager, adding lambdas to the interface
* changing void * callbacks to std::function typed callbacks
* remove use namespace std from .h files.
use ray:: for Status everywhere.
* minor
* lineage cache interfaces
* TODO for object IDs
* Interface for the GCS client table
* Revert "Set the assigned task ID in the worker"
This reverts commit a770dd31048a289ef431c56d64e491fa7f9b2737.
* Revert "Move Worker inside of ClientConnection"
This reverts commit dfaa0d662a76976c05be6d76b214b45d88482818.
* OD/OM: ray::Status
* mock gcs integration.
* gcs mock clientinfo assignment
* Allow lookup of a Worker in the WorkerPool
* Split out Worker and ClientConnection source files
* Allow assignment of a task ID to a worker, skeleton for finishing a task
* integrate mock gcs with om tests.
* added tcp connection acceptor
* integrated OM with NM.
integrated GcsClient with NM.
Added multi-node integration tests.
* OM to receive incoming tcp connections.
* implemented object manager connection protocol.
* Added todos.
* slight adjustment to add/remove handler invocation on object store client.
* Simplify Task interface for getting dependencies
* Remove unused object manager file
* TaskDependencyManager tracks missing task dependencies and processes object add notifications
* Local scheduler queues tasks according to argument availability
* Fill in TaskSpecification methods to get arguments
* Implemented push.
* Queue tasks that have been scheduled but that are waiting for a worker
* Pull + mock gcs cleanup.
* OD/OM/GCS mock code review, fixing unused-result issues, eliminating copy ctor
* Remove unique_ptr from object_store_client
* Fix object manager Push memory error
* Pull task arguments in task dependency manager
* Add a demo script for remote task dependencies
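A hedged sketch of what such a demo could look like (a hypothetical script, not the one added in this commit): the output of one remote task feeds the next, so the second task only runs once its dependency is available.

```python
import ray

# Hypothetical demo of remote task dependencies.
ray.init()

@ray.remote
def produce():
    return 41

@ray.remote
def consume(x):
    # `x` arrives as a concrete value only after `produce` has finished.
    return x + 1

dep = produce.remote()
result = consume.remote(dep)  # scheduling waits on `dep`
assert ray.get(result) == 42
```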
* Some comments for the TaskDependencyManager
* code cleanup; builds on mac
* Make ClientConnection a templated type based on the connection protocol
* Add gmock to build
* Add WorkerPool unit tests
* clean up.
* clean up connection code.
* instantiate a template instance in the module
* Virtual destructors
* Document public api.
* Separate read and write buffers in ClientConnection; documentation
* Remove ObjectDirectory from NodeServer constructor, make directory InitGcs call a separate constructor
* Convert NodeServer Terminate to a destructor
* NodeServer documentation
* WorkerPool documentation
* TaskDependencyManager doc
* unifying naming conventions
* unifying naming conventions
* Task cleanup and documentation
* unifying naming conventions
* unifying naming conventions
* code cleanup and naming conventions
* code cleanup
* Rename om --> object_manager
* Merge with master
* SchedulingQueue doc
* Docs and implementation skeleton for ClientTable
* Node manager documentation
* ReconstructionPolicy doc
* Replace std::bind with lambda in TaskDependencyManager
* lineage cache doc
* Use \param style for doc
* documentation for scheduling policy and resources
* minor code cleanup
* SchedulingResources class documentation + code cleanup
* referencing ray/raylet directory; doxygen documentation
* updating trivial policy
* Fix bug where event loop stops after task submission
* Define entry point for ClientManager for handling new connections
* Node manager to node manager protocol, heartbeat protocol
* Fix flatbuffer
* Fix GCS flatbuffer naming conflict
* client connection moved to common dir.
* rename based on feedback.
* Added a Google-style clang-format file with 90-char lines under src/ray.
* const ref ClientID.
* Incorporated feedback from PR.
* raylet: includes and namespaces
* raylet/om/gcs logging/using
* doxygen style
* camel casing, comments, other style; DBClientID -> ClientID
* object_manager : naming, defines, style
* consistent caps and naming; misc style
* cleaning up client connection + other stylistic fixes
* cmath, std::nan
* more style polish: OM, Raylet, gcs tables
* removing sandbox (moved to ray-project/sandbox)
* raylet linting
* object manager linting
* gcs linting
* all other linting
Co-authored-by: Melih <elibol@gmail.com>
Co-authored-by: Stephanie <swang@cs.berkeley.edu>
* Pass `-DPYTHON_EXECUTABLE` into cmake for arrow and for ray.
* Add cython to setup.py install_requires.
* Revert custom code for finding python in cmake.
* Correctly find arrow on CentOS.
* In cmake, don't find PythonLibs, just find PYTHON_INCLUDE_DIRS.
* Fix typo.
* Do not use boost shared libraries when building arrow.
* Add six to the setup.py install_requires because it is needed by pyarrow.
* Don't link numbuf against boost_system and boost_filesystem.
* Compile boost when we are on Linux.
* Make numbuf find the correct boost libraries.
* Only use find_package Boost on Linux, suppress output when building boost.
* Changes to wheel building scripts, install cython in mac script.
* Compile flatbuffers ourselves on Linux and pass it in when compiling Arrow.
* Clean up build_flatbuffers.sh and build_boost.sh scripts a little.
* Install cython when building linux wheel.