mirror of
https://github.com/vale981/ray
synced 2025-03-06 10:31:39 -05:00
3 commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
|
91464a56dd |
[XRay] Raylet node and object manager unification/backend redesign. (#1640)
* directory for raylet
* some initial class scaffolding -- in progress
* node_manager build code and test stub files.
* class scaffolding for resources, workers, and the worker pool
* Node manager server loop
* raylet policy and queue - wip checkpoint
* fix dependencies
* add gen_nm_fbs as target.
* object manager build, stub, and test code.
* Start integrating WorkerPool into node manager
* fix build on mac
* tmp
* adding LsResources boilerplate
* add/build Task spec boilerplate
* checkpoint ActorInformation and LsQueue
* Worker pool maintains started and removed workers
* todos for e2e task assignment
* fix build on mac
* build/add lsqueue interface
* channel resource config through from NodeServer to LsResources; prep LsResources to replace/provide worker_pool
* progress on LsResources class: resource availability check implementation
* Read task submission messages from a client
* Submit tasks from the client to the local scheduler
* Assign a task to a worker from the WorkerPool
* change the way node_manager is built to prevent build issues for object_manager.
* add namespaces. fix build.
* Move ClientConnection message handling into server, remove reference to WorkerPool
* Add raw constructors for TaskSpecification
* Define TaskArgument by reference and by value
* Flatbuffer serialization for TaskSpec
* expand resource implementation
* Start integrating TaskExecutionSpecification into Task
* Separate WorkerPool from LsResources, give ownership to NodeServer
* checkpoint queue and resource code
* resolving merge conflicts
* lspolicy::schedule ; adding lsqueue and lspolicy to the nodeserver
* Implement LsQueue RemoveTasks and QueueReadyTasks
* Fill in some LsQueue code for assigning a task
* added support for test_asio
* Implement LsQueue queue tasks methods, queue running tasks
* calling into policy from nodeserver; adding cluster resource map
* Feedback and Testing. Incorporate Alexey's feedback. Actually test some code. Clean up callback imp.
* end to end task assignment
* Decouple local scheduler from node server
* move TODO
* Move local scheduler to separate file
* Add scaffolding for reconstruction policy, task dependency manager, and object manager
* fix
* asio for store client notifications. added asio for plasma store connection. added tests for store notifications. encapsulate store interaction under store_messenger.
* Move Worker inside of ClientConnection
* Set the assigned task ID in the worker
* Several changes toward object manager implementation. Store client integration with asio. Complete OM/OD scaffolding.
* simple simulator to estimate number of retry timeouts
* changing dbclientid --> clientid
* fix build (include sandbox after it's fixed).
* changes to object manager, adding lambdas to the interface
* changing void * callbacks to std::function typed callbacks
* remove use namespace std from .h files. use ray:: for Status everywhere.
* minor
* lineage cache interfaces
* TODO for object IDs
* Interface for the GCS client table
* Revert "Set the assigned task ID in the worker" This reverts commit a770dd31048a289ef431c56d64e491fa7f9b2737.
* Revert "Move Worker inside of ClientConnection" This reverts commit dfaa0d662a76976c05be6d76b214b45d88482818.
* OD/OM: ray::Status
* mock gcs integration.
* gcs mock clientinfo assignment
* Allow lookup of a Worker in the WorkerPool
* Split out Worker and ClientConnection source files
* Allow assignment of a task ID to a worker, skeleton for finishing a task
* integrate mock gcs with om tests.
* added tcp connection acceptor
* integrated OM with NM. integrated GcsClient with NM. Added multi-node integration tests.
* OM to receive incoming tcp connections.
* implemented object manager connection protocol.
* Added todos.
* slight adjustment to add/remove handler invocation on object store client.
* Simplify Task interface for getting dependencies
* Remove unused object manager file
* TaskDependencyManager tracks missing task dependencies and processes object add notifications
* Local scheduler queues tasks according to argument availability
* Fill in TaskSpecification methods to get arguments
* Implemented push.
* Queue tasks that have been scheduled but that are waiting for a worker
* Pull + mock gcs cleanup.
* OD/OM/GCS mock code review, fixing unused-result issues, eliminating copy ctor
* Remove unique_ptr from object_store_client
* Fix object manager Push memory error
* Pull task arguments in task dependency manager
* Add a demo script for remote task dependencies
* Some comments for the TaskDependencyManager
* code cleanup; builds on mac
* Make ClientConnection a templated type based on the connection protocol
* Add gmock to build
* Add WorkerPool unit tests
* clean up.
* clean up connection code.
* instantiate a template instance in the module
* Virtual destructors
* Document public api.
* Separate read and write buffers in ClientConnection; documentation
* Remove ObjectDirectory from NodeServer constructor, make directory InitGcs call a separate constructor
* Convert NodeServer Terminate to a destructor
* NodeServer documentation
* WorkerPool documentation
* TaskDependencyManager doc
* unifying naming conventions
* unifying naming conventions
* Task cleanup and documentation
* unifying naming conventions
* unifying naming conventions
* code cleanup and naming conventions
* code cleanup
* Rename om --> object_manager
* Merge with master
* SchedulingQueue doc
* Docs and implementation skeleton for ClientTable
* Node manager documentation
* ReconstructionPolicy doc
* Replace std::bind with lambda in TaskDependencyManager
* lineage cache doc
* Use \param style for doc
* documentation for scheduling policy and resources
* minor code cleanup
* SchedulingResources class documentation + code cleanup
* referencing ray/raylet directory; doxygen documentation
* updating trivial policy
* Fix bug where event loop stops after task submission
* Define entry point for ClientManager for handling new connections
* Node manager to node manager protocol, heartbeat protocol
* Fix flatbuffer
* Fix GCS flatbuffer naming conflict
* client connection moved to common dir.
* rename based on feedback.
* Added google style and 90 char lines clang-format file under src/ray.
* const ref ClientID.
* Incorporated feedback from PR.
* raylet: includes and namespaces
* raylet/om/gcs logging/using
* doxygen style
* camel casing, comments, other style; DBClientID -> ClientID
* object_manager : naming, defines, style
* consistent caps and naming; misc style
* cleaning up client connection + other stylistic fixes
* cmath, std::nan
* more style polish: OM, Raylet, gcs tables
* removing sandbox (moved to ray-project/sandbox)
* raylet linting
* object manager linting
* gcs linting
* all other linting
Co-authored-by: Melih <elibol@gmail.com>
Co-authored-by: Stephanie <swang@cs.berkeley.edu> |
||
|
eabc4027c8 | Hiredis asio integration (#1547) | ||
|
cac5f47600 |
First Part of Internal Ray API Refactor (#1173)
* add Ray status class
* add C++ util files
* add ID types
* more APIs
* build system integration
* add test infrastructure and implement some APIs
* add more tests
* fix bugs
* add task table tests
* update
* add toolchain file
* fix
* test
* link with pthread
* update
* fix
* more fixes
* fixes
* always vendor gtest and gflags
* linting
* fixes
* add constants file
* comments
* more fixes
* fix linting |