21 commits

Author SHA1 Message Date
Robert Nishihara
658c14282c Remove legacy Ray code. (#3121)
* Remove legacy Ray code.

* Fix cmake and simplify monitor.

* Fix linting

* Updates

* Fix

* Implement some methods.

* Remove more plasma manager references.

* Fix

* Linting

* Fix

* Fix

* Make sure class IDs are strings.

* Some path fixes

* Fix

* Path fixes and update arrow

* Fixes.

* linting

* Fixes

* Java fixes

* Some java fixes

* TaskLanguage -> Language

* Minor

* Fix python test and remove unused method signature.

* Fix java tests

* Fix jenkins tests

* Remove commented out code.
2018-10-26 13:36:58 -07:00
Robert Nishihara
9c1826ed69 Use XRay backend by default. (#3020)
* Use XRay backend by default.

* Remove irrelevant valgrind tests.

* Fix

* Move tests around.

* Fix

* Fix test

* Fix test.

* String/unicode fix.

* Fix test

* Fix unicode issue.

* Minor changes

* Fix bug in test_global_state.py.

* Fix test.

* Linting

* Try arrow change and other object manager changes.

* Use newer plasma client API

* Small updates

* Revert plasma client api change.

* Update

* Update arrow and allow SendObjectHeaders to fail.

* Update arrow

* Update python/ray/experimental/state.py

Co-Authored-By: robertnishihara <robertnishihara@gmail.com>

* Address comments.
2018-10-23 12:46:39 -07:00
Hanwei Jin
87639b9e26 move make clean before cmake command, avoid always running mvn install plasma java lib (#3047) 2018-10-12 09:03:30 -07:00
Hanwei Jin
060891a9c9 [cmake] avoid to re-build pyarrow (#2963)
* bugfix: env exists check error

* support to avoid re-build pyarrow in project

* bugfix: adapt gtest for centos lib64

* bugfix: check gtest lib exists in the directory

* bugfix: find gtest with checking all libs exists

* prefix RAY_ to thirdparty env variables to avoid conflicts with other modules

* arrow use glog from ray

* change the glog and gtest install dir
2018-10-10 14:33:15 -07:00
Robert Nishihara
3ce8eb2d4c Test dying_worker_get and dying_worker_wait for xray. (#2997)
This tests the case in which a worker is blocked in a call to ray.get or ray.wait, and then the worker dies. Then later, the object that the worker was waiting for becomes available. We need to make sure not to try to send a message to the dead worker and then die. Related to #2790.
2018-10-02 00:08:47 -07:00
Hanwei Jin
9f9e49e4a1 [cmake] enable using thirdparty env variable to find installed dependency (#2912)
* enable using thirdparty env variable to find installed dependency, to speed up the build process

* fix target dependency in cmake. :-) too much chaos in each CMakeLists

* check env variable defined directory exists
2018-09-23 07:52:33 -07:00
Yuhong Guo
93ded5a3d5 Update arrow using Plasma with glog (#2913)
* Update Arrow to Plasma with glog and update the building process

* Remove ParquetExternalProject.cmake

* Fix Mac building error in CI

* Use find_package(BISON) instead of hard code

* Revert BISON binary to hard code.

* Remove build_parquet.sh

* Update setup.sh
2018-09-20 13:37:44 -07:00
Hanwei Jin
dc76e51a60 bugfix: cmake copy plasma java lib from lib64 directory in centos (#2885) 2018-09-16 22:32:09 -07:00
Hanwei Jin
fbf214e408 update ray cmake build process (#2853)
* use cmake to build ray project, no need to apply build.sh before cmake, fix some abuse of cmake, improve the build performance

* support boost external project, avoid using the system or build.sh boost

* keep compatible with build.sh, remove boost and arrow build from it.

* bugfix: parquet bison version control, plasma_java lib install problem

* bugfix: cmake, do not compile plasma java client if no need

* bugfix: component failures test timeout mechanism has a problem in the plasma manager failed case

* bugfix: arrow use lib64 in centos, travis check-git-clang-format-output.sh does not support other branches except master

* revert some fix

* set arrow python executable, fix format error in component_failures_test.py

* make clean arrow python build directory

* update cmake code style, back to support cmake minimum version 3.4
2018-09-12 11:19:33 -07:00
Philipp Moritz
a34a7172b4 Remove gflags (#2813)
Seems like gflags is not needed. This *might* remove writing spurious files into the home directory on the RISE infrastructure.
2018-09-03 16:10:47 -07:00
Yuhong Guo
9f06c19edd Fix glog wheel failure on MacOS (#2775) 2018-08-30 09:06:19 -07:00
Yuhong Guo
eec1a3eb89 Support pluggable backend log lib with glog (#2695)
* [WIP] Support different backend log lib

* Refine code, unify level, address comment

* Address comment and change formatter

* Fix linux building failure.

* Fix lint

* Remove log4cplus.

* Add log init to raylet main and add test to travis.

* Address comment and refine.

* Update logging_test.cc
2018-08-23 09:43:38 -07:00
Philipp Moritz
4c82ac72df Upgrade arrow to include the plasma TensorFlow op (#2412) 2018-07-18 12:33:02 -07:00
Philipp Moritz
3039cca242 add facility to link libraries to tests (#1850) 2018-04-09 18:59:24 -07:00
Alexey Tumanov
91464a56dd [XRay] Raylet node and object manager unification/backend redesign. (#1640)
* directory for raylet

* some initial class scaffolding -- in progress

* node_manager build code and test stub files.

* class scaffolding for resources, workers, and the worker pool

* Node manager server loop

* raylet policy and queue - wip checkpoint

* fix dependencies

* add gen_nm_fbs as target.

* object manager build, stub, and test code.

* Start integrating WorkerPool into node manager

* fix build on mac

* tmp

* adding LsResources boilerplate

* add/build Task spec boilerplate

* checkpoint ActorInformation and LsQueue

* Worker pool maintains started and removed workers

* todos for e2e task assignment

* fix build on mac

* build/add lsqueue interface

* channel resource config through from NodeServer to LsResources; prep LsResources to replace/provide worker_pool

* progress on LsResources class: resource availability check implementation

* Read task submission messages from a client

* Submit tasks from the client to the local scheduler

* Assign a task to a worker from the WorkerPool

* change the way node_manager is built to prevent build issues for object_manager.

* add namespaces. fix build.

* Move ClientConnection message handling into server, remove reference to WorkerPool

* Add raw constructors for TaskSpecification

* Define TaskArgument by reference and by value

* Flatbuffer serialization for TaskSpec

* expand resource implementation

* Start integrating TaskExecutionSpecification into Task

* Separate WorkerPool from LsResources, give ownership to NodeServer

* checkpoint queue and resource code

* resolving merge conflicts

* lspolicy::schedule ; adding lsqueue and lspolicy to the nodeserver

* Implement LsQueue RemoveTasks and QueueReadyTasks

* Fill in some LsQueue code for assigning a task

* added support for test_asio

* Implement LsQueue queue tasks methods, queue running tasks

* calling into policy from nodeserver; adding cluster resource map

* Feedback and Testing.
Incorporate Alexey's feedback. Actually test some code. Clean up callback imp.

* end to end task assignment

* Decouple local scheduler from node server

* move TODO

* Move local scheduler to separate file

* Add scaffolding for reconstruction policy, task dependency manager, and object manager

* fix

* asio for store client notifications.
added asio for plasma store connection.
added tests for store notifications.
encapsulate store interaction under store_messenger.

* Move Worker inside of ClientConnection

* Set the assigned task ID in the worker

* Several changes toward object manager implementation.
Store client integration with asio.
Complete OM/OD scaffolding.

* simple simulator to estimate number of retry timeouts

* changing dbclientid --> clientid

* fix build (include sandbox after it's fixed).

* changes to object manager, adding lambdas to the interface

* changing void * callbacks to std::function typed callbacks

* remove using namespace std from .h files.
use ray:: for Status everywhere.

* minor

* lineage cache interfaces

* TODO for object IDs

* Interface for the GCS client table

* Revert "Set the assigned task ID in the worker"

This reverts commit a770dd31048a289ef431c56d64e491fa7f9b2737.

* Revert "Move Worker inside of ClientConnection"

This reverts commit dfaa0d662a76976c05be6d76b214b45d88482818.

* OD/OM: ray::Status

* mock gcs integration.

* gcs mock clientinfo assignment

* Allow lookup of a Worker in the WorkerPool

* Split out Worker and ClientConnection source files

* Allow assignment of a task ID to a worker, skeleton for finishing a task

* integrate mock gcs with om tests.

* added tcp connection acceptor

* integrated OM with NM.
integrated GcsClient with NM.
Added multi-node integration tests.

* OM to receive incoming tcp connections.

* implemented object manager connection protocol.

* Added todos.

* slight adjustment to add/remove handler invocation on object store client.

* Simplify Task interface for getting dependencies

* Remove unused object manager file

* TaskDependencyManager tracks missing task dependencies and processes object add notifications

* Local scheduler queues tasks according to argument availability

* Fill in TaskSpecification methods to get arguments

* Implemented push.

* Queue tasks that have been scheduled but that are waiting for a worker

* Pull + mock gcs cleanup.

* OD/OM/GCS mock code review, fixing unused-result issues, eliminating copy ctor

* Remove unique_ptr from object_store_client

* Fix object manager Push memory error

* Pull task arguments in task dependency manager

* Add a demo script for remote task dependencies

* Some comments for the TaskDependencyManager

* code cleanup; builds on mac

* Make ClientConnection a templated type based on the connection protocol

* Add gmock to build

* Add WorkerPool unit tests

* clean up.

* clean up connection code.

* instantiate a template instance in the module

* Virtual destructors

* Document public api.

* Separate read and write buffers in ClientConnection; documentation

* Remove ObjectDirectory from NodeServer constructor, make directory InitGcs call a separate constructor

* Convert NodeServer Terminate to a destructor

* NodeServer documentation

* WorkerPool documentation

* TaskDependencyManager doc

* unifying naming conventions

* unifying naming conventions

* Task cleanup and documentation

* unifying naming conventions

* unifying naming conventions

* code cleanup and naming conventions

* code cleanup

* Rename om --> object_manager

* Merge with master

* SchedulingQueue doc

* Docs and implementation skeleton for ClientTable

* Node manager documentation

* ReconstructionPolicy doc

* Replace std::bind with lambda in TaskDependencyManager

* lineage cache doc

* Use \param style for doc

* documentation for scheduling policy and resources

* minor code cleanup

* SchedulingResources class documentation + code cleanup

* referencing ray/raylet directory; doxygen documentation

* updating trivial policy

* Fix bug where event loop stops after task submission

* Define entry point for ClientManager for handling new connections

* Node manager to node manager protocol, heartbeat protocol

* Fix flatbuffer

* Fix GCS flatbuffer naming conflict

* client connection moved to common dir.

* rename based on feedback.

* Added google style and 90 char lines clang-format file under src/ray.

* const ref ClientID.

* Incorporated feedback from PR.

* raylet: includes and namespaces

* raylet/om/gcs logging/using

* doxygen style

* camel casing, comments, other style; DBClientID -> ClientID

* object_manager : naming, defines, style

* consistent caps and naming; misc style

* cleaning up client connection + other stylistic fixes

* cmath, std::nan

* more style polish: OM, Raylet, gcs tables

* removing sandbox (moved to ray-project/sandbox)

* raylet linting

* object manager linting

* gcs linting

* all other linting


Co-authored-by: Melih <elibol@gmail.com>
Co-authored-by: Stephanie <swang@cs.berkeley.edu>
2018-03-08 12:53:24 -08:00
Philipp Moritz
eabc4027c8 Hiredis asio integration (#1547) 2018-02-20 13:37:09 -08:00
Philipp Moritz
3d224c4edf Second Part of Internal API Refactor (#1326) 2017-12-26 16:22:04 -08:00
Philipp Moritz
cac5f47600 First Part of Internal Ray API Refactor (#1173)
* add Ray status class

* add C++ util files

* add ID types

* more APIs

* build system integration

* add test infrastructure and implement some APIs

* add more tests

* fix bugs

* add task table tests

* update

* add toolchain file

* fix

* test

* link with pthread

* update

* fix

* more fixes

* fixes

* always vendor gtest and gflags

* linting

* fixes

* add constants file

* comments

* more fixes

* fix linting
2017-12-14 14:54:09 -08:00
Robert Nishihara
be4beb19c1 Changes to build to fix creation of wheels. (#840)
* Pass DPYTHON_EXECUTABLE into cmake for arrow and for ray.

* Add cython to setup.py install_requires.

* Revert custom code for finding python in cmake.

* Correctly find arrow on CentOS.

* In cmake, don't find PythonLibs, just find PYTHON_INCLUDE_DIRS.

* Fix typo.

* Do not use boost shared libraries when building arrow.

* Add six to the setup.py install_requires because it is needed by pyarrow.

* Don't link numbuf against boost_system and boost_filesystem.

* Compile boost when we are on Linux.

* Make numbuf find the correct boost libraries.

* Only use find_package Boost on Linux, suppress output when building boost.

* Changes to wheel building scripts, install cython in mac script.

* Compile flatbuffers ourselves on Linux and pass it in when compiling Arrow.

* Clean up build_flatbuffers.sh and build_boost.sh scripts a little.

* Install cython when building linux wheel.
2017-08-21 17:49:35 -07:00
Robert Nishihara
1133315746 help cmake find right python interpreter on mac (#251) 2016-07-11 12:16:10 -07:00
Philipp Moritz
7905d4c769 add FindNumPy.cmake 2016-03-10 14:46:26 -08:00