152 commits

Author SHA1 Message Date
Robert Nishihara
2f750e9ba7 Add parentheses around one-line if statement. (#1318) 2017-12-13 23:48:53 -08:00
Robert Nishihara
c21e189371 Allow scheduling with arbitrary user-defined resource labels. (#1236)
* Enable scheduling with custom resource labels.

* Fix.

* Minor fixes and ref counting fix.

* Linting

* Use .data() instead of .c_str().

* Fix linting.

* Fix ResourcesTest.testGPUIDs test by waiting for workers to start up.

* Sleep in test so that all tasks are submitted before any completes.
2017-12-01 11:41:40 -08:00
Stephanie Wang
c70430f322 Fix bugs in plasma manager transfer (#1188)
* Plasma client test for plasma abort

* Use ray-project/arrow:abort-objects branch

* Set plasma manager connection cursor to -1 when not in use

* Handle transfer errors between plasma managers, abort unsealed objects

* Add TODO for local scheduler exiting on plasma manager death

* Revert "Plasma client test for plasma abort"

This reverts commit e00fbd58dc4a632f58383549b19fb9057b305a14.

* Upgrade arrow to version with PlasmaClient::Abort

* Fix plasma manager test

* Fix plasma test

* Temporarily use arrow fork for testing

* fix and set arrow commit

* Fix plasma test

* Fix plasma manager test and make write_object_chunk consistent with read_object_chunk

* style

* upgrade arrow
2017-11-15 22:32:38 -08:00
Peter Schafhalter
9a6a056609 Convert UT datastructures in tests (#1203)
* bind_ipc_sock_retry returns std::string

* snprintf -> std::snprintf

* Fix formatting

* Use stringstream instead of snprintf

* Fix typo
2017-11-11 16:55:05 -08:00
Stephanie Wang
07f0532b9b Local scheduler filters out dead clients during reconstruction (#1182)
* Object table lookup returns vector of DBClientID instead of address strings

* Add node IP address to DBClient notification

* DB client cache stores entire DB client, convert addresses to std::string

* get cached db client returns the client

* Expose a call to initialize the redis cache

* Local scheduler filters out dead clients during reconstruction

* Remove node ip address from dbclient, use aux_address for plasma managers

* Get entire db client entry when not found in cache

* Fix common tests

* Fix address in tests

* Push error to driver if driver task did the put

* Address Robert's comments and cleanup

* Remove unused Redis command

* Fix db test
2017-11-10 11:29:24 -08:00
Robert Nishihara
1c6b30b5e2 Move all config constants into single file. (#1192)
* Initial pass at factoring out C++ configuration into a single file.

* Expose config through Python.

* Forward declarations.

* Fixes with Python extensions

* Remove old code.

* Consistent naming for constants.

* Fixes

* Fix linting.

* More linting.

* Whitespace

* rename config -> _config.

* Move config inside a class.

* update naming convention

* Fix linting.

* More linting

* More linting.

* Add in some more constants.

* Fix linting
2017-11-08 11:10:38 -08:00
Robert Nishihara
1cdc2fb011 Clean up event loop and callbacks when processes exit. (#1125)
* Clean up event loop and callbacks when processes exit.

* Fix bug.
2017-10-19 17:07:03 -07:00
Robert Nishihara
486cb64e3f Compile with -Werror and -Wall (#1116)
* Compile global scheduler with -Werror -Wall.

* Compile plasma manager with -Werror -Wall.

* Compile local scheduler with -Werror -Wall.

* Compile common code with -Werror -Wall.

* Signed/unsigned comparisons.

* More signed/unsigned fixes.

* More signed/unsigned fixes and added extern keyword.

* Fix linting.

* Don't check strict-aliasing because Python.h doesn't pass.
2017-10-12 21:00:23 -07:00
Robert Nishihara
9f1e385335 Return errno from handle_sigpipe. (#1051) 2017-10-11 18:36:28 -07:00
Peter Schafhalter
46f6c163dc Converted ClientConnection to C++ standard library (#1099) 2017-10-11 11:12:15 -07:00
Robert Nishihara
1488975d1b Add timing statement to loop that calls redis_get_cached_db_client be… (#1045)
* Add timing statement to loop that calls redis_get_cached_db_client because it has been slow in the past.

* Fix linting.

* Refactoring to make manager vectors into std::vector.

* Fix linting.

* Fixes.
2017-10-02 10:46:21 -07:00
Robert Nishihara
ce278aa06a Fix valgrind tests. (#1037)
* Comment out local scheduler valgrind test.

* Fix free/delete error.

* More free -> delete errors

* One more free -> delete and also clean up callback state in plasma manager.

* Add set -x to run_valgrind scripts.

* Fix valgrind error in CreateLocalSchedulerInfoMessage.
2017-09-30 00:11:09 -07:00
Eric Liang
ba153adc4c Downgrade severity of most common messages (#1039)
* downgrade severity of most common messages

* update
2017-09-30 00:01:49 -07:00
Peter Schafhalter
10027974b1 Replaced ObjectWaitRequests with unordered map (#990)
* Replaced ObjectWaitRequests with unordered map

* Pass C++ STL object by reference

* Formatting changes and typos.
2017-09-28 15:29:26 -07:00
Peter Schafhalter
bb76d4ca0a PlasmaRequestBuffer data structure updates (#1023)
* Replaced utstring with std::string

* Converted transfer_queue to a list

* Converted pending_object_transfers to unordered_map

* Fix free/delete bug and small modifications.
2017-09-27 19:50:37 -07:00
Peter Schafhalter
6e9657e696 Replaced utstring with std::string (#1009) 2017-09-24 22:42:17 -07:00
Peter Schafhalter
241612709e Data structure updates to plasma manager (#937)
* Implemented local_available_objects as an unordered set

* Implemented fetch_requests as an unordered map

* Fixed bug and changed fetch_requests from pointer to object

* free(PlasmaManagerState *) -> delete PlasmaManagerState *

* removed unnecessary newline

* Make local_available_objects not a pointer.

* Attempt to safely iterate over unordered_map and remove elements.
2017-09-15 20:09:29 -07:00
Peter Schafhalter
8906a920f7 Implemented wait_requests as vector (#943) 2017-09-08 13:39:54 -07:00
Philipp Moritz
054ae4180e Fix installation instruction for ubuntu 14.04 (#805)
* fix installation instruction for ubuntu 14.04

* upgrade cmake requirements

* fix
2017-08-02 18:14:14 -07:00
Robert Nishihara
37282330c0 Allow plasma manager to gracefully handle EPROTOTYPE. (#802)
* Allow plasma manager to gracefully handle EPROTOTYPE.

* Fix linting.
2017-08-01 23:33:25 -07:00
Philipp Moritz
c3b39b4d86 Pull Plasma from Apache Arrow and remove Plasma store from Ray. (#692)
* Rebase Ray on top of Plasma in Apache Arrow

* add thirdparty building scripts

* use rebased arrow

* fix

* fix build

* fix python visibility

* comment out C tests for now

* fix multithreading

* fix

* reduce logging

* fix plasma manager multithreading

* make sure old and new object IDs can coexist peacefully

* more rebasing

* update

* fixes

* fix

* install pyarrow

* install cython

* fix

* install newer cmake

* fix

* rebase on top of latest arrow

* getting runtest.py run locally (needed to comment out a test for that to work)

* work on plasma tests

* more fixes

* fix local scheduler tests

* fix global scheduler test

* more fixes

* fix python 3 bytes vs string

* fix manager tests valgrind

* fix documentation building

* fix linting

* fix c++ linting

* fix linting

* add tests back in

* Install without sudo.

* Set PKG_CONFIG_PATH in build.sh so that Ray can find plasma.

* Install pkg-config

* Link -lpthread, note that find_package(Threads) doesn't seem to work reliably.

* Comment in testGPUIDs in runtest.py.

* Set PKG_CONFIG_PATH when building pyarrow.

* Pull apache/arrow and not pcmoritz/arrow.

* Fix installation in docker image.

* adapt to changes of the plasma api

* Fix installation of pyarrow module.

* Fix linting.

* Use correct python executable to build pyarrow.
2017-07-31 21:04:15 -07:00
Yeolar
31329d43dd fixtypo: plasma_protocol (#764)
Fix typo in plasma_protocol.
2017-07-22 17:52:27 -07:00
Robert Nishihara
e0867c8845 Switch Python indentation from 2 spaces to 4 spaces. (#726)
* 4 space indentation for actor.py.

* 4 space indentation for worker.py.

* 4 space indentation for more files.

* 4 space indentation for some test files.

* Check indentation in Travis.

* 4 space indentation for some rl files.

* Fix failure test.

* Fix multi_node_test.

* 4 space indentation for more files.

* 4 space indentation for remaining files.

* Fixes.
2017-07-13 21:53:57 +00:00
Robert Nishihara
0926550661 Remove -mtune and -march compiler flags. (#697) 2017-06-26 05:52:45 +00:00
Robert Nishihara
ad480f8165 Don't reconstruct all objects in every fetch request in local scheduler. (#686)
* Don't reconstruct all objects in every fetch request in local scheduler.

* Separate out fetch timer and reconstruction timer.

* Fix bug.

* Bug fix.

* Fix naming convention for global variables.

* Address comments.

* Make reconstruct_counter a static variable.

* Fix linting.

* Redo reconstruct handler using a set of objects to fetch.

* Fix linting.

* Replace set with vector.
2017-06-23 21:08:02 +00:00
Robert Nishihara
5ebc2f3f2e Do resource bookkeeping for actor methods. (#682)
* Dispatch regular and actor tasks when resources become available.

* Make actor methods do resource bookkeeping and add test.

* Remove unnecessary field.

* Fix linting.

* Fix actor test.

* Maintain set of actors with pending tasks to speed up task dispatch.

* Exit early from task dispatch if there are no resources available.

* Fix linting.

* Fix error.

* Fix bug related to iterator invalidation.

* When an actor is removed, remove it from the set of actors with pending tasks.
2017-06-21 05:52:45 +00:00
Robert Nishihara
9e4a3e4972 Replace some UT data structures in local scheduler with C++ STL. (#680)
* Replace a local scheduler ut_array with a std::vector.

* Replace vector of sizes in local scheduler with std::pair.

* Remove utarray include.

* Replace utarray with std::vector for reading local scheduler input messages.

* Remove more UT data structures.

* Remove UT includes.

* Fix linting.

* Include stdlib.h to find size_t.

* Remove includes of stdbool.h.

* Replace std::pair with TaskQueueEntry.

* Fix redis tests.

* Reinstate tests.
2017-06-19 21:58:42 +00:00
Robert Nishihara
f12db5f0e2 Divide large plasma requests into smaller chunks, and wait longer before reissuing large requests. (#678)
* Divide large get requests into smaller chunks.

* Divide fetches into smaller chunks.

* Wait longer in worker and manager before reissuing fetch requests if there are many outstanding fetch requests.

* Log warning if a handler in the local scheduler or plasma manager takes more than one second.
2017-06-18 04:42:15 +00:00
Robert Nishihara
96962cdee0 Log fatal error if plasma manager or local scheduler heartbeats take too long. (#676)
* Log fatal error if plasma manager or local scheduler take too long to send heartbeat.

* Fix linting.

* Use int64_t for milliseconds since unix epoch.
2017-06-16 19:11:01 +00:00
Robert Nishihara
1916475e14 Increase socket listen backlog from 5 to 128. (#661) 2017-06-11 06:34:16 +00:00
Robert Nishihara
dd7f866a92 Fix compilation error on CentOS. (#622)
* Fix compilation error on CentOS.

* add TODO
2017-06-01 06:51:00 +00:00
Robert Nishihara
4d51ed37b2 Fix bug in which plasma client file descriptors were not closed. (#618)
* Fix bug in which plasma client file descriptors were not closed.

* Add logging statement when disconnecting client from plasma store.

* Fix after rebasing.

* Add more checks to plasma disconnect client.
2017-06-01 05:37:29 +00:00
Philipp Moritz
b94b4a35e0 Make the Plasma store ready for Arrow integration (#579)
* port plasma to arrow

* fixes

* refactor plasma client

* more modernization

* fix plasma manager tests

* everything compiles

* fix plasma client tests

* update plasma serialization tests

* fix plasma manager tests

* fix bug

* updates

* fix bug

* fix tests

* fix rebase

* address comments

* fix travis valgrind build

* fix linting

* fix include order again

* fix linting

* address comments
2017-05-31 16:24:23 -07:00
Richard Shin
16050eca8d Don't link Python extensions to libpython*.so (#598) 2017-05-25 19:01:12 -07:00
Philipp Moritz
3885d1b286 make builds with CMake incremental (#592) 2017-05-24 21:52:33 -07:00
Stephanie Wang
ee08c8274b Shard Redis. (#539)
* Implement sharding in the Ray core

* Single node Python modifications to do sharding

* Do the sharding in redis.cc

* Pipe num_redis_shards through start_ray.py and worker.py.

* Use multiple redis shards in multinode tests.

* first steps for sharding ray.global_state

* Fix problem in multinode docker test.

* fix runtest.py

* fix some tests

* fix redis shard startup

* fix redis sharding

* fix

* fix bug introduced by the map-iterator being consumed

* fix sharding bug

* shard event table

* update number of Redis clients to be 64K

* Fix object table tests by flushing shards in between unit tests

* Fix local scheduler tests

* Documentation

* Register shard locations in the primary shard

* Add plasma unit tests back to build

* lint

* lint and fix build

* Fix

* Address Robert's comments

* Refactor start_ray_processes to start Redis shard

* lint

* Fix global scheduler python tests

* Fix redis module test

* Fix plasma test

* Fix component failure test

* Fix local scheduler test

* Fix runtest.py

* Fix global scheduler test for python3

* Fix task_table_test_and_update bug, from actor task table submission race

* Fix jenkins tests.

* Retry Redis shard connections

* Fix test cases

* Convert database clients to DBClient struct

* Fix race condition when subscribing to db client table

* Remove unused lines, add APITest for sharded Ray

* Fix

* Fix memory leak

* Suppress ReconstructionTests output

* Suppress output for APITestSharded

* Reissue task table add/update commands if initial command does not publish to any subscribers.

* fix

* Fix linting.

* fix tests

* fix linting

* fix python test

* fix linting
2017-05-18 17:40:41 -07:00
Philipp Moritz
08e988aee5 Modernize plasma store (C to C++ changes). (#546) 2017-05-15 01:19:44 -07:00
Philipp Moritz
3a6922276a convert malloc.c to STL (#537)
* convert malloc.c to STL

* linting

* cleanup and comments

* address Richard's comments
2017-05-11 11:18:23 -07:00
Philipp Moritz
3a0e86395e Convert eviction code to STL (#534)
* temp commit

* convert eviction policy to C++

* temp commit

* fix plasma tests

* fix

* linting

* fixes

* fix linting
2017-05-09 21:26:22 -07:00
Stephanie Wang
e50a23b820 Fix bug with reused file descriptors (#471)
* Fix bug with reused file descriptors

* Remove client connection if write_object_chunk fails

* Handle ECONNRESET on unsuccessful write

* lint

* Back to lowercase

* fix compilation

* fix linting
2017-05-02 19:45:27 -07:00
Philipp Moritz
b7ace01b5f Convert Plasma client to STL (#486)
* convert mmap table to STL

* update

* fix

* convert objects_in_use

* fix

* convert release_history

* cleanup

* linting

* update

* fix

* linting
2017-04-25 01:25:40 -07:00
Philipp Moritz
8194b71f32 Convert pending_notifications to STL (#484)
* temp commit

* converted more plasma notifications

* cleanup

* rename

* linting

* fixes

* fixes
2017-04-24 14:41:34 -07:00
Philipp Moritz
892e53d69e Convert plasma client array and object notification queue to STL (#482)
* Convert plasma clients to STL

* use a deque for object notifications in plasma store for perf

* cleanup

* linting

* fix include order
2017-04-24 00:43:48 -07:00
Philipp Moritz
e36de2dad1 Convert object table to STL (#480)
* convert object table to stl

* temp commit

* fix

* comments

* linting
2017-04-23 22:24:05 -07:00
Alexey Tumanov
a67a107e0e Fix int-type compilation problem on redhat. (#472) 2017-04-19 02:43:33 -07:00
Philipp Moritz
8ac6c59931 Remove n^2 algorithm in plasma get (#466)
Remove n^2 algorithm in plasma get.
2017-04-17 23:37:33 -07:00
Philipp Moritz
6ffc849d23 Use Arrow Tensors for serializing numpy arrays and get rid of extra memcpy. (#436)
* Use Arrow Tensors for serializing numpy arrays and get rid of extra memcpy

* fix nondeterminism problem

* mark array as immutable

* make arrays contiguous

* fix serialize_list and deserialize_list

* fix numbuf tests

* linting

* add optimization flags

* fixes

* roll back arrow
2017-04-10 01:37:34 -07:00
Alexey Tumanov
6f9225490b Plasma manager performance: speed up wait with a wait request object map (#427)
* plasma manager perf: speedup wait with a wait request object map

* removing duplicate == operator in plasma store

* fix serialization test

* code cleanup

* minor cleanup

* factoring out uniqueid hash and equality operators into common

* plasma manager: c++ify the WaitRequest struct

* plasma manager: get rid of the initial object request malloc

* cleanup

* linting

* cleanups and fix compiler warnings

* compiler warnings and linting
2017-04-07 12:32:12 -07:00
Stephanie Wang
93679df724 Stopped nodes can rejoin immediately (#428)
* Ignore deleted clients when reading address info from Redis

* Remove self from db_client table when exiting cleanly

* Fix valgrind test

* Do not call plasma_perform_release when disconnecting
2017-04-05 23:50:38 -07:00
Philipp Moritz
4043769ba2 Make putting large objects work. (#411)
* putting large objects

* add more checks

* support large objects

* fix test

* fix linting

* upgrade to latest arrow version

* check malloc return code

* print mmap file sizes

* printing

* revert to dlmalloc

* add prints

* more prints

* add printing

* printing

* fix

* update

* fix

* update

* print

* initialization

* temp

* fix

* update

* fix linting

* comment out object_store_full tests

* fix test

* fix test

* evict objects if dlmalloc fails

* fix stresstests

* Fix linting.

* Uncomment large-memory tests.

* Increase memory for docker image for jenkins tests.

* Reduce large memory tests.

* Further reduce large memory tests.
2017-04-05 01:04:05 -07:00