474 commits

Author SHA1 Message Date
Peter Schafhalter
2c19ae97a3 Implemented db_client_cache as unordered_map (#921)
* Implemented db_client_cache as unordered_map

* Fix for memory leak

* Fixed linting
2017-09-03 17:26:05 -07:00
Stephanie Wang
7496c98010 Fault tolerance race (#894)
* Remove race between local scheduler disconnecting and global scheduler
assigning a task

* Fix number of workers started in component failures test

* Fix race between global scheduler retrying a task assignment and monitor
cleaning up task table. The global scheduler should only retry the task
assignment if the local scheduler is still alive.

* Clean up task_table_update callback if failure

* Look up current local scheduler mapping when retrying actor task submission

* Log warning if no subscribers received a task table update

* Clean up database handle memory in local scheduler
2017-08-30 22:20:50 -07:00
Robert Nishihara
e6de744ef4 Fix potential bug in redis.cc. (#851) 2017-08-23 20:38:25 -07:00
Robert Nishihara
be4beb19c1 Changes to build to fix creation of wheels. (#840)
* Pass DPYTHON_EXECUTABLE into cmake for arrow and for ray.

* Add cython to setup.py install_requires.

* Revert custom code for finding python in cmake.

* Correctly find arrow on CentOS.

* In cmake, don't find PythonLibs, just find PYTHON_INCLUDE_DIRS.

* Fix typo.

* Do not use boost shared libraries when building arrow.

* Add six to the setup.py install_requires because it is needed by pyarrow.

* Don't link numbuf against boost_system and boost_filesystem.

* Compile boost when we are on Linux.

* Make numbuf find the correct boost libraries.

* Only use find_package Boost on Linux, suppress output when building boost.

* Changes to wheel building scripts, install cython in mac script.

* Compile flatbuffers ourselves on Linux and pass it in when compiling Arrow.

* Clean up build_flatbuffers.sh and build_boost.sh scripts a little.

* Install cython when building linux wheel.
2017-08-21 17:49:35 -07:00
Robert Nishihara
ea8da13938 Remove UT data structures from global scheduler. (#838)
* Replace pending_tasks utarray with vector.

* Replace local_schedulers vector with unordered_map.

* Replace object info table with unordered_map.

* Replace local_scheduler_plasma_map and plasma_local_scheduler_map with unordered maps.

* Remove unnecessary includes.

* Fix linting.

* Bug fixes.

* Add function for computing the amount of data for a task that wouldn't have to be shipped because it is already accessible to a local scheduler.

* Small cleanups.
2017-08-16 22:28:21 -07:00
Alexey Tumanov
fc885bd918 Adding basic support for a user-interpretable resource label (#761)
* adding support for the user-interpretable label(UIR)

* more plumbing for num_uirs further upstream; set to infty when specified on cmd line

* pass default num_uirs for actors; update GlobalStateAPI

* support num_uirs in ray.init()

* local scheduler resource accounting: support num_uirs; prep for vectorized resource accounting

* global scheduler test updated

* Fix bug introduced by rebase.

* Rename UIR -> CustomResource and add test.

* Small changes and use constexpr instead of macros.

* Linting and some renaming.

* Reorder some code.

* Remove cpus_in_use and fix bug.

* Add another test and make a small change.

* Rephrase documentation about feature stability.
2017-08-08 02:53:59 -07:00
Robert Nishihara
03f2325780 Package pyarrow along with ray. (#822)
* Rough pass at installing pyarrow along with Ray.

* Remove hardcoded path and try to find correct path automatically.

* Add print.

* Fix linting.

* Copy pyarrow files to a location that we manually add to python path in order to avoid interfering with pre-existing pyarrow installations.

* Move call to build.sh back into build_ext in setup.py.

* Ignore some linting errors.

* Fix problem in which pyarrow files to copy were listed before they were built.

* Fix tests by importing ray before pyarrow.
2017-08-07 21:17:28 -07:00
Robert Nishihara
d7b10a84b6 Fallback to custom serializer for very long python ints. (#821)
* Fallback to custom serializer for very long python ints.

* Fix linting.

* Fix naming convention and add RETURN_NOT_OK.
2017-08-07 17:21:06 -07:00
Robert Nishihara
3071ba0070 Add correct Python executable to Path when building arrow. (#820)
* Tell cmake which python to use when building arrow.

* Pass different path into cmake when building arrow so that cmake finds the right python.

* Add correct python executable to PATH when running cmake for ray.
2017-08-07 14:47:34 -07:00
Philipp Moritz
054ae4180e Fix installation instruction for ubuntu 14.04 (#805)
* fix installation instruction for ubuntu 14.04

* upgrade cmake requirements

* fix
2017-08-02 18:14:14 -07:00
Robert Nishihara
cb84972f6b Recreate actors when local schedulers die. (#804)
* Reconstruct actor state when local schedulers fail.

* Simplify construction of arguments to pass into default_worker.py from local scheduler.

* Remove deprecated ray.actor.

* Simplify actor reconstruction method.

* Fix linting.

* Small fixes.
2017-08-02 18:02:52 -07:00
Robert Nishihara
37282330c0 Allow plasma manager to gracefully handle EPROTOTYPE. (#802)
* Allow plasma manager to gracefully handle EPROTOTYPE.

* Fix linting.
2017-08-01 23:33:25 -07:00
Robert Nishihara
8c8258de20 Move worker methods into Worker class and expose more TaskSpec fields to Python. (#796)
* Move worker methods inside worker class. Move some helper methods from actor.py into utils.py and state.py.

* Add more methods exposing task spec fields to Python.

* Fix linting.

* Fix error.

* Remove unused code in default worker.
2017-08-01 17:16:57 -07:00
Philipp Moritz
c3b39b4d86 Pull Plasma from Apache Arrow and remove Plasma store from Ray. (#692)
* Rebase Ray on top of Plasma in Apache Arrow

* add thirdparty building scripts

* use rebased arrow

* fix

* fix build

* fix python visibility

* comment out C tests for now

* fix multithreading

* fix

* reduce logging

* fix plasma manager multithreading

* make sure old and new object IDs can coexist peacefully

* more rebasing

* update

* fixes

* fix

* install pyarrow

* install cython

* fix

* install newer cmake

* fix

* rebase on top of latest arrow

* getting runtest.py run locally (needed to comment out a test for that to work)

* work on plasma tests

* more fixes

* fix local scheduler tests

* fix global scheduler test

* more fixes

* fix python 3 bytes vs string

* fix manager tests valgrind

* fix documentation building

* fix linting

* fix c++ linting

* fix linting

* add tests back in

* Install without sudo.

* Set PKG_CONFIG_PATH in build.sh so that Ray can find plasma.

* Install pkg-config

* Link -lpthread, note that find_package(Threads) doesn't seem to work reliably.

* Comment in testGPUIDs in runtest.py.

* Set PKG_CONFIG_PATH when building pyarrow.

* Pull apache/arrow and not pcmoritz/arrow.

* Fix installation in docker image.

* adapt to changes of the plasma api

* Fix installation of pyarrow module.

* Fix linting.

* Use correct python executable to build pyarrow.
2017-07-31 21:04:15 -07:00
Robert Nishihara
8ad9ced99b Fix task ID hash computation. (#774) 2017-07-26 10:08:38 -07:00
Yeolar
31329d43dd fixtypo: plasma_protocol (#764)
Fix typo in plasma_protocol.
2017-07-22 17:52:27 -07:00
Robert Nishihara
e0867c8845 Switch Python indentation from 2 spaces to 4 spaces. (#726)
* 4 space indentation for actor.py.

* 4 space indentation for worker.py.

* 4 space indentation for more files.

* 4 space indentation for some test files.

* Check indentation in Travis.

* 4 space indentation for some rl files.

* Fix failure test.

* Fix multi_node_test.

* 4 space indentation for more files.

* 4 space indentation for remaining files.

* Fixes.
2017-07-13 21:53:57 +00:00
alanamarzoev
8464d77c76 Change event logs to store one Redis ZSET per worker. (#705)
* Changing to zset

* Fixed bug.

* Fixed another bug.

* Modified task_profiles.

* Removed extra file.

* Modified task_profiles test.

* WIP

* WIP

* Undid changes

* Updated

* WIP

* Made changes according to comments.

* Removed unneeded print.

* Removed ujson usage.

* failing test

* tests passing

* Fixed linting errors and modified style.

* Fixed bug.

* Fixed linting

* Fixed according to comments.

* Redis crashing?

* Fixed linting

* Fixed linting
2017-07-09 01:42:29 +02:00
Robert Nishihara
6c45657280 Reset the SIGCHLD handler after forking a worker to avoid influencing the worker. (#713) 2017-07-07 14:50:37 +00:00
Robert Nishihara
1941e0f7b1 Fix compilation on CentOS. (#699) 2017-06-26 05:54:21 +00:00
Robert Nishihara
0926550661 Remove -mtune and -march compiler flags. (#697) 2017-06-26 05:52:45 +00:00
Robert Nishihara
ad480f8165 Don't reconstruct all objects in every fetch request in local scheduler. (#686)
* Don't reconstruct all objects in every fetch request in local scheduler.

* Separate out fetch timer and reconstruction timer.

* Fix bug.

* Bug fix.

* Fix naming convention for global variables.

* Address comments.

* Make reconstruct_counter a static variable.

* Fix linting.

* Redo reconstruct handler using a set of objects to fetch.

* Fix linting.

* Replace set with vector.
2017-06-23 21:08:02 +00:00
Robert Nishihara
5ebc2f3f2e Do resource bookkeeping for actor methods. (#682)
* Dispatch regular and actor tasks when resources become available.

* Make actor methods do resource bookkeeping and add test.

* Remove unnecessary field.

* Fix linting.

* Fix actor test.

* Maintain set of actors with pending tasks to speed up task dispatch.

* Exit early from task dispatch if there are no resources available.

* Fix linting.

* Fix error.

* Fix bug related to iterator invalidation.

* When an actor is removed, remove it from the set of actors with pending tasks.
2017-06-21 05:52:45 +00:00
Robert Nishihara
3052ce25a6 Divide up large fetch requests from local scheduler, also print warni… (#683)
* Divide up large fetch requests from local scheduler, also print warning if fetch handler is slow.

* Fix linting.

* Fix typo.
2017-06-19 22:57:51 +00:00
Robert Nishihara
9e4a3e4972 Replace some UT data structures in local scheduler with C++ STL. (#680)
* Replace a local scheduler ut_array with a std::vector.

* Replace vector of sizes in local scheduler with std::pair.

* Remove utarray include.

* Replace utarray with std::vector for reading local scheduler input messages.

* Remove more UT data structures.

* Remove UT includes.

* Fix linting.

* Include stdlib.h to find size_t.

* Remove includes of stdbool.h.

* Replace std::pair with TaskQueueEntry.

* Fix redis tests.

* Reinstate tests.
2017-06-19 21:58:42 +00:00
Robert Nishihara
f12db5f0e2 Divide large plasma requests into smaller chunks, and wait longer before reissuing large requests. (#678)
* Divide large get requests into smaller chunks.

* Divide fetches into smaller chunks.

* Wait longer in worker and manager before reissuing fetch requests if there are many outstanding fetch requests.

* Log warning if a handler in the local scheduler or plasma manager takes more than one second.
2017-06-18 04:42:15 +00:00
alanamarzoev
4d5ac9dad5 Include object size and hash in the table returned by the object_table function in the GlobalStateAPI. (#665)
* added log_table function and a test

* fixed log_files and added task_profiles

* fixed formatting

* fixed linting errors

* fixes

* removed file

* more fixes

* hopefully fixed

* Small changes.

* Fix linting.

* Fix bug in log monitor.

* Small changes.

* Fix bug in travis.

* Including data_size and hash in the ResultTableReply.

* Included data_size and hash info in object_table.

* Fixed bugs in ray_redis_module.cc.

* Removing commented out code.

* Fixes

* Freed hash and data_size strings after using, and checked if they're null along with task_id and is_put.

* Changed it so that data_size is set correctly.

* Removed iostream import.

* Included a check to ensure that the Redis string to long long conversion was successful.

* Included separate data_size and hash null checks.

* Fixed bug.

* Made linting changes.

* Another linting error.

* Slight simplification.
2017-06-16 23:17:11 -07:00
Robert Nishihara
96962cdee0 Log fatal error if plasma manager or local scheduler heartbeats take too long. (#676)
* Log fatal error if plasma manager or local scheduler take too long to send heartbeat.

* Fix linting.

* Use int64_t for milliseconds since unix epoch.
2017-06-16 19:11:01 +00:00
Philipp Moritz
c343df832e use multiple threads for memcpy (#669) 2017-06-14 19:14:24 -07:00
Philipp Moritz
54925996ca Allow remote functions to specify max executions and kill worker once limit is reached. (#660)
* implement restarting workers after certain number of task executions

* Clean up python code.

* Don't start new worker when an actor disconnects.

* Move wait_for_pid_to_exit to test_utils.py.

* Add test.

* Fix linting errors.

* Fix linting.

* Fix typo.
2017-06-13 00:34:58 -07:00
Robert Nishihara
1916475e14 Increase socket listen backlog from 5 to 128. (#661) 2017-06-11 06:34:16 +00:00
Eric Liang
d4d2c03ac5 Remove timeout for Redis commands. (#649)
* update

* Remove interaction between callback data identifier and event loop.

* Remove tests that no longer apply.
2017-06-09 15:55:36 -07:00
Philipp Moritz
0254efa5e8 Use parallel memcopy from arrow (#633)
* use parallel memcopy from arrow

* fix linting

* remove memory.h
2017-06-02 18:18:41 -07:00
Robert Nishihara
a4d8e13094 Suppress excess warning messages related to intentional actor deaths. (#627)
* Don't submit the actor destructor tasks when the job is exiting.

* Don't propagate error messages to the driver when an actor exits intentionally.
2017-06-01 20:10:40 +00:00
Robert Nishihara
dd7f866a92 Fix compilation error on CentOS. (#622)
* Fix compilation error on CentOS.

* add TODO
2017-06-01 06:51:00 +00:00
Robert Nishihara
5f193afb87 Tell local scheduler to ignore SIGCHLD so that workers don't become zombies. (#620) 2017-06-01 06:37:28 +00:00
Robert Nishihara
4d51ed37b2 Fix bug in which plasma client file descriptors were not closed. (#618)
* Fix bug in which plasma client file descriptors were not closed.

* Add logging statement when disconnecting client from plasma store.

* Fix after rebasing.

* Add more checks to plasma disconnect client.
2017-06-01 05:37:29 +00:00
Philipp Moritz
b94b4a35e0 Make the Plasma store ready for Arrow integration (#579)
* port plasma to arrow

* fixes

* refactor plasma client

* more modernization

* fix plasma manager tests

* everything compiles

* fix plasma client tests

* update plasma serialization tests

* fix plasma manager tests

* fix bug

* updates

* fix bug

* fix tests

* fix rebase

* address comments

* fix travis valgrind build

* fix linting

* fix include order again

* fix linting

* address comments
2017-05-31 16:24:23 -07:00
Richard Shin
16050eca8d Don't link Python extensions to libpython*.so (#598) 2017-05-25 19:01:12 -07:00
Philipp Moritz
3885d1b286 make builds with CMake incremental (#592) 2017-05-24 21:52:33 -07:00
Stephanie Wang
ee08c8274b Shard Redis. (#539)
* Implement sharding in the Ray core

* Single node Python modifications to do sharding

* Do the sharding in redis.cc

* Pipe num_redis_shards through start_ray.py and worker.py.

* Use multiple redis shards in multinode tests.

* first steps for sharding ray.global_state

* Fix problem in multinode docker test.

* fix runtest.py

* fix some tests

* fix redis shard startup

* fix redis sharding

* fix

* fix bug introduced by the map-iterator being consumed

* fix sharding bug

* shard event table

* update number of Redis clients to be 64K

* Fix object table tests by flushing shards in between unit tests

* Fix local scheduler tests

* Documentation

* Register shard locations in the primary shard

* Add plasma unit tests back to build

* lint

* lint and fix build

* Fix

* Address Robert's comments

* Refactor start_ray_processes to start Redis shard

* lint

* Fix global scheduler python tests

* Fix redis module test

* Fix plasma test

* Fix component failure test

* Fix local scheduler test

* Fix runtest.py

* Fix global scheduler test for python3

* Fix task_table_test_and_update bug, from actor task table submission race

* Fix jenkins tests.

* Retry Redis shard connections

* Fix test cases

* Convert database clients to DBClient struct

* Fix race condition when subscribing to db client table

* Remove unused lines, add APITest for sharded Ray

* Fix

* Fix memory leak

* Suppress ReconstructionTests output

* Suppress output for APITestSharded

* Reissue task table add/update commands if initial command does not publish to any subscribers.

* fix

* Fix linting.

* fix tests

* fix linting

* fix python test

* fix linting
2017-05-18 17:40:41 -07:00
Robert Nishihara
9018dffd7f Fix bug in actor task dispatch. (#552)
* Fix bug in actor task dispatch.

* Return early from dispatch_actor_task if creation notification has not arrived. Also fix comment.
2017-05-15 23:47:15 -07:00
Philipp Moritz
08e988aee5 Modernize plasma store (C to C++ changes). (#546) 2017-05-15 01:19:44 -07:00
Eric Liang
e2e9e4ce6f Fix segmentation fault when calling ray.put on a dictionary with object keys (#548)
* fix segfault when serializing dict key

* fix style

* fix test

* Fix linting.
2017-05-15 01:09:13 -07:00
Philipp Moritz
3a6922276a convert malloc.c to STL (#537)
* convert malloc.c to STL

* linting

* cleanup and comments

* address Richard's comments
2017-05-11 11:18:23 -07:00
Philipp Moritz
c1e9496a06 fix problem if old version of arrow is cloned (#538) 2017-05-10 12:16:07 -07:00
Philipp Moritz
3a0e86395e Convert eviction code to STL (#534)
* temp commit

* convert eviction policy to C++

* temp commit

* fix plasma tests

* fix

* linting

* fixes

* fix linting
2017-05-09 21:26:22 -07:00
Philipp Moritz
118fac5619 Remove boost dependencies from Ray (#518)
* remove boost regex

* workaround for boost

* fix

* do not link against boost any more

* rebased on arrow change
2017-05-09 16:17:20 -07:00
Philipp Moritz
e5e2aab5e4 upgrade arrow and fix bug (#530)
* upgrade arrow and fix bug

* fixes suggested by Wes
2017-05-09 13:58:42 -07:00
Philipp Moritz
0681107039 add serializing numpy boolean (#529) 2017-05-08 22:24:02 -07:00