1962 commits

Author SHA1 Message Date
Philipp Moritz
4157bcb80b Improve deserialization performance by rebasing on latest arrow (#1129)
* improve serialization performance by rebasing on latest arrow

* update

* revert worker.py
2017-10-17 14:56:11 -07:00
Robert Nishihara
f3e3c7ec71 Add is_actor_checkpoint_method to TaskSpec. (#1117)
* Add is_actor_checkpoint_method to TaskSpec.

* Fix linting.

* Fix rebase error.

* Fix errors from rebase.
2017-10-15 16:52:10 -07:00
Robert Nishihara
d6062ef8f6 Compile with -rdynamic for better debugging symbols. (#1123)
* Compile with -rdynamic.

* Only use -rdynamic on Linux.

* Add comment.
2017-10-13 21:39:11 -07:00
Stephanie Wang
15486a14a0 Refactor actor task queues (#1118)
* Refactor add_task_to_actor_queue into queue_actor_task and insert_actor_task_queue

* Refactor actor task queue to share the waiting task queue

* Fix
2017-10-13 20:52:11 -07:00
Robert Nishihara
486cb64e3f Compile with -Werror and -Wall (#1116)
* Compile global scheduler with -Werror -Wall.

* Compile plasma manager with -Werror -Wall.

* Compile local scheduler with -Werror -Wall.

* Compile common code with -Werror -Wall.

* Signed/unsigned comparisons.

* More signed/unsigned fixes.

* More signed/unsigned fixes and added extern keyword.

* Fix linting.

* Don't check strict-aliasing because Python.h doesn't pass.
2017-10-12 21:00:23 -07:00
Stephanie Wang
3764f2f2e1 Actor checkpointing with object lineage reconstruction (#1004)
* Worker reports error in previous task, actor task counter is incremented after task is successful

* Refactor actor task execution

- Return new task counter in GetTaskRequest
- Update worker state for actor tasks inside of the actor method
  executor

* Manually invoked checkpoint method

* Scheduling for actor checkpoint methods

* Fix python bugs in checkpointing

* Return task success from worker to local scheduler instead of actor counter

* Kill local schedulers halfway through actor execution instead of waiting for all tasks to execute once

* Remove redundant actor tasks during dispatch, reconstruct missing dependencies for actor tasks

* Make executor for temporary actor methods

* doc

* Set default argument for whether the previous task was a success

* Refactor actor method call

* Simplify checkpoint task submission

* lint

* fix philipp's comments

* Add missing line

* Make actor reconstruction tests run faster

* Unimportant whitespace.

* Unimportant whitespace.

* Update checkpoint method signature

* Documentation and handle exceptions during checkpoint save/resume

* Rename get_task message field to actor_checkpoint_failed

* Fix bug.

* Remove debugging check, redirect test output
2017-10-12 09:53:32 -07:00
Robert Nishihara
b585001881 When a task is passed to the global scheduler, if it is not received,… (#1106)
* When a task is passed to the global scheduler, if it is not received, then try again.
* Call give_task_to_global_scheduler directly (same with local).
2017-10-12 00:04:38 -07:00
Robert Nishihara
9f1e385335 Return errno from handle_sigpipe. (#1051) 2017-10-11 18:36:28 -07:00
Peter Schafhalter
46f6c163dc Converted ClientConnection to C++ standard library (#1099) 2017-10-11 11:12:15 -07:00
Stephanie Wang
1e0ab3d386 Switch to monotonic clock (#1100) 2017-10-10 22:35:21 -07:00
Philipp Moritz
0684258d2e Update arrow to include pandas serialization (#1102)
* update arrow to include pandas serialization

* update
2017-10-10 22:16:35 -07:00
Robert Nishihara
8f1a73f041 Allow Ray to be built without UI by setting INCLUDE_UI=0. (#1094)
* Allow building Ray without UI by setting INCLUDE_UI=0.

* Fix bash.

* Fix linting.
2017-10-09 23:32:38 -07:00
Stephanie Wang
aebe9f9374 Fix actor garbage collection by breaking cyclic references (#1064)
* Fix bug in wait_for_pid_to_exit, add test for actor deletion.

* Fix actor garbage collection by breaking cyclic references

* Add test for calling actor method immediately after actor creation.

* Fix bug, must dispatch tasks when workers are killed.

* Fix python test

* Fix cyclic reference problem by creating ActorMethod objects on the fly.

* Try simply increasing the time allowed for many_drivers_test.py.
2017-10-05 00:55:33 -07:00
Mitar
a0d3fb1de1 Fix Arrow's repository URL. (#1072)
Thanks!
2017-10-03 21:40:21 -07:00
Robert Nishihara
0dcf36c91e Switch Arrow commit. (#1068) 2017-10-03 13:56:53 -07:00
Philipp Moritz
57bd1d6ff5 Specialize Serialization for OrderedDict (#1035)
Specialize Serialization for OrderedDict and defaultdict
2017-10-02 17:33:10 -07:00
Robert Nishihara
1488975d1b Add timing statement to loop that calls redis_get_cached_db_client be… (#1045)
* Add timing statement to loop that calls redis_get_cached_db_client because it has been slow in the past.

* Fix linting.

* Refactoring to make manager vectors into std::vector.

* Fix linting.

* Fixes.
2017-10-02 10:46:21 -07:00
Robert Nishihara
a31d138f21 Don't log when a worker can't be started. (#1056) 2017-10-02 10:32:46 -07:00
Philipp Moritz
79e013e876 upgrade to latest arrow to fix XCode 9 problem (#1042) 2017-09-30 16:24:59 -07:00
Robert Nishihara
ce278aa06a Fix valgrind tests. (#1037)
* Comment out local scheduler valgrind test.

* Fix free/delete error.

* More free -> delete errors

* One more free -> delete and also clean up callback state in plasma manager.

* Add set -x to run_valgrind scripts.

* Fix valgrind error in CreateLocalSchedulerInfoMessage.
2017-09-30 00:11:09 -07:00
Eric Liang
ba153adc4c Downgrade severity of most common messages (#1039)
* downgrade severity of most common messages

* update
2017-09-30 00:01:49 -07:00
Eric Liang
b118cef49e [webui] Allow timeline scroll-to-zoom without holding ALT (#993)
* Allow timeline scroll-to-zoom without holding ALT

* Update build_ui.sh

* Update build_ui.sh

* Update build_ui.sh

* Update build_ui.sh

* Retry when getting catapult.
2017-09-29 21:35:12 -07:00
Peter Schafhalter
10027974b1 Replaced ObjectWaitRequests with unordered map (#990)
* Replaced ObjectWaitRequests with unordered map

* Pass C++ STL object by reference

* Formatting changes and typos.
2017-09-28 15:29:26 -07:00
Zongheng Yang
427dee511b Fill out specs of the task table in ray_redis_module.cc. (#1024)
* Fill out specs of the task table in ray_redis_module.cc.

* local scheduler field in task table

* linting
2017-09-27 23:45:58 -07:00
Peter Schafhalter
bb76d4ca0a PlasmaRequestBuffer data structure updates (#1023)
* Replaced utstring with std::string

* Converted transfer_queue to a list

* Converted pending_object_transfers to unordered_map

* Fix free/delete bug and small modifications.
2017-09-27 19:50:37 -07:00
Robert Nishihara
116fe168b5 Download boost 1.65.1 from bintray. (#1019)
* Download boost 1.65.1 from bintray.

* Pass --no-check-certificate to wget.
2017-09-27 13:25:05 -07:00
Zongheng Yang
5a50e80b63 Make Monitor remove dead Redis entries from exiting drivers. (#994)
* WIP: removing OL, OI, TT on client exit; no saving yet.

* ray_redis_module.cc: update header comment.

* Cleanup: just the removal.

* Reformat via yapf: use pep8 style instead of google.

* Checkpoint addressing comments (partially)

* Add 'b' marker before strings (py3 compat)

* Add MonitorTest.

* Use `isort` to sort imports.

* Remove some logging

* Fix flake8 noqa marker runtest.py

* Try to separate tests out to monitor_test.py

* Rework cleanup algorithm: correct logic

* Extend tests to cover multi-shard cases

* Add some small comments and formatting changes.
2017-09-26 00:11:38 -07:00
Peter Schafhalter
6e9657e696 Replaced utstring with std::string (#1009) 2017-09-24 22:42:17 -07:00
Peter Schafhalter
241612709e Data structure updates to plasma manager (#937)
* Implemented local_available_objects as an unordered set

* Implemented fetch_requests as an unordered map

* Fixed bug and changed fetch_requests from pointer to object

* free(PlasmaManagerState *) -> delete PlasmaManagerState *

* removed unnecessary newline

* Make local_available_objects not a pointer.

* Attempt to safely iterate over unordered_map and remove elements.
2017-09-15 20:09:29 -07:00
Robert Nishihara
413140df38 Autogenerate catapult files if they are not already present. (#978)
* Autogenerate catapult files if they are not already present.

* Fix bash syntax.
2017-09-14 12:37:33 -07:00
Stephanie Wang
74ac80631b Local scheduler sends a null heartbeat to global scheduler (#962)
* Local scheduler sends a null heartbeat to global scheduler to notify death

* Add whitespace.

* Speed up component failures test

* Free local scheduler state upon plasma manager disconnection
2017-09-12 10:45:21 -07:00
Stephanie Wang
99c8b1f38c Actor fault tolerance using object lineage reconstruction (#902)
* Revert Python actor reconstruction

* Actor reconstruction using object lineage

* Add dummy arguments and return values for actor tasks

* Pin dummy outputs for actor tasks

* Skip checkpointing test for now

* TODOs

* minor edits

* Generate dummy object dependencies in Python, not C

* Fix linting.

* Move actor counter and dummy objects inside of the actor handle

* Refactor Worker._process_task, suppress exception propagation for sequential actor tasks
2017-09-10 19:29:28 -07:00
Robert Nishihara
f3c1248d98 Clone catapult and generate html files during installation. (#956)
* Clone catapult and generate static html during setup.

* Include UI files in installation.

* Fix directory to clone catapult to and fix linting.

* Use absolute path.

* Make sure we find a sufficiently new version of python2 when building wheels.

* Copy the trace_viewer_full.html file to the local directory if it is not present.

* Make sure wheels fail to build if UI is not included.
2017-09-10 13:41:16 -07:00
Philipp Moritz
546ba23ceb Upgrade to latest arrow to include set serialization speedups (#957)
* update arrow to pull in the set serialization speedups

* remove _register_class for set
2017-09-10 00:12:17 -07:00
Peter Schafhalter
8906a920f7 Implemented wait_requests as vector (#943) 2017-09-08 13:39:54 -07:00
Philipp Moritz
7030ef366f Rebase Ray on latest arrow (remove numbuf from Ray). (#910)
* remove some stuff

* put get roundtrip working

* fixes

* more fixes

* cleanup

* fix tests

* latest arrow

* fixes

* fix tests

* fix linting

* rebase

* fixes

* fix bug

* bring back libgcc error

* fix linting

* use official arrow repo

* fixes
2017-09-04 22:58:49 -07:00
Robert Nishihara
d8010723d7 Attempt to wget boost up to 20 times during installation. (#927) 2017-09-04 14:42:29 -07:00
Stephanie Wang
ae0212b399 Fix failing task table test (#924) 2017-09-03 22:41:38 -07:00
Peter Schafhalter
2c19ae97a3 Implemented db_client_cache as unordered_map (#921)
* Implemented db_client_cache as unordered_map

* Fix for memory leak

* Fixed linting
2017-09-03 17:26:05 -07:00
Stephanie Wang
7496c98010 Fault tolerance race (#894)
* Remove race between local scheduler disconnecting and global scheduler assigning a task

* Fix number of workers started in component failures test

* Fix race between global scheduler retrying a task assignment and monitor cleaning up task table. The global scheduler should only retry the task assignment if the local scheduler is still alive.

* Clean up task_table_update callback if failure

* Look up current local scheduler mapping when retrying actor task submission

* Log warning if no subscribers received a task table update

* Clean up database handle memory in local scheduler
2017-08-30 22:20:50 -07:00
Robert Nishihara
e6de744ef4 Fix potential bug in redis.cc. (#851) 2017-08-23 20:38:25 -07:00
Robert Nishihara
be4beb19c1 Changes to build to fix creation of wheels. (#840)
* Pass DPYTHON_EXECUTABLE into cmake for arrow and for ray.

* Add cython to setup.py install_requires.

* Revert custom code for finding python in cmake.

* Correctly find arrow on CentOS.

* In cmake, don't find PythonLibs, just find PYTHON_INCLUDE_DIRS.

* Fix typo.

* Do not use boost shared libraries when building arrow.

* Add six to the setup.py install_requires because it is needed by pyarrow.

* Don't link numbuf against boost_system and boost_filesystem.

* Compile boost when we are on Linux.

* Make numbuf find the correct boost libraries.

* Only use find_package Boost on Linux, suppress output when building boost.

* Changes to wheel building scripts, install cython in mac script.

* Compile flatbuffers ourselves on Linux and pass it in when compiling Arrow.

* Clean up build_flatbuffers.sh and build_boost.sh scripts a little.

* Install cython when building linux wheel.
2017-08-21 17:49:35 -07:00
Robert Nishihara
ea8da13938 Remove UT data structures from global scheduler. (#838)
* Replace pending_tasks utarray with vector.

* Replace local_schedulers vector with unordered_map.

* Replace object info table with unordered_map.

* Replace local_scheduler_plasma_map and plasma_local_scheduler_map with unordered maps.

* Remove unnecessary includes.

* Fix linting.

* Bug fixes.

* Add function for computing the amount of data for a task that wouldn't have to be shipped because it is already accessible to a local scheduler.

* Small cleanups.
2017-08-16 22:28:21 -07:00
Alexey Tumanov
fc885bd918 Adding basic support for a user-interpretable resource label (#761)
* adding support for the user-interpretable label(UIR)

* more plumbing for num_uirs further upstream; set to infty when specified on cmd line

* pass default num_uirs for actors; update GlobalStateAPI

* support num_uirs in ray.init()

* local scheduler resource accounting: support num_uirs; prep for vectorized resource accounting

* global scheduler test updated

* Fix bug introduced by rebase.

* Rename UIR -> CustomResource and add test.

* Small changes and use constexpr instead of macros.

* Linting and some renaming.

* Reorder some code.

* Remove cpus_in_use and fix bug.

* Add another test and make a small change.

* Rephrase documentation about feature stability.
2017-08-08 02:53:59 -07:00
Robert Nishihara
03f2325780 Package pyarrow along with ray. (#822)
* Rough pass at installing pyarrow along with Ray.

* Remove hardcoded path and try to find correct path automatically.

* Add print.

* Fix linting.

* Copy pyarrow files to a location that we manually add to python path in order to avoid interfering with pre-existing pyarrow installations.

* Move call to build.sh back into build_ext in setup.py.

* Ignore some linting errors.

* Fix problem in which pyarrow files to copy were listed before they were built.

* Fix tests by importing ray before pyarrow.
2017-08-07 21:17:28 -07:00
Robert Nishihara
d7b10a84b6 Fallback to custom serializer for very long python ints. (#821)
* Fallback to custom serializer for very long python ints.

* Fix linting.

* Fix naming convention and add RETURN_NOT_OK.
2017-08-07 17:21:06 -07:00
Robert Nishihara
3071ba0070 Add correct Python executable to Path when building arrow. (#820)
* Tell cmake which python to use when building arrow.

* Pass different path into cmake when building arrow so that cmake finds the right python.

* Add correct python executable to PATH when running cmake for ray.
2017-08-07 14:47:34 -07:00
Philipp Moritz
054ae4180e Fix installation instruction for ubuntu 14.04 (#805)
* fix installation instruction for ubuntu 14.04

* upgrade cmake requirements

* fix
2017-08-02 18:14:14 -07:00
Robert Nishihara
cb84972f6b Recreate actors when local schedulers die. (#804)
* Reconstruct actor state when local schedulers fail.

* Simplify construction of arguments to pass into default_worker.py from local scheduler.

* Remove deprecated ray.actor.

* Simplify actor reconstruction method.

* Fix linting.

* Small fixes.
2017-08-02 18:02:52 -07:00
Robert Nishihara
37282330c0 Allow plasma manager to gracefully handle EPROTOTYPE. (#802)
* Allow plasma manager to gracefully handle EPROTOTYPE.

* Fix linting.
2017-08-01 23:33:25 -07:00