1638 commits

Author SHA1 Message Date
Robert Nishihara
9b9ff19dd0 Use automatic memory management in Redis modules. (#1797) 2018-05-22 01:05:09 -07:00
eric-jj
eb078766d8 Performance fix (#2110) 2018-05-20 18:07:55 -07:00
Kunal Gosar
eba73449cc fix unused lambda capture (#2102) 2018-05-19 13:27:10 -07:00
Melih Elibol
f1da721522 [xray] Use pubsub instead of timeout for ObjectManager Pull. (#2079)
Use pubsub instead of timeout for Pull.
2018-05-18 21:35:12 -07:00
Yujie Liu
5918776dd4 [JavaWorker] Changes to the build system for support java worker (#2092)
* Changes to the build system for support java worker
--------------------------
This commit includes changes to the build system, which is part of the java worker support of Ray.
It consists of the following changes:
 - changes to the CMakeLists.txt files
 - changes to the Python setup.py and init files to adapt to the new build system
 - relocation of local_scheduler_extension.cc to adapt to the new build system, which may better support multi-language workers

* minor whitespace

* Linting
2018-05-18 19:09:23 -07:00
Stephanie Wang
71e5cca59f [xray] Fix bug in updating actor execution dependencies (#2064)
* [xray] FIX: bugs in actor execution

* comments

* Stronger check
2018-05-18 12:45:17 -07:00
Melih Elibol
25e7aa1e79 [xray] Better error messaging when pulling from self. (#2068)
* complain more loudly when object pulls from self.

* Add checks for node manager, and internal checks for object manager.

* linting
2018-05-18 10:26:47 -07:00
Robert Nishihara
15b72f9893 Fix compilation error for RAY_USE_NEW_GCS with latest clang. (#2086) 2018-05-17 23:10:02 -07:00
Melih Elibol
3c245f66d4 [xray] Corrects Error Handling During Push and Pull. (#2059)
* Makes bad status during Pull non-fatal.
Makes a bad status during Push fatal.

* pretty logs

* Stephanie's feedback.
2018-05-17 17:51:55 -07:00
Stephanie Wang
6ca122f723 [xray] Sophisticated task dependency management (#2035) 2018-05-17 17:18:30 -07:00
Stephanie Wang
796864d887 [xray] Lineage cache only requests notifications about remote parent tasks (#2066)
* Only request notifications about a parent task that is remote

* Fix typo

* Fix lineage cache test
2018-05-17 13:01:40 -07:00
Stephanie Wang
88fa98e851 [xray] Fix GCS table prefixes (#2065)
* Fix GCS table prefixes

* More explicit documentation
2018-05-16 13:15:03 -07:00
Stephanie Wang
ad48e47120 Don't crash on duplicate actor notifications (#2043) 2018-05-14 14:26:37 -07:00
Melih Elibol
3ac0c08daa use jobid_nil (#2044) 2018-05-13 14:22:09 -07:00
eric-jj
71997a481b Improve shared_ptr usage (#2030)
[xray] Improve shared_ptr usage
2018-05-11 20:05:04 -07:00
Stephanie Wang
a292d7ba32 [xray] Fix UniqueID hashing for object and task IDs. (#2017)
* Skip object prefix in UniqueIDHasher, choose shard based on hash

* lint
2018-05-10 21:56:12 -07:00
alonamid
32fa862408 add pthread linking (#1986) 2018-05-02 21:50:29 -07:00
eric-jj
34bc6ce6ea remove UniqueIDHasher (#1957)
* remove UniqueIDHasher

* Format the change

* remove unused line

* Fix format

* fix lint error

* fix linting whitespace
2018-04-30 06:31:23 -07:00
Philipp Moritz
af88fdefcf Incorporate C++ Buffer management and Seal global threadpool fix from arrow (#1950) 2018-04-25 22:53:44 -07:00
Philipp Moritz
dad465a2bf [XRay] Add consistency check for protocol between node_manager and local_scheduler_client (#1944) 2018-04-23 23:51:25 -07:00
Melih Elibol
8264e64b18 Handle interrupts correctly for ASIO synchronous reads and writes. (#1929)
* handle interrupts correctly.

* linting

* handle interrupts on read_some/write_some.
2018-04-20 22:55:40 -07:00
Robert Nishihara
cffda73da1 Allow task_table_update to fail when tasks are finished. (#1927)
* Allow task_table_update to fail when tasks are finished.

* Add comment.
2018-04-20 11:34:29 -07:00
Stephanie Wang
aa07f1ce4e [xray] Workers blocked in a ray.get release their resources (#1920)
* [xray] Throttle task dispatch by required resources
* Pass in number of initial workers into raylet command
* Workers blocked in a ray.get release resources
2018-04-18 20:59:58 -07:00
Alexey Tumanov
1c965fcfeb Raylet task dispatch and throttling worker startup (#1912)
* separate task placement and task dispatch; throttle task dispatch with locally available resources

* keep track of workers being started/in flight and suppress starting extraneous workers

* cleanup comments

* remove early termination in task dispatch to support zero-resource actor tasks

* info -> debug

* add documentation

* linting

* mock the worker pool for testing

* some linting

* kill all workers in flight; clear the worker pool in dtor

* remove fixed todo

* lint
2018-04-18 10:58:11 -07:00
Eric Liang
7ab890f4a1 [tune] [rllib] Automatically determine RLlib resources and add queueing mechanism for autoscaling (#1848) 2018-04-16 16:58:15 -07:00
Stephanie Wang
2e25972d4d Preemptively push local arguments for actor tasks (#1901) 2018-04-16 16:26:59 -07:00
Melih Elibol
ddfc875149 Multithreading refactor for ObjectManager. (#1911)
* removes transfer service. adds separate pool for sends and receives.

* get rid of send/receive transfer counts.

* update comment.

* remove clang formatting.

* clang formatting.
2018-04-16 15:51:53 -07:00
Melih Elibol
cff37765b1 Addresses missed comments from multichunk object transfer PR. (#1908)
* Move object manager parameters to ray config,
object manager config bug fix.
addresses other comments from #1827.

* linting and uint?

* typos

* remove uint.
2018-04-15 21:35:51 -07:00
Robert Nishihara
6ca2c2a609 Allow numpy arrays to be passed by value into tasks (and inlined in the task spec). (#1816)
* Allow numpy arrays and larger objects to be passed by value in task specifications.

* Fix bug.

* Fix bug. Inline all bug numpy object arrays.

* Increase size limit for inlining args in task spec.

* Give numpy init different signatures in Python 2 and Python 3.

* Simplify code.

* Fix test.

* Use import_array1 instead of import_array.
2018-04-15 20:36:01 -07:00
Stephanie Wang
6bd944ae0d [xray] Lineage cache requests notifications from the GCS about remote tasks (#1834)
* Add PubsubInterface to GCS tables

* Add task table PubsubInterface to lineage cache and tests

* Request notifications for remote tasks in the lineage cache

* Add RegisterGCS method to node manager

* Fix NodeManager member initialization order, subscribe to task table notifications

* Comments

* Use returned statuses.

* Fix double commit bug in lineage cache

* lint

* More linting.

* Fix pure virtual method declarations
2018-04-15 20:16:55 -07:00
Robert Nishihara
3383553dc0 Remove unnecessary calls to .hex() for object IDs. (#1910) 2018-04-15 13:52:51 -07:00
Stephanie Wang
4b655b0ff6 [xray] Turn on flushing to the GCS for the lineage cache (#1907) 2018-04-14 23:40:56 -07:00
Melih Elibol
fcd30444a8 Single Big Object Parallel Transfer. (#1827)
* cache all object info from object added store notification.

* Adds parallel transfer for big objects.

* documentation and clean up.

* compare objects...

* merge buffer_state with chunk vec. Make separate buffer state for get and create.

* use references for Get. Allow partial failure of Create.

* single plasma client.

* changes based on review.

* update documentation and add parameters for object manager in main.cc.

* review feedback.

* use vector constructor.

* linting

* remove profile visualizations.

* test fixes.

* linting.

* kill specific pids and use less memory.

* linting.

* simplify tests.

* Asynchronous IO for ObjectManager messages and object transfer.

* Revert "Asynchronous IO for ObjectManager messages and object transfer."

This reverts commit 4af43b159babc04daf80d1543e27c2cb46b7b19d.

* update test configuration to reflect changes in #1891

* review feedback.

* linting.
2018-04-14 17:08:19 -07:00
Melih Elibol
6a84b1f26e Remove num_threads as a parameter. (#1891)
* remove num_threads as a parameter.

* linting.

* add additional checks.

* Invoke TransferCompleted on failures.

* Fix issue with failed Gets on store.

* ray check status of writing object headers.

* fix mac issues.
2018-04-14 15:22:59 -07:00
Melih Elibol
6be73350c6 Adds Valgrind tests for multi-threaded object manager. (#1890)
* adds valgrind to new object manager.

* Add some comments.

* Update run_object_manager_valgrind.sh

typo

* Update run_object_manager_tests.sh

* update tests to reflect changes in #1891.

* reduce # tests.
2018-04-13 21:56:12 -07:00
Robert Nishihara
d0fffec2d0 Update arrow and parquet-cpp. (#1875)
* Update arrow.

* Fix bug.

* Cherry-pick commit for fixing parquet segfault.

* Update arrow and revert auto-releasing buffer commit.

* Remove parquet cherry-pick.
2018-04-12 16:17:12 -07:00
Alexey Tumanov
39cf6ff6e1 raylet command line resource configuration plumbing (#1882)
* raylet command line resource configuration plumbing

* Small changes.
2018-04-12 02:37:15 -07:00
Philipp Moritz
834e594709 [XRay] Register object store and raylet with the GCS (#1860) 2018-04-09 18:56:33 -07:00
Robert Nishihara
256389dc59 Use new task spec for computing IDs in raylet code path. (#1830)
* Use new task spec for computing IDs in raylet code path.

* Fix linting.

* Fixes

* Fix test.
2018-04-08 13:31:55 -07:00
Robert Nishihara
0b7ad668ff Fix unused lambda capture compilation error. (#1844)
* Fix unused lambda capture compilation error.

* Fix linting.
2018-04-07 14:54:21 -07:00
Stephanie Wang
bef1d872b4 [xray] Cleanup Raylet processes on exit (#1839)
* Add raylet monitor script to timeout Raylet heartbeats

* Unit test for removing a different client from the client table

* Set node manager heartbeat according to global config

* Doc and fixes

* Add regression test for client table disconnect, refactor client table

* Convert 'Terminate' methods to destructors

* Destroy the Raylet on a SIGTERM

* Clean up workers on a SIGTERM
2018-04-06 17:21:51 -07:00
Melih Elibol
3bf80839cb Remove all runtime errors. (#1840) 2018-04-06 17:20:52 -07:00
Melih Elibol
c7e11e9057 lint fix. (#1842) 2018-04-06 13:28:52 -07:00
Melih Elibol
24a8cede88 Cache object info from store notification. (#1815)
Cache all object info from object added store notification & submit to GCS via object directory.
2018-04-06 02:33:23 -07:00
Stephanie Wang
bf194db4bc [xray] Basic actor support (#1835) 2018-04-06 00:17:14 -07:00
Melih Elibol
313b864e66 disconnect bug fix. (#1837) 2018-04-05 22:10:51 -07:00
Stephanie Wang
cbf3181fd2 [xray] Monitor for Raylet processes (#1831)
* Add raylet monitor script to timeout Raylet heartbeats

* Unit test for removing a different client from the client table

* Set node manager heartbeat according to global config

* Doc and fixes

* Add regression test for client table disconnect, refactor client table

* Fix linting.
2018-04-05 20:45:38 -07:00
Alexey Tumanov
5a9e83761d fix unused-lambda-capture on clang version 9.1 (#1823)
* fix unused-lambda-capture on clang9.1

* unused lambda capture fix continued

* lambda capture: NM

* lambda capture

* Fix linting.
2018-04-04 11:04:10 -07:00
Robert Nishihara
e0193a5501 Print backtrace for RAY_LOG(FATAL) and also add file and line number … (#1805)
* Print backtrace for RAY_LOG(FATAL) and also add file and line number in common case.

* Fix linting.
2018-04-03 10:12:46 -07:00
Robert Nishihara
fbfbb1c079 [xray] Integrate worker.py with raylet. (#1810)
* Integrate worker with raylet.

* Begin allowing worker to attach to cluster.

* Fix linting and documentation.

* Fix linting.

* Comment tests back in.

* Fix type of worker command.

* Remove xray python files and tests.

* Fix from rebase.

* Add test.

* Copy over raylet executable.

* Small cleanup.
2018-04-03 02:38:56 -07:00