Hao Chen
8efd0f7b1b
[xray] support multi-workers per process ( #2244 )
...
* support multi-workers per process
Signed-off-by: Hao Chen <chenh1024@gmail.com>
* use RayConfig
Signed-off-by: Hao Chen <chenh1024@gmail.com>
* fix
Signed-off-by: Hao Chen <chenh1024@gmail.com>
* fix
* remove clear
* address comments
* fix lint
* fix bug
* make WorkerPool and WorkerPoolMock more consistent
2018-06-13 10:14:05 -07:00
Robert Nishihara
61139e1509
Enable fractional resources and resource IDs for xray. ( #2187 )
...
* Implement GPU IDs and fractional resources.
* Add documentation and python exceptions.
* Fix signed/unsigned comparison.
* Fix linting.
* Fixes from rebase.
* Re-enable tests that use ray.wait.
* Don't kill the raylet if an infeasible task is submitted.
* Ignore tests that require better load balancing.
* Linting
* Ignore array test.
* Ignore stress test reconstructions tests.
* Don't kill node manager if remote node manager disconnects.
* Ignore more stress tests.
* Naming changes
* Remove outdated todo
* Small fix
* Re-enable test.
* Linting
* Fix resource bookkeeping for blocked tasks.
* Fix linting
* Fix Java client.
* Ignore test
* Ignore put error tests
2018-06-10 15:31:43 -07:00
Philipp Moritz
4ec5bea03b
[xray] Implement fetch ( #2195 )
2018-06-09 23:36:27 -07:00
Stephanie Wang
cb5e6e6d68
Add dependency between copy_ray and python extensions ( #2221 )
2018-06-08 20:41:54 -07:00
Yuhong Guo
0a34bea0b0
Use scoped enums in C++ and flatbuffers. ( #2194 )
...
* Enable --scoped-enums in flatbuffer compiler.
* Change enum to c++11 style (enum class).
* Resolve conflicts.
* Solve building failure when RAY_USE_NEW_GCS=on and remove ERROR_INDEX suffix.
* Merge with master and fix CI failure.
2018-06-07 01:01:21 -07:00
Hao Chen
f0907a6ee9
Optimize lineage eviction efficiency ( #2196 )
...
* Java in vscode.
* Optimize lineage eviction
* minor fix
* fix ut
* fix comment and lint
* format
* format
* remove unneeded code
2018-06-07 00:35:15 -07:00
Philipp Moritz
343f29801b
[xray] Fix compilation on mac ( #2199 )
2018-06-06 22:33:46 -07:00
Melih Elibol
7246ff80a4
[xray] Implements ray.wait ( #2162 )
...
Implements ray.wait for xray. Fixes #1128 .
2018-06-06 16:56:44 -07:00
songqing
451cdb43f6
Fix redefinition of flatbuffer types ( #2189 )
2018-06-05 00:08:05 -07:00
Philipp Moritz
d699bfbf10
Use hashing function that takes into account all UniqueID bytes ( #2174 )
2018-06-01 23:07:29 -07:00
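A minimal Python sketch of why a hash over all ID bytes matters. This is not the code from the commit: the log does not say which hash function was adopted, so 64-bit FNV-1a stands in here. The names `prefix_hash` and `full_hash` and the 20-byte ID layout are assumptions made for illustration.

```python
# Illustrative only: compares a hash that reads part of an ID with one
# that reads every byte. FNV-1a is a stand-in, not the commit's function.

FNV_OFFSET = 0xCBF29CE484222325
FNV_PRIME = 0x100000001B3
MASK_64 = 0xFFFFFFFFFFFFFFFF


def prefix_hash(id_bytes: bytes) -> int:
    """Weak hash: interprets only the first 8 bytes of the ID."""
    return int.from_bytes(id_bytes[:8], "little")


def full_hash(id_bytes: bytes) -> int:
    """64-bit FNV-1a over every byte of the ID."""
    h = FNV_OFFSET
    for b in id_bytes:
        h ^= b
        h = (h * FNV_PRIME) & MASK_64
    return h


# Hypothetical 20-byte IDs that share their first 8 bytes and differ later.
shared_prefix = bytes(8)
ids = [shared_prefix + i.to_bytes(12, "little") for i in range(1000)]

prefix_buckets = {prefix_hash(x) for x in ids}
full_buckets = {full_hash(x) for x in ids}

print(len(prefix_buckets), len(full_buckets))
```

With the prefix-only hash every ID lands in the same bucket, while the full hash keeps them apart, which is the kind of clustering a whole-ID hash avoids.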
Philipp Moritz
e1024d84e9
[xray] Start actor workers in parallel ( #2168 )
2018-06-01 23:04:16 -07:00
songqing
4dd4698564
unify build dir for Python and Java ( #2171 )
...
* unify build dir for Python and Java
* enable executables auto installed when just running 'make'
* fix plasma_store copy error
* fix cmake error about copying executables
* lint fix
* recover python/setup.py
* enable to copy optional file automatically
* a small fix of path
* lint fix
* lint fix
* lint fix
* Add comment.
2018-06-01 16:28:27 -07:00
Yuhong Guo
c1de03acac
Add timeout mechanism to Push function instead of retries ( #2148 )
...
Use timer instead of retries in Push when objects are not local.
2018-06-01 01:21:05 -07:00
Stephanie Wang
117107cb15
[xray] Evict tasks from the lineage cache ( #2152 )
2018-05-31 00:24:39 -07:00
Robert Nishihara
6172f94c04
Implement Python global state API for xray. ( #2125 )
...
* Implement global state API for xray.
* Fix object table.
* Fixes for log structure.
* Implement cluster_resources.
* Add driver task to task table.
* Remove python flatbuffers code
* Get some global state API tests running.
* Python linting.
* Fix linting.
* Fix mock modules for doc
* Copy over flatbuffer bindings.
* Fix for tests.
* Linting
* Fix monitor crash.
2018-05-29 16:25:54 -07:00
Stephanie Wang
166000b089
[xray] Improve flush algorithm for the lineage cache ( #2130 )
...
* Private method to flush a single task from the lineage cache
* Track parent->child relationships for faster flushing
* doc
* Only flush the newly ready task
* Flush() returns void
* x
2018-05-28 21:03:15 -07:00
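The "track parent->child relationships" idea in the commit above can be sketched roughly as follows. This is a toy model in Python, not the raylet's lineage cache: the class `LineageSketch`, its method names, and the rule "a task is flushed once all of its parents are committed" are assumptions chosen to show why a child index avoids rescanning every task.

```python
from collections import defaultdict


class LineageSketch:
    """Toy lineage tracker (hypothetical, not Ray's implementation)."""

    def __init__(self):
        self.waiting_on = {}              # task -> parents not yet committed
        self.children = defaultdict(set)  # parent -> tasks waiting on it
        self.flushed = []                 # flush order, for inspection

    def add_task(self, task, uncommitted_parents=()):
        self.waiting_on[task] = set(uncommitted_parents)
        for parent in uncommitted_parents:
            self.children[parent].add(task)
        if not self.waiting_on[task]:
            self._flush(task)

    def _flush(self, task):
        """Flush a single task (here: just record it)."""
        self.flushed.append(task)

    def on_commit(self, parent):
        """Visit only the children of the committed parent."""
        for child in sorted(self.children.pop(parent, ())):
            pending = self.waiting_on[child]
            pending.discard(parent)
            if not pending:
                self._flush(child)  # only the newly ready task is flushed


cache = LineageSketch()
cache.add_task("a")
cache.add_task("b", ["a"])
cache.add_task("c", ["a", "b"])
cache.on_commit("a")
cache.on_commit("b")
print(cache.flushed)
```

Each commit touches only the tasks listed under that parent, so the work per notification is proportional to the number of children rather than to the size of the cache.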
caopeng428
bb8bfce403
bugfix: use array redis_primary_addr out of its scope ( #2139 )
2018-05-25 21:40:23 -07:00
Yuhong Guo
a8517cc82a
Fix infinite retry in Push function. ( #2133 )
2018-05-25 01:16:44 -07:00
Yujie Liu
5c2b2c7b49
[JavaWorker] Changes to the directory under src for support java worker ( #2093 )
...
* Changes to the directory under src for support java worker
--------------------------
This commit includes changes to the directory under src, which is part of the java worker support of Ray.
It consists of the following changes:
src/common/task.cc - fixes a null pointer problem
org_ray_spi_impl_DefaultLocalSchedulerClient.* - JNI support for local scheduler client, and the org_ray_spi_impl_DefaultLocalSchedulerClient.cc file is not autogenerated
2018-05-25 00:59:05 -07:00
Zongheng Yang
fa97acbc89
Integrate credis with Ray & route task table entries into credis. ( #1841 )
2018-05-24 23:35:25 -07:00
Philipp Moritz
225608ec66
Update arrow to latest master ( #2100 )
2018-05-24 00:26:13 -07:00
yuyiming
9ff3d57429
do not fetch from dead Plasma Manager ( #2116 )
2018-05-23 16:13:09 -07:00
Robert Nishihara
9b9ff19dd0
Use automatic memory management in Redis modules. ( #1797 )
2018-05-22 01:05:09 -07:00
eric-jj
eb078766d8
Performance fix ( #2110 )
2018-05-20 18:07:55 -07:00
Kunal Gosar
eba73449cc
fix unused lambda capture ( #2102 )
2018-05-19 13:27:10 -07:00
Melih Elibol
f1da721522
[xray] Use pubsub instead of timeout for ObjectManager Pull. ( #2079 )
...
Use pubsub instead of timeout for Pull.
2018-05-18 21:35:12 -07:00
Yujie Liu
5918776dd4
[JavaWorker] Changes to the build system for support java worker ( #2092 )
...
* Changes to the build system for support java worker
--------------------------
This commit includes changes to the build system, which is part of the java worker support of Ray.
It consists of the following changes:
- the changes of CMakeLists.txt files
- the changes of the python setup.py and init files for the adaptation of the changed build system
- move local_scheduler_extension.cc to fit the changed build system, which may better support multi-language workers
* minor whitespace
* Linting
2018-05-18 19:09:23 -07:00
Stephanie Wang
71e5cca59f
[xray] Fix bug in updating actor execution dependencies ( #2064 )
...
* [xray] FIX: bugs in actor execution
* comments
* Stronger check
2018-05-18 12:45:17 -07:00
Melih Elibol
25e7aa1e79
[xray] Better error messaging when pulling from self. ( #2068 )
...
* complain more loudly when object pulls from self.
* Add checks for node manager, and internal checks for object manager.
* linting
2018-05-18 10:26:47 -07:00
Robert Nishihara
15b72f9893
Fix compilation error for RAY_USE_NEW_GCS with latest clang. ( #2086 )
2018-05-17 23:10:02 -07:00
Melih Elibol
3c245f66d4
[xray] Corrects Error Handling During Push and Pull. ( #2059 )
...
* Makes bad status during Pull non-fatal.
Makes a bad status during Push fatal.
* pretty logs
* Stephanie's feedback.
2018-05-17 17:51:55 -07:00
Stephanie Wang
6ca122f723
[xray] Sophisticated task dependency management ( #2035 )
2018-05-17 17:18:30 -07:00
Stephanie Wang
796864d887
[xray] Lineage cache only requests notifications about remote parent tasks ( #2066 )
...
* Only request notifications about a parent task that is remote
* Fix typo
* Fix lineage cache test
2018-05-17 13:01:40 -07:00
Stephanie Wang
88fa98e851
[xray] Fix GCS table prefixes ( #2065 )
...
* Fix GCS table prefixes
* More explicit documentation
2018-05-16 13:15:03 -07:00
Stephanie Wang
ad48e47120
Don't crash on duplicate actor notifications ( #2043 )
2018-05-14 14:26:37 -07:00
Melih Elibol
3ac0c08daa
use jobid_nil ( #2044 )
2018-05-13 14:22:09 -07:00
eric-jj
71997a481b
Improve shared_ptr usage ( #2030 )
...
[xray] Improve shared_ptr usage
2018-05-11 20:05:04 -07:00
Stephanie Wang
a292d7ba32
[xray] Fix UniqueID hashing for object and task IDs. ( #2017 )
...
* Skip object prefix in UniqueIDHasher, choose shard based on hash
* lint
2018-05-10 21:56:12 -07:00
alonamid
32fa862408
add pthread linking ( #1986 )
2018-05-02 21:50:29 -07:00
eric-jj
34bc6ce6ea
remove UniqueIDHasher ( #1957 )
...
* remove UniqueIDHasher
* Format the change
* remove unused line
* Fix format
* fix lint error
* fix linting whitespace
2018-04-30 06:31:23 -07:00
Philipp Moritz
af88fdefcf
Incorporate C++ Buffer management and Seal global threadpool fix from arrow ( #1950 )
2018-04-25 22:53:44 -07:00
Philipp Moritz
dad465a2bf
[XRay] Add consistency check for protocol between node_manager and local_scheduler_client ( #1944 )
2018-04-23 23:51:25 -07:00
Melih Elibol
8264e64b18
Handle interrupts correctly for ASIO synchronous reads and writes. ( #1929 )
...
* handle interrupts correctly.
* linting
* handle interrupts on read_some/write_some.
2018-04-20 22:55:40 -07:00
Robert Nishihara
cffda73da1
Allow task_table_update to fail when tasks are finished. ( #1927 )
...
* Allow task_table_update to fail when tasks are finished.
* Add comment.
2018-04-20 11:34:29 -07:00
Stephanie Wang
aa07f1ce4e
[xray] Workers blocked in a ray.get release their resources ( #1920 )
...
* [xray] Throttle task dispatch by required resources
* Pass in number of initial workers into raylet command
* Workers blocked in a ray.get release resources
2018-04-18 20:59:58 -07:00
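The behaviour named in this commit, throttling dispatch by required resources and having a worker give its resources back while it is blocked in a ray.get, can be illustrated with a small bookkeeping sketch. Everything here is hypothetical: `ResourcePool`, its methods, and the single "CPU" resource are assumptions for the example and do not reflect the raylet's actual classes.

```python
class ResourcePool:
    """Toy CPU bookkeeping for one node (hypothetical, not Ray's class)."""

    def __init__(self, total_cpus: int) -> None:
        self.available = total_cpus
        self.held = {}  # worker id -> CPUs currently held

    def acquire(self, worker: str, cpus: int) -> bool:
        """Throttle dispatch: refuse when not enough CPUs are free."""
        if cpus > self.available:
            return False
        self.available -= cpus
        self.held[worker] = cpus
        return True

    def block(self, worker: str) -> int:
        """Worker enters a blocking get: return its CPUs to the pool."""
        cpus = self.held.pop(worker)
        self.available += cpus
        return cpus


pool = ResourcePool(total_cpus=1)
first = pool.acquire("w1", 1)          # w1 takes the only CPU
second_before = pool.acquire("w2", 1)  # w2 is throttled
released = pool.block("w1")            # w1 blocks and releases its CPU
second_after = pool.acquire("w2", 1)   # now w2 can be dispatched
print(first, second_before, second_after)
```

Without the release step, a task that blocks waiting on its own children could hold the last CPU and prevent those children from ever running. How resources are reacquired when the worker unblocks is not described in the log, so the sketch leaves that out.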
Alexey Tumanov
1c965fcfeb
Raylet task dispatch and throttling worker startup ( #1912 )
...
* separate task placement and task dispatch; throttle task dispatch with locally available resources
* keep track of workers being started/in flight and suppress starting extraneous workers
* cleanup comments
* remove early termination in task dispatch to support zero-resource actor tasks
* info -> debug
* add documentation
* linting
* mock the worker pool for testing
* some linting
* kill all workers in flight; clear the worker pool in dtor
* remove fixed todo
* lint
2018-04-18 10:58:11 -07:00
Eric Liang
7ab890f4a1
[tune] [rllib] Automatically determine RLlib resources and add queueing mechanism for autoscaling ( #1848 )
2018-04-16 16:58:15 -07:00
Stephanie Wang
2e25972d4d
Preemptively push local arguments for actor tasks ( #1901 )
2018-04-16 16:26:59 -07:00
Melih Elibol
ddfc875149
Multithreading refactor for ObjectManager. ( #1911 )
...
* removes transfer service. adds separate pool for sends and receives.
* get rid of send/receive transfer counts.
* update comment.
* remove clang formatting.
* clang formatting.
2018-04-16 15:51:53 -07:00
Melih Elibol
cff37765b1
Addresses missed comments from multichunk object transfer PR. ( #1908 )
...
* Move object manager parameters to ray config,
object manager config bug fix.
addresses other comments from #1827 .
* linting and uint?
* typos
* remove uint.
2018-04-15 21:35:51 -07:00