Philipp Moritz
25f0094ee4
Fix copying the plasma fbs directory from arrow ( #2579 )
2018-08-07 00:04:37 -07:00
Yuhong Guo
d35ce7fa63
Use real callback index in subscribe_callback_index_ ( #2473 )
2018-08-06 15:29:56 -07:00
Alexey Tumanov
85b8b2a395
mark all remaining placeable tasks pending with task dependency manager ( #2528 )
2018-08-06 13:08:11 -07:00
Melih Elibol
34d3a46f48
[xray] Revert dynamic chunk size optimization for ObjectManager. ( #2557 )
...
* Revert dynamic chunk size optimization.
* fix mac build issues.
2018-08-05 02:09:37 -07:00
Wang Qing
e4f68ff8cf
[Java Worker] Support raylet on Java ( #2479 )
2018-08-01 17:52:49 -07:00
Zhijun Fu
ca36827f01
[Issues 2403][xray] Fix raylet performance issues on scheduling queue ( #2438 )
...
* merge from ray
* Revert "merge from ray"
This reverts commit 32b181ebbb1fa184026631e1a7368112c4c3118d.
* fix raylet performance regression
* address comments
* Update code after merging latest changes
* fix lint
* address comments
2018-08-01 14:41:20 -07:00
Stephanie Wang
e90ecef297
[xray] Try to flush children of a task that is evicted from the lineage cache ( #2531 )
2018-08-01 00:23:02 -07:00
Stephanie Wang
a45f9cfafc
[xray] Implement task lease table, logic for deciding when to reconstruct a task ( #2497 )
2018-07-30 14:42:28 -07:00
Ion
80db69d245
State transition diagram documentation. ( #2502 )
...
* Added description of transition diagram and a few name changes for improved clarity.
* rename some methods and update task_states.rst
2018-07-28 22:28:45 -07:00
Robert Nishihara
2be1ccbd8f
Raise application-level exceptions for some failure scenarios. ( #2429 )
...
* Raise application level exception for actor methods that can't be executed and failed tasks.
* Retry task forwarding for actor tasks.
* Small cleanups
* Move constant to ray_config.
* Create ForwardTaskOrResubmit method.
* Minor
* Clean up queued tasks for dead actors.
* Some cleanups.
* Linting
* Notify task_dependency_manager_ about failed tasks.
* Manage timer lifetime better.
* Use smart pointers to deallocate the timer.
* Fix
* add comment
2018-07-27 19:53:30 -04:00
Stephanie Wang
6675361684
[xray] Track ray.get calls as task dependencies ( #2362 )
2018-07-27 11:59:17 -07:00
Zhijun Fu
9ad6a973a0
[xray] lineage optimization: avoid unnecessary lineage entry allocation & free ( #2463 )
...
* merge from ray
* Revert "merge from ray"
This reverts commit 32b181ebbb1fa184026631e1a7368112c4c3118d.
* [xray] avoid unnecessary lineage entry allocation & free
* address comments
* address review comments
* address comments
2018-07-26 10:44:38 -04:00
Yuhong Guo
b35ce5dbf1
Update Arrow Package with breaking changes ( #2440 )
...
* Merge the breaking change of Arrow Package.
* Fix typo
* Fix lint.
* put forward declarations into header
* fix
* add protocol.h
* fix linting
2018-07-25 14:28:33 -07:00
Philipp Moritz
e821f852ef
[xray] Silence some object manager logging ( #2437 )
2018-07-20 13:10:03 -07:00
Robert Nishihara
eed39163f9
Add callback to node manager for client removed event. ( #2417 )
...
* Add callback to node manager for client removed event.
* Fix linting.
2018-07-18 16:59:04 -07:00
Philipp Moritz
4c82ac72df
Upgrade arrow to include the plasma TensorFlow op ( #2412 )
2018-07-18 12:33:02 -07:00
Yuhong Guo
206254bcf3
Add const to to_plasma_id function to make it usable by const ObjectID ( #2404 )
...
* Add const to to_plasma_id to make it usable by const ObjectID
* Separate the building script to another PR.
2018-07-16 11:05:29 -07:00
Hao Chen
c1575e98c1
Make local scheduler client thread-safe ( #2386 )
...
* Make local scheduler client thread-safe for python
* lock write_messages
* remove allow-threads
* fix linter
* rename _write_message to do_write_message
2018-07-13 16:19:00 -07:00
Philipp Moritz
fbde8cad74
Update apache arrow to include TensorFlow fix ( #2345 )
2018-07-06 13:18:56 -07:00
Stephanie Wang
5b7475a2e0
[xray] Unsubscribe to task dependencies when task starts execution ( #2354 )
...
* Add back call to unsubscribe to task dependencies
* fix
2018-07-05 21:08:58 -07:00
Stephanie Wang
c50f1966e0
Publish a notification for empty keys in the GCS ( #2347 )
...
* Publish an empty notification for empty keys
* Add failure callback to Table::Subscribe, add unit test for new behavior
2018-07-05 13:39:07 -07:00
Robert Nishihara
b90e551b41
[xray] Implement timeline and profiling API. ( #2306 )
...
* Add profile table and store profiling information there.
* Code for dumping timeline.
* Improve color scheme.
* Push timeline events on driver only for raylet.
* Improvements to profiling and timeline visualization
* Some linting
* Small fix.
* Linting
* Propagate node IP address through profiling events.
* Fix test.
* object_id.hex() should return byte string in python 2.
* Include gcs.fbs in node_manager.fbs.
* Remove flatbuffer definition duplication.
* Decode to unicode in Python 3 and bytes in Python 2.
* Minor
* Submit profile events in a batch. Revert some CMake changes.
* Fix
* Workaround test failure.
* Fix linting
* Linting
* Don't return anything from chrome_tracing_dump when filename is provided.
* Remove some redundancy from profile table.
* Linting
* Move TODOs out of docstring.
* Minor
2018-07-04 23:23:48 -07:00
Zongheng Yang
ba28dddf6f
Make xray object table credis-managed and hence flushable. ( #2338 )
...
* monitor.py: issue flushes to data shard
* ResultTableAdd & ObjectTableAdd: add credis-managed versions
* Fix return codes
* Credis-manage xray object table & associated ray.table_append cmd
* Fix incorrect return code from TableAppend_DoWrite()
* Revert "ResultTableAdd & ObjectTableAdd: add credis-managed versions"
This reverts commit 628c2ea190df4c861dda0c284fab7ca6faa1ea24.
* Address comments
* Lint: fix indent
* Address comment
2018-07-03 17:32:44 -07:00
Philipp Moritz
f21d783e6d
Remove new gcs code from legacy Ray codepath ( #2329 )
2018-07-03 11:48:50 -07:00
Peter Schafhalter
bb1d7eaece
Replenish workers for disconnected actors ( #2307 )
2018-07-02 08:26:10 -07:00
Philipp Moritz
762bdf646e
[xray] Put GCS data into the redis data shard ( #2298 )
2018-06-30 15:42:10 -10:00
Alexey Tumanov
965e182384
[xray] raylet task queue transition discipline ( #2302 )
...
* add queueing interface to move tasks between queues internally
* queueing discipline change: ready->waiting->scheduled->running
* rename task states : ready -> placeable; update documentation
* rename task states : scheduled -> ready; update documentation
* cleanup comments
* cleanup; transition placeable actor tasks
* minor comment cleanup
* addressing comments
* linting
2018-06-27 14:23:41 -07:00
Yuhong Guo
aa42331844
Fix build failure while using make -j1. Issue 2257 ( #2279 )
...
* Fix build failure while using make -j1
* Fix java test failure
2018-06-21 15:18:00 -07:00
Robert Nishihara
ff2217251f
[xray] Add error table and push error messages to driver through node manager. ( #2256 )
...
* Fix documentation indentation.
* Add error table to GCS and push error messages through node manager.
* Add type to error data.
* Linting
* Fix failure_test bug.
* Linting.
* Enable one more test.
* Attempt to fix doc building.
* Restructuring
* Fixes
* More fixes.
* Move current_time_ms function into util.h.
2018-06-20 21:29:28 -07:00
Zongheng Yang
8190ff1fd0
Experimental: enable automatic GCS flushing with configurable policy. ( #2266 )
...
* build_credis.sh: use an up-to-date credis commit.
* build_credis.sh: leveldb is updated, so update build cmds for it
* WIP: make monitor.py issue flush; switch gcs client to use credis
* Experimental: enable automatic GCS flushing with configurable policy.
* Fix linux compilation error
* Fix leveldb build
* Use optimized build for credis
* Address comments
* Attempt to fix tests
2018-06-20 14:40:57 -07:00
Melih Elibol
60bc3a014f
[xray] Sets good object manager defaults. ( #2255 )
...
* better object manager defaults. added max for number of chunks.
* change source of cores.
2018-06-20 14:10:57 -07:00
Yuhong Guo
51744459f3
Mitigate randomly building failure: adding gen_local_scheduler_fbs to raylet lib. ( #2271 )
2018-06-19 15:29:57 -07:00
Hao Chen
8efd0f7b1b
[xray] support multi-workers per process ( #2244 )
...
* support multi-workers per process
Signed-off-by: Hao Chen <chenh1024@gmail.com>
* use RayConfig
Signed-off-by: Hao Chen <chenh1024@gmail.com>
* fix
Signed-off-by: Hao Chen <chenh1024@gmail.com>
* fix
* remove clear
* address comments
* fix lint
* fix bug
* make WorkerPool and WorkerPoolMock more consistent
2018-06-13 10:14:05 -07:00
Robert Nishihara
61139e1509
Enable fractional resources and resource IDs for xray. ( #2187 )
...
* Implement GPU IDs and fractional resources.
* Add documentation and python exceptions.
* Fix signed/unsigned comparison.
* Fix linting.
* Fixes from rebase.
* Re-enable tests that use ray.wait.
* Don't kill the raylet if an infeasible task is submitted.
* Ignore tests that require better load balancing.
* Linting
* Ignore array test.
* Ignore stress test reconstructions tests.
* Don't kill node manager if remote node manager disconnects.
* Ignore more stress tests.
* Naming changes
* Remove outdated todo
* Small fix
* Re-enable test.
* Linting
* Fix resource bookkeeping for blocked tasks.
* Fix linting
* Fix Java client.
* Ignore test
* Ignore put error tests
2018-06-10 15:31:43 -07:00
Philipp Moritz
4ec5bea03b
[xray] Implement fetch ( #2195 )
2018-06-09 23:36:27 -07:00
Stephanie Wang
cb5e6e6d68
Add dependency between copy_ray and python extensions ( #2221 )
2018-06-08 20:41:54 -07:00
Yuhong Guo
0a34bea0b0
Use scoped enums in C++ and flatbuffers. ( #2194 )
...
* Enable --scoped-enums in flatbuffer compiler.
* Change enum to c++11 style (enum class).
* Resolve conflicts.
* Solve building failure when RAY_USE_NEW_GCS=on and remove ERROR_INDEX suffix.
* Merge with master and fix CI failure.
2018-06-07 01:01:21 -07:00
Hao Chen
f0907a6ee9
Optimize lineage eviction efficiency ( #2196 )
...
* Java in vscode.
* Optimize lineage eviction
* minor fix
* fix ut
* fix comment and lint
* format
* format
* remove unneeded code
2018-06-07 00:35:15 -07:00
Philipp Moritz
343f29801b
[xray] Fix compilation on mac ( #2199 )
2018-06-06 22:33:46 -07:00
Melih Elibol
7246ff80a4
[xray] Implements ray.wait ( #2162 )
...
Implements ray.wait for xray. Fixes #1128 .
2018-06-06 16:56:44 -07:00
songqing
451cdb43f6
Fix redefinition of flatbuffer types ( #2189 )
2018-06-05 00:08:05 -07:00
Philipp Moritz
d699bfbf10
Use hashing function that takes into account all UniqueID bytes ( #2174 )
2018-06-01 23:07:29 -07:00
Philipp Moritz
e1024d84e9
[xray] Start actor workers in parallel ( #2168 )
2018-06-01 23:04:16 -07:00
songqing
4dd4698564
unify build dir for Python and Java ( #2171 )
...
* unify build dir for Python and Java
* enable executables auto installed when just running 'make'
* fix plasma_store copy error
* fix cmake error about copying executables
* lint fix
* recover python/setup.py
* enable to copy optional file automatically
* a small fix of path
* lint fix
* lint fix
* lint fix
* Add comment.
2018-06-01 16:28:27 -07:00
Yuhong Guo
c1de03acac
Add timeout mechanism to Push function instead of retries ( #2148 )
...
Use timer instead of retries in Push when objects are not local.
2018-06-01 01:21:05 -07:00
Stephanie Wang
117107cb15
[xray] Evict tasks from the lineage cache ( #2152 )
2018-05-31 00:24:39 -07:00
Robert Nishihara
6172f94c04
Implement Python global state API for xray. ( #2125 )
...
* Implement global state API for xray.
* Fix object table.
* Fixes for log structure.
* Implement cluster_resources.
* Add driver task to task table.
* Remove python flatbuffers code
* Get some global state API tests running.
* Python linting.
* Fix linting.
* Fix mock modules for doc
* Copy over flatbuffer bindings.
* Fix for tests.
* Linting
* Fix monitor crash.
2018-05-29 16:25:54 -07:00
Stephanie Wang
166000b089
[xray] Improve flush algorithm for the lineage cache ( #2130 )
...
* Private method to flush a single task from the lineage cache
* Track parent->child relationships for faster flushing
* doc
* Only flush the newly ready task
* Flush() returns void
* x
2018-05-28 21:03:15 -07:00
caopeng428
bb8bfce403
bugfix: use array redis_primary_addr out of its scope ( #2139 )
2018-05-25 21:40:23 -07:00
Yuhong Guo
a8517cc82a
Fix infinite retry in Push function. ( #2133 )
2018-05-25 01:16:44 -07:00