1082 commits

Author SHA1 Message Date
mehrdadn
75cc994e0a Update various build options relating to Windows (#6315)
* Update .bazelrc for Windows compatibility

* Block inclusion of (legacy) WinSock.h to avoid errors

* Suppress warnings for Windows code

* Include boost::asio in includes so that it is passed as -isystem to avoid warnings

* Link with -lpthread only on non-Windows

* Undefine BOOST_FALLTHROUGH, which is unnecessary and causes macro redefinition warnings

* Define RAY_STATIC and ARROW_STATIC to compile for Windows

* Add WinSock import library for Arrow
2019-12-01 15:05:50 -08:00
mehrdadn
10d49a3f6f Use Boost's socket_holder instead of manually managing the socket (#6314)
* Use Boost's socket_holder instead of manually managing sockets.

Socket types are not ints on Windows, and we need to use a wrapper for proper lifetime management regardless.
2019-12-01 13:27:52 -08:00
fangfengbin
7275556365 Reconstruct local dead actors immediately instead of waiting for initial_reconstruction_timeout_ms (#6243) 2019-11-30 18:03:48 +08:00
mehrdadn
e28e464158 Convert io_service_ from reference to smart pointer (#6285) 2019-11-29 16:09:46 -08:00
mehrdadn
b8cfdba752 Bazelify hiredis (#6203) 2019-11-29 15:32:45 -08:00
Eric Liang
b7b655c851
Also use NotifyDirectCallTaskBlock/Unblocked for plasma store accesses (#6249)
* wip

* fix it

* lint

* wip

* fix

* unblock

* flaky

* use fetch only flag

* Revert "use fetch only flag"

This reverts commit 56e938a0ee2024f5c99c9ab2d55fd35558fb15e1.

* restore error resolution

* use worker task id

* proto comments

* fix if
2019-11-27 22:46:15 -08:00
Stephanie Wang
31a0b11e16 Revert SubmitTask over grpc, use RayletConnection instead (#6305)
* Revert SubmitTask over grpc

* comment
2019-11-27 19:28:12 -06:00
Stephanie Wang
2797c11b69
[direct task] For serialized object IDs, check with owner before declaring object unreconstructable (#6286)
* Track borrowed vs owned objects

* Serialize owner address with object ID

* serialize owner task id

* Deserialize object IDs

* Pass direct task ID instead of plasma ID

* it works

* Fix ref count test

* Add unit test

* update warning

* we own ray.put objects

* missing file

* doc

* Fix unit test

* comments

* Fix py2

* lint

* update
2019-11-27 15:31:44 -08:00
Edward Oakes
8622559e0c
Use one queue per resource shape in direct task transport (#6277) 2019-11-26 20:56:05 -06:00
Eric Liang
30b2fc1d81
Fix actor creation hang due to race in SWAP queue (#6280) 2019-11-26 15:21:03 -08:00
Stephanie Wang
f6a0408173
Track pending tasks with TaskManager (#6259)
* TaskStateManager to track and complete pending tasks

* Convert actor transport to use task state manager

* Refactor direct actor transport to use TaskStateManager

* rename

* Unit test

* doc

* IsTaskPending

* Fix?

* Shared ptr

* HUH?

* Update src/ray/core_worker/task_manager.cc

Co-Authored-By: Zhijun Fu <37800433+zhijunfu@users.noreply.github.com>

* Revert "HUH?"

This reverts commit f80f0ba204ff4da5e0b03191fa0d5a4d9f552434.

* Fix memory issue

* oops
2019-11-25 16:37:26 -08:00
mehrdadn
ed5154d7fe Modify RayLogLevel to avoid conflicts with DEBUG macro and ERROR macros that are defined externally (#6204)
* Prevent name collision of ERROR macro from Windows with RayLogLevel::ERROR
2019-11-25 17:02:26 -07:00
Eric Liang
64a3a7239e
Set RAY_FORCE_DIRECT=1 for run_rllib_tests, test_basic (#6171) 2019-11-25 14:12:11 -08:00
Edward Oakes
c9314098b9
Implement direct task worker lease timeouts (#6188) 2019-11-25 14:48:19 -07:00
Eric Liang
7917bbef78
Set progress report interval for bazel explicitly (#6262)
* set progress interval

* add keep alive

* add keepalive

* remove cat

* smaller time

* squash error

* reduce log spam
2019-11-24 22:37:59 -08:00
Simon Mo
aa8d5d2f6c
Rate limit asyncio actor (#6242) 2019-11-24 11:39:28 -08:00
Stephanie Wang
d2662fecea
Miscellaneous bug fixes to throw unreconstructable errors for direct calls (#6245)
* Test cases

* Fix InPlasmaError

* raylet fixes to force errors for direct calls

* Disable lineage logging and task pending checks for direct calls

* move todo

* Clean up tests

* Fix bugs in object store for Contains and Delete

* Use direct call in tests

* Fixes, separate actor creation direct call from normal direct call spec
2019-11-23 15:05:49 -08:00
Stephanie Wang
c4fa3b3afb
fix (#6251) 2019-11-23 15:04:48 -08:00
Eric Liang
ea270495a1
Remove stray change (#6247) 2019-11-23 00:07:45 -08:00
Edward Oakes
ae5abc48a9
Fix race condition in redis_async_context.cc (#6231)
* dispatch callback to backend thread

* tmp: test in loop

* compiling

* Works using shared_ptrs

* Revert "tmp: test in loop"

This reverts commit faf1f8f74b34a99396906f56827d2691472ae7d4.

* Copy into CallbackReply

* fix comment

* warning

* add nil case
2019-11-22 15:51:40 -08:00
Ion
68ac08332b Initial commit of new cluster resource scheduler (#6178) 2019-11-22 11:14:46 -08:00
Stephanie Wang
d3227f2f2d
Fix bug in direct task calls for objects that were evicted (#6216)
* Fix bug and add some checks

* rename
2019-11-21 15:38:31 -08:00
Stephanie Wang
eb7b73d731
Disconnect direct task workers that died (#6213)
* Disconnect workers that died so that we push the worker died error to redis

* Push error if actor is non nil

* fix test
2019-11-21 15:37:15 -08:00
Simon Mo
29ba6bfc64
Basic Async Actor Call (#6183)
* Start trying to figure out where to put fibers

* Pass is_async flag from python to context

* Just running things in fiber works

* Yield implemented, need some debugging to make it work

* It worked!

* Remove debug prints

* Lint

* Revert the clang-format

* Remove unnecessary log

* Remove unnecessary import

* Add attribution

* Address comment

* Add test

* Missed a merge conflict

* Make test pass and compile

* Address comment

* Rename async -> asyncio

* Move async test to py3 only

* Fix ignore path
2019-11-21 11:56:46 -08:00
Eric Liang
7f52d019ca
Inline memory_store_provider into memory_store (#6217) 2019-11-21 10:13:53 -08:00
Eric Liang
1f9ab74293
Fix hang on Ray shutdown (#6201) 2019-11-20 23:30:35 -08:00
Eric Liang
425edb5cd9
Support NotifyBlocked/UnBlocked for direct call tasks (#6177) 2019-11-20 22:07:12 -08:00
mehrdadn
95bf977839 Rename UpdateResource due to conflict with Windows (#6205)
* Rename UpdateResource due to conflict with Windows

* Rename UpdateResource_ to UpdateResourceCapacity
2019-11-20 20:44:13 -08:00
Stephanie Wang
c0be9e6738
Resolve dependencies locally before submitting direct actor tasks (#6191)
* Priority queue in direct actor transport by task number

* Move LocalDependencyResolver out to separate file, share with direct actor transport

* works

* Test case for ordering

* Cleanups

* Remove priority queue

* comment

* Share ClientFactoryFn with direct actor transport

* Unit test

* fix
2019-11-20 16:45:19 -08:00
micafan
e7dbafa000 fix gcs::RedisAsioClient non-thread safe (#5946) 2019-11-20 10:18:35 -08:00
Eric Liang
23ef58716d
Fix crash on sys.exit of direct task calls (#6202) 2019-11-19 21:30:48 -08:00
ashione
a1744f67fe Add hostname to nodeinfo (#6156) 2019-11-19 15:03:46 +08:00
Danyang Zhuo
4f583ec784 Improve Object Transfer Performance (#6067) 2019-11-18 14:40:34 -08:00
Stephanie Wang
66edebce3a
Spillback scheduling for direct task calls (#6164)
* add dac

* remove caching

* rename return buffer

* cleanup

* add tests

* add perf

* fix

* flip

* remove

* remove it

* lint

* remove fork safety

* lint

* comments

* s/core/client

* wip

* remove

* fmt

* consistently return direct naming

* basic pass by ref

* fix bugs

* wip

* wip

* wip

* wip

* add test

* works now

* fix constructor

* fix merge

* add todo for perf

* fix single client test

* use lower n

* bazel

* faster

* fix core worker test

* init

* fix tests

* no plasma for direct call

* Update worker.py

* add order test

* fixes

* comments

* remove old assert

* lint

* add test

* Very wip

* wip

* add options for tasks

* add test

* fmt

* add backpressure

* remove idle prof event

* lint

* Fix 0 returns

* Set memcopy threads globally

* add benchmark

* Fix object exists

* Fix reference

* Remove return_buffer

* Add check

* add exit handler

* update benchmarks

* Fix compile error

* Fix NoReturn

* Use is instead of == for NoReturn

* fix

* Remove list comprehension

* Fix core worker test

* comment

* Apply suggestions from code review

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

* fix merge error

* lint

* wip

* fix merge

* wip

* finish

* lint

* task interface

* add file

* add

* wip

* now works!

* updated

* wip

* dep resolution

* remove remote dep handling

* comments

* fix test_multithreading

* fix merge

* fix exit handling

* fix merge

* comments

* get fallback fetch working

* handle contains

* fix typo

* Skeleton for SubmitTask proto

* Update src/ray/common/id.h

Co-Authored-By: Stephanie Wang <swang@cs.berkeley.edu>

* comments

* rename to core worker service

* lint

* fix compile

* wip

* update

* error code

* fix up and rename

* clean up call manager

* comments

* add test and cleanup deserialization

* fix pickle

* fix comments, lint

* test todo

* comments

* use shared ptr

* rename

* Update src/ray/protobuf/gcs.proto

Co-Authored-By: Stephanie Wang <swang@cs.berkeley.edu>

* require transport type for ids; lint

* cleanup

* comments 1

* use worker available for real

* wip

* fix test

* resolve local dependencies test

* add num pending metric

* client factory

* unit test task submission

* wip

* fix bug

* rename

* Pass through node manager port, connect in raylet client

* finish rename

* Switch submit task to grpc

* fix crash

* Check port in use

* fix merge

* comments more

* doc

* Remove default port, set port randomly from driver

* add unique_ptr comment about TaskSpec

* lint

* fix test

* update

* fix lint

* GetMessageMutable should not be const

* iwyu

* fix const

* Update direct_task_transport_test.cc

* fix segfault

* Fix test

* Add RpcAddress, set in actor table data

* fix serialization

* fix lint

* Pass through task caller address

* Fix object manager test

* RpcAddress -> Address

* merge

* Port WorkerLease to grpc

* wip

* fix test

* add mem test

* update

* comments

* fix core worker tests

* fix

* remove old worker lease code

* First pass on spillback

* lint

* crash?

* Debug

* Fix task spec copy, extend test basic

* lint

* Port return worker to grpc

* lint

* Return worker to the correct raylet

* Only request worker if queued tasks

* A bit better failure handling

* Fix unit test

* Add unit test for spillback

* fix

* python test multinode

* update

* updates

* fix
2019-11-17 20:29:32 -08:00
Ion
1b80675206 Scheduling ids (#6137) 2019-11-15 16:04:16 -08:00
Edward Oakes
33040d734f
Disable stopgap GC by default (#6165)
* disable stopgap gc by default

* fix gc tests
2019-11-15 15:42:59 -08:00
Eric Liang
7d33e9949b
Integrate ref count module into local memory store (#6122) 2019-11-15 10:52:19 -08:00
Eric Liang
8ff393a7bd
Handle exchange of direct call objects between tasks and actors (#6147) 2019-11-14 17:32:04 -08:00
Edward Oakes
2758cd0b34
Make log message debug (#6166) 2019-11-14 15:05:36 -08:00
Eric Liang
0a3623ded6
Fix memory store wait (#6152) 2019-11-14 10:17:30 -08:00
Stephanie Wang
bbadde57e0
Pass through caller address when submitting a task (#6143)
* Add RpcAddress, set in actor table data

* Pass through task caller address

* RpcAddress -> Address

* update

* fix

* lint

* fix cc tests
2019-11-14 09:14:08 -08:00
Ujval Misra
e3e3ad4b25 Add timeout param to ray.get (#6107) 2019-11-14 00:50:04 -08:00
Edward Oakes
51e76151d6
Use shared_ptr for gcs client in profiler (#6150) 2019-11-13 15:24:01 -08:00
Eric Liang
f3f86385d6
Minimal implementation of direct task calls (#6075) 2019-11-12 11:45:28 -08:00
Stephanie Wang
35d177f459
Use grpc for communication from worker to local raylet (task submission and direct actor args only) (#6118)
* Skeleton for SubmitTask proto

* Pass through node manager port, connect in raylet client

* Switch submit task to grpc

* Check port in use

* doc

* Remove default port, set port randomly from driver

* update

* Fix test

* Fix object manager test
2019-11-11 21:17:25 -08:00
Edward Oakes
5780ec1b62
Refresh ObjectIDs in raylet for stopgap GC (#6109) 2019-11-10 23:12:59 -08:00
Philipp Moritz
ccbcc4bafa
Use gRPC and Bazel 1.0 (#6002) 2019-11-08 15:58:28 -08:00
Philipp Moritz
5a05eaaa54 Fix compilation on master (#6116) 2019-11-07 22:38:42 -08:00
Eric Liang
4a28306186
Allow large returns from direct actor calls (#6088) 2019-11-07 21:28:55 -08:00
Edward Oakes
ca53af4d0f
Add pending task dependencies to ObjectID ref counting (#6054) 2019-11-07 18:37:10 -08:00