fangfengbin
e196fcdbaf
Add gcs_service_enabled function to avoid getting environment variable directly ( #7742 )
2020-03-26 22:02:53 +08:00
Eric Liang
23b6fdcda1
`ray memory` should collect statistics from all nodes (#7721 )
2020-03-25 16:31:31 -07:00
Stephanie Wang
46404d8a0b
[core] Pin lineage of plasma objects that are still in scope ( #7690 )
...
* Fix deadlock in DrainAndShutdown
* Revert "[core] Revert lineage pinning (#7499 ) (#7692 )"
This reverts commit ba86a02b37.
* debug rllib
* debug rllib
* turn on all rllib tests again
* debug rllib
* Fix drain bug, check number of pending tasks
* revert rllib debug
* remove todo
* Trigger rllib tests
* revert rllib debug commit
2020-03-25 09:29:32 -07:00
Stephanie Wang
a1cee6af7b
Revert "New scheduler local node ( #7441 )" ( #7732 )
...
This reverts commit 6141fdab95.
2020-03-24 18:32:16 -07:00
Ion
6141fdab95
New scheduler local node ( #7441 )
2020-03-24 13:59:50 -05:00
fangfengbin
bf866de6fd
Enable GCS Service by default ( #7541 )
2020-03-24 14:20:23 +08:00
mehrdadn
b4030cdbbe
File HANDLE/descriptor translation layer for Windows ( #7657 )
...
* Use TCP sockets on Windows with custom HANDLE <-> FD translation layer
* Get Plasma working on Windows
Co-authored-by: Mehrdad <noreply@github.com>
2020-03-23 21:08:25 -07:00
Edward Oakes
9318b29f5e
Remove is_direct logic from the raylet ( #7698 )
2020-03-23 17:09:35 -05:00
Stephanie Wang
7f38cc1d03
Debug statements and increase timeout for test array ( #7713 )
2020-03-23 13:02:14 -07:00
ZhuSenlin
74825db804
Fix TestGcsRedisFailureDetector ( #7710 )
...
* fix test_gcs_redis_failure_detector
* fix test_gcs_redis_failure_detector
Co-authored-by: senlin.zsl <senlin.zsl@antfin.com>
2020-03-23 22:48:53 +08:00
ZhuSenlin
039961b63a
rename ActorTable to LogBasedActorTable and add new ActorTable ( #7643 )
2020-03-23 15:05:43 +08:00
Edward Oakes
8b4f5a9431
Remove non-direct-call code from core worker ( #7625 )
2020-03-22 19:20:08 -05:00
Stephanie Wang
ba86a02b37
[core] Revert lineage pinning ( #7499 ) ( #7692 )
...
* Revert "fix (#7681 )"
This reverts commit 6a12a31b2e.
* Revert "[core] Pin lineage of plasma objects that are still in scope (#7499 )"
This reverts commit 014929e658.
2020-03-21 18:35:43 -07:00
Zhijun Fu
a7a5d172b1
[core] fix bug that actor tasks from reconstructed actor is ignored by scheduling queue ( #7637 )
2020-03-21 13:05:24 +08:00
Stephanie Wang
6a12a31b2e
fix ( #7681 )
2020-03-20 18:53:28 -07:00
Stephanie Wang
014929e658
[core] Pin lineage of plasma objects that are still in scope ( #7499 )
...
* Add a lineage_ref_count to References
* Refactor TaskManager to store TaskEntry as a struct
* Refactor to fix deadlock between TaskManager and ReferenceCounter
Add references to task specs
* Pin TaskEntries and References in the lineage of any ObjectIDs in scope
* Fix deadlock, convert num_plasma_returns to a set of object IDs
* fix unit tests
* Feature flag
* Do not release lineage for objects that were promoted to plasma
* fix build
* fix build
* Remove num executions
* Simplify num return values
* Remove unused
* doc
* Set num returns
* Move lineage pinning flag to ReferenceCounter
* comments
* Fixes
* Remove irrelevant test (replaced by ref counting tests)
2020-03-20 10:56:43 -07:00
mehrdadn
e69664b74b
Miscellaneous Windows compatibility bugfixes ( #7658 )
...
* Windows compatibility bug fixes
* Use WSASend/WSARecv as WSASendMsg/WSARecvMsg do not work with TCP sockets
* Clean up some TODOs
* Fix duplicate compilations
* RedisAsioClient boost::asio::error::connection_reset
Co-authored-by: Mehrdad <noreply@github.com>
2020-03-19 19:32:53 -07:00
Stephanie Wang
c7cae036c3
[core] Only drain references for non-actor workers on shutdown ( #7668 )
...
* Only drain ref counter for non-actor tasks
* Don't force kill actors that have gone out of scope
2020-03-19 18:46:16 -07:00
fangfengbin
0d0a41f598
[GCS]Tie lifecycle of gcs service and redis together ( #7601 )
2020-03-19 19:52:35 +08:00
Stephanie Wang
b499100a88
Enable distributed ref counting by default ( #7628 )
...
* enable
* Turn on eager eviction
* Shorten tests and drain ReferenceCounter
* Don't force kill actor handles that have gone out of scope, lint
* Fix locks
* Cleanup Plasma Async Callback (#7452 )
* [rllib][tune] fix some nans (#7611 )
* Change /tmp to platform-specific temporary directory (#7529 )
* [Serve] UI Improvements (#7569 )
* bugfix about test_dynres.py (#7615 )
Co-authored-by: senlin.zsl <senlin.zsl@antfin.com>
* Java call Python actor method use actor.call (#7614 )
* bug fix about usage of absl::flat_hash_map::erase and absl::flat_hash_set::erase (#7633 )
Co-authored-by: senlin.zsl <senlin.zsl@antfin.com>
* [Java] Make both `RayActor` and `RayPyActor` inheriting from `BaseActor` (#7462 )
* [Java] Fix the issue that the cached value in `RayObject` is serialized (#7613 )
* Add failure tests to test_reference_counting (#7400 )
* Fix typo in asyncio documentation (#7602 )
* Fix segfault
* debug
* Force kill actor
* Fix test
2020-03-18 22:39:21 -07:00
Stephanie Wang
35a4bfc885
[core] Fix leak for subscribing to object dependencies in NodeManager ( #7630 )
...
* Fix GetDependencies
* lint
2020-03-18 11:01:29 -07:00
Eric Liang
745b9d643d
First pass at `ray memory` command for memory debugging ( #7589 )
2020-03-17 20:45:07 -07:00
Edward Oakes
c1b0f9ccdf
Add failure tests to test_reference_counting ( #7400 )
2020-03-17 10:30:21 -05:00
ZhuSenlin
dfa5d9b8e9
bug fix about usage of absl::flat_hash_map::erase and absl::flat_hash_set::erase ( #7633 )
...
Co-authored-by: senlin.zsl <senlin.zsl@antfin.com>
2020-03-17 19:39:56 +08:00
ZhuSenlin
ffa9df4683
bugfix about test_dynres.py ( #7615 )
...
Co-authored-by: senlin.zsl <senlin.zsl@antfin.com>
2020-03-17 13:58:44 +08:00
mehrdadn
a0700e2f86
Change /tmp to platform-specific temporary directory ( #7529 )
2020-03-16 18:10:14 -07:00
fangfengbin
6b37be9677
[GCS]Add job id when operating gcs table ( #7592 )
2020-03-15 12:04:04 +08:00
Kai Yang
630e48967d
[Java] Allow passing internal config from raylet to Java worker ( #7532 )
2020-03-15 12:03:38 +08:00
mehrdadn
a87199d240
Fix cyclic dependency between ray/util and ray/common ( #7581 )
...
* Fix cyclic dependency
Headers in ray/util should not depend on those in ray/common
* Move random generations to ray/common/test_util.h
* Add license header
Co-authored-by: Mehrdad <noreply@github.com>
Co-authored-by: Philipp Moritz <pcmoritz@gmail.com>
2020-03-14 12:44:53 -07:00
Stephanie Wang
53549314c5
[core] Option to fallback to LRU on OutOfMemory ( #7410 )
...
* Add a test for LRU fallback
* Update error message
* Upgrade arrow to master
* Integrate with arrow
* Revert "Bazel mirrors (#7385 )"
This reverts commit 44aded5272.
* Don't LRU evict
* Revert "Revert "Bazel mirrors (#7385 )""
This reverts commit b6359fea78d1bd3925452ca88ac71e0c9e5c7dd3.
* Add lru_evict flag
* fix internal config
* Fix
* upgrade arrow
* debug
* Set free period in config for lru_evict, override max retries to fix test
* Fix test?
* fix test
* Revert "debug"
This reverts commit 98f01c63a267f38218f5047b1866e4c1c8280017.
* fix exception str
* Fix ref count test
* Shorten travis test?
2020-03-14 11:28:43 -07:00
Kai Yang
d6e8f47065
Add a flag to disable reconstruction for a killed actor ( #7346 )
2020-03-13 19:10:21 +08:00
Qing Wang
f4656d8cc3
[Java] Enable direct call by default. ( #7408 )
...
* WIP
* Address comments.
* Linting
* Fix
* Fix
* Fix test
* Fix
* Fix single process ci
* Fix ut
* Update java/test/src/main/java/org/ray/api/test/PlasmaFreeTest.java
* Address comments
* Fix linting
* Minor update comments.
* Fix streaming CI
2020-03-13 12:25:30 +08:00
micafan
cc91ed57dc
[core] Fix losing task state when giving up forward task. ( #7525 )
...
* fix NodeManager::Forward task bug on error
* fix lint
* revert spillback task forward
2020-03-13 11:49:44 +08:00
Edward Oakes
768d0b3b3f
Allocate a buffer of 100 calls for each RPC handler ( #7573 )
2020-03-12 12:05:30 -07:00
ZhuSenlin
b663bc6d67
Use gcs server to replace raylet monitor when RAY_GCS_SERVICE_ENABLED=true ( #7166 )
2020-03-12 22:13:56 +08:00
fangfengbin
4c834b9d68
Fix the issue that gcs service client ignores error status code ( #7539 )
...
* add gcs reply status
* rebase master
* use macro to simplify
* convert status in gcs rpc client
* define a Status message in protobuf
Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-03-12 15:08:29 +08:00
Stephanie Wang
fdb528514b
[core] Ref counting for actor handles ( #7434 )
...
* tmp
* Move Exit handler into CoreWorker, exit once owner's ref count goes to 0
* fix build
* Remove __ray_terminate__ and add test case for distributed ref counting
* lint
* Remove unused
* Fixes for detached actor, duplicate actor handles
* Remove unused
* Remove creation return ID
* Remove ObjectIDs from python, set references in CoreWorker
* Fix crash
* Fix memory crash
* Fix tests
* fix
* fixes
* fix tests
* fix java build
* fix build
* fix
* check status
* check status
2020-03-10 17:45:07 -07:00
Edward Oakes
119a303ea0
Remove static concurrency limit from gRPC server ( #7544 )
2020-03-10 16:27:02 -07:00
Edward Oakes
dbbf0c0e70
Add Apache 2 license to C++ files ( #7520 )
2020-03-10 16:07:17 -07:00
fangfengbin
fa785a2ad2
ServiceBasedGcsClient support detect gcs server availability and retry ( #7292 )
2020-03-10 21:01:07 +08:00
mehrdadn
fc76586518
Redis on Windows ( #7509 )
...
* Switch hiredis on Windows to that of the Windows port of Redis
* Use boost::asio::ip::tcp::socket::native_handle_type
* Use normal hiredis instead of Windows-specific one
* Finish up using normal hiredis
Co-authored-by: Mehrdad <noreply@github.com>
2020-03-09 18:49:54 -07:00
Edward Oakes
b4e2d5317e
Remove experimental.NoReturn ( #7475 )
2020-03-09 11:09:36 -07:00
Stephanie Wang
95bb0c5357
Upgrade plasma to latest version, use synchronous Seal ( #7470 )
...
* Upgrade arrow to master
* fix build
* todo
* lint
* Fix hanging test
2020-03-09 10:30:44 -07:00
Edward Oakes
0abcca258f
Add entries to in-memory store on Put() ( #7085 )
2020-03-04 10:17:27 -08:00
ijrsvt
fb76092d75
Re-route asyncio plasma code path through raylet instead of direct plasma connection ( #7234 )
2020-03-03 15:43:46 -05:00
fangfengbin
f5b1062ed9
Fix TwoNodeTest.TestActorTaskCrossNodes testcase when enable gcs service ( #7416 )
2020-03-03 19:37:38 +08:00
ijrsvt
584645cc7d
Fix Experimental Async API ( #7391 )
2020-03-02 22:24:20 -06:00
Qing Wang
2771af1036
Fix the bug of unregistered workers in worker pool ( #7343 )
...
* Fix
* Fix
* Fix compile
* Fix lint
* Fix linting
* Fix testDeleteObject
* Fix linting
* Update src/ray/raylet/worker_pool.cc
Co-Authored-By: Hao Chen <chenh1024@gmail.com>
* Update src/ray/raylet/worker_pool.cc
Co-Authored-By: Hao Chen <chenh1024@gmail.com>
* Update src/ray/raylet/worker_pool.h
Co-Authored-By: Hao Chen <chenh1024@gmail.com>
* Update src/ray/raylet/worker_pool.cc
Co-Authored-By: Hao Chen <chenh1024@gmail.com>
* Address comments.
* Fix linting
Co-authored-by: Hao Chen <chenh1024@gmail.com>
2020-03-02 16:30:39 +08:00
mehrdadn
5fb5be0ba5
Some bug fixes for Windows ( #7374 )
...
* Fix MAP_SHARED check in sys/mman.h
* Fix missing :platform_shims dependency for ray_util
* dlmalloc patch for Arrow
2020-02-28 10:22:32 -08:00
mehrdadn
0efaa9b310
Use Redis for Windows ( #7364 )
2020-02-28 10:18:56 -08:00