fangfengbin
9ae5bba7cf
[GCS]Fix gcs table storage GetAll and GetByJobId api bug ( #13195 )
2021-01-07 10:37:00 +08:00
Siyuan (Ryans) Zhuang
02ae6c5a9a
[Core] Fix incorrect comment ( #13228 )
2021-01-06 11:37:29 -08:00
Lingxuan Zuo
01d4638b49
[Log] fix spdlog init race ( #12973 )
...
* fix spdlog init race
* use global logger
* refine logger name and constructor
2021-01-06 11:02:54 -08:00
dHannasch
695833082d
[Redis] Note that each Redis Connect retry takes two minutes ( #12183 )
...
* Slightly alter error message so it's the same in both cases.
* Each retry takes about two minutes.
2021-01-06 11:00:58 -08:00
SangBin Cho
32dc5676b4
[Metrics] Record per node and raylet cpu / mem usage ( #12982 )
...
* Record per node and raylet cpu / mem usage
* Add comments.
* Addressed code review.
2021-01-05 21:57:21 -08:00
fangfengbin
779b3876f6
[GCS]Fix TestActorSubscribeAll bug ( #13193 )
2021-01-06 13:52:39 +08:00
fangfengbin
dd14e5a3b3
[BugFix][GCS]Fix gcs_actor_manager_test multithreading bug ( #13158 )
2021-01-06 10:47:06 +08:00
Tao Wang
a0bbf2bfc2
Notify listeners after registered node stored ( #13069 )
2021-01-05 11:18:03 +08:00
fangfengbin
88eaa87e3a
Remove unused file(object_manager_integration_test.cc) ( #12989 )
2021-01-05 11:09:36 +08:00
Eric Liang
dfb326d4b5
Surface object store spilling statistics in ray memory ( #13124 )
2021-01-04 17:35:39 -08:00
Stephanie Wang
b765914a1b
Revert "Enabling the cancellation of non-actor tasks in a worker's queue ( #12117 )" ( #13178 )
...
This reverts commit b4d688b4a6.
2021-01-04 17:27:48 -08:00
Siyuan (Ryans) Zhuang
46cf433f0e
[Core] Remove Arrow dependencies ( #13157 )
...
* remove arrow ubsan
* remove arrow build depend
* remove arrow buffer
2021-01-04 11:19:09 -08:00
Gabriele Oliaro
b4d688b4a6
Enabling the cancellation of non-actor tasks in a worker's queue ( #12117 )
...
* wrote code to enable cancellation of queued non-actor tasks
* minor changes
* bug fixes
* added comments
* rev1
* linting
* making ActorSchedulingQueue::CancelTaskIfFound raise a fatal error
* bug fix
* added two unit tests
* linting
* iterating through pending_normal_tasks starting from end
* fixup! iterating through pending_normal_tasks starting from end
* fixup! fixup! iterating through pending_normal_tasks starting from end
* post merge fixes
* added debugging instructions, pulled Accept() out of guarded loop
* removed debugging instructions, linting
2021-01-04 09:52:29 -08:00
Clark Zinzow
c2bff64699
[Core] Locality-aware leasing: Milestone 1 - Owned refs, pinned location ( #12817 )
...
* Locality-aware leasing for owned refs (pinned locations).
* LessorPicker --> LeasePolicy.
* Consolidate GetBestNodeIdForTask and GetBestNodeIdForObjects.
* Update comments.
* Turn on locality-aware leasing feature flag by default.
* Move local fallback logic to LeasePolicy, move feature flag check to CoreWorker constructor, add local-only lease policy.
* Add lease policy consulting assertions to the direct task submitter tests.
* Add lease policy tests.
* LocalityLeasePolicy --> LocalityAwareLeasePolicy.
* Add missing const declarations.
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
* Add RAY_CHECK for raylet address nullptr when creating lease client.
* Make the fact that LocalLeasePolicy always returns the local node more explicit.
* Flatten GetLocalityData conditionals to make it more readable.
* Add ReferenceCounter::GetLocalityData() unit test.
* Add data-intensive microbenchmarks for single-node perf testing.
* Add data-intensive microbenchmarks for simulated cluster perf testing.
* Remove redundant comment.
* Remove data-intensive benchmarks.
* Add locality-aware leasing Python test.
* Formatting changes in ray_perf.py.
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
2021-01-04 09:49:08 -08:00
fangfengbin
25f9f0d781
[GCS] Move resource usage info to gcs resource manager ( #13059 )
2020-12-25 15:17:45 +08:00
Siyuan (Ryans) Zhuang
cf9952a028
[Core] Remove outdated external store ( #13080 )
...
* remove outdated external store
2020-12-24 17:30:06 -08:00
Siyuan (Ryans) Zhuang
bf7f6a7de3
[Core] Remove cuda support in plasma store ( #13070 )
...
* remove cuda support in plasma store
2020-12-24 13:24:56 -08:00
Stephanie Wang
4461f9980a
Refactor TaskDependencyManager, allow passing bundles of objects to ObjectManager ( #13006 )
...
* New dependency manager
* Switch raylet to new DependencyManager
* PullManager accepts bundles
* Cleanup, remove old task dependency manager
* x
* PullManager unit tests
* lint
* Unit tests
* Rename
* lint
* test
* Update src/ray/raylet/dependency_manager.cc
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
* Update src/ray/raylet/dependency_manager.cc
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
* x
* lint
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
2020-12-23 18:36:00 -08:00
Stephanie Wang
d95c8b8a41
[core][new scheduler] Move tasks from ready to dispatch to waiting on argument eviction ( #13048 )
...
* Add index for tasks to dispatch
* Task dependency manager interface
* Unsubscribe dependencies and tests
* NodeManager
* Revert "Add index for tasks to dispatch"
This reverts commit c6ccb9aa306e00f80d34b991055e4e83872595ea.
* tmp
* Move back to waiting if args not ready
* update
2020-12-23 09:33:43 -08:00
DK.Pino
6e19facc7f
[GCS] Delete redis gcs client and redis_xxx_accessor ( #12996 )
2020-12-23 20:31:46 +08:00
fangfengbin
646c4201ac
[GCS]Decouple gcs resource manager and gcs node manager ( #13012 )
2020-12-23 11:25:01 +08:00
fyrestone
62a5832007
[Dashboard] Add GET /logical/actors API ( #12913 )
2020-12-23 11:14:23 +08:00
Alex Wu
ea8d782be1
[core] Pull Manager exponential backoff ( #13024 )
2020-12-21 19:17:51 -08:00
Eric Liang
8068041006
Don't release resources during plasma fetch ( #13025 )
2020-12-21 18:32:40 -08:00
Eric Liang
03a5b90ed6
Revert "Revert "Increase the number of unique bits for actors to avoi… ( #12990 )
2020-12-21 15:16:42 -08:00
Kai Yang
5a6801dde7
[Core] Remove delete_creating_tasks ( #12962 )
2020-12-22 00:01:27 +08:00
fangfengbin
85a4435ba0
[GCS]Fix redis store client AsyncPutWithIndex unordered bug ( #13002 )
2020-12-21 20:02:50 +08:00
Barak Michener
c576f0b073
[ray_client] Implement a gRPC streaming logs API for the client ( #13001 )
2020-12-20 19:35:34 -08:00
fangfengbin
4caa6c6d78
[GCS]GCS resource manager remove cluster_resources_ ( #12972 )
2020-12-21 11:00:25 +08:00
Barak Michener
e715ade2d1
Support retrieval of named actor handles ( #13000 )
...
Change-Id: I05d31c9c67943d2a0230782cbdaa98341584cbc7
2020-12-20 16:34:50 -08:00
Barak Michener
80f6dd16b2
[ray_client] Implement optional arguments to ray.remote() and f.options() ( #12985 )
2020-12-20 15:43:48 -08:00
Barak Michener
7ab9164f1b
[ray_client] Integrate with test_basic, test_basic_2 and test_actor ( #12964 )
2020-12-20 14:54:18 -08:00
fangfengbin
3fab93b61b
Fix scheduling_resources comment errors ( #12991 )
...
* Fix scheduling_resources comment error
* add part code
Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>
2020-12-20 20:20:07 +08:00
Eric Liang
64c97d25d3
Enable by default new scheduler ( #12735 )
2020-12-19 13:22:24 -08:00
Eric Liang
5d987f5988
Revert "Increase the number of unique bits for actors to avoid handle collisions ( #12894 )" ( #12988 )
...
This reverts commit 3e492a79ec.
2020-12-18 23:51:44 -08:00
dHannasch
a092433bc8
[core] Use the ConnectWithoutRetries error message ( #12732 )
2020-12-18 22:34:34 -08:00
SangBin Cho
9d939e6674
[Object Spilling] Implement level triggered logic to make streaming shuffle work + additional cleanup ( #12773 )
2020-12-18 19:31:14 -08:00
Alex Wu
404161a3ff
[Autoscaler/Core] Remove autoscaler spam ( #12952 )
2020-12-18 18:22:45 -08:00
Kai Yang
ac5ea2c13d
[Java] Fix output parsing in RunManager ( #12968 )
...
* Fix output parsing in RunManager
* change log level
Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>
2020-12-18 18:22:12 -08:00
Eric Liang
6ece291f35
Clean up block/unblock handling of resources in new scheduler ( #12963 )
2020-12-18 16:00:54 -08:00
Eric Liang
3e492a79ec
Increase the number of unique bits for actors to avoid handle collisions ( #12894 )
2020-12-18 15:59:03 -08:00
Eric Liang
92812f2e8a
Implement resource deadlock detection for new scheduler ( #12961 )
2020-12-18 12:17:54 -08:00
Barak Michener
5cfa1934e4
[ray_client]: Implement object retain/release and Data Streaming API ( #12818 )
2020-12-18 11:47:38 -08:00
fangfengbin
a442cd17e0
[GCS]Optimize gcs client reconnection ( #12878 )
...
* [GCS]Optimize gcs client reconnection
* fix review comment
* fix review comment
* add part code
Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>
2020-12-17 21:57:37 -08:00
dHannasch
cfefd7c70e
Test PingPort ( #12954 )
...
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-12-17 21:15:42 -08:00
DK.Pino
6404f1e609
[Placement Group][New scheduler] New scheduler pg implementation ( #12910 )
2020-12-18 11:56:45 +08:00
Tao Wang
17152c84a7
[Tiny]Print raylet info after register ( #12566 )
2020-12-18 11:22:13 +08:00
dHannasch
d747071dd9
Test shard_context on already-created boost::asio::io_service. ( #12917 )
2020-12-17 14:26:30 -08:00
Allen
e6cb4f4bd7
[Core] Add log of address and port ( #12908 )
...
Co-authored-by: Allen Yin <allenyin@anyscale.io>
2020-12-17 00:25:29 -08:00
Yi Cheng
40032541dc
[core] Introduce fetch_local to ray.wait ( #12526 )
2020-12-16 23:44:28 -08:00