Tao Wang
516eb77080
[GCS] Remove task info publish as nowhere uses it ( #13509 )
...
* Remove task info publish as nowhere uses it
* simplify right publish channel
2021-01-18 01:15:03 -08:00
Tao Wang
3a0710130c
[GCS]Only publish changed field when node dead ( #13364 )
...
* Only update changed field when node dead
* node_id missed
2021-01-17 21:28:35 -08:00
ZhuSenlin
a4ebdbd7da
Refactor node manager to eliminate new_scheduler_enabled_ ( #12936 )
2021-01-18 00:15:35 +08:00
ZhuSenlin
2cd51ce608
sync write internal config in gcs ( #13197 )
2021-01-17 12:00:01 +08:00
Eric Liang
ee6332dbb0
Bump dev branch to 2.0 to avoid endless version bump toil ( #13497 )
...
* wip
* fix
* fix
2021-01-15 17:41:17 -08:00
SangBin Cho
d09df55b14
Update ID specification doc ( #13356 )
2021-01-15 15:15:51 -08:00
Eric Liang
4aeb0ea550
Return version info from Ray client connect, to allow for discovering version mismatches
2021-01-15 14:27:26 -08:00
SangBin Cho
f6d9996874
[Object Spilling] Dedup restore objects ( #13470 )
...
* done.
* Addressed code review.
2021-01-14 23:51:11 -08:00
fangfengbin
ce1b208e41
[GCS]Remove unused class variable ( #13454 )
2021-01-15 14:48:18 +08:00
Barak Michener
84e110a949
[ray_client]: Support runtime_context as metadata ( #13428 )
2021-01-14 14:37:00 -08:00
Clark Zinzow
9a658b568f
[Core] Ownership-based Object Directory: Consolidate location table and reference table. ( #13220 )
...
* Added owned object reference before Plasma put on Create() + Seal() path.
* Consolidated location table and reference table in reference counter.
* Restore type in definition.
* Clean up owned reference on failed Seal().
* Added RemoveOwnedObject test for reference counter.
* Guard against ref going out of scope before location RPCs.
* Add 'owner must have ref in scope' precondition to documentation for object location methods.
* Move to separate Create() + Seal() methods for existing objects.
* Clearer distinction between Create() and Seal() methods.
* Make it clear that references will normally be cleaned up by reference counting.
2021-01-14 13:48:10 -08:00
fangfengbin
4a6c53da46
[Core]Fix raylet scheduling bug ( #13452 )
...
* [Core]Fix raylet scheduling bug
* fix lint error
* fix lint error
Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>
2021-01-14 14:50:32 +01:00
fangfengbin
33b092de28
[GCS]Add gcs resource scheduler ( #13072 )
2021-01-14 20:05:55 +08:00
Kai Fricke
b296642646
Fix linter error ( #13451 )
2021-01-14 10:28:44 +01:00
fyrestone
8697d67791
Fix raylet::MockWorker::GetProcess crashes ( #13440 )
...
Co-authored-by: 刘宝 <po.lb@antfin.com>
2021-01-14 12:19:21 +08:00
Tao Wang
062b7efc93
Remove unused handler methods ( #13394 )
2021-01-14 10:51:31 +08:00
fyrestone
4853aa96cb
[Dashboard] Fix missing actor pid ( #13229 )
2021-01-13 16:45:12 +08:00
Tao Wang
f587b9a50c
Remove unimplemented GetAll method in actor info accessor ( #13362 )
2021-01-13 09:55:27 +08:00
Eric Liang
470fda190a
Forgot overwrite parameter in Ray client internal kv
2021-01-11 17:50:06 -08:00
Eric Liang
de5bc24c60
Implement internal kv in ray client ( #13344 )
...
* kv internal
* fix
2021-01-11 14:54:52 -08:00
Eric Liang
fbb9795374
[client] Report number of currently active clients on connect ( #13326 )
...
* wip
* update
* update
* reset worker
* fix conn
* fix
* disable pycodestyle
2021-01-11 14:53:12 -08:00
ZhuSenlin
c39658f368
fix removal of task dependencies ( #13333 )
...
Co-authored-by: senlin.zsl <senlin.zsl@antfin.com>
2021-01-11 09:55:48 -08:00
Alex Wu
6ca4fb1054
[Pull manager] Only pull once per retry period ( #13245 )
...
* .
* docs
* cleanup
* .
* .
* .
* .
Co-authored-by: Alex <alex@anyscale.com>
2021-01-08 14:51:11 -08:00
Hao Chen
77cd0d5a21
Fix a crash problem caused by GetActorHandle in ActorManager ( #13164 )
2021-01-08 12:11:08 +08:00
Tao Wang
ab2229dcb7
[GCS] Remove old lightweight resource usage report code path ( #13192 )
2021-01-08 10:30:00 +08:00
Tao Wang
82c54c67ee
Publish job/worker info with Hex format instead of Binary ( #13235 )
2021-01-07 20:31:58 +08:00
fangfengbin
3669c02821
[GCS]Add gcs actor schedule strategy ( #13156 )
2021-01-07 15:44:33 +08:00
fangfengbin
9ae5bba7cf
[GCS]Fix gcs table storage GetAll and GetByJobId api bug ( #13195 )
2021-01-07 10:37:00 +08:00
Siyuan (Ryans) Zhuang
02ae6c5a9a
[Core] Fix incorrect comment ( #13228 )
2021-01-06 11:37:29 -08:00
Lingxuan Zuo
01d4638b49
[Log] fix spdlog init race ( #12973 )
...
* fix spdlog init race
* use global logger
* refine logger name and constructor
2021-01-06 11:02:54 -08:00
dHannasch
695833082d
[Redis] Note that each Redis Connect retry takes two minutes ( #12183 )
...
* Slightly alter error message so it's the same in both cases.
* Each retry takes about two minutes.
2021-01-06 11:00:58 -08:00
SangBin Cho
32dc5676b4
[Metrics] Record per node and raylet cpu / mem usage ( #12982 )
...
* Record per node and raylet cpu / mem usage
* Add comments.
* Addressed code review.
2021-01-05 21:57:21 -08:00
fangfengbin
779b3876f6
[GCS]Fix TestActorSubscribeAll bug ( #13193 )
2021-01-06 13:52:39 +08:00
fangfengbin
dd14e5a3b3
[BugFix][GCS]Fix gcs_actor_manager_test multithreading bug ( #13158 )
2021-01-06 10:47:06 +08:00
Tao Wang
a0bbf2bfc2
Notify listeners after registered node stored ( #13069 )
2021-01-05 11:18:03 +08:00
fangfengbin
88eaa87e3a
Remove unused file(object_manager_integration_test.cc) ( #12989 )
2021-01-05 11:09:36 +08:00
Eric Liang
dfb326d4b5
Surface object store spilling statistics in ray memory ( #13124 )
2021-01-04 17:35:39 -08:00
Stephanie Wang
b765914a1b
Revert "Enabling the cancellation of non-actor tasks in a worker's queue ( #12117 )" ( #13178 )
...
This reverts commit b4d688b4a6.
2021-01-04 17:27:48 -08:00
Siyuan (Ryans) Zhuang
46cf433f0e
[Core] Remove Arrow dependencies ( #13157 )
...
* remove arrow ubsan
* remove arrow build depend
* remove arrow buffer
2021-01-04 11:19:09 -08:00
Gabriele Oliaro
b4d688b4a6
Enabling the cancellation of non-actor tasks in a worker's queue ( #12117 )
...
* wrote code to enable cancellation of queued non-actor tasks
* minor changes
* bug fixes
* added comments
* rev1
* linting
* making ActorSchedulingQueue::CancelTaskIfFound raise a fatal error
* bug fix
* added two unit tests
* linting
* iterating through pending_normal_tasks starting from end
* fixup! iterating through pending_normal_tasks starting from end
* fixup! fixup! iterating through pending_normal_tasks starting from end
* post merge fixes
* added debugging instructions, pulled Accept() out of guarded loop
* removed debugging instructions, linting
2021-01-04 09:52:29 -08:00
Clark Zinzow
c2bff64699
[Core] Locality-aware leasing: Milestone 1 - Owned refs, pinned location ( #12817 )
...
* Locality-aware leasing for owned refs (pinned locations).
* LessorPicker --> LeasePolicy.
* Consolidate GetBestNodeIdForTask and GetBestNodeIdForObjects.
* Update comments.
* Turn on locality-aware leasing feature flag by default.
* Move local fallback logic to LeasePolicy, move feature flag check to CoreWorker constructor, add local-only lease policy.
* Add lease policy consulting assertions to the direct task submitter tests.
* Add lease policy tests.
* LocalityLeasePolicy --> LocalityAwareLeasePolicy.
* Add missing const declarations.
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
* Add RAY_CHECK for raylet address nullptr when creating lease client.
* Make the fact that LocalLeasePolicy always returns the local node more explicit.
* Flatten GetLocalityData conditionals to make it more readable.
* Add ReferenceCounter::GetLocalityData() unit test.
* Add data-intensive microbenchmarks for single-node perf testing.
* Add data-intensive microbenchmarks for simulated cluster perf testing.
* Remove redundant comment.
* Remove data-intensive benchmarks.
* Add locality-aware leasing Python test.
* Formatting changes in ray_perf.py.
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
2021-01-04 09:49:08 -08:00
fangfengbin
25f9f0d781
[GCS] Move resource usage info to gcs resource manager ( #13059 )
2020-12-25 15:17:45 +08:00
Siyuan (Ryans) Zhuang
cf9952a028
[Core] Remote outdated external store ( #13080 )
...
* remove outdated external store
2020-12-24 17:30:06 -08:00
Siyuan (Ryans) Zhuang
bf7f6a7de3
[Core] Remove cuda support in plasma store ( #13070 )
...
* remove cuda support in plasma store
2020-12-24 13:24:56 -08:00
Stephanie Wang
4461f9980a
Refactor TaskDependencyManager, allow passing bundles of objects to ObjectManager ( #13006 )
...
* New dependency manager
* Switch raylet to new DependencyManager
* PullManager accepts bundles
* Cleanup, remove old task dependency manager
* x
* PullManager unit tests
* lint
* Unit tests
* Rename
* lint
* test
* Update src/ray/raylet/dependency_manager.cc
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
* Update src/ray/raylet/dependency_manager.cc
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
* x
* lint
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
2020-12-23 18:36:00 -08:00
Stephanie Wang
d95c8b8a41
[core][new scheduler] Move tasks from ready to dispatch to waiting on argument eviction ( #13048 )
...
* Add index for tasks to dispatch
* Task dependency manager interface
* Unsubscribe dependencies and tests
* NodeManager
* Revert "Add index for tasks to dispatch"
This reverts commit c6ccb9aa306e00f80d34b991055e4e83872595ea.
* tmp
* Move back to waiting if args not ready
* update
2020-12-23 09:33:43 -08:00
DK.Pino
6e19facc7f
[GCS] Delete redis gcs client and redis_xxx_accessor ( #12996 )
2020-12-23 20:31:46 +08:00
fangfengbin
646c4201ac
[GCS]Decouple gcs resource manager and gcs node manager ( #13012 )
2020-12-23 11:25:01 +08:00
fyrestone
62a5832007
[Dashboard] Add GET /logical/actors API ( #12913 )
2020-12-23 11:14:23 +08:00
Alex Wu
ea8d782be1
[core] Pull Manager exponential backoff ( #13024 )
2020-12-21 19:17:51 -08:00