Alex Wu
18d85d2de9
Grpc based resource broadcast ( #15466 )
2021-05-05 11:20:08 -07:00
Eric Liang
a482034916
Flaky test builder for tests tagged "flaky" ( #15408 )
2021-04-20 00:19:07 -07:00
SangBin Cho
61d120557d
[Pubsub] Generalize pubsub, Move pubsub code to pubsub_lib module ( #15164 )
...
* cherry-pick-1
* cherry-pick-2
* cherry-pick-part-3
* Should work.
* Lint fix.
* Fix lint 2.
2021-04-07 20:40:39 -07:00
Siyuan (Ryans) Zhuang
7fd86f7e15
[Core] Use static callback instead of dynamic notification listener ( #15059 )
...
* static callback & remove outdated protocol
* address comments
* fix
* make fields constant
* fix windows compilation error
2021-04-02 22:33:41 -07:00
Alex Wu
4fba05ae4d
[core] Hybrid scheduling policy. ( #14790 )
2021-04-01 16:59:59 -07:00
SangBin Cho
005cff0092
Revert "Revert "[Core] Implement long polling-based pubsub to reduce … ( #14909 )
2021-04-01 09:03:15 -07:00
Alex Wu
1f4d4dfeb0
Gcs pull resource reports ( #14336 )
2021-03-29 11:36:30 -07:00
Siyuan (Ryans) Zhuang
87c79553e9
[Core] Remove code paths that contains plasma store executable ( #14950 )
...
* remove plasma store executable & never used tests
* set default behavior
* fix tests
2021-03-28 21:22:14 -07:00
SangBin Cho
ec3cfef883
Revert "[Core] Implement long polling-based pubsub to reduce number of WaitForObjectEviction requests in flight. ( #14638 )" ( #14905 )
...
This reverts commit 35ec91c4e0.
2021-03-24 11:22:48 -07:00
SangBin Cho
35ec91c4e0
[Core] Implement long polling-based pubsub to reduce number of WaitForObjectEviction requests in flight. ( #14638 )
...
* in progress.
* IN progress.
* lint.
* Updated code
* lint.
* In progress of writing tests.
* Finished implementation. Need cleanup & refactoring.
* fixing tests...
* Finish the impl.
* Fix typo.
* impl done. Only cleanup left.
* done.
* Finished clean up.
* Fix issues.
* Add a stronger consistency check.
* Addressed code review.
* lint.
* done.
* Addressed more.
* addressed all reviews.
* Addressed code review.
* lint.
* Added unit tests to assert no leak.
2021-03-23 23:47:08 -07:00
Yi Cheng
881a46e1d6
[core] RuntimeEnv GC in local node ( #14594 )
2021-03-18 14:55:11 -07:00
Clark Zinzow
566dcea56a
[Core] Added event loop metrics for posts. ( #14546 )
...
* Added event loop metrics for posts.
* io_context_proxy --> instrumented_io_context
* Fix feature flag, chrono-->absl, trim the stats, inline functions, reformat stats string.
* Make stats struct mutex plain lock instead of reader-writer lock.
* Mutex reader locking, std::array double braces initialization.
* Fix Bazel BUILD formatting.
2021-03-10 11:52:45 -08:00
Eric Liang
99a63b3dd1
Remove old scheduler and friends ( #14184 )
2021-03-03 18:29:15 -08:00
Stephanie Wang
5c6c9d5b91
[core] Spill tasks from waiting queue ( #14288 )
...
* Spill back waiting tasks
* test
* test
* todo
* Avoid iterating over args
* update
* lint
* Fix test
* test
* Test force spillback
* Unit test resource scheduler
* test
* travis?
* rename
* debug
* revert flaky test
* lint
* fix test
* fix
2021-03-02 22:30:02 -08:00
Stephanie Wang
a24ac13671
[core] Randomize actor ID to avoid collisions ( #14358 )
...
* Randomize actor ID
* Mix index and current time, add python test
* test
* nanos
2021-03-02 10:00:28 -08:00
Eric Liang
cc156f7b3c
Fix deadlock in unhandled exception handler and re-merge ( #3 ) ( #14192 )
2021-02-19 11:52:09 -08:00
SangBin Cho
66f93a3d63
Revert "Fix OSX error and re-merge unhandled exceptions handling ( #14138 )" ( #14180 )
...
This reverts commit ee584e8328.
2021-02-18 10:35:38 -08:00
Eric Liang
ee584e8328
Fix OSX error and re-merge unhandled exceptions handling ( #14138 )
2021-02-17 13:35:07 -08:00
architkulkarni
3ce03a52bc
Revert "Revert "Revert "Unhandled exception handler based on local ref counti… ( #14113 )" ( #14136 )
...
This reverts commit e457872fe1.
2021-02-16 11:47:09 -08:00
Eric Liang
e457872fe1
Revert "Revert "Unhandled exception handler based on local ref counti… ( #14113 )
...
* Revert "Revert "Unhandled exception handler based on local ref counting (#14049 )" (#14099 )"
This reverts commit b45ae76765.
* remove test
* fix
* fix
2021-02-15 14:11:11 -08:00
SangBin Cho
b45ae76765
Revert "Unhandled exception handler based on local ref counting ( #14049 )" ( #14099 )
...
This reverts commit 9dc671ae02.
2021-02-14 22:08:32 -08:00
Eric Liang
9dc671ae02
Unhandled exception handler based on local ref counting ( #14049 )
2021-02-12 22:58:38 -08:00
Stephanie Wang
0998d69968
[core] Admission control for pulling objects to the local node ( #13514 )
...
* Admission control, TODO: tests, object size
* Unit tests for admission control and some bug fixes
* Add object size to object table, only activate pull if object size is known
* Some fixes, reset timer on eviction
* doc
* update
* Trigger OOM from the pull manager
* don't spam
* doc
* Update src/ray/object_manager/pull_manager.cc
Co-authored-by: Eric Liang <ekhliang@gmail.com>
* Remove useless tests
* Fix test
* osx build
* Skip broken test
* tests
* Skip failing tests
Co-authored-by: Eric Liang <ekhliang@gmail.com>
2021-01-21 16:46:42 -08:00
fangfengbin
33b092de28
[GCS]Add gcs resource scheduler ( #13072 )
2021-01-14 20:05:55 +08:00
Siyuan (Ryans) Zhuang
46cf433f0e
[Core] Remove Arrow dependencies ( #13157 )
...
* remove arrow ubsan
* remove arrow build depend
* remove arrow buffer
2021-01-04 11:19:09 -08:00
Clark Zinzow
c2bff64699
[Core] Locality-aware leasing: Milestone 1 - Owned refs, pinned location ( #12817 )
...
* Locality-aware leasing for owned refs (pinned locations).
* LessorPicker --> LeasePolicy.
* Consolidate GetBestNodeIdForTask and GetBestNodeIdForObjects.
* Update comments.
* Turn on locality-aware leasing feature flag by default.
* Move local fallback logic to LeasePolicy, move feature flag check to CoreWorker constructor, add local-only lease policy.
* Add lease policy consulting assertions to the direct task submitter tests.
* Add lease policy tests.
* LocalityLeasePolicy --> LocalityAwareLeasePolicy.
* Add missing const declarations.
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
* Add RAY_CHECK for raylet address nullptr when creating lease client.
* Make the fact that LocalLeasePolicy always returns the local node more explicit.
* Flatten GetLocalityData conditionals to make it more readable.
* Add ReferenceCounter::GetLocalityData() unit test.
* Add data-intensive microbenchmarks for single-node perf testing.
* Add data-intensive microbenchmarks for simulated cluster perf testing.
* Remove redundant comment.
* Remove data-intensive benchmarks.
* Add locality-aware leasing Python test.
* Formatting changes in ray_perf.py.
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
2021-01-04 09:49:08 -08:00
Siyuan (Ryans) Zhuang
cf9952a028
[Core] Remove outdated external store ( #13080 )
...
* remove outdated external store
2020-12-24 17:30:06 -08:00
Stephanie Wang
4461f9980a
Refactor TaskDependencyManager, allow passing bundles of objects to ObjectManager ( #13006 )
...
* New dependency manager
* Switch raylet to new DependencyManager
* PullManager accepts bundles
* Cleanup, remove old task dependency manager
* x
* PullManager unit tests
* lint
* Unit tests
* Rename
* lint
* test
* Update src/ray/raylet/dependency_manager.cc
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
* Update src/ray/raylet/dependency_manager.cc
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
* x
* lint
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
2020-12-23 18:36:00 -08:00
DK.Pino
6e19facc7f
[GCS] Delete redis gcs client and redis_xxx_accessor ( #12996 )
2020-12-23 20:31:46 +08:00
DK.Pino
6404f1e609
[Placement Group][New scheduler] New scheduler pg implementation ( #12910 )
2020-12-18 11:56:45 +08:00
DK.Pino
153b24746c
[Placement Group] Refactor pg resource constrain in node manager ( #12538 )
...
* first version by pointer
* second version reference
* clean up
* add cpp ut
* lint
* extract LocalPlacementGroupManagerInterface
* lint
* fix comment
* add idempotency test
* lint
* fix pg ut
* fix pg ut
* python lint
* fix pg ut timeout
* python lint
* fix comment
* lint
* lint
2020-12-12 23:32:15 -08:00
Alex Wu
676ec363f6
[Object Manager] Pull Manager refactor ( #12335 )
2020-12-11 11:56:23 -08:00
Keqiu Hu
ee012532fb
[core] Use node manager client pool for GCS service #10398 ( #12368 )
...
* raylet client pool
* Fix merging conflict
* Fix documentation typo
* fix linting
* address comments
* fix typo
* remove unintended logging
* address comments
* fix bazel file lint error
2020-12-09 12:44:40 -08:00
fangfengbin
d5215745e4
[PlacementGroup] Introduce GcsResourceManager and avoid copying resources when scheduling placement groups ( #12253 )
2020-11-26 11:21:58 +08:00
Stephanie Wang
c49554fb7a
Abstract plasma store creation request queue ( #12039 )
2020-11-16 17:09:15 -08:00
Barak Michener
272edcca94
[ray_client]: Implement function calls ( #11922 )
2020-11-12 16:49:34 -08:00
Siyuan (Ryans) Zhuang
b8dda0e3d0
[Serialization] Fix buffer alignment issues ( #11888 )
...
* fix buffer alignment issues
* remove unused fields
* aligned memory allocation
* windows compat
* license. fix compiler warnings
* fix compilation error
* reinterpret_cast
2020-11-10 23:44:16 -08:00
Eric Liang
ee2da0cf45
[Core] PushManager for reliable broadcast ( #11869 )
2020-11-09 18:01:47 -08:00
Barak Michener
27c810a97e
Basic protos for ray client ( #11762 )
2020-11-05 16:23:54 -08:00
Stephanie Wang
0ba777af99
[Object spilling] Add policy to automatically spill objects on OutOfMemory ( #11673 )
2020-11-02 12:42:02 -08:00
Stephanie Wang
427b5af0ae
[Object spilling] Refactor raylet to add a local object manager class ( #11647 )
...
* Fix pytest...
* Release objects that have been spilled
* GCS object table interface refactor
* Add spilled URL to object location info
* refactor to include spilled URL in notifications
* improve tests
* Add spilled URL to object directory results
* Remove force restore call
* Merge spilled URL and location
* fix
* tmp
* refactor
* unit test skeleton
* unit testing
* unit test fixes
* cleanup
* cleanup
* update
* Separate pinning from waiting for object free, fixes pytest
* Update src/ray/raylet/local_object_manager.h
Co-authored-by: Eric Liang <ekhliang@gmail.com>
Co-authored-by: Tyler Westenbroek <westenbroekt@berkeley.edu>
Co-authored-by: Eric Liang <ekhliang@gmail.com>
2020-10-28 10:38:42 -04:00
Scott Graham
c4ae94d60b
[autoscaler] Azure deployment fixes ( #11613 )
...
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-10-27 15:27:18 -07:00
Max Fitton
caf3b04b27
[Dashboard] Turn on new dashboard by default pt 2 ( #11510 )
2020-10-23 15:52:14 -05:00
Ian Rodney
acbd12eabf
[Docker] Set Docker as the Default ( #11416 )
2020-10-19 10:53:30 -07:00
SangBin Cho
b1481c6acf
Revert "[PlacementGroup]Add node manager test framework ( #11174 )" ( #11398 )
...
This reverts commit 241e765d3a.
2020-10-14 11:09:20 -07:00
fangfengbin
241e765d3a
[PlacementGroup]Add node manager test framework ( #11174 )
...
* add part code
* add part code
* add part code
* add part code
* add part code
* add part code
* fix ut bug
* fix ut bug
Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-10-13 19:27:11 -07:00
fangfengbin
180c259702
[GCS]Remove unused api(ServiceBasedActorInfoAccessor::AsyncRegister/ServiceBasedActorInfoAccessor::AsyncUpdate) ( #11099 )
...
* remove unused gcs api
* fix ut bug
Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-10-02 00:54:28 -07:00
Eric Liang
6a227ae501
[autoscaler] Split autoscaler interface public private ( #10898 )
2020-09-18 18:16:23 -07:00
Basasuya
5e030db8a5
[EVENT] add log reporter ( #10419 )
2020-09-16 11:54:05 +08:00
chaokunyang
ccf27a9ad2
[Streaming] Fix streaming ci ( #10665 )
2020-09-09 16:53:43 +08:00