Lixin Wei
462c7fb575
[streaming] export aligned_ symbols from raylet.so ( #12345 )
2020-11-24 10:16:12 -06:00
ZhuSenlin
1ae4d2873a
[GCS] refactor gcs initialization ( #11890 )
2020-11-24 21:11:18 +08:00
fangfengbin
be7938ee09
[PlacementGroup]Fix AddBundleLocations bug ( #12330 )
...
Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>
2020-11-24 16:57:17 +08:00
dHannasch
2c4514a2c0
[minor] Refactor to expose RedisContext::PingPort ( #12022 )
2020-11-23 20:39:50 -08:00
fangfengbin
084f03797b
[Placement Group]Placement Group supports gcs failover(Part3) ( #12036 )
2020-11-23 16:57:58 +08:00
Eric Liang
dac09bd569
Fix actor_registry_ copied on each heartbeat; Improve receive object chunk debug messages ( #12187 )
2020-11-19 16:45:37 -08:00
Stephanie Wang
7bf5145d36
Lint plasma source files ( #12171 )
2020-11-19 19:08:18 -05:00
Eric Liang
de86d5aff7
ActorStatisticalData() debug metrics bog down raylet with 100% CPU ( #12148 )
...
* comment out bad
* update
2020-11-19 11:38:44 -08:00
SangBin Cho
7d67af6c2a
[Metrics] Add stats to measure process startup time + scheduling stats. ( #12100 )
...
* Add new stats.
* Fix issues.
2020-11-19 11:04:26 -08:00
Ian Rodney
7fcce785ed
[hotfix] Fix windows build ( #12146 )
...
* [hotfix] fix windows
* remove debug logs
2020-11-19 11:00:19 -08:00
Ian Rodney
e086ddc18f
[core] Add Recursive task cancelation ( #11923 )
2020-11-18 15:18:40 -08:00
Alex Wu
e9c9ba9c9f
[New Scheduler] Don't start tasks if the owner is dead ( #12050 )
2020-11-18 11:34:19 -08:00
Ameer Haj Ali
eef624750c
[ray client] ray wait() implementation ( #12072 )
2020-11-18 11:33:57 -08:00
dHannasch
b41f4fdec2
Extract the connection logic to reduce duplication. ( #12016 )
2020-11-18 00:12:58 -08:00
fangfengbin
d87af0da88
[PlacementGroup]Add gcs placement group manager debug info ( #12061 )
...
Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>
2020-11-18 11:15:38 +08:00
fangfengbin
f400333841
[Placement Group]Placement Group supports gcs failover(Part2) ( #12003 )
...
* add testcase
* fix ut
* fix review comment
* fix review comment
* fix review comments
* fix ut bug
* add part code
* add part code
* add part code
* add testcase
* add part code
* fix ut bug
* fix ut timeout bug
* fix ut bug
Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>
2020-11-18 10:59:26 +08:00
Stephanie Wang
f6bdd5ab17
[New Scheduler] Spillback from the queue of tasks assigned to the local node ( #12084 )
2020-11-17 16:13:59 -08:00
dHannasch
b5dfdb2a21
Log the Redis shard addresses as originally received from the head GCS. ( #12011 )
2020-11-17 13:11:17 -08:00
dHannasch
010e6cef3f
Allow setting the RAY_BACKEND_LOG_LEVEL to trace. ( #12012 )
2020-11-17 13:10:23 -08:00
dHannasch
f0dcf01807
Clarify that Ray is not yet retrying to connect. ( #12013 )
2020-11-17 13:01:42 -08:00
DK.Pino
0f9e2fec12
[Placement Group] Add get / get all / remove interface for Placement Group Java api. ( #11821 )
...
* add placement group java get/get all interface
* add remove placement group api
* fix some issue like: Placement Group -> placement group
* extract duplicate code to placement group utils
* specify running mode for placement group ut
* update checkGlobalStateAccessorPointerValid -> validateGlobalStateAccessorPointer
* use THROW_EXCEPTION_AND_RETURN_IF_NOT_OK
* update pg log print
2020-11-17 12:32:39 +08:00
Tao Wang
d525e61288
[GCS]Open light heartbeat by default ( #11968 )
...
* [GCS]Open light heartbeat by default (#11689 )
* Add some unit tests
2020-11-16 18:21:47 -08:00
Stephanie Wang
c49554fb7a
Abstract plasma store creation request queue ( #12039 )
2020-11-16 17:09:15 -08:00
fangfengbin
8fb926565c
[Placement Group]Placement Group supports gcs failover (Part1) ( #11933 )
2020-11-16 14:42:56 +08:00
Gabriele Oliaro
4744ed01f7
Queueing non-actor tasks at the workers ( #11051 )
...
* separated adding tasks to queue and executing them (worker side)
* linting
* first review
* second rev
* rev3, all tests passing locally
* linting
* rev4
* linting
* finished rev4, all tests passing locally (mac)
* rev4, all tests passing locally
* linting
* rev5
* bug fix
* hopefully fixed build
* nvm
* ptr cast
* linting
* no special treatment for actor creation tasks
2020-11-12 12:44:13 -05:00
Tao Wang
3fbd8be851
[Placement Group]Do not really subtract resources, just count ( #11894 )
...
* [Placement Group]Do not really subtract resources, just count
* add todo
2020-11-12 00:01:19 -08:00
SangBin Cho
f80d812799
[Object Spilling] Introduce SpillWorker & RestoreWorker Pool to avoid IO worker deadlock. ( #11885 )
2020-11-11 18:20:14 -08:00
Tao Wang
92286660e4
[Core] Lazy create node manager clients, and destroy them ( #11928 )
2020-11-11 08:51:40 -08:00
Siyuan (Ryans) Zhuang
b8dda0e3d0
[Serialization] Fix buffer alignment issues ( #11888 )
...
* fix buffer alignment issues
* remove unused fields
* aligned memory allocation
* windows compat
* license. fix compiler warnings
* fix compilation error
* reinterpret_cast
2020-11-10 23:44:16 -08:00
dHannasch
29cb32539e
[Core] If failed to connect to redis, try to say why. ( #11916 )
2020-11-10 18:22:10 -08:00
fangfengbin
433e4f32da
[GCS]Reduce get operations of worker table ( #11599 )
...
* [GCS]Reduce get operations of worker table
* fix ut bug
* fix ut bug
* fix review comment
Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-11-10 18:11:25 -08:00
Eric Liang
46f3652102
Remove repeat push timeout from object manager ( #11874 )
2020-11-10 16:26:53 -08:00
fangfengbin
543f7809a6
[GCS]Add gcs dump log(Part1) ( #11727 )
...
* add part code
* fix compile bug
* Fix bug
* Add part code
* fix review comment
* fix review comment
* fix lint error
* fix review comment
Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>
2020-11-10 14:10:03 +08:00
Eric Liang
ee2da0cf45
[Core] PushManager for reliable broadcast ( #11869 )
2020-11-09 18:01:47 -08:00
Kai Yang
904f48ebd9
[Core] Multi-tenancy: Pass job ID from Raylet to worker via env variable ( #11829 )
...
* Pass job ID from Raylet to worker via env variable
* fix
* fix
* fix
* lint
* fix
* fix test_object_spilling
* address comments
* lint
* fix
2020-11-09 11:02:15 -08:00
Tao Wang
77e3163630
[GCS]Only pass node id to node failure detector ( #11886 )
...
* [GCS]Only pass node id to node failure detector
* rename
2020-11-09 10:52:33 -08:00
fangfengbin
407a212816
[GCS]Fix TestActorTableResubscribe bug ( #11830 )
...
* fix compile bug
* [GCS]Fix TestActorTableResubscribe bug
* rm unused code
* fix lint error
* fix review comment
* fix ut bug
Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-11-08 23:50:05 -08:00
Stephanie Wang
61e41257e7
[Object spilling] Queue failed object creation requests until objects have been spilled ( #11796 )
...
* Queue creation requests
* Cleanup disconnected clients
* Remove unused
* todo
* FIFO order for create requests, remove warmup for IO workers
* test and lint
* disable test
* lint
* Skip on windows
2020-11-06 18:22:19 -05:00
SangBin Cho
e0ecf5d79d
Revert "[GCS]Open light heartbeat by default ( #11689 )" ( #11861 )
...
This reverts commit 612ddb2dd1.
2020-11-06 14:34:59 -08:00
Barak Michener
27c810a97e
Basic protos for ray client ( #11762 )
2020-11-05 16:23:54 -08:00
Eric Liang
f86c4f992c
Fix RAY_ENABLE_NEW_SCHEDULER=1 pytest test_advanced_2.py::test_zero_cpus_actor ( #11817 )
2020-11-05 16:02:04 -08:00
SangBin Cho
3cd1d7f44a
[Metrics] Implement basic metrics changes ( #11769 )
...
* Implement basic metrics changes
* Addressed code review.
* Fix build issue.
* Fix build issue.
2020-11-05 11:07:05 -08:00
Tao Wang
612ddb2dd1
[GCS]Open light heartbeat by default ( #11689 )
2020-11-05 12:11:00 +08:00
DK.Pino
50110b934c
[Placement Group]Enhance create placement group java api ( #11702 )
...
* enhance create pg java api
* add state for PlacementGroup
* fix comment
* move default pg
* make default pg name private
* add bundle size and bundle resource size check when placement group create
2020-11-05 09:59:36 +08:00
Stephanie Wang
952b71dc94
Fix windows build ( #11786 )
2020-11-03 12:38:45 -05:00
Stephanie Wang
0ba777af99
[Object spilling] Add policy to automatically spill objects on OutOfMemory ( #11673 )
2020-11-02 12:42:02 -08:00
Ameer Haj Ali
8d74a04a42
[autoscaler] Flag flip for resource_demand_scheduler should take into account queue ( #11615 )
2020-11-02 12:41:22 -08:00
fangfengbin
4a7d0e059d
[GCS]Optimize subscription perf ( #11669 )
...
* [GCS]Optimize subscription perf
* fix review comment
Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-11-02 09:46:04 -08:00
Eric Liang
48dee789b3
Add random actor placement; fix cancellation callback; update test skips ( #11684 )
2020-10-30 18:36:35 -07:00
DK.Pino
b10871a1f5
[Core]Fix get worker table bug ( #11516 )
...
* fix get_worker_table bug
* fix lint
* fix comment
* remove actor table
* fix comment
* fix get alive worker
* remove unused python import
2020-10-30 14:48:29 -07:00