dHannasch
29cb32539e
[Core] If failed to connect to redis, try to say why. ( #11916 )
2020-11-10 18:22:10 -08:00
fangfengbin
433e4f32da
[GCS]Reduce get operations of worker table ( #11599 )
...
* [GCS]Reduce get operations of worker table
* fix ut bug
* fix ut bug
* fix review comment
Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-11-10 18:11:25 -08:00
Eric Liang
46f3652102
Remove repeat push timeout from object manager ( #11874 )
2020-11-10 16:26:53 -08:00
fangfengbin
543f7809a6
[GCS]Add gcs dump log(Part1) ( #11727 )
...
* add part code
* fix compile bug
* Fix bug
* Add part code
* fix review comment
* fix review comment
* fix lint error
* fix review comment
Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>
2020-11-10 14:10:03 +08:00
Eric Liang
ee2da0cf45
[Core] PushManager for reliable broadcast ( #11869 )
2020-11-09 18:01:47 -08:00
Kai Yang
904f48ebd9
[Core] Multi-tenancy: Pass job ID from Raylet to worker via env variable ( #11829 )
...
* Pass job ID from Raylet to worker via env variable
* fix
* fix
* fix
* lint
* fix
* fix test_object_spilling
* address comments
* lint
* fix
2020-11-09 11:02:15 -08:00
Tao Wang
77e3163630
[GCS]Only pass node id to node failure detector ( #11886 )
...
* [GCS]Only pass node id to node failure detector
* rename
2020-11-09 10:52:33 -08:00
fangfengbin
407a212816
[GCS]Fix TestActorTableResubscribe bug ( #11830 )
...
* fix compile bug
* [GCS]Fix TestActorTableResubscribe bug
* rm unused code
* fix lint error
* fix review comment
* fix ut bug
Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-11-08 23:50:05 -08:00
Stephanie Wang
61e41257e7
[Object spilling] Queue failed object creation requests until objects have been spilled ( #11796 )
...
* Queue creation requests
* Cleanup disconnected clients
* Remove unused
* todo
* FIFO order for create requests, remove warmup for IO workers
* test and lint
* disable test
* lint
* Skip on windows
2020-11-06 18:22:19 -05:00
SangBin Cho
e0ecf5d79d
Revert "[GCS]Open light heartbeat by default ( #11689 )" ( #11861 )
...
This reverts commit 612ddb2dd1.
2020-11-06 14:34:59 -08:00
Barak Michener
27c810a97e
Basic protos for ray client ( #11762 )
2020-11-05 16:23:54 -08:00
Eric Liang
f86c4f992c
Fix RAY_ENABLE_NEW_SCHEDULER=1 pytest test_advanced_2.py::test_zero_cpus_actor ( #11817 )
2020-11-05 16:02:04 -08:00
SangBin Cho
3cd1d7f44a
[Metrics] Implement basic metrics changes ( #11769 )
...
* Implement basic metrics changes
* Addressed code review.
* Fix build issue.
* Fix build issue.
2020-11-05 11:07:05 -08:00
Tao Wang
612ddb2dd1
[GCS]Open light heartbeat by default ( #11689 )
2020-11-05 12:11:00 +08:00
DK.Pino
50110b934c
[Placement Group]Enhance create placement group java api ( #11702 )
...
* enhance create pg java api
* add state for PlacementGroup
* fix comment
* move default pg
* make default pg name private
* add bundle size and bundle resource size check when placement group create
2020-11-05 09:59:36 +08:00
Stephanie Wang
952b71dc94
Fix windows build ( #11786 )
2020-11-03 12:38:45 -05:00
Stephanie Wang
0ba777af99
[Object spilling] Add policy to automatically spill objects on OutOfMemory ( #11673 )
2020-11-02 12:42:02 -08:00
Ameer Haj Ali
8d74a04a42
[autoscaler] Flag flip for resource_demand_scheduler should take into account queue ( #11615 )
2020-11-02 12:41:22 -08:00
fangfengbin
4a7d0e059d
[GCS]Optimize subscription perf ( #11669 )
...
* [GCS]Optimize subscription perf
* fix review comment
Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-11-02 09:46:04 -08:00
Eric Liang
48dee789b3
Add random actor placement; fix cancellation callback; update test skips ( #11684 )
2020-10-30 18:36:35 -07:00
DK.Pino
b10871a1f5
[Core]Fix get worker table bug ( #11516 )
...
* fix get_worker_table bug
* fix lint
* fix comment
* remove actor table
* fix comment
* fix get alive worker
* remove unused python import
2020-10-30 14:48:29 -07:00
SangBin Cho
6e2a1eac36
[Placement Group] Placement group automatic cleanup. ( #11546 )
...
* In progress. Done with all placement group manager code.
* It is working with job.
* Finished detached actor implementation.
* Fix minor issue.
* In progress.
* Addressed code review.
* Addressed code review.
* Addressed code review.
* Fix a build error.
2020-10-30 10:55:43 -07:00
Alex Wu
e022d12dc3
[New scheduler] Deflake test heartbeat ( #11586 )
...
* deflaked
* lint
* .
* Update cluster_task_manager_test.cc
Co-authored-by: Alex Wu <alex@anyscale.com>
2020-10-29 23:10:19 -07:00
architkulkarni
4175569d96
[Core] Add option to override environment variables for tasks and actors ( #11619 )
2020-10-29 14:22:44 -05:00
Simon Mo
e82ff08b0c
Fix asyncio plasma integration in cluster mode ( #11665 )
2020-10-29 11:53:10 -07:00
Lingxuan Zuo
0b7a3d9e02
[Log] new spdlog tool for ray ( #10967 )
...
* spdlog support
* fatal abort for spdlog
* print all logs in stderr if no logger given
* fix log test
* install signal handler for spdlog by reusing glog lib
* fix lint
* Avoid duplicated dump
* log rotation and fmt comments
* fix
2020-10-29 11:37:13 -07:00
Tao Wang
1d5694ddea
[GCS]Use direct getting instead of pub-sub to update load metrics in monitor.py ( #11339 )
2020-10-28 11:23:18 -07:00
Eric Liang
c933477915
[new scheduler] Pass test_basic and add CI builds with flag on ( #11635 )
2020-10-28 11:02:43 -07:00
Stephanie Wang
427b5af0ae
[Object spilling] Refactor raylet to add a local object manager class ( #11647 )
...
* Fix pytest...
* Release objects that have been spilled
* GCS object table interface refactor
* Add spilled URL to object location info
* refactor to include spilled URL in notifications
* improve tests
* Add spilled URL to object directory results
* Remove force restore call
* Merge spilled URL and location
* fix
* tmp
* refactor
* unit test skeleton
* unit testing
* unit test fixes
* cleanup
* cleanup
* update
* Separate pinning from waiting for object free, fixes pytest
* Update src/ray/raylet/local_object_manager.h
Co-authored-by: Eric Liang <ekhliang@gmail.com>
Co-authored-by: Tyler Westenbroek <westenbroekt@berkeley.edu>
2020-10-28 10:38:42 -04:00
fyrestone
05ad4c7499
[Dashboard] Optimize dashboard datacenter ( #11391 )
...
* Optimize dashboard datacenter
* Fix tests
* Fix tests
* Fix
* Fix CI
* python/build-wheel-macos.sh
Co-authored-by: 刘宝 <po.lb@antfin.com>
Co-authored-by: Max Fitton <maxfitton@anyscale.com>
2020-10-27 23:49:31 -07:00
fangfengbin
55a090fb16
[GCS]Optimize gcs client nodes get function ( #11424 )
...
* [GCS]Optimize gcs client nodes get function
* fix review comment
Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-10-27 21:13:19 -07:00
Tao Wang
273a712786
[GCS]Decouple node failure detector with resource related operations ( #11465 )
2020-10-27 15:52:42 -07:00
fangfengbin
ebe9a8865c
[GCS]Fix a bug that creates invalid connection ( #11590 )
...
Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-10-27 10:08:06 -07:00
Ian Rodney
2da6ad2176
[core] Better error message for named actor not found ( #11604 )
2020-10-26 09:46:02 -07:00
Tao Wang
0fbee4da0c
[GCS] Remove unused ReportBatchHeartbeat/SubscribeHeartbeat ( #11567 )
...
* Remove unused message ReportBatchHeartbeat
* add up
2020-10-25 21:06:28 -07:00
Eric Liang
d3ee83205b
Remove crashing assert in actor creation for old scheduler ( #11577 )
...
* remove assert
* warn log
2020-10-24 00:05:26 -07:00
DK.Pino
9f804ade5f
[Placement Group]Add get all placement group api ( #11460 )
...
* add get all interface for placement group
* add get all interface for placement group
* make it work
* fix lint
* fix lint
* fix comment
* add cpp test
* fix python lint
2020-10-23 11:46:48 -07:00
Alex Wu
e02f4c0157
[New scheduler] queue by shape ( #11381 )
2020-10-21 15:56:06 -07:00
Edward Oakes
5d7f271e7d
Add --worker-port-list option to ray start ( #11481 )
2020-10-21 14:46:45 -05:00
Tao Wang
da2d3fbcfc
Remove unused field in heartbeat message ( #11459 )
2020-10-21 10:49:16 -07:00
Kai Yang
078a22d676
[Core] Allow creating tasks/actors in a detached actor when driver has exited ( #11493 )
...
* Allow creating tasks/actors in a detached actor when driver has exited
* lint
* Address comment
2020-10-21 10:45:29 -07:00
Xuxue1
7200ddb72d
Fix code_search_path failed in java ( #11406 )
...
Co-authored-by: xujiqiang eigen <xujiqiang@hpc1.ipa.aidigger.com>
2020-10-21 18:10:48 +08:00
fangfengbin
a075e37695
[GCS]Fix TestActorTableResubscribe bug ( #11463 )
...
Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-10-20 22:32:41 -07:00
Lingxuan Zuo
aed739fbf4
[Log] Ignore callstacktrace test for windows ( #11413 )
2020-10-20 15:21:29 +08:00
DK.Pino
1b3b009f7a
[PlacementGroup]Add guarded by in placement group scheduler ut ( #11306 )
...
* add GUARDED_BY for success_placement_groups_ and failure_placement_groups_ vector
* update lint
* update lint
* update logical
* update lint
* change int to unsigned int
* update lint
* rename vector_mutex_ to placement_group_requests_mutex_
* resolve comment
* add int() for windows
2020-10-19 18:54:35 -07:00
fangfengbin
da89cb19eb
[GCS]Fix node info idempotent bug ( #11423 )
...
* [GCS]Fix node info idempotent bug
* Fix review comment
Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-10-19 10:23:33 +08:00
SangBin Cho
666fcde8ca
[Placement group] Input validation ( #11152 )
...
* Add a basic input validation.
* Addressed code review.
2020-10-14 13:56:41 -07:00
SangBin Cho
b1481c6acf
Revert "[PlacementGroup]Add node manager test framework ( #11174 )" ( #11398 )
...
This reverts commit 241e765d3a.
2020-10-14 11:09:20 -07:00
Lingxuan Zuo
149ec5f6bf
[Log] dump stacktrace from glog lib ( #11360 )
...
* dump stacktrace from glog lib
* fix windows compile
* add comments for getcallstack
2020-10-14 10:52:12 -07:00
Kai Yang
abc6126814
[Java] Release actor instance reference when Ray.exitActor() is invoked ( #11324 )
2020-10-14 13:12:59 +08:00