Eric Liang
|
3e492a79ec
|
Increase the number of unique bits for actors to avoid handle collisions (#12894)
|
2020-12-18 15:59:03 -08:00 |
|
Eric Liang
|
92812f2e8a
|
Implement resource deadlock detection for new scheduler (#12961)
|
2020-12-18 12:17:54 -08:00 |
|
Barak Michener
|
5cfa1934e4
|
[ray_client]: Implement object retain/release and Data Streaming API (#12818)
|
2020-12-18 11:47:38 -08:00 |
|
fangfengbin
|
a442cd17e0
|
[GCS]Optimize gcs client reconnection (#12878)
* [GCS]Optimize gcs client reconnection
* fix review comment
* fix review comment
* add part code
Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>
|
2020-12-17 21:57:37 -08:00 |
|
dHannasch
|
cfefd7c70e
|
Test PingPort (#12954)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
|
2020-12-17 21:15:42 -08:00 |
|
DK.Pino
|
6404f1e609
|
[Placement Group][New scheduler] New scheduler pg implementation (#12910)
|
2020-12-18 11:56:45 +08:00 |
|
Tao Wang
|
17152c84a7
|
[Tiny]Print raylet info after register (#12566)
|
2020-12-18 11:22:13 +08:00 |
|
dHannasch
|
d747071dd9
|
Test shard_context on already-created boost::asio::io_service. (#12917)
|
2020-12-17 14:26:30 -08:00 |
|
Allen
|
e6cb4f4bd7
|
[Core] Add log of address and port (#12908)
Co-authored-by: Allen Yin <allenyin@anyscale.io>
|
2020-12-17 00:25:29 -08:00 |
|
Yi Cheng
|
40032541dc
|
[core] Introduce fetch_local to ray.wait (#12526)
|
2020-12-16 23:44:28 -08:00 |
|
Tao Wang
|
12231ec2a6
|
Optimize heartbeat manager initialization (#12911)
|
2020-12-17 14:24:23 +08:00 |
|
SangBin Cho
|
057687e534
|
[New Scheduler] Fix test_failure.py by supporting infeasible tasks (#12738)
* Fix the first issue.
* ip
* In Progress.
* In progress.
* done.
* Remove unnecessary logs.
* Addressed code review + fix some test failures.
* Try fixing issues.
* Fix issues.
* Fix test issues.
* Fix issues.
* done.
|
2020-12-16 21:27:50 -08:00 |
|
Alex Wu
|
8b783ecafa
|
Fix pull manager retry (#12907)
|
2020-12-16 14:18:43 -08:00 |
|
fangfengbin
|
91878d18b5
|
[PlacementGroup]Fix placement group wait api disorder bug (#12827)
* [PlacementGroup]Fix placement group wait api disorder bug
* fix review comment
* fix review comment
* fix review comment
* fix review comments
* increase num_heartbeats_timeout
Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>
|
2020-12-16 18:45:53 +08:00 |
|
Eric Liang
|
7ff314a5df
|
[New scheduler] Also unsubscribe get dependencies on unblock
|
2020-12-15 20:29:44 -08:00 |
|
Alex Wu
|
0031723ace
|
[New scheduler] Object spilling (#12857)
|
2020-12-15 11:05:38 -08:00 |
|
Edward Oakes
|
261b2f9053
|
Check for raylet PID as ppid in dashboard agent fate-sharing (#12867)
|
2020-12-15 12:13:11 -06:00 |
|
Max Fitton
|
e077bc4206
|
[Release] Bump master to 1.2.0 for 1.1.0 release (#12856)
|
2020-12-15 09:40:26 -08:00 |
|
Simon Mo
|
b291dd4486
|
[Metrics] Call GetMeasureDoubleByName to prevent override (#12860)
|
2020-12-15 09:39:39 -08:00 |
|
fangfengbin
|
43b9259d40
|
[GCS]GCS resource manager support scheduling resource (#12780)
* add part code
* add part code
* fix review comments
* rebase master
* add part code
* add part code
* fix review comments
* add part code
* fix code style
* fix ut bug
* fix ut bug
* fix review comments
* fix review comment
Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>
|
2020-12-15 10:27:55 +08:00 |
|
Tao Wang
|
ac53e2f857
|
[GCS]Tell dead nodes to commit suicide (#12792)
* [GCS]Tell dead nodes to commit suicide
* fix comment, add ut
|
2020-12-14 11:42:00 -08:00 |
|
Tao Wang
|
35f7d84dbe
|
Revert heartbeat interval to keep ci stable (#12836)
* Revert heartbeat interval to keep ci stable
* fix missing one
|
2020-12-14 16:58:40 +08:00 |
|
fangfengbin
|
1e02b28abe
|
[GCS]Move node resource info to gcs resource manager (#12775)
* add part code
* add part code
* fix review comments
* fix ut bug
* rebase master
* add part code
* fix ut bug
* fix ut bug
* fix review comments
* fix review comment
Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>
|
2020-12-13 20:37:34 +08:00 |
|
DK.Pino
|
153b24746c
|
[Placement Group] Refactor pg resource constrain in node manager (#12538)
* first version by pointer
* second version reference
* clean up
* add cpp ut
* lint
* extract LocalPlacementGroupManagerInterface
* lint
* fix comment
* add idempotency test
* lint
* fix pg ut
* fix pg ut
* python lint
* fix pg ut timeout
* python lint
* fix comment
* lint
* lint
|
2020-12-12 23:32:15 -08:00 |
|
Eric Liang
|
b73d4831d4
|
Add grace period before warning of resource deadlock
|
2020-12-12 12:02:13 -08:00 |
|
fangfengbin
|
c22990a537
|
[GCS]GCS node manager rename GetNode to GetAliveNode (#12781)
|
2020-12-12 20:34:43 +08:00 |
|
Alex Wu
|
aa64cd4534
|
[New scheduler] Fix test_global_state (#12586)
|
2020-12-11 21:47:01 -08:00 |
|
Eric Liang
|
1ce745cf44
|
Add automatic local GC and plasma debug logs every 10 minutes by default (#12804)
|
2020-12-11 17:09:58 -08:00 |
|
Alex Wu
|
676ec363f6
|
[Object Manager] Pull Manager refactor (#12335)
|
2020-12-11 11:56:23 -08:00 |
|
Eric Liang
|
4ad4463be6
|
Add comments to clarify purpose of new scheduler queues (#12730)
* update
* clarify
* update
|
2020-12-11 11:53:09 -08:00 |
|
Tao Wang
|
295b6e5ce4
|
Split heartbeat message (#12535)
* first
* xxx
* Split heartbeat message
* only report resource usage when changed
* Fix GetAllResourceUsage
* Fix report resource usage
* Increase default heartbeat interval
* regularize heartbeat interval in test case
|
2020-12-11 21:19:57 +08:00 |
|
Stephanie Wang
|
86b0741026
|
[new scheduler] Allocate resources for spilled back task to a local view of the remote node (#12711)
* Force report heartbeats if remote resources may be dirty
* lint
* typo
* typo
* unit test
* debug
* Revert "lint"
This reverts commit 6dc7e982ffee98185665eb7c3c8fde0d91938919.
* Revert "Force report heartbeats if remote resources may be dirty"
This reverts commit cbfa9405197df62f874107d55b46715ceae2abd2.
* Local view of resources
* debug travis
* debug
* debug
* debug
* weaken test
* cleanups
* lint
* Revert "debug travis"
This reverts commit 11ff5f4f84e64e9fbd4eecda5b3c7fd07a7130a4.
* revert
* const view, remove unused
|
2020-12-10 22:43:29 -05:00 |
|
Barak Michener
|
b7f246c451
|
[ray_client] Include multiple facets of the Ray API (#12736)
|
2020-12-10 19:09:34 -08:00 |
|
Edward Oakes
|
62d6b0a558
|
Fix max_task_retries for named actors (#12762)
|
2020-12-10 18:24:55 -06:00 |
|
Kai Yang
|
e3b5deb741
|
[Multi-tenancy] Delete flag enable_multi_tenancy and remove old code path (#10573)
|
2020-12-10 19:01:40 +08:00 |
|
Stephanie Wang
|
a776209aec
|
Revert "Fix dashboard agent check ppid is raylet pid (#12256)" (#12729)
This reverts commit 3ce9286977.
|
2020-12-09 17:20:38 -05:00 |
|
dHannasch
|
d455cae036
|
Add period to error message. (#12716)
|
2020-12-09 15:58:21 -06:00 |
|
Keqiu Hu
|
ee012532fb
|
[core] Use node manager client pool for GCS service #10398 (#12368)
* raylet client pool
* Fix merging conflict
* Fix documentation typo
* fix linting
* address comments
* fix typo
* remove unintended logging
* address comments
* fix bazel file lint error
|
2020-12-09 12:44:40 -08:00 |
|
Alex Wu
|
0b6e44efb8
|
[New scheduler] Cluster Resource Scheduler dynamic resources (for placement groups) (#12518)
* prepare implemented
* dynamic resources
* .
* commit
* .
* .
* Still needs to be cleaned up
* Passes basic tests + cleanup
* .
* .
* .
* Apply suggestions from code review
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
* fix
* lint
Co-authored-by: Alex <alex@anyscale.com>
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
|
2020-12-09 12:05:31 -08:00 |
|
fangfengbin
|
ef9ebbc636
|
[GCS]GCS based Actor Scheduling support actor colocation (#12707)
* [GCS]GCS based Actor Scheduling support actor colocation
* fix review comment
Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>
|
2020-12-09 11:54:23 -08:00 |
|
fyrestone
|
3ce9286977
|
Fix dashboard agent check ppid is raylet pid (#12256)
* Dashboard agent check ppid is raylet pid
* Improve implementation
* Refine code
* Make the RAY_NODE_PID environment required for dashboard agent
Co-authored-by: 刘宝 <po.lb@antfin.com>
|
2020-12-09 09:12:34 -05:00 |
|
Stephanie Wang
|
840de49161
|
Fix race condition between failure detection and references going out of scope (#12573)
* fix
* lint
* fix initialization
|
2020-12-08 23:49:55 -08:00 |
|
Stephanie Wang
|
50f28811ac
|
[new scheduler] Always spill back to a feasible node if the local node is not feasible (#12557)
* fix
lint
* feasible nodes
* Enable test, cleanup
* Revert "fix"
This reverts commit aef81d04c0b4560b758f846e1afdafbdb5552efe.
* unit test
* doc
|
2020-12-08 13:46:58 -05:00 |
|
fangfengbin
|
93c0eb249c
|
[PlacementGroup]Support acquire and return bundle resource from gcs resource manager (#12349)
|
2020-12-08 10:29:57 +08:00 |
|
fangfengbin
|
7e1422e925
|
[PlacementGroup]Fix placement group strict spread bug when node dead (#12647)
* [PlacementGroup]Fix strict spread bug when node dead
* fix ut bug
Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>
|
2020-12-07 21:50:28 +08:00 |
|
Philipp Moritz
|
73a1a232b9
|
Ray debugger stepping between tasks (#12075)
|
2020-12-06 21:50:18 -08:00 |
|
fangfengbin
|
260b07cf0c
|
[PlacementGroup]Add PlacementGroup wait java api (#12499)
* add part code
* add part code
* add part code
* add part code
* fix review comments
* fix compile bug
* fix compile bug
* fix review comments
* fix review comments
* fix code style
* add part code
* fix review comments
* fix review comments
* fix code style
* rebase master
* fix bug
* fix lint error
* fix compile bug
* fix newline issue
Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>
|
2020-12-05 16:40:04 +08:00 |
|
SangBin Cho
|
0138c2dbb4
|
[Metrics] Remove redundant unit specification. (#12595)
|
2020-12-04 00:06:21 -08:00 |
|
Kai Yang
|
21fcee28f9
|
[Java] Simplify Ray.init() by invoking ray start internally (#10762)
|
2020-12-04 14:33:45 +08:00 |
|
fangfengbin
|
ff34563539
|
[PlacementGroup]Fix bug that kill workers mistakenly when gcs restarts (#12568)
|
2020-12-03 17:50:48 +08:00 |
|