dHannasch
|
cfefd7c70e
|
Test PingPort (#12954)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
|
2020-12-17 21:15:42 -08:00 |
|
DK.Pino
|
6404f1e609
|
[Placement Group][New scheduler] New scheduler pg implementation (#12910)
|
2020-12-18 11:56:45 +08:00 |
|
Tao Wang
|
17152c84a7
|
[Tiny]Print raylet info after register (#12566)
|
2020-12-18 11:22:13 +08:00 |
|
dHannasch
|
d747071dd9
|
Test shard_context on already-created boost::asio::io_service. (#12917)
|
2020-12-17 14:26:30 -08:00 |
|
Allen
|
e6cb4f4bd7
|
[Core] Add log of address and port (#12908)
Co-authored-by: Allen Yin <allenyin@anyscale.io>
|
2020-12-17 00:25:29 -08:00 |
|
Yi Cheng
|
40032541dc
|
[core] Introduce fetch_local to ray.wait (#12526)
|
2020-12-16 23:44:28 -08:00 |
|
Tao Wang
|
12231ec2a6
|
Optimize heartbeat manager initialization (#12911)
|
2020-12-17 14:24:23 +08:00 |
|
SangBin Cho
|
057687e534
|
[New Scheduler] Fix test_failure.py by supporting infeasible tasks (#12738)
* Fix the first issue.
* ip
* In Progress.
* In progress.
* done.
* Remove unnecessary logs.
* Addressed code review + fix some test failures.
* Try fixing issues.
* Fix issues.
* Fix test issues.
* Fix issues.
* done.
|
2020-12-16 21:27:50 -08:00 |
|
Alex Wu
|
8b783ecafa
|
Fix pull manager retry (#12907)
|
2020-12-16 14:18:43 -08:00 |
|
fangfengbin
|
91878d18b5
|
[PlacementGroup]Fix placement group wait api disorder bug (#12827)
* [PlacementGroup]Fix placement group wait api disorder bug
* fix review comment
* fix review comment
* fix review comment
* fix review comments
* increase num_heartbeats_timeout
Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>
|
2020-12-16 18:45:53 +08:00 |
|
Eric Liang
|
7ff314a5df
|
[New scheduler] Also unsubscribe get dependencies on unblock
|
2020-12-15 20:29:44 -08:00 |
|
Alex Wu
|
0031723ace
|
[New scheduler] Object spilling (#12857)
|
2020-12-15 11:05:38 -08:00 |
|
Edward Oakes
|
261b2f9053
|
Check for raylet PID as ppid in dashboard agent fate-sharing (#12867)
|
2020-12-15 12:13:11 -06:00 |
|
Max Fitton
|
e077bc4206
|
[Release] Bump master to 1.2.0 for 1.1.0 release (#12856)
|
2020-12-15 09:40:26 -08:00 |
|
Simon Mo
|
b291dd4486
|
[Metrics] Call GetMeasureDoubleByName to prevent override (#12860)
|
2020-12-15 09:39:39 -08:00 |
|
fangfengbin
|
43b9259d40
|
[GCS]GCS resource manager support scheduling resource (#12780)
* add part code
* add part code
* fix review comments
* rebase master
* add part code
* add part code
* fix review comments
* add part code
* fix code style
* fix ut bug
* fix ut bug
* fix review comments
* fix review comment
Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>
|
2020-12-15 10:27:55 +08:00 |
|
Tao Wang
|
ac53e2f857
|
[GCS]Tell dead nodes to commit suicide (#12792)
* [GCS]Tell dead nodes to commit suicide
* fix comment, add ut
|
2020-12-14 11:42:00 -08:00 |
|
Tao Wang
|
35f7d84dbe
|
Revert heartbeat interval to keep ci stable (#12836)
* Revert heartbeat interval to keep ci stable
* fix missing one
|
2020-12-14 16:58:40 +08:00 |
|
fangfengbin
|
1e02b28abe
|
[GCS]Move node resource info to gcs resource manager (#12775)
* add part code
* add part code
* fix review comments
* fix ut bug
* rebase master
* add part code
* fix ut bug
* fix ut bug
* fix review comments
* fix review comment
Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>
|
2020-12-13 20:37:34 +08:00 |
|
DK.Pino
|
153b24746c
|
[Placement Group] Refactor pg resource constrain in node manager (#12538)
* first version by pointer
* second version reference
* clean up
* add cpp ut
* lint
* extract LocalPlacementGroupManagerInterface
* lint
* fix comment
* add idempotency test
* lint
* fix pg ut
* fix pg ut
* python lint
* fix pg ut timeout
* python lint
* fix comment
* lint
* lint
|
2020-12-12 23:32:15 -08:00 |
|
Eric Liang
|
b73d4831d4
|
Add grace period before warning of resource deadlock
|
2020-12-12 12:02:13 -08:00 |
|
fangfengbin
|
c22990a537
|
[GCS]GCS node manager rename GetNode to GetAliveNode (#12781)
|
2020-12-12 20:34:43 +08:00 |
|
Alex Wu
|
aa64cd4534
|
[New scheduler] Fix test_global_state (#12586)
|
2020-12-11 21:47:01 -08:00 |
|
Eric Liang
|
1ce745cf44
|
Add automatic local GC and plasma debug logs every 10 minutes by default (#12804)
|
2020-12-11 17:09:58 -08:00 |
|
Alex Wu
|
676ec363f6
|
[Object Manager] Pull Manager refactor (#12335)
|
2020-12-11 11:56:23 -08:00 |
|
Eric Liang
|
4ad4463be6
|
Add comments to clarify purpose of new scheduler queues (#12730)
* update
* clarify
* update
|
2020-12-11 11:53:09 -08:00 |
|
Tao Wang
|
295b6e5ce4
|
Split heartbeat message (#12535)
* first
* xxx
* Split heartbeat message
* only report resource usage when changed
* Fix GetAllResourceUsage
* Fix report resource usage
* Increase default heartbeat interval
* regularize heartbeat interval in test case
|
2020-12-11 21:19:57 +08:00 |
|
Stephanie Wang
|
86b0741026
|
[new scheduler] Allocate resources for spilled back task to a local view of the remote node (#12711)
* Force report heartbeats if remote resources may be dirty
* lint
* typo
* typo
* unit test
* debug
* Revert "lint"
This reverts commit 6dc7e982ffee98185665eb7c3c8fde0d91938919.
* Revert "Force report heartbeats if remote resources may be dirty"
This reverts commit cbfa9405197df62f874107d55b46715ceae2abd2.
* Local view of resources
* debug travis
* debug
* debug
* debug
* weaken test
* cleanups
* lint
* Revert "debug travis"
This reverts commit 11ff5f4f84e64e9fbd4eecda5b3c7fd07a7130a4.
* revert
* const view, remove unused
|
2020-12-10 22:43:29 -05:00 |
|
Barak Michener
|
b7f246c451
|
[ray_client] Include multiple facets of the Ray API (#12736)
|
2020-12-10 19:09:34 -08:00 |
|
Edward Oakes
|
62d6b0a558
|
Fix max_task_retries for named actors (#12762)
|
2020-12-10 18:24:55 -06:00 |
|
Kai Yang
|
e3b5deb741
|
[Multi-tenancy] Delete flag enable_multi_tenancy and remove old code path (#10573)
|
2020-12-10 19:01:40 +08:00 |
|
Stephanie Wang
|
a776209aec
|
Revert "Fix dashboard agent check ppid is raylet pid (#12256)" (#12729)
This reverts commit 3ce9286977.
|
2020-12-09 17:20:38 -05:00 |
|
dHannasch
|
d455cae036
|
Add period to error message. (#12716)
|
2020-12-09 15:58:21 -06:00 |
|
Keqiu Hu
|
ee012532fb
|
[core] Use node manager client pool for GCS service #10398 (#12368)
* raylet client pool
* Fix merging conflict
* Fix documentation typo
* fix linting
* address comments
* fix typo
* remove unintended logging
* address comments
* fix bazel file lint error
|
2020-12-09 12:44:40 -08:00 |
|
Alex Wu
|
0b6e44efb8
|
[New scheduler] Cluster Resource Scheduler dynamic resources (for placement groups) (#12518)
* prepare implemented
* dynamic resources
* .
* commit
* .
* .
* Still needs to be cleaned up
* Passes basic tests + cleanup
* .
* .
* .
* Apply suggestions from code review
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
* fix
* lint
Co-authored-by: Alex <alex@anyscale.com>
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
|
2020-12-09 12:05:31 -08:00 |
|
fangfengbin
|
ef9ebbc636
|
[GCS]GCS based Actor Scheduling support actor colocation (#12707)
* [GCS]GCS based Actor Scheduling support actor colocation
* fix review comment
Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>
|
2020-12-09 11:54:23 -08:00 |
|
fyrestone
|
3ce9286977
|
Fix dashboard agent check ppid is raylet pid (#12256)
* Dashboard agent check ppid is raylet pid
* Improve implementation
* Refine code
* Make the RAY_NODE_PID environment required for dashboard agent
Co-authored-by: 刘宝 <po.lb@antfin.com>
|
2020-12-09 09:12:34 -05:00 |
|
Stephanie Wang
|
840de49161
|
Fix race condition between failure detection and references going out of scope (#12573)
* fix
* lint
* fix initialization
|
2020-12-08 23:49:55 -08:00 |
|
Stephanie Wang
|
50f28811ac
|
[new scheduler] Always spill back to a feasible node if the local node is not feasible (#12557)
* fix
lint
* feasible nodes
* Enable test, cleanup
* Revert "fix"
This reverts commit aef81d04c0b4560b758f846e1afdafbdb5552efe.
* unit test
* doc
|
2020-12-08 13:46:58 -05:00 |
|
fangfengbin
|
93c0eb249c
|
[PlacementGroup]Support acquire and return bundle resource from gcs resource manager (#12349)
|
2020-12-08 10:29:57 +08:00 |
|
fangfengbin
|
7e1422e925
|
[PlacementGroup]Fix placement group strict spread bug when node dead (#12647)
* [PlacementGroup]Fix strict spread bug when node dead
* fix ut bug
Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>
|
2020-12-07 21:50:28 +08:00 |
|
Philipp Moritz
|
73a1a232b9
|
Ray debugger stepping between tasks (#12075)
|
2020-12-06 21:50:18 -08:00 |
|
fangfengbin
|
260b07cf0c
|
[PlacementGroup]Add PlacementGroup wait java api (#12499)
* add part code
* add part code
* add part code
* add part code
* fix review comments
* fix compile bug
* fix compile bug
* fix review comments
* fix review comments
* fix code style
* add part code
* fix review comments
* fix review comments
* fix code style
* rebase master
* fix bug
* fix lint error
* fix compile bug
* fix newline issue
Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>
|
2020-12-05 16:40:04 +08:00 |
|
SangBin Cho
|
0138c2dbb4
|
[Metrics] Remove redundant unit specification. (#12595)
|
2020-12-04 00:06:21 -08:00 |
|
Kai Yang
|
21fcee28f9
|
[Java] Simplify Ray.init() by invoking ray start internally (#10762)
|
2020-12-04 14:33:45 +08:00 |
|
fangfengbin
|
ff34563539
|
[PlacementGroup]Fix bug that kill workers mistakenly when gcs restarts (#12568)
|
2020-12-03 17:50:48 +08:00 |
|
Stephanie Wang
|
443339ab19
|
[core] Move out-of-memory handling into the plasma store and support async object creation (#12186)
* Refactor to extract creation request queue
* timer on oom
* move timer out
* Move evict_if_full and on_store_full into plasma store
* Remove client-side code
* revert
* Distinguish between transient and permanent OOM delays
* update
* Move out create request queue, unit test
* unit test
* Fix max retries
* test
* Do not pin restored objects
* First pass to add polling requests, unit test passes
* worker plasma client retries plasma requests
* cleanup
* Clean up after disconnected clients, check memory leaks
* Support immediate requests in request queue
* Option to try creating immediately
* lint
* Fix build, address comments
* doc
* fixes
* debug travis
* debug
* debug
* debug
* debug
* Revert "debug"
This reverts commit 6bf2f6ee5640e71630c4aecdb7ebf54911ea32db.
Revert "debug"
This reverts commit 73017099c9b06cdaae1217bf0e0f4d23ed68a9e5.
Revert "debug"
This reverts commit 5a155529e28cee9461a598b0cdf7b6a3cc194c93.
Revert "debug"
This reverts commit b50c2101afd45d4cf663daae857bfe1b40387703.
Revert "debug travis"
This reverts commit 012b8721dedf9bca46294ae75eee2815b160368b.
* Skip if new scheduler enabled
* error message
* merge
|
2020-12-02 13:25:54 -05:00 |
|
Ian Rodney
|
786f839ff3
|
[Windows] Fix windows build (#12555)
* fix remote watch
* remove const
* unfix remote-watch
* format
|
2020-12-02 09:37:40 -08:00 |
|
Kai Fricke
|
0a12eba603
|
Revert "Fix race condition between failure detection and references going out of scope (#12548)" (#12570)
This reverts commit 8801e87a
|
2020-12-02 10:20:17 -05:00 |
|
Stephanie Wang
|
8801e87afd
|
Fix race condition between failure detection and references going out of scope (#12548)
* fix
* lint
|
2020-12-01 20:52:30 -05:00 |
|