Alex Wu
136c8ff19e
[NewScheduler] Pass test_basic.py ( #10059 )
...
* .
* .
* Cleanup
* .
* whoops
* Update src/ray/raylet/scheduling/cluster_task_manager.h
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
* Update src/ray/raylet/scheduling/cluster_task_manager.h
Co-authored-by: Stephanie Wang <swang@cs.berkeley.edu>
* CR
* .
* .
* done
* .
* Unit tests
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
Co-authored-by: Stephanie Wang <swang@cs.berkeley.edu>
2020-08-21 15:00:08 -07:00
Barak Michener
f03caa4532
rpc: Follow-up by sharing the core worker client pool within the core worker. ( #10206 )
...
* Share CoreWorkerClientPool
* Format
2020-08-21 11:01:22 -07:00
Stephanie Wang
85e57a7a98
[Object spilling] Look up the location of the primary raylet from the owner's metadata ( #10197 )
...
* Get the primary copy from the owner, python test, some node manager fixes
* fixes and todo
* update
* lint
* fix build
2020-08-20 14:46:59 -07:00
fangfengbin
a462ae2747
[Placement Group]Add strict spread strategy ( #10174 )
...
* support STRICT_SPREAD strategy
* fix review comments
* rebase master
* fix lint error
* fix lint error
Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-08-20 10:18:58 -07:00
SangBin Cho
224933b5e4
[Placement Group] Remove API part 2 ( #10215 )
...
* Initial progress done.
* Fix mistake.
* Addressed code review.
* Fix cpp build issue.
* Addressed code review.
2020-08-20 09:50:13 -07:00
fangfengbin
9734dbca3e
[Placement Group]Reschedule bundles when the node of bundles is dead ( #10021 )
2020-08-19 13:24:42 -07:00
SangBin Cho
263df6163c
[Placement Group] Placement group remove api part 1 ( #10063 )
...
* Added basic rpc calls.
* fix issues.
* Fix the gcs server not getting request issue.
* In Progress.
* Basic logic done. Tests are required.
* In progress.
* In progress in refactoring context.
* Revert "In progress in refactoring context."
This reverts commit 38236256cf1306c60dd203e75d45ceb4509c8106.
* Working now.
* Python test works.
* Lint.
* Addressed code review.
* Addressed code review.
* Lint.
* Added unit tests.
* Done, but one of the unit tests fails
* Addressed code review.
* Addressed the last code review.
* Fix the wrong test case.
2020-08-18 12:44:00 -07:00
Simon Mo
bedc2c24c8
Export Metrics in OpenCensus Protobuf Format ( #10080 )
2020-08-18 11:32:42 -07:00
SangBin Cho
053188dfbe
[Placement Group] Support Placement Group state table. ( #10090 )
...
* Done.
* Addressed code review.
* Linting.
* Fix lint.
* Fix lint.
* Fix a test.
* Lint.
* Add a lint sleep to test.
* Fix the lint issue.
* Fixed doc build error.
2020-08-17 09:24:50 -07:00
fangfengbin
edd783bc32
[Placement Group]Add soft pack strategy ( #10099 )
2020-08-17 12:01:34 +08:00
Tao Wang
fba5906ce3
[GCS] Re-report heartbeat when gcs server restarts ( #10040 )
...
* Retry to send failed heartbeat when light heartbeat enabled
* Re-report heartbeat when gcs server restarts
* remove is_pubsub_server_restarted
* add lock per comment
* minor change, name related
2020-08-14 17:37:20 -07:00
Siyuan (Ryans) Zhuang
17ca1d8ff4
[Core] Object spilling prototype ( #9818 )
2020-08-14 15:39:10 -07:00
Robert Nishihara
36e626e95d
Revert "[Dashboard] Start the new dashboard ( #9860 )" ( #10116 )
...
This reverts commit 739933e5b8.
2020-08-14 14:06:57 -07:00
fangfengbin
3a6fa7d622
[Placement Group]Optimize placement group strict pack strategy ( #9924 )
...
* add part code
* add code
* add part code
* rm used import
* add part code
* add part code
* add part code
* add part code
* add part code
* add part code
* fix review comment
* add testcase
* use ResourceSet
* fix review comment
* fix ut bug
Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-08-13 23:58:52 -07:00
Simon Mo
01f38bc5d1
CoreWorker correctly push metrics to agent ( #10031 )
2020-08-13 16:44:53 -07:00
Ícaro Aragão
b77d6bf87d
[GCS] Improve fallback for getting local valid IP for GCS server ( #10004 )
2020-08-13 16:29:47 -05:00
SangBin Cho
86b1db3f11
[Stats] Make metrics report time configurable ( #10036 )
...
* Done.
* Lint.
* Address code review.
* Address code review.
* Remove wrong commit.
* Fix a test error.
2020-08-13 00:30:24 -07:00
fyrestone
739933e5b8
[Dashboard] Start the new dashboard ( #9860 )
2020-08-13 11:01:46 +08:00
fangfengbin
701e26e0af
[GCS]Add node realtime resource view ( #10043 )
2020-08-12 10:52:17 +08:00
Zhuohan Li
a6fed4820e
[Core] Preliminary implementation of ownership-based object directory ( #9735 )
2020-08-11 15:04:13 -07:00
SangBin Cho
946ae74817
[GCS Actor Management] Race condition around creating -> created phase. ( #10035 )
...
* Fix the issue.
* Address a code review.
2020-08-11 12:31:27 -07:00
Basasuya
0400a88bf1
[EVENT] Basic Function and Definition ( #9657 )
2020-08-11 17:36:07 +08:00
Kai Yang
3bc17fa62a
[Core] Multi-tenancy: Pass env variables from job config to worker processes ( #10022 )
2020-08-10 14:31:37 -07:00
Alex Wu
2ebf76c7a3
[New Scheduler] Additional unit tests ( #9990 )
2020-08-10 11:44:06 -07:00
SangBin Cho
eb6b10221e
Increase the num of trials to reduce the probability of failing sample_test ( #10007 )
2020-08-10 10:05:33 -07:00
Kai Yang
37821f0b4c
Support unlimited JVM options ( #9910 )
2020-08-10 16:08:33 +08:00
fangfengbin
26b36a1982
Optimize node register&worker failure log ( #9833 )
2020-08-10 11:41:45 +08:00
fangfengbin
a2bfdcbf24
[Placement Group]Trigger placement group scheduling when a new node is added ( #9905 )
2020-08-10 10:56:17 +08:00
Barak Michener
8e76796fd0
ci: Redo format.sh --all script & backfill lint fixes ( #9956 )
2020-08-07 16:49:49 -07:00
Barak Michener
1d01c668f0
rpc: Core Worker client pool ( #9934 )
2020-08-07 16:34:29 -07:00
Tao Wang
8bea875673
[TEST]Check if port is free before start up redis ( #9974 )
...
* [TEST]Check if port is free before start up redis
* per comment
2020-08-07 10:15:12 -07:00
SangBin Cho
44826878ff
[Core] Remove Legacy Raylet Code ( #9936 )
...
* Remove a flag and some methods in node manager including HandleDisconnectedActor, ResubmitTask, and HandleTaskReconstruction
* Make actor creator always required + remove raylet transport
* Remove actor reporter + remove FinishAssignedActorCreationTask
* Remove actor tasks.
* Remove finishactortask and switched it to finishactorcreation task
* Remove reconstruction policy.
* Remove lineage cache.
* Formatting.
* Remove actor frontier code.
* Removed build error.
* Revert "Remove reconstruction policy."
This reverts commit 9d25c9bced4da5fbcac5d484d51013345f16513b.
* Recover HandleReconstruction to mark expired objects as failed.
2020-08-06 16:37:50 -07:00
SangBin Cho
ec2f1a225e
[Stats] Metrics Export User Interface Part 1 ( #9913 )
...
* Metrics export port expose done.
* Support exposing metrics port + metrics agent service discovery through ray.nodes()
* Formatting.
* Added a doc.
* Linting.
* Change the location of metrics agent port.
* Addressed code review.
* Addressed code review.
2020-08-06 16:16:29 -07:00
Eric Liang
7d4f204aa8
[Placement Group] Allow scheduling a task on any bundle (-1, default) ( #9885 )
...
* wip
* wip
* fix tests
* wip
* wip
* wip
* wip
* wip
* add test
* update
* update
* remove debug
* comments
2020-08-06 00:05:21 -07:00
Tao Wang
1760586628
[GCS]Use an asynchronous PING to avoid blocking other operations ( #9871 )
...
* Use separate redis client to avoid its sync command blocking other operations
* use redis_failure_detector_client_
* use async command to ping redis
* format log
2020-08-05 19:10:53 -07:00
SangBin Cho
68899e2f8e
[GCS Actor Management] Fix race condition for DEPENDENCIES_UNREADY states. ( #9883 )
...
* Fix issues.
* Address code review.
* Addressed code review 2.
* Fix formatting.
* Addressed code review 3.
* Addressed code review.
2020-08-05 12:22:12 -07:00
SangBin Cho
685182923c
[Core] Fix detached actor local mode when gcs actor management is on. ( #9839 )
...
* Fix local mode detached actor.
* Revert changes.
2020-08-05 09:04:24 -07:00
kisuke95
ddc1e483fb
Fix actor table Delete bug ( #9499 )
2020-08-05 18:05:51 +08:00
kisuke95
80d2544f6b
Fix vector<bool> for loop ( #9907 )
2020-08-05 17:49:37 +08:00
fangfengbin
193d11ab8b
Optimize placement group log ( #9891 )
2020-08-05 14:41:32 +08:00
chaokunyang
3323ad9d59
[HOTFIX] Fix master build with missing placement group argument ( #9868 )
...
* fix common task submit default placement group
* fix java_function
2020-08-04 11:19:15 -07:00
Barak Michener
c16e1b9524
src/ray/protobuf: Break proto rules into a proper BUILD file ( #9792 )
2020-08-04 11:12:45 -07:00
Kai Yang
27cd323ce1
[Core] Multi-tenancy: Job isolation & implement per job config (except for env variables) ( #9500 )
2020-08-04 15:51:29 +08:00
kisuke95
28b1f7710c
[Core] Error info pubsub (Remove ray.errors API) ( #9665 )
2020-08-04 14:04:29 +08:00
fangfengbin
8c3fc1db76
Optimize actor creation log ( #9781 )
2020-08-04 10:29:30 +08:00
Zhijun Fu
4f2e4f31dd
async grpc calls should always return void ( #9533 )
2020-08-03 12:44:02 -07:00
Stephanie Wang
37a9c5783c
[core] Report resource load by shape ( #9806 )
...
* Report and aggregate resource load by shape
* python test
* python test
* x
* update
2020-07-31 16:57:30 -07:00
Eric Liang
b73080c85f
Allow tasks to be used with placement groups ( #9738 )
2020-07-31 10:51:37 -07:00
fangfengbin
3900643948
Add actor states definitions & transition diagram doc ( #9754 )
2020-07-31 15:35:25 +08:00
Kai Yang
02fd950252
[Java] Local and distributed ref counting in Java ( #9371 )
2020-07-31 11:49:31 +08:00