Stephanie Wang
9a31166050
Option to disable profiling and task timeline ( #10414 )
2020-08-29 11:35:22 -07:00
Lixin Wei
eb66db3199
[Build] bug fixed for logging ( #10364 )
2020-08-28 09:17:08 -07:00
SangBin Cho
d206fbbc99
[Placement group] Scheduler map refactoring part 1. ( #10381 )
...
* In Progress
* done.
* Address code review.
2020-08-28 00:57:09 -07:00
SongGuyang
cb70864c04
[cpp worker] support cluster mode and object Put/Get works ( #9682 )
2020-08-28 13:53:36 +08:00
SangBin Cho
17f465d5c1
[Core] Improve raylet failure error msg ( #10345 )
...
* Improve error message.
* Lint.
* Addressed code review.
2020-08-27 12:53:18 -07:00
Clark Zinzow
0178d6318e
[Core] Expand job ID to 4 bytes by removing object flag bytes. ( #10187 )
2020-08-27 14:08:17 -05:00
Stephanie Wang
f75dfd60a3
[api] API deprecations and cleanups for 1.0 (internal_config and Checkpointable actor) ( #10333 )
...
* remove
* internal config updates, remove Checkpointable
* Lower object timeout default
* remove json
* Fix flaky test
* Fix unit test
2020-08-27 10:19:53 -07:00
Edward Oakes
60665fc936
Clean up task dependency and scheduler metrics ( #10340 )
2020-08-26 22:56:03 -05:00
Lixin Wei
4b856fa416
[Core]Async updating issue fixed for actor's num_restart ( #10176 )
...
* bug fixed for num_restart updating
* add log
* log updated
* lint
* fixed
* Update src/ray/gcs/gcs_server/gcs_actor_manager.cc
Co-authored-by: Stephanie Wang <swang@cs.berkeley.edu>
* bug fixed
* bug fixed
* test passed
Co-authored-by: Stephanie Wang <swang@cs.berkeley.edu>
2020-08-26 11:49:26 -07:00
Edward Oakes
c35ad8237d
[metrics] Clean up object manager stats ( #10316 )
2020-08-26 13:43:06 -05:00
Edward Oakes
916a19363f
Clean up actor metrics ( #10317 )
2020-08-26 10:21:15 -05:00
Edward Oakes
cbd9632f3a
Fix wait timeout logic ( #10199 )
2020-08-25 22:41:39 -05:00
fyrestone
08adbb371f
Cross language exception ( #10023 )
2020-08-26 10:46:05 +08:00
Edward Oakes
1e99b814f0
Remove unused scheduler states ( #10318 )
...
* remove unused state
* remove unused states
2020-08-25 18:56:21 -07:00
Stephanie Wang
d4537ac1ce
[core] Try to schedule tasks locally before spilling over to remote nodes ( #10302 )
...
* Regression test
* Spillback
* Remove check for actor tasks
2020-08-25 15:01:59 -07:00
kisuke95
24a7a8a04d
[Streaming] Build fix ( #10233 )
2020-08-25 11:37:21 -07:00
fyrestone
05c103af94
[Dashboard] Start the new dashboard ( #10131 )
...
* Use new dashboard if environment var RAY_USE_NEW_DASHBOARD exists; new dashboard startup
* Make fake client/build/static directory for dashboard
* Add test_dashboard.py for new dashboard
* Travis CI enable new dashboard test
* Update new dashboard
* Agent manager service
* Add agent manager
* Register agent to agent manager
* Add a new line to the end of agent_manager.cc
* Fix merge; Fix lint
* Update dashboard/agent.py
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
* Update dashboard/head.py
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
* Fix bug
* Add tests for dashboard
* Fix
* Remove const from Process::Kill() & Fix bugs
* Revert error check of execute_after
* Raise exception from DashboardAgent.run
* Add more tests.
* Fix compile on Linux
* Use dict comprehension instead of dict(generator)
* Fix lint
* Fix windows compile
* Fix lint
* Test Windows CI
* Revert "Test Windows CI"
This reverts commit 945e01051ec95cff5fcc1c0bc37045b46e7ad9a6.
* Fix ParseWindowsCommandLine bug
* Update src/ray/util/util.cc
Co-authored-by: Robert Nishihara <robertnishihara@gmail.com>
Co-authored-by: 刘宝 <po.lb@antfin.com>
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Co-authored-by: Robert Nishihara <robertnishihara@gmail.com>
2020-08-24 13:24:23 -07:00
Kai Yang
07f6cb17e4
[Core] Multi-tenancy: Refine worker env variable passing ( #10191 )
...
* Resolve issues with environment variable handling
* fix
* fix warning
* lint
Co-authored-by: Mehrdad <noreply@github.com>
2020-08-24 09:04:22 -07:00
fangfengbin
b61a79efd7
[Placement Group]Fix SigSegv bug ( #10262 )
...
* fix SigSegv bug
* fix review comments
* fix ut bug
Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-08-23 11:33:40 -07:00
Ian Rodney
32ed1a18b7
[hotfix] Fix lint in master ( #10254 )
2020-08-21 20:53:05 -07:00
Alex Wu
136c8ff19e
[NewScheduler] Pass test_basic.py ( #10059 )
...
* .
* .
* Cleanup
* .
* whoops
* Update src/ray/raylet/scheduling/cluster_task_manager.h
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
* Update src/ray/raylet/scheduling/cluster_task_manager.h
Co-authored-by: Stephanie Wang <swang@cs.berkeley.edu>
* CR
* .
* .
* done
* .
* Unit tests
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
Co-authored-by: Stephanie Wang <swang@cs.berkeley.edu>
2020-08-21 15:00:08 -07:00
Barak Michener
f03caa4532
rpc: Follow-up by sharing the core worker client pool within the core worker. ( #10206 )
...
* Share CoreWorkerClientPool
* Format
2020-08-21 11:01:22 -07:00
Stephanie Wang
85e57a7a98
[Object spilling] Look up the location of the primary raylet from the owner's metadata ( #10197 )
...
* Get the primary copy from the owner, python test, some node manager fixes
* fixes and todo
* update
* lint
* fix build
2020-08-20 14:46:59 -07:00
fangfengbin
a462ae2747
[Placement Group]Add strict spread strategy ( #10174 )
...
* support STRICT_SPREAD strategy
* fix review comments
* rebase master
* fix lint error
* fix lint error
Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-08-20 10:18:58 -07:00
SangBin Cho
224933b5e4
[Placement Group] Remove API part 2 ( #10215 )
...
* Initial progress done.
* Fix mistake.
* Addressed code review.
* Fix cpp build issue.
* Addressed code review.
2020-08-20 09:50:13 -07:00
fangfengbin
9734dbca3e
[Placement Group]Reschedule bundles when the node of bundles is dead ( #10021 )
2020-08-19 13:24:42 -07:00
SangBin Cho
263df6163c
[Placement Group] Placement group remove api part 1 ( #10063 )
...
* Added basic rpc calls.
* fix issues.
* Fix the gcs server not getting request issue.
* In Progress.
* Basic logic done. Tests are required.
* In progress.
* In progress in refactoring context.
* Revert "In progress in refactoring context."
This reverts commit 38236256cf1306c60dd203e75d45ceb4509c8106.
* Working now.
* Python test works.
* Lint.
* Addressed code review.
* Addressed code review.
* Lint.
* Added unit tests.
* Done, but one of unit tests fail
* Addressed code review.
* Addressed the last code review.
* Fix the wrong test case.
2020-08-18 12:44:00 -07:00
Simon Mo
bedc2c24c8
Export Metrics in OpenCensus Protobuf Format ( #10080 )
2020-08-18 11:32:42 -07:00
SangBin Cho
053188dfbe
[Placement Group] Support Placement Group state table. ( #10090 )
...
* Done.
* Addressed code review.
* Linting.
* Fix lint.
* Fix lint.
* Fix a test.
* Lint.
* Add a lint sleep to test.
* Fix the lint issue.
* Fixed doc build error.
2020-08-17 09:24:50 -07:00
fangfengbin
edd783bc32
[Placement Group]Add soft pack strategy ( #10099 )
2020-08-17 12:01:34 +08:00
Tao Wang
fba5906ce3
[GCS] Re-report heartbeat when gcs server restarts ( #10040 )
...
* Retry to send failed heartbeat when light heartbeat enabled
* Re-report heartbeat when gcs server restarts
* remove is_pubsub_server_restarted
* add lock per comment
* minor change, name related
2020-08-14 17:37:20 -07:00
Siyuan (Ryans) Zhuang
17ca1d8ff4
[Core] Object spilling prototype ( #9818 )
2020-08-14 15:39:10 -07:00
Robert Nishihara
36e626e95d
Revert "[Dashboard] Start the new dashboard ( #9860 )" ( #10116 )
...
This reverts commit 739933e5b8.
2020-08-14 14:06:57 -07:00
fangfengbin
3a6fa7d622
[Placement Group]Optimize placement group strict pack strategy ( #9924 )
...
* add part code
* add code
* add part code
* rm used import
* add part code
* add part code
* add part code
* add part code
* add part code
* add part code
* fix review comment
* add testcase
* use ResourceSet
* fix review comment
* fix ut bug
Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-08-13 23:58:52 -07:00
Simon Mo
01f38bc5d1
CoreWorker correctly push metrics to agent ( #10031 )
2020-08-13 16:44:53 -07:00
Ícaro Aragão
b77d6bf87d
[GCS] Improve fallback for getting local valid IP for GCS server ( #10004 )
2020-08-13 16:29:47 -05:00
SangBin Cho
86b1db3f11
[Stats] Make metrics report time configurable ( #10036 )
...
* Done.
* Lint.
* Address code review.
* Address code review.
* Remove wrong commit.
* Fix a test error.
2020-08-13 00:30:24 -07:00
fyrestone
739933e5b8
[Dashboard] Start the new dashboard ( #9860 )
2020-08-13 11:01:46 +08:00
fangfengbin
701e26e0af
[GCS]Add node realtime resource view ( #10043 )
2020-08-12 10:52:17 +08:00
Zhuohan Li
a6fed4820e
[Core] Preliminary implementation of ownership-based object directory ( #9735 )
2020-08-11 15:04:13 -07:00
SangBin Cho
946ae74817
[GCS Actor Management] Race condition around creating -> created phase. ( #10035 )
...
* Fix the issue.
* Address a code review.
2020-08-11 12:31:27 -07:00
Basasuya
0400a88bf1
[EVENT] Basic Function and Definition ( #9657 )
2020-08-11 17:36:07 +08:00
Kai Yang
3bc17fa62a
[Core] Multi-tenancy: Pass env variables from job config to worker processes ( #10022 )
2020-08-10 14:31:37 -07:00
Alex Wu
2ebf76c7a3
[New Scheduler] Additional unit tests ( #9990 )
2020-08-10 11:44:06 -07:00
SangBin Cho
eb6b10221e
Increase the num of trials to reduce the probability of failing sample_test ( #10007 )
2020-08-10 10:05:33 -07:00
Kai Yang
37821f0b4c
Support unlimited JVM options ( #9910 )
2020-08-10 16:08:33 +08:00
fangfengbin
26b36a1982
Optimize node register&worker failure log ( #9833 )
2020-08-10 11:41:45 +08:00
fangfengbin
a2bfdcbf24
[Placement Group]Trigger placement group scheduling when a new node is added ( #9905 )
2020-08-10 10:56:17 +08:00
Barak Michener
8e76796fd0
ci: Redo format.sh --all script & backfill lint fixes ( #9956 )
2020-08-07 16:49:49 -07:00
Barak Michener
1d01c668f0
rpc: Core Worker client pool ( #9934 )
2020-08-07 16:34:29 -07:00