Stephanie Wang
ada58abcd9
[Object spilling] Update object directory and reload spilled objects automatically ( #11021 )
...
* Fix pytest...
* Release objects that have been spilled
* GCS object table interface refactor
* Add spilled URL to object location info
* refactor to include spilled URL in notifications
* improve tests
* Add spilled URL to object directory results
* Remove force restore call
* Merge spilled URL and location
* fix
* CI
* build
* osx
* Fix multitenancy issues
* Skip windows tests
2020-10-02 15:52:42 -07:00
fangfengbin
180c259702
[GCS]Remove unused api(ServiceBasedActorInfoAccessor::AsyncRegister/ServiceBasedActorInfoAccessor::AsyncUpdate) ( #11099 )
...
* remove unused gcs api
* fix ut bug
Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-10-02 00:54:28 -07:00
Alex Wu
a866be381c
[New Scheduler] Heartbeat ( #11024 )
...
* .
* refactor
* .
* .
* done?
* .
* .
* .
* lint
* no light heartbeat, no tests, fields 2,3
* .
* manually clang format :(
* .
* .
* test
* .
* .
* task manager heartbeat
* lint
* .
* add reminder
* CR
* CR
* cleanup
* CR
* comment
* lint
* .
2020-10-01 15:54:53 -07:00
fangfengbin
138d6cced9
[GCS]Optimizing actor info query interface ( #11067 )
...
* add part code
* add part code
* fix review comment
* fix review comment
* fix review comment
* fix crash bug
* fix ut bug
* fix bug
* fix ut bug
Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-09-30 11:34:42 -07:00
Kai Yang
3504391fd2
[Core] Multi-tenancy: enable multi-tenancy by default ( #10570 )
...
* Add new job in Travis to enable multi-tenancy
* fix
* Update .bazelrc
* Update .travis.yml
* fix test_job_gc_with_detached_actor
* fix test_multiple_downstream_tasks
* fix lint
* Enable multi-tenancy by default
* Kill idle workers in FIFO order
* Update test
* minor update
* Address comments
* fix some cases
* fix test_remote_cancel
* Address comments
* fix after merge
* remove kill
* fix worker_pool_test
* fix java test timeout
* fix test_two_custom_resources
* Add a delay when killing idle workers
* fix test_worker_failure
* fix test_worker_failed again
* fix DisconnectWorker
* update test_worker_failed
* Revert some python tests
* lint
* address comments
2020-09-29 23:54:53 -07:00
Tao Wang
15ae8816f7
[GCS]Remove useless / heavy heartbeat pub ( #11132 )
2020-09-29 23:38:17 -07:00
Tao Wang
1db83764bf
[GCS]Use new getting all available resources interface instead of pub-sub … ( #10914 )
...
* Use new all available resources getting interface instead of pub-sub in state.py
* add missing server handler and test cases, fix comments
* add fine grained test assert
* per comments
* involve new added function _available_resources_per_node
* change ClientID to NodeID
* fix compile
* fix client id and lint
* robust tests check
* robust tests
2020-09-29 09:41:10 -07:00
SangBin Cho
0a6164ab15
[Core] Improve logging messages. ( #11082 )
2020-09-28 21:07:45 -07:00
fangfengbin
872219940b
[GCS]Fix miss PollOwnerForActorOutOfScope after gcs restarts bug ( #11054 )
...
* fix_RemoveActorFromOwner_crash_bug
* fix review comment
* fix review comment
* rm unused ut
* add testcase
* fix review comment
* rm unused import
* fix code style
* fix ut bug
Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-09-28 10:06:40 -07:00
Lingxuan Zuo
27e1f513e3
[Log] make glog flush and RAY_LOG thread-safe ( #11002 )
...
* make glog flush and RAY_LOG thread-safe
* dump error log to console
* mapping all levels to destination
* hack glog for exporting message to stdout if no base name given
* patch lint
* use stdout logger by default
* add raylet std/err pytest checker
* add worker logs file check
* fix asan check
* loop in glog enums
* fix python lint
* lint for autoindent
* fix indent lint
* make sure raylet.err is not empty
2020-09-28 22:15:15 +08:00
Tao Wang
25ac8f9aa5
[GCS]Use new flag to indicate whether resources are updated and update realtime resources view ( #10906 )
...
* Handle resources turning empty and update realtime view
* add up missing flag
* per comments
* use flag instead of special key to represent if resource changed
* Update src/ray/protobuf/gcs.proto
* fix lint in gcs.proto
* fix embarrassing mistake
Co-authored-by: fangfengbin <869218239a@zju.edu.cn>
2020-09-28 01:57:27 -07:00
fangfengbin
2e41a29c8f
[Placement Group]Support placement group request processing idempotent in raylet ( #10998 )
...
* add part code
* fix review comment
* fix review comment
* fix review comment
Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-09-28 01:56:43 -07:00
fang yeqing
0765d989ae
[Core] Simplify logic in node manager class. ( #11063 )
...
Co-authored-by: 逗角 <yeqing.fyq@antfin.com>
2020-09-28 01:54:06 -07:00
fangfengbin
142234cbcb
[GCS]Fix ServiceBasedGcsClientTest bug ( #11031 )
...
Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-09-28 14:50:36 +08:00
fangfengbin
86e5db4d59
[GCS]Fix GCS actor manager idempotent bug ( #11003 )
...
* [GCS]Fix GCS actor manager idempotent bug
* fix review comment
* fix review comment
* fix review comments
Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-09-27 21:12:42 -07:00
SangBin Cho
1e39c40370
[Placement Group] Capture child tasks by default. ( #11025 )
...
* In progress.
* Finished up.
* Improve comment.
* Addressed code review.
* Fix test failure.
* Fix ci failures.
* Fix CI issues.
2020-09-27 19:33:00 -07:00
fangfengbin
f0787a63da
[GCS] fix rpc port bug ( #11055 )
...
Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-09-27 19:08:48 -07:00
DK.Pino
db7097fb1f
[Refactor] Rename ClientId to NodeId ( #10992 )
...
* rename ClientId to NodeId
* format lint
* format lint
* fix conflicts
* rename new ClientId to NodeId
* update lint
* make same version of clang-format with travis ci
2020-09-27 10:24:21 -07:00
Stephanie Wang
552ebdbeda
[Core] Announce worker port at end of constructor ( #11036 )
2020-09-25 21:56:00 -07:00
SangBin Cho
29663d89f1
[Placement Group] Remove warning msg for placement groups. ( #11034 )
...
* Done.
* Addressed code review.
* Fixed typo.
* Addressed code review.
2020-09-25 20:53:42 -07:00
SangBin Cho
8abe13023f
[Metric] Fix issue 10634 ( #10940 )
...
* Fix.
* Revert "Fix."
This reverts commit 52c9c1ee646b551a4dd2b639c78be67683db2b1c.
* Addressed code review.
* Addressed code review.
2020-09-25 09:11:05 -07:00
Alex Wu
0f168bf2ef
[hotfix] Use ref in WorkerPool::TryKillingIdleWorkers ( #11017 )
2020-09-24 17:23:56 -07:00
SangBin Cho
5e6b887f2d
[Placement Group] Capture Child Task Part 1 ( #10968 )
...
* In progress.
* In progress.
* Done.
* Addressed code review.
* Increase timeout to make a test less flaky.
* Addressed code review.
* Addressed code review.
2020-09-24 09:02:03 -07:00
DK.Pino
4fa6523e4e
[Core] Remove unnecessary if judgment ( #10971 )
...
* Remove unnecessary if judgment
* format code style
2020-09-23 21:24:11 -07:00
fangfengbin
2a79571c29
[Placement Group] Optimize log ( #10974 )
...
Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-09-23 20:28:08 -07:00
Kai Yang
b251a445dd
[Core] Fix maximum_startup_concurrency caused by AnnounceWorkerPort ( #10853 )
...
* Fix maximum_startup_concurrency caused by AnnounceWorkerPort
* Address comment
* Update src/ray/raylet/worker_pool.h
Co-authored-by: Eric Liang <ekhliang@gmail.com>
2020-09-23 20:27:44 -07:00
Alex Wu
295782d411
[New Scheduler] Refactor cluster resource scheduler ( #10938 )
2020-09-23 15:46:31 -07:00
SangBin Cho
7931b6ce2e
Fix placement group bug failing in release test ( #10944 )
2020-09-23 12:37:28 -07:00
fangfengbin
a260e66016
[Placement Group]Fix CommitResources crash bug ( #10951 )
2020-09-23 17:24:53 +08:00
SangBin Cho
390107b6cb
[Core] Allow to pass node ip address to gcs server. ( #10946 )
...
* Allow to pass node ip address to gcs server.
* Fix.
* Addressed code review.
* Fixed an error.
* Addressed code review.
2020-09-23 01:52:26 -07:00
Kai Yang
864d1d2b59
[Core] Multi-tenancy: Kill idle workers in FIFO order ( #10597 )
...
* Kill idle workers in FIFO order
* Update test
* minor update
* Address comments
* fix after merge
* fix worker_pool_test
2020-09-22 10:59:11 -07:00
SangBin Cho
e3b4850224
[Placement group] Release test ( #10924 )
...
* Done.
* Lint.
* Addressed code review.
2020-09-22 00:49:04 -07:00
fangfengbin
1cc4543048
[GCS]Limit the number of profile table ( #10888 )
...
* add part code
* add part code
* fix compile bug
* fix compile bug
* fix review comments
Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-09-21 21:53:42 -07:00
fangfengbin
3e94c690c7
Fix flaky placement group test bug ( #10915 )
...
Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-09-20 19:50:55 -07:00
fangfengbin
3f90ec5963
[GCS]Fix actor idempotent bug ( #10856 )
2020-09-20 12:35:45 +08:00
fangfengbin
890fa6704f
[GCS]Fix MGetValues Command to send is too large bug ( #10877 )
2020-09-19 12:22:20 +08:00
SangBin Cho
bc74a10748
[Core] Fix Flaky GCS actor manager test ( #10600 )
...
* Try.
* Fix the issue.
* Fix.
2020-09-17 16:10:57 -07:00
SangBin Cho
fe4c6ab778
[Core] Remove unused credis related code. ( #10849 )
...
* Done.
* Lint.
2020-09-16 23:34:54 -07:00
fyrestone
50784e2496
[Dashboard] Dashboard node grouping ( #10528 )
...
* Add RAY_NODE_ID environment var to agent
* Node related data use node id as key
* ray.init() return node id; Pass test_reporter.py
* Fix lint & CI
* Fix comments
* Minor fixes
* Fix CI
* Add const to ClientID in AgentManager::Options
* Use fstring
* Add comments
* Fix lint
* Add test_multi_nodes_info
Co-authored-by: 刘宝 <po.lb@antfin.com>
2020-09-16 10:17:29 -07:00
Basasuya
5e030db8a5
[EVENT] add log reporter ( #10419 )
2020-09-16 11:54:05 +08:00
Kai Yang
4c03f7ca2f
[Core] Multi-tenancy: Reject worker registration if job has finished ( #10569 )
2020-09-14 14:49:31 +08:00
Kai Yang
a43817f34b
[Java] Attach owner address for pass-by-reference task arguments ( #9634 )
2020-09-14 11:46:59 +08:00
Xianyang Liu
8166d71bde
[Java] Support exchange ObjectRef between processes ( #10729 )
2020-09-13 11:54:45 +08:00
SangBin Cho
517e164fb7
[Core] Update the object manager pulling objects error message to warning. ( #10657 )
...
* Update the message to expose less implementation details and make the severity WARNING.
* Fix formatting.
2020-09-11 15:53:04 -07:00
Stephanie Wang
dbca2f9889
Fix segfault in network utils ( #10741 )
2020-09-11 15:35:03 -07:00
Kai Yang
23051385a4
Fix Java CI crash caused by incorrect destruction order in core worker ( #10709 )
2020-09-11 17:33:09 +08:00
Barak Michener
c6b1ed7f8f
release process: bump version number to 1.1.0.dev0 everywhere ( #10686 )
2020-09-10 16:00:21 -07:00
Max Fitton
3e8164ff8a
[Dashboard] Logical View Actor Class Grouping Details ( #10453 )
...
* wip
* wip
* wip
* wip
* Need to track the timestamp actors are created for the dashboard. This adds that functionality back in and deletes unused code
* Add the materialui lab packages to get access to the Alert component and fix up some vulnerabilities with npm audit.
* Finish supporting information on a per-actor-class basis in the logical view, add bug fixes around timestamps and infeasible task names, and add a new warning popup that shows if there are infeasible actors around.
* lint and add seconds annotation to actor lifetime values
* real lint
* remove typo
* Somehow missed something last lint
* Add new comments for actor states
* Add underscores to some private functions
* Add tooltips to the actor states on the logical view
* change test metrics to be aligned with new changes.
* lint
* Remove some unnecessary log lines and catch error that happens when we try to decode data from an unexpected source
* Re-add a function I had removed. It is used in the Java codebase.
Co-authored-by: Max Fitton <max@semprehealth.com>
2020-09-09 10:34:54 -07:00
Kai Yang
afa0216280
Remove the '--include-java' option ( #10594 )
2020-09-09 17:01:17 +08:00
chaokunyang
ccf27a9ad2
[Streaming] Fix streaming ci ( #10665 )
2020-09-09 16:53:43 +08:00