215 commits

Author SHA1 Message Date
Eric Liang
99a63b3dd1
Remove old scheduler and friends (#14184) 2021-03-03 18:29:15 -08:00
Stephanie Wang
5c6c9d5b91
[core] Spill tasks from waiting queue (#14288)
* Spill back waiting tasks

* test

* test

* todo

* Avoid iterating over args

* update

* lint

* Fix test

* test

* Test force spillback

* Unit test resource scheduler

* test

* travis?

* rename

* debug

* revert flaky test

* lint

* fix test

* fix
2021-03-02 22:30:02 -08:00
Stephanie Wang
a24ac13671
[core] Randomize actor ID to avoid collisions (#14358)
* Randomize actor ID

* Mix index and current time, add python test

* test

* nanos
2021-03-02 10:00:28 -08:00
Eric Liang
cc156f7b3c
Fix deadlock in unhandled exception handler and re-merge (#3) (#14192) 2021-02-19 11:52:09 -08:00
SangBin Cho
66f93a3d63
Revert "Fix OSX error and re-merge unhandled exceptions handling (#14138)" (#14180)
This reverts commit ee584e8328.
2021-02-18 10:35:38 -08:00
Eric Liang
ee584e8328
Fix OSX error and re-merge unhandled exceptions handling (#14138) 2021-02-17 13:35:07 -08:00
architkulkarni
3ce03a52bc
Revert "Revert "Revert "Unhandled exception handler based on local ref counti… (#14113)" (#14136)
This reverts commit e457872fe1.
2021-02-16 11:47:09 -08:00
Eric Liang
e457872fe1
Revert "Revert "Unhandled exception handler based on local ref counti… (#14113)
* Revert "Revert "Unhandled exception handler based on local ref counting (#14049)" (#14099)"

This reverts commit b45ae76765.

* remove test

* fix

* fix
2021-02-15 14:11:11 -08:00
SangBin Cho
b45ae76765
Revert "Unhandled exception handler based on local ref counting (#14049)" (#14099)
This reverts commit 9dc671ae02.
2021-02-14 22:08:32 -08:00
Eric Liang
9dc671ae02
Unhandled exception handler based on local ref counting (#14049) 2021-02-12 22:58:38 -08:00
Stephanie Wang
0998d69968
[core] Admission control for pulling objects to the local node (#13514)
* Admission control, TODO: tests, object size

* Unit tests for admission control and some bug fixes

* Add object size to object table, only activate pull if object size is known

* Some fixes, reset timer on eviction

* doc

* update

* Trigger OOM from the pull manager

* don't spam

* doc

* Update src/ray/object_manager/pull_manager.cc

Co-authored-by: Eric Liang <ekhliang@gmail.com>

* Remove useless tests

* Fix test

* osx build

* Skip broken test

* tests

* Skip failing tests

Co-authored-by: Eric Liang <ekhliang@gmail.com>
2021-01-21 16:46:42 -08:00
fangfengbin
33b092de28
[GCS]Add gcs resource scheduler (#13072) 2021-01-14 20:05:55 +08:00
Siyuan (Ryans) Zhuang
46cf433f0e
[Core] Remove Arrow dependencies (#13157)
* remove arrow ubsan

* remove arrow build depend

* remove arrow buffer
2021-01-04 11:19:09 -08:00
Clark Zinzow
c2bff64699
[Core] Locality-aware leasing: Milestone 1 - Owned refs, pinned location (#12817)
* Locality-aware leasing for owned refs (pinned locations).

* LessorPicker --> LeasePolicy.

* Consolidate GetBestNodeIdForTask and GetBestNodeIdForObjects.

* Update comments.

* Turn on locality-aware leasing feature flag by default.

* Move local fallback logic to LeasePolicy, move feature flag check to CoreWorker constructor, add local-only lease policy.

* Add lease policy consulting assertions to the direct task submitter tests.

* Add lease policy tests.

* LocalityLeasePolicy --> LocalityAwareLeasePolicy.

* Add missing const declarations.

Co-authored-by: SangBin Cho <rkooo567@gmail.com>

* Add RAY_CHECK for raylet address nullptr when creating lease client.

* Make the fact that LocalLeasePolicy always returns the local node more explicit.

* Flatten GetLocalityData conditionals to make it more readable.

* Add ReferenceCounter::GetLocalityData() unit test.

* Add data-intensive microbenchmarks for single-node perf testing.

* Add data-intensive microbenchmarks for simulated cluster perf testing.

* Remove redundant comment.

* Remove data-intensive benchmarks.

* Add locality-aware leasing Python test.

* Formatting changes in ray_perf.py.

Co-authored-by: SangBin Cho <rkooo567@gmail.com>
2021-01-04 09:49:08 -08:00
Siyuan (Ryans) Zhuang
cf9952a028
[Core] Remove outdated external store (#13080)
* remove outdated external store
2020-12-24 17:30:06 -08:00
Stephanie Wang
4461f9980a
Refactor TaskDependencyManager, allow passing bundles of objects to ObjectManager (#13006)
* New dependency manager

* Switch raylet to new DependencyManager

* PullManager accepts bundles

* Cleanup, remove old task dependency manager

* x

* PullManager unit tests

* lint

* Unit tests

* Rename

* lint

* test

* Update src/ray/raylet/dependency_manager.cc

Co-authored-by: SangBin Cho <rkooo567@gmail.com>

* Update src/ray/raylet/dependency_manager.cc

Co-authored-by: SangBin Cho <rkooo567@gmail.com>

* x

* lint

Co-authored-by: SangBin Cho <rkooo567@gmail.com>
2020-12-23 18:36:00 -08:00
DK.Pino
6e19facc7f
[GCS] Delete redis gcs client and redis_xxx_accessor (#12996) 2020-12-23 20:31:46 +08:00
DK.Pino
6404f1e609
[Placement Group][New scheduler] New scheduler pg implementation (#12910) 2020-12-18 11:56:45 +08:00
DK.Pino
153b24746c
[Placement Group] Refactor pg resource constrain in node manager (#12538)
* first version by pointer

* second version reference

* clean up

* add cpp ut

* lint

* extract LocalPlacementGroupManagerInterface

* lint

* fix comment

* add idempotency test

* lint

* fix pg ut

* fix pg ut

* python lint

* fix pg ut timeout

* python lint

* fix comment

* lint

* lint
2020-12-12 23:32:15 -08:00
Alex Wu
676ec363f6
[Object Manager] Pull Manager refactor (#12335) 2020-12-11 11:56:23 -08:00
Keqiu Hu
ee012532fb
[core] Use node manager client pool for GCS service #10398 (#12368)
* raylet client pool

* Fix merging conflict

* Fix documentation typo

* fix linting

* address comments

* fix typo

* remove unintended logging

* address comments

* fix bazel file lint error
2020-12-09 12:44:40 -08:00
fangfengbin
d5215745e4
[PlacementGroup] Introduce GcsResourceManager and avoid copying resources when scheduling placement groups (#12253) 2020-11-26 11:21:58 +08:00
Stephanie Wang
c49554fb7a
Abstract plasma store creation request queue (#12039) 2020-11-16 17:09:15 -08:00
Barak Michener
272edcca94
[ray_client]: Implement function calls (#11922) 2020-11-12 16:49:34 -08:00
Siyuan (Ryans) Zhuang
b8dda0e3d0
[Serialization] Fix buffer alignment issues (#11888)
* fix buffer alignment issues

* remove unused fields

* aligned memory allocation

* windows compat

* license. fix compiler warnings

* fix compilation error

* reinterpret_cast
2020-11-10 23:44:16 -08:00
Eric Liang
ee2da0cf45
[Core] PushManager for reliable broadcast (#11869) 2020-11-09 18:01:47 -08:00
Barak Michener
27c810a97e
Basic protos for ray client (#11762) 2020-11-05 16:23:54 -08:00
Stephanie Wang
0ba777af99
[Object spilling] Add policy to automatically spill objects on OutOfMemory (#11673) 2020-11-02 12:42:02 -08:00
Stephanie Wang
427b5af0ae
[Object spilling] Refactor raylet to add a local object manager class (#11647)
* Fix pytest...

* Release objects that have been spilled

* GCS object table interface refactor

* Add spilled URL to object location info

* refactor to include spilled URL in notifications

* improve tests

* Add spilled URL to object directory results

* Remove force restore call

* Merge spilled URL and location

* fix

* tmp

* refactor

* unit test skeleton

* unit testing

* unit test fixes

* cleanup

* cleanup

* update

* Separate pinning from waiting for object free, fixes pytest

* Update src/ray/raylet/local_object_manager.h

Co-authored-by: Eric Liang <ekhliang@gmail.com>

Co-authored-by: Tyler Westenbroek <westenbroekt@berkeley.edu>
Co-authored-by: Eric Liang <ekhliang@gmail.com>
2020-10-28 10:38:42 -04:00
Scott Graham
c4ae94d60b
[autoscaler] Azure deployment fixes (#11613)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-10-27 15:27:18 -07:00
Max Fitton
caf3b04b27
[Dashboard] Turn on new dashboard by default pt 2 (#11510) 2020-10-23 15:52:14 -05:00
Ian Rodney
acbd12eabf
[Docker] Set Docker as the Default (#11416) 2020-10-19 10:53:30 -07:00
SangBin Cho
b1481c6acf
Revert "[PlacementGroup]Add node manager test framework (#11174)" (#11398)
This reverts commit 241e765d3a.
2020-10-14 11:09:20 -07:00
fangfengbin
241e765d3a
[PlacementGroup]Add node manager test framework (#11174)
* add part code

* add part code

* add part code

* add part code

* add part code

* add part code

* fix ut bug

* fix ut bug

Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-10-13 19:27:11 -07:00
fangfengbin
180c259702
[GCS]Remove unused api(ServiceBasedActorInfoAccessor::AsyncRegister/ServiceBasedActorInfoAccessor::AsyncUpdate) (#11099)
* remove unused gcs api

* fix ut bug

Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-10-02 00:54:28 -07:00
Eric Liang
6a227ae501
[autoscaler] Split autoscaler interface public private (#10898) 2020-09-18 18:16:23 -07:00
Basasuya
5e030db8a5
[EVENT] add log reporter (#10419) 2020-09-16 11:54:05 +08:00
chaokunyang
ccf27a9ad2
[Streaming] Fix streaming ci (#10665) 2020-09-09 16:53:43 +08:00
SangBin Cho
b7040f1310
Revert "[Streaming] fix streaming ci (#9675)" (#10656)
This reverts commit 3645a05644.
2020-09-08 19:07:21 -07:00
chaokunyang
3645a05644
[Streaming] fix streaming ci (#9675) 2020-09-08 22:20:58 +08:00
Eric Liang
519354a39a
[api] Initial API deprecations for Ray 1.0 (#10325) 2020-08-28 15:03:50 -07:00
fyrestone
05c103af94
[Dashboard] Start the new dashboard (#10131)
* Use new dashboard if environment var RAY_USE_NEW_DASHBOARD exists; new dashboard startup

* Make fake client/build/static directory for dashboard

* Add test_dashboard.py for new dashboard

* Travis CI enable new dashboard test

* Update new dashboard

* Agent manager service

* Add agent manager

* Register agent to agent manager

* Add a new line to the end of agent_manager.cc

* Fix merge; Fix lint

* Update dashboard/agent.py

Co-authored-by: SangBin Cho <rkooo567@gmail.com>

* Update dashboard/head.py

Co-authored-by: SangBin Cho <rkooo567@gmail.com>

* Fix bug

* Add tests for dashboard

* Fix

* Remove const from Process::Kill() & Fix bugs

* Revert error check of execute_after

* Raise exception from DashboardAgent.run

* Add more tests.

* Fix compile on Linux

* Use dict comprehension instead of dict(generator)

* Fix lint

* Fix windows compile

* Fix lint

* Test Windows CI

* Revert "Test Windows CI"

This reverts commit 945e01051ec95cff5fcc1c0bc37045b46e7ad9a6.

* Fix ParseWindowsCommandLine bug

* Update src/ray/util/util.cc

Co-authored-by: Robert Nishihara <robertnishihara@gmail.com>

Co-authored-by: 刘宝 <po.lb@antfin.com>
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Co-authored-by: Robert Nishihara <robertnishihara@gmail.com>
2020-08-24 13:24:23 -07:00
Simon Mo
bedc2c24c8
Export Metrics in OpenCensus Protobuf Format (#10080) 2020-08-18 11:32:42 -07:00
Robert Nishihara
36e626e95d
Revert "[Dashboard] Start the new dashboard (#9860)" (#10116)
This reverts commit 739933e5b8.
2020-08-14 14:06:57 -07:00
fyrestone
739933e5b8
[Dashboard] Start the new dashboard (#9860) 2020-08-13 11:01:46 +08:00
Zhuohan Li
a6fed4820e
[Core] Preliminary implementation of ownership-based object directory (#9735) 2020-08-11 15:04:13 -07:00
Basasuya
0400a88bf1
[EVENT] Basic Function and Definition (#9657) 2020-08-11 17:36:07 +08:00
mehrdadn
5331c30e35
Improve Clang-IWYU to automatically make #include fixes (#9858)
Co-authored-by: Mehrdad <noreply@github.com>
2020-08-10 12:49:58 -07:00
Barak Michener
1d01c668f0
rpc: Core Worker client pool (#9934) 2020-08-07 16:34:29 -07:00
SangBin Cho
44826878ff
[Core] Remove Legacy Raylet Code (#9936)
* Remove a flag and some methods in node manager including HandleDisconnectedActor, ResubmitTask, and HandleTaskReconstruction

* Make actor creator always required + remove raylet transport

* Remove actor reporter + remove FinishAssignedActorCreationTask

* Remove actor tasks.

* Remove finishactortask and switched it to finishactorcreation task

* Remove reconstruction policy.

* Remove lineage cache.

* Formatting.

* Remove actor frontier code.

* Removed build error.

* Revert "Remove reconstruction policy."

This reverts commit 9d25c9bced4da5fbcac5d484d51013345f16513b.

* Recover HandleReconstruction to mark expired objects as failed.
2020-08-06 16:37:50 -07:00