Eric Liang
992437eafe
Yield plasma lock to other threads during long-running gets (#16408)
2021-06-14 16:23:05 -07:00
Simon Mo
5f4495108e
Fix macOS compilation (#16412)
2021-06-14 13:30:38 -07:00
SangBin Cho
b4e2ca39f9
[Pubsub] Using OBOD command batch for both reference counting and wait for object eviction (#16334)
* In progress/
* Basic implementation for wait for object eviction done
* Port ref count
* Fixing tests.
* Fix unit tests and remove unnecessary code
* In progress with ref count test
* Command batch done.
* done.
* Add an implementation note
* Fix all issues.
* Addressed the first batch of code review.
* one last thing; fix unit test
* Fix all issues.
* Fix a type issue.
* Fix the type issue
2021-06-14 10:10:35 -07:00
Eric Liang
f93ca2b673
Make it much simpler to turn on event stats (#16401)
2021-06-14 09:51:24 -07:00
Eric Liang
acb439e8f2
Prioritize get requests over wait request, and disallow overcommit of wait requests in unlimited allocation mode (#16351)
2021-06-12 14:06:43 -07:00
Chen Shen
24e409f948
[spilled object push optimization 3/3] ObjectManager Push from Spilled Object (#16364)
2021-06-11 15:57:51 -07:00
Eric Liang
47bbca04be
Add fallback allocator stats to "ray memory" (#16362)
2021-06-10 18:33:59 -07:00
Chen Shen
dd677f367e
[spilled object push optimization 2/3] Refactor ObjectManager's Push for integrating with SpilledObject (#16352)
2021-06-10 16:29:19 -07:00
Eric Liang
b0b160b701
Make fallback directory for plasma configurable based on tempdir (#16361)
2021-06-10 14:55:10 -07:00
Alex Wu
83a458dbf2
[core] Resource broadcast cleanup (#16261)
* .
* use new protobuf
* .
* .
* add todo
* .
* comments
* lint
Co-authored-by: Alex Wu <alex@anyscale.com>
2021-06-10 10:27:28 -07:00
SangBin Cho
a3f82112d8
Fix test_reference_count_2 (#16353)
2021-06-10 10:12:39 -07:00
Chen Shen
54f9aef35b
[spilled object push optimization 1/3] create a SpilledObject that reads data in chunks.
2021-06-10 10:08:51 -07:00
Eric Liang
ae0e38b86d
Remove legacy feature flags / features (#16349)
2021-06-10 09:31:38 -07:00
Tao Wang
9741bc00c9
[Core]Limit job error message size (#16336)
2021-06-10 19:28:00 +08:00
Eric Liang
d390344a8f
Enable plasma fallback allocations by default (#16244)
2021-06-09 22:05:52 -07:00
Chen Shen
5fe03667b9
[RFC] add ray.util.get_locations() to look up objects' location. (#16130)
* Implement GetLocationFromOwner at CoreWorker that looks up the locations for a list of object ids
* plumbing GetLocationAPI to CoreWorker
* introduce primary_node_id in refcounter
* add python tests
* address comments
* fix lint
* remove C++ tests
* more tests
* add more tests
* linter
* lint
* lint
* address comments
* fix merge issue
* nits
2021-06-09 11:30:42 -07:00
Eric Liang
6c7147dc97
Fix active RPC tracking in event tracker
2021-06-09 10:53:30 -07:00
SongGuyang
874e947d6f
[runtime env] support create or delete runtime envs in agent (#15904)
2021-06-09 20:22:25 +08:00
SangBin Cho
1795e73cf2
Revert "Batch the AddSpilledURLs RPC (#16303)" (#16331)
This reverts commit deda35fb4a.
2021-06-09 00:33:57 -07:00
SangBin Cho
d9227d8506
[Pubsub] Pubsub module command batch part 1 (#16167)
* Basic command batch implemented.
* working.
* fix bugs.
* Improve a protobuf message.
* Update description of protobuf.
* Addressed code review.
2021-06-09 00:27:06 -07:00
Tao Wang
1c94906efc
[Test][Tiny]Check argv in right way (#16325)
2021-06-09 13:18:27 +08:00
Kai Yang
81be461ba2
[Core] Limit starting workers with maximum_startup_concurrency per worker type (#16214)
2021-06-09 13:11:53 +08:00
Eric Liang
deda35fb4a
Batch the AddSpilledURLs RPC (#16303)
2021-06-08 12:10:35 -07:00
Alex Wu
ae1cb12221
Revert "[GCS] Bookkeeping normal task resources in GCS (#16185)" (#16315)
This reverts commit f2384a9743.
2021-06-08 11:02:28 -07:00
Chong-Li
f2384a9743
[GCS] Bookkeeping normal task resources in GCS (#16185)
2021-06-08 19:58:15 +08:00
Lixin Wei
870a0c16a3
[Logging] Change std::exit to std::_Exit (#16280)
* change abort to exit
* change to std::_Exit
2021-06-08 00:14:17 -07:00
Lixin Wei
75196cf7f4
[scheduler] Clean up TaskRequest (#16288)
2021-06-07 11:38:34 -07:00
SangBin Cho
f867c27eda
[Object spilling] Fix race condition that deletes files at the wrong timing. (#16153)
* Error fix.
* remove debug code
* Add unit test
* Fix a test failure
2021-06-07 09:56:55 -07:00
Eric Liang
1d8cb2d19e
Add event stats documentation, fix misc race condition (#16236)
* update
* stats
* update
* fix
2021-06-06 12:44:30 -07:00
Stephanie Wang
dd73e8d31b
[core] Add object store debug information (#16232)
* debug
* todo
* periodic dump
* Build and debug
* x
* debug
* more debug
2021-06-04 19:42:00 -07:00
yncxcw
e13509075d
[Core] Make the exit type explicit for workers being killed TryKillingIdleWorkers (#16211)
2021-06-04 18:23:36 -07:00
Lixin Wei
59a2879216
[New Scheduler] Remove Useless Fields in Cluster Resource Data (#16254)
* non-tests done
* test modified
2021-06-04 18:00:13 -07:00
Eric Liang
527d51b83a
Allow configuring internal config with RAY_{name} env vars.
2021-06-04 15:37:31 -07:00
Lixin Wei
cf58cd76c7
[Logging] Disable Core Dumps in Fatal Logging (#16189)
2021-06-04 11:44:08 -07:00
Eric Liang
608991999c
Fix release resources race that leads to extra worker launches (#16184)
2021-06-03 18:35:45 -07:00
Eric Liang
a9db4e62cb
Unlimited plasma allocations by falling back to a filesystem allocator (off by default) (#16097)
2021-06-03 18:35:09 -07:00
SangBin Cho
611da62739
Fix atof bug (#16140)
2021-06-02 10:25:25 -07:00
Stephanie Wang
ce25d4e896
[core] Record Plasma object sources and dump on out of memory (#16179)
* debug
* lint, build
* clean up logs
* fix build
2021-06-02 10:04:15 -07:00
DK.Pino
9497a65a57
commit (#16183)
2021-06-02 06:50:04 -07:00
Lixin Wei
113c7fdecc
[core] Fix ResourceMapToTaskRequest (#16172)
2021-06-01 12:20:03 -07:00
Alex Wu
de0f856b68
[namespaces] Isolation for named placement groups (#16000)
2021-06-01 05:50:19 -07:00
Chong-Li
d5d0072635
Refactor RayletBasedActorScheduler (#16018)
2021-05-31 15:28:00 +08:00
Lixin Wei
3d37e3a315
[Refactor] Replace FractionalResourceQuantity with FixedPoint (#16052)
* refactor
* fix
* fix compilation
* fix
* fix cross-platform compilation
* lint
* fix test
* Revert "fix test"
This reverts commit 0ff23b125ce4159b91cc170dbc17b5ed70c9ab11.
* change rounding to truncating
* Update BUILD.bazel
Co-authored-by: Eric Liang <ekhliang@gmail.com>
2021-05-28 09:32:51 -07:00
SangBin Cho
d0dc9abdfc
[Plasma store] Improve the OOM logging message. (#16051)
2021-05-27 10:09:58 -07:00
Yi Cheng
5d0b302121
[core] Trigger global gc when plasma store is under pressure. (#15775)
2021-05-27 10:07:59 -07:00
Tao Wang
881e4913f1
Don't broadcast empty resources data (#16104)
2021-05-27 10:06:32 -07:00
DK.Pino
ea0ee86063
[Placement Group]Fix actor scheduling with Placement Group bug. (#16006)
2021-05-26 22:16:38 -07:00
Eric Liang
2f4628fdb4
Fix CHECK_FAIL when scheduling task with duplicate object requests (#16063)
2021-05-26 15:13:16 -07:00
Stephanie Wang
55bb1e93b4
[core] Wait for objects to be sealed before throwing OutOfMemory (#15955)
* Wait for objects to seal
* x
* comments
* error code
2021-05-26 14:18:32 -07:00
Eric Liang
3d1ba4a70e
Add feature flag for plasma overcommit (#16061)
2021-05-26 10:53:57 -07:00