
2157 commits

Author SHA1 Message Date
Tao Wang
2affe97f1a
[Core][Minor]Remove the hard check when disconnect GCS client (#16572) 2021-06-22 09:29:25 +08:00
Alex Wu
9b5c0c32da
Same worker id in python and c++ (#16568)
* .

* .

* test

Co-authored-by: Alex Wu <alex@anyscale.com>
2021-06-21 13:22:52 -07:00
Eric Liang
a0da009645
Allocate inbound object chunks using CreateRequestQueue instead of immediate allocation (#16523) 2021-06-20 09:22:12 -07:00
Alex Wu
319d4fb164
Job timestamp should always be in milliseconds (fixed) (#16548)
* .

* Revert "Revert "Job timestamp should always be in milliseconds (#16455)" (#16545)"

This reverts commit 5030ed8588.

* .

* .

* .

Co-authored-by: Alex Wu <alex@anyscale.com>
2021-06-18 17:07:21 -07:00
Amog Kamsetty
416cf3a2e7
Revert "Revert "Enable TryCreateImmediately to use the fallback allocation" (#16542)" (#16544)
This reverts commit 36fd741e6f.
2021-06-18 15:39:37 -07:00
Alex Wu
5030ed8588
Revert "Job timestamp should always be in milliseconds (#16455)" (#16545)
This reverts commit 1df19a04fe.
2021-06-18 12:37:05 -07:00
Amog Kamsetty
36fd741e6f
Revert "Enable TryCreateImmediately to use the fallback allocation" (#16542)
This reverts commit 41cf2e3d50.
2021-06-18 12:22:18 -07:00
architkulkarni
54d66ac637
[Core] iterate over entire dispatch queue instead of returning when worker unavailable (#16535) 2021-06-18 13:25:45 -05:00
Eric Liang
41cf2e3d50
Enable TryCreateImmediately to use the fallback allocation 2021-06-18 10:49:34 -07:00
architkulkarni
6498ca3995
[Core] [runtime env] Don't delete working_dir from runtime env (#16475) 2021-06-18 10:15:20 -05:00
Stephanie Wang
5eb51c8b26
[core] Make object directory robust to out-of-order updates (#16314)
* Sequence ops

* id

* fix

* lint
2021-06-17 20:40:35 -07:00
Alex Wu
6696c0c165
Revert "[Placement Group] Support infeasible placement groups for Placement Group. (#16188)" (#16509)
This reverts commit 7f91cfedd5.
2021-06-17 11:04:01 -07:00
Alex Wu
1df19a04fe
Job timestamp should always be in milliseconds (#16455)
* .

* .

* .

* .

Co-authored-by: Alex Wu <alex@anyscale.com>
2021-06-17 00:05:55 -07:00
Tao Wang
2523072a3d
[large scale]Use gcs client instead of redis client to increase job id (#16190)
Co-authored-by: Alex Wu <itswu.alex@gmail.com>
2021-06-17 15:01:32 +08:00
DK.Pino
7f91cfedd5
[Placement Group] Support infeasible placement groups for Placement Group. (#16188)
* init

* update comment

* update logical

* ut failing

* compile passing

* add ut

* lint

* fix comment

* lint

* fix ut and typo

* fix ut and typo

* lint

* typo
2021-06-16 21:48:39 -07:00
Alex Wu
45357ff590
[core] Fix multi-node placement group/job config bugs (#16345)
* .

* .

* seems to work?

* seems to work?

* .

* implement delete

* implement delete

* .

* tests

* .

* .

* .

* fix

* .

* .

* .

* .

* fix

* fix

* bump timeout

* bump timeout

* .

Co-authored-by: Alex Wu <alex@anyscale.com>
2021-06-16 21:12:20 -07:00
Eric Liang
3209084213
Fix fd reuse errors with plasma fallback allocation (#16451) 2021-06-16 19:28:23 -07:00
Amog Kamsetty
b986938f0f
Revert "[Pubsub] Use a pubsub module for Ownership based object directory (#16407)" (#16486)
This reverts commit 90599d3562.
2021-06-16 15:38:11 -07:00
Tao Wang
1a1b0da8c9
Run fn in specified io service completely (#15539) 2021-06-16 14:53:17 -07:00
Clark Zinzow
00eb833de2
[Core] Stopgap fix for async actor lost object bug, and adds reproduction as test. (#16414)
* Support asyncio with max_concurrency == 1.

* Added test that reproduces lost object error.

* Create a fiber thread per caller instead of sharing a fiber thread among all callers.

* Formatting.

* Remove debug print statement.

* Try to accommodate dumb stupid linter that apparently doesn't know that async list comprehensions landed in Python 3.6, let alone await in list literals.
2021-06-16 12:39:45 -07:00
SangBin Cho
90599d3562
[Pubsub] Use a pubsub module for Ownership based object directory (#16407)
* in progress

* In progress 2

* progress

* OBOD pubsub done

* Fix

* Fix a bug.

* Clean up getObjectLocationOwner

* Fix a build issue.

* Lint issue.

* test fix in progress

* continue debugging

* in progress

* Fix issues again.

* Formatting

* formatting

* fix issues.

* Revert "fix issues."

This reverts commit 2da577e68abc6278e03d64a60e8b96c3136145bf.

* Fix a critical bug.

* Revert "Revert "fix issues.""

This reverts commit 6546ecbd1eb9798de0bf990b30b85a3ca3e5b4ad.

* Addressed code review.
2021-06-16 09:15:13 -07:00
Eric Liang
1ef207abb6
Call Unblockifneeded (#16422) 2021-06-15 08:40:23 -07:00
Chong-Li
500248163f
[GCS] Fix: bookkeeping normal task resources in GCS (#16371) 2021-06-15 21:13:25 +08:00
Eric Liang
992437eafe
Yield plasma lock to other threads during long-running gets (#16408) 2021-06-14 16:23:05 -07:00
Simon Mo
5f4495108e
Fix macOS compilation (#16412) 2021-06-14 13:30:38 -07:00
SangBin Cho
b4e2ca39f9
[Pubsub] Using OBOD command batch for both reference counting and wait for object eviction (#16334)
* In progress.

* Basic implementation for wait for object eviction done

* Port ref count

* Fixing tests.

* Fix unit tests and remove unnecessary code

* In progress with ref count test

* Command batch done.

* done.

* Add an implementation note

* Fix all issues.

* Addressed the first batch of code review.

* one last thing; fix unit test

* Fix all issues.

* Fix a type issue.

* Fix the type issue
2021-06-14 10:10:35 -07:00
Eric Liang
f93ca2b673
Make it much simpler to turn on event stats (#16401) 2021-06-14 09:51:24 -07:00
Eric Liang
acb439e8f2
Prioritize get requests over wait request, and disallow overcommit of wait requests in unlimited allocation mode (#16351) 2021-06-12 14:06:43 -07:00
Chen Shen
24e409f948
[spilled object push optimization 3/3] ObjectManager Push from Spilled Object (#16364) 2021-06-11 15:57:51 -07:00
Eric Liang
47bbca04be
Add fallback allocator stats to "ray memory" (#16362) 2021-06-10 18:33:59 -07:00
Chen Shen
dd677f367e
[spilled object push optimization 2/3] Refactor ObjectManager's Push for integrating with SpilledObject (#16352) 2021-06-10 16:29:19 -07:00
Eric Liang
b0b160b701
Make fallback directory for plasma configurable based on tempdir (#16361) 2021-06-10 14:55:10 -07:00
Alex Wu
83a458dbf2
[core] Resource broadcast cleanup (#16261)
* .

* use new protobuf

* .

* .

* add todo

* .

* comments

* lint

Co-authored-by: Alex Wu <alex@anyscale.com>
2021-06-10 10:27:28 -07:00
SangBin Cho
a3f82112d8
Fix test_reference_count_2 (#16353) 2021-06-10 10:12:39 -07:00
Chen Shen
54f9aef35b
[spilled object push optimization 1/3] create a SpilledObject that reads data in chunks. 2021-06-10 10:08:51 -07:00
Eric Liang
ae0e38b86d
Remove legacy feature flags / features (#16349) 2021-06-10 09:31:38 -07:00
Tao Wang
9741bc00c9
[Core]Limit job error message size (#16336) 2021-06-10 19:28:00 +08:00
Eric Liang
d390344a8f
Enable plasma fallback allocations by default (#16244) 2021-06-09 22:05:52 -07:00
Chen Shen
5fe03667b9
[RFC] add ray.util.get_locations() to look up objects' location. (#16130)
* Implement GetLocationFromOwner at CoreWorker that looks up the locations
for a list of object ids

* plumbing GetLocationAPI to CoWorker

* introduce primary_node_id in refcounter

* add python tests

* address comments

* fix lint

* remove C++ tests

* more tests

* add more tests

* linter

* lint

* lint

* address comments

* fix merge issue

* nits
2021-06-09 11:30:42 -07:00
Eric Liang
6c7147dc97
Fix active RPC tracking in event tracker 2021-06-09 10:53:30 -07:00
SongGuyang
874e947d6f
[runtime env] support create or delete runtime envs in agent (#15904) 2021-06-09 20:22:25 +08:00
SangBin Cho
1795e73cf2
Revert "Batch the AddSpilledURLs RPC (#16303)" (#16331)
This reverts commit deda35fb4a.
2021-06-09 00:33:57 -07:00
SangBin Cho
d9227d8506
[Pubsub] Pubsub module command batch part 1 (#16167)
* Basic command batch implemented.

* working.

* fix bugs.

* Improve a protobuf message.

* Update description of protobuf.

* Addressed code review.
2021-06-09 00:27:06 -07:00
Tao Wang
1c94906efc
[Test][Tiny]Check argv in right way (#16325) 2021-06-09 13:18:27 +08:00
Kai Yang
81be461ba2
[Core] Limit starting workers with maximum_startup_concurrency per worker type (#16214) 2021-06-09 13:11:53 +08:00
Eric Liang
deda35fb4a
Batch the AddSpilledURLs RPC (#16303) 2021-06-08 12:10:35 -07:00
Alex Wu
ae1cb12221
Revert "[GCS] Bookkeeping normal task resources in GCS (#16185)" (#16315)
This reverts commit f2384a9743.
2021-06-08 11:02:28 -07:00
Chong-Li
f2384a9743
[GCS] Bookkeeping normal task resources in GCS (#16185) 2021-06-08 19:58:15 +08:00
Lixin Wei
870a0c16a3
[Logging] Change std::exit to std::_Exit (#16280)
* change abort to exit

* change to std::_Exit
2021-06-08 00:14:17 -07:00
Lixin Wei
75196cf7f4
[scheduler] Clean up TaskRequest (#16288) 2021-06-07 11:38:34 -07:00