2383 commits

Author SHA1 Message Date
Lixin Wei
d287fc941b
[Core] Add Running Count to instrumented_io_context (#17664) 2021-08-12 13:56:40 -07:00
Chen Shen
9565fa549e
[Core][RFC] limit the total number of inlined bytes in task request rpc
Co-authored-by: Clark Zinzow <clarkzinzow@gmail.com>
2021-08-12 13:55:54 -07:00
Eric Liang
ce171f10a1
Remove legacy plasma unlimited and pull manager pinning flag (#17753) 2021-08-11 20:19:12 -07:00
Qing Wang
6d6a1ea43e
Support reading system configs from native in Java. (#17703)
* Support reading system configs from native in Java.

* Fix lint

* Lint cpp

* Fix Java cases.

* Address comments.

* Address comments.
2021-08-12 10:06:01 +08:00
Yi Cheng
02e79f3fe5
Revert "[Observability] Export useful metrics (#17578)" (#17752)
This reverts commit bd4db53df2.
2021-08-11 12:21:50 -07:00
SongGuyang
4176e43ef2
Remove binary printing from RAY_CHECK log (#17728) 2021-08-11 18:32:12 +08:00
Yi Cheng
bd4db53df2
[Observability] Export useful metrics (#17578)
* up

* up

* up

* up

* up

* up

* up

* up

* up

* up

* up

* up

* up

* checkpoint

* up

* up

* up

* up

* fix

* up

* up

* up

* up

* up

* up

* up

* up

* up

* up

* add comments

* up

* up

* up

* up

* add tests
2021-08-10 17:14:42 -07:00
SongGuyang
63c15d7ced
[core] make 'PopWorker' to be an async function (#17202)
* make 'PopWorker' to be an async function

* pop worker async works

* fix

* address comments

* bugfix

* fix cluster_task_manager_test

* fix

* bugfix of detached actor

* address comments

* fix

* address comments

* fix aioredis

* Revert "fix aioredis"

This reverts commit 041b983eac95b105ab0e853e84c4cf2647008431.

* bug fix

* fix

* fix test_step_resources test

* format

* add unit test

* fix

* add test case PopWorkerStatus

* address commit

* fix lint

* address comments

* add python test

* address comments

* make an independent function

* Update test_basic_3.py

Co-authored-by: Hao Chen <chenh1024@gmail.com>
2021-08-10 17:03:17 -07:00
SangBin Cho
6160c06c69
[Core] Fix a bug where get_actor crashes gcs if the actor is already killed. (#17670)
* Fix a bug where get_actor crashes gcs if the actor is already killed.

* Test the restart code path.

* Add an additional test

* Add a comment

* addressed code review.
2021-08-10 09:58:09 -07:00
Yi Cheng
473740b739
[gcs] Fix actor killing hang due to race condition (#17634)
* Revert "Revert "[gcs] Fix actor killing race condition (#17456)" (#17599)"

This reverts commit 381ffdb6d0.

* update

* format

* up
2021-08-09 21:11:26 -07:00
qicosmos
05da724521
[C++ Worker] Replace Ray::xxx with ray::xxx and update namespaces (#17388) 2021-08-10 11:17:59 +08:00
wanxing
8312628c30
Remove unused Spill function (#17607) 2021-08-09 10:10:03 -07:00
Tao Wang
5990b60f8b
[Core]Cache named actor in local in case of getting them from GCS frequently. (#17339)
* [Core]Cache named actor in local in case of getting them from GCS frequently

* lint

* fix nullptr

* typo

* add namespace to cache

* lint

* lock, reference and others

* lint

* fix comments and add test

* lint

* lint

* optimize test

* add necessary fields in pub for caching

* add removing test

* fix test
2021-08-09 14:01:57 +08:00
Hao Chen
0858f0e4f2
Change core worker C++ namespace to ray::core (#17610) 2021-08-08 23:34:25 +08:00
SangBin Cho
654718902f
Fix (#17660) 2021-08-07 18:07:27 -07:00
Qing Wang
4cc34588db
[Core] Support ConcurrentGroup part1 (#16795)
* Core change and Java change.

* Fix void call.

* Address comments and fix cases.

* Fix asyncio
2021-08-07 22:41:33 +08:00
SangBin Cho
4616e8a03c
Fix wrong invariant pubsub (#17620)
* ip

* loose check failure

* Fix the bug properly.

* Fix comments.
2021-08-06 14:14:54 -07:00
liuyang-my
12bd904594
[Serve] Define BackendConfig protobuf and adapt it in Java (#17201) 2021-08-06 09:50:45 -07:00
Zhi Lin
82123123c4
[object store] Java API for Assign the object owner in Ray.put() (#17237)
Co-authored-by: Qing Wang <kingchin1218@126.com>
Co-authored-by: Kai Yang <kfstorm@outlook.com>
2021-08-06 15:26:59 +08:00
Stephanie Wang
a06d71477f
[core] Do not spill back tasks blocked on args to blocked nodes (#17550)
Co-authored-by: Eric Liang <ekhliang@gmail.com>
2021-08-05 20:43:32 -07:00
Chen Shen
920a4e3d56
[core] Improve fatal message for fallback allocation (#17595) 2021-08-05 17:58:45 -07:00
Chen Shen
4ff35d43b3
[object store refactor 3/n] introduce object_store (#17332)
refactor-allocator

add object_store
2021-08-05 17:36:27 -07:00
SangBin Cho
8bc9286296
Remove an unused profile event code from object manager. (#17529)
* Remove an unused profile event code from object manager.

* Addressed code review.

* Temporarily skip a test

* lint
2021-08-05 17:13:16 -07:00
SangBin Cho
381ffdb6d0
Revert "[gcs] Fix actor killing race condition (#17456)" (#17599)
This reverts commit 521457b51b.
2021-08-05 15:54:03 -07:00
architkulkarni
e84ae6caa5
[Core] [runtime env] Avoid spurious worker startup (#17422) 2021-08-05 15:46:23 -05:00
Eric Liang
8ff3fce4ba
Add a warning if the number of queued tasks to an actor exceeds 5k (#17581) 2021-08-05 12:03:48 -07:00
SongGuyang
79bec61e12
[event] support WithField option in RAY_EVENT api (#17476) 2021-08-05 20:45:55 +08:00
Eric Liang
6db63990af
Don't capture child tasks in placement groups by default (#17527) 2021-08-04 16:09:45 -07:00
Chen Shen
53a0c74413
[nightly-test] fix non_streaming_shuffle_1tb_5000_partitions 2021-08-04 16:06:53 -07:00
architkulkarni
63708468df
[runtime env] [Doc] Runtime env doc and messaging improvements (#17547) 2021-08-04 12:28:42 -07:00
Yi Cheng
521457b51b
[gcs] Fix actor killing race condition (#17456) 2021-08-04 10:37:56 -07:00
Lixin Wei
a2b0d2f99f
[Core] Add Back Pressure to GCS's gRPC Server (#17427) 2021-08-04 10:36:39 -07:00
SongGuyang
3e42f54910
Support copyright format for c++ files (#14348) 2021-08-04 17:19:38 +08:00
Eric Liang
cb48f3a712
Be more conservative in warning about too many workers (#17531) 2021-08-03 22:30:18 -07:00
Eric Liang
fbd3f11533
OBOD log source error properly 2021-08-02 20:57:01 -07:00
Lixin Wei
6f4c8ebdb2
[Core] Remove the GetActorIfno RPC for Current Actor When Creating Actors (#17334) 2021-08-01 22:10:40 -07:00
Chen Shen
1b89fa8624
[object store refactor 2/n] More refactor on PlasmaAllocator, and add unit tests 2021-08-01 22:10:03 -07:00
Chen Shen
96c69f8c77
[object store refactor 1/n] Introduce IAllocator and PlasmaAllocator (#17307)
* initial commit

* address comments
2021-07-30 19:08:20 -07:00
Stephanie Wang
c9a2046287
[core] Update error message for hanging ray.get (#17449)
* Update error message

* x
2021-07-30 17:57:10 -07:00
Jiao
d67c57007b
change placement group report size to 1k (#17216)
Co-authored-by: Jiao Dong <jiaodong@anyscale.com>
2021-07-30 11:29:41 -07:00
Chen Shen
32803b53b0
Fix potential dead-lock (#17396) 2021-07-30 11:28:49 -07:00
wanxing
705248f4ee
[CoreWorker]Remove plasma_objects_only parameter (#17384) 2021-07-30 14:48:36 +08:00
Tao Wang
411c49746d
Remove deprecated HEARTBEAT table (#17405)
* Remove deprecated HEARTBEAT table

* incr by 1
2021-07-29 10:14:59 -07:00
Edward Oakes
7007c6271d
[runtime_env] Gracefully fail tasks when an environment fails to be set up (#17249) 2021-07-28 15:25:02 -05:00
Yi Cheng
72abf81900
[gcs] Fix GCS related issues: ByteSizeLong and redis connection (#17373) 2021-07-28 13:01:54 -07:00
Simon Mo
4a4210a083
Support streaming output of runtime env setup to logger/driver (#17306) 2021-07-27 16:39:15 -07:00
fyrestone
57b9b1bb0f
[Dashboard] Use a dedicated RPC to check the GCS is alive (#16330)
* Dashboard check gcs is alive

* Fix dashboard hangs at exit

* ray health-check call GCS CheckAlive

* Minor fixes

Co-authored-by: 刘宝 <po.lb@antfin.com>
2021-07-27 14:05:44 +08:00
DK.Pino
684e2b28e9
Placement group bug fix (#17320) 2021-07-26 21:03:35 -07:00
Tao Wang
d98ec7fc4d
Remove libray_redis_module (#17283) 2021-07-25 23:15:29 -07:00
Lixin Wei
ded239205f
[Core] Close RPC Server After GcsHearbeatManager (#17238) 2021-07-23 09:12:13 -07:00