Commit graph

2383 commits

Author SHA1 Message Date
Stephanie Wang
d43d297d9a
[core] Attach call site to ObjectRefs, print on error (#17971)
* Attach call site to ObjectRef

* flag

* Fix build

* build

* build

* build

* x

* x

* skip on windows

* lint
2021-09-01 15:29:05 -07:00
Yi Cheng
d470e679df
[core] Add some mock headers for ray core (#18265)
* up

* up

* up

* format

* up

* up

* format
2021-09-01 13:04:35 -07:00
Jiajun Yao
fbb3ac6a86
Retry application-level errors (#18176)
* Retry application-level errors

* Retry application-level errors

* Push retry message to the driver
2021-09-01 10:53:06 -07:00
mwtian
be50c13251
[Client] Use a single RPC to fetch ClientObjectRefs passed in a list (#16944) 2021-08-31 16:31:13 -07:00
Guyang Song
be772df4dc
[Event] Add some error level events (#18118)
* add event 'RAY_WORKER_FAILURE' and 'RAY_DRIVER_FAILURE'

* add some events

* move event 'EL_RAY_NODE_REMOVED' to 'RemoveNode()'
2021-08-31 14:15:13 -07:00
SangBin Cho
d240d26525
[Object Spilling] Fix a bug where object url is empty. (#18193)
* Fix a bug

* Addressed code review.

* Fix a test
2021-08-31 10:10:28 -07:00
Stephanie Wang
8e06db7280
Revert "[Core] revert: revert Unified worker starter (#18008)" (#18228)
This reverts commit b9978dd02b.
2021-08-30 17:28:41 -07:00
SangBin Cho
2ee1b90c17
[Core] Batch obod location updates (#18016)
* Batch impl

* done

* Remove a client pool

* in progress

* Added unit tests.

* Handle owner failure case.

* Fix unit tests

* Addressed code review.
2021-08-30 11:04:08 -07:00
Eric Liang
1adce7da4e
Revert "Auto discover dashboard agent port (#17855)" (#18217)
This reverts commit 53ddb551d5.
2021-08-30 10:46:37 -07:00
SangBin Cho
0e968c1e82
[Core] Reduce spilling threshold (#17910)
* Lower the threshold

* ip

* Handle test failure

* lint

* last fix

* .

* Retry
2021-08-30 00:09:35 -07:00
fyrestone
53ddb551d5
Auto discover dashboard agent port (#17855) 2021-08-30 12:06:28 +08:00
Stephanie Wang
7bc1ef0dd9
[core] Prestart workers up to available CPU limit (#18166)
* Prestart workers according to num available CPUs

* lint

* Prestart min(available CPU, backlog)

* Fix test, adjust policy

* debug

* retry

* lint
2021-08-29 14:11:53 -07:00
mwtian
26679d62c5
[Core][ObjectRef] Change default to not record call stack during ObjectRef creation (#18078) 2021-08-27 15:45:34 -07:00
SangBin Cho
a25cc47399
[Core] Set keepalive only at gcs (#18086) 2021-08-27 01:26:51 -07:00
Edward Oakes
5c4c735119
[runtime_env] Make log message when deleting runtime_env INFO instead of ERROR (#18083) 2021-08-26 15:21:59 -05:00
SangBin Cho
405418f8e8
[Object Spilling] Unpin before updating URL (#17994)
* Unpin before updating URL

* Remove unnecessary logs.

* update compiling issue

* Check the consistent local state instead of stale information from obod.

* Fix the test

* Addressed code review.
2021-08-26 10:23:53 -07:00
Chen Shen
a29b157e2e
[core] better error message for lost objects (#18068) 2021-08-26 00:03:29 -07:00
Tao Wang
15a7514cf6
[Core] Some request counts are missing in debug info (#18069) 2021-08-25 14:02:03 -07:00
Guyang Song
16502cc438
[Event] support multi-thread context copy (#17919) 2021-08-25 14:03:20 +08:00
Tao Wang
0b5f5890f7
[Named Actor] Throw RayException when getting named actor timed out (#17998)
* [Named Actor]throw RayException when getting named actor timed out

* lint

* correct the message

* lint

* nice catch
2021-08-25 13:50:53 +08:00
Yi Cheng
995d3cb487
Update id_specification.md (#18035) 2021-08-24 10:49:56 -07:00
Alex Wu
6e3dd7b3cf
Revert "[Core]make thread of client manager in gcs server configurable (#17978)" (#18041)
This reverts commit f0edbf0d30.
2021-08-24 07:57:59 -07:00
Qing Wang
7c1f14ddd8
Do not connect in constructor to avoid potential risk. (#17916)
* Do not connect in ctor.

* Fix lint.

Co-authored-by: Qing Wang <jovany.wq@antgroup.com>
2021-08-24 16:41:30 +08:00
wanxing
abb46de4dc
[object store refactor 5/n] Add eviction policy tests (#17984)
* add eviction policy tests

* fix object_lifecycle_manager_test build

* make IsObjectExists private
2021-08-24 00:50:28 -07:00
Tao Wang
f0edbf0d30
[Core] Make thread of client manager in gcs server configurable (#17978) 2021-08-24 11:27:35 +08:00
chenk008
b9978dd02b
[Core] revert: revert Unified worker starter (#18008) 2021-08-23 13:34:32 -07:00
Clark Zinzow
5ca28b1cc8
[Core] Update Bazel (to 3.4.1), gRPC, boringssl, and absl as a precursor to gRPC streaming PR. (#17903)
* Update Bazel (to 3.4.1), gRPC, boringssl, absl.

* Always reinstall Bazel if needing to upgrade to a new Bazel version.

* Add patch for properly detecting Windows Python headers when building gRPC.

* Add minimum Bazel version check.

* Update docs with new Bazel version.
2021-08-21 11:33:11 -07:00
Edward Oakes
b969aa3c80
[dashboard] Don't start dashboard agent when missing dependencies (#17966) 2021-08-21 01:04:21 -07:00
Lixin Wei
05502da271
Add dispatch proxy to event tracker (#17983) 2021-08-20 15:32:10 -07:00
SangBin Cho
cd42d30d7b
[Core] Removing GCS object directory from raylet (#17962) 2021-08-20 12:57:16 -07:00
Stephanie Wang
b8fe776638
[core] Fix inlined nested ids (#17834)
* test

* Use ObjectRef instead of ObjectID in nested refs

* java

* doc

* java

* build

* build

* x

* lint

* simplify

* fix
2021-08-20 08:58:29 -07:00
Eric Liang
236b772465
Revert "[GCS] GCS Based Actor Scheduler (#16580)" (#17941)
This reverts commit a9b4545502.
2021-08-19 21:46:52 -07:00
Eric Liang
661ac4e37b
Remove last traces of ref-counting flag (#17932) 2021-08-19 21:08:13 -07:00
Chen Shen
a16a25852a
[Core] fix event race condition (#17947) 2021-08-19 14:20:34 -07:00
Eric Liang
a9073d16f4
Revert "[Core] Unified worker initiators (#17401)" (#17935)
This reverts commit c3764ffd7d.
2021-08-18 18:06:24 -07:00
Chen Shen
89d83228f6
[Core][Plasma-store] add stats-collector that eagerly collects stats 2021-08-18 13:47:50 -07:00
Chong-Li
a9b4545502
[GCS] GCS Based Actor Scheduler (#16580) 2021-08-18 13:44:59 -07:00
Eric Liang
5536c5fff6
Add namespace argument to Ray client get actor call (#17878) 2021-08-17 16:41:18 -07:00
Chen Shen
880797d5c2
[Core][Test] Add ubsan support for C++ tests (#17812)
* support ubsan

* update
2021-08-17 10:22:03 -07:00
chenk008
c3764ffd7d
[Core] Unified worker initiators (#17401)
* use setup_worker as starter

* use setup_worker as starter

* add java test

* fix

* fix

* lint

* sleep in ci

* sleep in ci

* fix ut

* fix

* fix

* fix

* fix

* fix

* fix

* change test size

* test

* fix

* fix

* fix ut

* restore sgd test

* change test size

* fix merge conflict

* restore cpp worker flag

* fix

* fix

* add worker-language in setup_runtime_env.py

* lint

* fix java command

Co-authored-by: root <chenk008>
2021-08-17 19:37:26 +08:00
Guyang Song
8227e24424
[event] event framework integration in raylet, gcs server and core worker (#17671) 2021-08-17 11:21:23 +08:00
Chen Shen
a9757a86b3
[Core] Fix nested ref count bug: add NestedIds to reference_counter once a task returns (#17802)
* add nested reference

* fix bug
2021-08-16 19:02:26 -07:00
Yi Cheng
03a82d733a
Revert "Revert "Export useful metrics"" (#17755)
* Revert "Revert "[Observability] Export useful metrics (#17578)" (#17752)"

This reverts commit 02e79f3fe5.

* Update metric.h

* up

* up

* Update server_call.h

* Update test_metrics_agent.py

* up

* fix comment
2021-08-16 17:05:56 -07:00
Ian Rodney
2f200e5c2b
[Client] Pass ray.init() args to the remote server (#17776) 2021-08-16 12:34:01 -07:00
Alex Wu
1209a87ead
[core] Remove push based resource report code path (#17825) 2021-08-16 12:03:38 -07:00
Chen Shen
b349c6bc4f
[object store refactor 4/n] object lifecycle manager (#17344)
* lifecycle

* address comments
2021-08-16 09:58:35 -07:00
qicosmos
a2a1c46c83
[C++ Worker]Fix for mac (#17633)
* linkopts shared

* replace gflags with absl flags

* fix

* add test option

* fix

* add cpp worker to mac ci

* fix

* support empty redis password;mod arc argv

* add encoding

* test

* ignore example test on mac

* support mac

* fix

* fix and update doc

* fix

* fix run.sh

* fix init

* fix typo

* fix run.sh

* fix lint

Co-authored-by: 久龙 <guyang.sgy@antfin.com>
2021-08-13 12:22:37 +08:00
SangBin Cho
21635b32e5
[Core] Fix the segfault (#17772) 2021-08-12 18:17:50 -07:00
Yi Cheng
e32d33f39c
Fix ray.init hanging due to failure. (#17732)
* up

* change to 30s

* up

* up

* format
2021-08-12 16:56:10 -07:00
wanxing
e4c8125c86
Make some functions private (#17729)
* ReceiveObjectChunk

* more
2021-08-12 15:27:37 -07:00