2123 commits

Author SHA1 Message Date
Simon Mo
908aa2c7f3
Fix runtime env and dispatch queue take 2 (#17163) 2021-07-20 10:24:08 -07:00
Kai Yang
f0c148b158
[Core] Simplify the code to read env variables in RayConfig (#16775)
* Simplify the code to read env variables in RayConfig

* simplify

* Correctly print config type

* Change to lower case

* fix template specialization

* lint
2021-07-20 08:40:16 -07:00
SangBin Cho
d6b6356173
[Core] Properly call shutdown instead of deleting a reference (#17096)
* Properly call shutdown instead of deleting a reference

* Add unit tests

* Add test ray shutdown

* Formatting

* format2

* Revert main logic to see if windows issue still fail

* Skip tests for windows.

* formatting

* Try fixing flakiness

* Remove node removed code path

Co-authored-by: Eric Liang <ekhliang@gmail.com>
2021-07-20 08:22:33 -07:00
Siyuan (Ryans) Zhuang
8efc04a8a6
[Core] Actor namespace (#17178)
* set actor namespace in Python on creation

* get actor with namespace in Python

* update message
2021-07-19 21:51:04 -07:00
Chen Shen
b26fcd3fce
fix spill bug (#17187) 2021-07-19 17:44:12 -07:00
Chen Shen
80e013f342
[core] Fix SIGABRT on erase call (#17140) 2021-07-19 11:42:38 -07:00
SangBin Cho
bfc9e5c36f
[Logs] Clean core worker logs (#17033)
* Ready

* Formatting

* Fix

* addressed review.
2021-07-19 11:25:41 -07:00
Qing Wang
195cdcf5b8
Fix memory leak in JNI. (#17177)
Co-authored-by: Qing Wang <jovany.wq@antgroup.com>
2021-07-19 14:06:30 +08:00
Amog Kamsetty
8dfd471823
Revert "Revert "[Dashboard][event] Basic event module (#16985)" (#17068)" (#17107)
This reverts commit c17e171f92.

Co-authored-by: 刘宝 <po.lb@antfin.com>
2021-07-18 12:59:04 +08:00
Clark Zinzow
8302b5a335
[Core] Reverts full dispatch queue iteration PRs. (#17127)
* Revert "[Core] iterate over entire dispatch queue instead of returning when worker unavailable (#16535)"

This reverts commit 54d66ac637.

* Revert "[Core] [runtime env] [Tests] Add C++ unit test for dispatch queue nonblocking behavior (#16751)"

This reverts commit 13a133817b.

* Revert failing runtime_env test.
2021-07-16 10:28:00 -07:00
SongGuyang
dcb1baabd7
[C++ API] support loading C++ dynamic libraries from code search path (#16828) 2021-07-16 13:02:45 +08:00
Chen Shen
c39571a1f2
Fix GCS shutdown order (#17135) 2021-07-15 17:41:19 -07:00
Yi Cheng
138676295f
[core] Add bundle id as a label; (#16819)
* check

* up

* up

* up

* up

* up

* up

* format

* up

* up

* add test

* format

* up

* format

* up

* format

* up

* up

* up

* rollback

* uncomment

* format

* fix comments

* fix mac build
2021-07-15 16:05:42 -07:00
Lixin Wei
06f6f4e0ec
[Core] Limit Batch Size When Broadcasting Resources (#17072) 2021-07-15 14:28:57 -07:00
Stephanie Wang
bdaa96bf43
[core] Fix bugs in worker cleanup on driver exit (#17049)
* unit test

* cleanup test

* Don't kill workers when job finishes

* better test

* lint

* lint

* comment

* check
2021-07-15 12:53:51 -07:00
Chen Shen
ba70d8dbc6
[RFC] Fix object size inconsistency caused by object-marked-failed. (#16976) 2021-07-14 23:33:36 -07:00
chenk008
42e6c9b020
[Core] Use shim process in dedicated_workers_to_tasks (#17076)
* use shim process in dedicated_workers_to_tasks

* lint
2021-07-15 13:50:54 +08:00
Chen Shen
92f19170ab
[error message] change noisy missing object error to debug (#17081) 2021-07-14 12:36:30 -07:00
Amog Kamsetty
c17e171f92
Revert "[Dashboard][event] Basic event module (#16985)" (#17068)
This reverts commit f1faa79a04.
2021-07-13 23:18:43 -07:00
Chen Shen
645d8fcaf0
[logging][rfc] add RAY_LOG_EVERY_N and RAY_LOG_EVERY_MS (#17018)
* introduce log-every-n

* add n

* linter

* add license
2021-07-13 19:14:28 -07:00
fyrestone
f1faa79a04
[Dashboard][event] Basic event module (#16985)
* Basic event module

* Fix comments

* Set the SCAN_EVENT_DIR_INTERVAL_SECONDS defaults to 2

* Fix lint

* Fix lint

* Clean code

* Try to fix flaky

* Fix test

* Disable event module by default

* Make monitor events task cancellable

* Fix error

Co-authored-by: 刘宝 <po.lb@antfin.com>
2021-07-13 19:08:39 -07:00
Edward Oakes
f7759fa484
[core] Add ray.util.list_actors() API (#16642) 2021-07-13 10:00:28 -05:00
Tao Wang
5b7e76770d
[Java] Use gcs client instead of redis client to get session dir (#16773)
* Use gcs client instead of redis client to get session dir

* fix compile and add comments

* fix compile

* lint

* fix

* lint

* lint

* Update src/ray/gcs/gcs_client/global_state_accessor.h

Co-authored-by: Qing Wang <kingchin1218@126.com>

* Update java/runtime/src/main/java/io/ray/runtime/RayNativeRuntime.java

Co-authored-by: Qing Wang <kingchin1218@126.com>

* per comment

Co-authored-by: Qing Wang <kingchin1218@126.com>
2021-07-13 14:01:22 +08:00
chenk008
f7bcfc5324
[Core] add scheduler_cpu_share_enabled (#16920) 2021-07-12 20:04:32 -07:00
Eric Liang
7a1e8fdb8b
Cleanup info logs in raylet (#17015) 2021-07-12 19:43:44 -07:00
Qing Wang
4bde71ca86
[Java][Core] Support get current actor handle. (#14900) 2021-07-12 15:27:54 -07:00
Amog Kamsetty
a14342ce6f
Revert "[Dashboard][event] Basic event module (#16698)" (#17004)
This reverts commit 66ea099897.
2021-07-12 11:22:46 -07:00
qicosmos
298d2afc35
[Ray Log] remove glog dependency (#16077) 2021-07-12 17:06:52 +08:00
fyrestone
66ea099897
[Dashboard][event] Basic event module (#16698)
* Basic event module

* Fix comments

* Set the SCAN_EVENT_DIR_INTERVAL_SECONDS defaults to 2

* Fix lint

* Fix lint

* Clean code

* Try to fix flaky

* Fix test

* Disable event module by default

Co-authored-by: 刘宝 <po.lb@antfin.com>
2021-07-09 10:25:30 -07:00
Chen Shen
ae6e5db927
Fix minor message in plasma (#16953) 2021-07-07 20:30:59 -07:00
Kai Yang
e925051ce4
[Core] Get node to connect for driver in global state accessor (#16810) 2021-07-08 11:21:12 +08:00
Chen Shen
0421fa188e
[core] use fallocate for fallback allocation to avoid SIGBUS (#16824) 2021-07-07 14:50:11 -07:00
Chen Shen
dbd3260141
[core] Deprecate QuotaAwareEvictionPolicy (#16911) 2021-07-07 13:44:41 -07:00
Eric Liang
639a29437b
Add debug message to buffer check fail (#16924) 2021-07-06 23:37:51 -07:00
mwtian
18d126192e
Deprecate StringIdMap::Remove() (#16888) 2021-07-06 22:46:25 -07:00
Eric Liang
5a94632f8d
Revert the gRPC-based resource broadcast due to check failures during cluster autoscaling (#16910) 2021-07-06 18:47:02 -07:00
Kai Yang
7c21be5450
[Object spilling] Clean up spilled objects on disk when Raylet starts (#16669) 2021-07-05 12:01:25 +08:00
Eric Liang
172b54382d
Fix log level on push failure (#16862) 2021-07-03 14:14:37 -07:00
SangBin Cho
61451af06b
[OBOD] Bug fix from test_scheduling.py (#16791) 2021-07-02 19:26:31 -07:00
architkulkarni
f02e41a822
[Core] [runtime env] Add RuntimeEnvHash and JobID to SchedulingKey (#16766)
* add python integration test

* improve readability

* remove unnecessary ray start --head

* add shutdown_only

* move RuntimeEnvHash from worker_pool to task_spec

* lint

* Add runtimeEnvHash and JobID to SchedulingKey

* remove JobID from key and hopefully fix compile

* add test for same env

* lint
2021-07-02 18:15:28 -07:00
Eric Liang
0841b6d2de
Remove stray error message (#16796) 2021-07-01 10:11:48 -07:00
Lixin Wei
e00d898b75
[Core] Lightweight Resource Report for New Scheduler (#16527)
* check resource diff

* fix

* fix

* comment modified

* fix
2021-06-30 21:27:29 -07:00
architkulkarni
13a133817b
[Core] [runtime env] [Tests] Add C++ unit test for dispatch queue nonblocking behavior (#16751) 2021-06-29 20:16:17 -07:00
Alex Wu
d89f148fbf
[Pubsub] Don't depend on subscriber address (#16752)
* remove subscriber address

* .

* lint

* test

* done

* lint

* .

* Update BUILD.bazel

Co-authored-by: Alex <alex@anyscale.com>
2021-06-29 17:34:37 -07:00
SangBin Cho
3cde8c36c9
Properly update the pinned object size (#16476) 2021-06-29 17:00:19 -07:00
chenk008
c318293d9f
[Core] start worker in container (#16671) 2021-06-29 10:12:47 -07:00
SangBin Cho
804a867b3d
Revert revert OBOD pubsub PR (#16487)
* Revert "Revert "[Pubsub] Use a pubsub module for Ownership based object directory (#16407)" (#16486)"

This reverts commit b986938f0f.

* revert the obod problem.

* Add stats.

* Fix a possible regression.

* in another progress

* debugging

* Fix stats bug

* update

* Add more stats.

* Add stats

* lint

* Fix issue

* remove spammy logs

* lint

* better error msg for debugging

* Add even more logging

* Remove spammy logs

* Fix iterator invalidation issue

* more debugging info

* fix

* Add more debug logs

* add debug logs

* Remove the problematic line for confirmation

* Completed

* Fixed a broken test.

* experiment

* Lint

* Add a better error message

* try out

* revert the build file.

* In progress again

* IP

* Formatting

* Revert the log level

* Unskip test array

* final clean up.

* fix a build issue

* debug logs

* remove

* .

* Add more critical logs.

* format

* tmp

* log

* log

* issue fix

* Upgrade

* test experiment

* Fix an issue

* Fix issues.

* Lint

* remove unnecessary code

* last clean up.

Co-authored-by: Stephanie Wang <swang@cs.berkeley.edu>
2021-06-28 20:30:31 -07:00
Eric Liang
3f5ce01949
Address leftover comments from https://github.com/ray-project/ray/pull/16394/files (#16684) 2021-06-25 16:45:50 -07:00
Eric Liang
9b17c35bee
Fix PullManager handling of get requests and liveness issues (#16394) 2021-06-25 13:01:46 -07:00
architkulkarni
06dfd8dddb
Revert "[Dashboard][event] Basic event module (#16283)" (#16676)
This reverts commit 5afa53aa64.
2021-06-25 09:38:18 -07:00