Commit graph

117 commits

Author SHA1 Message Date
Qing Wang
1172195571
[Java] Remove global named actor and global pg (#20135)
This PR removes global named actor and global PGs.

I believe these APIs are not used widely in OSS.
CPP part is not included in this PR.
@kfstorm @clay4444 @raulchen Please take a look if this change is reasonable.


IMPORTANT NOTE: This is a Java API change and will lead to backward incompatibility in Java global named actor and global PG usage.

INCLUDES:

 Remove setGlobalName() and getGlobalActor() APIs.
 Remove getGlobalPlacementGroup() and setGlobalPG APIs.
 Add getActor(name, namespace) API
 Add getPlacementGroup(name, namespace) API
 Update doc pages.
2021-11-15 16:28:53 +08:00
Yi Cheng
e54d3117a4
[gcs] Update all redis kv usage in python except function table (#20014)
## Why are these changes needed?
This is part of the Redis removal project. This PR removes all direct usage of Redis in Python except for the function table.
The function table will be migrated in the next PR.

## Related issue number
#19443
2021-11-10 20:24:53 -08:00
Tao Wang
60df705b4e
[Cpp]Get next job id globally instead of random selecting (#20102)
## Why are these changes needed?

## Related issue number
Final part of #13984

## Checks

- [x] I've run `scripts/format.sh` to lint the changes in this PR.
- [x] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [x] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [x] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(
2021-11-09 15:46:57 +08:00
Kai Yang
e84391d1d3
[Core] Encode job ID in randomized task IDs for user-created threads (#19320)
## Why are these changes needed?

Currently, `WorkerContext::GetCurrentTaskID()` returns a random task ID in user-created threads, and the returned task ID doesn't include the job ID. In this case, subsequent non-actor tasks, return values, and objects created by `ray.put()` don't include the job ID either. This makes it hard to find the correct job ID from a task or object ID.

This PR updates the task ID generation code to always encode the job ID.

A side effect of this PR is a change in the probability of task ID collisions in user-created threads, due to the fixed job ID part. Without this PR: `sqrt(pi * 256 ^ 12 / 2)` ~= 352 trillion tasks. With this PR: `sqrt(pi * 256 ^ 8 / 2)` ~= 5 billion tasks. This should be OK because the job ID part of task IDs in non-user-created threads is always fixed, so it won't be worse than in non-user-created threads.
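
The two estimates above can be reproduced with a short calculation. This is only a sketch of the arithmetic quoted in the message: it assumes the figures are the usual birthday-bound approximation `sqrt(pi * N / 2)` for an ID space of `N = 256 ** random_bytes` values, and the function name `expected_ids_before_collision` is made up for illustration.

```python
import math


def expected_ids_before_collision(random_bytes: int) -> float:
    """Birthday-bound estimate: sqrt(pi * N / 2), with N = 256 ** random_bytes.

    `random_bytes` is the number of uniformly random bytes in the ID
    (an assumption used here to match the formulas in the message).
    """
    space = 256 ** random_bytes
    return math.sqrt(math.pi * space / 2)


# 12 random bytes without the PR, 8 random bytes with it.
without_pr = expected_ids_before_collision(12)
with_pr = expected_ids_before_collision(8)

print(f"without this PR: {without_pr:.3e} tasks")
print(f"with this PR:    {with_pr:.3e} tasks")
```

The first value comes out on the order of 10^14 and the second on the order of 10^9, in line with the "352 trillion" and "5 billion" figures quoted above.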

## Related issue number

## Checks

- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(
2021-11-08 21:00:40 +08:00
Alex Wu
146b3d6bcc
[scheduler] Include depth and function descriptor in scheduling class (#20004) 2021-11-05 08:19:48 -07:00
qicosmos
246a901aea
[C++ API] Support object ref args (#19550) 2021-10-29 17:36:53 +08:00
qicosmos
efef38f240
[C++ Worker] Add basic ref counting test cases (#17768) 2021-10-29 11:22:19 +08:00
Qing Wang
048e7f7d5d
[Core] Port concurrency groups with asyncio (#18567)
## Why are these changes needed?
This PR aims to port concurrency groups functionality with asyncio for Python.

### API
```python
@ray.remote(concurrency_groups={"io": 2, "compute": 4})
class AsyncActor:
    def __init__(self):
        pass

    @ray.method(concurrency_group="io")
    async def f1(self):
        pass

    @ray.method(concurrency_group="io")
    def f2(self):
        pass

    @ray.method(concurrency_group="compute")
    def f3(self):
        pass

    @ray.method(concurrency_group="compute")
    def f4(self):
        pass

    def f5(self):
        pass
```
The decorator on the actor class `AsyncActor` declares 2 concurrency groups, `io` and `compute`, along with their max concurrencies; the actor also has a default concurrency group. Every concurrency group has its own asyncio event loop and a pythread to execute the methods assigned to it.

Methods `f1` and `f2` will be invoked in the `io` concurrency group, and `f3` and `f4` in `compute`.
Note that `f5` and `__init__` will be invoked in the default concurrency group.

The following call invokes `f2` in the `compute` concurrency group, because a group specified dynamically at call time has higher priority.
```python
a.f2.options(concurrency_group="compute").remote()
```
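
The priority rule described above can be written down as a small lookup. This is an illustrative sketch only, not Ray's code: `resolve_concurrency_group` and the `"default"` group name are hypothetical, and the ordering (call-time option, then the method's declared group, then the default group) is taken from the description in this message.

```python
from typing import Optional

# Hypothetical name for the default concurrency group.
DEFAULT_GROUP = "default"


def resolve_concurrency_group(declared: Optional[str],
                              call_time: Optional[str]) -> str:
    """Pick the group a method call runs in.

    declared:  group given on the method definition, if any.
    call_time: group given dynamically when the call is made, if any.
    """
    if call_time is not None:
        return call_time  # dynamic choice has the highest priority
    if declared is not None:
        return declared   # otherwise use the group declared on the method
    return DEFAULT_GROUP  # undecorated methods fall back to the default


# f2 is declared in "io" but called with a "compute" override.
print(resolve_concurrency_group("io", "compute"))
# f5 has no declared group and no override.
print(resolve_concurrency_group(None, None))
```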

### Implementation
The straightforward implementation details are:
- Previously, an asyncio actor had only 1 event loop bound to 1 pythread. Now we create 1 event loop bound to 1 pythread for every concurrency group of the asyncio actor.
- Previously, there was 1 fiber state for every caller in the asyncio actor. Now we create a FiberStateManager for every caller in the asyncio actor, and the FiberStateManager manages the fiber states for the concurrency groups.
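
The first point, one event loop bound to one thread per concurrency group, can be illustrated with plain `asyncio` and `threading`. This is a minimal standalone sketch under stated assumptions, not the implementation in this PR: `GroupExecutor`, its methods, and the use of a semaphore to cap concurrency are all hypothetical, and the fiber state handling is not modeled.

```python
import asyncio
import threading
from concurrent.futures import Future
from typing import Awaitable, Callable, Optional


class GroupExecutor:
    """Hypothetical helper: one event loop on one dedicated thread."""

    def __init__(self, name: str, max_concurrency: int):
        self.max_concurrency = max_concurrency
        self._sem: Optional[asyncio.Semaphore] = None
        self.loop = asyncio.new_event_loop()
        self.thread = threading.Thread(
            target=self._run, name=f"group-{name}", daemon=True)
        self.thread.start()

    def _run(self) -> None:
        # Bind the loop to this thread and serve submitted coroutines.
        asyncio.set_event_loop(self.loop)
        self.loop.run_forever()

    async def _guarded(self, fn: Callable[[], Awaitable]):
        # Create the semaphore lazily, inside the group's own loop.
        if self._sem is None:
            self._sem = asyncio.Semaphore(self.max_concurrency)
        async with self._sem:  # cap concurrent calls in this group
            return await fn()

    def submit(self, fn: Callable[[], Awaitable]) -> Future:
        # Schedule the coroutine on this group's loop from any thread.
        return asyncio.run_coroutine_threadsafe(self._guarded(fn), self.loop)

    def stop(self) -> None:
        self.loop.call_soon_threadsafe(self.loop.stop)
        self.thread.join()
        self.loop.close()


async def which_thread() -> str:
    await asyncio.sleep(0)
    return threading.current_thread().name


# Same group names and limits as in the API example above.
groups = {name: GroupExecutor(name, limit)
          for name, limit in {"io": 2, "compute": 4}.items()}
io_thread = groups["io"].submit(which_thread).result(timeout=5)
compute_thread = groups["compute"].submit(which_thread).result(timeout=5)
for group in groups.values():
    group.stop()

print(io_thread, compute_thread)
```

Each group runs its coroutines on its own thread, so a slow method in one group does not block the event loop of another.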


## Related issue number
#16047
2021-10-21 21:46:56 +08:00
Guyang Song
c04fb62f1d
[C++ worker] set native library path for shared library search (#19376) 2021-10-18 16:03:49 +08:00
Gagandeep Singh
d226cbf21a
Added StartupToken to identify a process at startup (#19014)
* Added StartupToken to identify a process at startup

* Applied linting formats

* Addressed reviews

* Fixing worker_pool_test

* Fixed worker_pool_test

* Applied linting formatting

* Added documentation for StartupToken

* Fixed linting

* Reordered initialisation of WorkerPool members

* Fixed Python docs

* Fixing bugs in cluster_mode_test

* Fixing Java tests

* Create and set shim process after verifying startup_token

* shim_process.GetId() -> worker_shim_pid

* Improvements in startup token and modifying java files

* update io_ray_runtime_RayNativeRuntime.h

* Fixed java tests by adding startup-token to conf

* Applied linting

* Increased arg count for startup_token

* Attempt to fix streaming tests

* Type correction

* applied linting

* Corrected index of startup token arg

* Modified, mock_worker.cc to accept startup tokens

* Applied linting

* Applied linting changes from CI

* Removed override from worker.h

* Applied linting from scripts/format.sh

* Addressed reviews and applied scripts/format.sh

* Applied linting script from ci/travis

* Removed unrequired methods from public scope

* Applied linting
2021-10-15 15:13:13 -07:00
Guyang Song
ab55b808c5
[runtime env] move worker env to runtime env in Java (#19060) 2021-10-11 17:25:09 +08:00
gjoliver
635010d460
Update build rules and patches for darwin_arm64 platform. (#19037)
* Update build rules and patches for darwin_arm64 platform.

Changes include:

Update nelhage/rules_boost package from current version (08/5/2020) to 5/27/2021 version.
Remove rules_boost-undefine-boost_fallthrough.patch, since BOOST_FALLTHROUGH seems to be defined now.
Minor changes to rules_boost-windows-linkopts.patch to use default condition to add -lpthread flag for all platforms.
Add darwin_arm64 config to BUILD files for lib civetweb pulled in via prometheus dependency.

* upgrade boost to 1.74.0 from 1.71.0 to match the updated build file for windows.

* Fix ray_cpp_pkg

* Use boost/bind/bind.hpp

boost/bind.hpp and global namespace placeholders are deprecated.

* lint

* Use absl::bind_front when possible. Otherwise, NOLINT

* lint

* lint

* lint

* lint

* more lint

* final lint

* trigger build
2021-10-09 18:48:35 -07:00
Jiajun Yao
ed9118393c
Listen to 127.0.0.1 by default on mac osx (#18904) 2021-09-29 11:40:19 -07:00
Guyang Song
337005d5a5
[C++ API][hotfix] fix C++ worker dynamic library loading issue on macOS (#18877)
* fix C++ worker in macOS

* fix
2021-09-24 23:39:00 +08:00
Guyang Song
739cf64115
[C++ API] support head_args config in C++ API (#18709) 2021-09-23 19:30:53 +08:00
qicosmos
64c25987f3
[C++ Worker]Simple kv store example (#18613) 2021-09-18 16:02:44 +08:00
Jiajun Yao
ffe7108eae
Fix cpp api doc (#18671) 2021-09-17 14:01:23 -07:00
Guyang Song
187e4a86ca
[C++ API] expose C++ task failure event (#18596) 2021-09-16 19:20:16 +08:00
qicosmos
d7c631209b
[C++ Worker]Add api get placement group (#18535) 2021-09-15 14:11:31 +08:00
Stephanie Wang
284dee493e
[core][usability] Disambiguate ObjectLostErrors for better understandability (#18292)
* Define error types, throw error for ObjectReleased

* x

* Disambiguate OBJECT_UNRECONSTRUCTABLE and OBJECT_LOST

* OwnerDiedError

* fix test

* x

* ObjectReconstructionFailed

* ObjectReconstructionFailed

* x

* x

* print owner addr

* str

* doc

* rename

* x
2021-09-13 16:16:17 -07:00
qicosmos
ac0a153b06
[C++ Worker]Add some api of placement group (#18431) 2021-09-13 15:10:54 +08:00
qicosmos
dd096c8e73
[C++ Worker]Fix abi issue (#18273) 2021-09-10 11:53:05 +08:00
qicosmos
ba0084e9c7
[C++ Worker]Add gcs global state accessor (#17976) 2021-09-09 12:08:08 +08:00
qicosmos
1da05209b9
[C++ Worker]Add get actor API. (#17897)
* linkopts shared

* add get actor api

* fix

* improve

* reduce some duplicate code

* improve some
2021-09-06 11:46:46 +08:00
qicosmos
72739462a9
[C++ Worker]Add some api of placement group part1. (#17925)
* linkopts shared

* add some pg api

* add Wait for PlacementGroup
2021-09-03 13:32:28 +08:00
Stephanie Wang
d43d297d9a
[core] Attach call site to ObjectRefs, print on error (#17971)
* Attach call site to ObjectRef

* flag

* Fix build

* build

* build

* build

* x

* x

* skip on windows

* lint
2021-09-01 15:29:05 -07:00
Jiajun Yao
fbb3ac6a86
Retry application-level errors (#18176)
* Retry application-level errors

* Retry application-level errors

* Push retry message to the driver
2021-09-01 10:53:06 -07:00
Stephanie Wang
8e06db7280
Revert "[Core] revert: revert Unified worker starter (#18008)" (#18228)
This reverts commit b9978dd02b.
2021-08-30 17:28:41 -07:00
Eric Liang
1adce7da4e
Revert "Auto discover dashboard agent port (#17855)" (#18217)
This reverts commit 53ddb551d5.
2021-08-30 10:46:37 -07:00
fyrestone
53ddb551d5
Auto discover dashboard agent port (#17855) 2021-08-30 12:06:28 +08:00
Jiajun Yao
25ef452b15
[Core] Fix typo in local_mode_task_submitter.cc (#18046) 2021-08-24 13:03:05 -07:00
chenk008
b9978dd02b
[Core] revert: revert Unified worker starter (#18008) 2021-08-23 13:34:32 -07:00
Stephanie Wang
b8fe776638
[core] Fix inlined nested ids (#17834)
* test

* Use ObjectRef instead of ObjectID in nested refs

* java

* doc

* java

* build

* build

* x

* lint

* simplify

* fix
2021-08-20 08:58:29 -07:00
Eric Liang
661ac4e37b
Remove last traces of ref-counting flag (#17932) 2021-08-19 21:08:13 -07:00
Simon Mo
b573864928
[CI] Add test owners (#17893) 2021-08-18 18:38:31 -07:00
Eric Liang
a9073d16f4
Revert "[Core] Unified worker initiators (#17401)" (#17935)
This reverts commit c3764ffd7d.
2021-08-18 18:06:24 -07:00
chenk008
c3764ffd7d
[Core] Unified worker initiators (#17401)
* use setup_worker as starter

* use setup_worker as starter

* add java test

* fix

* fix

* lint

* sleep in ci

* sleep in ci

* fix ut

* fix

* fix

* fix

* fix

* fix

* fix

* change test size

* test

* fix

* fix

* fix ut

* restore sgd test

* change test size

* fix merge conflict

* restore cpp worker flag

* fix

* fix

* add worker-languange in setup_runtime_env.py

* lint

* fix java command

Co-authored-by: root <chenk008>
2021-08-17 19:37:26 +08:00
qicosmos
a2a1c46c83
[C++ Worker]Fix for mac (#17633)
* linkopts shared

* replace gflags with absl flags

* fix

* add test option

* fix

* add cpp worker to mac ci

* fix

* support empty redis password;mod arc argv

* add encoding

* test

* ignore example test on mac

* support mac

* fix

* fix and update doc

* fix

* fix run.sh

* fix init

* fix typo

* fix run.sh

* fix lint

Co-authored-by: 久龙 <guyang.sgy@antfin.com>
2021-08-13 12:22:37 +08:00
Guyang Song
b97027ec64
[C++ API] support cpu gpu num 0 (#17783)
* support cpu gpu num 0

* support cpu gpu num 0

* fix
2021-08-13 08:45:33 +08:00
Guyang Song
88b8de5904
[C++ API] support ray::IsInitialized (#17780)
* support ray::IsInitialized

* address comments

* fix
2021-08-13 00:51:26 +08:00
Guyang Song
e53aeca6bb
[C++ API]support set resources in RayConfig (#17779) 2021-08-12 22:53:42 +08:00
Guyang Song
63f9ba2858
[C++ API][Fix] support ray::Init without RayConfig (#17733) 2021-08-12 10:59:21 +08:00
qicosmos
05da724521
[C++ Worker] Replace Ray::xxx with ray::xxx and update namespaces (#17388) 2021-08-10 11:17:59 +08:00
SongGuyang
c62ce78be8
Make C++ example simpler (#17609) 2021-08-09 19:39:16 +08:00
Hao Chen
0858f0e4f2
Change core worker C++ namespace to ray::core (#17610) 2021-08-08 23:34:25 +08:00
qicosmos
f1f7d4a085
[C++ Worker]Add some APIs for task call part one (#16499) 2021-08-05 17:25:36 +08:00
Chen Shen
53a0c74413
[nightly-test] fix non_streaming_shuffle_1tb_5000_partitions 2021-08-04 16:06:53 -07:00
SongGuyang
3e42f54910
Support copyright format for c++ files (#14348) 2021-08-04 17:19:38 +08:00
Siyuan (Ryans) Zhuang
8efc04a8a6
[Core] Actor namespace (#17178)
* set actor namespace in Python on creation

* get actor with namespace in Python

* update message
2021-07-19 21:51:04 -07:00
SongGuyang
21b464ae9d
[C++ API] support get ray address from env (#17144) 2021-07-16 17:17:43 +08:00