For the purpose to provide an alternative option for running multiple actor instances in a Java worker process, and the eventual goal is to remove the original multi-worker-instances in one worker process implementation. we're proposing supporting parallel actor concept in Java. This feature enables that users could define some homogeneous parallel execution instances in an actor, and all instances hold one thread as the execution backend.
### Introduction
For the following example, we define a parallel actor with 10 parallelism. The backend actor has 10 concurrency groups for the parallel executions, it also means there're 10 threads for that.
We can access the instance by the instance handle, like:
```java
ParallelActorHandle<A> actor = ParallelActor.actor(A::new).setParallelism(10).remote();
ParallelInstance<A> instance = actor.getInstance(/*index=*/ 2);
Preconditions.checkNotNull(instance);
Ray.get(instance.task(A::incr, 1000000).remote()); // print 1000000
instance = actor.getInstance(/*index=*/ 2);
Preconditions.checkNotNull(instance);
Ray.get(instance.task(A::incr, 2000000).remote().get()); // print 3000000
instance = actor.getInstance(/*index=*/ 3);
Preconditions.checkNotNull(instance);
Ray.get(instance.task(A::incr, 2000000).remote().get()); // print 2000000
```
### Limitation
- It doesn't support concurrency group on a parallel actor yet.
Co-authored-by: Kai Yang <kfstorm@outlook.com>
This PR adds the API `setRuntimeEnv` for submitting a normal task, for the usage:
```java
RuntimeEnv runtimeEnv =
new RuntimeEnv.Builder()
.addEnvVar("KEY1", "A")
.build();
/// Return `A`
Ray.task(RuntimeEnvTest::getEnvVar, "KEY1").setRuntimeEnv(runtimeEnv).remote().get();
```
1. Support setting environment variables in runtime env for a job, like:
```yaml
ray : {
job : {
runtime-env: {
// Environment variables to be set on worker processes in current job.
"env-vars": {
// key1: "value11"
// key2: "value22"
}
}
}
}
```
It could be set by system properties before `Ray.init()` as well:
```java
System.setProperty("ray.job.runtime-env.env-vars.KEY1", "A");
System.setProperty("ray.job.runtime-env.env-vars.KEY2", "B");
Ray.init();
```
2. Setting environment variables for an actor will overwrite and merge to the environment variables of job.
```java
System.setProperty("ray.job.runtime-env.env-vars.KEY1", "A");
System.setProperty("ray.job.runtime-env.env-vars.KEY2", "B");
Ray.init();
RuntimeEnv runtimeEnv = new RuntimeEnv.Builder().addEnvVar("KEY1", "C").build();
/// actor1 has the env vars: {"KEY1" : "C", "KEY2" : "B"}
ActorHandle<A> actor1 = Ray.actor(A::new).setRuntimeEnv(runtimeEnv).remote();
/// actor2 has the env vars: {"KEY1" : "A", "KEY2" : "B"}
ActorHandle<A> actor2 = Ray.actor(A::new).remote();
```
This PR supports setting actor level env vars for Java worker in runtime env.
General API looks like:
```java
RuntimeEnv runtimeEnv = new RuntimeEnv.Builder()
.addEnvVar("KEY1", "A")
.addEnvVar("KEY2", "B")
.addEnvVar("KEY1", "C") // This overwrites "KEY1" to "C"
.build();
ActorHandle<A> actor1 = Ray.actor(A::new).setRuntimeEnv(runtimeEnv).remote();
```
If `num-java-workers-per-process` > 1, it will never reuse the worker process except they have the same runtime envs.
Co-authored-by: Qing Wang <jovany.wq@antgroup.com>
Support the ability to specify a default lifetime for actors which are not specified lifetime when creating. This is a job level configuration item.
#### API Change
The Python API looks like:
```python
ray.init(job_config=JobConfig(default_actor_lifetime="detached"))
```
Java API looks like:
```java
System.setProperty("ray.job.default-actor-lifetime", defaultActorLifetime.name());
Ray.init();
```
One example usage is:
```python
ray.init(job_config=JobConfig(default_actor_lifetime="detached"))
a1 = A.options(lifetime="non_detached").remote() # a1 is a non-detached actor.
a2 = A.remote() # a2 is a non-detached actor.
```
Co-authored-by: Kai Yang <kfstorm@outlook.com>
Co-authored-by: Qing Wang <jovany.wq@antgroup.com>
In python or C++, we can specify the bundle index as -1 to use any available bundle in the placement group. We should also enable it in Java to keep the API consistent across all languages.
This PR introduces statically defining ConcurrencyGroup APIs in Java.
We introduce 2 APIs:
1. Introducing `@DefConcurrencyGroup` annotation for an actor class to define a concurrency group statically.
2. Introducing `@UseConcurrencyGroup` annotation for actor methods to define the concurrency group to be used in the method.
Examples are below:
```java
@DefConcurrencyGroup(name = "io", maxConcurrency = 2)
@DefConcurrencyGroup(name = "compute", maxConcurrency = 4)
private static class MyActor {
@UseConcurrencyGroup(name = "io")
public long f1() { }
@UseConcurrencyGroup(name = "io")
public long f2() { }
@UseConcurrencyGroup(name = "compute")
public long f3(int a, int b) { }
@UseConcurrencyGroup(name = "compute")
public long f4() { }
}
ActorHandle<> myActor = Ray.actor(MyActor::new).remote();
myActor.task(MyActor::f1).remote();
myActor.task(MyActor::f2).remote();
myActor.task(MyActor::f3).remote();
myActor.task(MyActor::f4).remote();
```
`MyActor` has 3 concurrency groups: `io` with 2 concurrency, `compute` with 4 concurrency and `default` with 1 concurrency.
f1 and f2 will be executed in `io`, f3 and f4 will be executed in `compute`.
In Xlang(Python call Java), a Java method which overrides a `default` method of the super class is not able to be invoked successfully, due to we treat it as overloaded method instead of overrided method. This PR correctly handle it at the case it overrides a `default` method.
Before this PR, the following usage is not able to be invoked from Python -> Java.
```Java
public interface ExampleInterface {
default String echo(String inp) {
return inp;
}
}
public class ExampleImpl implements ExampleInterface {
@Override
public String echo(String inp) {
return inp + " echo";
}
}
```
```python
/// Invoke it in Python.
cls = ray.java_actor_class("io.ray.serve.util.ExampleImpl")
handle = cls.remote()
print(ray.get(handle.echo.remote("hi")))
```
We add a enum class ActorLifetime to indicate the lifetime of an actor. In this PR, we also add the necessary API to create an actor with specifying lifetime.
Currently, it has 2 values: detached and default.
Resubmit the PR https://github.com/ray-project/ray/pull/19936
I've figure out that the test case `//rllib:tests/test_gpus::test_gpus_in_local_mode` failed due to deadlock in local mode.
In local mode, if the user code submits another task during the executing of current task, the `CoreWorker::actor_task_mutex_` may cause deadlock.
The solution is quite simple, release the lock before executing task in local mode.
In the commit 7c2f61c76c:
1. Release the lock in local mode to fix the bug. @scv119
2. `test_local_mode_deadlock` added to cover the case. @rkooo567
3. Left a trivial change in `rllib/tests/test_gpus.py` to make the `RAY_CI_RLLIB_DIRECTLY_AFFECTED ` to take effect.
Why are these changes needed?
If max concurrency is 1 in default group, a blocking task executing in default group will block the following tasks in different group. See reproduction script in #20475
The issue is due to tasks executing in the default concurrent group run in the main task execution thread, and tasks in other concurrent groups will be blocked if the main task execution thread is blocked.
This PR only changes concurrent actor behavior that default group will not block other groups.
Related issue number
Fix#20475
Why are these changes needed?
Add timeout(ms) param for Java ray.get. The API changes have been updated to doc ([Ray Core Walkthrough]->[Fetching Results]).
eg:
ObjectRef<Integer> objRef = Ray.put(1);
objRef.get(1000)
Ray.get(Ray.task(MyRayApp::slowFunction).remote(), 3000)
Related issue number
#20247
This PR removes global named actor and global PGs.
I believe these APIs are not used widely in OSS.
CPP part is not included in this PR.
@kfstorm @clay4444 @raulchen Please take a look if this change is reasonable.
IMPORTANT NOTE: This is a Java API change and will lead backward incompatibility in Java global named actor and global PG usage.
CPP part is not included in this PR.
INCLUDES:
Remove setGlobalName() and getGlobalActor() APIs.
Remove getGlobalPlacementGroup() and setGlobalPG
Add getActor(name, namespace) API
Add getPlacementGroup(name, namespace) API
Update doc pages.
Why are these changes needed?
For Java worker, we generate a UUID string as the namespace if a job is not specified a namespace by user.
Related issue number
#16474