This starts breaking the Mac Java build with new errors; I think it is the same issue that led us to revert this PR before.
…ment from Java. …" (#27945)"
This reverts commit af488e1.
We have encountered `java.lang.ClassNotFoundException` when deploying Java Ray Serve deployments. The property `ray.job.code-search-path`, which specifies the search path for the user's classes, does not take effect. The reason is that the classes from `ray.job.code-search-path` are loaded in an independent classloader in the Ray context, while the Serve replica initializes the user class with the `AppClassLoader`. We need to change the classloader used to construct user classes to the one in the Ray context.
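As a minimal sketch (not the actual Serve replica code), constructing the user's class with the classloader held in the Ray context instead of the default `AppClassLoader` looks roughly like this; the class and method names here are illustrative:

```java
// A hedged sketch: load and instantiate the user's deployment class with the
// classloader from the Ray context (the one built from `ray.job.code-search-path`)
// instead of the JVM's default AppClassLoader.
public final class ReplicaClassLoading {
  public static Object constructUserObject(String userClassName, ClassLoader rayContextClassLoader)
      throws ReflectiveOperationException {
    Class<?> userClass = Class.forName(userClassName, /*initialize=*/ true, rayContextClassLoader);
    return userClass.getDeclaredConstructor().newInstance();
  }
}
```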
In the previously merged PR (https://github.com/ray-project/ray/pull/22726/commits), Java Serve's support for Python deployments was not implemented. This PR implements that feature.
Co-authored-by: nanqi.yxf <nanqi.yxf@antgroup.com>
If Ray is compiled in debug mode,
* running `MetricsTest::testAddHistogram` crashes with the error message below:
```
BucketBoundaries::Explicit called with non-monotonic boundary list.
java: external/io_opencensus_cpp/opencensus/stats/internal/bucket_boundaries.cc:64: opencensus::stats::BucketBoundaries::Explicit(std::__debug::vector<double>)::<lambda()>: Assertion `false && "0"' failed.
```
* running `NamespaceTest::testIsolationInTheSameNamespaces` fails with high probability with the error message below:
```
java.util.NoSuchElementException: No value present
at java.util.Optional.get(Optional.java:135)
at io.ray.test.NamespaceTest.lambda$testIsolationInTheSameNamespaces$2(NamespaceTest.java:39)
at io.ray.test.NamespaceTest.testIsolation(NamespaceTest.java:116)
at io.ray.test.NamespaceTest.testIsolationInTheSameNamespaces(NamespaceTest.java:36)
```
Add an API to get the node ID of the current worker. Usage:
```java
UniqueId currNodeId = Ray.getRuntimeContext().getCurrentNodeId();
```
This is added for a requirement from Ray Serve.
Allow starting actors in a different namespace instead of the driver's namespace.
Usage is simple:
```java
System.setProperty("ray.job.namespace", "a"); // The driver runs in namespace "a".
Ray.init();
// The named actor will be started in namespace "b" instead of "a".
ActorHandle<A> a = Ray.actor(A::new).setName("myActor", "b").remote();
```
Co-authored-by: Hao Chen <chenh1024@gmail.com>
Now we can run custom Java tests as follows:
1. `cp testng_custom_template.xml testng_custom.xml`
2. Specify the test class/method in `testng_custom.xml`
3. Run `bazel test //java:custom_test --test_output=streamed`
We removed the thread-local core worker instance in this PR, which is a further step of the architecture cleanup for removing multiple workers in one process.
It also removes the unnecessary parameter `workerId` from JNI.
* [runtime env] runtime env inheritance refactor (#22244)
Runtime Environments are already GA in Ray 1.6.0. The latest doc is [here](https://docs.ray.io/en/master/ray-core/handling-dependencies.html#runtime-environments). We already support an [inheritance](https://docs.ray.io/en/master/ray-core/handling-dependencies.html#inheritance) behavior as follows (copied from the doc):
- The runtime_env["env_vars"] field will be merged with the runtime_env["env_vars"] field of the parent. This allows for environment variables set in the parent’s runtime environment to be automatically propagated to the child, even if new environment variables are set in the child’s runtime environment.
- Every other field in the runtime_env will be overridden by the child, not merged. For example, if runtime_env["py_modules"] is specified, it will replace the runtime_env["py_modules"] field of the parent.
We think this runtime env merging logic is too complex and confusing to users, because they can't know the final runtime env before the job runs.
This PR refactors and changes the behavior of Runtime Environments inheritance. Here is the new behavior:
- **If there is no runtime env option when we create an actor, inherit the parent's runtime env.**
- **Otherwise, use the specified runtime env directly, without any merging (see the sketch below).**
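A minimal Java sketch of the two cases, reusing the `RuntimeEnv` builder API shown later in this document (class `A` and the jar URL are illustrative):

```java
// Case 1: no runtime env option is given, so the child actor inherits the
// parent's runtime env as-is.
ActorHandle<A> inherited = Ray.actor(A::new).remote();

// Case 2: an explicit runtime env is given, so it is used directly; nothing is
// merged from the parent's runtime env.
RuntimeEnv runtimeEnv = new RuntimeEnv.Builder()
    .addJars(ImmutableList.of("https://my_host/a.jar"))
    .build();
ActorHandle<A> overridden = Ray.actor(A::new).setRuntimeEnv(runtimeEnv).remote();
```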
Add a new API named `ray.runtime_env.get_current_runtime_env()` to get the parent runtime env, so you can modify the dict yourself. For example:
```python
runtime_env = ray.runtime_env.get_current_runtime_env()
runtime_env.update({"X": "Y"})
Actor.options(runtime_env=runtime_env)
```
This new API can also be used in the Ray client.
This PR moves all exception classes from the runtime module to the api module, aiming to eliminate confusion about Ray exceptions: after this PR, Ray users no longer need to touch the runtime module when programming against the API.
Note that this should be merged onto 2.0.
This is the first PR to remove the code path of multiple core workers in one process. It aims to remove the flags and APIs related to `num_workers`.
After this PR is checked in, we no longer need to consider multiple core workers.
The following PRs will cover the deeper logic refactoring, such as eliminating the gap between the core worker and the core worker process, and removing the logic related to multiple workers from the worker pool, GCS, etc.
**BREAKING CHANGE**
This PR removes these APIs:
- Ray.wrapRunnable();
- Ray.wrapCallable();
- Ray.setAsyncContext();
- Ray.getAsyncContext();
And the following APIs can no longer be invoked in a user-created thread in local mode:
- Ray.getRuntimeContext().getCurrentActorId();
- Ray.getRuntimeContext().getCurrentTaskId();
Note that this PR shouldn't be merged to 1.x.
This PR supports specifying the jars (or zip packages) for a job; they are added for all workers of this job.
You can specify jars or zips in the config file of your job:
```yml
ray {
  job {
    runtime-env: {
      "jars": [
        "https://my_host/a.jar",
        "https://my_host/b.jar"
      ]
    }
  }
}
```
or via system properties:
```java
System.setProperty("ray.job.runtime-env.jars.0", "https://my_host/a.jar");
System.setProperty("ray.job.runtime-env.jars.1", "https://my_host/a.jar");
Ray.init();
// all workers of this job will add a.jar and b.jar into the classpath.
```
Currently, when an actor has `max_restarts` > 0 and crashes, it enters the RESTARTING state and then becomes ALIVE again. Imagine this scenario: an online service provides an HTTP service; the proxy actor receives requests, forwards them to worker actors, and replies to clients with the execution results from the worker actors.
```
                                                          -> Worker A (actor)
                                                         /
                                                        /
HTTP requests -------> Proxy (actor with HTTP server) ---> Worker B (actor)
                                                        \
                                                         \
                                                          -> ...
```
For each HTTP request, the proxy picks one worker (e.g. worker A) based on some algorithm, sends the request to it, and calls `ray.get()` to wait for the result. If the picked worker crashes for some reason, Ray will restart the actor, and `ray.get()` will throw an error. The proxy may pick another worker (e.g. worker B) and re-send the request to it. This is OK.
But new requests keep coming. The proxy may pick worker A again. But because worker A is still in RESTARTING state, it's not ready to serve requests. `ray.get()` on subsequent requests sent to worker A will hang until worker A is back online (ALIVE state). The proxy won't be able to reschedule these requests to another worker because currently there's no way to know if worker A is alive or not before sending a request. We can't say worker A is not alive just based on whether `ray.get()` hangs either.
To solve this issue, we change the semantics of `max_task_retries`.
* When max_task_retries is 0 (the default value), if the callee actor is in the RESTARTING state, subsequently submitted tasks fail immediately with a RayActorError. Users can catch the RayActorError and implement their own fallback strategies to improve service availability and mitigate service outages (see the sketch after this list).
* When max_task_retries is not 0, subsequently submitted tasks are queued on the caller side, and we only send them to the callee once the callee actor is back in the ALIVE state.
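A hedged Java sketch of the proxy-side fallback under the new semantics, assuming max_task_retries = 0, a hypothetical `Worker` actor class with a `handle` method, and that `RayActorException` is the Java counterpart of RayActorError:

```java
// With max_task_retries = 0 (the default), a call to a RESTARTING actor fails
// immediately instead of hanging, so the proxy can fall back to another worker.
// `Worker`, `handle`, `workerA`, `workerB`, and `request` are illustrative names.
String result;
try {
  result = workerA.task(Worker::handle, request).remote().get();
} catch (RayActorException e) {
  // Worker A is restarting or dead; re-send the request to another worker.
  result = workerB.task(Worker::handle, request).remote().get();
}
```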
TODO
- [x] Add test cases.
- [ ] Update docs.
- [x] API change review.
Aiming to:
1. address the bug about concurrency groups (see #19593), and
2. improve the stability of Ray call latency in online applications,
we propose using an async post instead of `PostBlocking` in the thread pool.
Note that since we already have back pressure on the caller side, I believe this change is safe to merge and doesn't break any behavior.
This PR supports setting the jars for an actor in the Ray API. The API looks like:
```java
class A {
  public boolean findClass(String className) {
    try {
      Class.forName(className);
    } catch (ClassNotFoundException e) {
      return false;
    }
    return true;
  }
}

RuntimeEnv runtimeEnv = new RuntimeEnv.Builder()
    .addJars(ImmutableList.of("https://github.com/ray-project/test_packages/raw/main/raw_resources/java-1.0-SNAPSHOT.jar"))
    .build();
ActorHandle<A> actor1 = Ray.actor(A::new).setRuntimeEnv(runtimeEnv).remote();
boolean ret = actor1.task(A::findClass, "io.testpackages.Foo").remote().get();
System.out.println(ret); // true
```
Jackson is a widely used utility. A user from Ant reported that the Jackson classes conflict between the Ray jar and the user's jar.
This PR shades Jackson in the Ray jar to avoid the conflict.
Co-authored-by: Kai Yang <kfstorm@outlook.com>
To provide an alternative option for running multiple actor instances in a Java worker process (the eventual goal is to remove the original multiple-worker-instances-in-one-process implementation), we propose supporting a parallel actor concept in Java. This feature lets users define homogeneous parallel execution instances in an actor, where each instance holds one thread as its execution backend.
### Introduction
In the following example, we define a parallel actor with a parallelism of 10. The backend actor has 10 concurrency groups for the parallel executions, which also means there are 10 threads for them.
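The example below assumes a simple counter class roughly like this hypothetical sketch (`A` and `incr` are not defined in the original snippet):

```java
// A hypothetical counter class: `incr` adds the given delta to a per-instance
// counter, prints the new value, and returns it.
public static class A {
  private int value = 0;

  public int incr(int delta) {
    value += delta;
    System.out.println(value);
    return value;
  }
}
```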
We can access the instance by the instance handle, like:
```java
ParallelActorHandle<A> actor = ParallelActor.actor(A::new).setParallelism(10).remote();
ParallelInstance<A> instance = actor.getInstance(/*index=*/ 2);
Preconditions.checkNotNull(instance);
Ray.get(instance.task(A::incr, 1000000).remote()); // print 1000000
instance = actor.getInstance(/*index=*/ 2);
Preconditions.checkNotNull(instance);
Ray.get(instance.task(A::incr, 2000000).remote()); // print 3000000
instance = actor.getInstance(/*index=*/ 3);
Preconditions.checkNotNull(instance);
Ray.get(instance.task(A::incr, 2000000).remote()); // print 2000000
```
### Limitation
- It doesn't support concurrency groups on a parallel actor yet.
Co-authored-by: Kai Yang <kfstorm@outlook.com>