Commit graph

546 commits

Author SHA1 Message Date
architkulkarni
774163f9c9
[Java] Bump log4j 2.16.0 -> 2.17.0 (#21176)
Resolves [CVE-2021-45105](https://github.com/advisories/GHSA-p6xc-xr62-6r2g).
2021-12-20 10:27:24 +08:00
Ian Rodney
deb3505150
[Java] Bump Log4j2 to completely remove lookups (#21081)
As of the 2.16.0 release of Log4j2, Lookup support is removed 🎉
https://logging.apache.org/log4j/2.x/changes-report.html#a2.16.0
2021-12-15 15:45:56 +08:00
WanXing Wang
72bd2d7e09
[Core] Support back pressure for actor tasks. (#20894)
Resubmit the PR https://github.com/ray-project/ray/pull/19936

I've figured out that the test case `//rllib:tests/test_gpus::test_gpus_in_local_mode` failed due to a deadlock in local mode.
In local mode, if the user code submits another task while the current task is executing, `CoreWorker::actor_task_mutex_` may cause a deadlock.
The solution is quite simple: release the lock before executing the task in local mode.

In the commit 7c2f61c76c:
1. Release the lock in local mode to fix the bug. @scv119 
2. `test_local_mode_deadlock` added to cover the case. @rkooo567 
3. Left a trivial change in `rllib/tests/test_gpus.py` so that `RAY_CI_RLLIB_DIRECTLY_AFFECTED` takes effect.
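
The fix described above (release the lock before running user code) can be illustrated with a minimal Java sketch. This is not Ray's C++ `CoreWorker`; the class `LocalModeWorker` and its methods are hypothetical, and a binary `Semaphore` stands in for a non-reentrant mutex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Semaphore;

/** Hypothetical stand-in for a local-mode worker; not Ray's actual CoreWorker. */
final class LocalModeWorker {
    // A binary semaphore acts as a non-reentrant mutex: acquiring it twice
    // on the same thread would block forever.
    private final Semaphore taskMutex = new Semaphore(1);
    private final List<String> executed = new ArrayList<>();

    /** Records the task under the lock, then runs it inline without the lock. */
    void submit(String name, Runnable task) {
        taskMutex.acquireUninterruptibly();
        try {
            executed.add(name); // bookkeeping that needs the lock
        } finally {
            taskMutex.release(); // release BEFORE running user code
        }
        task.run(); // user code may call submit() again without deadlocking
    }

    /** Returns a snapshot of the task names in submission order. */
    List<String> executed() {
        taskMutex.acquireUninterruptibly();
        try {
            return new ArrayList<>(executed);
        } finally {
            taskMutex.release();
        }
    }
}
```

If `task.run()` were moved inside the locked region, a task that submits another task would block on its own lock, which is the kind of deadlock the commit message describes.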
2021-12-13 23:56:07 -08:00
Seonggwon Yoon
f1acabe9cf
Bump log4j from 2.14.0 to 2.15.0 (#21036)
Fix remote code injection in Log4j.
Log4j versions prior to 2.15.0 are subject to a remote code execution vulnerability via the LDAP JNDI parser.

See [CVE-2021-44228](https://github.com/advisories/GHSA-jfh8-c2jp-5v3q)
2021-12-12 15:07:50 +08:00
liuyang-my
f3ef6a221f
[Serve]Change Java LongPollClient's polling thread to singleton. (#20756)
Currently the Java LongPollClient is not a singleton: each RayServeHandle creates its own LongPollClient, and each LongPollClient starts a new polling thread. This degrades the performance of the replica actor, so this change makes the LongPollClient's polling thread a singleton.
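
As a rough illustration of the idea (one polling thread shared by all handles), here is a minimal Java sketch. It is not the Ray Serve implementation: `SharedPoller`, `register`, and the 100 ms interval are hypothetical names and values chosen for the example.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: one polling thread shared by every client in the process. */
final class SharedPoller {
    // Initialization-on-demand holder: the JVM creates the instance lazily
    // and in a thread-safe way on first access.
    private static final class Holder {
        static final SharedPoller INSTANCE = new SharedPoller();
    }

    private final List<Runnable> listeners = new CopyOnWriteArrayList<>();
    private final ScheduledExecutorService executor =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "shared-long-poll");
            t.setDaemon(true); // do not keep the JVM alive just for polling
            return t;
        });

    private SharedPoller() {
        // A single periodic task serves all registered listeners.
        executor.scheduleWithFixedDelay(this::pollOnce, 0, 100, TimeUnit.MILLISECONDS);
    }

    static SharedPoller getInstance() {
        return Holder.INSTANCE;
    }

    /** Each handle registers a callback instead of starting its own thread. */
    void register(Runnable listener) {
        listeners.add(listener);
    }

    /** Runs every listener once; a failing listener must not stop the others. */
    void pollOnce() {
        for (Runnable listener : listeners) {
            try {
                listener.run();
            } catch (RuntimeException e) {
                // Swallowed in this sketch so the periodic task keeps running.
            }
        }
    }
}
```

With this shape, the number of polling threads stays at one regardless of how many handles register.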
2021-12-07 11:14:00 -08:00
Kai Fricke
d4413299c0
Revert "[Core] Support back pressure for actor tasks (#19936)" (#20880)
This reverts commit a4495941c2.
2021-12-03 17:48:47 -08:00
DK.Pino
4ef0d4a37a
[Java] [Placement Group] Make class PlacementGroupImpl serializable (#20759)
2021-12-03 13:06:17 +08:00
WanXing Wang
a4495941c2
[Core] Support back pressure for actor tasks (#19936)
Support back pressure in core worker.
Job config added for python worker and java worker.
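
For readers unfamiliar with the term, caller-side back pressure can be sketched as a cap on the number of in-flight calls. The Java example below is only an illustration under that assumption; `BoundedSubmitter` and `maxPendingCalls` are hypothetical names and do not describe the core worker's actual mechanism or configuration.

```java
import java.util.concurrent.Semaphore;

/** Hypothetical sketch of caller-side back pressure; not Ray's implementation. */
final class BoundedSubmitter {
    private final Semaphore permits;

    BoundedSubmitter(int maxPendingCalls) { // hypothetical limit name
        this.permits = new Semaphore(maxPendingCalls);
    }

    /** Returns false instead of queueing when too many calls are pending. */
    boolean trySubmit() {
        return permits.tryAcquire();
    }

    /** Called when a pending call finishes, freeing one slot. */
    void onCompleted() {
        permits.release();
    }

    /** Number of calls that can still be submitted right now. */
    int availableSlots() {
        return permits.availablePermits();
    }
}
```

A caller that receives `false` can wait, retry later, or surface an error, rather than letting the pending queue grow without bound.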
2021-12-02 14:41:30 -08:00
Tao Wang
f481081904
[Java]Get next job id only in driver (#20813)
## Why are these changes needed?
The job ID is only used in the driver, so workers should not fetch it.
2021-12-01 15:48:21 +08:00
Qing Wang
84f7062329
[Java] Cleanup temp file of libcore_worker.so (#20748)
Why are these changes needed?
Replace any existing temp file. Previously, if a worker died and left its temp file behind, subsequent workers could not write a new temp file because one already existed.
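
One way to get "replace if present" semantics in Java is `Files.copy` with `StandardCopyOption.REPLACE_EXISTING`. The sketch below assumes that approach; it is not taken from the PR, and `TempLibraryWriter` and the file contents are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper; shows overwriting a stale temp file. */
final class TempLibraryWriter {
    /** Writes the stream to target, overwriting any file a dead worker left behind. */
    static void write(InputStream library, Path target) {
        try {
            Files.copy(library, target, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Simulates a stale file, overwrites it, and returns the final content. */
    static String demo() {
        try {
            Path target = Files.createTempFile("libcore_worker", ".so");
            Files.write(target, "stale".getBytes(StandardCharsets.UTF_8));
            write(new ByteArrayInputStream("fresh".getBytes(StandardCharsets.UTF_8)), target);
            String content = new String(Files.readAllBytes(target), StandardCharsets.UTF_8);
            Files.delete(target);
            return content;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Without `REPLACE_EXISTING`, `Files.copy` throws `FileAlreadyExistsException` when the target exists, which matches the failure mode described above.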
2021-11-29 16:05:06 +08:00
Qing Wang
cd2b83a259
[Core][ConcurrencyGroup] Fix blocking task in default group block tasks in other group. (#20525)
Why are these changes needed?
If max concurrency is 1 in the default group, a blocking task executing in the default group blocks subsequent tasks in other groups. See the reproduction script in #20475.

The cause is that tasks in the default concurrency group run in the main task execution thread, so tasks in other concurrency groups are blocked whenever the main task execution thread is blocked.

This PR only changes concurrent actor behavior, so that the default group no longer blocks other groups.
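
The isolation property can be sketched in plain Java by giving every group, including the default one, its own executor. This is an illustration of the idea only, not the core worker's scheduling code; `GroupedExecutor`, the group name `"io"`, and the one-thread-per-group choice are assumptions made for the example.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: each concurrency group runs on its own thread. */
final class GroupedExecutor {
    static final String DEFAULT_GROUP = "default";
    private final Map<String, ExecutorService> pools = new ConcurrentHashMap<>();

    /** Routes the task to the executor that belongs to its group. */
    <T> Future<T> submit(String group, Callable<T> task) {
        ExecutorService pool = pools.computeIfAbsent(group, g ->
            Executors.newSingleThreadExecutor(r -> {
                Thread t = new Thread(r, "group-" + g);
                t.setDaemon(true);
                return t;
            }));
        return pool.submit(task);
    }

    void shutdown() {
        pools.values().forEach(ExecutorService::shutdownNow);
    }

    /** Blocks the default group, then shows another group still makes progress. */
    static String demo() {
        GroupedExecutor exec = new GroupedExecutor();
        CountDownLatch release = new CountDownLatch(1);
        try {
            exec.submit(DEFAULT_GROUP, () -> { release.await(); return "blocked"; });
            return exec.submit("io", () -> "io done").get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            release.countDown();
            exec.shutdown();
        }
    }
}
```

If the default group shared a thread with the `"io"` group, the second `submit` in `demo()` would time out instead of returning.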

Related issue number
Fix #20475
2021-11-25 14:24:17 +08:00
Guyang Song
53630ee03b
Revert "Revert "[runtime env] redefine runtime env to protobuf"" and fix windows compiling (#20692)
- Fix Windows compilation and revert https://github.com/ray-project/ray/pull/20641
- PR https://github.com/ray-project/ray/pull/20670 appears to resolve the Windows compilation issue.
2021-11-24 09:01:01 -08:00
Alex Wu
9388d28233
Revert "[runtime env] redefine runtime env to protobuf" (#20641)
Reverts #19511

Breaks windows compilation
2021-11-22 13:11:30 -08:00
Lixin Wei
a912b68375
[Java] Reenable Named Actor Test. (#20627)
We previously skipped testGetNonExistingNamedActor for some reason. The test can now be enabled, so this PR re-enables it.
2021-11-22 16:25:16 +08:00
Guyang Song
ad56b9b432
[runtime env] redefine runtime env to protobuf (#19511)
2021-11-20 16:54:42 +08:00
Larry
454db6902c
[Java] Add timeout parameter for Ray.get() API (#20282)
Why are these changes needed?

Add a timeout parameter (in milliseconds) to the Java Ray.get API. The docs have been updated accordingly ([Ray Core Walkthrough]->[Fetching Results]).

Example:
ObjectRef<Integer> objRef = Ray.put(1);
objRef.get(1000);
Ray.get(Ray.task(MyRayApp::slowFunction).remote(), 3000);

Related issue number
#20247
2021-11-17 11:02:17 +08:00
Qing Wang
6504ad6bb2
[xlang] Add named actor xlang tests. (#20368)
We add named actor xlang tests, covering both getting a Java named actor from Python and getting a Python named actor from Java.

Related issue number
#19794
2021-11-16 21:42:05 +08:00
Yi Cheng
a4e187c0e7
[gcs] Update function table to use internal kv (#20152)
## Why are these changes needed?
This is part of the Redis removal work. This PR removes Redis KV usage from the function table.
rpush-related code is not updated in this PR.

## Related issue number
2021-11-15 23:34:41 -08:00
Qing Wang
1172195571
[Java] Remove global named actor and global pg (#20135)
This PR removes global named actors and global PGs.

I believe these APIs are not widely used in OSS.
The CPP part is not included in this PR.
@kfstorm @clay4444 @raulchen Please check whether this change is reasonable.

IMPORTANT NOTE: This is a Java API change and breaks backward compatibility for Java global named actor and global PG usage.

INCLUDES:

- Remove setGlobalName() and getGlobalActor() APIs.
- Remove getGlobalPlacementGroup() and setGlobalPG
- Add getActor(name, namespace) API
- Add getPlacementGroup(name, namespace) API
- Update doc pages.
2021-11-15 16:28:53 +08:00
Qing Wang
7500f7d88a
Remove deprecated Java PG APIs. (#20219)
These APIs have been deprecated for more than 7 months and more than 4 versions, so it is time to remove them.
2021-11-12 09:29:48 +08:00
Yi Cheng
e54d3117a4
[gcs] Update all redis kv usage in python except function table (#20014)
## Why are these changes needed?
This is part of the Redis removal project. This PR removes all direct usage of Redis except the function table.
The function table will be migrated in the next PR.

## Related issue number
#19443
2021-11-10 20:24:53 -08:00
liuyang-my
efca009258
[Serve] Make Java Replica Extendable (#19463)
2021-11-10 15:05:37 -08:00
Stephanie Wang
ffcc5935d7
[core] Evict lineage to bound memory usage (#19946)
* bound lineage

* Bound lineage in bytes

* test

* Lineage evicted error

* Lineage evicted

* lint

* test

* test

* comment

* doc

* x

* x

* x

* x
2021-11-08 21:53:40 -08:00
Qing Wang
f9d94f51aa
Revert "[Java] Skip javadoc when deploying. (#19428)" (#20137)
This reverts commit 1047914ee0.
2021-11-08 15:53:31 +08:00
Qing Wang
6d8a7291ab
Add getNamespace API for Java worker (#20057)
[Java API] Add getNamespace API for Java worker.
2021-11-08 15:51:14 +08:00
Qing Wang
4373aa1e3b
Support generating a UUID string as the anonymous namespace for Java worker. (#19986)
Why are these changes needed?
For the Java worker, we generate a UUID string as the namespace if the user does not specify a namespace for the job.
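
The fallback can be sketched in a few lines of Java. This is only an illustration of the behavior described above; the helper `Namespaces.resolve` is hypothetical, and treating an empty string as "not specified" is an assumption of this sketch.

```java
import java.util.UUID;

/** Hypothetical helper; not the Java worker's actual code. */
final class Namespaces {
    /** Returns the user's namespace, or a random UUID string as an anonymous one. */
    static String resolve(String userNamespace) {
        // Assumption: null and empty both mean "the user did not specify a namespace".
        if (userNamespace == null || userNamespace.isEmpty()) {
            return UUID.randomUUID().toString();
        }
        return userNamespace;
    }
}
```

Because each generated value is a fresh random UUID, two jobs without an explicit namespace end up in different anonymous namespaces.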

Related issue number
#16474
2021-11-04 11:40:17 +08:00
Jiajun Yao
5de4a38948
[CI] Run Java CI on Mac (#19757)
Why are these changes needed?
Enable Java tests on Mac CI to avoid more breakages.

Related issue number
Closes #19700
2021-11-03 23:40:05 +08:00
Edward Oakes
e1e0cb5eaa
[serve] Rename backend tag -> deployment name (#19997)
2021-11-03 09:49:52 -05:00
Qing Wang
da6894848d
Support Java namespace APIs (#19468)
## Why are these changes needed?

## Related issue number
#16474
2021-11-02 11:05:40 +08:00
Edward Oakes
ee57025be6
[serve] Rename BackendConfig -> DeploymentConfig (#19923)
2021-11-01 10:24:02 -07:00
Tao Wang
7a2e9e00e8
[Tiny]Remove duplicated assignment (#19866)
2021-11-01 11:44:01 +08:00
Edward Oakes
e507b7ba6e
[serve] Rename BackendVersion -> DeploymentVersion (#19798)
2021-10-31 10:27:19 -05:00
Qing Wang
7647ea3512
[Java] Add helper method to build driver process. (#19740)
We extract the buildDriver() process into a helper utility to avoid duplicate code.
2021-10-27 10:17:37 +08:00
Jiajun Yao
e4542be0d1
[Java] Run java on mac with public ip (#19701)
2021-10-25 11:38:33 -07:00
Jiajun Yao
805ce453dd
[Java] Remove auto-generated pom.xml files. (#19475)
2021-10-19 17:35:37 +08:00
Qing Wang
1047914ee0
[Java] Skip javadoc when deploying. (#19428)
2021-10-17 15:21:13 +08:00
Gagandeep Singh
d226cbf21a
Added StartupToken to identify a process at startup (#19014)
* Added StartupToken to identify a process at startup

* Applied linting formats

* Addressed reviews

* Fixing worker_pool_test

* Fixed worker_pool_test

* Applied linting formatting

* Added documentation for StartupToken

* Fixed linting

* Reordered initialisation of WorkerPool members

* Fixed Python docs

* Fixing bugs in cluster_mode_test

* Fixing Java tests

* Create and set shim process after verifying startup_token

* shim_process.GetId() -> worker_shim_pid

* Improvements in startup token and modifying java files

* update io_ray_runtime_RayNativeRuntime.h

* Fixed java tests by adding startup-token to conf

* Applied linting

* Increased arg count for startup_token

* Attempt to fix streaming tests

* Type correction

* applied linting

* Corrected index of startup token arg

* Modified mock_worker.cc to accept startup tokens

* Applied linting

* Applied linting changes from CI

* Removed override from worker.h

* Applied linting from scripts/format.sh

* Addressed reviews and applied scripts/format.sh

* Applied linting script from ci/travis

* Removed unrequired methods from public scope

* Applied linting
2021-10-15 15:13:13 -07:00
Qing Wang
2cc164e616
[Java] Fix incomplete core worker dynamic library. (#19342)
* Fix incomplete core worker dynamic library.

* Fix lint.
2021-10-14 14:42:05 +08:00
hazeone
c2f0035fd2
[Java]Support getGpuIds API (#19031)
Add the Java getGpuIds() API, which mirrors get_gpu_ids in Python. It returns the device IDs when a GPU has been allocated to the worker.
2021-10-13 23:40:26 +08:00
Qing Wang
b6d67d2ba9
Use javac -h instead of javah. (#19311)
2021-10-12 22:37:14 +08:00
Guyang Song
ab55b808c5
[runtime env] move worker env to runtime env in Java (#19060)
2021-10-11 17:25:09 +08:00
liuyang-my
5353c5c2f1
Define Java Proxy and RayServeHandle (#18630)
2021-10-10 23:39:04 -07:00
Edward Oakes
9cf19b67cc
[serve] Remove long poll client from replicas (#19145)
In general, broadcasting changes to the replicas via the LongPollClient is hard to reason about (it circumvents our versioning semantics as there's no rolling update). Ideally we would only be using the LongPollClient to broadcast replica membership and nothing else.
2021-10-08 12:32:42 -05:00
Stephanie Wang
940f84cedb
[core] Remove unused plasma promotion path (#19122)
* remove unused

* lint

* lint

* lint
2021-10-07 10:55:50 -07:00
Qing Wang
90d2456ec7
[Java] Support userloggers. (#18846)
Co-authored-by: Kai Yang <kfstorm@outlook.com>
2021-09-26 16:53:06 +08:00
Qing Wang
3ad1553b34
[Java] Remove API setJvmOptions(String). (#18664)
2021-09-22 20:00:49 +08:00
liuyang-my
ed04ab7140
Define protobuf for RequestMetadata and HTTPRequestWrapper (#18203)
2021-09-15 14:39:27 -07:00
Stephanie Wang
284dee493e
[core][usability] Disambiguate ObjectLostErrors for better understandability (#18292)
* Define error types, throw error for ObjectReleased

* x

* Disambiguate OBJECT_UNRECONSTRUCTABLE and OBJECT_LOST

* OwnerDiedError

* fix test

* x

* ObjectReconstructionFailed

* ObjectReconstructionFailed

* x

* x

* print owner addr

* str

* doc

* rename

* x
2021-09-13 16:16:17 -07:00
Guyang Song
3bc5f0501f
fix WaitPlacementGroupReady API (#18464)
2021-09-13 14:07:40 +08:00
Lingxuan Zuo
a67b9ee8d7
Remove custom resource from streaming (#18490)
2021-09-12 12:20:59 -07:00