`__dealloc__` is not allowed to call Python code, which leads to two problems:
- The data has already been cleaned up by the time `__dealloc__` runs.
- Deadlock if locks are involved.

This PR moves the implementation to the Python layer to avoid this.
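A minimal sketch of the resulting pattern, assuming hypothetical names (`_CoreWorkerProxy`, `shutdown`, `native_handle`): cleanup that needs to call Python code or take locks runs in an ordinary Python method instead of Cython's `__dealloc__`.
```python
import threading

class _CoreWorkerProxy:
    """Hypothetical wrapper: teardown lives in the Python layer, not in __dealloc__."""

    def __init__(self, native_handle):
        self._native_handle = native_handle
        self._lock = threading.Lock()

    def shutdown(self):
        # A plain Python method may call Python code and acquire locks,
        # both of which are unsafe inside Cython's __dealloc__.
        with self._lock:
            self._native_handle = None

    def __del__(self):
        # Python-level finalizer delegates to the explicit shutdown path.
        self.shutdown()
```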
Support specifying a default lifetime for actors whose lifetime is not set at creation time. This is a job-level configuration item.
#### API Change
The Python API looks like:
```python
ray.init(job_config=JobConfig(default_actor_lifetime="detached"))
```
The Java API looks like:
```java
System.setProperty("ray.job.default-actor-lifetime", defaultActorLifetime.name());
Ray.init();
```
One example usage is:
```python
ray.init(job_config=JobConfig(default_actor_lifetime="detached"))
a1 = A.options(lifetime="non_detached").remote() # a1 is a non-detached actor.
a2 = A.remote() # a2 is a detached actor, following the job-level default.
```
Co-authored-by: Kai Yang <kfstorm@outlook.com>
Co-authored-by: Qing Wang <jovany.wq@antgroup.com>
A worker can crash right after putting its return values into the object store. Then, the owner will receive the worker crashed error, but the return objects will still be in the remote object store. Later, if the task is retried, the worker will crash on [this line](https://github.com/ray-project/ray/blob/master/src/ray/core_worker/transport/direct_actor_transport.cc#L105) because the object already exists.
Another way this can happen is if a task has multiple return values, and one of those return values is transferred to another node. If the task is later re-executed on that node, the task will fail because of the same error.
This PR fixes the crash so that:
1. If an object already exists, we try to pin that copy. Ideally, we should destroy the old copy and create the new one to make sure that metadata like the owner address is in sync, but this is pretty complicated to do right now.
2. If the pinning fails, we store an OBJECT_LOST error to throw to the application.
3. On the raylet, we check whether we already have the object pinned, and only subscribe to the owner's eviction message if the object is not pinned.
4. Also fixes bugs in the analogous case for `ray.put` (previously this would hang, now the application will receive an error if a `ray.put` object already exists).
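A rough Python-style sketch of steps 1 and 2 above (purely illustrative; the actual logic lives in the C++ core worker and raylet, and every name below is hypothetical):
```python
class PinFailedError(Exception):
    pass

def store_task_return(object_id, value, store):
    """Sketch: reuse an already-existing copy of a return object instead of crashing."""
    if store.contains(object_id):
        # A previous execution of the task already created this object.
        try:
            store.pin(object_id)  # step 1: pin the existing copy
        except PinFailedError:
            store.put_error(object_id, "OBJECT_LOST")  # step 2: surface an error to the app
        return
    store.put(object_id, value)
    store.pin(object_id)
```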
This PR adds per-task/actor scheduling strategies. Currently the only supported strategies are PlacementGroupSchedulingStrategy and DefaultSchedulingStrategy.
Going forward, people should use `scheduling_strategy=PlacementGroupSchedulingStrategy(...)` to specify a placement group for an actor/task; the old way will be deprecated.
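For example, new-style usage looks roughly like the following sketch (resource amounts and the bundle layout are arbitrary):
```python
import ray
from ray.util.placement_group import placement_group
from ray.util.scheduling_strategies import PlacementGroupSchedulingStrategy

ray.init()

# Reserve resources, then target the placement group via the strategy object.
pg = placement_group([{"CPU": 1}])
ray.get(pg.ready())

@ray.remote(num_cpus=1)
def f():
    return "scheduled via PlacementGroupSchedulingStrategy"

ref = f.options(
    scheduling_strategy=PlacementGroupSchedulingStrategy(
        placement_group=pg,
        placement_group_bundle_index=0,
    )
).remote()
print(ray.get(ref))
```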
## Why are these changes needed?
This PR aims to port concurrency groups functionality with asyncio for Python.
### API
```python
@ray.remote(concurrency_groups={"io": 2, "compute": 4})
class AsyncActor:
    def __init__(self):
        pass

    @ray.method(concurrency_group="io")
    async def f1(self):
        pass

    @ray.method(concurrency_group="io")
    def f2(self):
        pass

    @ray.method(concurrency_group="compute")
    def f3(self):
        pass

    @ray.method(concurrency_group="compute")
    def f4(self):
        pass

    def f5(self):
        pass
```
The annotation above the actor class `AsyncActor` defines two concurrency groups for this actor, along with their maximum concurrencies, in addition to the default concurrency group. Every concurrency group has its own async event loop and Python thread to execute the methods defined in it.
Method `f1` will be invoked in the `io` concurrency group, `f2` in `io`, `f3` in `compute`, and so on.
Note that `f5` and `__init__` will be invoked in the default concurrency group.
In the following call, `f2` will be invoked in the `compute` concurrency group, since the group specified dynamically at call time takes priority:
```python
a.f2.options(concurrency_group="compute").remote()
```
### Implementation
The straightforward implementation details are:
- Previously, an asyncio actor had a single event loop bound to a single Python thread. Now we create one event loop bound to one Python thread for every concurrency group of the asyncio actor (see the sketch after this list).
- Previously, there was one fiber state for every caller of the asyncio actor. Now we create a FiberStateManager for every caller, which manages the fiber states of the concurrency groups.
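A minimal sketch of the "one event loop plus one Python thread per concurrency group" idea (illustrative only; `EventLoopGroup` and its methods are made-up names, not Ray's internal classes):
```python
import asyncio
import threading

class EventLoopGroup:
    """Hypothetical sketch: a dedicated event loop and thread per concurrency group."""

    def __init__(self, group_names):
        self._loops = {}
        for name in group_names:
            loop = asyncio.new_event_loop()
            threading.Thread(target=loop.run_forever, daemon=True).start()
            self._loops[name] = loop

    def submit(self, group, coro):
        # Schedule the coroutine on the event loop owned by `group`.
        return asyncio.run_coroutine_threadsafe(coro, self._loops[group])
```
The real implementation additionally tracks per-caller fiber states for each group, which this sketch omits.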
## Related issue number
#16047
* Use async rpc for remote calls, task and actor creations.
* fix
* check placement
* check placement group. wait for id in destructor
* fix
* fix exception in destructor
* Add test
* revert change
* Fix comment
* fix
* Add support for object spilling in the ownership-based object directory.
* Move owner address hashmap into pinned_objects_ and objects_pending_spill_.
* Update local object manager tests.
* Feedback and misc. fixes.
* Move spilled unpin callback lambda to std::binded private method.
* Skip test_delete_objects_multi_node test on MacOS for now.
* Fast builds by default
* Update doc/source/development.rst
Co-authored-by: Simon Mo <xmo@berkeley.edu>
Co-authored-by: Mehrdad <noreply@github.com>