hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-06 10:31:39 -05:00

Author	SHA1	Message	Date
Hao Chen	f31a79f3f7	Implement actor checkpointing (#3839 ) * Implement Actor checkpointing * docs * fix * fix * fix * move restore-from-checkpoint to HandleActorStateTransition * Revert "move restore-from-checkpoint to HandleActorStateTransition" This reverts commit 9aa4447c1e3e321f42a1d895d72f17098b72de12. * resubmit waiting tasks when actor frontier restored * add doc about num_actor_checkpoints_to_keep=1 * add num_actor_checkpoints_to_keep to Cython * add checkpoint_expired api * check if actor class is abstract * change checkpoint_ids to long string * implement java * Refactor to delay actor creation publish until checkpoint is resumed * debug, lint * Erase from checkpoints to restore if task fails * fix lint * update comments * avoid duplicated actor notification log * fix unintended change * add actor_id to checkpoint_expired * small java updates * make checkpoint info per actor * lint * Remove logging * Remove old actor checkpointing Python code, move new checkpointing code to FunctionActionManager * Replace old actor checkpointing tests * Fix test and lint * address comments * consolidate kill_actor * Remove __ray_checkpoint__ * fix non-ascii char * Loosen test checks * fix java * fix sphinx-build	2019-02-13 19:39:02 +08:00
Wang Qing	c523bc04ad	Enable redis password in Java worker (#3943 ) * Support Java redis password * Fix * Refine * Fix lint.	2019-02-12 13:11:25 +08:00
Wang Qing	bc438ca73b	[Java] Refine Java config item (#4014 ) * Refine * Address comment.	2019-02-11 23:55:40 +08:00
Yuhong Guo	3a66d47a3a	Remove RAY_CHECK from JNI code (#3978 ) * Remove RAY_CHECK in JNI * Try to add mvn test to test the exception. * Refine * Address comments	2019-02-09 18:10:22 +08:00
Ion	f987572795	Inline objects (#3756 ) * added store_client_ to object_manager and node_manager * half through... * all code in, and compiling! Nothing tested though... * something is working ;-) * added a few more comments * now, add only one entry to the in GCS for inlined objects * more comments * remove a spurious todo * some comment updates * add test * added support for meta data for inline objects * avoid some copies * Initialize plasma client in tests * Better comments. Enable configuring nline_object_max_size_bytes. * Update src/ray/object_manager/object_manager.cc Co-Authored-By: istoica <istoica@cs.berkeley.edu> * Update src/ray/raylet/node_manager.cc Co-Authored-By: istoica <istoica@cs.berkeley.edu> * Update src/ray/raylet/node_manager.cc Co-Authored-By: istoica <istoica@cs.berkeley.edu> * fixed comments * fixed various typos in comments * updated comments in object_manager.h and object_manager.cc * addressed all comments...hopefully ;-) * Only add eviction entries for objects that are not inlined * fixed a bunch of comments * Fix test * Fix object transfer dump test * lint * Comments * Fix test? * Fix test? * lint * fix build * Fix build * lint * Use const ref * Fixes, don't let object manager hang * Increase object transfer retry time for travis? * Fix test * Fix test? * Add internal config to java, fix PlasmaFreeTest	2019-02-07 10:32:39 -08:00
Philipp Moritz	3bb65677dc	Use one memory mapped file for plasma (#3871 )	2019-02-06 23:53:05 -08:00
Wang Qing	e1c68a0881	Enable including Java worker for `ray start` command (#3838 )	2019-02-04 16:23:43 +08:00
Wang Qing	dcb744518e	Implement actor dummy object gc in java (#3822 ) * Add dummy object gc in java * Fix * Address comments. * Refine * Address comments.	2019-01-23 11:56:25 -08:00
Wang Qing	816406ea3d	[Java] Fix `setCurrentTask()` in multi threading (#3821 )	2019-01-23 20:45:30 +08:00
Wang Qing	3cf59855af	[Java] Replace junit with testNG (#3768 )	2019-01-14 17:49:17 +08:00
Hao Chen	1bb20badec	[Java] Fix bug when actor creation task fails (#3740 ) * [Java] Fix bug when actor creation task fails * remove imports	2019-01-14 11:09:15 +08:00
Wang Qing	8674606e26	Support to auto-generate Java files from flatbuffer (#3749 ) * auto gen flatbuffers for Java * Add auto_gen_tool.py * Refine * Add a comment * address comments. * Address comments. * Addressed * Refine * Address comments * Fix typo * Add exception * Address comments. * Refine * Fix lint * Fix * Fix lint and address comment. * Fix lint error	2019-01-13 11:39:23 -08:00
Wang Qing	0a556dc0b5	Refine redis client (#3758 )	2019-01-12 23:01:48 +08:00
Wang Qing	a0cf8ee5a8	Refine Java worker code (#3735 )	2019-01-12 22:45:33 +08:00
Hao Chen	597abb24ea	Refine multi-threading support (#3672 ) * [Python] refine multi-threading support fix * [java] refine multithreading code fix java * format	2019-01-10 13:58:11 -08:00
Stephanie Wang	04f31db54d	Actor dummy object garbage collection (#3593 ) * Convert UniqueID::nil() to a constructor * Cleanup actor handle pickling code * Add new actor handles to the task spec * Pass in new actor handles * Add new handles to the actor registration * Regression test for actor handle forking and GC * lint and doc * Handle pickled actor handles in the backend and some refactoring * Add regression test for dummy object GC and pickled actor handles * Check for duplicate actor tasks on submission * Regression test for forking twice, fix failed named actor leak * Fix bug for forking twice * lint * Revert "Fix bug for forking twice" This reverts commit 3da85e59d401e53606c2e37ffbebcc8653ff27ac. * Add new actor handles when task is assigned, not finished * Remove comment * remove UniqueID() * Updates * update * fix * fix java * fixes * fix	2019-01-09 10:37:11 -08:00
Wang Qing	692fdc6bc3	[Java] Allow actor handle to be serialized without forking (#3686 )	2019-01-06 00:29:08 +08:00
Wang Qing	c59b506c6e	[Java] Support calling Ray APIs from multiple threads (#3646 )	2018-12-28 17:44:31 +08:00
Wang Qing	4cde971916	[Java] Print the log message slowly. (#3633 )	2018-12-26 16:33:21 +08:00
Wang Qing	a971b73bbe	[Java] Fix the issue when waiting an empty list or a null pointer (#3632 )	2018-12-26 11:29:29 +08:00
Wang Qing	8393df2516	Use BaseTest instead of TestListener. (#3577 )	2018-12-21 16:29:16 -08:00
bibabolynn	e65b8f18f4	[java] change RayLog.core to org.slf4j.Logger (#3579 )	2018-12-21 15:58:32 +08:00
Yuhong Guo	fb33fa9097	Enable function_descriptor in backend to replace the function_id (#3028 )	2018-12-18 18:53:59 -05:00
bibabolynn	7fd24e384b	[java] Pass large args by reference (#3504 )	2018-12-14 23:32:35 +08:00
Yuhong Guo	a4abe6c0fe	Add test to test raylet client connection when raylet crashes. (#3518 )	2018-12-13 23:40:50 -08:00
Hao Chen	e7b51cbd1b	[xray] Implement Actor Reconstruction (#3332 ) * Implement Actor Reconstruction * fix * fix actor handle __del__ * fix lint * add comment * Remove actorCreationDummyObjectId * address comments * fix * address comments * avoid copy * change log to debug * fix error name	2018-12-13 21:28:58 -08:00
Si-Yuan	84fae57ab5	Convert the raylet client (the code in local_scheduler_client.cc) to proper C++. (#3511 ) * refactoring * fix bugs * create client class * create client class for java; bug fix * remove legacy code * improve code by using std::string, std::unique_ptr rename private fields and removing legacy code * rename class * improve naming * fix * rename files * fix names * change name * change return types * make a mutex private field * fix comments * fix bugs * lint * bug fix * bug fix * move too short functions into the header file * Loose crash conditions for some APIs. * Apply suggestions from code review Co-Authored-By: suquark <suquark@gmail.com> * format * update * rename python APIs * fix java * more fixes * change types of cpython interface * more fixes * improve error processing * improve error processing for java wrapper * lint * fix java * make fields const * use pointers for [out] parameters * fix java & error msg * fix resource leak, etc.	2018-12-13 13:39:10 -08:00
Yuhong Guo	0136af5aac	Add return value for reconstruction RPC. (#3493 ) * Add return value for reconstruct RPC. * Fix comment function name	2018-12-09 00:08:44 -08:00
Hao Chen	abd37df41e	Add stress test for Java worker (#3424 )	2018-12-01 16:11:09 -08:00
Stephanie Wang	447604a9fe	Use actor ID for the dummy object (#3437 )	2018-11-29 22:31:04 -08:00
Stephanie Wang	d950e92f63	Allow multiple threads to call ray.get and ray.wait (#3244 ) * Handle multiple threads calling ray.get * Multithreaded ray.wait * Pass in current task ID in java backend * Add multithreaded actor to tests, add warning messages to worker for multithreaded ray.get * Fix test * Some cleanups * Improve error message * Add assertion * Cleanup, throw error in HandleTaskUnblocked if task not actually blocked * lint * Fix python worker reset * Fix references to reconstruct_objects * Linting * java lint * Fix java * Fix iterator	2018-11-07 22:39:28 -08:00
Richard Liaw	0bab8ed95c	Expose internal config parameters for starting Ray (#3246 ) ## What do these changes do? This PR exposes the CL option for using a config parameter. This is important for certain tests (i.e., FT tests that remove nodes) to run quickly. Note that this is bad practice and should be replaced with GFLAGS or some equivalent as soon as possible. #3239 depends on this. TODO: - [x] Add documentation to method arguments before merging. - [x] Add test to verify this works?	2018-11-07 21:46:02 -08:00
Robert Nishihara	fd854ff090	Allow the node manager port and object manager port to be set through… (#3130 ) * Allow the node manager port and object manager port to be set through ray start. * Linting * Fix Java test * Address comments.	2018-10-28 17:28:41 -07:00
Robert Nishihara	658c14282c	Remove legacy Ray code. (#3121 ) * Remove legacy Ray code. * Fix cmake and simplify monitor. * Fix linting * Updates * Fix * Implement some methods. * Remove more plasma manager references. * Fix * Linting * Fix * Fix * Make sure class IDs are strings. * Some path fixes * Fix * Path fixes and update arrow * Fixes. * linting * Fixes * Java fixes * Some java fixes * TaskLanguage -> Language * Minor * Fix python test and remove unused method signature. * Fix java tests * Fix jenkins tests * Remove commented out code.	2018-10-26 13:36:58 -07:00
bibabolynn	b4614ae69a	[java] customize path of ray.conf (#3100 ) users can add custom path of ray.config by using -Dray.config=/path/to/ray.conf	2018-10-26 13:36:34 +08:00
Hanwei Jin	7c1fd19fd9	[Java] support python worker command in raylet (#3092 ) ## What do these changes do? Allow a raylet started by the Java RunManager to start the Python default_worker.py, so that a Python worker starts automatically when locally testing a Java call to a Python task.	2018-10-24 20:43:39 +08:00
bibabolynn	9a5c273db7	[java] fix check exception type (#3093 ) ## What do these changes do? Remove TaskExecutionException and use RayException instead.	2018-10-19 06:43:42 -07:00
Wang Qing	b410ee0d29	[Java] Support dynamically defining resources when submitting task. (#3070 ) ## What do these changes do? Before this PR, to specify resources we had to write code like the following: ```java @RayRemote(Resources={ResourceItem("CPU", 10)}) public static void f1() { // do sth } @RayRemote(Resources={ResourceItem("CPU", 10)}) class Demo { // sth } ``` Unfortunately, that left no way to create another actor or task with different resource requirements. After this PR, the code becomes: ```java ActorCreationOptions option = new ActorCreationOptions(); option.resources.put("CPU", 4.0); RayActor<Echo> echo1 = Ray.createActor(Echo::new, option); option.resources.put("Res-A", 4.0); RayActor<Echo> echo2 = Ray.createActor(Echo::new, option); //if we don't specify resource, the resources will be `{"cpu":0.0}` by default. Ray.call(Echo::echo, echo2, 100); ``` ## Related issue number N/A	2018-10-19 06:22:32 -07:00
Wang Qing	64e5eb305e	[Java] Add jvm-parameters in Config. (#3065 )	2018-10-16 15:03:18 -07:00
Wang Qing	828fe24b39	[Java] Fix loading driver resources issue. (#3046 ) ## What do these changes do? Fix how driver resources are loaded from a specified path. This also addresses the comments from the related PR [3044](https://github.com/ray-project/ray/pull/3044). ## Related PRs: [#3044](https://github.com/ray-project/ray/pull/3044) and [#3001](https://github.com/ray-project/ray/pull/3001).	2018-10-11 09:45:21 -07:00
Wang Qing	4a2ed47b6c	[Java] Improve some Java code (#3040 ) This PR improves some Java code and removes some duplicated code.	2018-10-10 17:30:23 -07:00
Wang Qing	84bf5fc8f3	[Java] Load driver resources from local path. (#3001 ) ## What do these changes do? 1. Add a configuration item `driver.resource-path`. 2. Load driver resources from the local path specified in `ray.conf`. Before this change, all driver resources (such as the user's jar package, dependency packages and config files) had to be added to the `classpath`. After this change, driver resources go into the mount path configured in `ray.conf`, and `classpath` no longer needs to be configured for them. ## Related issue number N/A	2018-10-08 21:05:26 +01:00
Robert Nishihara	faa31ae018	Introduce concept of resources required for placing a task. (#2837 ) * Introduce concept of resources required for placement. * Add placement resources to task spec * Update java worker * Update taskinfo.java	2018-10-04 10:35:39 -07:00
bibabolynn	9c606ea06c	fix bug: (#3000 ) Before the fix, RAY_FUN_CACHE was only ever read with get, so it could only return null. Fix: put the entry after creating it.	2018-10-02 22:53:54 -07:00
Wang Qing	fcef4edd46	[Java] Fix the required-resources issue of actor member function in Java worker. (#3002 ) This fixes a bug in which Java actor methods inherit the resource requirements of the actor creation task.	2018-10-01 12:56:36 -07:00
Hao Chen	4ffe1e3556	[Java] Fix: task spec's resource map should contain CPU (#2987 )	2018-09-28 14:23:38 -05:00
Wang Qing	68cf194e90	[fix] Fix ray.home configuration item. (#2977 ) If we set the `ray.home` configuration item to `""`, the current `RayConfig` sets it to the current working directory, like `/User/My/Ray`. But some other configuration items (like `redisServerExecutablePath`) are then set to `/User/My/Ray//build/src/common/thirdparty/redis/src/redis-server` by mistake. Note: there are 2 `/` between the current working directory and `build/src/common....` This PR fixes this issue.	2018-09-28 00:06:14 -05:00
Wang Qing	8e8e123777	[Java] Simplify Java worker configuration (#2938 ) ## What do these changes do? Previously, Java worker configuration was complicated, because it required setting environment variables as well as command-line arguments. This PR aims to simplify Java worker configuration. 1) Configuration management is now migrated to [lightbend config](https://github.com/lightbend/config), and thus doesn't require setting environment variables. 2) Many unused config items are removed. 3) A simple `example.conf` file is provided, so users can get started quickly. 4) All possible options and their default values are declared and documented in the `ray.default.conf` file. This PR also simplifies and refines the following code: 1) The process of `Ray.init()`. 2) `RunManager`. 3) `WorkerContext`. ### How to use this configuration? 1. Copy `example.conf` into your classpath and rename it to `ray.conf`. 2. Modify/add your configuration items. All items are declared in `ray.default.conf`. 3. You can also set the items in Java system properties. Note: configuration is read in this priority: System properties > `ray.conf` > `ray.default.conf` ## Related issue number N/A	2018-09-26 20:14:22 +08:00
Wang Qing	0e552fbb22	[Java] Update maven version to 0.1-SNAPSHOT Update the version in maven from 0.1 to 0.1-SNAPSHOT, because SNAPSHOT is the conventional version name in dev process. Non-snapshot versions are only used for release.	2018-09-26 18:08:46 +08:00
Hao Chen	971df5ea8a	[java] put function meta in task spec and load functions with function meta (#2881 ) This PR adds a `function_desc` field into task spec. A function descriptor is a list of strings that can uniquely describe a function. - For a Python function, it should be: [module_name, class_name, function_name] - For a Java function, it should be: [class_name, method_name, type_descriptor] There are a couple of reasons to add this field: In this PR: - Java worker needs to know a function's class name to load it. Previously, since task spec didn't have such a field to hold this info, we did a hack by appending the class name to the argument list. With this change, we fixed that hack and significantly simplified function management in Java. Will be done in subsequent PRs: - Support cross-language invocation (#2576): currently Python worker manages functions by saving them in GCS and passing the function id in task spec. However, if we want to call a Python function from Java, we cannot save it in GCS and get the function id. But instead, we can pass the function descriptor (module name, class name, function name) in task spec and use it to load the function. - Support deployment: one major problem of Python worker's current function management mechanism is #2327. In prod env, we should have a mechanism to deploy code and dependencies to the cluster. And when code is already deployed, we don't need to save functions to GCS any more and can use `function_desc` to manage functions.	2018-09-25 23:05:05 -07:00
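
The function descriptor described in commit 971df5ea8a can be illustrated with a minimal sketch. This is not Ray's implementation: `load_function` is a hypothetical helper, and treating an empty `class_name` as "module-level function" is an assumption made here for illustration. It only shows how a `[module_name, class_name, function_name]` list is enough to locate a Python callable without storing it anywhere first.

```python
import importlib
from typing import Callable, Sequence


def load_function(descriptor: Sequence[str]) -> Callable:
    """Resolve a [module_name, class_name, function_name] descriptor.

    Hypothetical helper, not part of Ray. An empty class_name is assumed
    to mean the function lives at module level.
    """
    module_name, class_name, function_name = descriptor
    module = importlib.import_module(module_name)
    # Look the function up on the class if one is named, else on the module.
    owner = getattr(module, class_name) if class_name else module
    return getattr(owner, function_name)


# Module-level function: math.sqrt
sqrt = load_function(["math", "", "sqrt"])
# Method defined on a class: builtins.str.upper
upper = load_function(["builtins", "str", "upper"])
```

Because the descriptor is plain strings, it can be carried in a task spec and resolved on any worker that has the code deployed, which is the property the commit message relies on for cross-language calls.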

252 commits