* Modifying WORKSPACE file, so that you can make the project used as a thirdparty
* Modifying WORKSPACE file, so that you can make the project used as a thirdparty
* add some files
* modify some repositories
* modify the name of 'ray_deps_build_all'
* Implement Actor checkpointing
* docs
* fix
* fix
* fix
* move restore-from-checkpoint to HandleActorStateTransition
* Revert "move restore-from-checkpoint to HandleActorStateTransition"
This reverts commit 9aa4447c1e3e321f42a1d895d72f17098b72de12.
* resubmit waiting tasks when actor frontier restored
* add doc about num_actor_checkpoints_to_keep=1
* add num_actor_checkpoints_to_keep to Cython
* add checkpoint_expired api
* check if actor class is abstract
* change checkpoint_ids to long string
* implement java
* Refactor to delay actor creation publish until checkpoint is resumed
* debug, lint
* Erase from checkpoints to restore if task fails
* fix lint
* update comments
* avoid duplicated actor notification log
* fix unintended change
* add actor_id to checkpoint_expired
* small java updates
* make checkpoint info per actor
* lint
* Remove logging
* Remove old actor checkpointing Python code, move new checkpointing code to FunctionActionManager
* Replace old actor checkpointing tests
* Fix test and lint
* address comments
* consolidate kill_actor
* Remove __ray_checkpoint__
* fix non-ascii char
* Loosen test checks
* fix java
* fix sphinx-build
* distinct plasma client exception
* Update ObjectStoreProxy.java
* Update and rename PlasmaArrowTest.java to PlasmaStoreTest.java
* store put
* Use testng to replace junit to fix test failure
* added store_client_ to object_manager and node_manager
* half through...
* all code in, and compiling! Nothing tested though...
* something is working ;-)
* added a few more comments
* now, add only one entry to the in GCS for inlined objects
* more comments
* remove a spurious todo
* some comment updates
* add test
* added support for meta data for inline objects
* avoid some copies
* Initialize plasma client in tests
* Better comments. Enable configuring nline_object_max_size_bytes.
* Update src/ray/object_manager/object_manager.cc
Co-Authored-By: istoica <istoica@cs.berkeley.edu>
* Update src/ray/raylet/node_manager.cc
Co-Authored-By: istoica <istoica@cs.berkeley.edu>
* Update src/ray/raylet/node_manager.cc
Co-Authored-By: istoica <istoica@cs.berkeley.edu>
* fiexed comments
* fixed various typos in comments
* updated comments in object_manager.h and object_manager.cc
* addressed all comments...hopefully ;-)
* Only add eviction entries for objects that are not inlined
* fixed a bunch of comments
* Fix test
* Fix object transfer dump test
* lint
* Comments
* Fix test?
* Fix test?
* lint
* fix build
* Fix build
* lint
* Use const ref
* Fixes, don't let object manager hang
* Increase object transfer retry time for travis?
* Fix test
* Fix test?
* Add internal config to java, fix PlasmaFreeTest
## What do these changes do?
Before this PR, if we want to specify some resources, we must do as following codes:
```java
@RayRemote(Resources={ResourceItem("CPU", 10)})
public static void f1() {
// do sth
}
@RayRemote(Resources={ResourceItem("CPU", 10)})
class Demo {
// sth
}
```
Unfortunately, it's no way for us to create another actor or task with different resources required.
After this PR, the thing will be:
```java
ActorCreationOptions option = new ActorCreationOptions();
option.resources.put("CPU", 4.0);
RayActor<Echo> echo1 = Ray.createActor(Echo::new, option);
option.resources.put("Res-A", 4.0);
RayActor<Echo> echo2 = Ray.createActor(Echo::new, option);
//if we don't specify resource, the resources will be `{"cpu":0.0}` by default.
Ray.call(Echo::echo, echo2, 100);
```
## Related issue number
N/A
## What do these changes do?
1. Add a configuration item `driver.resource-path`.
2. Load driver resources from the local path which is specified in the `ray.conf`.
Before this change, we should add all driver resources(like user's jar package, dependencies package and config files) into `classpath`.
After this change, we should add the driver resources into the mount path which we can configure it in `ray.conf`, and we shouldn't configure `classpath` for driver resources any more.
## Related issue number
N/A
If we set `ray.home` configuration item to `""`.
The current `RayConfig` will set it to current work directory, like `/User/My/Ray`.
But the some other configuration items(like `redisServerExecutablePath`) will be set to `/User/My/Ray//build/src/common/thirdparty/redis/src/redis-server` by mistake.
Note: There are 2 `/` between current work directory and `build/src/common....`
This PR will fix this issue.
## What do these changes do?
Previously, Java worker configuration is complicated, because it requires setting environment variables as well as command-line arguments.
This PR aims to simplify Java worker's configuration.
1) Configuration management is now migrated to [lightbend config](https://github.com/lightbend/config), thus doesn't require setting environment variables.
2) Many unused config items are removed.
3) Provide a simple `example.conf` file, so users can get started quickly.
4) All possible options and their default values are declared and documented in `ray.default.conf` file.
This PR also simplifies and refines the following code:
1) The process of `Ray.init()`.
2) `RunManager`.
3) `WorkerContext`.
### How to use this configuration?
1. Copy `example.conf` into your classpath and rename it to `ray.conf`.
2. Modify/add your configuration items. The all items are declared in `ray.default.conf`.
3. You can also set the items in java system prosperities.
Note: configuration is read in this priority:
System properties > `ray.conf` > `ray.default.conf`
## Related issue number
N/A
Update the version in maven from 0.1 to 0.1-SNAPSHOT, because SNAPSHOT is the conventional version name in dev process. Non-snapshot versions are only used for release.
This PR adds a `function_desc` field into task spec. a function descriptor is a list of strings that can uniquely describe a function.
- For a Python function, it should be: [module_name, class_name, function_name]
- For a Java function, it should be: [class_name, method_name, type_descriptor]
There're a couple of purposes to add this field:
In this PR:
- Java worker needs to know function's class name to load it. Previously, since task spec didn't have such a field to hold this info, we did a hack by appending the class name to the argument list. With this change, we fixed that hack and significantly simplified function management in Java.
Will be done in subsequent PRs:
- Support cross-language invocation (#2576): currently Python worker manages functions by saving them in GCS and pass function id in task spec. However, if we want to call a Python function from Java, we cannot save it in GCS and get the function id. But instead, we can pass the function descriptor (module name, class name, function name) in task spec and use it to load the function.
- Support deployment: one major problem of Python worker's current function management mechanism is #2327. In prod env, we should have a mechanism to deploy code and dependencies to the cluster. And when code is already deployed, we don't need to save functions to GCS any more and can use `function_desc` to manage functions.
* Trigger reconstruction in ray.wait and mark worker as blocked.
* Add test.
* Linting.
* Don't run new test with legacy Ray.
* Only call HandleClientUnblocked if it actually blocked in ray.wait.
* Reduce time to ray.wait in the test.