* Implement Actor checkpointing
* docs
* fix
* fix
* fix
* move restore-from-checkpoint to HandleActorStateTransition
* Revert "move restore-from-checkpoint to HandleActorStateTransition"
This reverts commit 9aa4447c1e3e321f42a1d895d72f17098b72de12.
* resubmit waiting tasks when actor frontier restored
* add doc about num_actor_checkpoints_to_keep=1
* add num_actor_checkpoints_to_keep to Cython
* add checkpoint_expired api
* check if actor class is abstract
* change checkpoint_ids to long string
* implement java
* Refactor to delay actor creation publish until checkpoint is resumed
* debug, lint
* Erase from checkpoints to restore if task fails
* fix lint
* update comments
* avoid duplicated actor notification log
* fix unintended change
* add actor_id to checkpoint_expired
* small java updates
* make checkpoint info per actor
* lint
* Remove logging
* Remove old actor checkpointing Python code, move new checkpointing code to FunctionActionManager
* Replace old actor checkpointing tests
* Fix test and lint
* address comments
* consolidate kill_actor
* Remove __ray_checkpoint__
* fix non-ascii char
* Loosen test checks
* fix java
* fix sphinx-build
* distinct plasma client exception
* Update ObjectStoreProxy.java
* Update and rename PlasmaArrowTest.java to PlasmaStoreTest.java
* store put
* Use testng to replace junit to fix test failure
* added store_client_ to object_manager and node_manager
* half through...
* all code in, and compiling! Nothing tested though...
* something is working ;-)
* added a few more comments
* now, add only one entry to the in GCS for inlined objects
* more comments
* remove a spurious todo
* some comment updates
* add test
* added support for meta data for inline objects
* avoid some copies
* Initialize plasma client in tests
* Better comments. Enable configuring nline_object_max_size_bytes.
* Update src/ray/object_manager/object_manager.cc
Co-Authored-By: istoica <istoica@cs.berkeley.edu>
* Update src/ray/raylet/node_manager.cc
Co-Authored-By: istoica <istoica@cs.berkeley.edu>
* Update src/ray/raylet/node_manager.cc
Co-Authored-By: istoica <istoica@cs.berkeley.edu>
* fiexed comments
* fixed various typos in comments
* updated comments in object_manager.h and object_manager.cc
* addressed all comments...hopefully ;-)
* Only add eviction entries for objects that are not inlined
* fixed a bunch of comments
* Fix test
* Fix object transfer dump test
* lint
* Comments
* Fix test?
* Fix test?
* lint
* fix build
* Fix build
* lint
* Use const ref
* Fixes, don't let object manager hang
* Increase object transfer retry time for travis?
* Fix test
* Fix test?
* Add internal config to java, fix PlasmaFreeTest
* Convert UniqueID::nil() to a constructor
* Cleanup actor handle pickling code
* Add new actor handles to the task spec
* Pass in new actor handles
* Add new handles to the actor registration
* Regression test for actor handle forking and GC
* lint and doc
* Handle pickled actor handles in the backend and some refactoring
* Add regression test for dummy object GC and pickled actor handles
* Check for duplicate actor tasks on submission
* Regression test for forking twice, fix failed named actor leak
* Fix bug for forking twice
* lint
* Revert "Fix bug for forking twice"
This reverts commit 3da85e59d401e53606c2e37ffbebcc8653ff27ac.
* Add new actor handles when task is assigned, not finished
* Remove comment
* remove UniqueID()
* Updates
* update
* fix
* fix java
* fixes
* fix
## What do these changes do?
This PR exposes the CL option for using a config parameter. This is important for certain tests (i.e., FT tests that removing nodes) to run quickly.
Note that this is bad practice and should be replaced with GFLAGS or some equivalent as soon as possible.
#3239 depends on this.
TODO:
- [x] Add documentation to method arguments before merging.
- [x] Add test to verify this works?
## Related issue number
<!--
Thank you for your contribution!
Please review https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before opening a pull request.
-->
## What do these changes do?
support raylet, which is started by java runManager, to start python default_worker.py .
So when doing local test of java call python task, it helps auto start python worker.
## Related issue number
<!-- Are there any issues opened that will be resolved by merging this change? -->
<!--
Thank you for your contribution!
Please review https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before opening a pull request.
-->
## What do these changes do?
remove TaskExecutionException, use RayException instead
<!-- Please give a short brief about these changes. -->
## Related issue number
<!-- Are there any issues opened that will be resolved by merging this change? -->
## What do these changes do?
Before this PR, if we want to specify some resources, we must do as following codes:
```java
@RayRemote(Resources={ResourceItem("CPU", 10)})
public static void f1() {
// do sth
}
@RayRemote(Resources={ResourceItem("CPU", 10)})
class Demo {
// sth
}
```
Unfortunately, it's no way for us to create another actor or task with different resources required.
After this PR, the thing will be:
```java
ActorCreationOptions option = new ActorCreationOptions();
option.resources.put("CPU", 4.0);
RayActor<Echo> echo1 = Ray.createActor(Echo::new, option);
option.resources.put("Res-A", 4.0);
RayActor<Echo> echo2 = Ray.createActor(Echo::new, option);
//if we don't specify resource, the resources will be `{"cpu":0.0}` by default.
Ray.call(Echo::echo, echo2, 100);
```
## Related issue number
N/A