1148 commits

Author SHA1 Message Date
Edward Oakes
3aec683f61
Avoid fate sharing with owner for detached actors (#8267) 2020-05-01 11:58:47 -05:00
Edward Oakes
484f68765c
Fix resource_ids_ data race (#8253) 2020-04-30 18:55:54 -05:00
mehrdadn
254b1ec370
Set up testing and wheels for Windows on GitHub Actions (#8131)
* Move some Java tests into ci.sh

* Move C++ worker tests into ci.sh

* Define run()

* Prepare to move Python tests into ci.sh

* Fix issues in install-dependencies.sh

* Reload environment for GitHub Actions

* Move wheels to ci.sh and fix related issues

* Don't bypass failures in install-ray.sh anymore

* Make CI a little quieter

* Move linting into ci.sh

* Add vitals test right after build

* Fix os.uname() unavailability on Windows

Co-authored-by: Mehrdad <noreply@github.com>
2020-04-29 21:19:02 -07:00
Edward Oakes
ebdccde030
Fetch internal config from raylet (#8195) 2020-04-28 13:12:11 -05:00
fangfengbin
deffc340ea
[GCS]Add in-memory gcs table storage (#8184) 2020-04-28 17:19:46 +08:00
mehrdadn
b9de9dadd7
Fix Windows build (#8186)
Co-authored-by: Mehrdad <noreply@github.com>
2020-04-26 13:07:25 -07:00
fangfengbin
5bff707d20
[GCS]Add in-memory store client (#8144) 2020-04-26 19:09:26 +08:00
ZhuSenlin
9255fcd516
[GCS] Add node failure detector (#8119) 2020-04-26 19:08:27 +08:00
fangfengbin
c5d181e3d9
gcs adapts to worker table pub sub (#8182)
Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-04-26 17:58:55 +08:00
fangfengbin
f17bea2de5
Fix get gcs server address block bug (#8126) 2020-04-26 10:01:06 +08:00
ijrsvt
69ff7e3e35
TaskCancellation (#7669)
* Smol comment

* WIP, not passing ray.init

* Fixed small problem

* wip

* Pseudo interrupt things

* Basic prototype operational

* correct proc title

* Mostly done

* Cleanup

* cleaner raylet error

* Cleaning up a few loose ends

* Fixing Race Conds

* Prelim testing

* Fixing comments and adding second_check for kill

* Working_new_impl

* demo_ready

* Fixing my english

* Fixing a few problems

* Small problems

* Cleaning up

* Response to changes

* Fixing error passing

* Merged to master

* fixing lock

* Cleaning up print statements

* Format

* Fixing Unit test build failure

* mock_worker fix

* java_fix

* Canel

* Switching to Cancel

* Responding to Review

* FixFormatting

* Lease cancellation

* Final comments?

* Moving exist check to CoreWorker

* Fix Actor Transport Test

* Fixing task manager test

* changing clock repr

* Fix build

* fix white space

* lint fix

* Updating to medium size

* Fixing Java test compilation issue

* lengthen bad timeouts
2020-04-25 16:04:52 -07:00
fangfengbin
38dfe5db86
remove store client template (#8160)
Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-04-24 21:19:12 +08:00
fangfengbin
713e375d50
[GCS]GCS adapts to job table pub sub (#8145) 2020-04-24 16:33:25 +08:00
Qing Wang
d66d12661b
Improve the perf of constructing actor task specs. (#8093) 2020-04-21 11:54:09 +08:00
Stephanie Wang
eefea4e29c
[core] Post task submission to IO loop (#8090)
* Post to IO loop

* Unused

* Fix build
2020-04-20 19:13:50 -07:00
Stephanie Wang
1323e1753d
[core] When reconstruction is enabled, pin objects created by ray.put() (#8021)
* Unit test and pin ray.put objects until they have no more lineage references

* c++ tests

* lint

* Mark ray.put objects as pinned
2020-04-20 13:09:54 -07:00
ZhuSenlin
3f28a8a229
[GCS] reply to the owner only after the actor has been successfully created. (#8079)
* reply to the owner only after the actor is successfully created.

* reply immediately if the actor is already created

* fix comment

* add test_actor_creation_task provided by @Stephanie Wang

Co-authored-by: senlin.zsl <senlin.zsl@antfin.com>
2020-04-19 09:53:02 -07:00
Edward Oakes
90ef585fd5
Revert "Add ability to specify worker and driver ports (#7833)" (#8069)
This reverts commit 9f751ff8c4.
2020-04-17 12:32:22 -05:00
Eric Liang
55ce2bba10
Record num plasma errs in map (#8034) 2020-04-16 13:16:40 -07:00
Edward Oakes
9f751ff8c4
Add ability to specify worker and driver ports (#7833) 2020-04-16 13:49:25 -05:00
Clark Zinzow
d4cae5f632
[Core] Added ability to specify different IP addresses for a core worker and its raylet. (#7985) 2020-04-16 10:32:24 -05:00
fangfengbin
5a7882bb44
Fix gcs_server get invalid local address (#7842) 2020-04-16 14:58:19 +08:00
mehrdadn
ba00c29b67
Factor out Travis 'install' sections for use with GitHub Actions (#7988) 2020-04-15 08:10:22 -07:00
fangfengbin
efbaf155b2
[GCS]Add publish and subscribe function of gcs table (#7909) 2020-04-15 04:24:52 -07:00
fangfengbin
c17404918c
[GCS]Add gcs table storage interface (#7949) 2020-04-15 10:48:12 +08:00
fangfengbin
026abb119c
fix GrpcServer out-of-bounds bug (#7995)
Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-04-14 10:34:29 +08:00
ZhuSenlin
4a81793ba5
GCS-Based actor management implementation (#6763)
* add gcs actor manager

* fix test_metrics.py

* fix TestTaskInfo

* fix comment

* fix comment

* fix comment

* fix comment

* fix comment

* fix comment

* fix compile error

* fix merge error

Co-authored-by: senlin.zsl <senlin.zsl@antfin.com>
2020-04-13 09:48:48 -07:00
mehrdadn
1b0f6fd558
Check AF_UNIX path length (#7951) 2020-04-13 09:30:01 -07:00
micafan
c222d64ca1
[GCS] Add MessagePublisher to GCS (#7771) 2020-04-13 19:32:28 +08:00
mehrdadn
7c52359b00
Fix Windows build (#7987)
Co-authored-by: Mehrdad <noreply@github.com>
2020-04-12 13:29:48 -07:00
Qing Wang
98bfcd53bc
[Java] Rename group id and package name. (#7864)
* Initial

* Change streaming's

* Fix

* Fix

* Fix org_ray

* Fix cpp file name

* Fix streaming

* Fix

* Fix

* Fix testlistening

* Fix missing sth in python

* Fix

* Fix

* Fix SPI

* Fix

* Fix compilation

* Fix

* Fix CI

* Fix checkstyle

* Fix streaming tests

* Fix streaming CI

* Fix streaming checkstyle.

* Fix build

* Fix bazel dep

* Fix

* Fix ray checkstyle

* Fix streaming checkstyle

* Fix bazel checkstyle
2020-04-12 17:59:34 +08:00
mehrdadn
07002825aa
Proper command-line parsing (#7603)
* Command-line parsing functions

* Work around bug in MSVCRT for passing command-lines to programs

* Polishing

* Fix std::regex_replace() overload compatibility issue with GCC 4.8.x

* Try to work around linker error

* Implement ScanToken()

* Parse command-lines via ScanToken

* Merge src/ray/util.cc and src/ray/url.cc

Co-authored-by: Mehrdad <noreply@github.com>
2020-04-11 23:07:07 -07:00
Stephanie Wang
d7eef808b8
[core] Reconstruction for lost plasma objects (#7733)
* Add a lineage_ref_count to References

* Refactor TaskManager to store TaskEntry as a struct

* Refactor to fix deadlock between TaskManager and ReferenceCounter
Add references to task specs

* Pin TaskEntries and References in the lineage of any ObjectIDs in scope

* Fix deadlock, convert num_plasma_returns to a set of object IDs

* fix unit tests

* Feature flag

* Do not release lineage for objects that were promoted to plasma

* fix build

* fix build

* Remove num executions

* Remove num executions

* Add pinned locations to ReferenceCounter, empty handler for node death

* Fix num returns for actor tasks, fix Put return value

* Add regression test

* Clear pinned locations and callbacks on node removal

* Clear pinned locations and callbacks on node removal

* Simplify num return values

* Remove unused

* doc

* tmp

* Set num returns

* Move lineage pinning flag to ReferenceCounter

* comments

* Recover from plasma failures by pinning a new copy

* Basic object reconstruction, no concurrent reqs yet

* reconstruction test suite and a few fixes:
- fix for disabling lineage
- fix for updating submitted task refs

* Handle concurrent attempts to recover the same object

* Fix deadlock in DrainAndShutdown

* Revert "[core] Revert lineage pinning (#7499) (#7692)"

This reverts commit ba86a02b37.

* debug rllib

* debug rllib

* turn on all rllib tests again

* debug rllib

* Fix drain bug, check number of pending tasks

* revert rllib debug

* remove todo

* Trigger rllib tests

* revert rllib debug commit

* Split out logic into ObjectRecoveryManager

* Fix python tests

* Refactor to remove dependency on gcs client

* Unit tests

* Move pinned at node ID to direct memory store

* Unit test fixes and lint

* simplify and more tests

* Add ResubmitTask test for TaskManager

* Doc

* fix build

* comments

* Fix

* debug

* Update

* fix

* Fix

* Fix bad status handling, unit test

* Fix build
2020-04-11 16:52:57 -07:00
Stephanie Wang
18e9a076e5
[core] Cancel worker lease requests that are no longer needed (#7929)
* regression test

* Cancel lease requests

* unit tests

* update

* fix build

* Move unit test

* Set success

* Ref to shared_ptr

* debug

* Revert "debug"

This reverts commit 6b2c25805a8223b41ffcc2d88d903e16ea415089.

* Bad move

* Fix bad status handling
2020-04-11 16:51:32 -07:00
fangfengbin
061043229f
[GCS]Optimize gcs client testcases (#7895) 2020-04-09 12:30:58 +08:00
Kai Yang
48b48cc8c2
Support multiple core workers in one process (#7623) 2020-04-07 11:01:47 +08:00
micafan
e91595f955
[GCS] Add ObjectLocator to gcs server (#7557) 2020-04-07 10:37:24 +08:00
Ion
9f6cbf168e
New scheduler local node (#7899) 2020-04-06 14:43:42 -05:00
mehrdadn
203c077895
Switch to Boost generic sockets (#7656)
* Use generic Boost sockets

* Un-templatize server/client connections

Co-authored-by: Mehrdad <noreply@github.com>
2020-04-05 22:26:46 -07:00
micafan
185d591108
No need to send actor died signal from RedisActorInfoAccessor (#7883) 2020-04-03 17:45:39 -07:00
ijrsvt
9bfc2c4b54
Moving Local Mode to C++ (#7670) 2020-04-01 15:50:57 -05:00
micafan
780c1c3b08
[GCS] impl RedisStoreClient for GCS Service (#7675) 2020-04-01 21:18:19 +08:00
fangfengbin
bfb9248532
fix gcs server resolver error (#7822)
Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-03-30 22:57:55 -07:00
mehrdadn
8958728139
Windows bug fixes (#7740) 2020-03-30 20:39:23 -05:00
Simon Mo
dc9b62e007
Deserialize Args in Event Loop Thread (#7806) 2020-03-30 18:28:13 -07:00
mehrdadn
f86e623095
Fix & improve GitHub Actions CI builds (#7784) 2020-03-30 16:29:54 -07:00
mehrdadn
fc23f79f82
Windows process issues (#7739) 2020-03-29 12:48:32 -07:00
fangfengbin
6ce8b63bb6
fix TestTaskLeaseRenewal test failure (#7765) 2020-03-29 11:18:47 +08:00
Kai Yang
6a3503c494
Fix reusing the cached hash of nil ID (#7753) 2020-03-27 23:40:03 +08:00
SongGuyang
c195dc8f88
Basic C++ worker implementation (#6125) 2020-03-27 23:01:08 +08:00