Commit graph

1419 commits

Author SHA1 Message Date
mehrdadn
7c52359b00
Fix Windows build (#7987)
Co-authored-by: Mehrdad <noreply@github.com>
2020-04-12 13:29:48 -07:00
Qing Wang
98bfcd53bc
[Java] Rename group id and package name. (#7864)
* Initial

* Change streaming's

* Fix

* Fix

* Fix org_ray

* Fix cpp file name

* Fix streaming

* Fix

* Fix

* Fix testlistening

* Fix missing something in python

* Fix

* Fix

* Fix SPI

* Fix

* Fix compilation

* Fix

* Fix CI

* Fix checkstyle

Fix checkstyle

* Fix streaming tests

* Fix streaming CI

* Fix streaming checkstyle.

* Fix build

* Fix bazel dep

* Fix

* Fix ray checkstyle

* Fix streaming checkstyle

* Fix bazel checkstyle
2020-04-12 17:59:34 +08:00
mehrdadn
07002825aa
Proper command-line parsing (#7603)
* Command-line parsing functions

* Work around bug in MSVCRT for passing command-lines to programs

* Polishing

* Fix std::regex_replace() overload compatibility issue with GCC 4.8.x

* Try to work around linker error

* Implement ScanToken()

* Parse command-lines via ScanToken

* Merge src/ray/util.cc and src/ray/url.cc

Co-authored-by: Mehrdad <noreply@github.com>
2020-04-11 23:07:07 -07:00
Stephanie Wang
d7eef808b8
[core] Reconstruction for lost plasma objects (#7733)
* Add a lineage_ref_count to References

* Refactor TaskManager to store TaskEntry as a struct

* Refactor to fix deadlock between TaskManager and ReferenceCounter
Add references to task specs

* Pin TaskEntries and References in the lineage of any ObjectIDs in scope

* Fix deadlock, convert num_plasma_returns to a set of object IDs

* fix unit tests

* Feature flag

* Do not release lineage for objects that were promoted to plasma

* fix build

* fix build

* Remove num executions

* Remove num executions

* Add pinned locations to ReferenceCounter, empty handler for node death

* Fix num returns for actor tasks, fix Put return value

* Add regression test

* Clear pinned locations and callbacks on node removal

* Clear pinned locations and callbacks on node removal

* Simplify num return values

* Remove unused

* doc

* tmp

* Set num returns

* Move lineage pinning flag to ReferenceCounter

* comments

* Recover from plasma failures by pinning a new copy

* Basic object reconstruction, no concurrent reqs yet

* reconstruction test suite and a few fixes:
- fix for disabling lineage
- fix for updating submitted task refs

* Handle concurrent attempts to recover the same object

* Fix deadlock in DrainAndShutdown

* Revert "[core] Revert lineage pinning (#7499) (#7692)"

This reverts commit ba86a02b37.

* debug rllib

* debug rllib

* turn on all rllib tests again

* debug rllib

* Fix drain bug, check number of pending tasks

* revert rllib debug

* remove todo

* Trigger rllib tests

* revert rllib debug commit

* Split out logic into ObjectRecoveryManager

* Fix python tests

* Refactor to remove dependency on gcs client

* Unit tests

* Move pinned at node ID to direct memory store

* Unit test fixes and lint

* simplify and more tests

* Add ResubmitTask test for TaskManager

* Doc

* fix build

* comments

* Fix

* debug

* Update

* fix

* Fix

* Fix bad status handling, unit test

* Fix build
2020-04-11 16:52:57 -07:00
Stephanie Wang
18e9a076e5
[core] Cancel worker lease requests that are no longer needed (#7929)
* regression test

* Cancel lease requests

* unit tests

* update

* fix build

* Move unit test

* Set success

* Ref to shared_ptr

* debug

* Revert "debug"

This reverts commit 6b2c25805a8223b41ffcc2d88d903e16ea415089.

* Bad move

* Fix bad status handling
2020-04-11 16:51:32 -07:00
fangfengbin
061043229f
[GCS]Optimize gcs client testcases (#7895) 2020-04-09 12:30:58 +08:00
Kai Yang
48b48cc8c2
Support multiple core workers in one process (#7623) 2020-04-07 11:01:47 +08:00
micafan
e91595f955
[GCS] Add ObjectLocator to gcs server (#7557) 2020-04-07 10:37:24 +08:00
Ion
9f6cbf168e
New scheduler local node (#7899) 2020-04-06 14:43:42 -05:00
mehrdadn
203c077895
Switch to Boost generic sockets (#7656)
* Use generic Boost sockets

* Un-templatize server/client connections

Co-authored-by: Mehrdad <noreply@github.com>
2020-04-05 22:26:46 -07:00
micafan
185d591108
No need to send actor died signal from RedisActorInfoAccessor (#7883) 2020-04-03 17:45:39 -07:00
ijrsvt
9bfc2c4b54
Moving Local Mode to C++ (#7670) 2020-04-01 15:50:57 -05:00
micafan
780c1c3b08
[GCS] impl RedisStoreClient for GCS Service (#7675) 2020-04-01 21:18:19 +08:00
fangfengbin
bfb9248532
fix gcs server resolver error (#7822)
Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-03-30 22:57:55 -07:00
mehrdadn
8958728139
Windows bug fixes (#7740) 2020-03-30 20:39:23 -05:00
Simon Mo
dc9b62e007
Deserialize Args in Event Loop Thread (#7806) 2020-03-30 18:28:13 -07:00
mehrdadn
f86e623095
Fix & improve GitHub Actions CI builds (#7784) 2020-03-30 16:29:54 -07:00
mehrdadn
fc23f79f82
Windows process issues (#7739) 2020-03-29 12:48:32 -07:00
fangfengbin
6ce8b63bb6
fix TestTaskLeaseRenewal test failure (#7765) 2020-03-29 11:18:47 +08:00
Kai Yang
6a3503c494
Fix reusing the cached hash of nil ID (#7753) 2020-03-27 23:40:03 +08:00
SongGuyang
c195dc8f88
Basic C++ worker implementation (#6125) 2020-03-27 23:01:08 +08:00
fangfengbin
e196fcdbaf
Add gcs_service_enabled function to avoid getting environment variable directly (#7742) 2020-03-26 22:02:53 +08:00
Eric Liang
23b6fdcda1
ray memory should collect statistics from all nodes (#7721) 2020-03-25 16:31:31 -07:00
Stephanie Wang
46404d8a0b
[core] Pin lineage of plasma objects that are still in scope (#7690)
* Fix deadlock in DrainAndShutdown

* Revert "[core] Revert lineage pinning (#7499) (#7692)"

This reverts commit ba86a02b37.

* debug rllib

* debug rllib

* turn on all rllib tests again

* debug rllib

* Fix drain bug, check number of pending tasks

* revert rllib debug

* remove todo

* Trigger rllib tests

* revert rllib debug commit
2020-03-25 09:29:32 -07:00
Stephanie Wang
a1cee6af7b
Revert "New scheduler local node (#7441)" (#7732)
This reverts commit 6141fdab95.
2020-03-24 18:32:16 -07:00
Ion
6141fdab95
New scheduler local node (#7441) 2020-03-24 13:59:50 -05:00
fangfengbin
bf866de6fd
Enable GCS Service by default (#7541) 2020-03-24 14:20:23 +08:00
mehrdadn
b4030cdbbe
File HANDLE/descriptor translation layer for Windows (#7657)
* Use TCP sockets on Windows with custom HANDLE <-> FD translation layer

* Get Plasma working on Windows

Co-authored-by: Mehrdad <noreply@github.com>
2020-03-23 21:08:25 -07:00
Edward Oakes
9318b29f5e
Remove is_direct logic from the raylet (#7698) 2020-03-23 17:09:35 -05:00
Stephanie Wang
7f38cc1d03
Debug statements and increase timeout for test array (#7713) 2020-03-23 13:02:14 -07:00
ZhuSenlin
74825db804
Fix TestGcsRedisFailureDetector (#7710)
* fix test_gcs_redis_failure_detector

* fix test_gcs_redis_failure_detector

Co-authored-by: senlin.zsl <senlin.zsl@antfin.com>
2020-03-23 22:48:53 +08:00
ZhuSenlin
039961b63a
rename ActorTable to LogBasedActorTable and add new ActorTable (#7643) 2020-03-23 15:05:43 +08:00
Edward Oakes
8b4f5a9431
Remove non-direct-call code from core worker (#7625) 2020-03-22 19:20:08 -05:00
Stephanie Wang
ba86a02b37
[core] Revert lineage pinning (#7499) (#7692)
* Revert "fix (#7681)"

This reverts commit 6a12a31b2e.

* Revert "[core] Pin lineage of plasma objects that are still in scope (#7499)"

This reverts commit 014929e658.
2020-03-21 18:35:43 -07:00
Zhijun Fu
a7a5d172b1
[core] fix bug that actor tasks from reconstructed actor are ignored by scheduling queue (#7637) 2020-03-21 13:05:24 +08:00
Stephanie Wang
6a12a31b2e
fix (#7681) 2020-03-20 18:53:28 -07:00
Stephanie Wang
014929e658
[core] Pin lineage of plasma objects that are still in scope (#7499)
* Add a lineage_ref_count to References

* Refactor TaskManager to store TaskEntry as a struct

* Refactor to fix deadlock between TaskManager and ReferenceCounter
Add references to task specs

* Pin TaskEntries and References in the lineage of any ObjectIDs in scope

* Fix deadlock, convert num_plasma_returns to a set of object IDs

* fix unit tests

* Feature flag

* Do not release lineage for objects that were promoted to plasma

* fix build

* fix build

* Remove num executions

* Simplify num return values

* Remove unused

* doc

* Set num returns

* Move lineage pinning flag to ReferenceCounter

* comments

* Fixes

* Remove irrelevant test (replaced by ref counting tests)
2020-03-20 10:56:43 -07:00
mehrdadn
e69664b74b
Miscellaneous Windows compatibility bugfixes (#7658)
* Windows compatibility bug fixes

* Use WSASend/WSARecv as WSASendMsg/WSARecvMsg do not work with TCP sockets

* Clean up some TODOs

* Fix duplicate compilations

* RedisAsioClient boost::asio::error::connection_reset

Co-authored-by: Mehrdad <noreply@github.com>
2020-03-19 19:32:53 -07:00
Stephanie Wang
c7cae036c3
[core] Only drain references for non-actor workers on shutdown (#7668)
* Only drain ref counter for non-actor tasks

* Don't force kill actors that have gone out of scope
2020-03-19 18:46:16 -07:00
fangfengbin
0d0a41f598
[GCS]Tie lifecycle of gcs service and redis together (#7601) 2020-03-19 19:52:35 +08:00
Stephanie Wang
b499100a88
Enable distributed ref counting by default (#7628)
* enable

* Turn on eager eviction

* Shorten tests and drain ReferenceCounter

* Don't force kill actor handles that have gone out of scope, lint

* Fix locks

* Cleanup Plasma Async Callback (#7452)

* [rllib][tune] fix some nans (#7611)

* Change /tmp to platform-specific temporary directory (#7529)

* [Serve] UI Improvements (#7569)

* bugfix about test_dynres.py (#7615)

Co-authored-by: senlin.zsl <senlin.zsl@antfin.com>

* Java call Python actor method use actor.call (#7614)

* bug fix about usage of absl::flat_hash_map::erase and absl::flat_hash_set::erase (#7633)

Co-authored-by: senlin.zsl <senlin.zsl@antfin.com>

* [Java] Make both `RayActor` and `RayPyActor` inheriting from `BaseActor` (#7462)

* [Java] Fix the issue that the cached value in `RayObject` is serialized (#7613)

* Add failure tests to test_reference_counting (#7400)

* Fix typo in asyncio documentation (#7602)

* Fix segfault

* debug

* Force kill actor

* Fix test
2020-03-18 22:39:21 -07:00
Stephanie Wang
35a4bfc885
[core] Fix leak for subscribing to object dependencies in NodeManager (#7630)
* Fix GetDependencies

* lint
2020-03-18 11:01:29 -07:00
Eric Liang
745b9d643d
First pass at ray memory command for memory debugging (#7589) 2020-03-17 20:45:07 -07:00
Edward Oakes
c1b0f9ccdf
Add failure tests to test_reference_counting (#7400) 2020-03-17 10:30:21 -05:00
ZhuSenlin
dfa5d9b8e9
bug fix about usage of absl::flat_hash_map::erase and absl::flat_hash_set::erase (#7633)
Co-authored-by: senlin.zsl <senlin.zsl@antfin.com>
2020-03-17 19:39:56 +08:00
ZhuSenlin
ffa9df4683
bugfix about test_dynres.py (#7615)
Co-authored-by: senlin.zsl <senlin.zsl@antfin.com>
2020-03-17 13:58:44 +08:00
mehrdadn
a0700e2f86
Change /tmp to platform-specific temporary directory (#7529) 2020-03-16 18:10:14 -07:00
fangfengbin
6b37be9677
[GCS]Add job id when operating gcs table (#7592) 2020-03-15 12:04:04 +08:00
Kai Yang
630e48967d
[Java] Allow passing internal config from raylet to Java worker (#7532) 2020-03-15 12:03:38 +08:00
mehrdadn
a87199d240
Fix cyclic dependency between ray/util and ray/common (#7581)
* Fix cyclic dependency

Headers in ray/util should not depend on those in ray/common

* Move random generations to ray/common/test_util.h

* Add license header

Co-authored-by: Mehrdad <noreply@github.com>
Co-authored-by: Philipp Moritz <pcmoritz@gmail.com>
2020-03-14 12:44:53 -07:00