Commit graph

63 commits

Author SHA1 Message Date
Lixin Wei
2b95e71dac
[Streaming] Test build fixed (#10617) 2020-09-08 14:31:54 +08:00
kisuke95
b7003839bd
[Core] Use core worker options to initialize (#10467)
* fix

* fix

* .
2020-09-07 16:36:43 -07:00
Lixin Wei
f31ee84bfd
[Streaming] Fault Tolerance Implementation (#10595) 2020-09-05 16:40:47 +08:00
SangBin Cho
cb919c5e5c
Revert "[Streaming] Fault Tolerance Implementation (#10008)" (#10582)
This reverts commit 1b1466748f.
2020-09-04 13:21:18 -07:00
Lixin Wei
1b1466748f
[Streaming] Fault Tolerance Implementation (#10008) 2020-09-04 20:44:34 +08:00
Clark Zinzow
0c0b0d0a73
[Core] Added support for submission-time task names. (#10449)
* Added support for submission-time task names.

* Suggestions from code review: add missing consts

Co-authored-by: SangBin Cho <rkooo567@gmail.com>

* Add num_returns arg to actor method options docstring example.

* Add process name line and proctitle assertion to submission-time task name section of advanced docs.

* Add submission-time task name --> proctitle test for Python worker.

* Added Python actor options tests for num_returns and name.

* Added Java test for submission-time task names.

* Add dashboard image to task name docs section.

* Move to fstrings.

Co-authored-by: SangBin Cho <rkooo567@gmail.com>
2020-09-03 11:45:24 -07:00
kisuke95
24a7a8a04d
[Streaming] Build fix (#10233) 2020-08-25 11:37:21 -07:00
Ian Rodney
b14c56e599
fix lint (#10315) 2020-08-25 10:07:20 -07:00
wanxing
e816e3aefb
[Streaming]Streaming queue support failover (#8161) 2020-08-25 14:19:45 +08:00
Barak Michener
8e76796fd0
ci: Redo format.sh --all script & backfill lint fixes (#9956) 2020-08-07 16:49:49 -07:00
SangBin Cho
ec2f1a225e
[Stats] Metrics Export User Interface Part 1 (#9913)
* Metrics export port expose done.

* Support exposing metrics port + metrics agent service discovery through ray.nodes()

* Formatting.

* Added a doc.

* Linting.

* Change the location of metrics agent port.

* Addressed code review.

* Addressed code review.
2020-08-06 16:16:29 -07:00
Siyuan (Ryans) Zhuang
54a0d8b69e
[Core] Try remove all windows compat shims (#9671)
* try remove compat for arrow

* remove unistd.h

* remove socket compat

* delete arrow windows patch
2020-07-25 12:00:36 -07:00
Lingxuan Zuo
58a38e81d1
use help proto-init-macro for streaming config (#9272) 2020-07-24 17:59:33 +08:00
mehrdadn
b14728d999
Shellcheck quoting (#9596)
* Fix SC2006: Use $(...) notation instead of legacy backticked `...`.

* Fix SC2016: Expressions don't expand in single quotes, use double quotes for that.

* Fix SC2046: Quote this to prevent word splitting.

* Fix SC2053: Quote the right-hand side of == in [[ ]] to prevent glob matching.

* Fix SC2068: Double quote array expansions to avoid re-splitting elements.

* Fix SC2086: Double quote to prevent globbing and word splitting.

* Fix SC2102: Ranges can only match single chars (mentioned due to duplicates).

* Fix SC2140: Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A\"B\"C"?

* Fix SC2145: Argument mixes string and array. Use * or separate argument.

* Fix SC2209: warning: Use var=$(command) to assign output (or quote to assign string).

Co-authored-by: Mehrdad <noreply@github.com>
2020-07-21 21:56:41 -05:00
mehrdadn
2554a1a997
Bazel fixes (#9519) 2020-07-19 12:53:08 -07:00
mehrdadn
37942ea1e7
Windows cleanup (#9508)
* Remove unneeded code for Windows

* Get rid of usleep()

* Make platform_shims includes non-transitive

Co-authored-by: Mehrdad <noreply@github.com>
2020-07-17 02:08:15 -07:00
Stephanie Wang
b42d6a1ddc
[core] Refactor task arguments and attach owner address (#9152)
* Add intended worker ID to GetObjectStatus, tests

* Remove TaskID owner_id

* lint

* Add owner address to task args

* Make TaskArg a virtual class, remove multi args

* Set owner address for task args

* merge

* Fix tests

* Fix

* build

* update

* build

* java

* Move code

* build

* Revert "Fix Google log directory again (#9063)"

This reverts commit 275da2e400.

* Fix free

* x

* build

* Fix java

* Revert "Revert "Fix Google log directory again (#9063)""

This reverts commit 4a326fcb148ca09a35bc7de11d89df10edbb56e7.

* lint
2020-07-06 21:25:14 -07:00
Lixin Wei
aea3d53545
[Streaming] Supports multiple downstream collector (#9240) 2020-07-03 11:05:07 +08:00
mehrdadn
7135cb2aec
Fix .exe file extensions (#9197)
Co-authored-by: Mehrdad <noreply@github.com>
2020-07-02 15:29:34 -05:00
mehrdadn
29acf272b7
Build with Visual C++ (#9190)
Co-authored-by: Mehrdad <noreply@github.com>
Co-authored-by: Simon Mo <xmo@berkeley.edu>
2020-07-02 09:34:24 -07:00
Simon Mo
b6d425526d
Move actor task submission to io service (#9093) 2020-06-23 10:07:33 -07:00
Lingxuan Zuo
a734f77757
[Streaming] fix cv hang in multithread variables race (#8984) 2020-06-19 12:16:40 +08:00
chaokunyang
5edddf6eac
[Streaming] operator chain (#8910) 2020-06-18 15:11:07 +08:00
Siyuan (Ryans) Zhuang
b68fede30b
Convert include guard to pragma once (#8957) 2020-06-16 01:29:43 -07:00
mehrdadn
101c215125
Get more tests running on Windows (#6537)
* Get rid of system() calls

* Work around '/usr/share/mini' showing up on GitHub Actions (probably due to psutil truncation)

https://github.com/ray-project/ray/runs/722480047?check_suite_focus=true

* Don't check for socket max path length on Windows

* Don't check for socket existence on Windows

* Fix race condition in Windows fate-sharing

* Work around missing .exe extension for Redis tests

* Add more tests to GitHub Actions

Co-authored-by: Mehrdad <noreply@github.com>
2020-06-12 21:32:10 -07:00
Tianyi Chen
ec5ecb661f
[Streaming] Implement streaming job-worker. (#8780) 2020-06-10 14:13:55 +08:00
chaokunyang
d04953ab3c
[Streaming] Union api (#8612) 2020-06-08 14:28:11 +08:00
Siyuan (Ryans) Zhuang
ea05ebe89e
Ship plasma store with Ray (#7901) 2020-06-03 17:44:34 -07:00
Tao Wang
a1298686d7
[TEST]Use manager class to start/stop components instead of spreading duplicated codes everywhere (#8500) 2020-05-27 16:51:51 +08:00
Kai Yang
2e5e789294
Allow enabling logging in core worker with empty log_dir (#8529) 2020-05-22 18:02:37 +08:00
Stephanie Wang
bd169749e0
Option to retry failed actor tasks (#8330)
* Python

* Consolidate state in the direct actor transport, set the caller starts at

* todo

* Remove unused

* Update and unit tests

* Doc

* Remove unused

* doc

* Remove debug

* Update src/ray/core_worker/transport/direct_actor_transport.h

Co-authored-by: Eric Liang <ekhliang@gmail.com>

* Update src/ray/core_worker/transport/direct_actor_transport.cc

Co-authored-by: Eric Liang <ekhliang@gmail.com>

* lint and fix build

* Update

* Fix build

* Fix tests

* Unit test for max_task_retries=0

* Fix java?

* Fix bad test

* Cross language fix

* fix java

Co-authored-by: Eric Liang <ekhliang@gmail.com>
2020-05-15 20:15:15 -07:00
Max Fitton
00325eb2b2
Rename max_reconstructions to max_restarts and use -1 for infinite (#8274)
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
2020-05-14 10:30:29 -05:00
Edward Oakes
2677b71003
Implement named actors using the GCS service (#8328) 2020-05-09 08:58:10 -05:00
mehrdadn
254b1ec370
Set up testing and wheels for Windows on GitHub Actions (#8131)
* Move some Java tests into ci.sh

* Move C++ worker tests into ci.sh

* Define run()

* Prepare to move Python tests into ci.sh

* Fix issues in install-dependencies.sh

* Reload environment for GitHub Actions

* Move wheels to ci.sh and fix related issues

* Don't bypass failures in install-ray.sh anymore

* Make CI a little quieter

* Move linting into ci.sh

* Add vitals test right after build

* Fix os.uname() unavailability on Windows

Co-authored-by: Mehrdad <noreply@github.com>
2020-04-29 21:19:02 -07:00
chaokunyang
91f630f709
[Streaming] Streaming Cross-Lang API (#7464) 2020-04-29 13:42:08 +08:00
chaokunyang
5cf49d5edd
Fix streaming ci (#8159) 2020-04-26 20:56:58 +08:00
ijrsvt
69ff7e3e35
TaskCancellation (#7669)
* Smol comment

* WIP, not passing ray.init

* Fixed small problem

* wip

* Pseudo interrupt things

* Basic prototype operational

* correct proc title

* Mostly done

* Cleanup

* cleaner raylet error

* Cleaning up a few loose ends

* Fixing Race Conds

* Prelim testing

* Fixing comments and adding second_check for kill

* Working_new_impl

* demo_ready

* Fixing my english

* Fixing a few problems

* Small problems

* Cleaning up

* Response to changes

* Fixing error passing

* Merged to master

* fixing lock

* Cleaning up print statements

* Format

* Fixing Unit test build failure

* mock_worker fix

* java_fix

* Canel

* Switching to Cancel

* Responding to Review

* FixFormatting

* Lease cancellation

* Final comments?

* Moving exist check to CoreWorker

* Fix Actor Transport Test

* Fixing task manager test

* changing clock repr

* Fix build

* fix white space

* lint fix

* Updating to medium size

* Fixing Java test compilation issue

* lengthen bad timeouts
2020-04-25 16:04:52 -07:00
Clark Zinzow
d4cae5f632
[Core] Added ability to specify different IP addresses for a core worker and its raylet. (#7985) 2020-04-16 10:32:24 -05:00
wanxing
9345d03ffb
[Streaming] Streaming data transfer supports cross language. (#7961)
* add init parameters for java

* fix bug

* cython

* fix compile

* fix test_direct_tranfer

* comment

* ChannelCreationParameter

* fix comment

* builder

* lint and fix tests

* fix single process test

* fix checkstyle and lint

* checkstyle

* lint python

Co-authored-by: wanxing <wanxing@B-458DMD6M-1753.local>
2020-04-16 15:16:48 +08:00
Qing Wang
98bfcd53bc
[Java] Rename group id and package name. (#7864)
* Initial

* Change streaming's

* Fix

* Fix

* Fix org_ray

* Fix cpp file name

* Fix streaming

* Fix

* Fix

* Fix testlistening

* Fix missing sth in python

* Fix

* Fix

* Fix SPI

* Fix

* Fix compilation

* Fix

* Fix CI

* Fix checkstyle

Fix checkstyle

* Fix streaming tests

* Fix streaming CI

* Fix streaming checkstyle.

* Fix build

* Fix bazel dep

* Fix

* Fix ray checkstyle

* Fix streaming checkstyle

* Fix bazel checkstyle
2020-04-12 17:59:34 +08:00
Lingxuan Zuo
0d713e3eba
[Streaming] Try to trigger mock transfer tests ci (#7885)
* try to trigger mock transfer tests ci

* execute transfer tests

* show all logs when bazel test streaming

* temporary repeated ci runs

* Revert "temporary repeated ci runs"

This reverts commit dc77d2f9f79b5fa7b490221a8e9089e6349e067d.
2020-04-10 11:56:59 +08:00
fangfengbin
061043229f
[GCS]Optimize gcs client testcases (#7895) 2020-04-09 12:30:58 +08:00
Kai Yang
48b48cc8c2
Support multiple core workers in one process (#7623) 2020-04-07 11:01:47 +08:00
Edward Oakes
8b4f5a9431
Remove non-direct-call code from core worker (#7625) 2020-03-22 19:20:08 -05:00
mehrdadn
a0700e2f86
Change /tmp to platform-specific temporary directory (#7529) 2020-03-16 18:10:14 -07:00
mehrdadn
a87199d240
Fix cyclic dependency between ray/util and ray/common (#7581)
* Fix cyclic dependency

Headers in ray/util should not depend on those in ray/common

* Move random generations to ray/common/test_util.h

* Add license header

Co-authored-by: Mehrdad <noreply@github.com>
Co-authored-by: Philipp Moritz <pcmoritz@gmail.com>
2020-03-14 12:44:53 -07:00
fangfengbin
428fb79b27
Fix streaming compile bug (#7577)
Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-03-12 17:26:45 +08:00
chaokunyang
8b6784de06
[Streaming] Streaming Python API (#6755) 2020-02-25 10:33:33 +08:00
Lingxuan Zuo
f995099e00
[Streaming] Support streaming flow control (#7152)
* streaming writer use event driven model.

* add RefreshChannelInfo

* fix name

* minor changes according to reviewer comments

* Fix according to reviewer's comments

* fix bazel lint

* code polished

* Add more comments

* rename Stop & Start of EventQueue to Freeze and Unfreeze.

* add override

* fix

* fix return value

* support flow control

* add flow control ut in mock transfer

* minor changes according to comments

* add java and python worker adaption

Co-authored-by: wanxing <wanxing.wwx@alibaba-inc.com>
2020-02-24 23:48:04 +08:00
Stephanie Wang
f76ce836b2
Distributed ref counting for serialized ObjectIDs (#6945)
* Skeleton plus a unit test for simple borrower case

* First unit test passes - forward an ID and task returns with 1 submitted task pending on the inner ID

* Invariant for contained_in

* Unit test passes for testing task return without creating a borrower

* Wrap ref count functionality in test case

* Fix bad delete

* Unit test and fix for borrowers creating more borrowers

* Unit test and fix for simple borrowing, but owner sends call after borrower's ref count goes to 0

* Refactor:
- keep a sentinel ref count for task argument IDs
- keep contained_in_borrowed in addition to contained_in_owned

* Unit test for nested IDs passes

* Refactor so that an object ID can only be contained in 1 borrowed ID at a time

* Add check

* Fix

* Unit test (passes) to test nesting object IDs but no borrowers created

* Unit test for nested objects from different owners passes, refactor to unset contained_in when popping refs

* Unit tests for borrowers receiving an ObjectID from multiple sources,
skip adding ownership info if we already have it to handle duplicate
refs

* Unit test for returning object ID passes

* More unit tests for returning object IDs pass

* Add serialized ID tests

* fix serialization issue

* remove swap

* It builds!

* debugging and some fixes:
- register handler for WaitForRefRemoved
- don't create a python reference for arg IDs
- pass in client factory into ReferenceCounter
- fix bad decrement in PopBorrowerRefs

* Fix accounting for serialized IDs:
- don't decrement for IDs on dependency resolution, wait until task finished
- add object IDs that were inlined when building the arguments to the task spec, pin these on the task executor until task finishes

* mu_ -> mutex_

* lint

* fix build

* clear outer_object_id

* add direct call type check

* Fix test for direct call IDs and return IDs for actor calls

* Fix CoreWorkerClient.Addr()

* Remove unneeded lock

* Remove unnecessary ObjectID refs

* Fix worker holding serialized refs test

* Fix hex IDs

* fix

* fix tests

* fix tests

* refactor and cleanups

* lint

* Put inlined Ids in task args and some cleanup

* Add back gc.collect() line for test case

* Refactor and fixes:
- store inlined IDs in RayObject
- allow storing objects with inlined IDs in memory store
- pin objects that were promoted to plasma

* oops

* make sure worker ID is set in address, pass in rpc::Address to CoreWorkerClient

* todos

* cleanups and test builds

* Fix tests

* Add feature flag

* cleanups

* address comments and some cleanups

* cleanup

* fix recursive test

* Comments for tests

* Turn off ref counting by default

* Skip tests

* Fix some bugs for test_array.py, java build

* Don't include nested objects in the ref count when the feature flag is off

* C++ feature flag does not work...

* Remove

* Turn on python tests and add a warning when plasma objects are evicted before being pinned

* Fix build and remove irrelevant test

* Fix for java

* Revert "Fix build and remove irrelevant test"

This reverts commit 056cca9b263ed05b0f9ab2250907338edcbca2d5.

* Fix ray.internal.free

* Fixes and skip some flaky tests

* fix java build

* fix windows build

* Add IDs contained in owned objects

* Update src/ray/protobuf/core_worker.proto

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

* Update src/ray/core_worker/reference_count.cc

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

* Update src/ray/protobuf/core_worker.proto

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

* Update src/ray/protobuf/core_worker.proto

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

* Update src/ray/core_worker/reference_count.h

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

* Update src/ray/core_worker/reference_count.h

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

* Update src/ray/core_worker/reference_count.cc

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

* Apply suggestions from code review

Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>

* update

* Try to fix ::test_direct_call_serialized_id_eviction

Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
2020-02-18 18:21:34 -08:00