* Init commit for async plasma client
* Create an eventloop model for ray/plasma
* Implement a poll-like selector based on `ray.wait`. Huge improvements.
* Allow choosing workers & selectors
* remove original design
* initial implementation of epoll-like selector for plasma
* Add a param for `worker` used in `PlasmaSelectorEventLoop`
* Allow accepting a `Future` which returns object_id
* Do not need `io.py` anymore
* Create a basic testing model
* fix: `ray.wait` returns tuple of lists
* fix a few bugs
* improving performance & bug fixing
* add test
* several improvements & fixing
* fix relative import
* [async] change code format, remove old files
* [async] Create context wrapper for the eventloop
* [async] fix: context should return a value
* [async] Implement futures grouping
* [async] Fix bugs & replace old functions
* [async] Fix bugs found in tests
* [async] Implement `PlasmaEpoll`
* [async] Make test faster, add tests for epoll
* [async] Fix code format
* [async] Add comments for main code.
* [async] Fix import path.
* [async] Fix test.
* [async] Compatibility.
* [async] less verbose to not annoy the CI.
* [async] Add test for new API
* [async] Allow showing debug info in some of the test.
* [async] Fix test.
* [async] Proper shutdown.
* [async] Lint~
* [async] Move files to experimental and create API
* [async] Use async/await syntax
* [async] Fix names & styles
* [async] comments
* [async] bug fixing & use pytest
* [async] bug fixing & change tests
* [async] use logger
* [async] add tests
* [async] lint
* [async] type checking
* [async] add more tests
* [async] fix bugs when waiting on a future with a timeout. Add more docs.
* [async] Formal docs.
* [async] Add typing info since this code is compatible with py3.5+.
* [async] Documents.
* [async] Lint.
* [async] Fix deprecated call.
* [async] Fix deprecated call.
* [async] Implement a more reasonable way for dealing with pending inputs.
* [async] Fix docs
* [async] Lint
* [async] Fix bug: Type for time
* [async] Set our eventloop as the default eventloop so that we can get it through `asyncio.get_event_loop()`.
* [async] Update test & docs.
* [async] Lint.
* [async] Temporarily print more debug info.
* [async] Use `Poll` as a default option.
* [async] Limit resources.
* new async implementation for Ray
* implement linked list
* bug fix
* update
* support seamless async operations
* update
* update API
* fix tests
* lint
* bug fix
* refactor names
* improve doc
* properly shutdown async_api
* doc
* Change the table on the index page.
* Adjust table size.
* Only keeps `as_future`.
* change how we init connection
* init connection in `ray.worker.connect`
* doc
* fix
* Move initialization code into the module.
* Fix docs & code
* Update pyarrow version.
* lint
* Restore index.rst
* Add known issues.
* Apply suggestions from code review
Co-Authored-By: suquark <suquark@gmail.com>
* rename
* Update async_api.rst
* Update async_api.py
* Update async_api.rst
* Update async_api.py
* Update worker.py
* Update async_api.rst
* fix tests
* lint
* lint
* replace the magic number
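
The poll-like selector idea from the async commits above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from these commits: `fake_wait` is a hypothetical stand-in for a wait-style call that returns a tuple of lists (ready, remaining), and `poll_until_ready` is an invented helper name.

```python
import asyncio


def fake_wait(pending, ready_set, num_returns=1):
    """Hypothetical stand-in for a wait-style call.

    Returns a tuple of lists: (ready ids, remaining ids).
    """
    ready = [oid for oid in pending if oid in ready_set][:num_returns]
    remaining = [oid for oid in pending if oid not in ready]
    return ready, remaining


async def poll_until_ready(futures, ready_set, interval=0.01):
    """Poll repeatedly and resolve each future once its id is reported ready."""
    pending = list(futures)
    while pending:
        ready, pending = fake_wait(pending, ready_set, num_returns=len(pending))
        for oid in ready:
            futures[oid].set_result(oid)
        # Yield to the event loop so awaiting coroutines can make progress.
        await asyncio.sleep(interval)


async def main():
    loop = asyncio.get_running_loop()
    futures = {oid: loop.create_future() for oid in ("a", "b")}
    ready_set = {"a"}  # only "a" is available at first
    poller = asyncio.ensure_future(poll_until_ready(futures, ready_set))
    first = await futures["a"]
    ready_set.add("b")  # "b" becomes available later
    second = await futures["b"]
    await poller
    return [first, second]


result = asyncio.run(main())
```

Polling trades some latency (at most one `interval`) for simplicity; an epoll-like design would instead be notified when an object becomes available.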
* Make scheduling queues RemoveTasks return task states as well.
* Add test
* Don't unsubscribe for infeasible tasks when spilling over.
* Linting
* Address comments.
Add new search algorithm (genetic) along with the base framework of the searcher (which performs some basic jobs such as logging, recording and organizing in our project).
Note that this is the initial commit. In the following days, we will add examples, unit tests, and other refinements.
* Add signal handlers to improve debuggability.
* Fix Linux compiling
* Fix Lint
* Change SIGILL case that happens in both Linux and macOS
* Add signal handler to main functions.
* Change handler name.
* Address comment
* Address comment.
* Fix Linux building failure
* Introduce RAII mechanism to SignalHandlers.
* Add InitShutdownWrapper to handle all RAII requirements
* Change util_test to signal_test
* Make sure shutdown is not nullptr.
* Using google::InstallFailureSignalHandler() instead of our own signal handler
* Refine code according to comment
* Fix valgrind test failure.
* remove Shutdown template
* consistency
* linting
## What do these changes do?
* distribute load and resource information on a heartbeat
* for each raylet, maintain total and available resource capacity as well as measure of current load
* this PR introduces a new notion of load, defined as a sum of all resource demand induced by queued ready tasks on the local raylet. This provides a heterogeneity-aware measure of load that supersedes legacy Ray's task count as a proxy for load.
* modify the scheduling policy to perform *capacity-based*, *load-aware*, *optimistically concurrent* resource allocation
* perform task spillover to the heartbeating node in response to a heartbeat, implementing heterogeneity-aware late-binding/work-stealing.
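
As a rough illustration of the load definition above (the sum of resource demands of queued ready tasks), here is a minimal sketch. The function names, the dictionary representation of resources, and the `can_fit` check are assumptions made for this example, not the raylet's actual data structures or policy.

```python
from collections import Counter


def compute_load(ready_queue):
    """Sum the resource demands of all queued ready tasks (hypothetical helper)."""
    load = Counter()
    for demand in ready_queue:
        load.update(demand)  # adds each resource quantity to the running total
    return dict(load)


def can_fit(available, demand):
    """Hypothetical capacity check: does a node have room for one task?"""
    return all(available.get(name, 0) >= qty for name, qty in demand.items())


# Three queued ready tasks with heterogeneous demands.
ready_queue = [{"CPU": 1}, {"CPU": 2, "GPU": 1}, {"GPU": 1}]
load = compute_load(ready_queue)

# Available capacity of a remote node, as it might arrive in a heartbeat.
remote_available = {"CPU": 4, "GPU": 1}
fits = [can_fit(remote_available, task) for task in ready_queue]
```

Unlike a plain task count, this measure distinguishes a queue of three GPU tasks from a queue of three CPU tasks, which is what makes it heterogeneity-aware.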
* [WIP] Support different backend log lib
* Refine code, unify level, address comment
* Address comment and change formatter
* Fix linux building failure.
* Fix lint
* Remove log4cplus.
* Add log init to raylet main and add test to travis.
* Address comment and refine.
* Update logging_test.cc
* Log a warning on remote object manager failures
* Mark a task that failed to be forwarded as pending
* Raylet component failure test and make it harder
* Turn on component failure test for xray
* Remove return status from ReleaseSender
* lint
This PR adds a driver table for the new GCS, which enables cleanup functionality associated with monitoring driver death.
Some testing in `monitor_test.py` is restored, but redis sharding for xray is needed to enable remaining tests.
* Fix one of the stress tests, fix ray.global_state.client_table when called early on.
* Re-enable testWait.
* Convert stress_tests.py to pytest.
* Fix
* Allow yapf to lint individual files
* Add tip for using yapf
* Update doc
* Update script to autoformat changed py files
The new default is for the script to only update changed files to encourage
using it as a pre-push hook. Travis still checks all since it's not that big an
increase to runtime.
* Exclude formatting thirdparty/autogen py files
* Symlink .travis -> scripts
Hidden directories may get glossed over otherwise.
* .travis -> scripts in docs
They are symlinks to the same thing, but `scripts` is more dev-friendly, while
`.travis` is really only for Travis CI.
* Document different yapf format functions
Most devs will only need `format_changed`, and this is run by default.
`format_changed` should be fast enough in most cases to work as a pre-commit
hook.
* Speed up yapf by only formatting changed files
* Update docs
1. Mention how yapf can be used as a pre-commit hook
2. rm `bash`, script is executable
* Update yapf.sh
* Update development.rst
* Update yapf.sh
* Use bash arrays for correct argument splitting
Playing fast and loose with whitespace in bash is a terrible idea.
* Only format non-excluded by default
* Check changes against master
Normally, the remote is called `origin`, but we name it explicitly
* Adding missing directory to `format_all`
* Cleanup YAPF code
Remove an unused function and move code around to make it clearer; adding lines
gives cleaner diffs.
* Ensure correct files are autoformatted
* Fix cmd line arg splitting
Each arg has to be in its own set of quotes.
* Diff against mergebase
TIL there's a clean syntax for doing that, but it's too clever to belong in a
shell script.
We use `mapfile -t` to ensure no problems down the line with weird filenames.
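
The selection logic described in the yapf commits above (format only changed Python files, skip excluded ones, keep each filename as a single argument) can be sketched as below. This is an illustrative sketch, not the contents of `yapf.sh`: the exclude patterns and function names are hypothetical, and the list of changed files is passed in rather than read from git.

```python
import fnmatch

# Hypothetical exclude patterns; the real script's list may differ.
EXCLUDES = ["python/ray/thirdparty/*", "*_generated.py"]


def files_to_format(changed, excludes=EXCLUDES):
    """Keep changed .py files that match none of the exclude patterns."""
    return [
        path
        for path in changed
        if path.endswith(".py")
        and not any(fnmatch.fnmatchcase(path, pat) for pat in excludes)
    ]


def yapf_command(files):
    """Build an argument list; each filename stays one argument, spaces included."""
    return ["yapf", "--in-place", *files]


changed = [
    "python/ray/worker.py",
    "python/ray/thirdparty/foo.py",
    "doc/index.rst",
    "my file.py",
]
selected = files_to_format(changed)
command = yapf_command(selected)
```

Passing `command` as a list to `subprocess.run` avoids whitespace splitting in the same way that bash arrays do in the shell script.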
* Enable java worker support
--------------------------
This commit includes a tailored version of the Java worker implementation from Ant Financial.
The changes for build system, python module, src module and arrow are in other commits, this commit consists of the following modules:
- java/api: Ray API definition
- java/common: utilities
- java/hook: binary rewrite of the Java byte-code for remote execution
- java/runtime-common: common implementation of the runtime in worker
- java/runtime-dev: a pure-java mock implementation of the runtime for fast development
- java/runtime-native: a native implementation of the runtime
- java/test: various tests
Contributors for this work:
Guyang Song, Peng Cao, Senlin Zhu, Xiaoying Chu, Yiming Yu, Yujie Liu, Zhenyu Guo
* change the format of java help document from markdown to RST
* update the version of Arrow for java worker
* adapt the new version of plasma java client from arrow which uses byte[] instead of custom type
* add java worker test to ci
* add the example module for better usage guide
* Run xray tests in travis.
* Comment out TaskTests.testSubmittingManyTasks.
* Comment out failing tests.
* Comment out hanging test.
* Linting
* Comment out failing test.
* Comment out failing test.
* Ignore test_dataframe.py for now.
* Comment out testDriverExitingQuickly.
adding tests
fixing flake8
adding init
flake 8 on test
fixing tests, imports, and flake8
handling for index
adding tests for row, index
added more robust error handling for axis
fixing test failures
cleaning up errors for 2.7
updating travis
resolving import
fixing flake8
moved import order
Fixing to refactor and delaying implementing ray-pd inner concat
resolving ray-pd concat and from_pandas mutation
Revert "resolving ray-pd concat and from_pandas mutation"
This reverts commit 5db43e4e89e328286532f3ef98a4526575c5d08d.
* Integrate worker with raylet.
* Begin allowing worker to attach to cluster.
* Fix linting and documentation.
* Fix linting.
* Comment tests back in.
* Fix type of worker command.
* Remove xray python files and tests.
* Fix from rebase.
* Add test.
* Copy over raylet executable.
* Small cleanup.
Summary:
Able to run 1000 tasks with object dependencies on a set of distributed Raylets.
Raylet Changes:
Finalized ClientConnection class.
Task forwarding.
NM-to-NM heartbeats.
NM resource accounting for tasks.
Simple scheduling policy with task forwarding.
Creating and maintaining NM 2 NM long-lived connections and reusing them for task forwarding.
LineageCache Changes:
LineageCache without cleanup of tasks committed by remote nodes.
Lineage cache writeback and cleanup implementation.
ObjectManager Changes:
Object manager event loop/ClientConnection refactor.
Multithreaded object manager (disabled in this PR).
Testing Changes:
Integration tests for task submission on multiple Raylets.
Stress tests for object manager (with GCS and object store integration).
Co-authored-by: Stephanie Wang <swang@cs.berkeley.edu>
Co-authored-by: Alexey Tumanov <atumanov@gmail.com>
* Add shell script for building parquet
* Use parquet ci script; remove anaconda
* Remove gcc flag, use default
* add boost_root
* Fix $TP_DIR reference issue
* fix the PR
* check out specific parquet-cpp commit