* Add signal handlers to improve debuggability.
* Fix Linux compiling
* Fix Lint
* Change SIGILL case that happens in both Linux and MaxOs
* Add signal handler to main functions.
* Change handler name.
* Address comment
* Address comment.
* Fix Linux building failure
* Introduce RAII mechanism to SignalHandlers.
* Add InitShutdownWrapper to handle all RAII requirements
* Change util_test to signal_test
* Make sure shutdown is not nullptr.
* Using google::InstallFailureSignalHandler() instead of our own signal handler
* Refine code addording to comment
* Fix valgrind test failure.
* remove Shutdown template
* consistency
* linting
## What do these changes do?
* distribute load and resource information on a heartbeat
* for each raylet, maintain total and available resource capacity as well as measure of current load
* this PR introduces a new notion of load, defined as a sum of all resource demand induced by queued ready tasks on the local raylet. This provides a heterogeneity-aware measure of load that supersedes legacy Ray's task count as a proxy for load.
* modify the scheduling policy to perform *capacity-based*, *load-aware*, *optimistically concurrent* resource allocation
* perform task spillover to the heartbeating node in response to a heartbeat, implementing heterogeneity-aware late-binding/work-stealing.
* [WIP] Support different backend log lib
* Refine code, unify level, address comment
* Address comment and change formatter
* Fix linux building failure.
* Fix lint
* Remove log4cplus.
* Add log init to raylet main and add test to travis.
* Address comment and refine.
* Update logging_test.cc
* Log a warning on remote object manager failures
* Mark a task that was failed to be forwarded as pending
* Raylet component failure test and make it harder
* Turn on component failure test for xray
* Remove return status from ReleaseSender
* lint
This PR adds a driver table for the new GCS, which enables cleanup functionality associated with monitoring driver death.
Some testing in `monitor_test.py` is restored, but redis sharding for xray is needed to enable remaining tests.
* Fix one of the stress tests, fix ray.global_state.client_table when called early on.
* Re-enable testWait.
* Convert stress_tests.py to pytest.
* Fix
* Allow yapf to lint individual files
* Add tip for using yapf
* Update doc
* Update script to autoformat changed py files
The new default is for the script to only updated changed files to encourage
using it as a pre-push hook. Travis still checks all since it's not that big an
increase to runtime.
* Exclude formatting thirdparty/autogen py files
* Symlink .travis -> scripts
Hidden directories may get glossed over otherwise.
* .travis -> scripts in docs
They are symlinks to the same thing, but `scripts` is more dev-friendly, while
`.travis` is really only for Travis CI.
* Document different yapf format functions
Most devs will only need `format_changed`, and this is run by default.
`format_changed` should be fast enough in most cases to work as a pre-commit
hook.
* Speed up yapf by only formatting changed files
* Update docs
1. Mention how yapf can be used a pre-commit hook
2. rm `bash`, script is executable
* Update yapf.sh
* Update development.rst
* Update yapf.sh
* Use bash arrays for correct argument splitting
Playing fast and loose with whitespace in bash is a terrible idea.
* Only format non-excluded by default
* Check changes against master
Normally, the remote is called `origin`, but naming it explicit
* Adding missing directory to `format_all`
* Cleanup YAPF code
Remove unused function and move around code to make clearer and adding lines
give cleaner diffs.
* Ensure correct files are autoformatted
* Fix cmd line arg splitting
Each arg has to be in its own set of quotes.
* Diff against mergebase
TIL there's a clean syntax for doing that, but it's too clever to belong in a
shell script.
We use `mapfile -t` to ensure no problems down the line with weird filenames.
* Enable java worker support
--------------------------
This commit includes a tailored version of the Java worker implementation from Ant Financial.
The changes for build system, python module, src module and arrow are in other commits, this commit consists of the following modules:
- java/api: Ray API definition
- java/common: utilities
- java/hook: binary rewrite of the Java byte-code for remote execution
- java/runtime-common: common implementation of the runtime in worker
- java/runtime-dev: a pure-java mock implementation of the runtime for fast development
- java/runtime-native: a native implementation of the runtime
- java/test: various tests
Contributors for this work:
Guyang Song, Peng Cao, Senlin Zhu,Xiaoying Chu, Yiming Yu, Yujie Liu, Zhenyu Guo
* change the format of java help document from markdown to RST
* update the vesion of Arrow for java worker
* adapt the new version of plasma java client from arrow which use byte[] instead of custom type
* add java worker test to ci
* add the example module for better usage guide
* Run xray tests in travis.
* Comment out TaskTests.testSubmittingManyTasks.
* Comment out failing tests.
* Comment out hanging test.
* Linting
* Comment out failing test.
* Comment out failing test.
* Ignore test_dataframe.py for now.
* Comment out testDriverExitingQuickly.
adding tests
fixing flake8
adding init
flake 8 on test
fixing tests, imports, and flake8
handling for index
adding tests for row, index
added more robust error handling for axis
fixing test failures
cleaning up error sfor 2.7
updating travis
resolving import
fixing flake8
moved import order
Fixing to refactor and delaying implementing ray-pd inner concat
resolving ray-pd concat and from_pandas mutation
Revert "resolving ray-pd concat and from_pandas mutation"
This reverts commit 5db43e4e89e328286532f3ef98a4526575c5d08d.
* Integrate worker with raylet.
* Begin allowing worker to attach to cluster.
* Fix linting and documentation.
* Fix linting.
* Comment tests back in.
* Fix type of worker command.
* Remove xray python files and tests.
* Fix from rebase.
* Add test.
* Copy over raylet executable.
* Small cleanup.
Summary:
Able to run 1000 tasks with object dependencies on a set of distributed Raylets.
Raylet Changes:
Finalized ClientConnection class.
Task forwarding.
NM-to-NM heartbeats.
NM resource accounting for tasks.
Simple scheduling policy with task forwarding.
Creating and maintaining NM 2 NM long-lived connections and reusing them for task forwarding.
LineageCache Changes:
LineageCache without cleanup of tasks committed by remote nodes.
Lineage cache writeback and cleanup implementation.
ObjectManager Changes:
Object manager event loop/ClientConnection refactor.
Multithreaded object manager (disabled in this PR).
Testing Changes:
Integration tests for task submission on multiple Raylets.
Stress tests for object manager (with GCS and object store integration).
Co-authored-by: Stephanie Wang <swang@cs.berkeley.edu>
Co-authored-by: Alexey Tumanov <atumanov@gmail.com>
* Add shell script for building parquet
* Use parquet ci script; remove anaconda
* Remove gcc flag, use default
* add boost_root
* Fix $TP_DIR reference issue
* fix the PR
* check out specific parquet-cpp commit
* patch up pbt
* Sat Jan 27 01:00:03 PST 2018
* Sat Jan 27 01:04:14 PST 2018
* Sat Jan 27 01:04:21 PST 2018
* Sat Jan 27 01:15:15 PST 2018
* Sat Jan 27 01:15:42 PST 2018
* Sat Jan 27 01:16:14 PST 2018
* Sat Jan 27 01:38:42 PST 2018
* Sat Jan 27 01:39:21 PST 2018
* add pbt
* Sat Jan 27 01:41:19 PST 2018
* Sat Jan 27 01:44:21 PST 2018
* Sat Jan 27 01:45:46 PST 2018
* Sat Jan 27 16:54:42 PST 2018
* Sat Jan 27 16:57:53 PST 2018
* clean up test
* Sat Jan 27 18:01:15 PST 2018
* Sat Jan 27 18:02:54 PST 2018
* Sat Jan 27 18:11:18 PST 2018
* Sat Jan 27 18:11:55 PST 2018
* Sat Jan 27 18:14:09 PST 2018
* review
* try out a ppo example
* some tweaks to ppo example
* add postprocess hook
* Sun Jan 28 15:00:40 PST 2018
* clean up custom explore fn
* Sun Jan 28 15:10:21 PST 2018
* Sun Jan 28 15:14:53 PST 2018
* Sun Jan 28 15:17:04 PST 2018
* Sun Jan 28 15:33:13 PST 2018
* Sun Jan 28 15:56:40 PST 2018
* Sun Jan 28 15:57:36 PST 2018
* Sun Jan 28 16:00:35 PST 2018
* Sun Jan 28 16:02:58 PST 2018
* Sun Jan 28 16:29:50 PST 2018
* Sun Jan 28 16:30:36 PST 2018
* Sun Jan 28 16:31:44 PST 2018
* improve tune doc
* concepts
* update humanoid
* Fri Feb 2 18:03:33 PST 2018
* fix example
* show error file
* Bring cloudpickle version 0.5.2 inside the repo.
* Use internal copy of cloudpickle everywhere.
* Fix linting.
* Import ordering.
* Change __init__.py.
* Set pickler in serialization context.
* Don't check ray location.
* Adding dataframe object and minor APIs
* Adding reduce functionality
* Adding some print and making reduce work on current Ray
* Cleanup
* Added new functionality and docs.
* Adding more functionality.
* New functionality with older cleanup
* Complying with flake8 formatting
* Added tests and addressed reviewer comments
* Complying with flake8.
* Adding pandas to travis and requirements doc
* Fixing flake8 failures
* Fixing flake8 errors from imports
* Fixing import error
* Fixing import errors
* Addressing reviewer comments
* Addressing lint error
* Add basic functionality for Cython functions and actors
* Fix up per @pcmoritz comments
* Fixes per @richardliaw comments
* Fixes per @robertnishihara comments
* Forgot double quotes when updating masked_log
* Remove import typing for Python 2 compatibility
* Comment out local scheduler valgrind test.
* Fix free/delete error.
* More free -> delete errors
* One more free -> delete and also clean up callback state in plasma manager.
* Add set -x to run_valgrind scripts.
* Fix valgrind error in CreateLocalSchedulerInfoMessage.