Commit graph

260 commits

Author SHA1 Message Date
Robert Nishihara
60b22d9a72 Don't unsubscribe dependencies for infeasible tasks. (#3338)
* Make scheduling queues RemoveTasks return task states as well.

* Add test

* Don't unsubscribe for infeasible tasks when spilling over.

* Linting

* Address comments.
2018-11-16 11:33:00 -08:00
Robert Nishihara
658c14282c Remove legacy Ray code. (#3121)
* Remove legacy Ray code.

* Fix cmake and simplify monitor.

* Fix linting

* Updates

* Fix

* Implement some methods.

* Remove more plasma manager references.

* Fix

* Linting

* Fix

* Fix

* Make sure class IDs are strings.

* Some path fixes

* Fix

* Path fixes and update arrow

* Fixes.

* linting

* Fixes

* Java fixes

* Some java fixes

* TaskLanguage -> Language

* Minor

* Fix python test and remove unused method signature.

* Fix java tests

* Fix jenkins tests

* Remove commented out code.
2018-10-26 13:36:58 -07:00
Robert Nishihara
5aa29613db Fix linting errors. (#3127) 2018-10-24 16:30:00 -07:00
Robert Nishihara
9c1826ed69 Use XRay backend by default. (#3020)
* Use XRay backend by default.

* Remove irrelevant valgrind tests.

* Fix

* Move tests around.

* Fix

* Fix test

* Fix test.

* String/unicode fix.

* Fix test

* Fix unicode issue.

* Minor changes

* Fix bug in test_global_state.py.

* Fix test.

* Linting

* Try arrow change and other object manager changes.

* Use newer plasma client API

* Small updates

* Revert plasma client api change.

* Update

* Update arrow and allow SendObjectHeaders to fail.

* Update arrow

* Update python/ray/experimental/state.py

Co-Authored-By: robertnishihara <robertnishihara@gmail.com>

* Address comments.
2018-10-23 12:46:39 -07:00
Richard Liaw
40c4148d4f Cluster Utilities for Fault Tolerance Tests (#3008) 2018-10-20 22:56:29 -07:00
Eric Liang
9d23fa03c9 [xray] All messages on main asio event loop should be written asynchronously (#3023)
* copy over ref code

* wip async writes

* compiles

* fix error handling

* add test

* amend

* fix test

* clang fmgt

* clang format

* wip

* yapf

* rename format script

* test error

* clangfmt

* add test to list

* warn

* ref test

* fix test

* comment

* add capture

* Update client_connection.cc

* wip

* fix compile
2018-10-18 21:56:22 -07:00
Peter Schafhalter
a41bbc10ef Add password authentication to Redis ports (#2952)
* Implement Redis authentication

* Throw exception for legacy Ray

* Add test

* Formatting

* Fix bugs in CLI

* Fix bugs in Raylet

* Move default password to constants.h

* Use pytest.fixture

* Fix bug

* Authenticate using formatted strings

* Add missing passwords

* Add test

* Improve authentication of async contexts

* Disable Redis authentication for credis

* Update test for credis

* Fix rebase artifacts

* Fix formatting

* Add workaround for issue #3045

* Increase timeout for test

* Improve C++ readability

* Fixes for CLI

* Add security docs

* Address comments

* Address comments

* Adress comments

* Use ray.get

* Fix lint
2018-10-16 22:48:30 -07:00
Si-Yuan
cc7e2ecdd5 Change logfile names and also allow plasma store socket to be passed in. (#2862) 2018-10-03 10:03:53 -07:00
Robert Nishihara
ed6289771a Convert runtest.py to use pytest. (#2966)
* Convert runtest.py to use pytest.

* Linting.

* Fix

* Fix

* Fix

* Fix
2018-09-30 07:59:44 -07:00
old-bear
f3c1194be3 [tune] Add AutoML algorithm of GeneticSearcher (#2699)
Add new search algorithm (genetic) along with the base framework of the searcher (which performs some basic jobs such as logging, recording and organizing in our project).
Note that this is the initial commit. In the following days, we will add example, UT, and other refinements.
2018-09-12 09:17:04 -07:00
Peter Schafhalter
5da6e78db1 Add available resources to global state (#2501) 2018-09-10 15:46:32 -07:00
Richard Liaw
af1fdc826e Pin YAPF in Travis lint build (#2848)
Avoid needing to reformat everything all the time.
2018-09-09 15:54:46 -07:00
Robert Nishihara
e5fd1d55a1 Ignore failing global sheduler valgrind test. (#2805) 2018-09-02 15:23:32 -07:00
Yuhong Guo
2691b3a11a Add signal handlers to improve debuggability (#2757)
* Add signal handlers to improve debuggability.

* Fix Linux compiling

* Fix Lint

* Change SIGILL case that happens in both Linux and MaxOs

* Add signal handler to main functions.

* Change handler name.

* Address comment

* Address comment.

* Fix Linux building failure

* Introduce RAII mechanism to SignalHandlers.

* Add InitShutdownWrapper to handle all RAII requirements

* Change util_test to signal_test

* Make sure shutdown is not nullptr.

* Using google::InstallFailureSignalHandler() instead of our own signal handler

* Refine code addording to comment

* Fix valgrind test failure.

* remove Shutdown template

* consistency

* linting
2018-09-01 21:58:23 -07:00
Alexey Tumanov
de047daea7 [xray] raylet scheduling mechanism with a simple spillback policy (#2749)
## What do these changes do?
* distribute load and resource information on a heartbeat
* for each raylet, maintain total and available resource capacity as well as measure of current load
* this PR introduces a new notion of load, defined as a sum of all resource demand induced by queued ready tasks on the local raylet. This provides a heterogeneity-aware measure of load that supersedes legacy Ray's task count as a proxy for load.
* modify the scheduling policy to perform *capacity-based*, *load-aware*, *optimistically concurrent* resource allocation
* perform task spillover to the heartbeating node in response to a heartbeat, implementing  heterogeneity-aware late-binding/work-stealing.
2018-08-28 00:03:34 -07:00
Yuhong Guo
eec1a3eb89 Support pluggable backend log lib with glog (#2695)
* [WIP] Support different backend log lib

* Refine code, unify level, address comment

* Address comment and change formatter

* Fix linux building failure.

* Fix lint

* Remove log4cplus.

* Add log init to raylet main and add test to travis.

* Address comment and refine.

* Update logging_test.cc
2018-08-23 09:43:38 -07:00
joyyoj
38867eea4e [tune] Cross-Framework Compatibility (#2646)
This commit is a first pass at restructuring the Trial execution logic to support running on multiple frameworks.
2018-08-22 10:55:45 -07:00
Stephanie Wang
4a7be6f46d [xray] Make sure raylet does not crash if remote raylet dies (#2619)
* Log a warning on remote object manager failures

* Mark a task that was failed to be forwarded as pending

* Raylet component failure test and make it harder

* Turn on component failure test for xray

* Remove return status from ReleaseSender

* lint
2018-08-09 20:36:30 -07:00
Melih Elibol
8ae82180b4 [xray] Adds a driver table. (#2289)
This PR adds a driver table for the new GCS, which enables cleanup functionality associated with monitoring driver death.

Some testing in `monitor_test.py` is restored, but redis sharding for xray is needed to enable remaining tests.
2018-08-08 23:41:40 -07:00
Richard Liaw
bb44456f6f
[rllib, tune] TrainingResult -> Dict, Removes C408 from flake8 (#2565) 2018-08-07 12:17:44 -07:00
Richard Liaw
914a433e3f
[tune] Split Search from Scheduling (#2452)
Introduces SearchAlgorithm concept, separate from schedulers in Tune. Moves HyperOpt under this concept.
2018-08-04 21:27:39 -07:00
Stephanie Wang
a45f9cfafc [xray] Implement task lease table, logic for deciding when to reconstruct a task (#2497) 2018-07-30 14:42:28 -07:00
Philipp Moritz
696a229ece Fix text verbosity in python 2.7 by running tests with pytest (#2470) 2018-07-30 11:04:06 -07:00
Peter Schafhalter
2a3b02649a Add queue test to xray tests (#2433) 2018-07-19 17:18:13 -07:00
Peter Schafhalter
f5c46c7765 Add queue data structures (#2261) 2018-07-16 16:26:20 -07:00
Robert Nishihara
e3534c46df [xray] Re-enable some stress tests and convert stress_tests to pytest. (#2285)
* Fix one of the stress tests, fix ray.global_state.client_table when called early on.

* Re-enable testWait.

* Convert stress_tests.py to pytest.

* Fix
2018-07-06 23:21:00 -07:00
Devin Petersohn
4185aaed10 Dataframe deprecation (#2353) 2018-07-06 00:16:22 -07:00
Philipp Moritz
762bdf646e [xray] Put GCS data into the redis data shard (#2298) 2018-06-30 15:42:10 -10:00
Richard Liaw
e657497225
[xray] Fix tune tests (#2305)
* fix xray tests

* yapf

* unleash tests
2018-06-26 23:56:23 -07:00
Alok Singh
42a9233e1d Improve yapf speed and document its usage (#2160)
* Allow yapf to lint individual files

* Add tip for using yapf

* Update doc

* Update script to autoformat changed py files

The new default is for the script to only updated changed files to encourage
using it as a pre-push hook. Travis still checks all since it's not that big an
increase to runtime.

* Exclude formatting thirdparty/autogen py files

* Symlink .travis -> scripts

Hidden directories may get glossed over otherwise.

* .travis -> scripts in docs

They are symlinks to the same thing, but `scripts` is more dev-friendly, while
`.travis` is really only for Travis CI.

* Document different yapf format functions

Most devs will only need `format_changed`, and this is run by default.
`format_changed` should be fast enough in most cases to work as a pre-commit
hook.

* Speed up yapf by only formatting changed files

* Update docs

1. Mention how yapf can be used a pre-commit hook
2. rm `bash`, script is executable

* Update yapf.sh

* Update development.rst

* Update yapf.sh

* Use bash arrays for correct argument splitting

Playing fast and loose with whitespace in bash is a terrible idea.

* Only format non-excluded by default

* Check changes against master

Normally, the remote is called `origin`, but naming it explicit

* Adding missing directory to `format_all`

* Cleanup YAPF code

Remove unused function and move around code to make clearer and adding lines
give cleaner diffs.

* Ensure correct files are autoformatted

* Fix cmd line arg splitting

Each arg has to be in its own set of quotes.

* Diff against mergebase

TIL there's a clean syntax for doing that, but it's too clever to belong in a
shell script.

We use `mapfile -t` to ensure no problems down the line with weird filenames.
2018-06-05 20:22:11 -07:00
songqing
4dd4698564 unify build dir for Python and Java (#2171)
* unify build dir for Python and Java

* enable executables auto installed when just running 'make'

* fix plasma_store copy error

* fix cmake error about copying executables

* lint fix

* recover python/setup.py

* enable to copy optional file automatically

* a small fix of path

* lint fix

* lint fix

* lint fix

* Add comment.
2018-06-01 16:28:27 -07:00
Robert Nishihara
c85bb8fb4e Re-encrypt key for uploading to S3 from travis to use travis-ci.com. (#2169) 2018-05-31 00:05:03 -07:00
Yujie Liu
a8d3c057c1 [JavaWorker] Enable java worker support (#2094)
* Enable java worker support
--------------------------
This commit includes a tailored version of the Java worker implementation from Ant Financial.
The changes for build system, python module, src module and arrow are in other commits, this commit consists of the following modules:
 - java/api: Ray API definition
 - java/common: utilities
 - java/hook: binary rewrite of the Java byte-code for remote execution
 - java/runtime-common: common implementation of the runtime in worker
 - java/runtime-dev: a pure-java mock implementation of the runtime for fast development
 - java/runtime-native: a native implementation of the runtime
 - java/test: various tests

Contributors for this work:
 Guyang Song, Peng Cao, Senlin Zhu,Xiaoying Chu, Yiming Yu, Yujie Liu, Zhenyu Guo

* change the format of java help document from markdown to RST

* update the vesion of Arrow for java worker

* adapt the new version of plasma java client from arrow which use byte[] instead of custom type

* add java worker test to ci

* add the example module for better usage guide
2018-05-26 14:38:50 -07:00
Kunal Gosar
4584193308 [DataFrame] Refactor GroupBy Methods and Implement Reindex (#2101)
* fix 1D blocks case

* Add groupby test code

* begin writing tests

* Fix bug on groupby(axis=1, ...)

* implement reindex

* fix index misalignment after groupby

* fix df.apply bug

* fix groupby.apply

* fix agg funcs

* finish groupby tests

* finish test suite for groupby

* fixing lint

* undo new line

* fix python2 index copy bug

* Concat Series into ray.df

* fixing python2 issues

* resolving all python 2 tests

* handle multiindex on apply

* resolve comments

* updating docstring

* fix lint

* fix lint again

* address comments
2018-05-22 16:34:07 -07:00
Stephanie Wang
6ca122f723 [xray] Sophisticated task dependency management (#2035) 2018-05-17 17:18:30 -07:00
Robert Nishihara
52b0f3734a [xray] Add Travis build for testing xray on Linux. (#2047)
* Run xray tests in travis.

* Comment out TaskTests.testSubmittingManyTasks.

* Comment out failing tests.

* Comment out hanging test.

* Linting

* Comment out failing test.

* Comment out failing test.

* Ignore test_dataframe.py for now.

* Comment out testDriverExitingQuickly.
2018-05-13 21:22:01 -07:00
Rohan Singh
1848745223 [DataFrame] Implement quantile (#1992)
* added quantile method

* updated init for datetime signatures

* updated documentation for _map_partitions return type

* removed extraneous print call

* updated for simplicity

* fixed dtyping issues and error raising

* updated datetime dtype checking

* Fixing quantile implementation

* Fix minor bug

* Fixing diff
2018-05-06 18:25:13 -07:00
adgirish
3c48783a16 [DataFrame] Adding read methods and tests (#1712)
* Adding read methods and tests

* Referencing internal partition method so constructors are more canonical with Pandas

* Fixing to reference from_pandas in utils

* Cleaning up unused imports

* rerunning tests

* fixing flake8

* resolving errors

* Added sql and sas test

* updating

* Temporarily phasing out read_csv code for wrapper while diagnosing, added io tests to travis

* Adding travis

* restoring distributed read csv

* resolving rebases

* lint

* Sampling out HD test

* adding dep

* fix pathing

* Flagging out tests

* resolving read_method issues

* fix build issue

* move additional dependencies to extras

* fixing lint

* removing IO dependencies

* updated requirements doc
2018-04-20 18:33:08 -07:00
Melih Elibol
6be73350c6 Adds Valgrind tests for multi-threaded object manager. (#1890)
* adds valgrind to new object manager.

* Add some comments.

* Update run_object_manager_valgrind.sh

typo

* Update run_object_manager_tests.sh

* update tests to reflect changes in #1891.

* reduce # tests.
2018-04-13 21:56:12 -07:00
Philipp Moritz
74162d1492 Lint Python files with Yapf (#1872) 2018-04-11 10:11:35 -07:00
adgirish
efeaacbedc Adding support for concat (#1739)
adding tests

fixing flake8

adding init

flake 8 on test

fixing tests, imports, and flake8

handling for index

adding tests for row, index

added more robust error handling for axis

fixing test failures

cleaning up error sfor 2.7

updating travis

resolving import

fixing flake8

moved import order

Fixing to refactor and delaying implementing ray-pd inner concat

resolving ray-pd concat and from_pandas mutation

Revert "resolving ray-pd concat and from_pandas mutation"

This reverts commit 5db43e4e89e328286532f3ef98a4526575c5d08d.
2018-04-09 21:36:24 -07:00
Robert Nishihara
fbfbb1c079 [xray] Integrate worker.py with raylet. (#1810)
* Integrate worker with raylet.

* Begin allowing worker to attach to cluster.

* Fix linting and documentation.

* Fix linting.

* Comment tests back in.

* Fix type of worker command.

* Remove xray python files and tests.

* Fix from rebase.

* Add test.

* Copy over raylet executable.

* Small cleanup.
2018-04-03 02:38:56 -07:00
Melih Elibol
6e06a9e338 XRay Task Forwarding Milestone (#1785)
Summary:
Able to run 1000 tasks with object dependencies on a set of distributed Raylets.

Raylet Changes:

Finalized ClientConnection class.
Task forwarding.
NM-to-NM heartbeats.
NM resource accounting for tasks.
Simple scheduling policy with task forwarding.
Creating and maintaining NM 2 NM long-lived connections and reusing them for task forwarding.
LineageCache Changes:

LineageCache without cleanup of tasks committed by remote nodes.
Lineage cache writeback and cleanup implementation.
ObjectManager Changes:

Object manager event loop/ClientConnection refactor.
Multithreaded object manager (disabled in this PR).
Testing Changes:

Integration tests for task submission on multiple Raylets.
Stress tests for object manager (with GCS and object store integration).


Co-authored-by: Stephanie Wang <swang@cs.berkeley.edu>
Co-authored-by: Alexey Tumanov <atumanov@gmail.com>
2018-03-31 18:02:58 -07:00
Eric Liang
4116c64698
[tune] Remove rllib dep again, and add a test (#1792)
* tune should not depend on rllib

* fix dep test

* Tue Mar 27 16:55:41 PDT 2018

* f401
2018-03-29 15:36:49 -07:00
Philipp Moritz
a9acfab3a6 Start chain replicated GCS with Ray (#1538) 2018-03-07 10:18:58 -08:00
Richard Liaw
aefefcb0cd Upload wheels to 'latest' folder (#1606) 2018-02-26 10:26:38 -08:00
Simon Mo
a24cc28773 [DataFrame] Add Parquet Support in Build Process (#1531)
* Add shell script for building parquet

* Use parquet ci script; remove anaconda

* Remove gcc flag, use default

* add boost_root

* Fix $TP_DIR reference issue

* fix the PR

* check out specific parquet-cpp commit
2018-02-16 07:18:42 -08:00
Eric Liang
b948405532
[tune] clean up population based training prototype (#1478)
* patch up pbt

* Sat Jan 27 01:00:03 PST 2018

* Sat Jan 27 01:04:14 PST 2018

* Sat Jan 27 01:04:21 PST 2018

* Sat Jan 27 01:15:15 PST 2018

* Sat Jan 27 01:15:42 PST 2018

* Sat Jan 27 01:16:14 PST 2018

* Sat Jan 27 01:38:42 PST 2018

* Sat Jan 27 01:39:21 PST 2018

* add pbt

* Sat Jan 27 01:41:19 PST 2018

* Sat Jan 27 01:44:21 PST 2018

* Sat Jan 27 01:45:46 PST 2018

* Sat Jan 27 16:54:42 PST 2018

* Sat Jan 27 16:57:53 PST 2018

* clean up test

* Sat Jan 27 18:01:15 PST 2018

* Sat Jan 27 18:02:54 PST 2018

* Sat Jan 27 18:11:18 PST 2018

* Sat Jan 27 18:11:55 PST 2018

* Sat Jan 27 18:14:09 PST 2018

* review

* try out a ppo example

* some tweaks to ppo example

* add postprocess hook

* Sun Jan 28 15:00:40 PST 2018

* clean up custom explore fn

* Sun Jan 28 15:10:21 PST 2018

* Sun Jan 28 15:14:53 PST 2018

* Sun Jan 28 15:17:04 PST 2018

* Sun Jan 28 15:33:13 PST 2018

* Sun Jan 28 15:56:40 PST 2018

* Sun Jan 28 15:57:36 PST 2018

* Sun Jan 28 16:00:35 PST 2018

* Sun Jan 28 16:02:58 PST 2018

* Sun Jan 28 16:29:50 PST 2018

* Sun Jan 28 16:30:36 PST 2018

* Sun Jan 28 16:31:44 PST 2018

* improve tune doc

* concepts

* update humanoid

* Fri Feb  2 18:03:33 PST 2018

* fix example

* show error file
2018-02-02 23:03:12 -08:00
Philipp Moritz
a3f8fa426b Start integrating new GCS APIs (#1379)
* Start integrating new GCS calls

* fixes

* tests

* cleanup

* cleanup and valgrind fix

* update tests

* fix valgrind

* fix more valgrind

* fixes

* add separate tests for GCS

* fix linting

* update tests

* cleanup

* fix python linting

* more fixes

* fix linting

* add plasma manager callback

* add some documentation

* fix linting

* fix linting

* fixes

* update

* fix linting

* fix

* add spillback count

* fixes

* linting

* fixes

* fix linting

* fix

* fix

* fix
2018-01-31 11:01:12 -08:00
Robert Nishihara
ab5d4a6010 Bring cloudpickle inside the repository. (#1445)
* Bring cloudpickle version 0.5.2 inside the repo.

* Use internal copy of cloudpickle everywhere.

* Fix linting.

* Import ordering.

* Change __init__.py.

* Set pickler in serialization context.

* Don't check ray location.
2018-01-25 11:36:37 -08:00