1858 commits

Author SHA1 Message Date
songqing
78a48fa1e0 Fix build error when building Ray for Java later than Python (#2241) 2018-06-12 21:11:30 -07:00
Eric Liang
be178ae031 [autoscaler] GCP docs (#2235) 2018-06-12 12:40:12 -07:00
Eric Liang
7fcaad264a [autoscaler] Translate to/from AWS 'Name' tag (#2219)
* fix tag

* fix
2018-06-11 12:10:10 -07:00
Alok Singh
d47d6a6b7a [rllib] Use correct method name (#2226) 2018-06-11 09:53:31 -07:00
Devin Petersohn
b886ceca47 [DataFrame] Implement __array_wrap__ (#2218)
* Implement __array_wrap__

* Removing unnecessary test
2018-06-11 08:56:43 -07:00
Robert Nishihara
61139e1509 Enable fractional resources and resource IDs for xray. (#2187)
* Implement GPU IDs and fractional resources.

* Add documentation and python exceptions.

* Fix signed/unsigned comparison.

* Fix linting.

* Fixes from rebase.

* Re-enable tests that use ray.wait.

* Don't kill the raylet if an infeasible task is submitted.

* Ignore tests that require better load balancing.

* Linting

* Ignore array test.

* Ignore stress test reconstructions tests.

* Don't kill node manager if remote node manager disconnects.

* Ignore more stress tests.

* Naming changes

* Remove outdated todo

* Small fix

* Re-enable test.

* Linting

* Fix resource bookkeeping for blocked tasks.

* Fix linting

* Fix Java client.

* Ignore test

* Ignore put error tests
2018-06-10 15:31:43 -07:00
Richard Liaw
f19decb848 [docs] Update RLlib install to not include Tensorflow (#2178) 2018-06-10 10:29:12 -07:00
Philipp Moritz
4ec5bea03b [xray] Implement fetch (#2195) 2018-06-09 23:36:27 -07:00
Robert Nishihara
125fe1c09c Print warning when defining very large remote function or actor. (#2179)
* Print warning when defining very large remote function or actor.

* Add weak test.

* Check that warnings appear in test.

* Make wait_for_errors actually fail in failure_test.py.

* Use constants for error types.

* Fix
2018-06-09 19:59:15 -07:00
andrewztan
1475600c81 [rllib] Merge DDPG and DDPG2 implementations (#2202)
* removed ddpg2

* removed ddpg2 from codebase

* added tests used in ddpg vs ddpg2 comparison

* added notes about training timesteps to yaml files

* removed ddpg2 yaml files

* removed unnecessary configs from yaml files

* removed unnecessary configs from yaml files

* moved pendulum, mountaincarcontinuous, and halfcheetah tests to tuned_examples

* moved pendulum, mountaincarcontinuous, and halfcheetah tests to tuned_examples

* added more configuration details to yaml files

* removed random starts from halfcheetah
2018-06-09 16:46:23 -07:00
Yujie Liu
3b5e700fd7 [JavaWorker] Java code lint check and binding to CI (#2225)
* add java code lint check and fix the java code lint error

* add java doc lint check and fix the java doc lint error

* add java code and doc lint to the CI
2018-06-09 16:26:54 -07:00
Robert Nishihara
5789a247f9 [xray] Do not redirect worker output to files by default. (#2220) 2018-06-09 15:00:42 -07:00
Eric Liang
71eb558eb0 [rllib] Refactor rllib to have a common sample collection pathway (#2149) 2018-06-09 00:21:35 -07:00
Stephanie Wang
cb5e6e6d68 Add dependency between copy_ray and python extensions (#2221) 2018-06-08 20:41:54 -07:00
Eric Liang
32b9a4d3f1 Fix yapf excludes, print diff in --all mode (#2211)
* fix

* travis
2018-06-08 02:25:55 -07:00
Eric Liang
8da558f5b7 [autoscaler] Should use internal IP for ssh (#2209) 2018-06-08 01:08:59 -07:00
Eric Liang
31046f7e06 Autoscaler Python 2 queue fix (#2205) 2018-06-07 18:43:07 -07:00
Eric Liang
100d8c207f [xray] [autoscaler] Fix autoscaler / raylet integration (#2143) 2018-06-07 15:43:20 -07:00
Yuhong Guo
0a34bea0b0 Use scoped enums in C++ and flatbuffers. (#2194)
* Enable --scoped-enums in flatbuffer compiler.

* Change enum to c++11 style (enum class).

* Resolve conflicts.

* Solve building failure when RAY_USE_NEW_GCS=on and remove ERROR_INDEX suffix.

* Merge with master and fix CI failure.
2018-06-07 01:01:21 -07:00
Hao Chen
f0907a6ee9 Optimize lineage eviction efficiency (#2196)
* Java in vscode.

* Optimize lineage eviction

* minor fix

* fix ut

* fix comment and lint

* format

* format

* remove unneeded code
2018-06-07 00:35:15 -07:00
Philipp Moritz
343f29801b [xray] Fix compilation on mac (#2199) 2018-06-06 22:33:46 -07:00
Melih Elibol
7246ff80a4 [xray] Implements ray.wait (#2162)
Implements ray.wait for xray. Fixes #1128.
2018-06-06 16:56:44 -07:00
Devin Petersohn
c8c0349511 [DataFrame] Temporarily changing the requirement until our pandas compat is updated (#2197)
* Temporarily changing the requirement until our pandas compat is updated
for 0.23

* Fix lint
2018-06-06 12:01:43 -07:00
Yuhong Guo
5b0df0eca2 Change surefire version to 2.21.0 to fix test failure on Java10. (#2198) 2018-06-06 10:39:20 -07:00
Alok Singh
42a9233e1d Improve yapf speed and document its usage (#2160)
* Allow yapf to lint individual files

* Add tip for using yapf

* Update doc

* Update script to autoformat changed py files

The new default is for the script to only update changed files to encourage
using it as a pre-push hook. Travis still checks all since it's not that big an
increase to runtime.

* Exclude formatting thirdparty/autogen py files

* Symlink .travis -> scripts

Hidden directories may get glossed over otherwise.

* .travis -> scripts in docs

They are symlinks to the same thing, but `scripts` is more dev-friendly, while
`.travis` is really only for Travis CI.

* Document different yapf format functions

Most devs will only need `format_changed`, and this is run by default.
`format_changed` should be fast enough in most cases to work as a pre-commit
hook.

* Speed up yapf by only formatting changed files

* Update docs

1. Mention how yapf can be used as a pre-commit hook
2. rm `bash`, script is executable

* Update yapf.sh

* Update development.rst

* Update yapf.sh

* Use bash arrays for correct argument splitting

Playing fast and loose with whitespace in bash is a terrible idea.

* Only format non-excluded by default

* Check changes against master

Normally, the remote is called `origin`, but we name it explicitly.

* Adding missing directory to `format_all`

* Cleanup YAPF code

Remove an unused function and move code around to make it clearer, so that
adding lines gives cleaner diffs.

* Ensure correct files are autoformatted

* Fix cmd line arg splitting

Each arg has to be in its own set of quotes.

* Diff against mergebase

TIL there's a clean syntax for doing that, but it's too clever to belong in a
shell script.

We use `mapfile -t` to ensure no problems down the line with weird filenames.
2018-06-05 20:22:11 -07:00
Adam Gleave
6ef3b255ea Launch nodes in separate threads (#2183)
Modifies the autoscaler to run launch_new_nodes in a separate thread, keeping track of the number of pending requests.
2018-06-05 20:19:31 -07:00
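
The idea in the commit above (run the launch in a separate thread while counting requests that are still in flight) can be sketched roughly as follows. This is a minimal illustration, not the autoscaler's code: `PendingLaunchTracker` and `launch_fn` are hypothetical names, and only the "separate thread plus pending counter" shape is taken from the commit message.

```python
import threading


class PendingLaunchTracker:
    """Counts launch requests handed to a thread that have not finished yet."""

    def __init__(self, launch_fn):
        self._launch_fn = launch_fn  # hypothetical callable: launch_fn(count)
        self._lock = threading.Lock()
        self._pending = 0

    @property
    def pending(self):
        with self._lock:
            return self._pending

    def launch_async(self, count):
        # Record the request before starting the thread, so there is no
        # window in which the nodes are neither pending nor launched.
        with self._lock:
            self._pending += count
        thread = threading.Thread(target=self._run, args=(count,), daemon=True)
        thread.start()
        return thread

    def _run(self, count):
        try:
            self._launch_fn(count)
        finally:
            # Decrement even if the launch raised, so the count cannot leak.
            with self._lock:
                self._pending -= count
```

A scaling loop could then subtract `tracker.pending` from the number of nodes it still wants, which avoids requesting the same nodes twice while a slow launch is in progress.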
Richard Liaw
13d4e0db95 Add Docker Support for ASV (#2184)
* added new instructions and script

* initialize ray only once

* use ray-project/asv master
2018-06-05 15:55:35 -07:00
Simon Mo
a139a5df8c [DataFrame] Implement Memoizer (#2157)
* Implement Memoizer

* Add LRUCache

* Add comments
2018-06-05 07:18:12 -07:00
songqing
451cdb43f6 Fix redefinition of flatbuffer types (#2189) 2018-06-05 00:08:05 -07:00
Devin Petersohn
b56c8ed8dc [DataFrame] Fix equals and make it more efficient (#2186)
* Fixing equals

* Adding test fix

* Working on fix for equals and drop

* Fix equals and fix tests to use ray.dataframe.equals

* Addressing comments
2018-06-04 13:10:06 -07:00
Peter Schafhalter
a5d888e49b [DataFrames] More dtypes optimizations (#2124)
* Pass dtypes for some DataFrame constructors

* More optimizations with dtypes_cache

* Optimizations
2018-06-04 10:50:13 -07:00
Binglin Chang
19d6ca0670 Support constructing TensorFlowVariables from multiple tf operations (#2182) 2018-06-02 18:13:52 -07:00
Philipp Moritz
d699bfbf10 Use hashing function that takes into account all UniqueID bytes (#2174) 2018-06-01 23:07:29 -07:00
Philipp Moritz
e1024d84e9 [xray] Start actor workers in parallel (#2168) 2018-06-01 23:04:16 -07:00
Kunal Gosar
317d0da7d8 Add experimental API for ray.get and ray.wait with additional argument types (#2071) 2018-06-01 16:42:27 -07:00
songqing
4dd4698564 unify build dir for Python and Java (#2171)
* unify build dir for Python and Java

* enable executables auto installed when just running 'make'

* fix plasma_store copy error

* fix cmake error about copying executables

* lint fix

* recover python/setup.py

* enable to copy optional file automatically

* a small fix of path

* lint fix

* lint fix

* lint fix

* Add comment.
2018-06-01 16:28:27 -07:00
Yuhong Guo
c1de03acac Add timeout mechanism to Push function instead of retries (#2148)
Use timer instead of retries in Push when objects are not local.
2018-06-01 01:21:05 -07:00
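
As a rough illustration of "timer instead of retries", the sketch below arms a single timer when a push is requested for an object that is not yet local, and either sends when the object arrives or gives up when the timer fires. The real change is in Ray's C++ object manager; `PendingPush`, `send_fn`, and `on_object_local` are hypothetical names used only for this Python sketch.

```python
import threading


class PendingPush:
    """A push that waits for an object to become local, bounded by one timer."""

    def __init__(self, object_id, send_fn, timeout_s):
        self.object_id = object_id
        self._send_fn = send_fn      # hypothetical callable: send_fn(object_id)
        self._lock = threading.Lock()
        self._state = "waiting"      # "waiting" -> "sent" or "timed_out"
        self._timer = threading.Timer(timeout_s, self._on_timeout)
        self._timer.daemon = True
        self._timer.start()

    @property
    def state(self):
        with self._lock:
            return self._state

    def on_object_local(self):
        # Called when the object shows up locally; returns True if it was sent.
        with self._lock:
            if self._state != "waiting":
                return False
            self._state = "sent"
        self._timer.cancel()
        self._send_fn(self.object_id)
        return True

    def _on_timeout(self):
        # The timer fired first: drop the request instead of retrying.
        with self._lock:
            if self._state == "waiting":
                self._state = "timed_out"
```

Compared with a retry loop, there is one deadline per request and no polling interval to tune.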
Kristian Hartikainen
74dc14d1fc [autoscaler] GCP node provider (#2061)
* Google Cloud Platform scaffolding

* Add minimal gcp config example

* Add googleapiclient discoveries, update gcp.config constants

* Rename and update gcp.config key pair name function

* Implement gcp.config._configure_project

* Fix the create project get project flow

* Implement gcp.config._configure_iam_role

* Implement service account iam binding

* Implement gcp.config._configure_key_pair

* Implement rsa key pair generation

* Implement gcp.config._configure_subnet

* Save work-in-progress gcp.config._configure_firewall_rules.

These are likely to be not needed at all. Saving them if we happen to
need them later.

* Remove unnecessary firewall configuration

* Update example-minimal.yaml configuration

* Add new wait_for_compute_operation, rename old wait_for_operation

* Temporarily rename autoscaler tags due to gcp incompatibility

* Implement initial gcp.node_provider.nodes

* Still missing filter support

* Implement initial gcp.node_provider.create_node

* Implement another compute wait
  operation (wait_For_compute_zone_operation). TODO: figure out if we
  can remove the function.

* Implement initial gcp.node_provider._node and node status functions

* Implement initial gcp.node_provider.terminate_node

* Implement node tagging and ip getter methods for nodes

* Temporarily rename tags due to gcp incompatibility

* Tiny tweaks for autoscaler.updater

* Remove unused config from gcp node_provider

* Add new example-full example to gcp, update load_gcp_example_config

* Implement label filtering for gcp.node_provider.nodes

* Revert unnecessary change in ssh command

* Revert "Temporarily rename tags due to gcp incompatibility"

This reverts commit e2fe634c5d11d705c0f5d3e76c80c37394bb23fb.

* Revert "Temporarily rename autoscaler tags due to gcp incompatibility"

This reverts commit c938ee435f4b75854a14e78242ad7f1d1ed8ad4b.

* Refactor autoscaler tagging to support multiple tag specs

* Remove missing cryptography imports

* Update quote function import

* Fix threading issue in gcp.config with the compute discovery object

* Add gcs support for log_sync

* Fix the labels/tags naming discrepancy

* Add expanduser to file_mounts hashing

* Fix gcp.node_provider.internal_ip

* Add uuid to node name

* Remove 'set -i' from updater ssh command

* Also add TODO with the context and reason for the change.

* Update ssh key creation in autoscaler.gcp.config

* Fix wait_for_compute_zone_operation's threading issue

Google discovery api's compute object is not thread safe, and thus
needs to be recreated for each thread. This moves the
`wait_for_compute_zone_operation` under `autoscaler.gcp.config`, and
adds compute as its argument.

* Address pr feedback from @ericl

* Expand local file mount paths in NodeUpdater

* Add ssh_user name to key names

* Update updater ssh to attempt 'set -i' and fall back if that fails

* Update gcp/example-full.yaml

* Fix wait crm operation in gcp.config

* Update gcp/example-minimal.yaml to match aws/example-minimal.yaml

* Fix gcp/example-full.yaml comment indentation

* Add gcp/example-full.yaml to setup files

* Update example-full.yaml command

* Revert "Refactor autoscaler tagging to support multiple tag specs"

This reverts commit 9cf48409ca2e5b66f800153853072c706fa502f6.

* Update tag spec to only use characters [0-9a-z_-]

* Change the tag values to conform to the gcp spec

* Add project_id in the ssh key name

* Replace '_' with '-' in autoscaler tag names

* Revert "Update updater ssh to attempt 'set -i' and fall back if that fails"

This reverts commit 23a0066c5254449e49746bd5e43b94b66f32bfb4.

* Revert "Remove 'set -i' from updater ssh command"

This reverts commit 5fa034cdf79fa7f8903691518c0d75699c630172.

* Add fallback to `set -i` in force_interactive command

* Update autoscaler tests to match current implementation

* Update GCPNodeProvider.create_node to include hash in instance name

* Add support for creating multiple instances on one create_node call

* Clean TODOs

* Update styles

* Replace single quotes with double quotes
* Some minor indentation fixes etc.

* Remove unnecessary comment. Fix indentation.

* Yapfify files that fail flake8 test

* Yapfify more files

* Update project_id handling in gcp node provider

* temporary yapf mod

* Revert "temporary yapf mod"

This reverts commit b6744e4e15d4d936d1a14f4bf155ed1d3bb14126.

* Fix autoscaler/updater.py lint error, remove unused variable
2018-05-31 09:00:03 -07:00
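
The GCP node provider commit above notes that the Google discovery API's compute object is not thread safe and has to be recreated for each thread. The commit itself passes the compute object in as an argument; the sketch below shows a different but related technique, a per-thread cache built on `threading.local`. `PerThread` and `factory` are hypothetical names, and no Google client call is made here.

```python
import threading


class PerThread:
    """Builds one instance of a non-thread-safe client per calling thread."""

    def __init__(self, factory):
        self._factory = factory  # hypothetical zero-argument constructor
        self._local = threading.local()

    def get(self):
        # Each thread sees its own attribute namespace on self._local,
        # so a client created in one thread is never handed to another.
        client = getattr(self._local, "client", None)
        if client is None:
            client = self._factory()
            self._local.client = client
        return client
```

Under this assumption, each worker thread would call `get()` and receive a client it alone uses, at the cost of constructing one client per thread.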
Stephanie Wang
117107cb15 [xray] Evict tasks from the lineage cache (#2152) 2018-05-31 00:24:39 -07:00
Philipp Moritz
12de668ccb [ASV] Add ray.init and simple Ray benchmarks (#2166) 2018-05-31 00:06:17 -07:00
Robert Nishihara
c85bb8fb4e Re-encrypt key for uploading to S3 from travis to use travis-ci.com. (#2169) 2018-05-31 00:05:03 -07:00
Alok Singh
fd234e3171 [rllib] Fix A3C PyTorch implementation (#2036)
* Use F.softmax instead of a pointless network layer

Stateless functions should not be network layers.

* Use correct pytorch functions

* Rename argument name to out_size

Matches in_size and makes more sense.

* Fix shapes of tensors

Advantages and rewards both should be scalars, and therefore a list of them
should be 1D.

* Fmt

* replace deprecated function

* rm unnecessary Variable wrapper

* rm all use of torch Variables

Torch does this for us now.

* Ensure that values are flat list

* Fix shape error in conv nets

* fmt

* Fix shape errors

Reshaping the action before stepping in the env fixes a few errors.

* Add TODO

* Use correct filter size

Works when `self.config['model']['channel_major'] = True`.

* Add missing channel major

* Revert reshape of action

This should be handled by the agent or at least in a cleaner way that doesn't
break existing envs.

* Squeeze action

* Squeeze actions along first dimension

This should deal with some cases such as cartpole where actions are scalars
while leaving alone cases where actions are arrays (some robotics tasks).

* try adding pytorch tests

* typo

* fixup docker messages

* Fix A3C for some envs

Pendulum doesn't work since it's an edge case (expects singleton arrays, which
`.squeeze()` collapses to scalars).

* fmt

* nit flake

* small lint
2018-05-30 10:48:11 -07:00
Hao Chen
ac1e5a7d15 [JavaWorker] Do not kill local-scheduler-forked workers in RunManager.cleanup (#2151)
Local-scheduler-forked workers will be killed by the local scheduler itself,
so they don't need to be killed here. See:
570c3153cd/src/local_scheduler/local_scheduler.cc (L184-L192)

Also, using `ps | grep | kill` might be dangerous, because it
could also kill irrelevant processes, e.g., `vim DefaultWorker.java`.
2018-05-30 00:25:03 -07:00
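
The second point in the commit above (pattern-based `ps | grep | kill` can hit unrelated processes) suggests cleaning up by process handle instead of by name. The sketch below is one way to do that in Python, not the Java worker's `RunManager`: `ChildRegistry` is a hypothetical helper that remembers only the processes it started.

```python
import subprocess
import sys


class ChildRegistry:
    """Remembers the processes this program started so cleanup touches only those."""

    def __init__(self):
        self._children = []

    def spawn(self, args):
        proc = subprocess.Popen(args)
        self._children.append(proc)
        return proc

    def cleanup(self, timeout_s=5.0):
        # Terminate by handle rather than by name pattern, so an unrelated
        # process whose command line happens to match is never signalled.
        for proc in self._children:
            if proc.poll() is None:
                proc.terminate()
        for proc in self._children:
            try:
                proc.wait(timeout=timeout_s)
            except subprocess.TimeoutExpired:
                proc.kill()
                proc.wait()
        stopped = len(self._children)
        self._children.clear()
        return stopped
```

For example, `ChildRegistry().spawn([sys.executable, "-c", "import time; time.sleep(60)"])` followed by `cleanup()` stops only that child, whatever else on the machine has a similar command line.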
Robert Nishihara
aa34509bc7 Update Travis CI badge from travis-ci.org to travis-ci.com. (#2155) 2018-05-29 16:44:02 -07:00
Robert Nishihara
6172f94c04 Implement Python global state API for xray. (#2125)
* Implement global state API for xray.

* Fix object table.

* Fixes for log structure.

* Implement cluster_resources.

* Add driver task to task table.

* Remove python flatbuffers code

* Get some global state API tests running.

* Python linting.

* Fix linting.

* Fix mock modules for doc

* Copy over flatbuffer bindings.

* Fix for tests.

* Linting

* Fix monitor crash.
2018-05-29 16:25:54 -07:00
Stephanie Wang
166000b089 [xray] Improve flush algorithm for the lineage cache (#2130)
* Private method to flush a single task from the lineage cache

* Track parent->child relationships for faster flushing

* doc

* Only flush the newly ready task

* Flush() returns void

* x
2018-05-28 21:03:15 -07:00
Eric Liang
bc2a83e698 Fix support for actor classmethods (#2146) 2018-05-28 17:43:23 -07:00
Peter Veerman
eb1d7ac4bc Add empty df test (#1879) 2018-05-27 09:25:50 -07:00
Yujie Liu
a8d3c057c1 [JavaWorker] Enable java worker support (#2094)
* Enable java worker support
--------------------------
This commit includes a tailored version of the Java worker implementation from Ant Financial.
The changes for the build system, python module, src module and arrow are in other commits; this commit consists of the following modules:
 - java/api: Ray API definition
 - java/common: utilities
 - java/hook: binary rewrite of the Java byte-code for remote execution
 - java/runtime-common: common implementation of the runtime in worker
 - java/runtime-dev: a pure-java mock implementation of the runtime for fast development
 - java/runtime-native: a native implementation of the runtime
 - java/test: various tests

Contributors for this work:
 Guyang Song, Peng Cao, Senlin Zhu, Xiaoying Chu, Yiming Yu, Yujie Liu, Zhenyu Guo

* change the format of java help document from markdown to RST

* update the version of Arrow for java worker

* adapt to the new version of the plasma java client from arrow, which uses byte[] instead of a custom type

* add java worker test to ci

* add the example module for better usage guide
2018-05-26 14:38:50 -07:00
Devin Petersohn
74cca3b284 [DataFrame] Fixing the code formatting of the tests (#2123)
* Fixing the code formatting of the tests

* Fixing tests and removing from_pandas

* Addressing comment

* Addressing comments

* Fix lint
2018-05-26 11:24:01 -07:00