1910 commits

Author SHA1 Message Date
Philipp Moritz
62de86ff7a fix redis module build dependencies (#2247) 2018-06-13 10:18:09 -07:00
Hao Chen
8efd0f7b1b [xray] support multi-workers per process (#2244)
* support multi-workers per process

Signed-off-by: Hao Chen <chenh1024@gmail.com>

* use RayConfig

Signed-off-by: Hao Chen <chenh1024@gmail.com>

* fix

Signed-off-by: Hao Chen <chenh1024@gmail.com>

* fix

* remove clear

* address comments

* fix lint

* fix bug

* make WorkerPool and WorkerPoolMock more consistent
2018-06-13 10:14:05 -07:00
songqing
78a48fa1e0 Fix build error when building Ray for Java later than Python (#2241) 2018-06-12 21:11:30 -07:00
Eric Liang
be178ae031 [autoscaler] GCP docs (#2235) 2018-06-12 12:40:12 -07:00
Eric Liang
7fcaad264a [autoscaler] Translate to/from AWS 'Name' tag (#2219)
* fix tag

* fix
2018-06-11 12:10:10 -07:00
Alok Singh
d47d6a6b7a [rllib] Use correct method name (#2226) 2018-06-11 09:53:31 -07:00
Devin Petersohn
b886ceca47 [DataFrame] Implement __array_wrap__ (#2218)
* Implement __array_wrap__

* Removing unnecessary test
2018-06-11 08:56:43 -07:00
Robert Nishihara
61139e1509 Enable fractional resources and resource IDs for xray. (#2187)
* Implement GPU IDs and fractional resources.

* Add documentation and python exceptions.

* Fix signed/unsigned comparison.

* Fix linting.

* Fixes from rebase.

* Re-enable tests that use ray.wait.

* Don't kill the raylet if an infeasible task is submitted.

* Ignore tests that require better load balancing.

* Linting

* Ignore array test.

* Ignore stress test reconstructions tests.

* Don't kill node manager if remote node manager disconnects.

* Ignore more stress tests.

* Naming changes

* Remove outdated todo

* Small fix

* Re-enable test.

* Linting

* Fix resource bookkeeping for blocked tasks.

* Fix linting

* Fix Java client.

* Ignore test

* Ignore put error tests
2018-06-10 15:31:43 -07:00
Richard Liaw
f19decb848 [docs] Update RLlib install to not include Tensorflow (#2178) 2018-06-10 10:29:12 -07:00
Philipp Moritz
4ec5bea03b [xray] Implement fetch (#2195) 2018-06-09 23:36:27 -07:00
Robert Nishihara
125fe1c09c Print warning when defining very large remote function or actor. (#2179)
* Print warning when defining very large remote function or actor.

* Add weak test.

* Check that warnings appear in test.

* Make wait_for_errors actually fail in failure_test.py.

* Use constants for error types.

* Fix
2018-06-09 19:59:15 -07:00
andrewztan
1475600c81 [rllib] Merge DDPG and DDPG2 implementations (#2202)
* removed ddpg2

* removed ddpg2 from codebase

* added tests used in ddpg vs ddpg2 comparison

* added notes about training timesteps to yaml files

* removed ddpg2 yaml files

* removed unnecessary configs from yaml files

* removed unnecessary configs from yaml files

* moved pendulum, mountaincarcontinuous, and halfcheetah tests to tuned_examples

* moved pendulum, mountaincarcontinuous, and halfcheetah tests to tuned_examples

* added more configuration details to yaml files

* removed random starts from halfcheetah
2018-06-09 16:46:23 -07:00
Yujie Liu
3b5e700fd7 [JavaWorker] Java code lint check and binding to CI (#2225)
* add java code lint check and fix the java code lint error

* add java doc lint check and fix the java doc lint error

* add java code and doc lint to the CI
2018-06-09 16:26:54 -07:00
Robert Nishihara
5789a247f9 [xray] Do not redirect worker output to files by default. (#2220) 2018-06-09 15:00:42 -07:00
Eric Liang
71eb558eb0 [rllib] Refactor rllib to have a common sample collection pathway (#2149) 2018-06-09 00:21:35 -07:00
Stephanie Wang
cb5e6e6d68 Add dependency between copy_ray and python extensions (#2221) 2018-06-08 20:41:54 -07:00
Eric Liang
32b9a4d3f1 Fix yapf excludes, print diff in --all mode (#2211)
* fix

* travis
2018-06-08 02:25:55 -07:00
Eric Liang
8da558f5b7 [autoscaler] Should use internal IP for ssh (#2209) 2018-06-08 01:08:59 -07:00
Eric Liang
31046f7e06 Autoscaler Python 2 queue fix (#2205) 2018-06-07 18:43:07 -07:00
Eric Liang
100d8c207f [xray] [autoscaler] Fix autoscaler / raylet integration (#2143) 2018-06-07 15:43:20 -07:00
Yuhong Guo
0a34bea0b0 Use scoped enums in C++ and flatbuffers. (#2194)
* Enable --scoped-enums in flatbuffer compiler.

* Change enum to c++11 style (enum class).

* Resolve conflicts.

* Solve building failure when RAY_USE_NEW_GCS=on and remove ERROR_INDEX suffix.

* Merge with master and fix CI failure.
2018-06-07 01:01:21 -07:00
Hao Chen
f0907a6ee9 Optimize lineage eviction efficiency (#2196)
* Java in vscode.

* Optimize lineage eviction

* minor fix

* fix ut

* fix comment and lint

* format

* format

* remove unneeded code
2018-06-07 00:35:15 -07:00
Philipp Moritz
343f29801b [xray] Fix compilation on mac (#2199) 2018-06-06 22:33:46 -07:00
Melih Elibol
7246ff80a4 [xray] Implements ray.wait (#2162)
Implements ray.wait for xray. Fixes #1128.
2018-06-06 16:56:44 -07:00
Devin Petersohn
c8c0349511 [DataFrame] Temporarily changing the requirement until our pandas compat is updated (#2197)
* Temporarily changing the requirement until our pandas compat is updated
for 0.23

* Fix lint
2018-06-06 12:01:43 -07:00
Yuhong Guo
5b0df0eca2 Change surefire version to 2.21.0 to fix test failure on Java10. (#2198) 2018-06-06 10:39:20 -07:00
Alok Singh
42a9233e1d Improve yapf speed and document its usage (#2160)
* Allow yapf to lint individual files

* Add tip for using yapf

* Update doc

* Update script to autoformat changed py files

The new default is for the script to only update changed files to encourage
using it as a pre-push hook. Travis still checks all since it's not that big an
increase to runtime.

* Exclude formatting thirdparty/autogen py files

* Symlink .travis -> scripts

Hidden directories may get glossed over otherwise.

* .travis -> scripts in docs

They are symlinks to the same thing, but `scripts` is more dev-friendly, while
`.travis` is really only for Travis CI.

* Document different yapf format functions

Most devs will only need `format_changed`, and this is run by default.
`format_changed` should be fast enough in most cases to work as a pre-commit
hook.

* Speed up yapf by only formatting changed files

* Update docs

1. Mention how yapf can be used as a pre-commit hook
2. rm `bash`, script is executable

* Update yapf.sh

* Update development.rst

* Update yapf.sh

* Use bash arrays for correct argument splitting

Playing fast and loose with whitespace in bash is a terrible idea.

* Only format non-excluded by default

* Check changes against master

Normally, the remote is called `origin`, but naming it explicit

* Adding missing directory to `format_all`

* Cleanup YAPF code

Remove unused function and move code around to make it clearer and so that
adding lines gives cleaner diffs.

* Ensure correct files are autoformatted

* Fix cmd line arg splitting

Each arg has to be in its own set of quotes.

* Diff against mergebase

TIL there's a clean syntax for doing that, but it's too clever to belong in a
shell script.

We use `mapfile -t` to ensure no problems down the line with weird filenames.
2018-06-05 20:22:11 -07:00
Adam Gleave
6ef3b255ea Launch nodes in separate threads (#2183)
Modifies the autoscaler to run launch_new_nodes in a separate thread, keeping track of the number of pending requests.
2018-06-05 20:19:31 -07:00
Richard Liaw
13d4e0db95 Add Docker Support for ASV (#2184)
* added new instructions and script

* initialize ray only once

* use ray-project/asv master
2018-06-05 15:55:35 -07:00
Simon Mo
a139a5df8c [DataFrame] Implement Memoizer (#2157)
* Implement Memoizer

* Add LRUCache

* Add comments
2018-06-05 07:18:12 -07:00
songqing
451cdb43f6 Fix redefinition of flatbuffer types (#2189) 2018-06-05 00:08:05 -07:00
Devin Petersohn
b56c8ed8dc [DataFrame] Fix equals and make it more efficient (#2186)
* Fixing equals

* Adding test fix

* Working on fix for equals and drop

* Fix equals and fix tests to use ray.dataframe.equals

* Addressing comments
2018-06-04 13:10:06 -07:00
Peter Schafhalter
a5d888e49b [DataFrames] More dtypes optimizations (#2124)
* Pass dtypes for some DataFrame constructors

* More optimizations with dtypes_cache

* Optimizations
2018-06-04 10:50:13 -07:00
Binglin Chang
19d6ca0670 Support constructing TensorFlowVariables from multiple tf operations (#2182) 2018-06-02 18:13:52 -07:00
Philipp Moritz
d699bfbf10 Use hashing function that takes into account all UniqueID bytes (#2174) 2018-06-01 23:07:29 -07:00
Philipp Moritz
e1024d84e9 [xray] Start actor workers in parallel (#2168) 2018-06-01 23:04:16 -07:00
Kunal Gosar
317d0da7d8 Add experimental API for ray.get and ray.wait with additional argument types (#2071) 2018-06-01 16:42:27 -07:00
songqing
4dd4698564 unify build dir for Python and Java (#2171)
* unify build dir for Python and Java

* enable executables auto installed when just running 'make'

* fix plasma_store copy error

* fix cmake error about copying executables

* lint fix

* recover python/setup.py

* enable to copy optional file automatically

* a small fix of path

* lint fix

* lint fix

* lint fix

* Add comment.
2018-06-01 16:28:27 -07:00
Yuhong Guo
c1de03acac Add timeout mechanism to Push function instead of retries (#2148)
Use timer instead of retries in Push when objects are not local.
2018-06-01 01:21:05 -07:00
Kristian Hartikainen
74dc14d1fc [autoscaler] GCP node provider (#2061)
* Google Cloud Platform scaffolding

* Add minimal gcp config example

* Add googleapiclient discoveries, update gcp.config constants

* Rename and update gcp.config key pair name function

* Implement gcp.config._configure_project

* Fix the create project get project flow

* Implement gcp.config._configure_iam_role

* Implement service account iam binding

* Implement gcp.config._configure_key_pair

* Implement rsa key pair generation

* Implement gcp.config._configure_subnet

* Save work-in-progress gcp.config._configure_firewall_rules.

These are likely to be not needed at all. Saving them if we happen to
need them later.

* Remove unnecessary firewall configuration

* Update example-minimal.yaml configuration

* Add new wait_for_compute_operation, rename old wait_for_operation

* Temporarily rename autoscaler tags due to gcp incompatibility

* Implement initial gcp.node_provider.nodes

* Still missing filter support

* Implement initial gcp.node_provider.create_node

* Implement another compute wait
  operation (wait_for_compute_zone_operation). TODO: figure out if we
  can remove the function.

* Implement initial gcp.node_provider._node and node status functions

* Implement initial gcp.node_provider.terminate_node

* Implement node tagging and ip getter methods for nodes

* Temporarily rename tags due to gcp incompatibility

* Tiny tweaks for autoscaler.updater

* Remove unused config from gcp node_provider

* Add new example-full example to gcp, update load_gcp_example_config

* Implement label filtering for gcp.node_provider.nodes

* Revert unnecessary change in ssh command

* Revert "Temporarily rename tags due to gcp incompatibility"

This reverts commit e2fe634c5d11d705c0f5d3e76c80c37394bb23fb.

* Revert "Temporarily rename autoscaler tags due to gcp incompatibility"

This reverts commit c938ee435f4b75854a14e78242ad7f1d1ed8ad4b.

* Refactor autoscaler tagging to support multiple tag specs

* Remove missing cryptography imports

* Update quote function import

* Fix threading issue in gcp.config with the compute discovery object

* Add gcs support for log_sync

* Fix the labels/tags naming discrepancy

* Add expanduser to file_mounts hashing

* Fix gcp.node_provider.internal_ip

* Add uuid to node name

* Remove 'set -i' from updater ssh command

* Also add TODO with the context and reason for the change.

* Update ssh key creation in autoscaler.gcp.config

* Fix wait_for_compute_zone_operation's threading issue

Google discovery api's compute object is not thread safe, and thus
needs to be recreated for each thread. This moves the
`wait_for_compute_zone_operation` under `autoscaler.gcp.config`, and
adds compute as its argument.

* Address pr feedback from @ericl

* Expand local file mount paths in NodeUpdater

* Add ssh_user name to key names

* Update updater ssh to attempt 'set -i' and fall back if that fails

* Update gcp/example-full.yaml

* Fix wait crm operation in gcp.config

* Update gcp/example-minimal.yaml to match aws/example-minimal.yaml

* Fix gcp/example-full.yaml comment indentation

* Add gcp/example-full.yaml to setup files

* Update example-full.yaml command

* Revert "Refactor autoscaler tagging to support multiple tag specs"

This reverts commit 9cf48409ca2e5b66f800153853072c706fa502f6.

* Update tag spec to only use characters [0-9a-z_-]

* Change the tag values to conform to the gcp spec

* Add project_id in the ssh key name

* Replace '_' with '-' in autoscaler tag names

* Revert "Update updater ssh to attempt 'set -i' and fall back if that fails"

This reverts commit 23a0066c5254449e49746bd5e43b94b66f32bfb4.

* Revert "Remove 'set -i' from updater ssh command"

This reverts commit 5fa034cdf79fa7f8903691518c0d75699c630172.

* Add fallback to `set -i` in force_interactive command

* Update autoscaler tests to match current implementation

* Update GCPNodeProvider.create_node to include hash in instance name

* Add support for creating multiple instances on one create_node call

* Clean TODOs

* Update styles

* Replace single quotes with double quotes
* Some minor indentation fixes etc.

* Remove unnecessary comment. Fix indentation.

* Yapfify files that fail flake8 test

* Yapfify more files

* Update project_id handling in gcp node provider

* temporary yapf mod

* Revert "temporary yapf mod"

This reverts commit b6744e4e15d4d936d1a14f4bf155ed1d3bb14126.

* Fix autoscaler/updater.py lint error, remove unused variable
2018-05-31 09:00:03 -07:00
Stephanie Wang
117107cb15 [xray] Evict tasks from the lineage cache (#2152) 2018-05-31 00:24:39 -07:00
Philipp Moritz
12de668ccb [ASV] Add ray.init and simple Ray benchmarks (#2166) 2018-05-31 00:06:17 -07:00
Robert Nishihara
c85bb8fb4e Re-encrypt key for uploading to S3 from travis to use travis-ci.com. (#2169) 2018-05-31 00:05:03 -07:00
Alok Singh
fd234e3171 [rllib] Fix A3C PyTorch implementation (#2036)
* Use F.softmax instead of a pointless network layer

Stateless functions should not be network layers.

* Use correct pytorch functions

* Rename argument name to out_size

Matches in_size and makes more sense.

* Fix shapes of tensors

Advantages and rewards both should be scalars, and therefore a list of them
should be 1D.

* Fmt

* replace deprecated function

* rm unnecessary Variable wrapper

* rm all use of torch Variables

Torch does this for us now.

* Ensure that values are flat list

* Fix shape error in conv nets

* fmt

* Fix shape errors

Reshaping the action before stepping in the env fixes a few errors.

* Add TODO

* Use correct filter size

Works when `self.config['model']['channel_major'] = True`.

* Add missing channel major

* Revert reshape of action

This should be handled by the agent or at least in a cleaner way that doesn't
break existing envs.

* Squeeze action

* Squeeze actions along first dimension

This should deal with some cases such as cartpole where actions are scalars
while leaving alone cases where actions are arrays (some robotics tasks).

* try adding pytorch tests

* typo

* fixup docker messages

* Fix A3C for some envs

Pendulum doesn't work since it's an edge case (expects singleton arrays, which
`.squeeze()` collapses to scalars).

* fmt

* nit flake

* small lint
2018-05-30 10:48:11 -07:00
Hao Chen
ac1e5a7d15 [JavaWorker] Do not kill local-scheduler-forked workers in RunManager.cleanup (#2151)
Local-scheduler-forked workers will be killed by the local scheduler itself,
so they don't need to be killed here. See:
570c3153cd/src/local_scheduler/local_scheduler.cc (L184-L192)

Also, using `ps | grep | kill` might be dangerous, because it
could also kill irrelevant processes, e.g., `vim DefaultWorker.java`.
2018-05-30 00:25:03 -07:00
Robert Nishihara
aa34509bc7 Update Travis CI badge from travis-ci.org to travis-ci.com. (#2155) 2018-05-29 16:44:02 -07:00
Robert Nishihara
6172f94c04 Implement Python global state API for xray. (#2125)
* Implement global state API for xray.

* Fix object table.

* Fixes for log structure.

* Implement cluster_resources.

* Add driver task to task table.

* Remove python flatbuffers code

* Get some global state API tests running.

* Python linting.

* Fix linting.

* Fix mock modules for doc

* Copy over flatbuffer bindings.

* Fix for tests.

* Linting

* Fix monitor crash.
2018-05-29 16:25:54 -07:00
Stephanie Wang
166000b089 [xray] Improve flush algorithm for the lineage cache (#2130)
* Private method to flush a single task from the lineage cache

* Track parent->child relationships for faster flushing

* doc

* Only flush the newly ready task

* Flush() returns void

* x
2018-05-28 21:03:15 -07:00
Eric Liang
bc2a83e698 Fix support for actor classmethods (#2146) 2018-05-28 17:43:23 -07:00
Peter Veerman
eb1d7ac4bc Add empty df test (#1879) 2018-05-27 09:25:50 -07:00