Commit graph

4419 commits

Author SHA1 Message Date
Edward Oakes
9f751ff8c4
Add ability to specify worker and driver ports (#7833) 2020-04-16 13:49:25 -05:00
Richard Liaw
d5f517b2f5
[docs] Hotfix for missing css files. (#8051) 2020-04-16 11:44:55 -07:00
Richard Liaw
4d8bf5635d
[hotfix] Lint formatting for new Tune optimizer ZOOpt (#8040)
* formatting

* removedill

* lint
2020-04-16 09:24:30 -07:00
Clark Zinzow
d4cae5f632
[Core] Added ability to specify different IP addresses for a core worker and its raylet. (#7985) 2020-04-16 10:32:24 -05:00
Sven Mika
d0fab84e4d
[RLlib] DDPG PyTorch version. (#7953)
The DDPG/TD3 algorithms currently do not have a PyTorch implementation. This PR adds PyTorch support for DDPG/TD3 to RLlib.
This PR:
- Depends on the re-factor PR for DDPG (Functional Algorithm API).
- Adds learning regression tests for the PyTorch version of DDPG and a DDPG (torch)
- Updates the documentation to reflect that DDPG and TD3 now support PyTorch.

* Learning Pendulum-v0 on torch version (same config as tf). Wall time a little slower (~20% than tf).
* Fix GPU target model problem.
2020-04-16 10:20:01 +02:00
Xianyang Liu
e1d3f7eba6
[rllib]Add config for rllib to support set python environments (#8026)
* support set extra python environments

* wrap value with str

* Apply suggestions from code review

Co-Authored-By: Eric Liang <ekhliang@gmail.com>

* addresses comments

* fix lint errors

* remove unrelated changes due to format.sh

* remove unrelated changes due to format.sh

Co-authored-by: Eric Liang <ekhliang@gmail.com>
2020-04-16 01:13:45 -07:00
wanxing
9345d03ffb
[Streaming] Streaming data transfer supports cross language. (#7961)
* add init parameters for java

* fix bug

* cython

* fix compile

* fix test_direct_tranfer

* comment

* ChannelCreationParameter

* fix comment

* builder

* lint and fix tests

* fix single process test

* fix checkstyle and lint

* checkstyle

* lint python

Co-authored-by: wanxing <wanxing@B-458DMD6M-1753.local>
2020-04-16 15:16:48 +08:00
fangfengbin
5a7882bb44
Fix gcs_server get invalid local address (#7842) 2020-04-16 14:58:19 +08:00
JianZhangYang
7b0518b993
[streaming] Async changes for resourcemanager part (#7955) 2020-04-16 14:15:45 +08:00
Servon
5c274fe631
[Tune] Add ZOOpt search algorithm (#7960)
* add zoopt

* add zoopt search algo

* add zoopt

* fix zoopt

* add zoopt requirements

* fix zoopt

* remove generated guides

* Apply suggestions from code review

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-04-15 21:13:29 -07:00
mehrdadn
956ea7c944
Hotfix CI determine_tests_to_run (#8039) 2020-04-15 17:00:38 -07:00
Simon Mo
7455610d5a
Serve Doc: Quickstart (#7940) 2020-04-15 12:25:37 -07:00
mehrdadn
ba00c29b67
Factor out Travis 'install' sections for use with GitHub Actions (#7988) 2020-04-15 08:10:22 -07:00
Sven Mika
428516056a
[RLlib] SAC Torch (incl. Atari learning) (#7984)
* Policy-classes cleanup and torch/tf unification.
- Make Policy abstract.
- Add `action_dist` to call to `extra_action_out_fn` (necessary for PPO torch).
- Move some methods and vars to base Policy
  (from TFPolicy): num_state_tensors, ACTION_PROB, ACTION_LOGP and some more.

* Fix `clip_action` import from Policy (should probably be moved into utils altogether).

* - Move `is_recurrent()` and `num_state_tensors()` into TFPolicy (from DynamicTFPolicy).
- Add config to all Policy c'tor calls (as 3rd arg after obs and action spaces).

* Add `config` to c'tor call to TFPolicy.

* Add missing `config` to c'tor call to TFPolicy in marvil_policy.py.

* Fix test_rollout_worker.py::MockPolicy and BadPolicy classes (Policy base class is now abstract).

* Fix LINT errors in Policy classes.

* Implement StatefulPolicy abstract methods in test cases: test_multi_agent_env.py.

* policy.py LINT errors.

* Create a simple TestPolicy to sub-class from when testing Policies (reduces code in some test cases).

* policy.py
- Remove abstractmethod from `apply_gradients` and `compute_gradients` (these are not required iff `learn_on_batch` implemented).
- Fix docstring of `num_state_tensors`.

* Make QMIX torch Policy a child of TorchPolicy (instead of Policy).

* QMixPolicy add empty implementations of abstract Policy methods.

* Store Policy's config in self.config in base Policy c'tor.

* - Make only compute_actions in base Policy's an abstractmethod and provide pass
implementation to all other methods if not defined.
- Fix state_batches=None (most Policies don't have internal states).

* Cartpole tf learning.

* Cartpole tf AND torch learning (in ~ same ts).

* Cartpole tf AND torch learning (in ~ same ts). 2

* Cartpole tf (torch syntax-broken) learning (in ~ same ts). 3

* Cartpole tf AND torch learning (in ~ same ts). 4

* Cartpole tf AND torch learning (in ~ same ts). 5

* Cartpole tf AND torch learning (in ~ same ts). 6

* Cartpole tf AND torch learning (in ~ same ts). Pendulum tf learning.

* WIP.

* WIP.

* SAC torch learning Pendulum.

* WIP.

* SAC torch and tf learning Pendulum and Cartpole after cleanup.

* WIP.

* LINT.

* LINT.

* SAC: Move policy.target_model to policy.device as well.

* Fixes and cleanup.

* Fix data-format of tf keras Conv2d layers (broken for some tf-versions which have data_format="channels_first" as default).

* Fixes and LINT.

* Fixes and LINT.

* Fix and LINT.

* WIP.

* Test fixes and LINT.

* Fixes and LINT.

Co-authored-by: Sven Mika <sven@Svens-MacBook-Pro.local>
2020-04-15 13:25:16 +02:00
fangfengbin
efbaf155b2
[GCS]Add publish and subscribe function of gcs table (#7909) 2020-04-15 04:24:52 -07:00
Qing Wang
dfb0ad0d3e
[Java] Fix Java CI exit code issue (#8028) 2020-04-15 15:28:52 +08:00
Jan Blumenkamp
8e439688fc
Torch sequence_mask now works for tensors on different devices (#7980) 2020-04-15 07:21:51 +02:00
fangfengbin
c17404918c
[GCS]Add gcs table storage interface (#7949) 2020-04-15 10:48:12 +08:00
Philipp Moritz
b4656ca244
Fix dashboard profiling (#8013) 2020-04-14 08:30:16 -07:00
fangfengbin
026abb119c
fix GrpcServer out-of-bounds bug (#7995)
Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-04-14 10:34:29 +08:00
Robert Nishihara
d985d7537e
Replace all instances of ray.readthedocs.io with ray.io (#7994) 2020-04-13 16:17:05 -07:00
Richard Liaw
e97adba6ac
[autoscaler] Improve argument handling for submit (#7986)
* docs

* Apply suggestions from code review

Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com>

* ok

Co-authored-by: Kristian Hartikainen <kristian.hartikainen@gmail.com>
2020-04-13 15:53:42 -07:00
Richard Liaw
e68d601ec7
[docs] Add link master <-> latest via sphinx version warnings (#8010) 2020-04-13 15:21:08 -07:00
ZhuSenlin
4a81793ba5
GCS-Based actor management implementation (#6763)
* add gcs actor manager

* fix test_metrics.py

* fix TestTaskInfo

* fix comment

* fix comment

* fix comment

* fix comment

* fix comment

* fix comment

* fix compile error

* fix merge error

Co-authored-by: senlin.zsl <senlin.zsl@antfin.com>
2020-04-13 09:48:48 -07:00
mehrdadn
1b0f6fd558
Check AF_UNIX path length (#7951) 2020-04-13 09:30:01 -07:00
micafan
c222d64ca1
[GCS] Add MessagePublisher to GCS (#7771) 2020-04-13 19:32:28 +08:00
mehrdadn
7c52359b00
Fix Windows build (#7987)
Co-authored-by: Mehrdad <noreply@github.com>
2020-04-12 13:29:48 -07:00
Edward Oakes
2cb9cfb2b6
[serve] Make workers fault tolerant (#7970) 2020-04-12 11:48:08 -05:00
Qing Wang
98bfcd53bc
[Java] Rename group id and package name. (#7864)
* Initial

* Change streaming's

* Fix

* Fix

* Fix org_ray

* Fix cpp file name

* Fix streaming

* Fix

* Fix

* Fix testlistening

* Fix missing sth in python

* Fix

* Fix

* Fix SPI

* Fix

* Fix complation

* Fix

* Fix CI

* Fix checkstyle

Fix checkstyle

* Fix streaming tests

* Fix streaming CI

* Fix streaming checkstyle.

* Fix build

* Fix bazel dep

* Fix

* Fix ray checkstyle

* Fix streaming checkstyle

* Fix bazel checkstyle
2020-04-12 17:59:34 +08:00
mehrdadn
3061067039
Fix bug in java/test.sh (#7952)
Co-authored-by: Mehrdad <noreply@github.com>
2020-04-11 23:56:14 -07:00
mehrdadn
07002825aa
Proper command-line parsing (#7603)
* Command-line parsing functions

* Work around bug in MSVCRT for passing command-lines to programs

* Polishing

* Fix std::regex_replace() overload compatibility issue with GCC 4.8.x

* Try to work around linker error

* Implement ScanToken()

* Parse command-lines via ScanToken

* Merge src/ray/util.cc and src/ray/url.cc

Co-authored-by: Mehrdad <noreply@github.com>
2020-04-11 23:07:07 -07:00
Stephanie Wang
d7eef808b8
[core] Reconstruction for lost plasma objects (#7733)
* Add a lineage_ref_count to References

* Refactor TaskManager to store TaskEntry as a struct

* Refactor to fix deadlock between TaskManager and ReferenceCounter
Add references to task specs

* Pin TaskEntries and References in the lineage of any ObjectIDs in scope

* Fix deadlock, convert num_plasma_returns to a set of object IDs

* fix unit tests

* Feature flag

* Do not release lineage for objects that were promoted to plasma

* fix build

* fix build

* Remove num executions

* Remove num executions

* Add pinned locations to ReferenceCounter, empty handler for node death

* Fix num returns for actor tasks, fix Put return value

* Add regression test

* Clear pinned locations and callbacks on node removal

* Clear pinned locations and callbacks on node removal

* Simplify num return values

* Remove unused

* doc

* tmp

* Set num returns

* Move lineage pinning flag to ReferenceCounter

* comments

* Recover from plasma failures by pinning a new copy

* Basic object reconstruction, no concurrent reqs yet

* reconstruction test suite and a few fixes:
- fix for disabling lineage
- fix for updating submitted task refs

* Handle concurrent attempts to recover the same object

* Fix deadlock in DrainAndShutdown

* Revert "[core] Revert lineage pinning (#7499) (#7692)"

This reverts commit ba86a02b37.

* debug rllib

* debug rllib

* turn on all rllib tests again

* debug rllib

* Fix drain bug, check number of pending tasks

* revert rllib debug

* remove todo

* Trigger rllib tests

* revert rllib debug commit

* Split out logic into ObjectRecoveryManager

* Fix python tests

* Refactor to remove dependency on gcs client

* Unit tests

* Move pinned at node ID to direct memory store

* Unit test fixes and lint

* simplify and more tests

* Add ResubmitTask test for TaskManager

* Doc

* fix build

* comments

* Fix

* debug

* Update

* fix

* Fix

* Fix bad status handling, unit test

* Fix build
2020-04-11 16:52:57 -07:00
Stephanie Wang
18e9a076e5
[core] Cancel worker lease requests that are no longer needed (#7929)
* regression test

* Cancel lease requests

* unit tests

* update

* fix build

* Move unit test

* Set success

* Ref to shared_ptr

* debug

* Revert "debug"

This reverts commit 6b2c25805a8223b41ffcc2d88d903e16ea415089.

* Bad move

* Fix bad status handling
2020-04-11 16:51:32 -07:00
Richard Liaw
87e3c39b48
[tune] Ensure Cleanup (#7967) 2020-04-11 16:28:03 -07:00
Richard Liaw
dd63178e91
[sgd] Semantic Segmentation Example (#7825)
* better_example

* test

* improve some usability things

* submit

* fix

* making a segmentation example

* segmentation_example

* segmentation

* device

* flake

* Update python/ray/util/sgd/torch/training_operator.py

* uti

* finished_example

* block

* format

* locationg

* fix

* ok

* revert

* segmentation

* lint_and_test

* address_comments
2020-04-10 20:35:45 -07:00
mehrdadn
0b4e09da76
Log to terminal if glog is also doing so (#7868) 2020-04-10 18:41:21 -05:00
aannadi
9e31ee991a
[Dashboard] Configure Subset of Parameters/Metrics and show Err… (#7726)
* Subset and Errors

* fixup! Subset and Errors

* fixup! Subset and Errors

* fixup! Subset and Errors

* fixup! Subset and Errors

* fixup! Subset and Errors

* fixup! Subset and Errors

* fixup! Subset and Errors
2020-04-10 13:27:52 -07:00
mehrdadn
4aa68b82fa
[CI] Various Improvements to Travis Scripts (#7956)
* Delete LINT section of install-ray.sh since it appears unused

* Delete install.sh since it appears unused

* Delete run_test.sh since it appears unused

* Put environment variables on separate lines in .travis.yml

* Move --jobs 50 out of install-ray.sh

* Delete upgrade-syn.sh since it appears unused

* Move CI bazel flags to .bazelrc via --config

* Make installations quieter

* Get rid of verbose Maven messages

* Install Bazel system-wide for CI so that there's no need to update PATH

* Recognize Windows as valid platform

Co-authored-by: Mehrdad <noreply@github.com>
2020-04-10 13:26:28 -07:00
Edward Oakes
7be7af11ab
[serve] Push requests to workers instead of polling via dequeue_request (#7965) 2020-04-10 14:47:03 -05:00
Edward Oakes
d8f5b52265
[serve] Don't use mixin class for class-based backends (#7957) 2020-04-10 12:01:14 -05:00
Eric Liang
31b40b00f6
[rllib] Pull out experimental dsl into rllib.execution module, add initial unit tests (#7958) 2020-04-10 00:56:08 -07:00
Lingxuan Zuo
0d713e3eba
[Streaming] Try to trigger mock transfer tests ci (#7885)
* try to trigger mock transfer tests ci

* execute transfer tests

* show all logs when bazel test streaming

* temporary repeated ci runs

* Revert "temporary repeated ci runs"

This reverts commit dc77d2f9f79b5fa7b490221a8e9089e6349e067d.
2020-04-10 11:56:59 +08:00
marload
e3ffb8ac28
[tune] Refactoring: Deduplicate (#7918)
* refactoring: Deduplication

* refactoring: Deduplication

* refactoring: Deduplication

* refactoring: Deduplication

* lint fix: Variable naming case

* fix: Remove White Space

* fix_lint

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-04-09 20:19:04 -07:00
Edward Oakes
305eb74a86
[serve] Make HTTP proxy fault tolerant (#7936) 2020-04-09 17:07:22 -05:00
Sven Mika
0a5b6d1f57
[Testing] Do not run any non-RLlib/core tests if only RLLib affected (except wheels). (#7892)
* Do not run any non-RLlib/core tests if only RLLib affected, except for generating the 2 wheels (OSX and Linux).

* Test noop RLlib change.

* Test noop RLlib change.

* Fix broken RLlib tests in master.

* Split BAZEL learning tests into cartpole and pendulum (reached the 60min barrier).

* Fix error_outputs option in BAZEL for RLlib regression tests.

* Fix.

* Test.

* WIP.

* Add env flag RAY_CI_ONLY_RLLIB_AFFECTED to refrain from testing most ray-core stuff (except wheels) if only RLlib changed.

* Test RLlib-only change.
2020-04-09 14:36:06 -07:00
Sven Mika
1b31c11806
[RLlib] DDPG re-factor to fit into RLlib's functional algorithm builder API. (#7934) 2020-04-09 14:04:21 -07:00
Simon Mo
870271d51f
[Serve] Call serve.init in function handler (#7947) 2020-04-09 11:46:15 -07:00
Sven Mika
d2b5c171cb
[RLlib] Add pytorch sigils to toc and add links to algo overview table. (#7950)
* Add torch sigils to toc-tree for DQN/APEX.

* WIP.
2020-04-09 10:40:18 -07:00
Simon Mo
59867dad75
Move Jenkins test to Github action (#7342) 2020-04-09 10:27:19 -07:00
fangfengbin
061043229f
[GCS]Optimize gcs client testcases (#7895) 2020-04-09 12:30:58 +08:00