Commit graph

4453 commits

Author SHA1 Message Date
Jan Blumenkamp
8e439688fc
Torch sequence_mask now works for tensors on different devices (#7980) 2020-04-15 07:21:51 +02:00
fangfengbin
c17404918c
[GCS]Add gcs table storage interface (#7949) 2020-04-15 10:48:12 +08:00
Philipp Moritz
b4656ca244
Fix dashboard profiling (#8013) 2020-04-14 08:30:16 -07:00
fangfengbin
026abb119c
fix GrpcServer out-of-bounds bug (#7995)
Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-04-14 10:34:29 +08:00
Robert Nishihara
d985d7537e
Replace all instances of ray.readthedocs.io with ray.io (#7994) 2020-04-13 16:17:05 -07:00
Richard Liaw
e97adba6ac
[autoscaler] Improve argument handling for submit (#7986)
* docs

* Apply suggestions from code review

Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com>

* ok

Co-authored-by: Kristian Hartikainen <kristian.hartikainen@gmail.com>
2020-04-13 15:53:42 -07:00
Richard Liaw
e68d601ec7
[docs] Add link master <-> latest via sphinx version warnings (#8010) 2020-04-13 15:21:08 -07:00
ZhuSenlin
4a81793ba5
GCS-Based actor management implementation (#6763)
* add gcs actor manager

* fix test_metrics.py

* fix TestTaskInfo

* fix comment

* fix comment

* fix comment

* fix comment

* fix comment

* fix comment

* fix compile error

* fix merge error

Co-authored-by: senlin.zsl <senlin.zsl@antfin.com>
2020-04-13 09:48:48 -07:00
mehrdadn
1b0f6fd558
Check AF_UNIX path length (#7951) 2020-04-13 09:30:01 -07:00
micafan
c222d64ca1
[GCS] Add MessagePublisher to GCS (#7771) 2020-04-13 19:32:28 +08:00
mehrdadn
7c52359b00
Fix Windows build (#7987)
Co-authored-by: Mehrdad <noreply@github.com>
2020-04-12 13:29:48 -07:00
Edward Oakes
2cb9cfb2b6
[serve] Make workers fault tolerant (#7970) 2020-04-12 11:48:08 -05:00
Qing Wang
98bfcd53bc
[Java] Rename group id and package name. (#7864)
* Initial

* Change streaming's

* Fix

* Fix

* Fix org_ray

* Fix cpp file name

* Fix streaming

* Fix

* Fix

* Fix testlistening

* Fix missing sth in python

* Fix

* Fix

* Fix SPI

* Fix

* Fix compilation

* Fix

* Fix CI

* Fix checkstyle

Fix checkstyle

* Fix streaming tests

* Fix streaming CI

* Fix streaming checkstyle.

* Fix build

* Fix bazel dep

* Fix

* Fix ray checkstyle

* Fix streaming checkstyle

* Fix bazel checkstyle
2020-04-12 17:59:34 +08:00
mehrdadn
3061067039
Fix bug in java/test.sh (#7952)
Co-authored-by: Mehrdad <noreply@github.com>
2020-04-11 23:56:14 -07:00
mehrdadn
07002825aa
Proper command-line parsing (#7603)
* Command-line parsing functions

* Work around bug in MSVCRT for passing command-lines to programs

* Polishing

* Fix std::regex_replace() overload compatibility issue with GCC 4.8.x

* Try to work around linker error

* Implement ScanToken()

* Parse command-lines via ScanToken

* Merge src/ray/util.cc and src/ray/url.cc

Co-authored-by: Mehrdad <noreply@github.com>
2020-04-11 23:07:07 -07:00
Stephanie Wang
d7eef808b8
[core] Reconstruction for lost plasma objects (#7733)
* Add a lineage_ref_count to References

* Refactor TaskManager to store TaskEntry as a struct

* Refactor to fix deadlock between TaskManager and ReferenceCounter
Add references to task specs

* Pin TaskEntries and References in the lineage of any ObjectIDs in scope

* Fix deadlock, convert num_plasma_returns to a set of object IDs

* fix unit tests

* Feature flag

* Do not release lineage for objects that were promoted to plasma

* fix build

* fix build

* Remove num executions

* Remove num executions

* Add pinned locations to ReferenceCounter, empty handler for node death

* Fix num returns for actor tasks, fix Put return value

* Add regression test

* Clear pinned locations and callbacks on node removal

* Clear pinned locations and callbacks on node removal

* Simplify num return values

* Remove unused

* doc

* tmp

* Set num returns

* Move lineage pinning flag to ReferenceCounter

* comments

* Recover from plasma failures by pinning a new copy

* Basic object reconstruction, no concurrent reqs yet

* reconstruction test suite and a few fixes:
- fix for disabling lineage
- fix for updating submitted task refs

* Handle concurrent attempts to recover the same object

* Fix deadlock in DrainAndShutdown

* Revert "[core] Revert lineage pinning (#7499) (#7692)"

This reverts commit ba86a02b37.

* debug rllib

* debug rllib

* turn on all rllib tests again

* debug rllib

* Fix drain bug, check number of pending tasks

* revert rllib debug

* remove todo

* Trigger rllib tests

* revert rllib debug commit

* Split out logic into ObjectRecoveryManager

* Fix python tests

* Refactor to remove dependency on gcs client

* Unit tests

* Move pinned at node ID to direct memory store

* Unit test fixes and lint

* simplify and more tests

* Add ResubmitTask test for TaskManager

* Doc

* fix build

* comments

* Fix

* debug

* Update

* fix

* Fix

* Fix bad status handling, unit test

* Fix build
2020-04-11 16:52:57 -07:00
Stephanie Wang
18e9a076e5
[core] Cancel worker lease requests that are no longer needed (#7929)
* regression test

* Cancel lease requests

* unit tests

* update

* fix build

* Move unit test

* Set success

* Ref to shared_ptr

* debug

* Revert "debug"

This reverts commit 6b2c25805a8223b41ffcc2d88d903e16ea415089.

* Bad move

* Fix bad status handling
2020-04-11 16:51:32 -07:00
Richard Liaw
87e3c39b48
[tune] Ensure Cleanup (#7967) 2020-04-11 16:28:03 -07:00
Richard Liaw
dd63178e91
[sgd] Semantic Segmentation Example (#7825)
* better_example

* test

* improve some usability things

* submit

* fix

* making a segmentation example

* segmentation_example

* segmentation

* device

* flake

* Update python/ray/util/sgd/torch/training_operator.py

* uti

* finished_example

* block

* format

* locationg

* fix

* ok

* revert

* segmentation

* lint_and_test

* address_comments
2020-04-10 20:35:45 -07:00
mehrdadn
0b4e09da76
Log to terminal if glog is also doing so (#7868) 2020-04-10 18:41:21 -05:00
aannadi
9e31ee991a
[Dashboard] Configure Subset of Parameters/Metrics and show Err… (#7726)
* Subset and Errors

* fixup! Subset and Errors

* fixup! Subset and Errors

* fixup! Subset and Errors

* fixup! Subset and Errors

* fixup! Subset and Errors

* fixup! Subset and Errors

* fixup! Subset and Errors
2020-04-10 13:27:52 -07:00
mehrdadn
4aa68b82fa
[CI] Various Improvements to Travis Scripts (#7956)
* Delete LINT section of install-ray.sh since it appears unused

* Delete install.sh since it appears unused

* Delete run_test.sh since it appears unused

* Put environment variables on separate lines in .travis.yml

* Move --jobs 50 out of install-ray.sh

* Delete upgrade-syn.sh since it appears unused

* Move CI bazel flags to .bazelrc via --config

* Make installations quieter

* Get rid of verbose Maven messages

* Install Bazel system-wide for CI so that there's no need to update PATH

* Recognize Windows as valid platform

Co-authored-by: Mehrdad <noreply@github.com>
2020-04-10 13:26:28 -07:00
Edward Oakes
7be7af11ab
[serve] Push requests to workers instead of polling via dequeue_request (#7965) 2020-04-10 14:47:03 -05:00
Edward Oakes
d8f5b52265
[serve] Don't use mixin class for class-based backends (#7957) 2020-04-10 12:01:14 -05:00
Eric Liang
31b40b00f6
[rllib] Pull out experimental dsl into rllib.execution module, add initial unit tests (#7958) 2020-04-10 00:56:08 -07:00
Lingxuan Zuo
0d713e3eba
[Streaming] Try to trigger mock transfer tests ci (#7885)
* try to trigger mock transfer tests ci

* execute transfer tests

* show all logs when bazel test streaming

* temporary repeated ci runs

* Revert "temporary repeated ci runs"

This reverts commit dc77d2f9f79b5fa7b490221a8e9089e6349e067d.
2020-04-10 11:56:59 +08:00
marload
e3ffb8ac28
[tune] Refactoring: Deduplicate (#7918)
* refactoring: Deduplication

* refactoring: Deduplication

* refactoring: Deduplication

* refactoring: Deduplication

* lint fix: Variable naming case

* fix: Remove White Space

* fix_lint

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-04-09 20:19:04 -07:00
Edward Oakes
305eb74a86
[serve] Make HTTP proxy fault tolerant (#7936) 2020-04-09 17:07:22 -05:00
Sven Mika
0a5b6d1f57
[Testing] Do not run any non-RLlib/core tests if only RLLib affected (except wheels). (#7892)
* Do not run any non-RLlib/core tests if only RLLib affected, except for generating the 2 wheels (OSX and Linux).

* Test noop RLlib change.

* Test noop RLlib change.

* Fix broken RLlib tests in master.

* Split BAZEL learning tests into cartpole and pendulum (reached the 60min barrier).

* Fix error_outputs option in BAZEL for RLlib regression tests.

* Fix.

* Test.

* WIP.

* Add env flag RAY_CI_ONLY_RLLIB_AFFECTED to refrain from testing most ray-core stuff (except wheels) if only RLlib changed.

* Test RLlib-only change.
2020-04-09 14:36:06 -07:00
Sven Mika
1b31c11806
[RLlib] DDPG re-factor to fit into RLlib's functional algorithm builder API. (#7934) 2020-04-09 14:04:21 -07:00
Simon Mo
870271d51f
[Serve] Call serve.init in function handler (#7947) 2020-04-09 11:46:15 -07:00
Sven Mika
d2b5c171cb
[RLlib] Add pytorch sigils to toc and add links to algo overview table. (#7950)
* Add torch sigils to toc-tree for DQN/APEX.

* WIP.
2020-04-09 10:40:18 -07:00
Simon Mo
59867dad75
Move Jenkins test to Github action (#7342) 2020-04-09 10:27:19 -07:00
fangfengbin
061043229f
[GCS]Optimize gcs client testcases (#7895) 2020-04-09 12:30:58 +08:00
Tianyi Chen
c5bf9cc472
[streaming] Sync changes for graph part. (#7827) 2020-04-09 12:30:44 +08:00
David Chan
6521e92a95
[RaySGD] Honor the use_gpu flag (#7942) 2020-04-08 20:20:09 -07:00
ijrsvt
44825d81e9
Change Proctitle to IDLE after an Error (#7863) 2020-04-08 11:33:43 -07:00
acxz
8f94f9c372
[arch linux] add package installation instructions (#7898) 2020-04-08 11:13:42 -07:00
fyrestone
fc6259a656
Cross language serialization for primitive types (#7711)
* Cross language serialization for Java and Python

* Use strict types when Python serializing

* Handle recursive objects in Python; Pin msgpack >= 0.6.0, < 1.0.0

* Disable gc for optimizing msgpack loads

* Fix merge bug

* Java call Python use returnType; Fix ClassLoaderTest

* Fix RayMethodsTest

* Fix checkstyle

* Fix lint

* prepare_args raises exception if trying to transfer a non-deserializable object to another language

* Fix CrossLanguageInvocationTest.java, Python msgpack treat float as double

* Minor fixes

* Fix compile error on linux

* Fix lint in java/BUILD.bazel

* Fix test_failure

* Fix lint

* Class<?> to Class<T>; Refine metadata bytes.

* Rename FST to Fst; sort java dependencies

* Change Class<?>[] to Optional<Class<?>>; sort requirements in setup.py

* Improve CrossLanguageInvocationTest

* Refactor MessagePackSerializer.java

* Refactor MessagePackSerializer.java; Refine CrossLanguageInvocationTest.java

* Remove unnecessary dependencies for Java; Add getReturnType() for RayFunction in Java

* Fix bug

* Remove custom cross language type support

* Replace Serializer.Meta with MutableBoolean

* Remove @SuppressWarnings support from checkstyle.xml; Add null test in CrossLanguageInvocationTest.java

* Refine MessagePackSerializer.pack

* Ray.get support RayObject as input

* Improve comments and error info

* Remove classLoader argument from serializer

* Separate msgpack from pickle5 in Python

* Pair<byte[], MutableBoolean> to Pair<byte[], Boolean>

* Remove public static <T> T get(RayObject<T> object), use RayObject.get() instead

* Refine test

* small fixes

Co-authored-by: 刘宝 <po.lb@antfin.com>
Co-authored-by: Hao Chen <chenh1024@gmail.com>
2020-04-08 21:10:57 +08:00
Eric Liang
e8c19aba41
[rllib] Add test case that we don't have a hard torch dependency (#7926) 2020-04-07 18:07:39 -07:00
Edward Oakes
85481d635d
[serve] Call serve.init() before initializing backends (#7922) 2020-04-07 17:22:52 -05:00
Edward Oakes
1be87c7fbb
[serve] Remove global state, instead access the master actor directly (#7914)
* Move _scale() to master actor

* move create_backend

* Move set_backend_config

* Move get_backend_config

* Remove backend_table from global_state

* Remove global_state, just access master directly

* Remove accidental addition
2020-04-07 15:21:40 -05:00
Sven Mika
81314143eb
[RLlib] Use framework_iterator (add torch/eager/tf) to PPO and PG tests. (#7915) 2020-04-07 12:40:34 -07:00
Edward Oakes
d3c310f408
[serve] Only access backend_table in master actor (#7913) 2020-04-07 10:12:39 -05:00
Kai Yang
48b48cc8c2
Support multiple core workers in one process (#7623) 2020-04-07 11:01:47 +08:00
micafan
e91595f955
[GCS] Add ObjectLocator to gcs server (#7557) 2020-04-07 10:37:24 +08:00
Sven Mika
c2cb5c2214
[RLlib] MARWIL torch. (#7836)
* WIP.

* WIP.

* LINT.

* Fix MARWIL so it can run with eager-mode.

* LINT.
2020-04-06 16:38:50 -07:00
Ion
9f6cbf168e
New scheduler local node (#7899) 2020-04-06 14:43:42 -05:00
Richard Liaw
a67edc4051
[tune] Improve user guides and API docs (#7716)
* create guide gallery for Tune

* mods

* ok

* fix

* fix_up_gallery

* ok

* Apply suggestions from code review

Co-Authored-By: Sven Mika <sven@anyscale.io>

* Apply suggestions from code review

Co-Authored-By: Sven Mika <sven@anyscale.io>

Co-authored-by: Sven Mika <sven@anyscale.io>
2020-04-06 12:16:35 -07:00
Sven Mika
22ccc43670
[RLlib] DQN torch version. (#7597)
* Fix.

* Rollback.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* Fix.

* Fix.

* Fix.

* Fix.

* Fix.

* WIP.

* WIP.

* Fix.

* Test case fixes.

* Test case fixes and LINT.

* Test case fixes and LINT.

* Rollback.

* WIP.

* WIP.

* Test case fixes.

* Fix.

* Fix.

* Fix.

* Add regression test for DQN w/ param noise.

* Fixes and LINT.

* Fixes and LINT.

* Fixes and LINT.

* Fixes and LINT.

* Fixes and LINT.

* Comment

* Regression test case.

* WIP.

* WIP.

* LINT.

* LINT.

* WIP.

* Fix.

* Fix.

* Fix.

* LINT.

* Fix (SAC does currently not support eager).

* Fix.

* WIP.

* LINT.

* Update rllib/evaluation/sampler.py

Co-Authored-By: Eric Liang <ekhliang@gmail.com>

* Update rllib/evaluation/sampler.py

Co-Authored-By: Eric Liang <ekhliang@gmail.com>

* Update rllib/utils/exploration/exploration.py

Co-Authored-By: Eric Liang <ekhliang@gmail.com>

* Update rllib/utils/exploration/exploration.py

Co-Authored-By: Eric Liang <ekhliang@gmail.com>

* WIP.

* WIP.

* Fix.

* LINT.

* LINT.

* Fix and LINT.

* WIP.

* WIP.

* WIP.

* WIP.

* Fix.

* LINT.

* Fix.

* Fix and LINT.

* Update rllib/utils/exploration/exploration.py

* Update rllib/policy/dynamic_tf_policy.py

Co-Authored-By: Eric Liang <ekhliang@gmail.com>

* Update rllib/policy/dynamic_tf_policy.py

Co-Authored-By: Eric Liang <ekhliang@gmail.com>

* Update rllib/policy/dynamic_tf_policy.py

Co-Authored-By: Eric Liang <ekhliang@gmail.com>

* Fixes.

* WIP.

* LINT.

* Fixes and LINT.

* LINT and fixes.

* LINT.

* Move action_dist back into torch extra_action_out_fn and LINT.

* Working SimpleQ learning cartpole on both torch AND tf.

* Working Rainbow learning cartpole on tf.

* Working Rainbow learning cartpole on tf.

* WIP.

* LINT.

* LINT.

* Update docs and add torch to APEX test.

* LINT.

* Fix.

* LINT.

* Fix.

* Fix.

* Fix and docstrings.

* Fix broken RLlib tests in master.

* Split BAZEL learning tests into cartpole and pendulum (reached the 60min barrier).

* Fix error_outputs option in BAZEL for RLlib regression tests.

* Fix.

* Tune param-noise tests.

* LINT.

* Fix.

* Fix.

* test

* test

* test

* Fix.

* Fix.

* WIP.

* WIP.

* WIP.

* WIP.

* LINT.

* WIP.

Co-authored-by: Eric Liang <ekhliang@gmail.com>
2020-04-06 11:56:16 -07:00