mirror of https://github.com/vale981/ray synced 2025-03-22 02:46:40 -04:00
Commit graph

6730 commits

Author SHA1 Message Date
Seung Hyeon, Kim
ee49f4a875
[tune] Fix an example for _Brackets of async hyperband scheduler () 2020-03-18 19:06:32 -07:00
Stephanie Wang
35a4bfc885
[core] Fix leak for subscribing to object dependencies in NodeManager ()
* Fix GetDependencies

* lint
2020-03-18 11:01:29 -07:00
Richard Liaw
ea10cd212c
[tune] add accessible trial_info ()
* add accessible trial_info

* trial name and info

* doc

* fix
gp

* Update doc/source/tune-package-ref.rst

* Apply suggestions from code review

* fix

* trial

* fixtest

* testfix
2020-03-17 23:44:18 -07:00
Eric Liang
745b9d643d
First pass at ray memory command for memory debugging () 2020-03-17 20:45:07 -07:00
Landcold7
e6a045df48
Fix typo in asyncio documentation () 2020-03-17 10:37:37 -05:00
Edward Oakes
c1b0f9ccdf
Add failure tests to test_reference_counting () 2020-03-17 10:30:21 -05:00
Hao Chen
7678418210
[Java] Fix the issue that the cached value in RayObject is serialized () 2020-03-17 22:07:41 +08:00
Kai Yang
6b888b0247
[Java] Make both RayActor and RayPyActor inheriting from BaseActor () 2020-03-17 21:45:56 +08:00
ZhuSenlin
dfa5d9b8e9
bug fix about usage of absl::flat_hash_map::erase and absl::flat_hash_set::erase ()
Co-authored-by: senlin.zsl <senlin.zsl@antfin.com>
2020-03-17 19:39:56 +08:00
fyrestone
7697ea2be2
Java call Python actor method use actor.call () 2020-03-17 14:52:43 +08:00
ZhuSenlin
ffa9df4683
bugfix about test_dynres.py ()
Co-authored-by: senlin.zsl <senlin.zsl@antfin.com>
2020-03-17 13:58:44 +08:00
Simon Mo
ce0885a897
[Serve] UI Improvements () 2020-03-16 22:23:16 -07:00
mehrdadn
a0700e2f86
Change /tmp to platform-specific temporary directory () 2020-03-16 18:10:14 -07:00
Eric Liang
797e6cfc2a
[rllib][tune] fix some nans () 2020-03-16 11:19:58 -07:00
ijrsvt
46953c53b1
Cleanup Plasma Async Callback () 2020-03-16 10:12:44 -07:00
Simon Mo
45ce40e5d4
Disable Travis Disk Cache ()
There are some file sizes and memory issue with bazel disk cache
we will disable the cache and use remote cache exclusively for now
2020-03-16 00:18:01 -07:00
Scott Graham
37e4d29f87
[autoscaler] Adding Azure Support ()
* adding directory and node_provider entry for azure autoscaler

* adding initial cut at azure autoscaler functionality, needs testing and node_provider methods need updating

* adding todos and switching to auth file for service principal authentication

* adding role / scope to service principal

* resolving issues with app credentials

* adding retry for setting service principal role

* typo and adding retry to nic creation

* adding nsg to config, moving nic/public ip to node provider, cleanup node_provider, leaving in NodeProvider stub for testing

* linting

* updating cleanup and fixing bugs

* minor fixes

* first working version :)

* added tag support

* added msi identity intermediate

* enable MSI through user managed identity

* updated schema

* extend yaml schema
remove service principal code
add re-use of managed user identity

* fix rg_id

* fix logging

* replace manual cluster yaml validation with json schema
- improved error message
- support for intellisense in VSCode (or other IDEs)

* run linting

* updating yaml configs and formatting

* updating yaml configs and formatting

* typo in example config

* pulling default config from example-full

* resetting min, init worker prop

* adding docs for azure autoscaler and fixing status

* add azure to docs, fix config for spot instances, update azure provider to avoid caching issues during deployment

* fix for default subscription in azure node provider

* vm dev image build

* minor change

* keeping example-full.yaml in autoscaler/azure, updating azure example config

* linting azure config

* extending retries on azure config

* lint

* support for internal ips, fix to azure docs, and new azure gpu example config

* linting

* Update python/ray/autoscaler/azure/node_provider.py

Co-Authored-By: Richard Liaw <rliaw@berkeley.edu>

* revert_this

* remove_schema

* updating configs and removing ssh keygen, tweak azure node provider terminate

* minor tweaks

Co-authored-by: Markus Cozowicz <marcozo@microsoft.com>
Co-authored-by: Ubuntu <marcozo@mc-ray-jumpbox.chcbtljllnieveqhw3e4c1ducc.xx.internal.cloudapp.net>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-03-15 14:48:27 -07:00
Simon Mo
3f1fcaa024
Blocking ray.get/wait inside async context will warn instead of error () 2020-03-14 22:02:30 -07:00
fangfengbin
6b37be9677
[GCS]Add job id when operating gcs table () 2020-03-15 12:04:04 +08:00
Kai Yang
630e48967d
[Java] Allow passing internal config from raylet to Java worker () 2020-03-15 12:03:38 +08:00
mehrdadn
a87199d240
Fix cyclic dependency between ray/util and ray/common ()
* Fix cyclic dependency

Headers in ray/util should not depend on those in ray/common

* Move random generations to ray/common/test_util.h

* Add license header

Co-authored-by: Mehrdad <noreply@github.com>
Co-authored-by: Philipp Moritz <pcmoritz@gmail.com>
2020-03-14 12:44:53 -07:00
tison
ffeab5d2bf
Support configurable python executable in format.sh () 2020-03-14 12:27:41 -07:00
Eric Liang
dd70720578
[rllib] Rename sample_batch_size => rollout_fragment_length ()
* bulk rename

* deprecation warn

* update doc

* update fig

* line length

* rename

* make pytest compatible

* fix test

* fi sys

* rename

* wip

* fix more

* lint

* update svg

* comments

* lint

* fix use of batch steps
2020-03-14 12:05:04 -07:00
Stephanie Wang
53549314c5
[core] Option to fallback to LRU on OutOfMemory ()
* Add a test for LRU fallback

* Update error message

* Upgrade arrow to master

* Integrate with arrow

* Revert "Bazel mirrors ()"

This reverts commit 44aded5272.

* Don't LRU evict

* Revert "Revert "Bazel mirrors ()""

This reverts commit b6359fea78d1bd3925452ca88ac71e0c9e5c7dd3.

* Add lru_evict flag

* fix internal config

* Fix

* upgrade arrow

* debug

* Set free period in config for lru_evict, override max retries to fix test

* Fix test?

* fix test

* Revert "debug"

This reverts commit 98f01c63a267f38218f5047b1866e4c1c8280017.

* fix exception str

* Fix ref count test

* Shorten travis test?
2020-03-14 11:28:43 -07:00
Eric Liang
52cf77f5a9
[rllib] SAC no_done_at_end should default to False ()
* update

* update doc

* stochastic

* cleanup
2020-03-14 11:16:54 -07:00
Eric Liang
c3a8ba399f
[rllib] Enable distributed exec api for A2C, A3C, PG by default () 2020-03-13 18:48:41 -07:00
Anthony Yu
094125cf03
[tune] Dragonfly integration ask tell nit ()
* Add sample example

* Copy relevant lines of ask from inherited Optimizer

* Ignore strategy

* Additional changes

* Add DragonflySearch for tune connector for Dragonfly

* Add example and fix small errors

* lint

* Remove skopt references

* Update example based off of Dragonfly changes

* Edit example for final Dragonfly edits

* Formatting and documentation edits

* Add documentation and add to test pipeline

* Address PR comments

* Fix Jenkins test

* Adjust Dragonfly to PR#7366

* Lint

* fix_tests

* Minor changes to ordering

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-03-13 15:27:03 -07:00
Qing Wang
d6365c2586
[Java] Enable stress test. () 2020-03-13 21:02:13 +08:00
Kai Yang
d6e8f47065
Add a flag to disable reconstruction for a killed actor () 2020-03-13 19:10:21 +08:00
Qing Wang
575c89cf47
[Java] Pass large object by reference () 2020-03-13 18:38:03 +08:00
Sven Mika
552cfb37ea
[RLlib] Fix bugs and speed up SegmentTree 2020-03-13 01:03:07 -07:00
Ujval Misra
6022eb53c4
[tune] Use newest checkpoint in normal operation ()
* Use persistent checkpoint for failures

* Fix test

* Add unpause test

* move test

* Fix tests

* remove debug statement

* Mark test as flaky
2020-03-12 22:21:42 -07:00
Qing Wang
f4656d8cc3
[Java] Enable direct call by default. ()
* WIP

* Address comments.

* Linting

* Fix

* Fix

* Fix test

* Fix

* Fix single process ci

* Fix ut

* Update java/test/src/main/java/org/ray/api/test/PlasmaFreeTest.java

* Address comments

* Fix linting

* Minor update comments.

* Fix streaming CI
2020-03-13 12:25:30 +08:00
Tianyi Chen
6993a471f1
[Streaming] Move resource-manager and scheduler to master package. () 2020-03-13 12:24:37 +08:00
micafan
cc91ed57dc
[core] Fix losing task state when giving up forward task. ()
* fix NodeManager::Forward task bug on error

* fix lint

* revert spillback task forward
2020-03-13 11:49:44 +08:00
Edward Oakes
768d0b3b3f
Allocate a buffer of 100 calls for each RPC handler () 2020-03-12 12:05:30 -07:00
Sven Mika
f165766813
[RLlib] Bug: If trainer config horizon is provided, should try to increase env steps to that value. () 2020-03-12 11:03:37 -07:00
Sven Mika
80d314ae5e
[RLlib] Add all agents to rllib rollout tests. () 2020-03-12 11:02:51 -07:00
ZhuSenlin
b663bc6d67
Use gcs server to replace raylet monitor when RAY_GCS_SERVICE_ENABLED=true () 2020-03-12 22:13:56 +08:00
fangfengbin
428fb79b27
Fix streaming compile bug ()
Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-03-12 17:26:45 +08:00
Eric Liang
f5d12a958b
[rllib] Port Ape-X to distributed execution API () 2020-03-12 00:54:08 -07:00
fangfengbin
4c834b9d68
Fix the issue that gcs service client ignores error status code ()
* add gcs reply status

* rebase master

* use macro to simplify

* convert status in gcs rpc client

* define a Status message in protobuf

Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-03-12 15:08:29 +08:00
Sven Mika
20ef4a8603
[RLlib] Cleanup/unify all test cases. () 2020-03-11 20:39:47 -07:00
Sven Mika
dded5b6d22
[RLlib] ES env_config is not a EnvContext object (e.g. does not contain worker_index). () 2020-03-11 20:33:20 -07:00
Sven Mika
bc120730e5
[RLlib] PPO(torch) on CartPole not tuned well enough for consistent learning () 2020-03-11 20:31:27 -07:00
Kai Yang
932a749fa9
Fix the java_worker_options parameter ()
* fix Java CI

* Minor fix

* move json.loads out of build_java_worker_command

* lint

* fix cross language test
2020-03-12 10:44:23 +08:00
Markus Cozowicz
ba1b081477
Azure Portal cluster deployment | Support spot instances ()
* added priority option

* added head node priority

* upgrade api version
2020-03-11 18:46:11 -07:00
Simon Mo
31d63d3ca7
Fix global state actors() call () 2020-03-11 16:59:50 -07:00
Richard Liaw
b38ed4be71
[raysgd] Fix More Docs () 2020-03-11 14:17:47 -07:00
Richard Liaw
d046faeb9c
[sgd] Readme fix ()
* readme fix

* replicas
2020-03-11 13:40:18 -07:00