Simon Mo
5df9f07ff3
[CI] Use Docker image for microbenchmarks ( #12189 )
...
* [CI] Use Docker image for microbenchmarks
* Update cluster.yaml
2020-11-19 17:54:40 -08:00
dHannasch
4b2c5daf45
State which IP addresses are failing to match. ( #11957 )
...
* State which IP addresses are failing to match.
* Use f-string.
* action item?
* I could swear this passed with length 80 before
* wait, this is how it wants f-strings
* reword
* action item
* f
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
* f
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
* f
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
2020-11-19 17:25:25 -08:00
Eric Liang
e72abcd0aa
Enable even more new scheduler tests ( #12096 )
2020-11-19 16:47:18 -08:00
Eric Liang
dac09bd569
Fix actor_registry_ copied on each heartbeat; Improve receive object chunk debug messages ( #12187 )
2020-11-19 16:45:37 -08:00
Stephanie Wang
7bf5145d36
Lint plasma source files ( #12171 )
2020-11-19 19:08:18 -05:00
Eric Liang
dfc796b8ec
Add gdb stack dump command to docs ( #12147 )
...
* ZZ
* pid
* Update doc/source/profiling.rst
Co-authored-by: Stephanie Wang <swang@cs.berkeley.edu>
* Update profiling.rst
Co-authored-by: Stephanie Wang <swang@cs.berkeley.edu>
2020-11-19 19:02:11 -05:00
Eric Liang
67544992b5
Remove the old operator directory ( #12143 )
2020-11-19 15:37:28 -08:00
Raoul Khouri
d07ffc152b
[rllib] Rrk/12079 custom filters ( #12095 )
...
* travis reformatted
2020-11-19 13:20:20 -08:00
Kai Fricke
f1ace386db
[tune] detect docker and kubernetes syncers ( #12108 )
...
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-11-19 12:17:17 -08:00
Eric Liang
de86d5aff7
ActorStatisticalData() debug metrics bog down raylet with 100% CPU ( #12148 )
...
* comment out bad
* update
2020-11-19 11:38:44 -08:00
SangBin Cho
5fb410cfbf
[Dashboard] New dashboard view data doesn't exist. ( #12129 )
...
* Fix.
* Fix the issue.
2020-11-19 11:04:59 -08:00
SangBin Cho
7d67af6c2a
[Metrics] Add stats to measure process startup time + scheduling stats. ( #12100 )
...
* Add new stats.
* Fix issues.
2020-11-19 11:04:26 -08:00
Ian Rodney
7fcce785ed
[hotfix] Fix windows build ( #12146 )
...
* [hotfix] fix windows
* remove debug logs
2020-11-19 11:00:19 -08:00
dHannasch
96c1caccaf
Use Ubuntu 18.04 so that a newer version of Docker will be available by default. ( #12139 )
2020-11-19 10:40:07 -08:00
Kai Fricke
6999075c75
[tune] Add seed parameter to BOHB ( #12160 )
2020-11-19 10:27:16 -08:00
Sven Mika
dab241dcc6
[RLlib] Fix inconsistency wrt batch size in SampleCollector (traj. view API). Makes DD-PPO work with traj. view API. ( #12063 )
2020-11-19 19:01:14 +01:00
Philipp Moritz
ff82af1588
Clean up requirements.txt ( #12136 )
2020-11-19 09:27:09 -08:00
Xianyang Liu
9481ecd180
[data] MLDataset based on ParallelIterator ( #11849 )
2020-11-19 00:33:37 -08:00
Barak Michener
2fe1321c3f
[ray_client] __getattr__ for the API Import interface ( #12089 )
...
* move all things that import real-ray into the server folder
* change the import line and have a __getattr__-able API stub
* formatting
* remove unused (duplicated) util file
* Remove module methods (but leave comment on why)
2020-11-18 22:42:02 -08:00
Ian Rodney
a74f1885db
Revert "[CLI] Fix ray commands when RAY_ADDRESS used ( #11989 )" ( #12135 )
...
* Revert "[CLI] Fix ray commands when RAY_ADDRESS used (#11989 )"
This reverts commit d23d326560.
* only check environment for CLI commands
* use new fns
* fixing docs
* rename and return "auto"
* Update python/ray/_private/services.py
Co-authored-by: Eric Liang <ekhliang@gmail.com>
* Update services.py
* Update services.py
Co-authored-by: Eric Liang <ekhliang@gmail.com>
2020-11-18 22:41:10 -08:00
dHannasch
5bc4976550
More informative error message if ray start fails to connect to Redis ( #11880 )
...
* Chain original redis.ConnectionError. More importantly, print out the address so people don't have to dig out --logging-level debug to get the number wait_for_redis_to_start() already knows.
Check the Redis password.
* f
2020-11-18 19:28:10 -08:00
Richard Liaw
0d388c4d31
[autoscaler] remove unnecessary print output ( #12131 )
2020-11-18 18:33:48 -08:00
Eric Liang
4490de356d
Fix issues in release process doc ( #12130 )
2020-11-18 18:32:27 -08:00
Richard Liaw
2bb6db5e64
[tune] temporary revert of verbosity changes ( #12132 )
2020-11-18 18:27:41 -08:00
Ameer Haj Ali
4717fcd9c0
[autoscaler] give max_workers precedence over min_workers in resource demand scheduler ( #12106 )
2020-11-18 16:24:48 -08:00
Ameer Haj Ali
d826452e0b
[autoscaler] fix max_workers bug in resource_demand_scheduler by counting the head node ( #12123 )
2020-11-18 15:24:38 -08:00
Ian Rodney
e086ddc18f
[core] Add Recursive task cancelation ( #11923 )
2020-11-18 15:18:40 -08:00
Ian Rodney
e2a147d5fb
[docs] Remove DL AMI reference ( #12120 )
2020-11-18 12:40:19 -08:00
Ian Rodney
8f2b447ba4
[docker pipeline] Base-Deps, Dataclasses & Releases ( #12119 )
2020-11-18 14:34:04 -06:00
Ian Rodney
b343db9ad5
[docker] Modify script to allow for arbitrary name changes ( #12092 )
2020-11-18 14:14:44 -06:00
Alex Wu
4b5769dab2
1.0.1 release logs ( #12127 )
...
Co-authored-by: Alex Wu <alex@anyscale.com>
2020-11-18 12:04:05 -08:00
Alex Wu
e9c9ba9c9f
[New Scheduler] Don't start tasks if the owner is dead ( #12050 )
2020-11-18 11:34:19 -08:00
Ameer Haj Ali
eef624750c
[ray client] ray wait() implementation ( #12072 )
2020-11-18 11:33:57 -08:00
Kai Fricke
2b60c5774b
[tune] cache checkpoint serialization ( #12064 )
2020-11-18 09:03:53 -08:00
Sven Mika
6da4342822
[RLlib] Add on_learn_on_batch (Policy) callback to DefaultCallbacks. ( #12070 )
2020-11-18 15:39:23 +01:00
dHannasch
b41f4fdec2
Extract the connection logic to reduce duplication. ( #12016 )
2020-11-18 00:12:58 -08:00
Ian Rodney
d23d326560
[CLI] Fix ray commands when RAY_ADDRESS used ( #11989 )
...
* [CLI] Fix ray commands when RAY_ADDRESS used
* erics suggestion
2020-11-17 23:44:59 -08:00
fangfengbin
d87af0da88
[PlacementGroup]Add gcs placement group manager debug info ( #12061 )
...
Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>
2020-11-18 11:15:38 +08:00
Philipp Moritz
b96516e9d3
[core] Remove google dependency ( #12085 )
2020-11-17 19:01:00 -08:00
fangfengbin
f400333841
[Placement Group]Placement Group supports gcs failover(Part2) ( #12003 )
...
* add testcase
* fix ut
* fix review comment
* fix review comment
* fix review comments
* fix ut bug
* add part code
* add part code
* add part code
* add testcase
* add part code
* fix ut bug
* fix ut timeout bug
* fix ut bug
Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>
2020-11-18 10:59:26 +08:00
Simon Mo
c476037c97
[Core] Async API should raise on all RayError ( #12043 )
...
Before this PR, the async API raised only RayTaskError, so errors
like RayActorError (actor died) were not propagated and thrown at
`await object_ref`. This PR fixes that.
2020-11-17 17:20:30 -08:00
Ameer Haj Ali
e8c018e8fc
[C++ API] tests for the C++ API. ( #12076 )
2020-11-17 17:07:52 -08:00
Stephanie Wang
f6bdd5ab17
[New Scheduler] Spillback from the queue of tasks assigned to the local node ( #12084 )
2020-11-17 16:13:59 -08:00
dHannasch
b5dfdb2a21
Log the Redis shard addresses as originally received from the head GCS. ( #12011 )
2020-11-17 13:11:17 -08:00
dHannasch
010e6cef3f
Allow setting the RAY_BACKEND_LOG_LEVEL to trace. ( #12012 )
2020-11-17 13:10:23 -08:00
dHannasch
f0dcf01807
Clarify that Ray is not yet retrying to connect. ( #12013 )
2020-11-17 13:01:42 -08:00
Richard Liaw
ca44222e03
[minor] log info instead of error upon ray.init rerun ( #12025 )
2020-11-17 12:59:24 -08:00
fangfengbin
7f050c706b
[PlacementGroup]Skip flaky testcase ( #12065 )
...
Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>
2020-11-17 12:21:34 -08:00
Simon Mo
d7c95a4a90
[Serve] Rewrite Router to be Embeddable ( #12019 )
2020-11-17 08:28:18 -08:00
Maksim Smolin
23926f3e6e
[CLI] Docker Support ( #11761 )
...
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-11-17 00:04:39 -08:00