8043 commits

Author SHA1 Message Date
Simon Mo
7b79e0ed4e
[CI] Mark test_actor_pool medium (#15490) 2021-04-23 15:35:22 -07:00
architkulkarni
b08b2c5103
[Core] Add "shim process" setup_worker.py that calls "conda activate" for runtime_env (#15361) 2021-04-23 15:29:52 -05:00
Edward Oakes
ab797d1d62
[serve] Add test for redirections w/ fastapi (#15461) 2021-04-23 14:28:42 -05:00
Eric Liang
93a1ecba4b
Unhandled error messages aren't printed until next interaction with shell (#15432) 2021-04-23 11:00:34 -07:00
Simon Mo
951943c28b
[Core] Add concurrent.futures.Future wrapper for ObjectRef (#15425) 2021-04-23 11:53:46 -05:00
Dmitri Gekhtman
6b0673f207
[doc][Kubernetes][minor] Restructure section labels for operator launch (#14962) 2021-04-23 09:50:58 -07:00
Charles Tapley Hoyt
251558b753
[tune] Fix type annotation in choice (#15038)
`Categorical.__init__()` accepts any sequence, so the type annotation on `choice()` can be relaxed.
2021-04-23 09:45:32 -07:00
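The annotation change described in the commit above can be illustrated with a minimal sketch. The `Categorical` class and `choice()` function below are simplified, hypothetical stand-ins written for this example; they are not the Ray Tune source, and the exact annotations used in that commit are an assumption.

```python
from typing import Any, Sequence


class Categorical:
    """Hypothetical stand-in for a categorical search-space domain."""

    def __init__(self, categories: Sequence[Any]):
        # Any sequence (list, tuple, range, ...) is accepted and copied to a list.
        self.categories = list(categories)


def choice(categories: Sequence[Any]) -> Categorical:
    """Annotated with Sequence rather than List, so tuples also type-check."""
    return Categorical(categories)


# A tuple and a range are both valid under the relaxed annotation.
from_tuple = choice(("adam", "sgd"))
from_range = choice(range(3))
```

Annotating the parameter as `Sequence` keeps static type checkers from rejecting callers that pass a tuple or another non-list sequence, while the runtime behavior is unchanged.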
Dmitri Gekhtman
fd43e9e6f8
[kubernetes][doc][minor] Add namespace to job creation command (#15442) 2021-04-23 09:44:51 -07:00
Ian Rodney
cc4a610e6a
[doc] Update requirements-rtd.txt (#15485) 2021-04-23 09:35:47 -07:00
Kai Fricke
c08373b0bf
[tune] Add save/restore for ASHA scheduler (#15438) 2021-04-23 09:35:34 -07:00
Sumanth Ratna
ab542f2c45
[tune] Add HEBO to search algorithm shim function (#15468) 2021-04-23 00:17:48 -07:00
fangfengbin
d9780761a3
[GCS] Revert ping_gcs_rpc_server_max_retries to 600 (#14443) 2021-04-23 10:02:38 +08:00
Dmitri Gekhtman
0d0c2418b8
[client][placement groups] Client placement group hooks, attempt #3 (#15382) 2021-04-22 17:18:55 -07:00
Eric Liang
af01a47d59
Add support for tune,serve,rllib tests to flaky builder (#15447) 2021-04-22 15:03:29 -07:00
Micah Yong
53774209cf
[core] Extend ActorPool API to support adding / removing actors (#15228)
* Add has_free, push, and pop to actor_pool.py with corresponding tests

* Remove period

* Change name from pop to pop_idle
2021-04-22 12:45:45 -07:00
Edward Oakes
17865c0569
Remove ray.workers from __init__.py (#15460) 2021-04-22 14:20:04 -05:00
Edward Oakes
668a784553
[serve] Re-add variable route support for old API (#15455) 2021-04-22 14:07:50 -05:00
Simon Mo
79c24146bd
[Hotfix] Upload the flaky test log (#15458) 2021-04-22 10:32:27 -07:00
Sven Mika
b9761d7081
[RLlib] Discussion 1759: SampleBatch._get_slice_indices stuck for R2D2 when using incorrect Trainer. (#15451)
Thanks @Manuscrit for raising this issue!
2021-04-22 19:21:03 +02:00
Sven Mika
7e1a191f17
[RLlib] Remove all remaining tf- and MuJoCo warnings from RLlib. (#15454) 2021-04-22 19:20:19 +02:00
Simon Mo
baa1b0f360
[Serve] FastAPI allow duplicated routes in class based views (#15445) 2021-04-22 11:27:39 -05:00
Sven Mika
bdda73e2dd
[RLlib] Torch multi-GPU bug fixes (discussion 1755). (#15421)
Thanks a lot @Bam4d for raising this and your help on fixing the worker GPU issue for torch!
2021-04-22 11:29:42 +02:00
Sven Mika
7318439c3d
[RLlib] DQN native_ratio (for training intensity) incorrect (discussion 1763). (#15436)
Thanks @Manuscrit !
2021-04-22 11:06:29 +02:00
Jialing He
5403021430
Fix incorrect call function WorkerID::FromBinary (#15449) 2021-04-22 15:44:49 +08:00
Ian Rodney
810a02b3f2
[Azure][Autoscaler] Allow current user to use Docker (#15380) 2021-04-22 00:30:30 -07:00
Ameer Haj Ali
978199ceba
[autoscaler] Update azure pip packages in the cluster yaml (#15274) 2021-04-22 08:23:05 +03:00
Alex Wu
ede377bc26
ray health-check (#15429)
* .

* done?

* .

* .

* less yelling

* fixed?

* lint

* skip on windows

* remove extra print

Co-authored-by: Alex Wu <alex@anyscale.com>
2021-04-21 21:49:55 -07:00
Edward Oakes
71a670c471
[serve] Make fastapi wrapper a normal serve backend (#15441) 2021-04-21 16:06:33 -05:00
Yi Cheng
0fa6bae104
[dev] Enable gitpod (#15420) 2021-04-21 13:26:46 -07:00
Yi Cheng
b63e493c04
[runtime_env] Fix some bugs related to runtime_env (#15286) 2021-04-21 13:31:21 -05:00
lanlin
c7f6ffb70c
[Tune] Fix max len trial name (#15293)
* check TUNE_MAX_LEN_IDENTIFIER when using it

* fix format
2021-04-21 10:48:24 -07:00
Fabien Couthouis
fe06642df0
[RLlib] Report mean losses instead of sum in IMPALA (discussion 1709) (#15427) 2021-04-21 10:59:06 +02:00
Frank Luan
7ff436e1f3
Fix restore_spilled_objects() for external object spilling (#15426)
* Fix deserializer in metrics.Counter

* Fix restore_spilled_objects() for external object spilling
2021-04-20 16:33:44 -07:00
Yi Cheng
dbba3a456f
[core] Fixing of actor creation failure (#15411)
* Fix

* fix

* format

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* format

* fix comments
2021-04-20 15:27:45 -07:00
Kai Fricke
d7e31c0d13
[tune] Return normalized checkpoint path (#15296)
* Return normalized checkpoint path

* Lint
2021-04-20 13:36:40 -07:00
Yi Cheng
9b3ea7c32b
[core] Take care of object spilling failure (#14703)
* fix spilling failure

* format

* unittests added

* format

* format

* format

* fix

* add comment

* fix some comments

* add test cases

* format

* format
2021-04-20 10:28:48 -07:00
Eric Liang
a482034916
Flaky test builder for tests tagged "flaky" (#15408) 2021-04-20 00:19:07 -07:00
Sven Mika
7ff27dfe07
[RLlib] Remove atari dependency for RLlib (in favor of detailed error message). (#15292) 2021-04-20 08:46:58 +02:00
Sven Mika
41968512ca
[RLlib] Partial GPU examples (for learner and workers). (#15334) 2021-04-20 08:46:05 +02:00
architkulkarni
3bda2812fa
[Serve] Remove old ImportedBackend factory (#15376) 2021-04-19 16:25:59 -07:00
Edward Oakes
fbe510cd47
[serve] Clean up route prefixing behavior for deployments (#15193) 2021-04-19 12:50:46 -05:00
fangfengbin
ade684ac03
[Test] Fix gcs flaky testcase (#15391)
Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>
2021-04-19 10:21:39 -07:00
Jiaxin Shan
86468ce59f
[kubernetes] Remove unrelated fields in manifest file (#15243) 2021-04-19 10:54:33 -05:00
DK.Pino
b0a813baad
[Placement Group] Fix PlacementGroup ready when specify memory resource (#15189)
* fix placement group ready when memory specified

* lint

* add memory resource check in suppressed

* fix lint

* update comment

* fix lint

* delete unrelated code

* update comment

* lint

* fix ut
2021-04-17 22:21:05 -07:00
Alex Wu
805b8a10a3
Move scalability envelope back down to 250 nodes (#15381)
* .

* done?

* .

Co-authored-by: Alex Wu <alex@anyscale.com>
2021-04-16 19:39:24 -07:00
SangBin Cho
5f74d0e40d
[Test] Fix flaky test failure (#15326)
* Fix trial.

* unskip test.

* Mock commit
2021-04-16 18:09:02 -07:00
Dmitri Gekhtman
e6864523cf
[autoscaler] Do not divide by zero in resource demand scheduler (#15323)
* Do not divide by zero

* Don't take min or mean of an empty list

* max workers 0 for head node in distributed benchmark

* test

* Correct the type annotation

* comment grammar tweak

* message

* docs

* test

* Move test cli to large tests.
2021-04-16 10:20:05 -07:00
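The first two bullets of the commit above (no division by zero, no min or mean of an empty list) describe a common guard pattern. The sketch below shows that pattern in general terms. The function names `safe_ratio` and `summarize` are hypothetical, and this is not the autoscaler's actual code.

```python
from statistics import mean
from typing import Optional, Sequence, Tuple


def safe_ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    if denominator == 0:
        # Assumed fallback for this sketch; a real scheduler may prefer another value.
        return 0.0
    return numerator / denominator


def summarize(values: Sequence[float]) -> Optional[Tuple[float, float]]:
    """Return (min, mean) of values, or None for an empty sequence."""
    if not values:
        # min() and statistics.mean() both raise on empty input, so bail out early.
        return None
    return min(values), mean(values)
```

Returning `None` for the empty case pushes the decision to the caller; returning a sentinel score would be an equally plausible choice.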
Edward Oakes
822a83055e
[Buildkite] split up some tune and rllib tests (#15343) 2021-04-16 10:16:12 -07:00
Risto Vuorio
dcda4a3d60
[tune] escaping paths before globbing in TrainableUtil.get_checkpoints_paths (#15368)
* Fixes 15367 by escaping paths before globbing in TrainableUtil.get_checkpoints_paths

* Adds a test testGetTrialCheckpointsPathsByPathWithSpecialCharacters for fix_15367
2021-04-16 09:41:02 -07:00
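The fix described in the commit above, escaping a path before globbing, can be sketched with the standard library's `glob.escape()`. The helper name `find_checkpoints`, the `checkpoint_*` pattern, and the directory names are assumptions made for this example, not the code from `TrainableUtil.get_checkpoints_paths`.

```python
import glob
import os
import tempfile
from typing import List


def find_checkpoints(trial_dir: str) -> List[str]:
    """Return checkpoint directories under trial_dir, treating trial_dir literally."""
    # glob.escape() neutralizes *, ? and [ in the literal part of the path,
    # so only the trailing "checkpoint_*" is interpreted as a pattern.
    pattern = os.path.join(glob.escape(trial_dir), "checkpoint_*")
    return sorted(glob.glob(pattern))


with tempfile.TemporaryDirectory() as root:
    # A trial directory whose name contains glob special characters.
    trial_dir = os.path.join(root, "trial_[lr=0.1]")
    os.makedirs(os.path.join(trial_dir, "checkpoint_1"))

    # Without escaping, "[lr=0.1]" is parsed as a character class and nothing matches.
    unescaped = glob.glob(os.path.join(trial_dir, "checkpoint_*"))
    # With escaping, the directory is found.
    escaped = find_checkpoints(trial_dir)
```

Only the caller-supplied directory is escaped; the wildcard suffix stays unescaped so it still expands.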
Sven Mika
cecfc3b43b
[RLlib] Multi-GPU support for Torch algorithms. (#14709) 2021-04-16 09:16:24 +02:00