Commit graph

7910 commits

Author SHA1 Message Date
Yi Cheng
dbba3a456f
[core] Fixing of actor creation failure (#15411)
* Fix

* fix

* format

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* format

* fix comments
2021-04-20 15:27:45 -07:00
Kai Fricke
d7e31c0d13
[tune] Return normalized checkpoint path (#15296)
* Return normalized checkpoint path

* Lint
2021-04-20 13:36:40 -07:00
Yi Cheng
9b3ea7c32b
[core] Take care of object spilling failure (#14703)
* fix spilling failure

* format

* unittests added

* format

* format

* format

* fix

* add comment

* fix some comments

* add test cases

* format

* format
2021-04-20 10:28:48 -07:00
Eric Liang
a482034916
Flaky test builder for tests tagged "flaky" (#15408) 2021-04-20 00:19:07 -07:00
Sven Mika
7ff27dfe07
[RLlib] Remove atari dependency for RLlib (in favor of detailed error message). (#15292) 2021-04-20 08:46:58 +02:00
Sven Mika
41968512ca
[RLlib] Partial GPU examples (for learner and workers). (#15334) 2021-04-20 08:46:05 +02:00
architkulkarni
3bda2812fa
[Serve] Remove old ImportedBackend factory (#15376) 2021-04-19 16:25:59 -07:00
Edward Oakes
fbe510cd47
[serve] Clean up route prefixing behavior for deployments (#15193) 2021-04-19 12:50:46 -05:00
fangfengbin
ade684ac03
[Test] Fix gcs flaky testcase (#15391)
Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>
2021-04-19 10:21:39 -07:00
Jiaxin Shan
86468ce59f
[kubernetes] Remove unrelated fields in manifest file (#15243) 2021-04-19 10:54:33 -05:00
DK.Pino
b0a813baad
[Placement Group] Fix PlacementGroup ready when specify memory resource (#15189)
* fix placement group ready when memory specified

* lint

* add memory resource check in suppressed

* fix lint

* update comment

* fix lint

* delete unrelated code

* update comment

* lint

* fix ut
2021-04-17 22:21:05 -07:00
Alex Wu
805b8a10a3
Move scalability envelope back down to 250 nodes (#15381)
* .

* done?

* .

Co-authored-by: Alex Wu <alex@anyscale.com>
2021-04-16 19:39:24 -07:00
SangBin Cho
5f74d0e40d
[Test] Fix flaky test failure (#15326)
* Fix trial.

* unskip test.

* Mock commit
2021-04-16 18:09:02 -07:00
Dmitri Gekhtman
e6864523cf
[autoscaler] Do not divide by zero in resource demand scheduler (#15323)
* Do not divide by zero

* Don't take min or mean of an empty list

* max workers 0 for head node in distributed benchmark

* test

* Correct the type annotation

* comment grammar tweak

* message

* docs

* test

* Move test cli to large tests.
2021-04-16 10:20:05 -07:00
Edward Oakes
822a83055e
[Buildkite] split up some tune and rllib tests (#15343) 2021-04-16 10:16:12 -07:00
Risto Vuorio
dcda4a3d60
[tune] escaping paths before globbing in TrainableUtil.get_checkpoints_paths (#15368)
* Fixes 15367 by escaping paths before globbing in TrainableUtil.get_checkpoints_paths

* Adds a test testGetTrialCheckpointsPathsByPathWithSpecialCharacters for fix_15367
2021-04-16 09:41:02 -07:00
Sven Mika
cecfc3b43b
[RLlib] Multi-GPU support for Torch algorithms. (#14709) 2021-04-16 09:16:24 +02:00
fangfengbin
0e3bbbeba3
[Test] Try deflaking gcs server test by adding log (#15332)
Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>
2021-04-15 21:16:09 -07:00
Richard Liaw
dc80d9f42a
[flaky] fix mnist ptl data cache (#15344)
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
2021-04-15 16:24:17 -07:00
SangBin Cho
a54d69f535
[Test] Split long runtime env tests. (#15340)
* [Test] Split long runtime env tests.

* Addressed code review.
2021-04-15 14:28:28 -07:00
SangBin Cho
1d87e4447d
[Test] increase the test size of test io that consistently times out (#15341) 2021-04-15 14:02:41 -07:00
Siyuan (Ryans) Zhuang
4de1f35b3e
run_function_on_all_workers only once in the driver (#15203) 2021-04-15 13:58:36 -07:00
Richard Liaw
eaa3ce3f40
Fix release test -- client remote put (#15325)
* fix-test

Signed-off-by: Richard Liaw <rliaw@berkeley.edu>

* fix

Signed-off-by: Richard Liaw <rliaw@berkeley.edu>

* Update python/ray/util/client/server/dataservicer.py

* Update python/ray/util/client/server/dataservicer.py

* Update python/ray/_private/ray_client_microbenchmark.py
2021-04-15 13:30:38 -07:00
Kai Fricke
1c783e2eeb
[tune] Allow 0 CPU head bundles for placement group factories (#15338) 2021-04-15 20:21:35 +01:00
SangBin Cho
4dd4756c09
[Test] skip flaky pg tests. (#15337) 2021-04-15 11:55:19 -07:00
SangBin Cho
df9329160e
[Tests] Dask on ray release test (#15256)
* done.

* Linting.

* Update readme

* Update.

* Fix issues.
2021-04-15 10:30:17 -07:00
Sven Mika
8b3554e37e
[RLlib] Remove all (already soft-deprecated) SampleBatch.data from code. (#15335) 2021-04-15 19:19:51 +02:00
Sven Mika
c90de315e5
[RLlib] APEX returns incorrect default resources (PlacementGroupFactory) colocated missing replay actors. (#15295) 2021-04-15 16:50:42 +01:00
Sven Mika
e961d2f4b2
[RLlib] Improve example scripts for attention nets, CartPole LSTM, and custom RNN-models. (#15329) 2021-04-15 16:11:34 +02:00
Sven Mika
45d6560759
[RLlib] Fix flakey custom_fast_model_torch/tf tests. (#15330) 2021-04-15 16:10:29 +02:00
Ameer Haj Ali
981fa5829a
[client] Enable ClientObjectRef Comparisons (#15320) 2021-04-15 16:46:44 +03:00
SangBin Cho
c2e240e866
[Doc] Update object spilling doc (#15301) 2021-04-14 23:38:04 -07:00
Simon Mo
57b6053cda
[Buildkite] Turn off Travis linux builds (except wheels) (#15316)
* [Buildkite] Turn off Travis linux builds (except wheels)

* naming
2021-04-14 20:37:37 -07:00
Siyuan (Ryans) Zhuang
b81d805f40
[Doc] fix ray client doc (#15308) 2021-04-14 20:35:15 -07:00
Yi Cheng
a9402c21e6
Revert "Revert "[runtime_env] Add support of exclusion (#15241)" (#15303)" with fixing (#15310)
* Revert "Revert "[runtime_env] Add support of exclusion (#15241)" (#15303)"

This reverts commit 775deca5ad.

* fix
2021-04-14 20:34:53 -07:00
Stephanie Wang
6b2da7eda8
[core] Log warning on bad max task args value (#15314) 2021-04-14 20:34:08 -07:00
SangBin Cho
27ab0c7633
[Test] Skip the failing rllib example test. (#15321) 2021-04-14 20:19:44 -07:00
Simon Mo
5f0be94989
[Buildkite] Use the build link for Travis Tracker (#15317) 2021-04-14 18:58:23 -07:00
SangBin Cho
d0e83c43ca
[Release Test] Modify parameter to reduce stress (#15048)
* Fix.

* Fix.
2021-04-14 18:27:20 -07:00
SangBin Cho
e0bbfaf87e
[Log] Fix log monitor issue. (#15302) 2021-04-14 18:11:24 -07:00
Yi Cheng
0caf96be94
Take care of failed killing request (#15313) 2021-04-14 18:07:10 -07:00
Charles
82e730078f
[autoscaler] Converting assert False into useful exceptions. (#15306) 2021-04-14 16:16:37 -07:00
Simon Mo
c4b1985a5b
[Serialization] Pydantic -> serialization_addons.py and Ray Client support. (#15181) 2021-04-14 15:21:13 -07:00
Simon Mo
5289690d1c
[Buildkite] Fix Bazel Logs Upload (#15285) 2021-04-14 12:47:31 -07:00
SangBin Cho
775deca5ad
Revert "[runtime_env] Add support of exclusion (#15241)" (#15303)
This reverts commit 359b5ce06b.
2021-04-14 11:58:53 -07:00
Richard Liaw
59bf3a7b22
ray[cluster] -> ray[default] (#15251) 2021-04-14 09:37:04 -07:00
Antoni Baum
b93bd9bef4
[tune] Set correct Optuna TrialState on trial complete (#15283) 2021-04-14 15:59:23 +01:00
Sven Mika
bbfa8ffec9
[RLlib] Minor release 1.3 warnings cleanups. (#15272) 2021-04-14 14:03:15 +02:00
Sven Mika
ef0f163d16
[RLlib] Discussion 1709: IMPALA (tf and torch) reports sum of entropy (over batch) in stats. Should report mean instead. (#15290) 2021-04-14 11:44:25 +02:00
Kai Fricke
aaa14d63a7
[tune] deflake test_convergence, add seed parameter to OptunaSearch (#15248)
* De-flake optuna convergence test

* Even higher threshold

* Add `seed` parameter to OptunaSearch
2021-04-14 01:06:49 -07:00