Commit graph

7388 commits

Author SHA1 Message Date
Raphael CHEN
343ebf8ea7
[tune] Checkpoint according to nested metric (#14379) 2021-03-01 17:14:39 +01:00
Qing Wang
f7f64e90ed
[Minor] Remove unused field. (#14382)
Co-authored-by: Qing Wang <jovany.wq@antgroup.com>
2021-03-01 19:35:28 +08:00
dependabot[bot]
cda4ad044a
[tune](deps): Bump mlflow from 1.13.1 to 1.14.0 in /python/requirements (#14396)
Bumps [mlflow](https://github.com/mlflow/mlflow) from 1.13.1 to 1.14.0.
- [Release notes](https://github.com/mlflow/mlflow/releases)
- [Changelog](https://github.com/mlflow/mlflow/blob/master/CHANGELOG.rst)
- [Commits](https://github.com/mlflow/mlflow/compare/v1.13.1...v1.14.0)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-03-01 12:28:15 +01:00
dependabot[bot]
c925e8d14c
[tune](deps): Bump ax-platform in /python/requirements (#14398)
Bumps [ax-platform](https://github.com/facebook/Ax) from 0.1.19 to 0.1.20.
- [Release notes](https://github.com/facebook/Ax/releases)
- [Changelog](https://github.com/facebook/Ax/blob/master/CHANGELOG.md)
- [Commits](https://github.com/facebook/Ax/compare/0.1.19...v0.1.20)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-03-01 12:27:45 +01:00
Kai Fricke
7f9340bb2f
[tune] Add leading zeros to checkpoint directory (#14152)
* [tune] Add leading zeros to checkpoint directory

* Fix exp analysis tests/support string indices

* Fix tests

* RLLib tests
2021-03-01 12:12:19 +01:00
Kai Fricke
8572774304
[tune] Lookup flat key first before trying to split (#14388) 2021-03-01 12:11:03 +01:00
qicosmos
277b6f5d3c
Support arbitrary arguments for c++ worker normal tasks and actor tasks (#14233) 2021-03-01 16:27:03 +08:00
niole
be9a584a94
[Docs] Remove version reference in dashboard proxy docs (#14359) 2021-02-27 21:06:25 -08:00
Kai Yang
e0e8918d60
[Core] Raylet to pick the node manager port (#14349) 2021-02-27 20:27:09 +08:00
Ian Rodney
8cfaea5fc5
[Docker] Make Docker Build Python file easier to use! (#14223) 2021-02-26 15:23:02 -08:00
Kai Fricke
b1d0aa9798
Add unit test for ray cluster-dump (#14389) 2021-02-26 14:40:09 -08:00
architkulkarni
f9364b1d5c
[Serve] Add logger with backend and replica tags (#14251) 2021-02-26 12:46:19 -08:00
SangBin Cho
2b5b0dd3fc
[Core] Fix the issue with duplicated args (#14329) 2021-02-26 12:42:58 -08:00
Clark Zinzow
17ae694405
Consolidate Bazel build and test action_env configs to prevent analysis cache discarding. (#14362) 2021-02-26 11:14:02 -08:00
Clark Zinzow
6b37720c6a
[Core] Locality-aware leasing: Milestone 4 - Borrowed refs. (#14296)
* Adds locality-aware leasing for borrowed refs.

* Added tests.
2021-02-26 10:36:12 -08:00
Simon Mo
af085ed8aa
[Serve] Add Perf Tuning Doc (#14334)
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
Co-authored-by: architkulkarni <architkulkarni@users.noreply.github.com>
2021-02-26 10:28:02 -08:00
Ian Rodney
e1117ebc8d
[Autoscaler] Fix GCP User Inconsistency (#14364) 2021-02-26 10:12:46 -08:00
Amog Kamsetty
09bfcb2a0a
make experiment name configurable (#14373) 2021-02-26 08:45:52 -08:00
Raphael CHEN
8cedd16f44
[tune] Correctly validate nested metrics (#14375)
* [tune] Correctly validate nested metrics

Before:
- Nested metrics couldn't pass validation process, since the nested result was used to validate metrics

After:
- Flattened result is used to validate metrics

* Fix BO test and lint

Co-authored-by: Kai Fricke <kai@anyscale.com>
2021-02-26 14:00:06 +01:00
Kai Fricke
4014168928
[tune] Introduce durable() wrapper to convert trainables into durable trainables (#14306)
* [tune] Introduce `durable()` wrapper to convert trainables into durable trainables

* Fix wrong check

* Improve docs, add FAQ for tackling overhead

* Fix bugs in `tune.with_parameters`

* Update doc/source/tune/api_docs/trainable.rst

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>

* Update doc/source/tune/_tutorials/_faq.rst

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-02-26 13:59:28 +01:00
Simon Mo
f1c8c8d12f
Bump protobuf to the latest version (#14365) 2021-02-25 20:59:18 -08:00
Richard Liaw
3e9ff91218
Revert the reverted heartbeat factor PR (check windows build) (#14341) 2021-02-25 20:52:12 -08:00
Clark Zinzow
b844548b57
[dask-on-ray] Adds support for dask.persist() with inlined Ray futures. (#14294)
* Adds support for dask.persist() with inlined Ray futures.

* Update persist test.

* Add patched dask.persist() documentation.
2021-02-25 17:48:47 -08:00
Xianyang Liu
34a9714dda
[docker] Fix docker 'development' build failure (#13289) 2021-02-25 14:57:30 -08:00
Richard Liaw
a2d2275ee1
Revert "[RLlib + Tune] Add placement group support to RLlib. (#14289)" (#14360)
This reverts commit 6cd0cd3bd9.
2021-02-25 14:27:35 -08:00
Sven Mika
4cd5c1da2c
[RLlib] Remove flaky test case for mixed (tf+torch) policies trainer. (#14357) 2021-02-25 14:07:05 -08:00
architkulkarni
ba4b7ccfe8
[Serve] [Doc] Add basic Serve tutorial (#14256) 2021-02-25 14:10:08 -06:00
Guy Khazma
e3f3269b15
[doc] Fixes to RayDP docs (#14309)
* minor fix to raydp docs

* fix pytorch and tensorflow samples

* fix: minor fixes
2021-02-25 11:23:10 -08:00
Sven Mika
6cd0cd3bd9
[RLlib + Tune] Add placement group support to RLlib. (#14289) 2021-02-25 16:01:31 +01:00
Sven Mika
8000258333
[RLlib] R2D2 Implementation. (#13933) 2021-02-25 12:18:11 +01:00
SangBin Cho
4357055305
[Shuffle] Emulate multi node in shuffle.py (#14331)
* done.

* Formatting.

* done.

* Addressed code review.

* Addressed code review 2.
2021-02-24 23:49:29 -08:00
Kai Fricke
d9e5d5f47a
[RLlib] Cast fcnet_hiddens to list for DQN models (list vs tuple mismatch error) (#14308) 2021-02-25 08:06:08 +01:00
Eric Liang
adbdacae58
add more io workers (#14330) 2021-02-24 22:00:31 -08:00
Clark Zinzow
c1a1be1da6
[Core] Locality-aware leasing: Milestone 2 - Owned refs, cached locations (#14282)
* Adds locality-aware leasing for cached owned refs.

* Add tests for locality-aware leasing on cached owned refs.
2021-02-24 21:24:10 -08:00
Hao Zhang
11e721c9b3
[Collective] Address some comments and minor updates before merging multistream (#14302) 2021-02-24 20:43:42 -08:00
Kathryn Zhou
456d9aab47
Add Cypress test for Ray Dashboard (#14253) 2021-02-24 20:41:52 -08:00
Richard Liaw
80657e5dfe
Revert "[Core]Pull off timers out of heartbeat in raylet (#13963)" (#14319) 2021-02-24 19:44:31 -08:00
ZhuSenlin
be28e8fae4
Use iterator instead of operator[] to avoid garbage (#14275) 2021-02-25 11:37:36 +08:00
niole
488f63efe3
[Dashboard] Make requests sent by the dashboard reverse proxy compatible (#14012) 2021-02-24 18:31:59 -08:00
architkulkarni
ef96193b8b
fix servehandle docstring for sync/async (#14312) 2021-02-24 16:41:15 -08:00
Kai Fricke
021ed92e8a
Add debug_state.txt to cluster dump (#14310) 2021-02-24 22:47:26 +01:00
dependabot[bot]
aa36a6622d
[tune](deps): Bump xgboost in /python/requirements (#14225)
Bumps [xgboost](https://github.com/dmlc/xgboost) from 1.3.0.post0 to 1.3.3.
- [Release notes](https://github.com/dmlc/xgboost/releases)
- [Changelog](https://github.com/dmlc/xgboost/blob/master/NEWS.md)
- [Commits](https://github.com/dmlc/xgboost/commits/v1.3.3)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-02-24 13:43:19 -08:00
Richard Liaw
4dd5c9e541
[tune] fix placement group timeout (#14313)
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
2021-02-24 13:35:13 -08:00
Richard Liaw
fd128a4533
disable object-spilling test (#14318) 2021-02-24 12:22:25 -08:00
Clark Zinzow
c867054f0c
Skip GCS fault-tolerance test on Windows. (#14311) 2021-02-24 11:44:41 -08:00
Eric Liang
4bae0c9228
[client] Allow ignoring version mismatch with env var for debugging (#14295) 2021-02-24 11:36:16 -08:00
Ameer Haj Ali
5155673404
Set STATUS_UNINITIALIZED tag when launching head node (#14293)
* prepare for head node

* move command runner interface outside _private

* remove space

* Eric

* flake

* min_workers in multi node type

* fixing edge cases

* eric not idle

* fix target_workers to consider min_workers of node types

* idle timeout

* minor

* minor fix

* test

* lint

* eric v2

* eric 3

* min_workers constraint before bin packing

* Update resource_demand_scheduler.py

* Revert "Update resource_demand_scheduler.py"

This reverts commit 818a63a2c86d8437b3ef21c5035d701c1d1127b5.

* reducing diff

* make get_nodes_to_launch return a dict

* merge

* weird merge fix

* auto fill instance types for AWS

* Alex/Eric

* Update doc/source/cluster/autoscaling.rst

* merge autofill and input from user

* logger.exception

* make the yaml use the default autofill

* docs Eric

* remove test_autoscaler_yaml from windows tests

* lets try changing the test a bit

* return test

* lets see

* edward

* Limit max launch concurrency

* commenting frac TODO

* move to resource demand scheduler

* use STATUS UP TO DATE

* Eric

* make logger of gc freed refs debug instead of info

* add cluster name to docker mount prefix directory

* grrR

* fix tests

* moving docker directory to sdk

* move the import to prevent circular dependency

* small fix

* ian

* fix max launch concurrency bug to assume failing nodes as pending and consider only load_metric's connected nodes as running

* small fix

* huh?

* set initialized status for head when launching head node

* test

* patch

* fix lint

Co-authored-by: Ameer Haj Ali <ameerhajali@ameers-mbp.lan>
Co-authored-by: Alex Wu <alex@anyscale.io>
Co-authored-by: Alex Wu <itswu.alex@gmail.com>
Co-authored-by: Eric Liang <ekhliang@gmail.com>
Co-authored-by: Ameer Haj Ali <ameerhajali@Ameers-MacBook-Pro.local>
2021-02-24 18:34:05 +02:00
dependabot[bot]
94d9e0f35d
[tune](deps): Bump torchvision from 0.8.1 to 0.8.2 in /python/requirements (#14226)
* [tune](deps): Bump torchvision in /python/requirements

Bumps [torchvision](https://github.com/pytorch/vision) from 0.8.1 to 0.8.2.
- [Release notes](https://github.com/pytorch/vision/releases)
- [Commits](https://github.com/pytorch/vision/compare/v0.8.1...v0.8.2)

Signed-off-by: dependabot[bot] <support@github.com>

* Update requirements_tune.txt

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>
2021-02-24 16:36:12 +01:00
fangfengbin
482a00278b
[GCS]Fix flaky testcase: ServiceBasedGcsClientTest (#14248) 2021-02-24 20:35:30 +08:00
Tao Wang
6af0291347
[Core]Pull off timers out of heartbeat in raylet (#13963) 2021-02-24 11:59:13 +08:00