Edward Oakes
be974a6596
[metrics] Only put live nodes in prometheus service discovery file ( #14495 )
2021-03-04 16:17:00 -06:00
Eric Liang
2cf4c7253c
[ray client] Fix ctrl-c for ray.get() by setting a short-server side timeout ( #14425 )
2021-03-04 10:36:42 -08:00
Ian Rodney
759892740a
[Autoscaler] chown Ray_bootstrap Files in DockerCommandRunner ( #14380 )
2021-03-03 19:13:20 -08:00
Antoine Galataud
460c2757a3
Allow assigning weight to var with close name ( #14109 )
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-03-03 19:11:34 -08:00
Eric Liang
99a63b3dd1
Remove old scheduler and friends ( #14184 )
2021-03-03 18:29:15 -08:00
Richard Liaw
dba533dd84
Disable more torch ( #14480 )
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
2021-03-03 15:46:32 -08:00
tchordia
e40dc3a3e9
[serve] Better validation for arguments to client.start() ( #14327 )
2021-03-03 14:33:36 -08:00
Richard Liaw
60a8b67488
Disable mnist tests ( #14474 )
2021-03-03 13:25:01 -08:00
Hao Zhang
4135b0eb4a
[Collective] Supporting multistream, stream pool, and CUDA events. ( #14127 )
Co-authored-by: fustinose <fustinosej@gmail.com>
2021-03-03 09:53:45 -08:00
SangBin Cho
a04ab9b472
[Core] Fix ray memory bug ( #14452 )
* ray memory bug
* Fix ray memory issue.
* done.
2021-03-03 09:20:00 -08:00
SangBin Cho
1d2136959f
[Core] Fix port issue ( #14435 )
* Initial impl.
* Update.
* fixed a bug.
* Fix all the issues.
* Addressed code review.
* Addressed code review.
* Fix a test failure.
2021-03-03 09:16:00 -08:00
Xianyang Liu
fc9182e63c
Fixes autoscaling monitor when environment has set http_proxy or https_proxy ( #14351 )
2021-03-03 18:22:53 +02:00
Sven Mika
5637d89ecc
[RLlib] Serve + RLlib example script. ( #14416 )
2021-03-03 14:33:03 +01:00
Antoni Baum
85a092c3d7
[Tune] Fix HEBO evaluated rewards for max mode & save/restore ( #14427 )
* Fix HEBO evaluated rewards for max mode
* Lint
* Make sure everything necessary is saved
2021-03-03 09:44:43 +01:00
Richard Liaw
63c2b7356e
Disable windows tests for test_iter and test_reference_counting ( #14455 )
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
2021-03-03 00:39:59 -08:00
fangfengbin
1054613da1
[Core]Fix ray.kill doesn't cancel pending actor bug ( #14154 )
Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>
2021-03-03 16:12:32 +08:00
Stephanie Wang
5c6c9d5b91
[core] Spill tasks from waiting queue ( #14288 )
* Spill back waiting tasks
* test
* test
* todo
* Avoid iterating over args
* update
* lint
* Fix test
* test
* Test force spillback
* Unit test resource scheduler
* test
* travis?
* rename
* debug
* revert flaky test
* lint
* fix test
* fix
2021-03-02 22:30:02 -08:00
Dmitri Gekhtman
1675156a8b
[autoscaler][interface] Use multi node types in defaults.yaml and example-full.yaml ( #14239 )
* random doc typo
* example-full-multi
* left off max workers
* wip
* address comments, modify defaults, wip
* fix
* wip
* reformat more things
* undo useless diff
* space
* max workers
* space
* copy-paste mishaps
* space
* More copy-paste mishaps
* copy-paste issues, space, max_workers
* head_node_type
* legacy yamls
* line undeleted
* correct-gpu
* Remove redundant GPU example.
* Extraneous comment
* whitespace
* example-java.yaml
* Revert "example-java.yaml"
This reverts commit 1e9c0124b9d97e651aaeeb6ec5bf7a4ef2a2df17.
* tests and other things
* doc
* doc
* revert max worker default
* Kubernetes comment
* wip
* wip
* tweak
* Address comments
* test_resource_demand_scheduler fixes
* Head type min/max workers, aws resources
* fix example_cluster2.yaml
* Fix external node type test (compatibility with legacy-style external node types)
* fix test_autoscaler_aws
* gcp-images
* gcp node type names
* fix gcp defaults
* doc format
* typo
* Skip failed Windows tests
* doc string and comment
* assert
* remove contents of default external head and worker
* legacy external failed validation test
* Readability -- define the minimal external config at the top of the file.
* Remove default worker type min worker
* Remove extraneous global min_workers comment.
* per-node-type docker in aws/example-gpu-docker
* ray.worker.small -> ray.worker.default
* fix-docker
* fix gpu docker again
* undo kubernetes experiment
* fix doc
* remove worker max_worker from kubernetes
* remove max_worker from local worker node type
* fix doc again
* py38
* eric-comment
* fix cluster name
* fix-test-autoscaler
* legacy config logic
* pop resources
* Remove min_workers AFTER merge
* comment, warning message
* warning, comment
2021-03-03 06:16:19 +02:00
Eric Liang
ef873be9e8
Require opt-in to switching plasma to /tmp instead of /dev/shm ( #14451 )
2021-03-02 16:44:33 -08:00
Richard Liaw
d92c00e233
Pin autogluon.core for builds ( #14448 )
2021-03-02 15:55:03 -08:00
Kai Fricke
47603045f9
[tune] Move Optuna to ask/tell interface ( #14387 )
2021-03-02 15:35:11 -08:00
SangBin Cho
bacbdd297b
[Core] Do not unregister workers that own objects by worker capping mechanism. ( #14408 )
* Almost done.
* Initial implementation done.
* Fix issue.
* Addressed the initial code review.
* improve comments.
* Addressed code review.
* Adding unit tests.
* Complete unit tests.
* Resolve all issues.
* Fix issues.
2021-03-02 12:24:22 -08:00
Edward Oakes
b7516ef667
hide CLI option for redis shard ports ( #14434 )
2021-03-02 11:06:34 -08:00
Alex Wu
4572c6cf0f
[autoscaler] Fix tag cache bug, don't kill workers on error ( #14424 )
2021-03-02 11:06:06 -08:00
Yi Cheng
d921dca075
[core] Fixing bug when dispatching tasks to deleted placement group ( #14300 )
2021-03-02 10:24:53 -08:00
Stephanie Wang
a24ac13671
[core] Randomize actor ID to avoid collisions ( #14358 )
* Randomize actor ID
* Mix index and current time, add python test
* test
* nanos
2021-03-02 10:00:28 -08:00
SangBin Cho
09fd38ede1
[Multi node shuffle] More efficient ray memory --stats-only ( #14423 )
* Done.
* Fix all the issues.
2021-03-01 23:14:06 -08:00
Dmitri Gekhtman
58c0959ea7
[kubernetes][docs][minor] Move Kubernetes example scripts to docs ( #14412 )
2021-03-01 20:17:16 -08:00
Amog Kamsetty
ca11b189b8
[Tune] use epoch for ptl checkpoint dir name ( #14392 )
* use epoch for dir name
* use formatted string
2021-03-01 20:14:35 -08:00
Eric Liang
9db000ff2c
Auto report object store memory usage; remove some deprecated code ( #14260 )
2021-03-01 13:19:44 -08:00
Edward Oakes
ff00a89927
Enable test_async_goal_manager ( #14419 )
2021-03-01 14:20:28 -06:00
Barak Michener
2a28585bb3
[ray_client]: Add architecture doc ( #14265 )
2021-03-01 10:56:11 -08:00
Ian Rodney
9125b6bca3
[Autoscaler][GCP] Use Python3.8 in defaults.yaml ( #14417 )
2021-03-01 10:50:39 -08:00
Micah Yong
db0c16824c
[Dashboard][CLI] Ray memory parity with dashboard 2 ( #13444 )
* Minor improvements in Ray Core Walkthrough as seen in https://github.com/ray-project/ray/issues/12472
* Define node_stats() to return NodeStats object from cluster
* Add --group-by and --sort-by capabilities to ray memory script
* Resolve merge conflict
* Add helper functions for group by and sorting type in memory_utils.py
* Reformat
* Format
* Compartmentalize memory script into get_memory_summary and get_store_stats_summary
* Modify unit tests in test_mem_stat
* Lint and format
* Test cases for group_by sort_by
* Lint and format
* Fix actor handle failing test case
* Update test_memstat.py
* Resolve merge conflicts
* Adjust ray memory output based on terminal size
* Formatting and linting
* Use constant for callsite length
* Switch from OS to shutil for querying terminal size (official python support)
* Linting and formatting
* Lint and format
* Resolve lint issue in walkthrough.rst
* Revert to python 3.6
* Delete visitor.py
It was accidentally included in most recent commit
* Delete .eggs
It was accidentally included in most recent commit
* Resolve test_object_spilling.py test case
* Add stats only argument
* revert changes on this file
* Remove package-lock.json
* Add back npm installation
* Sync package-lock.json
* Linting and formatting
* Sync with package-lock
* Sync with package-lock pt 2
* Update documentation in https://docs.ray.io/en/master/memory-management.html
* Add include_memory_info as argument for node_stats
* Switch object ref and call site positions
* Linting and formatting
* Change from MiB to B
* Change from stats-only to store-true
* Add memory test case
* Add memory test case
* Lint and format
* Correct test in memstat
* Change line wrap and stats only to flags
* Clarify --stats-only and --no-format in ray memory
* --stats-only description modified
Co-authored-by: Micah Yong <micahyong@Micahs-MacBook-Pro.local>
2021-03-01 09:27:22 -08:00
Raphael CHEN
343ebf8ea7
[tune] Checkpoint according to nested metric ( #14379 )
2021-03-01 17:14:39 +01:00
dependabot[bot]
cda4ad044a
[tune](deps): Bump mlflow from 1.13.1 to 1.14.0 in /python/requirements ( #14396 )
Bumps [mlflow](https://github.com/mlflow/mlflow ) from 1.13.1 to 1.14.0.
- [Release notes](https://github.com/mlflow/mlflow/releases )
- [Changelog](https://github.com/mlflow/mlflow/blob/master/CHANGELOG.rst )
- [Commits](https://github.com/mlflow/mlflow/compare/v1.13.1...v1.14.0 )
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-03-01 12:28:15 +01:00
dependabot[bot]
c925e8d14c
[tune](deps): Bump ax-platform in /python/requirements ( #14398 )
Bumps [ax-platform](https://github.com/facebook/Ax ) from 0.1.19 to 0.1.20.
- [Release notes](https://github.com/facebook/Ax/releases )
- [Changelog](https://github.com/facebook/Ax/blob/master/CHANGELOG.md )
- [Commits](https://github.com/facebook/Ax/compare/0.1.19...v0.1.20 )
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-03-01 12:27:45 +01:00
Kai Fricke
7f9340bb2f
[tune] Add leading zeros to checkpoint directory ( #14152 )
* [tune] Add leading zeros to checkpoint directory
* Fix exp analysis tests/support string indices
* Fix tests
* RLLib tests
2021-03-01 12:12:19 +01:00
Kai Fricke
8572774304
[tune] Lookup flat key first before trying to split ( #14388 )
2021-03-01 12:11:03 +01:00
Kai Yang
e0e8918d60
[Core] Raylet to pick the node manager port ( #14349 )
2021-02-27 20:27:09 +08:00
Kai Fricke
b1d0aa9798
Add unit test for ray cluster-dump ( #14389 )
2021-02-26 14:40:09 -08:00
architkulkarni
f9364b1d5c
[Serve] Add logger with backend and replica tags ( #14251 )
2021-02-26 12:46:19 -08:00
SangBin Cho
2b5b0dd3fc
[Core] Fix the issue with duplicated args ( #14329 )
2021-02-26 12:42:58 -08:00
Clark Zinzow
6b37720c6a
[Core] Locality-aware leasing: Milestone 4 - Borrowed refs. ( #14296 )
* Adds locality-aware leasing for borrowed refs.
* Added tests.
2021-02-26 10:36:12 -08:00
Ian Rodney
e1117ebc8d
[Autoscaler] Fix GCP User Inconsistency ( #14364 )
2021-02-26 10:12:46 -08:00
Amog Kamsetty
09bfcb2a0a
make experiment name configurable ( #14373 )
2021-02-26 08:45:52 -08:00
Raphael CHEN
8cedd16f44
[tune] Correctly validate nested metrics ( #14375 )
* [tune] Correctly validate nested metrics
Before:
- Nested metrics couldn't pass validation process, since the nested result was used to validate metrics
After:
- Flattened result is used to validate metrics
* Fix BO test and lint
Co-authored-by: Kai Fricke <kai@anyscale.com>
2021-02-26 14:00:06 +01:00
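Note on #14375: the before/after in the commit body above can be sketched as follows. This is a minimal illustration, not the code from the PR. The helper names `flatten_dict` and `metric_is_reported`, the `/` delimiter, and the sample result are all assumptions made for the example.

```python
from typing import Any, Dict


def flatten_dict(nested: Dict[str, Any], delimiter: str = "/") -> Dict[str, Any]:
    """Hypothetical helper: collapse nested dicts into one level of joined keys."""
    flat: Dict[str, Any] = {}
    for key, value in nested.items():
        if isinstance(value, dict):
            # Recurse, then prefix every child key with the parent key.
            for sub_key, sub_value in flatten_dict(value, delimiter).items():
                flat[f"{key}{delimiter}{sub_key}"] = sub_value
        else:
            flat[key] = value
    return flat


def metric_is_reported(result: Dict[str, Any], metric: str) -> bool:
    """Hypothetical check: validate the metric name against the flattened result."""
    return metric in flatten_dict(result)


# Illustrative result dict with a nested metric (assumed shape).
result = {"training_iteration": 1, "eval": {"accuracy": 0.9}}

# "Before": looking the nested metric up in the raw result fails.
assert "eval/accuracy" not in result

# "After": the same lookup against the flattened result succeeds.
assert metric_is_reported(result, "eval/accuracy")
```

Checking against the flattened view means a nested metric and a top-level metric are validated the same way.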
Kai Fricke
4014168928
[tune] Introduce `durable()` wrapper to convert trainables into durable trainables ( #14306 )
* [tune] Introduce `durable()` wrapper to convert trainables into durable trainables
* Fix wrong check
* Improve docs, add FAQ for tackling overhead
* Fix bugs in `tune.with_parameters`
* Update doc/source/tune/api_docs/trainable.rst
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
* Update doc/source/tune/_tutorials/_faq.rst
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-02-26 13:59:28 +01:00
Simon Mo
f1c8c8d12f
Bump protobuf to the latest version ( #14365 )
2021-02-25 20:59:18 -08:00
Clark Zinzow
b844548b57
[dask-on-ray] Adds support for dask.persist() with inlined Ray futures. ( #14294 )
* Adds support for dask.persist() with inlined Ray futures.
* Update persist test.
* Add patched dask.persist() documentation.
2021-02-25 17:48:47 -08:00