4760 commits

Author SHA1 Message Date
Chris K. W
a33cbec12a
[client][docs] update docs for new client support in init (#17333)
* start

* check formatting

* undo changes from base branch

* Client builder API docs

* indent

* 8

* minor fixes

* absolute path to runtime env docs

* fix runtime_env link

* Update worker.init docs

* drop clientbuilder docs, link to 1.4.1 docs instead. Specify local:// behavior when address passed

* add debug info for ray.init("local")

* local:// attaches a driver directly

* update ray.init return wording

* remove init.connect() from example

* drop local:// docs, add section on when to use ray client

* link to 1.4.1 docs in code example instead of mentioning clientbuilder

* fix backticks, doc mentions of ray.util.connect

* remove ray.util.connect mentions from examples and comments

* update tune example

* wording

* localhost:<port> also works if you're on the head node

* add quotes

* drop mentions of ray client from ray.init docstring

* local->remote

* fix section ref

* update ray start output

* fix section link

* try to fix doc again

* fix link wording

* drop local:// from docs and special handling from code

* update ray start message

* lint

* doc lint

* remove local:// codepath

* remove 'internal_config'

* Update doc/source/cluster/ray-client.rst

Co-authored-by: Ameer Haj Ali <ameerh@berkeley.edu>

* doc suggestion

* Update doc/source/cluster/ray-client.rst

Co-authored-by: Ameer Haj Ali <ameerh@berkeley.edu>
2021-08-04 05:31:44 +03:00
James Mishra
6240d22060
Validate Redis addresses before making the client (#17481)
2021-08-03 16:56:53 -07:00
Siyuan (Ryans) Zhuang
bef519b373
[Workflow] Simplify storage and bug fix (#17453)
* simplify storage

* bug fix

* use a key-value like naming

* update workflow API

* fix s3

* add test
2021-08-03 16:38:54 -07:00
Ian Rodney
f3acae6eb6
[Autoscaler] Sync Files before Starting Docker (#17361)
2021-08-03 13:25:08 -07:00
Alex Wu
8efa6be913
[Dataset] Fix reading parquet from gcs (#17528)
* .

* .

* comments

Co-authored-by: Alex Wu <alex@anyscale.com>
2021-08-03 10:10:42 -07:00
Sasha Sobol
5dbbaf7261
[autoscaler] Enforce per-node-type max workers (#17352)
* Enforce per-node-type max workers

* type annotation

Co-authored-by: Ameer Haj Ali <ameerh@berkeley.edu>

* cleanup. comments. type annotations

* additional type annotation

Co-authored-by: Ameer Haj Ali <ameerh@berkeley.edu>

* additional cleanup. comments. type annotations

* _get_nodes_needed_for_request_resources to use FrozenSet

* comments

* whitespace

* [Placement Group] Fix resource index assignment between with bundle index and without bundle index pg (#17318)

* [serve] Add Ray API stability annotations (#17295)

* Support streaming output of runtime env setup to logger/driver (#17306)

* [SGD] v2 prototype: ``WorkerGroup`` implementation (#17330)

* wip

* formatting

* increase timeouts

* address comments

* comments

* fix

* address comments

* Update python/ray/util/sgd/v2/worker_group.py

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>

* Update python/ray/util/sgd/v2/worker_group.py

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>

* address comments

* formatting

* fix

* avoid race condition

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>

* [RLlib] Discussion 3001: Fix comment on internal state shape (must be [B x S=state dim]). (#17341)

* [autoscaler] GCP TPU VM autoscaler (#17278)

* [Rllib] set self._allow_unknown_config (#17335)

Co-authored-by: Sven Mika <sven@anyscale.io>

* [RLlib] Discussion 2294: Custom vector env example and fix. (#16083)

* [docs] Link broken in Tune's page (#17394) (#17407)

* [Serve] Fix response_model for class based view routes as well (#17376)

* [serve] Fix single deployment nightly test (#17368)

* [RLlib] SAC tuple observation space fix (#17356)

* Support schema on read for csv/json (#17354)

* [RLlib] New and changed version of parametric actions cartpole example + small suggested update in policy_client.py (#15664)

* [gcs] Fix GCS related issues: ByteSizeLong and redis connection (#17373)

* [runtime_env] Gracefully fail tasks when an environment fails to be set up (#17249)

* [docs] update docs with pip requirements (#17317)

* removed nodes_to_keep. cleanup

* formatting

* +comment

* treat max_workers=0 as 0 workers (as opposed to unlimited)

* fix wrong comment

* warning for inconsistent config

* terminate nodes with no matching node type right away

* quotes

* special handling for head node when enforcing max_workers per type. tests. cleanup

* cleanup comments and prints

* comments

* cleanup. removed special handling of head node.

* adding an explicit non-None check in schedule_node_termination

* raise the exception

Co-authored-by: Ameer Haj Ali <ameerh@berkeley.edu>

Co-authored-by: Ameer Haj Ali <ameerh@berkeley.edu>
Co-authored-by: DK.Pino <loushang.ls@antfin.com>
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
Co-authored-by: Sven Mika <sven@anyscale.io>
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
Co-authored-by: Rohan138 <66227218+Rohan138@users.noreply.github.com>
Co-authored-by: amavilla <takashi.tameshige.jj@hitachi.com>
Co-authored-by: Jiao <sophchess@gmail.com>
Co-authored-by: Julius Frost <33183774+juliusfrost@users.noreply.github.com>
Co-authored-by: Eric Liang <ekhliang@gmail.com>
Co-authored-by: kk-55 <63732956+kk-55@users.noreply.github.com>
Co-authored-by: Yi Cheng <74173148+iycheng@users.noreply.github.com>
Co-authored-by: matthewdeng <matt@anyscale.com>
2021-08-03 11:31:32 -04:00
Antoni Baum
c40555c82b
[tune] Add define-by-run support to OptunaSearcher (#17464)
2021-08-03 16:11:58 +01:00
Antoni Baum
df2fce9ab6
[tune] Allow to pass searcher/scheduler string names to tune.run (#17517)
2021-08-03 09:28:03 +01:00
Eric Liang
f9552765cb
Avoid re-exporting same function repeatedly in dataset (#17522)
2021-08-02 18:15:25 -07:00
SangBin Cho
f1ccadbb27
Skip flaky windows object spilling tests (#17510)
2021-08-02 15:53:07 -07:00
matthewdeng
e89195bfb9
[SGD] add SGDv2 Trainer prototype implementation (#17440)
* wip

* formatting

* increase timeouts

* wip

* address comments

* comments

* fix

* address comments

* Update python/ray/util/sgd/v2/worker_group.py

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>

* Update python/ray/util/sgd/v2/worker_group.py

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>

* address comments

* formatting

* fix

* wip

* finish

* fix

* formatting

* remove reporting

* split TorchBackend

* fix tests

* address comments

* add file

* more fixes

* remove default value

* update run method doc

* add comment

* minor doc fixes

* lint

* add args to BaseWorker.execute

* address comments

* remove extra parentheses

* properly instantiate backend

* fix some of the tests

* fix torch setup

* fix type hint

* [SGD] add SGDv2 Trainer prototype implementation

* add fashion mnist test

* add HuggingFace example

* format

* formatting

* address comment

* address comments

* update comment

* Update python/ray/util/sgd/v2/examples/transformers/cluster.yaml

Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>

* update huggingface transformers

* update hugging face transformers

* fix shutdown on worker failure

* Update python/requirements/tune/requirements_tune.txt

* Update python/requirements/tune/requirements_tune.txt

* Update python/requirements/tune/requirements_tune.txt

* Update python/requirements/tune/requirements_tune.txt

* address comment and fix test

Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com>
Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
2021-08-02 15:27:42 -07:00
Eric Liang
748cbbb23d
[hotfix] Parquet S3 reads broken due to pyarrow.lib.ArrowInvalid: S3 subsystem not initialized (#17492)
2021-08-02 11:48:48 -07:00
Ian Rodney
acde351cba
[GCP][GPU] Specify GPU name, not the full URL (#17409)
* convert gpu_name to URL

* update examples

* comment about scheduling

* fix node.py

* add test
2021-08-02 11:01:24 -04:00
Ian Rodney
b26ba7ba9e
[Dashboard] Allow Agent HTTP listening port to be specified. (#17392)
2021-08-02 02:09:50 -07:00
Richard Liaw
ecc7cf4c5e
[sgd] v2 documentation draft (#17253)
Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
Co-authored-by: Matthew Deng <matthew.j.deng@gmail.com>
Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com>
2021-08-02 01:47:14 -07:00
Eric Liang
e812691909
Support top-level tensor values in dataset (#17439)
2021-08-01 22:45:21 -07:00
Alex Wu
d9cd3800c7
Dataset speed up read (#17435)
2021-08-01 18:03:46 -07:00
Ivorius
6703091cdc
[Docs] Update example-full.yaml for ulimits as supported by docker. (#17408)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-08-01 01:36:16 -07:00
matthewdeng
3a1aed28b7
[torch] fix process group timedelta (#17468)
2021-07-30 15:47:33 -07:00
SangBin Cho
9a696cc66a
Pin aioredis version (#17472)
2021-07-30 12:04:14 -07:00
Kai Fricke
b0f00b1b4b
[default] pin aioredis < 2 (#17465)
2021-07-30 17:57:17 +01:00
Eric Liang
cd13059691
[dataset] Implement random_shuffle() and split(equal=True) (#17448)
2021-07-30 09:51:21 -07:00
Patrick Ames
131710f9f9
[autoscaler] Add support for EC2 launch templates. (#17236)
2021-07-30 08:05:59 -07:00
wanxing
705248f4ee
[CoreWorker] Remove plasma_objects_only parameter (#17384)
2021-07-30 14:48:36 +08:00
matthewdeng
58c4fe727c
[SGD] TrainerV2 API interface (#17447)
Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com>
2021-07-29 19:39:39 -07:00
Eric Liang
0373c54b3e
Add warning if get_gpu_ids() is called on the driver. (#17436)
2021-07-29 19:39:22 -07:00
Siyuan (Ryans) Zhuang
17c25345d0
[Workflow] Virtual actor writer - Part 2 (#17336)
* virtual actor writer

pass step_type around

simplify readonly actor

return different thing for a virtual actor

return state and output

WorkflowExecutionResult

simplify workflow execution

initial virtual actor writer

workflow_ref deeper integration

resume a step of a workflow

cache step output

Support dynamic workflow ref

* fix recovery tests

* fix

* fix get_output

* better error message

* pressure test

* fix

* verbose error message

* verbose error message

* fix get_cached_step issue

* update tests

* simplify readonly virtual actor

* fix storage tests

* workflow.resume returns state of an actor

* fix verbose

* fix comment

* make it more clear by renaming

* comment

* test init error in virtual actor

* update docs

* update docs

* update test_actor_manager/list_all

* fix comment
2021-07-29 19:29:28 -07:00
Amog Kamsetty
ff04a923ea
[SGD] v2 prototype: BackendExecutor and TorchBackend implementation (#17357)
* wip

* formatting

* increase timeouts

* wip

* address comments

* comments

* fix

* address comments

* Update python/ray/util/sgd/v2/worker_group.py

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>

* Update python/ray/util/sgd/v2/worker_group.py

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>

* address comments

* formatting

* fix

* wip

* finish

* fix

* formatting

* remove reporting

* split TorchBackend

* fix tests

* address comments

* add file

* more fixes

* remove default value

* update run method doc

* add comment

* minor doc fixes

* lint

* add args to BaseWorker.execute

* address comments

* remove extra parentheses

* properly instantiate backend

* fix some of the tests

* fix torch setup

* fix type hint

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
2021-07-29 14:38:44 -07:00
Kai Fricke
44d209dd5f
[tune] re-enable tensorboardx without torch installed (#17403)
2021-07-29 10:39:38 +01:00
xwjiang2010
93d12b1b5e
[CLI] Fix ray submit when --stop is supplied. (#17385)
* [cli] Fix `ray submit` when --stop is supplied.

* syntax sugar.
2021-07-29 00:01:25 -07:00
Eric Liang
7ed62ea0ad
Initial implementation of Dataset pipelining and docs (#17309)
2021-07-28 21:12:01 -07:00
Eric Liang
9b4bcb3bc2
[hotfix] Fix merge conflict that caused test_dataset to fail.
2021-07-28 14:58:31 -07:00
Edward Oakes
7007c6271d
[runtime_env] Gracefully fail tasks when an environment fails to be set up (#17249)
2021-07-28 15:25:02 -05:00
Yi Cheng
72abf81900
[gcs] Fix GCS related issues: ByteSizeLong and redis connection (#17373)
2021-07-28 13:01:54 -07:00
Eric Liang
4ffa549041
Support schema on read for csv/json (#17354)
2021-07-28 10:59:52 -07:00
Simon Mo
db126b24b9
[Serve] Fix response_model for class based view routes as well (#17376)
2021-07-28 09:31:02 -07:00
Antoni Baum
1f35470560
[autoscaler] GCP TPU VM autoscaler (#17278)
2021-07-27 21:24:29 -07:00
Amog Kamsetty
d01e1c15c8
[SGD] v2 prototype: `WorkerGroup` implementation (#17330)
* wip

* formatting

* increase timeouts

* address comments

* comments

* fix

* address comments

* Update python/ray/util/sgd/v2/worker_group.py

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>

* Update python/ray/util/sgd/v2/worker_group.py

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>

* address comments

* formatting

* fix

* avoid race condition

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-07-27 17:36:38 -07:00
Simon Mo
4a4210a083
Support streaming output of runtime env setup to logger/driver (#17306)
2021-07-27 16:39:15 -07:00
Edward Oakes
7225f28fff
[serve] Add Ray API stability annotations (#17295)
2021-07-27 16:00:15 -05:00
DK.Pino
2699b0f3ab
[Placement Group] Fix resource index assignment between with bundle index and without bundle index pg (#17318)
2021-07-27 13:51:02 -07:00
Alex Wu
5879e3132e
[Dataset] Support compressed files (#17355)
* .

* lint

* .

Co-authored-by: Alex Wu <alex@anyscale.com>
2021-07-27 12:35:16 -07:00
Eric Liang
e70d84953e
[hotfix] Dataset tests accidentally disabled
2021-07-27 10:40:15 -07:00
Frank Luan
a6e8497dc9
[Dataset] Sort (#17142)
2021-07-27 01:53:53 -07:00
fyrestone
57b9b1bb0f
[Dashboard] Use a dedicated RPC to check the GCS is alive (#16330)
* Dashboard check gcs is alive

* Fix dashboard hangs at exit

* ray health-check call GCS CheckAlive

* Minor fixes

Co-authored-by: 刘宝 <po.lb@antfin.com>
2021-07-27 14:05:44 +08:00
Richard Liaw
597dc08dfe
Revert "Revert "[core] remove opencensus/prometheus_exporter dependencies"" (#17254)
* Revert "Revert "[core] remove opencensus/prometheus_exporter dependencies" (#17251)"

This reverts commit 7b44dd8ecb.

* Lint

* Fix more imports

Co-authored-by: Kai Fricke <kai@anyscale.com>
2021-07-26 21:09:25 -07:00
Dmitri Gekhtman
d0e58af075
[autoscaler] Avoid race in no-updaters logic (#17328)
* Extra logic and test

* anglish
2021-07-26 16:05:33 -04:00
dependabot[bot]
4bf377ee4b
[tune](deps): Bump gym[atari] in /python/requirements/tune (#17199)
Bumps [gym[atari]](https://github.com/openai/gym) from 0.18.0 to 0.18.3.
- [Release notes](https://github.com/openai/gym/releases)
- [Commits](https://github.com/openai/gym/compare/0.18.0...0.18.3)

---
updated-dependencies:
- dependency-name: gym[atari]
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-07-26 10:53:41 -07:00
architkulkarni
756a4e7a90
[Core] [runtime env] update tests to use ray.init(runtime_env=...) and add e2e test (#17232)
2021-07-26 11:21:30 -05:00
Tao Wang
d98ec7fc4d
Remove libray_redis_module (#17283)
2021-07-25 23:15:29 -07:00