Commit graph

1755 commits

Author SHA1 Message Date
Sven Mika
e37afe0425
[RLlib; Docs] Auto API reference pages overhaul: rllib/policy and rllib/agents packages. (#20537) 2021-11-25 09:35:19 +01:00
Yi Cheng
e24cee80e8
[docs] add dask compatibility for 1.9.0 (#20707) 2021-11-24 15:00:17 -08:00
Guyang Song
53630ee03b
Revert "Revert "[runtime env] redefine runtime env to protobuf"" and fix windows compiling (#20692)
- Fix Windows compilation and revert https://github.com/ray-project/ray/pull/20641
- PR https://github.com/ray-project/ray/pull/20670 appears to solve the Windows compilation issue.
2021-11-24 09:01:01 -08:00
Eric Liang
163620ba94
[data] Make block splitting feature flagged off by default (#20660)
This PR feature-flags block splitting and turns it off by default. This makes it easier to debug problems potentially related to this feature. Criteria for enabling by default:
- We're confident all nightly tests pass (currently, there may be an issue with large-scale groupby with block splitting).
- We're confident lineage-based reconstruction can work with block splitting.
2021-11-23 19:46:18 -08:00
Ameer Haj Ali
e3e9697bea
[docs] autoscaler/K8s hiring roles (#20621)
* we are hiring

* fixes as philipp requested
2021-11-23 14:56:22 -08:00
Jules S. Damji
5d920fb1ee
[docs][job submission] Fixed minor editorial nits (#20654) 2021-11-22 22:06:31 -06:00
Alex Wu
9388d28233
Revert "[runtime env] redefine runtime env to protobuf" (#20641)
Reverts #19511

Breaks windows compilation
2021-11-22 13:11:30 -08:00
Kai Fricke
236951ee4c
[tune] Introduce TrialCheckpoint class, making checkpoint down/upload easier (#20585)
This PR introduces a TrialCheckpoint class which is returned e.g. by ExperimentAnalysis.best_checkpoint. The class enables easy access to cloud storage locations (rather than just local directories before). It also comes with utilities to download, upload, and save trial checkpoints to local and cloud targets.
2021-11-22 14:16:26 +00:00
matthewdeng
caa4ff3783
[train][datasets] update example and remove dask (#20592) 2021-11-21 17:06:44 -08:00
Guyang Song
ad56b9b432
[runtime env] redefine runtime env to protobuf (#19511) 2021-11-20 16:54:42 +08:00
Jiao
12c11894e8
[Jobs] Add documentation for ray job submission (#20530) 2021-11-19 16:59:05 -08:00
architkulkarni
42085fd3d5
[runtime env] [Doc] Add concepts and basic workflows (#20222)
Address followup comments from https://github.com/ray-project/ray/pull/19863
- Add short "Concepts" section
- Add more section headings to break up the text
- Add "Workflow: Local Files" example
- Add "Workflow: Library development" example
2021-11-19 13:58:50 -08:00
Chen Shen
77a8723bba
[Core][actor out-of-order execution 6/n] plumbing work to make it work e2e (#20177)
This PR is the last PR that enables out of order execution. Previous PR: #20176

In this PR specifically, we added an execute_out_of_order option to the .options call, which creates the actor with both out_of_order_submit_queue and out_of_order_scheduling queue.

This PR also adds @simon-mo's original case for testing.
2021-11-19 11:05:18 -08:00
shrekris-anyscale
b910d7e9e1
[runtime_env] Remove deprecated username-password GitHub use case from doc (#20558) 2021-11-19 10:03:44 -06:00
Sven Mika
9d5c4a9d21
[RLlib] API reference pages: rllib/env package only. (#20486) 2021-11-19 10:06:40 +01:00
Alex Wu
88266a6fce
Revert "Revert "[Docs] More detailed M1 Mac installation instructions"" (#20549)
Reverts ray-project/ray#20547
2021-11-18 20:18:37 -08:00
Eric Liang
65a8698e82
Raise the dataset block size limit to 2GiB (#20551)
The default block size of 500MiB seems too low for some common workloads, e.g. shuffling 500GB. That workload creates 1000 blocks, which means 1 million intermediate shuffle objects until we implement #20500.
2021-11-18 19:36:10 -08:00
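The arithmetic behind #20551 can be sketched as follows. This is a rough estimate, not Ray code: the function name `shuffle_footprint` is hypothetical, sizes are rounded to decimal megabytes (so 2GiB is treated as roughly 2000 MB), and it assumes a shuffle produces one intermediate object per (input block, output block) pair.

```python
from typing import Tuple


def shuffle_footprint(dataset_mb: int, block_mb: int) -> Tuple[int, int]:
    """Return (number of blocks, number of intermediate shuffle objects).

    Assumption: an all-to-all shuffle creates one intermediate object for
    every (input block, output block) pair, i.e. blocks ** 2 objects.
    """
    blocks = -(-dataset_mb // block_mb)  # ceiling division
    return blocks, blocks * blocks


# 500 GB dataset with the old ~500 MB block size.
old_blocks, old_objects = shuffle_footprint(500_000, 500)

# Same dataset with the raised limit, rounded to ~2000 MB.
new_blocks, new_objects = shuffle_footprint(500_000, 2_000)

print(old_blocks, old_objects)
print(new_blocks, new_objects)
```

Under these assumptions, quadrupling the block size cuts the block count by 4x and the intermediate object count by 16x.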
Richard Liaw
c964455642
Revert "[Docs] More detailed M1 Mac installation instructions" (#20547)
Reverts ray-project/ray#20512 due to lint errors.
2021-11-18 12:06:57 -08:00
Antoni Baum
0b14f38ac7
[tune] Multi-objective support for Optuna (#20489)
This PR adds multi-objective support for Optuna searchers, including a test and example.

Co-authored-by: gjoliver <jungong@anyscale.com>
2021-11-18 18:47:29 +00:00
Alex Wu
540c9e35d1
[Docs] More detailed M1 Mac installation instructions (#20512)
This PR adds more detail to the M1 Mac installation instructions following the bug bash.
2021-11-18 09:35:43 -08:00
Sven Mika
7a585fb275
[RLlib; Documentation] RLlib README overhaul. (#20249) 2021-11-18 18:08:40 +01:00
shrekris-anyscale
65a023ef71
[runtime_env][docs] Add documentation on using remote URIs for runtime environments (#20352) 2021-11-17 23:17:48 -06:00
Amog Kamsetty
9796ae56d5
[Train][Data] Change usages of iter_datasets to iter_epochs (#20487) 2021-11-17 18:05:51 -08:00
Yi Cheng
cbf5826040
[workflow] Fix workflow event doc typo (#20465)
In the example, it says `after_checkpoint`, but this should be `event_checkpointed`
2021-11-17 16:18:20 -08:00
Qing Wang
e01f14d7df
[DOC] Add namespace doc for Java part. (#20428)
Add namespace doc for Java part.
2021-11-17 23:02:47 +08:00
Simon Mo
18d605fa7c
[Serve] Add experimental CLI for serve deploy (#20371) 2021-11-16 20:22:09 -08:00
Larry
454db6902c
[Java] Add timeout parameter for Ray.get() API (#20282)
Why are these changes needed?

Add a timeout (ms) parameter to the Java Ray.get API. The docs have been updated to reflect the API change ([Ray Core Walkthrough]->[Fetching Results]).

e.g.:
ObjectRef<Integer> objRef = Ray.put(1);
objRef.get(1000);
Ray.get(Ray.task(MyRayApp::slowFunction).remote(), 3000);

Related issue number
#20247
2021-11-17 11:02:17 +08:00
Simon Mo
5fccad4cc9
[Serve] Add experimental pipeline docs (#20292) 2021-11-16 16:13:55 -08:00
Richard Liaw
cf357f6bce
[docs] Add a talks section for ray.data (#20444) 2021-11-16 14:30:08 -08:00
Antoni Baum
3f9ded55f7
[tune] Merge Analysis into ExperimentAnalysis (#20197)
Co-authored-by: Kai Fricke <kai@anyscale.com>
2021-11-16 16:47:12 +00:00
Amog Kamsetty
4f88796d5a
[Train] Move to beta (#20378) 2021-11-16 08:19:30 -08:00
Kai Fricke
3e6ba5d6d2
Revert "Revert [RLlib] POC: PGTrainer class that works by sub-classing, not trainer_template.py." (#20285)
* Revert "Revert "[RLlib] POC: `PGTrainer` class that works by sub-classing, not `trainer_template.py`. (#20055)" (#20284)"
This reverts commit 246787cdd9.
Co-authored-by: sven1977 <svenmika1977@gmail.com>
2021-11-16 12:26:47 +01:00
Eric Liang
460cf86858
Split blocks automatically into 500MB chunks on file read and transformation (#20235)
This PR adds support for automatic block splitting on read and map transforms, to keep block size bounded to ~500MiB. This avoids potential OOM situations where a map task may consume too much intermediate Python heap memory, or too much object store shared memory for one block.
2021-11-15 22:25:11 -08:00
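The idea of keeping block size bounded, described in #20235, can be illustrated with a minimal sketch. This is not the Ray Datasets implementation: `split_into_blocks` is a hypothetical helper, rows are modeled as plain `bytes` objects, and size is measured with `len()` rather than real in-memory footprint.

```python
from typing import Iterable, Iterator, List


def split_into_blocks(rows: Iterable[bytes], target_bytes: int) -> Iterator[List[bytes]]:
    """Greedily group rows into blocks of at most target_bytes each.

    A single row larger than target_bytes still gets a block of its own,
    so the bound is approximate rather than strict.
    """
    block: List[bytes] = []
    size = 0
    for row in rows:
        # Close the current block before it would exceed the target.
        if block and size + len(row) > target_bytes:
            yield block
            block, size = [], 0
        block.append(row)
        size += len(row)
    if block:
        yield block


# Five 4-byte rows with a 10-byte target: two rows fit per block.
blocks = list(split_into_blocks([b"abcd"] * 5, target_bytes=10))
print([len(b) for b in blocks])
```

Emitting blocks as they fill means a reader or map task never has to hold more than roughly one target-sized block of output at a time, which is the OOM concern the commit message describes.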
Antoni Baum
ec81f52061
[Docs] Fix typo in C++ Placement Group example (#20386) 2021-11-16 08:19:09 +09:00
Will Drevo
fa878e2d4d
Added example to user guide for cloud checkpointing (#20045)
Co-authored-by: will <will@anyscale.com>
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
Co-authored-by: Kai Fricke <kai@anyscale.com>
2021-11-15 15:43:06 +00:00
Amog Kamsetty
a74cf7ff1c
[Train] Torch Prepare utilities (#20254)
* update

* formatting

* fix failures

* fix session tests

* address comments

* add to api docs

* package refactor

* wip

* wip

* wip

* finish

* finish

* fix

* comment

* fix

* install horovod for docs

* address comment

* Update python/ray/train/session.py

Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>

* Update python/ray/train/torch.py

Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>

* address comments

* try fix docs

* fix doc build failure

* fix

* fix

* fix

* try fix doc highlighting

* fix docs

Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
2021-11-15 07:34:17 -08:00
Qing Wang
1172195571
[Java] Remove global named actor and global pg (#20135)
This PR removes global named actor and global PGs.

I believe these APIs are not used widely in OSS.
CPP part is not included in this PR.
@kfstorm @clay4444 @raulchen Please take a look if this change is reasonable.


IMPORTANT NOTE: This is a Java API change and will lead to backward incompatibility in Java global named actor and global PG usage.

INCLUDES:

 Remove setGlobalName() and getGlobalActor() APIs.
 Remove getGlobalPlacementGroup() and setGlobalPG
 Add getActor(name, namespace) API
 Add getPlacementGroup(name, namespace) API
 Update doc pages.
2021-11-15 16:28:53 +08:00
matthewdeng
e22632dabc
[train] wrap BackendExecutor in ray.remote() (#20123)
* [train] wrap BackendExecutor in ray.remote()

* wip

* fix trainer tests

* move CheckpointManager to Trainer

* [tune] move force_on_current_node to ml_utils

* fix import

* force on head node

* init ray

* split test files

* update example

* move tests to ray client

* address comments

* move comment

* address comments
2021-11-13 15:30:44 -08:00
Sven Mika
e5ead6a4b0
[RLlib; Documentation] Minor fixes "rllib in 60s" and per-feature sigils. (#20248) 2021-11-13 22:10:47 +01:00
Amog Kamsetty
65a17da2ec
[Train] Refactor Backends (#20312)
* wip

* finish

* comment

* fix

* install horovod for docs

* address comment

* fix doc build failure
2021-11-13 11:05:53 -08:00
Antoni Baum
1b867520e6
[docs] Add pyarrow as a dependency (#20320) 2021-11-13 16:00:58 +00:00
matthewdeng
e77cc926be
[train] minor doc updates (#20271) 2021-11-12 17:20:23 -08:00
Tricia Fu
e59c14117f
[Doc] [Serve] Add summary sub header to each page (#20231) 2021-11-12 14:18:42 -08:00
xwjiang2010
cdf70c2900
[Tune] Remove legacy resources implementations in Runner and Executor. (#19773) 2021-11-12 12:33:39 -08:00
Siyuan (Ryans) Zhuang
3b62388a9a
[Workflow] Workflow tail recursion optimization (#19928)
* tail recursion optimization
2021-11-12 09:13:40 -08:00
Kai Fricke
246787cdd9
Revert "[RLlib] POC: PGTrainer class that works by sub-classing, not trainer_template.py. (#20055)" (#20284)
This reverts commit 6f85af435f.
2021-11-12 13:09:43 +00:00
Kai Fricke
d88fdd6e38
[tune] refactor SyncConfig (#20155) 2021-11-12 09:36:15 +00:00
Michael Galarnyk
dbeb2e2f73
Add Ray Serve Blogs to Doc (#19846)
The Serving ML Models in Production blog link is in line with the latest Ray Summit talk on Ray Serve.
2021-11-11 15:10:36 -08:00
Edward Oakes
59698aa89c
[Serve] add survey link (#20230) 2021-11-11 15:10:10 -08:00
Jules S. Damji
71a162d8ab
Fixed code snippet to include config parameter and a minor typo (#20193)
Signed-off-by: Jules S.Damji <jules@anyscale.com>

Co-authored-by: Jules S.Damji <jules@anyscale.com>
2021-11-11 18:37:03 +00:00