Guyang Song
ad56b9b432
[runtime env] redefine runtime env to protobuf ( #19511 )
2021-11-20 16:54:42 +08:00
Jiao
12c11894e8
[Jobs] Add documentation for ray job submission ( #20530 )
2021-11-19 16:59:05 -08:00
architkulkarni
42085fd3d5
[runtime env] [Doc] Add concepts and basic workflows ( #20222 )
...
Address followup comments from https://github.com/ray-project/ray/pull/19863
- Add short "Concepts" section
- Add more section headings to break up the text
- Add "Workflow: Local Files" example
- Add "Workflow: Library development" example
2021-11-19 13:58:50 -08:00
Chen Shen
77a8723bba
[Core][actor out-of-order execution 6/n] plumbing work to make it work e2e ( #20177 )
...
This PR is the last PR that enables out of order execution. Previous PR: #20176
Specifically, this PR adds an execute_out_of_order option to the .options call, which creates the actor with both an out_of_order_submit_queue and an out_of_order_scheduling queue.
This PR also adds @simon-mo's original case as a test.
2021-11-19 11:05:18 -08:00
shrekris-anyscale
b910d7e9e1
[runtime_env] Remove deprecated username-password GitHub use case from doc ( #20558 )
2021-11-19 10:03:44 -06:00
Sven Mika
9d5c4a9d21
[RLlib] API reference pages: rllib/env package only. ( #20486 )
2021-11-19 10:06:40 +01:00
Alex Wu
88266a6fce
Revert "Revert "[Docs] More detailed M1 Mac installation instructions"" ( #20549 )
...
Reverts ray-project/ray#20547
2021-11-18 20:18:37 -08:00
Eric Liang
65a8698e82
Raise the dataset block size limit to 2GiB ( #20551 )
...
The default block size of 500MiB seems too low for some common workloads, e.g. shuffling 500GB. That creates 1000 blocks, which means 1 million intermediate shuffle objects until we implement #20500.
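The block-count arithmetic above can be sketched as follows. This is an illustrative back-of-the-envelope calculation, not code from the PR: the helper name shuffle_object_count is hypothetical, sizes are treated as binary units (GiB/MiB), and it assumes a shuffle produces one intermediate object per (input block, output partition) pair, with as many output partitions as input blocks.

```python
from typing import Tuple

MiB = 1024 ** 2
GiB = 1024 ** 3


def shuffle_object_count(dataset_bytes: int, block_bytes: int) -> Tuple[int, int]:
    """Return (num_blocks, num_intermediate_objects) for an all-to-all shuffle.

    Assumption (not from the PR): every input block emits one piece per
    output partition, and there are as many output partitions as blocks.
    """
    # Integer ceiling division: a partially filled last block still counts.
    num_blocks = -(-dataset_bytes // block_bytes)
    return num_blocks, num_blocks * num_blocks


if __name__ == "__main__":
    for limit in (500 * MiB, 2 * GiB):
        blocks, objects = shuffle_object_count(500 * GiB, limit)
        print(f"block limit {limit // MiB} MiB: {blocks} blocks, {objects} shuffle objects")
```

Under these assumptions, a 500 GiB dataset at a 500 MiB limit yields 1024 blocks and roughly 1 million intermediate objects, while a 2 GiB limit yields 250 blocks and 62,500 objects.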
2021-11-18 19:36:10 -08:00
Richard Liaw
c964455642
Revert "[Docs] More detailed M1 Mac installation instructions" ( #20547 )
...
Reverts ray-project/ray#20512 due to lint errors.
2021-11-18 12:06:57 -08:00
Antoni Baum
0b14f38ac7
[tune] Multi-objective support for Optuna ( #20489 )
...
This PR adds multi-objective support for Optuna searchers, including a test and example.
Co-authored-by: gjoliver <jungong@anyscale.com>
2021-11-18 18:47:29 +00:00
Alex Wu
540c9e35d1
[Docs] More detailed M1 Mac installation instructions ( #20512 )
...
This PR adds more detail to the M1 Mac installation instructions following the bug bash.
2021-11-18 09:35:43 -08:00
Sven Mika
7a585fb275
[RLlib; Documentation] RLlib README overhaul. ( #20249 )
2021-11-18 18:08:40 +01:00
shrekris-anyscale
65a023ef71
[runtime_env][docs] Add documentation on using remote URIs for runtime environments ( #20352 )
2021-11-17 23:17:48 -06:00
Amog Kamsetty
9796ae56d5
[Train][Data] Change usages of iter_datasets to iter_epochs ( #20487 )
2021-11-17 18:05:51 -08:00
Yi Cheng
cbf5826040
[workflow] Fix workflow event doc typo ( #20465 )
...
In the example, it says `after_checkpoint`, but this should be `event_checkpointed`
2021-11-17 16:18:20 -08:00
Qing Wang
e01f14d7df
[DOC] Add namespace doc for Java part. ( #20428 )
...
Add namespace doc for Java part.
2021-11-17 23:02:47 +08:00
Simon Mo
18d605fa7c
[Serve] Add experimental CLI for serve deploy ( #20371 )
2021-11-16 20:22:09 -08:00
Larry
454db6902c
[Java] Add timeout parameter for Ray.get() API ( #20282 )
...
Why are these changes needed?
Add a timeout (in milliseconds) parameter to the Java Ray.get API. The docs have been updated to reflect the API change ([Ray Core Walkthrough] -> [Fetching Results]).
For example:
ObjectRef<Integer> objRef = Ray.put(1);
objRef.get(1000);
Ray.get(Ray.task(MyRayApp::slowFunction).remote(), 3000);
Related issue number
#20247
2021-11-17 11:02:17 +08:00
Simon Mo
5fccad4cc9
[Serve] Add experimental pipeline docs ( #20292 )
2021-11-16 16:13:55 -08:00
Richard Liaw
cf357f6bce
[docs] Add a talks section for ray.data ( #20444 )
2021-11-16 14:30:08 -08:00
Antoni Baum
3f9ded55f7
[tune] Merge Analysis into ExperimentAnalysis ( #20197 )
...
Co-authored-by: Kai Fricke <kai@anyscale.com>
2021-11-16 16:47:12 +00:00
Amog Kamsetty
4f88796d5a
[Train] Move to beta ( #20378 )
2021-11-16 08:19:30 -08:00
Kai Fricke
3e6ba5d6d2
Revert "Revert [RLlib] POC: PGTrainer class that works by sub-classing, not trainer_template.py." ( #20285 )
...
* Revert "Revert "[RLlib] POC: `PGTrainer` class that works by sub-classing, not `trainer_template.py`. (#20055 )" (#20284 )"
This reverts commit 246787cdd9.
Co-authored-by: sven1977 <svenmika1977@gmail.com>
2021-11-16 12:26:47 +01:00
Eric Liang
460cf86858
Split blocks automatically into 500MB chunks on file read and transformation ( #20235 )
...
This PR adds support for automatic block splitting on read and map transforms, to keep block size bounded to ~500MiB. This avoids potential OOM situations where a map task may consume too much intermediate Python heap memory, or too much object store shared memory for one block.
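The idea of keeping block size bounded can be sketched as follows. This is a minimal greedy illustration under stated assumptions, not the PR's implementation: the function split_into_blocks, its parameters, and the use of per-row byte sizes are all hypothetical.

```python
from typing import List


def split_into_blocks(row_sizes: List[int], target_max_bytes: int) -> List[List[int]]:
    """Greedily group row indices so each block stays at or under the target.

    A single row larger than the target gets a block of its own, so the
    bound is approximate, mirroring the "~500MiB" wording above.
    """
    blocks: List[List[int]] = []
    current: List[int] = []
    current_bytes = 0
    for idx, size in enumerate(row_sizes):
        # Close the current block before adding a row that would overflow it.
        if current and current_bytes + size > target_max_bytes:
            blocks.append(current)
            current, current_bytes = [], 0
        current.append(idx)
        current_bytes += size
    if current:
        blocks.append(current)
    return blocks


if __name__ == "__main__":
    # Four 200-byte rows with a 500-byte target end up in two blocks.
    print(split_into_blocks([200, 200, 200, 200], 500))
```

Because each output block is capped, a downstream map task only ever holds one bounded block in memory at a time, which is the OOM-avoidance argument made above.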
2021-11-15 22:25:11 -08:00
Antoni Baum
ec81f52061
[Docs] Fix typo in C++ Placement Group example ( #20386 )
2021-11-16 08:19:09 +09:00
Will Drevo
fa878e2d4d
Added example to user guide for cloud checkpointing ( #20045 )
...
Co-authored-by: will <will@anyscale.com>
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
Co-authored-by: Kai Fricke <kai@anyscale.com>
2021-11-15 15:43:06 +00:00
Amog Kamsetty
a74cf7ff1c
[Train] Torch Prepare utilities ( #20254 )
...
* update
* formatting
* fix failures
* fix session tests
* address comments
* add to api docs
* package refactor
* wip
* wip
* wip
* finish
* finish
* fix
* comment
* fix
* install horovod for docs
* address comment
* Update python/ray/train/session.py
Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
* Update python/ray/train/torch.py
Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
* address comments
* try fix docs
* fix doc build failure
* fix
* fix
* fix
* try fix doc highlighting
* fix docs
Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
2021-11-15 07:34:17 -08:00
Qing Wang
1172195571
[Java] Remove global named actor and global pg ( #20135 )
...
This PR removes global named actor and global PGs.
I believe these APIs are not used widely in OSS.
CPP part is not included in this PR.
@kfstorm @clay4444 @raulchen Please take a look if this change is reasonable.
IMPORTANT NOTE: This is a Java API change and will lead to backward incompatibility for Java global named actor and global PG usage.
INCLUDES:
Remove setGlobalName() and getGlobalActor() APIs.
Remove getGlobalPlacementGroup() and setGlobalPG
Add getActor(name, namespace) API
Add getPlacementGroup(name, namespace) API
Update doc pages.
2021-11-15 16:28:53 +08:00
Sven Mika
e5ead6a4b0
[RLlib; Documentation] Minor fixes "rllib in 60s" and per-feature sigils. ( #20248 )
2021-11-13 22:10:47 +01:00
Amog Kamsetty
65a17da2ec
[Train] Refactor Backends ( #20312 )
...
* wip
* finish
* comment
* fix
* install horovod for docs
* address comment
* fix doc build failure
2021-11-13 11:05:53 -08:00
matthewdeng
e77cc926be
[train] minor doc updates ( #20271 )
2021-11-12 17:20:23 -08:00
Tricia Fu
e59c14117f
[Doc] [Serve] Add summary sub header to each page ( #20231 )
2021-11-12 14:18:42 -08:00
xwjiang2010
cdf70c2900
[Tune] Remove legacy resources implementations in Runner and Executor. ( #19773 )
2021-11-12 12:33:39 -08:00
Siyuan (Ryans) Zhuang
3b62388a9a
[Workflow] Workflow tail recursion optimization ( #19928 )
...
* tail recursion optimization
2021-11-12 09:13:40 -08:00
Kai Fricke
246787cdd9
Revert "[RLlib] POC: PGTrainer class that works by sub-classing, not trainer_template.py. ( #20055 )" ( #20284 )
...
This reverts commit 6f85af435f.
2021-11-12 13:09:43 +00:00
Kai Fricke
d88fdd6e38
[tune] refactor SyncConfig ( #20155 )
2021-11-12 09:36:15 +00:00
Michael Galarnyk
dbeb2e2f73
Add Ray Serve Blogs to Doc ( #19846 )
...
The Serving ML Models in Production blog link is in line with the latest Ray Summit talk on Ray Serve.
2021-11-11 15:10:36 -08:00
Edward Oakes
59698aa89c
[Serve] add survey link ( #20230 )
2021-11-11 15:10:10 -08:00
Jules S. Damji
71a162d8ab
Fixed code snippet to include config parameter and a minor typo ( #20193 )
...
Signed-off-by: Jules S.Damji <jules@anyscale.com>
Co-authored-by: Jules S.Damji <jules@anyscale.com>
2021-11-11 18:37:03 +00:00
Dmitri Gekhtman
8971422d8f
[autoscaler] Use drain node api in autoscaler before terminating nodes ( #20013 )
...
* wip
* Draft
* Use bytes for node id
* remove stray helm change
* fix autoscaler init arg
* don't forget to instantiate new load metrics dict
* remove extraneous diff
* Timeout, comments, function signature.
* typo
* another comment
* tweak
* docstring
* shorter timeout
* Use a better error code
* missing self
* Dedent example
* Add drain node prometheus metric.
* comment
* Update tests part 1: test_autoscaler.py
* Update tests part 2: test_resource_demand_scheduler
* lint
* Update tests part 3: test_autoscaling_policy
* Unit tests for new Prometheus metric and DrainNode error handling.
* comment
* removed unused function
* Try adding ability to mock out process termination to fake node provider
* Add integration test.
* fix
* fix
* lint
* Improve log message
* fix
* Simplify test
* Fix doc example
* remove unused dict
* Mock out process termination in a subclass
* Add doc string and comment explaining prune active ips.
* Comment: wtf is use_node_id_as_ip
* one more comment
* more explanation
* period
* tweak
2021-11-11 08:31:40 -08:00
Sven Mika
6f85af435f
[RLlib] POC: PGTrainer class that works by sub-classing, not trainer_template.py. ( #20055 )
2021-11-11 12:16:20 +01:00
Will Drevo
2fdb1c46c7
[RLlib; Documentation] Added atari pip installs to Pong-v0 example. ( #20225 )
...
* Added imports to Pongv0 example
* Added comment
* Apply suggestions from code review
Co-authored-by: will <will@anyscale.com>
Co-authored-by: Sven Mika <sven@anyscale.io>
2021-11-11 09:08:02 +01:00
Tobias Kaymak
893f57591d
[serve] Add Google Cloud Storage as a backend ( #20104 )
2021-11-10 19:45:19 -08:00
Edward Oakes
082a4af3e6
[serve] Remove lingering backend/endpoint wording in docs ( #20229 )
2021-11-10 16:49:29 -08:00
Sven Mika
ebd56b57db
[RLlib; documentation] "RLlib in 60sec" overhaul. ( #20215 )
2021-11-10 22:20:06 +01:00
matthewdeng
790e22f9ad
[tune] move force_on_current_node to ml_utils ( #20211 )
2021-11-10 10:21:24 -08:00
Sven Mika
143d23a278
[RLlib] Issue 20062: Action inference examples missing ( #20144 )
2021-11-10 18:49:06 +01:00
Kim Pevey
82a5bf68fa
[Docs] Add note for multi-node on Windows ( #20184 )
...
* add note for multi-node on Windows
* update message
Co-authored-by: Philipp Moritz <pcmoritz@gmail.com>
2021-11-09 16:02:01 -08:00
Kai Fricke
9c2b8c8501
[tune] Deprecate DurableTrainable ( #19880 )
2021-11-08 20:56:07 +00:00
Amog Kamsetty
b1f24768a1
[Tune] More fixes to PTL Tutorial ( #20065 )
...
* ptl-fix-2
* improve
* fix
2021-11-08 09:13:44 -08:00