Commit graph

2293 commits

Author SHA1 Message Date
Clark Zinzow
399334d53c
[Datasets] Overhaul "Accessing Datasets" feature guide. (#24963)
This PR overhauls the "Accessing Datasets", adding proper coverage of each data consuming methods, including the ML framework exchange APIs (to_torch() and to_tf()).
2022-05-19 12:50:00 -07:00
kourosh hakhamaneshi
3815e52a61
[RLlib] Agents to algos: DQN w/o Apex and R2D2, DDPG/TD3, SAC, SlateQ, QMIX, PG, Bandits (#24896) 2022-05-19 18:30:42 +02:00
Kai Fricke
9a8c8f4889
[air/docs] Move some examples from ml/examples to docs (#24959)
This moves the basic LightGBM, Sklearn, and XGBoost examples from the examples/ folder to the docs. We keep a symlink in the examples folder.
2022-05-19 14:01:49 +01:00
Kai Fricke
88dfc20c6e
[docs] Remove link do Windows py 3.6 wheels (#24954)
Windows wheels are not being built for Python 3.6. This removes the link in the docs, currently breaking linkcheck.
2022-05-19 08:40:14 +01:00
Qing Wang
af418fb729
[Java][API CHANGE] Move exception to api module. (#24540)
This PR moves all exception classes from runtime module to api module. It's aiming to eliminate the confusion about ray exceptions. It means that Ray users don't need to touch runtime module when API programming after this PR.

Note that this should be merged onto 2.0.
2022-05-19 10:18:20 +08:00
Clark Zinzow
0b6505e8c6
[Datasets] Miscellaneous GA docs P0s. (#24891)
This PR knocks off a few miscellaneous GA docs P0s given in our docs tracker. Namely:

- Documents Datasets resource allocation model.
- De-emphasizes global/windowed shuffling.
- Documents lazy execution mode, and expands our execution model docs in general.
2022-05-18 16:17:48 -07:00
Jian Xiao
9fe4dba4ad
Revamp the Getting Started page for Dataset (#24860)
This is part of the Dataset GA doc fix effort to update/improve the documentation.
This PR revamps the Getting Started page.

What are the changes:
- Focus on basic/core features that are bread-and-butter for users, leave the advanced features out
- Focus on high level introduction, leave the detailed spec out (e.g. what are possible batch_types for map_batches() API)
- Use more realistic (yet still simple) data example that's familiar to people (IRIS dataset in this case)
- Use the same data example throughout to make it context-switch free
- Use runnable code rather than faked
- Reference to the code from doc, instead of inlining them in the doc

Co-authored-by: Ubuntu <ubuntu@ip-172-31-32-136.us-west-2.compute.internal>
Co-authored-by: Eric Liang <ekhliang@gmail.com>
2022-05-18 13:46:23 -07:00
SangBin Cho
f228245520
[Placement group] Update the old placement group API usage to the new scheduling_strategy based API (#24544)
Documentation should use the new API, not the old one that will be deprecated
2022-05-18 09:41:51 -07:00
Antoni Baum
1d5e6d908d
[AIR] HuggingFace Text Classification example (#24402) 2022-05-18 09:35:12 -07:00
Max Pumperla
3ffcb81bd3
[docs] remove non-functional lbfgs example (#24727)
This example simply doesn't run as is. We can bring it back up again later, if it makes sense. But it's not clear what the variables used there, like actor are. Fixes #21328

Signed-off-by: Max Pumperla <max.pumperla@googlemail.com>
2022-05-18 10:53:14 +01:00
Max Pumperla
7844aeafde
[docs] add robots txt (#24726)
as per discussion on slack, this should avoid having old versions of the docs indexed by search engines, see readthedocs/readthedocs.org#2430 for reference.

Signed-off-by: Max Pumperla <max.pumperla@googlemail.com>
2022-05-18 10:52:22 +01:00
Sven Mika
8f50087908
[RLlib] AlphaZero uses training_iteration API. (#24507) 2022-05-18 09:58:25 +02:00
Simon Mo
9b2086c726
[Serve] Alias ray.serve.dag.InputNode (#24630) 2022-05-17 22:35:51 -07:00
Clark Zinzow
26ea82d3a6
[Datasets] Add basic data ecosystem overview, user guide links, other data processing options card. (#23346) 2022-05-17 20:57:42 -07:00
Simon Mo
c3ac6fcf3f
Bump Ray Version from 2.0.0.dev0 to 3.0.0.dev0 (#24894) 2022-05-17 19:31:05 -07:00
Clark Zinzow
4444150c29
[Datasets] Overhaul of "Creating Datasets" feature guide. (#24831)
This PR is a general overhaul of the "Creating Datasets" feature guide, providing complete coverage of all (public) dataset creation APIs and highlighting features and quirks of the individual APIs, data modalities, storage backends, etc. In order to keep the page from getting too long and keeping it easy to navigate, tabbed views are used heavily.
2022-05-17 16:23:42 -07:00
Eric Liang
437df9431c
[docs] Remove bad suggestions to use local_mode or num_cpus in init (#24827) 2022-05-17 12:55:04 -07:00
Max Pumperla
ac850055bf
use absolute url for search, fixes #24808 (#24877)
Signed-off-by: Max Pumperla <max.pumperla@googlemail.com>
2022-05-17 13:22:53 +01:00
Antoni Baum
c74886a55e
[CI] Run doc notebooks in CI (#24816)
Currently, we are not running doc notebooks in CI due to a bazel misconfiguration - we are using `glob` in a top level package in order to get the paths for the notebooks, but those are contained inside subpackages, which glob purposefully ignores. Therefore, the lists of notebooks to run are empty. This PR fixes that by:
* Running the `py_test_run_all_notebooks` macro inside the relevant subpackages
* Editing the `test_myst_doc.py` script to allow for recursive search for the target file, allowing to deal with mismatches between `name` and `data` arguments in `py_test_run_all_notebooks`
* Setting the `allow_empty=False` flag inside `glob` calls in our macros to ensure that this oversight is caught early
* Enabling detection of changes in doc folder for `*.ipynb` and `BUILD` files

This PR also adds a GPU runner for doc tests, allowing one of our examples to pass - and setting the infra for more to come. Finally, a misconfigured path for one set of doc tests is also fixed.
2022-05-17 09:50:42 +01:00
Clark Zinzow
ea635aecd2
[Datasets] Support tensor columns in to_tf and to_torch. (#24752)
This PR adds support for tensor columns in the to_tf() and to_torch() APIs.

For Torch, this involves an explicit extension array check and (zero-copy) conversion of the tensor column to a NumPy array before converting the column to a Torch tensor.

For TensorFlow, this involves bypassing df.values when converting tensor feature columns to NumPy arrays, instead manually creating a single NumPy array from the column Series.

In both cases, I think that the UX around heterogeneous feature columns and squeezing the column dimension could be improved, but I'm saving that for a future PR.
2022-05-17 01:11:00 -07:00
Clark Zinzow
ef870e936c
[Datasets] Change range_arrow() API to range_table() (#24704)
This PR changes the ray.data.range_arrow() to ray.data.range_table(), making the Arrow representation an implementation detail.
2022-05-17 01:09:45 -07:00
Eric Liang
a565948094
[docs] After careful consideration, choose the lesser of two evils and set white-space: pre-wrap #24873 2022-05-16 22:49:00 -07:00
Antoni Baum
7158aeda33
[Datasets] Add Dataset.split_proportionately and ray.ml.train_test_split (#24476)
Adds a Dataset.split_proportionately method that allows the user to split a dataset using proportions. This is a very common use-case for eg. train-test splitting. The implementation is a thin wrapper over Dataset.split_at_indices.

Additionally, this PR adds a ray.ml.train_test_split function intended to provide a familiar API to ML practitioners.
2022-05-16 20:47:29 -07:00
Siyuan (Ryans) Zhuang
2766284b14
[workflow] Update workflow doc and examples (#24804)
* update doc of workflow options

* update examples and make sure they are working
2022-05-16 15:41:14 -07:00
Sihan Wang
830af1f14d
[Serve/Doc] Add combine nodes with same input in parallel pattern (#24760) 2022-05-16 11:00:54 -07:00
Edward Oakes
5685d2e0b6
[serve][docs] Rework landing page to match Tune's structure (#24693)
Updates the landing page to match the format and content of Tune's. Added some shorter quickstarts and sharpened up the messaging in our "Why choose Serve?" section, those are the main content changes.

I also moved all of the `doc_code` into one directory and added a bazel target that should run all of the examples added there. Split into a separate PR: https://github.com/ray-project/ray/pull/24736.
2022-05-16 11:38:43 -05:00
Edward Oakes
f99aa5cb40
[serve][docs] Unify doc_code directories and add bazel target (#24736)
Split off from https://github.com/ray-project/ray/pull/24693/, unifying the redundant directories we had and making sure all `serve/doc_code` snippets are run in CI.
2022-05-16 09:49:42 -05:00
Kai Fricke
96da5dc776
[rllib] Fix some missing agent->algorithm doc changes (#24841)
#24797 missed some doc changes that popped up in broken linkcheck. Note that there could be others that were not caught by this.
2022-05-16 11:52:49 +01:00
Jun Gong
68a9a33386
[RLlib] Retry agents -> algorithms. with proper doc changes this time. (#24797) 2022-05-16 09:45:32 +02:00
Ofey Chan
c6c72a6f89
[Doc] [Core] Enhance actor queue doc code (#24532)
Why are these changes needed?
Current documentation code in Message passing using Ray Queue can be enhanced, for better demonstration of the message queue.

It creates 10 tasks but only 2 consumers, and each consumer consumes one task then exit. Therefore, the output is a bit vague:

(consumer pid=1022727) got work 0
(consumer pid=1022595) got work 1
So I make consumer working until the queue is empty. The output shows consumer 1 and 2 working in parallel:

(consumer pid=1030876) consumer 0 got work 0
(consumer pid=1030876) consumer 0 got work 1
(consumer pid=1030876) consumer 0 got work 3
(consumer pid=1030876) consumer 0 got work 5
(consumer pid=1030876) consumer 0 got work 7
(consumer pid=1030876) consumer 0 got work 9
(consumer pid=1030949) consumer 1 got work 2
(consumer pid=1030949) consumer 1 got work 4
(consumer pid=1030949) consumer 1 got work 6
(consumer pid=1030949) consumer 1 got work 8
P.S. Also fix a typo in doc.
2022-05-15 17:38:21 -07:00
Kai Fricke
3f9eea00af
[ci/linkcheck] Fix broken gym envs link (#24817)
These are currently broken in CI.
2022-05-15 18:59:31 +01:00
Richard Liaw
41de6acd10
[air] fix-docs (#24792)
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
2022-05-13 15:58:31 -07:00
Chen Shen
cc21979998
Revert "[Datasets] Add documentation for bulk parquet read API and file metadata providers. (#24354)" (#24785)
This reverts commit e2ee2140f9.
2022-05-13 11:18:30 -07:00
Archit Kulkarni
738da639d9
[runtime env] Add FAQ for runtime_env (#24412)
Adds some frequently asked user questions to the docs.

Co-authored-by: shrekris-anyscale <92341594+shrekris-anyscale@users.noreply.github.com>
2022-05-13 11:03:58 -05:00
Chen Shen
9b1154dce4
fix inter (#24761) 2022-05-13 08:18:22 -07:00
Kai Fricke
06ef672699
[ci/docs] Fix broken linkcheck URL (#24777)
The hyperband blogpost URL is broken, link to other blog post
2022-05-13 15:58:36 +01:00
Kai Fricke
a92ce9721c
[air] Example to run tuning and analyze results (#24602)
This is a notebook showing how to tune an xgboost model and analyze the results.

Also adds a `get_dataframe()` method to `ResultsGrid` to fetch the trial results.

Depends on #24483 for toctree.
2022-05-13 15:22:36 +01:00
Max Pumperla
cd5218f831
[docs] Tune examples better navigation, minor fixes (#24733)
Replaces #24225 and adds example navigation

Signed-off-by: Max Pumperla <max.pumperla@googlemail.com>
2022-05-13 14:39:18 +01:00
Kai Fricke
9e21e392ee
[air/doc] Add examples doc structure (#24770)
Add the basic toc/structure for Ray AIR examples
2022-05-13 11:56:34 +01:00
Richard Liaw
ce5a27e31b
[docs] Add initial AIR documentation (#24483)
Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>
Co-authored-by: Eric Liang <ekhliang@gmail.com>
2022-05-13 01:29:59 -07:00
Chen Shen
02042e1305
[Core] Revert "[Core] Batch PinObjectIDs requests from Raylet client (#24322)" and "[Core] rename PinObjectIDs to PinObjectID (#24451)" (#24741)
we noticed performance regression for nightly test shuffle_1tb_5000_partitions. concretely the test previously takes 1h10m to finish but now it takes more than 2h30minutes.

after investigation we believe mostly likely 5a82640 caused the regression.

here is the run before 5a82640: https://console.anyscale.com/o/anyscale-internal/projects/prj_SVFGM5yBqK6DHCfLtRMryXHM/clusters/ses_1ejykCYq9BnkC5v8ZJjrqc2b?command-history-section=command_history
here is the run after 5a82640:
https://console.anyscale.com/o/anyscale-internal/projects/prj_SVFGM5yBqK6DHCfLtRMryXHM/clusters/ses_Lr5N8jVRdHCWJWYA2SRaUkzZ?command-history-section=command_history
2022-05-12 16:17:40 -07:00
Jiajun Yao
0a0c52e351
[Doc] Improve doc for task locality aware scheduling (#24717) 2022-05-12 13:42:48 -07:00
Patrick Ames
e2ee2140f9
[Datasets] Add documentation for bulk parquet read API and file metadata providers. (#24354)
API doc updates for #23179 and #24094. All data docs related to #23179 should be up-to-date once this PR and #24203 are merged.
2022-05-12 10:19:33 -07:00
Max Pumperla
42e877d2f7
[docs] full results on enter, fixes #24519 (#24722)
Signed-off-by: Max Pumperla <max.pumperla@googlemail.com>
2022-05-12 10:00:35 -07:00
Amog Kamsetty
c4bf38daa6
[AIR] Add AIR install extra (#24701)
Closes #23439
2022-05-12 09:25:52 -07:00
Edward Oakes
fb71743935
[serve] Convert "End-to-end Tutorial" to "Getting Started" (#24690) 2022-05-12 08:44:43 -07:00
Guilherme
bb0bcbace0
[docs] Fix example in ray-get-loop.rst (#24609) 2022-05-12 00:05:57 -07:00
Sihan Wang
c5bfe1d694
[Serve] Add deployment graph cookbook (#24524) 2022-05-11 16:24:55 -07:00
Sebastián Ramírez
2842b074bb
📝 Update structure in development docs (#24377)
This is a small update for the structure of the docs about building Ray from source.

My idea was to isolate steps that are shared and then steps required per platform/system. Also consolidating the instructions to clone with git, install, directory structure, etc.

I'm still figuring out the building steps (installing the dependencies for docs in an M1), but I wanted to start the draft right away.
2022-05-11 15:47:54 -05:00
Eric Liang
2b598ca440
[doc] Improve the object reference documentation (#24636) 2022-05-10 18:39:16 -07:00