hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-06 10:31:39 -05:00

Author	SHA1	Message	Date
Eric Liang	38925f60d2	Add a `get_if_exists` option for simpler creation of named actors (#23344 ) Getting or creating a named actor is a common pattern, but how to achieve it is somewhat esoteric. Add a utility function and test that it doesn't cause any scary error messages. `Actor.options(name="my_singleton", get_if_exists=True).remote(args)`	2022-03-23 22:02:58 -07:00
Jiajun Yao	ce93bfff7e	Fix broken doc link (#23440 ) https://github.com/ray-project/ray/blob/master/benchmarks/README.md is moved to a new place.	2022-03-23 18:54:02 -07:00
Chen Shen	48d456d373	[RFC][Doc] add a page describe actor execution order. (#23406 ) * add * task-orders * fix * address comments * add * address comments	2022-03-23 11:07:18 -07:00
Kai Fricke	668eade515	[docs] Add oracle to linkcheck ignore list (#23422 ) This link currently breaks the linter CI.	2022-03-23 17:14:52 +00:00
Max Pumperla	9b1a3f9f9a	[docs] fix nav (#23417 ) Algolia search no longer overflows on mobile devices, making the nav scrollable again. Signed-off-by: Max Pumperla <max.pumperla@googlemail.com>	2022-03-23 10:38:33 +00:00
mwtian	51feac9868	Clean up dev docs (#23407 )	2022-03-22 23:22:56 -07:00
Richard Liaw	1fe110f8f4	[ml] Add a starter page for docstrings (#23312 )	2022-03-21 17:20:45 -07:00
Kai Fricke	b64452bc63	[tune] Add multinode sync test (#23229 ) This adds a multinode checkpoint/restore test for Ray Tune. This covers some of the functionality of the release tests, but in a more controlled environment. In a follow-up PR, we should test (mocked) cloud checkpointing, too.	2022-03-21 17:02:17 +00:00
Michael (Mike) Gelbart	99d60ef18c	[docs] Fix typos in ray docs contributing guide (#23360 ) There are a couple typos in the [Ray contributing guide](https://docs.ray.io/en/master/ray-contribute/docs.html). I fixed the typos, added a relevant link, and reworded a sentence.	2022-03-21 10:01:41 -07:00
Jiajun Yao	d3159f201b	[Doc] Add scheduling doc (#23343 )	2022-03-20 16:05:06 -07:00
Philipp Moritz	886cc4d674	Fix broken links in documentation and put linkcheck linter in place on CI (#23340 )	2022-03-18 21:02:52 -07:00
Junwen Yao	8fff665455	[Train] Add torch data prefetch benchmark example (#22974 ) Add a benchmark example for the auto pipeline functionality for host to device data transfer.	2022-03-18 13:27:26 -07:00
Jian Xiao	0b1a2a44c0	[Dataset GA doc] Decompose the monolith of Getting Started page (and get them under User Guide) (#23311 ) Improve the Dataset documentation for GA.	2022-03-18 11:25:43 -07:00
Jialing He	4a83bc3dc2	[runtime env] Support set timeout for runtime env setup (#23082 ) Interface example: ```python @ray.remote(runtime_env=RuntimeEnv(..., config=RuntimeEnvConfig(setup_timeout_s=10))) def f(): pass @ray.remote(runtime_env={..., "config": {"setup_timeout_s": 10}}) def f(): pass ``` Supports setting a timeout, in seconds, for runtime environment creation. Co-authored-by: 捕牛 <hejialing.hjl@antgroup.com>	2022-03-18 12:52:59 -05:00
Archit Kulkarni	76bb5396c7	[Doc] [jobs] Add links to Job Submission and improve doc (#23209 ) - Adds links to Job Submission from existing library tutorials where `ray submit` is used. When Jobs becomes GA, we should fully replace the uses of `ray submit` with Ray job submission and ensure this is tested. - Adds docstrings for the Jobs SDK, which automatically show up in the API reference - Improve the Job Submission main page - Add a "Deployment Guide" landing page explaining when to use Ray Client vs Ray Jobs Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>	2022-03-18 12:52:13 -05:00
Archit Kulkarni	16fd099b8b	[runtime env] Change `pip_check` default from `True` to `False` (#23306 ) @SongGuyang @Catch-Bull @edoakes I know we discussed this earlier, but after thinking about it some more I think a more reasonable default is for `pip check` to be `False` by default. My guess is that a lot of users (including myself) work inside an environment where `python -m pip check` fails, but the environment doesn't cause them any problems otherwise. So a lot of users will hit an error when trying a simple `runtime_env` `pip` example, and possibly give up. Another less important piece of evidence is that we had to set `pip_check = False` to make some CI tests pass in the original PR. This also matches the default behavior of pip which allows this situation to occur in the first place: `pip install` doesn't error when there's a dependency conflict; rather the command succeeds, the package is installed and usable, and it prints a warning (which is confusingly titled "ERROR")	2022-03-18 12:51:41 -05:00
shrekris-anyscale	86169d2452	[docs] Fix malformatted list in "Advanced Pattern: Fault Tolerance with Actor Checkpointing" (#23319 )	2022-03-18 10:50:13 -07:00
Eric Liang	08dc31e747	[minor] Fix incorrect link to ray core user guide (#23316 )	2022-03-17 20:58:56 -07:00
Guyang Song	1ad019aac3	[C++ API][Doc] Add doc and error log to notice C++ API is not supported on Windows (#23272 ) The C++ API does not support Windows for now.	2022-03-18 10:52:57 +08:00
Eric Liang	015181ab9a	Add random access support for Datasets (experimental feature) (#22749 ) This PR adds experimental support for random access to datasets. A Dataset can be random access enabled by calling `ds.to_random_access_dataset(key, num_workers=N)`. This creates a RandomAccessDataset. RandomAccessDataset partitions the dataset across the cluster by the given sort key, providing efficient random access to records via binary search. A number of worker actors are created, each of which has zero-copy access to the underlying sorted data blocks of the Dataset. Performance-wise, you can expect each worker to provide ~3000 records / second via ``get_async()``, and ~10000 records / second via ``multiget()``. Since Ray actor calls go direct from worker->worker, throughput scales linearly with the number of workers.	2022-03-17 15:01:12 -07:00
Archit Kulkarni	684a1821d3	[Doc] [runtime_env] Add limitation about single-file `py_modules` to doc (#23248 ) Until #23151 is fixed, this PR adds it as a known limitation in the documentation.	2022-03-17 16:23:46 -05:00
Simon Mo	f400b4333a	[Serve] Remove legacy pipeline codebase (#23172 )	2022-03-17 13:27:16 -07:00
Jian Xiao	8c9e3f6c2e	Move the third-party data integrations (non-Dataset stuff) out of the user guides which is for Dataset (#23162 ) Improve documentation of Ray Dataset.	2022-03-17 11:27:40 -07:00
Eric Liang	c8f207f746	[docs] Core docs refactor (#23216 ) This PR makes a number of major overhauls to the Ray core docs: Add a key-concepts section for {Tasks, Actors, Objects, Placement Groups, Env Deps}. Re-org the user guide to align with key concepts. Rewrite the walkthrough to link to mini-walkthroughs in the key concept sections. Minor tweaks and additional transition material.	2022-03-17 11:26:17 -07:00
Balaji Veeramani	83986a4d83	[Train] Add support for automatic mixed precision (#22227 ) Closes #20643 Co-authored-by: Ubuntu <ubuntu@ip-172-31-58-19.us-west-2.compute.internal>	2022-03-16 20:53:02 -07:00
Archit Kulkarni	8707eb6288	[runtime env] Support `.whl` files in `py_modules` (#22368 ) The `py_modules` field of runtime_env supports uploading local Python modules for use on the Ray cluster. One gap in this is if the local Python module is in the form of a wheel (`.whl` file.) This PR adds the missing support for uploading and installing the `.whl` file.	2022-03-16 16:37:10 -05:00
Max Pumperla	71c57c619b	[docs] RLlib broken links (fixes #23160 ) (#23226 )	2022-03-16 12:38:18 +01:00
Kai Fricke	b80f79a072	[ci/multinode] Improve multi-node tests (#23196 ) The current multi node tests use a hardcoded mapping for local development mounts. With this PR, a new environment variable is introduced to be able to control this dynamically. Additionally, some minor improvements to the test utilities and monitor are added.	2022-03-16 09:59:50 +00:00
Eric Liang	678d23fe42	Remove beta label from Datasets (#23220 )	2022-03-15 23:05:59 -07:00
Jian Xiao	10435d2d8f	Update dask version for Ray 1.12.0 (#23197 )	2022-03-15 19:22:19 -07:00
Jiaxin Shan	158ff3394f	[Job submission] Improve job submission docs (#23115 ) While following the job submission docs at https://docs.ray.io/en/latest/cluster/job-submission.html and running some examples, I noticed a few minor issues: 1. Some required libraries are not imported in any code snippets. 2. The get-job API returns `{'status': 'SUCCEEDED'}` instead of `job_status`, so the code snippet at https://docs.ray.io/en/latest/cluster/job-submission.html#rest-api doesn't work.	2022-03-15 21:20:33 -05:00
Antoni Baum	3625c4760f	[ML/Train] Add `TensorflowTrainer` interface (#23072 ) Interface for TensorflowTrainer Depends on #22988 Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>	2022-03-15 14:02:17 -07:00
Eric Liang	ca1100397e	Update paper links to include exoshuffle and remove whitepaper (moved to docs) (#23099 )	2022-03-15 13:12:01 -07:00
Balaji Veeramani	c694ed4594	[Train] Add `enable_reproducibility` (#22851 ) This PR adds a feature that allows users to make their training runs more reproducible. I've implemented this feature by following PyTorch's guide on how to limit sources of randomness (https://pytorch.org/docs/stable/notes/randomness.html). These changes will make it easier for us to benchmark Ray Train, and also make it easier for users to reproduce their experiments.	2022-03-15 11:07:34 -07:00
Amog Kamsetty	e1f24a244b	[ml/train] Training Interfaces [3/4]: `DataParallelTrainer` interface (#22988 ) Interface for DataParallelTrainer and updates to ScalingConfig definition. Depends on #22986 Co-authored-by: Eric Liang <ekhliang@gmail.com> Co-authored-by: Richard Liaw <rliaw@berkeley.edu> Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>	2022-03-15 08:11:05 -07:00
Max Pumperla	ad30123339	[docs] fix includes for md files (#23180 ) The include of content for md files, such as our central getting started page, didn't render. Fixed here. Signed-off-by: Max Pumperla <max.pumperla@googlemail.com>	2022-03-15 11:09:18 +00:00
Pamphile Roy	81b17669a4	[core][docs] Document port/IP binding and slurm concerns (#22663 ) Using Ray on a SLURM system is documented, but the docs are missing some networking pitfalls. This PR adds some information about port binding and address binding (I will open a feature request with more and link it here later). I did not put any real recommendation on this last point since `--address` did not work. I had a "cannot resolve" issue after setting an internal IP, although it's reachable.	2022-03-15 01:43:46 -07:00
Jules S. Damji	0246f3532e	[DOC] Added a full example how to access deployments (#22401 )	2022-03-14 21:15:52 -05:00
Jialing He	39a6c054d3	[runtime env][feature] introduce pip_check_enable and pip_version (#22826 )	2022-03-14 23:41:19 +08:00
Jiaxin Shan	8823ca48b4	[Workflow] Improve workflow docs (#23114 ) * [Workflow] Improve workflow docs * Update doc/source/workflows/concepts.rst Co-authored-by: Siyuan (Ryans) Zhuang <suquark@gmail.com>	2022-03-13 18:55:45 -07:00
Scott Graham	f673acb0ad	Scgraham/azure docs (#22296 ) Fixes a potential error if a function is not found in the Azure SDK when deploying a Ray cluster on Azure. Adds to the docs an additional Python package needed to deploy a Ray cluster on Azure. Co-authored-by: Scott Graham <scgraham@microsoft.com>	2022-03-13 18:08:08 -07:00
Kenneth	07372927cc	Enable buffering and spilling to multiple remote storages (#22798 ) Buffering writes to AWS S3 is highly recommended to maximize throughput. Reducing the number of remote I/O requests can make spilling to remote storages as effective as spilling locally. In a test where 512GB of objects were created and spilled, varying just the buffer size while spilling to an S3 bucket resulted in the following runtimes. Buffer size (runtime in s): Default (3221.865916), 256KB (1758.885839), 1MB (748.226089), 10MB (526.406466), 100MB (494.830513). Based on these results, a default buffer size of 1MB has been added. This is the minimum buffer size used by AWS Kinesis Firehose, a streaming service for S3. On systems with larger availability, it is good to configure a larger buffer size. For processes that reach the throughput limits provided by S3, we can remove that bottleneck by supporting more prefixes/buckets. These impacts are less noticeable as the performance gains from using a large buffer prevent us from reaching a bottleneck. The following runtimes were achieved by spilling 512GB with a 1MB buffer and varying prefixes. Prefixes (runtime in s): 1 (748.226089), 3 (527.658646), 10 (516.010742). Together these changes enable faster large-scale object spilling. Co-authored-by: Ubuntu <ubuntu@ip-172-31-54-240.us-west-2.compute.internal>	2022-03-11 11:27:02 -05:00
matthewdeng	3a3a7b4be4	[test] add back deleted datasets train test file (#23051 )	2022-03-10 21:46:07 -08:00
Archit Kulkarni	52a722ffe7	[jobs] Make local pip/conda requirements files work with jobs (#22849 )	2022-03-10 15:15:16 -06:00
Max Pumperla	2b8faae40c	[docs] re/move old core examples (#22802 )	2022-03-10 12:17:00 -08:00
Max Pumperla	11c40e363d	[docs] external promo content (#22823 )	2022-03-10 11:39:44 -08:00
qicosmos	e4a9517739	[C++ Worker]Python call cpp worker (#22820 )	2022-03-10 11:06:14 -08:00
Max Pumperla	d8e862eaba	[docs] templates and contribution guide (fixes #21753 ) (#23003 ) Adding an explicit contributor guide and example templates for our users to help with docs. Signed-off-by: Max Pumperla <max.pumperla@googlemail.com>	2022-03-10 15:28:07 +00:00
Dmitri Gekhtman	413fe08f87	Move KubeRay autoscaler files into Ray autoscaler directory, add an entry-point. (#22847 ) This PR consists of the following clean-up items for KubeRay autoscaler integration: Remove the docker/kuberay directory. Move the Python files formerly in docker/kuberay to the autoscaler directory. Use a rayproject/ray image for the autoscaler. Add an entry point for the kuberay autoscaler to scripts.py. Use the entry point in the example config. Slightly simplify the code that starts the autoscaler. Ray versions are updated to Ray 1.11.0, which will be officially released within the next couple of days. By default, Ray >= 1.11.0 runs without Redis. References to Redis are removed from the example config. Add the autoscaler configuration test to the CI. Update development documentation to reflect the changes in this PR.	2022-03-09 18:26:57 -08:00
Alex Wu	b84aaef38a	Promote python 3.9 support to stable (#22923 ) Remove the experimental note from python 3.9 since it and its core dependencies have been stable for quite some time now. Co-authored-by: Alex Wu <alex@anyscale.com>	2022-03-08 17:24:54 -08:00
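
The spilling runtimes quoted in commit 07372927cc above are easier to compare as speedups over a baseline. The following is a minimal sketch in plain Python that only does that arithmetic on the numbers from the commit message; the helper name `speedups` and the dictionary names are illustrative and are not part of Ray.

```python
# Runtimes in seconds, copied from the message of commit 07372927cc
# (spilling 512GB of objects to an S3 bucket).
BUFFER_RUNTIMES = {
    "Default": 3221.865916,
    "256KB": 1758.885839,
    "1MB": 748.226089,
    "10MB": 526.406466,
    "100MB": 494.830513,
}

# Runtimes in seconds with a 1MB buffer and a varying number of prefixes.
PREFIX_RUNTIMES = {1: 748.226089, 3: 527.658646, 10: 516.010742}


def speedups(runtimes, baseline_key):
    """Return baseline_runtime / runtime for every entry (hypothetical helper)."""
    baseline = runtimes[baseline_key]
    return {key: baseline / value for key, value in runtimes.items()}


# Print speedups relative to the default buffer size, then relative to one prefix.
for size, factor in speedups(BUFFER_RUNTIMES, "Default").items():
    print(f"buffer {size:>7}: {factor:.2f}x")

for count, factor in speedups(PREFIX_RUNTIMES, 1).items():
    print(f"prefixes {count:>3}: {factor:.2f}x")
```

Read this way, most of the gain in the reported numbers comes from moving off the default buffer size, while adding prefixes on top of a 1MB buffer gives a smaller additional improvement.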

1953 commits