hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-06 10:31:39 -05:00

Author	SHA1	Message	Date
matthewdeng	bcec60d898	Revert "[data] set iter_batches default batch_size #26869" (#26938) This reverts commit `b048c6f659`.	2022-07-23 17:46:45 -07:00
matthewdeng	b048c6f659	[data] set iter_batches default batch_size #26869 Why are these changes needed? Consumers (e.g. Train) may expect generated batches to be of the same size. Prior to this change, the default behavior was for each batch to be one block, and blocks may be of different sizes. Changes: (1) Set default batch_size to 256. This was chosen to be a sensible default for training workloads, and is intentionally different from the existing default batch_size value for Dataset.map_batches. (2) Update docs for Dataset.iter_batches, Dataset.map_batches, and DatasetPipeline.iter_batches to be consistent. (3) Update tests and examples to explicitly pass in batch_size=None, as these tests were intentionally testing block iteration, and there are other tests that test explicit batch sizes.	2022-07-23 13:44:53 -07:00
Stephanie Wang	55a0f7bb2d	[core] ray.init defaults to an existing Ray instance if there is one (#26678) ray.init() will currently start a new Ray instance even if one already exists, which is very confusing if you are a new user trying to go from local development to a cluster. This PR changes it so that, when no address is specified, we first try to find an existing Ray cluster that was created through `ray start`. If none is found, we will start a new one. This makes two changes to the ray.init() resolution order: 1. When `ray start` is called, the started cluster address was already written to a file called `/tmp/ray/ray_current_cluster`. For ray.init() and ray.init(address="auto"), we will first check this local file for an existing cluster address. The file is deleted on `ray stop`. If the file is empty, autodetect any running cluster (legacy behavior) if address="auto", or we will start a new local Ray instance if address=None. 2. When ray.init(address="local") is called, we will create a new local Ray instance, even if one already exists. This behavior seems to be necessary mainly for `ray.client` use cases. This also surfaces the logs about which Ray instance we are connecting to. Previously these were hidden because we didn't set up the log until after connecting to Ray. So now Ray will log one of the following messages during ray.init: ``` (Connecting to existing Ray cluster at address: <IP>...) ...connection... (Started a local Ray cluster.| Connected to Ray Cluster.)( View the dashboard at <URL>) ``` Note that this changes the dashboard URL to be printed with `ray.init()` instead of when the dashboard is first started. Co-authored-by: Eric Liang <ekhliang@gmail.com>	2022-07-23 11:27:22 -07:00
Eric Liang	63a6c1dfac	[docs] Cleanup the Datasets key concept docs (#26908) Clean up the Datasets key concept doc to be suitable for consumption by a beginner level user and improving the diagrams.	2022-07-22 23:30:54 -07:00
Kai Fricke	1f32cb95db	[air/tune] Add top-level imports for Tuner, TuneConfig, move CheckpointConfig (#26882)	2022-07-22 20:17:06 -07:00
Eric Liang	36c46e9686	[docs] Improve AIR table of contents titles (#26858)	2022-07-22 17:17:49 -07:00
Kai Fricke	77ba30d34e	[tune] Docs for custom command based syncer (awscli / gsutil) (#26879) Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>	2022-07-22 15:28:53 -07:00
Siyuan (Ryans) Zhuang	4b50ef6a28	[Workflow] Rename the argument of "workflow.get_output" (#26876) * rename get_output Signed-off-by: Siyuan Zhuang <suquark@gmail.com> * update doc Signed-off-by: Siyuan Zhuang <suquark@gmail.com>	2022-07-22 12:06:19 -07:00
Clark Zinzow	a29baf93c8	[Datasets] Add `.iter_torch_batches()` and `.iter_tf_batches()` APIs. (#26689) This PR adds .iter_torch_batches() and .iter_tf_batches() convenience APIs, which take care of ML framework tensor conversion, the narrow tensor waste for the .iter_batches() call ("numpy" format), and unify batch formats around two options: a single tensor for simple/pure-tensor/single-column datasets, and a dictionary of tensors for multi-column datasets.	2022-07-22 10:09:36 -07:00
Eric Liang	9272bcbbca	[docs] Add ecosystem map to AIR guide (#26859)	2022-07-21 19:06:47 -07:00
matthewdeng	14e2b2548c	[air] update remaining dict scaling_configs (#26856)	2022-07-21 18:55:21 -07:00
Jiao	db027d86af	[P0][AIR] Fix train to serve notebooks (#26821) Co-authored-by: Simon Mo <simon.mo@hey.com>	2022-07-21 18:04:13 -07:00
Sihan Wang	27f1532a15	[Serve] Promote graceful shutdown and health check (#26682)	2022-07-21 17:37:10 -05:00
Jules S. Damji	6db2536971	[RayAIR] Minor tweaks to the why ray air for clarity (#26680)	2022-07-21 10:21:26 -07:00
Balaji Veeramani	ac1d21027d	[AIR] Add framework-specific checkpoints (#26777)	2022-07-20 19:33:27 -07:00
Richard Liaw	9f0d35b97c	[air/docs] add tensorflow benchmarks into table (#26800)	2022-07-20 17:12:40 -07:00
Eric Liang	d6f29eb9ca	[docs] Mark pipelined prediction as experimental for now (#26792)	2022-07-20 15:31:19 -07:00
xwjiang2010	e7957f4a3e	[air] update offline/online rl example and enable them. (#26786)	2022-07-20 14:06:03 -07:00
Siyuan (Ryans) Zhuang	0063d94166	[Core] Make "GetTimeoutError" a subclass of "TimeoutError" (#26771) I am surprised by the fact that `GetTimeoutError` is not a subclass of `TimeoutError`, which is counter-intuitive and may discourage users from trying the timeout feature in `ray.get`, because you have to "guess" the correct error type. For most people, I believe the first error type in their mind would be `TimeoutError`. This PR fixes this.	2022-07-20 14:37:39 -05:00
tomsunelite	d915529e9e	Add doc for custom lifetime of java actor (#26706) Custom lifetime of java Actor is already supported, but the related document is not updated Co-authored-by: sunkunjian1 <sunkunjian1@jd.com>	2022-07-20 22:19:44 +08:00
Tao Wang	4f2747f12a	[Core][C++ worker] Add GetNamespace api (#26509)	2022-07-20 11:17:14 +08:00
Tao Wang	cd521ed132	[Doc][namespaces][C++ worker]add document for c++ worker namespace and specifying namespace while creating/getting named actors (#26498) We've supported namespace in c++ worker in https://github.com/ray-project/ray/pull/26327. Here we add doc for usage and also reinforce the documents of Java and Python, like adding explanation of specifying namespace while creating named actors. - [x] add doc for basic c++ worker namespace usage - [x] add explanation for specifying namespace while creating named actors, in Python, Java and C++	2022-07-20 10:58:41 +08:00
Dmitri Gekhtman	fdd5c53bfd	[KubeRay] Documentation structure and skeleton (#26589) Adds outline and structure for new KubeRay-based Ray-on-Kubernetes docs.	2022-07-19 13:28:04 -07:00
Richard Liaw	6563c2762d	[air] add pytorch benchmark number (#26719)	2022-07-19 09:51:13 -07:00
Richard Liaw	7e62e1187c	[air/benchmark] Torch benchmarks for 4x4 (#26692) Add benchmark data for 4x4 GPU setup. Signed-off-by: Richard Liaw <rliaw@berkeley.edu> Co-authored-by: Jimmy Yao <jiahaoyao.math@gmail.com> Co-authored-by: Kai Fricke <kai@anyscale.com>	2022-07-19 17:06:37 +01:00
Siyuan (Ryans) Zhuang	5b937167d3	[Workflow] Fix typo in workflow event doc (#26686) Signed-off-by: Siyuan Zhuang <suquark@gmail.com>	2022-07-18 23:26:50 -07:00
Siyuan (Ryans) Zhuang	eb4ed49c1f	[Workflow] Unify the semantics of max_retries of workflow task and Ray task (#26350) * workflow task retry Signed-off-by: Siyuan Zhuang <suquark@gmail.com> * move and enhance tests Signed-off-by: Siyuan Zhuang <suquark@gmail.com> * use "max_retries" of Ray task Signed-off-by: Siyuan Zhuang <suquark@gmail.com> * add test for disabling lineage reconstruction in workflow Signed-off-by: Siyuan Zhuang <suquark@gmail.com>	2022-07-18 23:25:44 -07:00
Sumanth Ratna	759966781f	[air] Allow users to use instances of `ScalingConfig` (#25712) Co-authored-by: Xiaowei Jiang <xwjiang2010@gmail.com> Co-authored-by: matthewdeng <matthew.j.deng@gmail.com> Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>	2022-07-18 15:46:58 -07:00
matthewdeng	6670708010	[air] add placement group max CPU to data benchmark (#26649) Set experimental `_max_cpu_fraction_per_node` to prevent deadlock. This should technically be a no-op with the SPREAD strategy.	2022-07-18 10:34:40 -07:00
Chen Shen	b20f5f51df	[Air][Data] Don't promote locality_hints for split (#26647) Why are these changes needed? Since locality_hints is an experimental feature, we stop promoting it in doc and don't enable it in AIR. See #26641 for more context	2022-07-17 22:18:30 -07:00
Jiao	98a07920d3	[AIR][CUJ] Make distributing training benchmark at silver tier (#26640)	2022-07-17 22:07:09 -07:00
Jules S. Damji	55368402ee	added summary why and when to use bulk vs streaming data ingest (#26637)	2022-07-17 18:46:58 -07:00
Eric Liang	12825fc5aa	[air] Add a warning if no CPUs are reserved for dataset execution (#26643)	2022-07-17 16:33:51 -07:00
Clark Zinzow	864af14f41	[Datasets] [Local Shuffle - 1/N] Add local shuffling option. (#26094) Co-authored-by: Eric Liang <ekhliang@gmail.com> Co-authored-by: matthewdeng <matthew.j.deng@gmail.com> Co-authored-by: Matthew Deng <matt@anyscale.com> Co-authored-by: Richard Liaw <rliaw@berkeley.edu>	2022-07-17 16:21:14 -07:00
Eric Liang	400330e9c0	[air] Add _max_cpu_fraction_per_node to ScalingConfig and documentation (#26634)	2022-07-16 21:55:51 -07:00
Amog Kamsetty	3a345a470c	[AIR/Docs] Add Predictor Docs (#25833)	2022-07-16 21:14:21 -07:00
Jiao	77e2ef2eb6	[AIR] Update Torch benchmarks with documentation (#26631) Co-authored-by: Richard Liaw <rliaw@berkeley.edu>	2022-07-16 17:58:21 -07:00
Eric Liang	0855bcb77e	[air] Use SPREAD strategy by default and don't special case it in benchmarks (#26633)	2022-07-16 17:37:06 -07:00
M Waleed Kadous	7c32993c15	[core/docs]Add a new section under Ray Core called Ray Gotchas (#26624) Co-authored-by: Richard Liaw <rliaw@berkeley.edu>	2022-07-16 16:53:01 -07:00
Antoni Baum	fb6f3cf708	[AIR/Docs] Small improvements to Train user guide (#26577) Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>	2022-07-16 16:51:17 -07:00
Eric Liang	6217138eb0	[docs] Move AIR benchmarks to top level (#26632)	2022-07-16 15:34:31 -07:00
Philipp Moritz	081bbfbff1	[Examples] Test OCR example in documentation tests (#26482) Make sure the OCR example is tested in documentation after we discovered that example notebooks are not tested in CI. Signed-off-by: Philipp Moritz <pcmoritz@gmail.com>	2022-07-16 10:51:28 -07:00
Richard Liaw	799311b2f7	[air/docs] update examples to remove pandas again (#26598)	2022-07-16 08:40:44 -07:00
Balaji Veeramani	34cf1f17ea	[Datasets] Add `ImageFolderDatasource` (#24641) Co-authored-by: matthewdeng <matthew.j.deng@gmail.com> Co-authored-by: Richard Liaw <rliaw@berkeley.edu>	2022-07-15 22:43:23 -07:00
matthewdeng	e3a096f412	[air] add bulk ingest benchmarks (#26618)	2022-07-15 22:01:23 -07:00
Richard Liaw	5ad4e75831	[air] Add initial benchmark section (#26608)	2022-07-15 15:33:48 -07:00
Jiao	647e12b6c7	[AIR] Fix convert_existing_pytorch_code_to_ray_air notebook (#26523)	2022-07-14 14:30:55 -07:00
Tim Gates	e42dc7943e	docs: Fix a few typos (#26556) There are small typos in: - doc/source/data/faq.rst - python/ray/serve/replica.py Fixes: - Should read `successfully` rather than `succssifully`. - Should read `pseudo` rather than `psuedo`.	2022-07-14 12:38:33 -07:00
Jiajun Yao	60dd77a2d3	Enable usage stats collection for ray.init iff nightly wheels (#26461) For nightly wheels, we want to collect usage stats for local clusters started via ray.init() as well.	2022-07-14 12:14:01 -07:00
Amog Kamsetty	6595bd6e2d	[AIR] Introduce better scoring API for `BatchPredictor` (#26451) Signed-off-by: Amog Kamsetty <amogkamsetty@yahoo.com> As discussed offline, allow configurability for feature columns and keep columns in BatchPredictor for better scoring UX on test datasets.	2022-07-14 11:26:12 -07:00
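
The ray.init() resolution order described in commit 55a0f7bb2d can be sketched roughly as follows. This is a minimal illustration and not Ray's actual implementation: `read_recorded_address`, `resolve_address`, the `autodetect` callback, and the returned `("connect", ...)` / `("start_local", None)` tuples are all hypothetical names introduced here. Only the file path `/tmp/ray/ray_current_cluster` and the `"auto"` / `"local"` / `None` address values come from the commit message. The commit message does not say what happens when `address="auto"` finds nothing, so raising `ConnectionError` in that case is an assumption.

```python
from pathlib import Path
from typing import Callable, Optional, Tuple

# Path quoted in the commit message; `ray start` writes the cluster address here.
CLUSTER_FILE = Path("/tmp/ray/ray_current_cluster")


def read_recorded_address(path: Path = CLUSTER_FILE) -> str:
    """Hypothetical helper: return the recorded address, or "" if missing or empty."""
    try:
        return path.read_text().strip()
    except OSError:
        return ""


def resolve_address(
    address: Optional[str],
    recorded: str,
    autodetect: Callable[[], Optional[str]] = lambda: None,
) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: decide whether to connect or start a local instance."""
    # address="local" always starts a new local instance.
    if address == "local":
        return ("start_local", None)
    # An explicit address is used as given.
    if address not in (None, "auto"):
        return ("connect", address)
    # For None and "auto", the address recorded by `ray start` wins if present.
    if recorded:
        return ("connect", recorded)
    # Nothing recorded: "auto" falls back to legacy autodetection.
    if address == "auto":
        found = autodetect()
        if found is None:
            # Assumption: this outcome is not described in the commit message.
            raise ConnectionError("no running cluster found")
        return ("connect", found)
    # address=None and nothing recorded: start a new local instance.
    return ("start_local", None)
```

A caller would pass `read_recorded_address()` as `recorded`; keeping the file read separate from the decision makes the ordering easy to check without touching `/tmp`.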

2279 commits
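
Commit b048c6f659 (later reverted by bcec60d898) contrasts two iteration modes: one batch per block, where batch sizes can vary, and fixed-size batches. The difference can be sketched in plain Python as below. This is an illustration only, not Ray's code: `rebatch` is a hypothetical function, blocks are modeled as ordinary lists, and letting the final batch be smaller than `batch_size` is an assumption of this sketch. The values `batch_size=None` and 256 are the ones named in the commit message.

```python
from typing import Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


def rebatch(blocks: Iterable[List[T]], batch_size: Optional[int] = 256) -> Iterator[List[T]]:
    """Hypothetical sketch: regroup variable-size blocks into fixed-size batches."""
    if batch_size is None:
        # Block iteration: each block is yielded as its own batch.
        for block in blocks:
            yield list(block)
        return
    if batch_size <= 0:
        raise ValueError("batch_size must be positive or None")
    buffer: List[T] = []
    for block in blocks:
        buffer.extend(block)
        # Emit as many full batches as the buffer currently holds.
        while len(buffer) >= batch_size:
            yield buffer[:batch_size]
            buffer = buffer[batch_size:]
    if buffer:
        # Assumption: leftover rows form a final, smaller batch.
        yield buffer


blocks = [[1, 2, 3], [4], [5, 6, 7]]
per_block = list(rebatch(blocks, batch_size=None))
fixed = list(rebatch(blocks, batch_size=2))
```

With `batch_size=None` the batches mirror the uneven block sizes, while `batch_size=2` yields equal-size batches except possibly the last one.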