The Ray Dashboard starts Serve in the `"_ray_internal_dashboard"` namespace. However, Serve by default starts in the `"serve"` namespace. This causes surprising behavior when working with the Serve CLI and REST API.
This change makes the Ray Dashboard start Serve in the `"serve"` namespace, allowing the REST API to work intuitively with the Python API.
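For illustration, a minimal sketch of how a Python client can now attach to the same Serve instance the dashboard manages, assuming a running cluster reachable at the default address:

```python
import ray
from ray import serve

# Connect to the running cluster in the "serve" namespace, which the
# dashboard now also uses, so the Python API and the REST API operate
# on the same Serve instance.
ray.init(address="auto", namespace="serve")
serve.start(detached=True)
```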
Apparently, `ray` gets imported somewhere before the client runner is run (possibly from an Anyscale package). This means we need to reload the `ray` package after installing a matching local Ray wheel.
Additionally, job submission should also install a matching local Ray so that it matches the job submission server.
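A minimal sketch of the kind of reload described above (the wheel path is hypothetical, and the actual client-runner logic may differ):

```python
import subprocess
import sys

# Hypothetical path to a locally built wheel matching the cluster's Ray version.
wheel_path = "/tmp/ray-3.0.0.dev0-cp38-cp38-manylinux2014_x86_64.whl"
subprocess.check_call([sys.executable, "-m", "pip", "install", wheel_path])

# `ray` was already imported earlier in the process, so drop the cached
# modules and re-import to pick up the freshly installed version.
for name in [m for m in sys.modules if m == "ray" or m.startswith("ray.")]:
    del sys.modules[name]
import ray  # noqa: E402
```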
The current multi-node tests use a hardcoded mapping for local development mounts. This PR introduces a new environment variable to control this mapping dynamically. Additionally, it makes some minor improvements to the test utilities and the monitor.
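As a hedged sketch of how such an environment-variable override could look (the variable name, value format, and default below are hypothetical, not taken from the PR):

```python
import os

# Hypothetical default mapping of local paths to in-cluster paths.
DEFAULT_DEV_MOUNTS = {"~/ray": "/home/ray/ray"}

def get_dev_mounts() -> dict:
    # Allow the local-development mount mapping to be overridden via an
    # environment variable of the form "local1:remote1,local2:remote2".
    raw = os.environ.get("RAY_TEST_DEV_MOUNTS")
    if not raw:
        return DEFAULT_DEV_MOUNTS
    return dict(pair.split(":", 1) for pair in raw.split(","))
```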
GCS pubsub has been the default for a while, and there is little chance that we would need to revert back to Redis pubsub in the future. This is the first step in removing Redis pubsub: removing the `enable_gcs_pubsub()` feature guard.
Follow-up to #22932. This PR replaces `unordered_map` with `flat_hash_map` in the core worker and object manager modules.
Some interfaces that are exposed to users in Java/Python, such as `GetAllReferenceCounts`, are excluded for now because they are a bit more complicated; we defer them to be handled together with placement groups.
Follow-up PRs will migrate reference counting, placement groups, and other modules.
In this PR, we disable the periodic (every 10 minutes) full GC in the Java worker when it is not triggered by a global GC event. Specifically, we added a `triggered_by_global_gc` flag to indicate whether the GC event was triggered by a global GC; if so, we still perform the full GC.
Co-authored-by: Qing Wang <jovany.wq@antgroup.com>
Adds a streaming-based read option for Snappy-compressed files. Arrow doesn't support streaming Snappy decompression since the canonical C++ Snappy library doesn't natively support it. This PR works around that by performing streaming reads of Snappy-compressed files using the streaming decompression API provided in the [python-snappy](https://github.com/andrix/python-snappy) package.
This commit supplies a custom datasource that uses Arrow + [python-snappy](https://github.com/andrix/python-snappy) to read and decompress Snappy-compressed files.
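A rough sketch of streaming decompression with python-snappy's stream decompressor (the actual datasource also handles Arrow integration and file-format details that this sketch glosses over):

```python
import snappy  # python-snappy

def iter_decompressed(path: str, chunk_size: int = 1 << 20):
    """Yield decompressed chunks of a Snappy stream-compressed file."""
    decompressor = snappy.StreamDecompressor()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            out = decompressor.decompress(chunk)
            if out:
                yield out
```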
Co-authored-by: siddharth.goel <siddharth.goel@bytedance.com>
Co-authored-by: Chen Shen <scv119@gmail.com>
This PR adds a feature that allows users to make their training runs more reproducible. I've implemented this feature by following PyTorch's guide on how to limit sources of randomness (https://pytorch.org/docs/stable/notes/randomness.html).
These changes will make it easier for us to benchmark Ray Train, and also make it easier for users to reproduce their experiments.
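A minimal sketch of the kind of seeding the PyTorch guide recommends (not necessarily the exact set of knobs Ray Train exposes):

```python
import random

import numpy as np
import torch

def set_reproducible(seed: int) -> None:
    # Seed the Python, NumPy, and PyTorch RNGs.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    # Prefer deterministic algorithms where PyTorch provides them.
    torch.use_deterministic_algorithms(True)
    # Disable cuDNN autotuning, which can pick different kernels run-to-run.
    torch.backends.cudnn.benchmark = False
```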
`Application` stores a group of deployments and can write them to a YAML config. However, this requires the deployments to use import paths as their `func_or_class`. This change makes all deployments in an `Application` store only import paths as their `func_or_class`.
This change also adds a utility function to get a deployment's import path. This utility function is used in the DeploymentNode for Pipelines.
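For illustration only, a hypothetical sketch of such a utility (Serve's actual helper and the exact import-path format may differ):

```python
def get_import_path(func_or_class) -> str:
    # Build a "module.QualifiedName" string that can later be re-imported,
    # e.g. "my_project.deployments.MyModel" for a top-level class.
    module = func_or_class.__module__
    qualname = func_or_class.__qualname__
    return f"{module}.{qualname}"
```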
Interface for `DataParallelTrainer` and updates to the `ScalingConfig` definition.
Depends on #22986
Co-authored-by: Eric Liang <ekhliang@gmail.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
The inclusion of content from Markdown files, such as our central Getting Started page, didn't render. Fixed here.
Signed-off-by: Max Pumperla <max.pumperla@googlemail.com>
Using Ray on SLURM systems is documented but is missing some pitfalls about networking. This PR adds some information about port binding and address binding (I will open a feature request with more details and link it here later).
I did not include a firm recommendation on this last point since `--address` did not work for me: I hit a "cannot resolve" issue after setting an internal IP, even though it was reachable.
Currently, when a spill/restore worker fails while it is marked idle in the worker pool, the worker pool does not clean up the worker's metadata. Subsequent spill/restore requests then reuse this dead worker, and the RPC requests cannot succeed. This breaks object spilling.
This PR addresses the issue by removing disconnected IO workers from `registered_io_workers` and `idle_io_workers`.
`PATH` can easily change within a terminal session, and different `$PATH` values lead to Bazel cache misses. For example, `pip install -e python` and `bazel build //:all` don't share the cache because Python modifies `PATH`.
`LC_ALL`, `LANG`, and Python-related environment variables are only used by the C++ worker tests, which invoke the `ray start` command when running tests with `bazel test`. The Java worker is not affected because we don't use `bazel test` to run Java tests. So these env variables should stay in `test_env`, not `action_env`.
This PR can greatly improve the cache hit rate of Bazel build and test.