hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-06 10:31:39 -05:00

Author	SHA1	Message	Date
mwtian	1d2d60a2fc	[GCS-Ray] remove Redis password from CLI messages (#23242 ) Redis password should not be needed in the connection info printed by `ray start --head`. We can make another cleanup for removing flags and arguments related to Redis password. But it is a bit more risky (affects external Redis) and needs more care.	2022-03-17 13:36:29 -07:00
Simon Mo	f400b4333a	[Serve] Remove legacy pipeline codebase (#23172 )	2022-03-17 13:27:16 -07:00
Antoni Baum	1211c452d4	[ML/Train] `TensorflowTrainer` implementation (#23250 ) Implements `TensorflowTrainer`. Depends on https://github.com/ray-project/ray/pull/23211 (review only files with `tensorflow` in the name). Co-authored-by: Eric Liang <ekhliang@gmail.com> Co-authored-by: Richard Liaw <rliaw@berkeley.edu> Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com> Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>	2022-03-17 11:34:47 -07:00
Siyuan (Ryans) Zhuang	0f61e2f90e	[Lint] Cleanup incorrectly formatted strings (Part 5: util) (#23264 )	2022-03-17 10:27:05 -07:00
Antoni Baum	f71e7681b3	[ML] `XGBoost`&`LightGBMTrainer` implementation (#23245 ) Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com> Co-authored-by: matthewdeng <matthew.j.deng@gmail.com> Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>	2022-03-17 10:00:03 -07:00
Dmitri Gekhtman	c707ad8d73	Fix GCP node termination (#23101 ) Skips 404s on node termination for GCP node provider. Also resets internal "self.nodes_to_terminate" state at the start of an autoscaler iteration -- that's necessary for correct cleanup in the event of failed node termination.	2022-03-17 09:51:16 -07:00
Amog Kamsetty	cf512254bb	[ml/train] Don't create new `BackendExecutor` actor in `Trainable` (#23235 ) If using the DataParallelTrainer, since we are running the BackendExecutor in a Trainable actor already, we don't need to create a new actor. However if using Ray Train directly, we still want to run BackendExecutor in an actor for performance with Ray Client. This PR does some refactoring to support both cases.	2022-03-17 08:31:43 -07:00
xwjiang2010	c12d437fb5	[tune] de-spam some logging. (#23247 ) Demoting some logger calls to debug	2022-03-17 15:03:38 +00:00
Siyuan (Ryans) Zhuang	cb80518a80	[Lint] Cleanup incorrectly formatted strings (Part 4: tests, _private) (#23263 )	2022-03-17 00:49:16 -07:00
Amog Kamsetty	ef0b85c344	[ml/train] `TorchTrainer` implementation (#23219 )	2022-03-17 00:07:27 -07:00
Gagandeep Singh	c32649b85c	`map` and `map_unordered` cancel previous tasks before submitting new ones (#23187 ) N.B. - https://github.com/ray-project/ray/issues/23107#issuecomment-1068107507	2022-03-16 23:45:44 -07:00
Siyuan (Ryans) Zhuang	cc1728120f	[Tune] Move resource updater out of trial executor (#23178 ) * simplify trial executor * update test * fix: proper resource update before initialization * add test to BUILD * add doc for resource updater	2022-03-16 22:50:47 -07:00
xwjiang2010	814b49356c	[tuner] Tuner impl. (#22848 )	2022-03-16 20:55:30 -07:00
Balaji Veeramani	83986a4d83	[Train] Add support for automatic mixed precision (#22227 ) Closes #20643 Co-authored-by: Ubuntu <ubuntu@ip-172-31-58-19.us-west-2.compute.internal>	2022-03-16 20:53:02 -07:00
Amog Kamsetty	f33a495b3a	[ml/train] `DataParallelTrainer` implementation (#23211 ) Co-authored-by: Eric Liang <ekhliang@gmail.com> Co-authored-by: Richard Liaw <rliaw@berkeley.edu>	2022-03-16 19:49:44 -07:00
mwtian	391901f86b	[Remove Redis Pubsub 2/n] clean up remaining Redis references in gcs_utils.py (#23233 ) Continue to clean up Redis and other related Redis references, for - gcs_utils.py - log_monitor.py - `publish_error_to_driver()`	2022-03-16 19:34:57 -07:00
SangBin Cho	b350fe9ee8	[Nightly test] Fix additional k8s issues + add new tests (#23231 ) Fix bug from the previous fixes. Add more tests. Stop using m5.xlarge (not supported now). There are 2 hard blockers from the infra: 1. Large size disk is not supported. 2. m5.xlarge is not supported. Both are considered high priority to be fixed soon.	2022-03-16 16:37:29 -07:00
Archit Kulkarni	8707eb6288	[runtime env] Support `.whl` files in `py_modules` (#22368 ) The `py_modules` field of runtime_env supports uploading local Python modules for use on the Ray cluster. One gap is the case where the local Python module is a wheel (`.whl` file). This PR adds the missing support for uploading and installing the `.whl` file.	2022-03-16 16:37:10 -05:00
shrekris-anyscale	84b3de6825	[serve] Add atomic delete (#23195 )	2022-03-16 14:13:10 -07:00
Jiao	2bcbe41d54	[Serve] Polish new deployment to DAG binding API with Ray DAG tests (#23208 )	2022-03-16 12:59:19 -07:00
Siyuan (Ryans) Zhuang	6d83a3f283	[Lint] Cleanup incorrectly formatted strings (Part 3: components) (#23130 )	2022-03-16 12:36:57 -07:00
Edward Oakes	d1a528d6af	[serve] Use `deploy_group` in `serve run` and set HTTP options (#23215 )	2022-03-16 12:37:21 -05:00
shrekris-anyscale	56ddea85a1	[Serve] Fix typo `language` (#23213 )	2022-03-16 10:14:44 -07:00
shrekris-anyscale	34ebb3409e	[serve] Make Dashboard start Serve in the "serve" namespace (#23198 ) The Ray Dashboard starts Serve in the `"_ray_internal_dashboard"` namespace. However, Serve by default starts in the `"serve"` namespace. This causes surprising behavior when working with the Serve CLI and REST API. This change makes the Ray Dashboard start Serve in the `"serve"` namespace, allowing the REST API to work intuitively with the Python API.	2022-03-16 12:03:44 -05:00
Kai Fricke	b80f79a072	[ci/multinode] Improve multi-node tests (#23196 ) The current multi node tests use a hardcoded mapping for local development mounts. With this PR, a new environment variable is introduced to be able to control this dynamically. Additionally, some minor improvements to the test utilities and monitor are added.	2022-03-16 09:59:50 +00:00
Siyuan (Ryans) Zhuang	d67c34256b	[Workflow] Optimize out tail recursion in python (#22794 ) * add test * warning when inplace subworkflows may use different resources	2022-03-16 01:51:18 -07:00
Gagandeep Singh	60a3340387	[workflow] Suggestions of correct inputs to `create_storage` in error message under windows (#23190 ) * Provide suggestions of correct inputs to create_storage in error msg * Applied linting format * Added test for verifying error message	2022-03-16 01:42:12 -07:00
Siyuan (Ryans) Zhuang	7c43c66b6b	[workflow] Implement workflow continuation unification (#23217 ) * implement workflow continuation unification * fix comments * fix: strict scope for workflow execution	2022-03-16 00:04:01 -07:00
mwtian	72ef9f91aa	[Remove Redis Pubsub 1/n] Remove `enable_gcs_pubsub()` (#23189 ) GCS pubsub has been the default for a while. There is little chance that we would need to revert to Redis pubsub in the future. This is the first step in removing Redis pubsub: removing the `enable_gcs_pubsub()` feature guard.	2022-03-15 23:56:15 -07:00
Amog Kamsetty	2548083dcb	[ml] Trainer implementation (#22969 ) Implementation for base Trainer Co-authored-by: Eric Liang <ekhliang@gmail.com> Co-authored-by: Richard Liaw <rliaw@berkeley.edu>	2022-03-15 20:35:54 -07:00
Qing Wang	149d06442b	[Core][Java][Remove JVM FullGC 3/N] Disable every 10min FullGC. (#21443 ) In this PR, we disable the every-10-minute FullGC in the Java worker when it is not triggered by a global gc event. Specifically, we added a `triggered_by_global_gc` flag to indicate whether the gc event was triggered by a global gc event. If it was triggered by a global gc, we still need to do a FullGC. Co-authored-by: Qing Wang <jovany.wq@antgroup.com>	2022-03-16 11:18:12 +08:00
Guyang Song	30ae287dac	enable test_runtime_env_working_dir_3.py and fix cache size to be negative (#23183 )	2022-03-16 11:00:48 +08:00
qicosmos	d8de5a445a	[C++ Worker]Python call cpp actor (#23061 ) The [last PR](https://github.com/ray-project/ray/pull/22820) added support for calling C++ normal tasks from Python; this PR adds support for calling C++ actor tasks from Python.	2022-03-15 19:54:10 -07:00
Edward Oakes	42ebc0a4f6	[serve] Add some test cases for pipeline DAG builder (#23210 )	2022-03-15 21:05:12 -05:00
Siyuan (Ryans) Zhuang	499c242f0f	[workflow] More tests for unifying workflow and remote function ObjectRef behavior (#23174 ) * add more tests	2022-03-15 16:42:27 -07:00
Antoni Baum	630985e3bb	[ML] `XGBoost`&`LightGBMTrainer` interfaces (#23192 ) Adds interfaces for `XGBoostTrainer` and `LightGBMTrainer`.	2022-03-15 16:16:30 -07:00
Simon Mo	823dbd06a8	[Serve] Add DeploymentNode implementation on top of existing DAG codebase (#23177 )	2022-03-15 16:06:57 -07:00
shrekris-anyscale	57871816d4	[serve] Fix TestGetDeploymentImportPath on Windows (#23201 )	2022-03-15 15:48:48 -07:00
Antoni Baum	3625c4760f	[ML/Train] Add `TensorflowTrainer` interface (#23072 ) Interface for TensorflowTrainer Depends on #22988 Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>	2022-03-15 14:02:17 -07:00
siddgoel	0722cbb37e	Add support for snappy text decompression #22298 (#22486 ) Adds a streaming based reading option for Snappy-compressed files. Arrow doesn't support streaming Snappy decompression since the canonical C++ Snappy library doesn't natively support streaming decompression. This PR works around this by doing streaming reads of snappy-compressed files using the streaming decompression API provided in the [python-snappy](https://github.com/andrix/python-snappy) package. This commit supplies a custom datasource that uses Arrow + [python-snappy](https://github.com/andrix/python-snappy) to read and decompress Snappy-compressed files. Co-authored-by: siddharth.goel <siddharth.goel@bytedance.com> Co-authored-by: Chen Shen <scv119@gmail.com>	2022-03-15 13:52:22 -07:00
Amog Kamsetty	1572130a4e	[ml/train] Trainer interfaces [4/4]: `TorchTrainer` interface (#22989 ) Interface for TorchTrainer Depends on #22988	2022-03-15 12:47:44 -07:00
Antoni Baum	a8fbb4accc	[ML] `XGBoost`&`LightGBMPredictor` implementation (#23143 ) Implementation for XGBoostPredictor & LightGBMPredictor. The interface has been modified slightly.	2022-03-15 12:44:50 -07:00
Clark Zinzow	1d5f18fe0a	Fix equalized split handling of num_splits == num_blocks case. (#23191 )	2022-03-15 12:23:50 -07:00
Yi Cheng	72713e815b	[gcs] Remove use_gcs_for_bootstrap in other python modules.	2022-03-15 12:23:10 -07:00
Siyuan (Ryans) Zhuang	761f927720	[Lint] Cleanup incorrectly formatted strings (Part 2: Tune) (#23129 )	2022-03-15 12:17:47 -07:00
Archit Kulkarni	fc182006ec	[Doc] Add missing runtime context namespace doc (#23120 ) The public field RuntimeContext.namespace didn't have a docstring so it wasn't showing up at all in the docs. This PR adds a basic docstring.	2022-03-15 11:46:09 -07:00
Balaji Veeramani	c694ed4594	[Train] Add `enable_reproducibility` (#22851 ) This PR adds a feature that allows users to make their training runs more reproducible. I've implemented this feature by following PyTorch's guide on how to limit sources of randomness (https://pytorch.org/docs/stable/notes/randomness.html). These changes will make it easier for us to benchmark Ray Train, and also make it easier for users to reproduce their experiments.	2022-03-15 11:07:34 -07:00
xwjiang2010	99d5288bbd	[tune] Better error msg for grpc resource exhausted error. (#22806 )	2022-03-15 16:01:40 +00:00
shrekris-anyscale	bf1bd293f4	[serve] Make deployments in `Application` use only import paths (#23027 ) `Application` stores a group of deployments and can write them to a YAML config. However, this requires the deployments to use import paths as their `func_or_class`. This change makes all deployments in an `Application` store only import paths as the `func_or_class`. This change also adds a utility function to get a deployment's import path. This utility function is used in the DeploymentNode for Pipelines.	2022-03-15 10:48:35 -05:00
Amog Kamsetty	e1f24a244b	[ml/train] Training Interfaces [3/4]: `DataParallelTrainer` interface (#22988 ) Interface for DataParallelTrainer and updates to ScalingConfig definition. Depends on #22986 Co-authored-by: Eric Liang <ekhliang@gmail.com> Co-authored-by: Richard Liaw <rliaw@berkeley.edu> Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>	2022-03-15 08:11:05 -07:00

6525 commits