hiro/ray - Forgejo: Beyond coding. We Forge.


mirror of https://github.com/vale981/ray synced 2025-03-06 10:31:39 -05:00

Author	SHA1	Message	Date
Andrew Li	1a293a1187	Providing additional useful messages for JSONDecodeError (#23116) Per #22535, I added more useful information to the error reported when a JSONDecodeError is encountered.	2022-03-17 20:58:43 -07:00
Guyang Song	1ad019aac3	[C++ API][Doc] Add doc and error log to notice C++ API is not supported on Windows (#23272) The C++ API does not support Windows at all for now. ## Checks - [X] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :(	2022-03-18 10:52:57 +08:00
Jiajun Yao	62a5404369	Collect more usage stats data (#23167)	2022-03-17 19:33:27 -07:00
Tao Wang	b4bc8809dc	[Core][Tiny]Shorter thread name (#23222) On Linux, a thread name cannot be longer than 15 characters. When using a command like `top`, it is easy to confuse similar thread names such as `resource_report_poller` and `resource_report_broadcaster`, because both are shown as `resource_report`. This PR uses abbreviations to make the thread names shorter.	2022-03-18 09:58:32 +08:00
Jiao	ea51017e52	[Ray DAG][Serve Pipeline] better error messages on .bind and .remote with tests (#23290)	2022-03-17 18:58:09 -07:00
shrekris-anyscale	1b30bfa972	[serve] Implement set_options (#23265)	2022-03-17 17:09:55 -07:00
Edward Oakes	04ab27dcbf	[serve] Fix ServeHandle JSON Serde (#23285)	2022-03-17 16:35:19 -07:00
Chris K. W	6416c65505	Revert "Revert "[Client] chunked get requests (#22455)"" (#23261) * revert revertchunkedgets * exit early if all chunks received, tighter exception handler for stream in proxy	2022-03-17 16:24:30 -07:00
Siyuan (Ryans) Zhuang	f74ad24901	Cleanup nits in code (#23112) * cleanup code * fix comments	2022-03-17 15:55:35 -07:00
Amog Kamsetty	d31d6bc9bb	[Docker] Add Train requirements to ray-ml docker image (#22645)	2022-03-17 15:07:32 -07:00
Eric Liang	015181ab9a	Add random access support for Datasets (experimental feature) (#22749) This PR adds experimental support for random access to datasets. Random access can be enabled on a Dataset by calling `ds.to_random_access_dataset(key, num_workers=N)`, which creates a RandomAccessDataset. RandomAccessDataset partitions the dataset across the cluster by the given sort key, providing efficient random access to records via binary search. A number of worker actors are created, each of which has zero-copy access to the underlying sorted data blocks of the Dataset. Performance-wise, you can expect each worker to provide ~3000 records / second via ``get_async()``, and ~10000 records / second via ``multiget()``. Since Ray actor calls go directly from worker to worker, throughput scales linearly with the number of workers.	2022-03-17 15:01:12 -07:00
Simon Mo	6cc0fee947	[Serve] Improve function deployment API (#23252)	2022-03-17 14:37:43 -07:00
Archit Kulkarni	684a1821d3	[Doc] [runtime_env] Add limitation about single-file `py_modules` to doc (#23248) Until #23151 is fixed, this PR adds it as a known limitation in the documentation.	2022-03-17 16:23:46 -05:00
mwtian	1d2d60a2fc	[GCS-Ray] remove Redis password from CLI messages (#23242) The Redis password should not be needed in the connection info printed by `ray start --head`. Flags and arguments related to the Redis password can be removed in a separate cleanup, but that is riskier (it affects external Redis) and needs more care.	2022-03-17 13:36:29 -07:00
Simon Mo	f400b4333a	[Serve] Remove legacy pipeline codebase (#23172)	2022-03-17 13:27:16 -07:00
Antoni Baum	1211c452d4	[ML/Train] `TensorflowTrainer` implementation (#23250) Implements `TensorflowTrainer`. Depends on https://github.com/ray-project/ray/pull/23211 (review only files with `tensorflow` in the name). Co-authored-by: Eric Liang <ekhliang@gmail.com> Co-authored-by: Richard Liaw <rliaw@berkeley.edu> Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com> Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>	2022-03-17 11:34:47 -07:00
Jian Xiao	8c9e3f6c2e	Move the third-party data integrations (non-Dataset stuff) out of the user guides which is for Dataset (#23162) Improve documentation of Ray Dataset.	2022-03-17 11:27:40 -07:00
Eric Liang	c8f207f746	[docs] Core docs refactor (#23216) This PR makes a number of major overhauls to the Ray core docs: Add a key-concepts section for {Tasks, Actors, Objects, Placement Groups, Env Deps}. Re-org the user guide to align with key concepts. Rewrite the walkthrough to link to mini-walkthroughs in the key concept sections. Minor tweaks and additional transition material.	2022-03-17 11:26:17 -07:00
Siyuan (Ryans) Zhuang	0f61e2f90e	[Lint] Cleanup incorrectly formatted strings (Part 5: util) (#23264)	2022-03-17 10:27:05 -07:00
Antoni Baum	f71e7681b3	[ML] `XGBoost`&`LightGBMTrainer` implementation (#23245) Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com> Co-authored-by: matthewdeng <matthew.j.deng@gmail.com> Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>	2022-03-17 10:00:03 -07:00
Dmitri Gekhtman	c707ad8d73	Fix GCP node termination (#23101) Skips 404s on node termination for the GCP node provider. Also resets the internal "self.nodes_to_terminate" state at the start of each autoscaler iteration; this is necessary for correct cleanup in the event of a failed node termination.	2022-03-17 09:51:16 -07:00
Amog Kamsetty	cf512254bb	[ml/train] Don't create new `BackendExecutor` actor in `Trainable` (#23235) When using the DataParallelTrainer, the BackendExecutor already runs inside a Trainable actor, so there is no need to create a new actor. However, when using Ray Train directly, we still want to run the BackendExecutor in an actor for performance with Ray Client. This PR does some refactoring to support both cases.	2022-03-17 08:31:43 -07:00
xwjiang2010	c12d437fb5	[tune] de-spam some logging. (#23247) Demote some logger calls to debug level.	2022-03-17 15:03:38 +00:00
Siyuan (Ryans) Zhuang	cb80518a80	[Lint] Cleanup incorrectly formatted strings (Part 4: tests, _private) (#23263)	2022-03-17 00:49:16 -07:00
Amog Kamsetty	ef0b85c344	[ml/train] `TorchTrainer` implementation (#23219)	2022-03-17 00:07:27 -07:00
Gagandeep Singh	c32649b85c	`map` and `map_unordered` cancel previous tasks before submitting new ones (#23187) N.B. - https://github.com/ray-project/ray/issues/23107#issuecomment-1068107507	2022-03-16 23:45:44 -07:00
Siyuan (Ryans) Zhuang	cc1728120f	[Tune] Move resource updater out of trial executor (#23178) * simplify trial executor * update test * fix: proper resource update before initialization * add test to BUILD * add doc for resource updater	2022-03-16 22:50:47 -07:00
xwjiang2010	814b49356c	[tuner] Tuner impl. (#22848)	2022-03-16 20:55:30 -07:00
Balaji Veeramani	83986a4d83	[Train] Add support for automatic mixed precision (#22227) Closes #20643 Co-authored-by: Ubuntu <ubuntu@ip-172-31-58-19.us-west-2.compute.internal>	2022-03-16 20:53:02 -07:00
Archit Kulkarni	77090144a2	[jobs] Add `entrypoint` field to JobInfo (#23253)	2022-03-16 22:02:22 -05:00
Qing Wang	9f3b4921b6	[Java] Add a default config file for log4j2. (#23225) We add a default config file for the Java worker so that info logs can be printed before `Ray.init()` is invoked.	2022-03-17 11:00:21 +08:00
Amog Kamsetty	f33a495b3a	[ml/train] `DataParallelTrainer` implementation (#23211) Co-authored-by: Eric Liang <ekhliang@gmail.com> Co-authored-by: Richard Liaw <rliaw@berkeley.edu>	2022-03-16 19:49:44 -07:00
mwtian	391901f86b	[Remove Redis Pubsub 2/n] clean up remaining Redis references in gcs_utils.py (#23233) Continue cleaning up Redis and related Redis references in: - gcs_utils.py - log_monitor.py - `publish_error_to_driver()`	2022-03-16 19:34:57 -07:00
SangBin Cho	b350fe9ee8	[Nightly test] Fix additional k8s issues + add new tests (#23231) Fix a bug from the previous fixes. Add more tests. Stop using m5.xlarge (not supported now). There are 2 hard blockers from the infra: 1. Large disks are not supported. 2. m5.xlarge is not supported. Both are considered high priority and are expected to be fixed soon.	2022-03-16 16:37:29 -07:00
ZhuSenlin	125ef0e5a6	[GCS] integrate cluster_resource_manager into gcs_resource_manager and gcs_resource_scheduler (#23105) * refactor gcs_resource_manager * fix lint error * fix lint error * fix compile error * fix test * fix test * fix test * add unit test * refactor UpdateNodeNormalTaskResources * fix comment Co-authored-by: 黑驰 <senlin.zsl@antgroup.com>	2022-03-16 16:27:14 -07:00
Stephanie Wang	ce71c5bbbd	[core][tests] Mark threaded_actors_stress_test as unstable	2022-03-16 15:31:19 -07:00
Archit Kulkarni	8707eb6288	[runtime env] Support `.whl` files in `py_modules` (#22368) The `py_modules` field of runtime_env supports uploading local Python modules for use on the Ray cluster. One gap is when the local Python module is packaged as a wheel (a `.whl` file). This PR adds the missing support for uploading and installing the `.whl` file.	2022-03-16 16:37:10 -05:00
shrekris-anyscale	84b3de6825	[serve] Add atomic delete (#23195)	2022-03-16 14:13:10 -07:00
Jiao	2bcbe41d54	[Serve] Polish new deployment to DAG binding API with Ray DAG tests (#23208)	2022-03-16 12:59:19 -07:00
Siyuan (Ryans) Zhuang	6d83a3f283	[Lint] Cleanup incorrectly formatted strings (Part 3: components) (#23130)	2022-03-16 12:36:57 -07:00
Kai Fricke	e3987d85c3	[tune] Mark cloud OSS release tests as unstable (#23240) These tests have been flaky for a while. Until this is addressed, mark them as unstable.	2022-03-16 17:37:58 +00:00
Edward Oakes	d1a528d6af	[serve] Use `deploy_group` in `serve run` and set HTTP options (#23215)	2022-03-16 12:37:21 -05:00
shrekris-anyscale	56ddea85a1	[Serve] Fix typo `language` (#23213)	2022-03-16 10:14:44 -07:00
shrekris-anyscale	34ebb3409e	[serve] Make Dashboard start Serve in the "serve" namespace (#23198) The Ray Dashboard starts Serve in the `"_ray_internal_dashboard"` namespace. However, Serve by default starts in the `"serve"` namespace. This causes surprising behavior when working with the Serve CLI and REST API. This change makes the Ray Dashboard start Serve in the `"serve"` namespace, allowing the REST API to work intuitively with the Python API.	2022-03-16 12:03:44 -05:00
Kai Fricke	eca5bcfc87	[ci/release] Reload modules after installing matching Ray (#23227) Apparently, ray gets imported somewhere before the client runner runs (possibly from an anyscale package). This means that we need to reload the ray package after installing a matching local ray wheel. Additionally, job submission should also install a matching local ray so that it matches the job submission server.	2022-03-16 15:44:43 +00:00
Max Pumperla	71c57c619b	[docs] RLlib broken links (fixes #23160) (#23226)	2022-03-16 12:38:18 +01:00
Kai Fricke	b80f79a072	[ci/multinode] Improve multi-node tests (#23196) The current multi-node tests use a hardcoded mapping for local development mounts. This PR introduces a new environment variable to control this mapping dynamically. Additionally, it adds some minor improvements to the test utilities and monitor.	2022-03-16 09:59:50 +00:00
Siyuan (Ryans) Zhuang	d67c34256b	[Workflow] Optimize out tail recursion in python (#22794) * add test * warning when inplace subworkflows may use different resources	2022-03-16 01:51:18 -07:00
Gagandeep Singh	60a3340387	[workflow] Suggestions of correct inputs to `create_storage` in error message under windows (#23190) * Provide suggestions of correct inputs to create_storage in error msg * Applied linting format * Added test for verifying error message	2022-03-16 01:42:12 -07:00
Avnish Narayan	6c20e9d898	[RLlib] Change the slateq regression learning test with GPU to use torch only (#23168)	2022-03-16 09:15:59 +01:00
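
Commit 1a293a1187 adds more context to JSONDecodeError messages. The sketch below shows that general idea using only the standard library `json` module. The helper name `loads_with_context` and its message format are hypothetical and are not Ray's actual implementation.

```python
import json


def loads_with_context(text: str, context_chars: int = 20):
    """Parse JSON; on failure, re-raise with the location and nearby text.

    Hypothetical helper for illustration only; not part of Ray's API.
    """
    try:
        return json.loads(text)
    except json.JSONDecodeError as e:
        # e.pos is the character offset of the failure; e.lineno and
        # e.colno describe the same position as line/column numbers.
        start = max(0, e.pos - context_chars)
        end = min(len(text), e.pos + context_chars)
        snippet = text[start:end]
        raise ValueError(
            f"Failed to decode JSON at line {e.lineno}, column {e.colno}: "
            f"{e.msg}. Nearby text: {snippet!r}"
        ) from e


if __name__ == "__main__":
    try:
        loads_with_context('{"a": 1, "b": }')
    except ValueError as err:
        print(err)
```

Chaining with `from e` keeps the original `JSONDecodeError` available as `__cause__`, so callers that need the raw position can still get it.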
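
Commit b4bc8809dc shortens thread names because Linux limits them to 16 bytes including the terminating NUL, that is, 15 visible characters. The sketch below models the visible name as the first 15 characters and detects names that become indistinguishable. The helper names are hypothetical, and the abbreviated names actually chosen in the PR are not listed here.

```python
# Linux thread names are limited to 16 bytes including the terminating NUL,
# so at most 15 characters can be shown (e.g. by `top`).
LINUX_THREAD_NAME_MAX = 15


def visible_thread_name(name: str) -> str:
    """Return the part of `name` that fits in a Linux thread name."""
    return name[:LINUX_THREAD_NAME_MAX]


def find_collisions(names):
    """Group full names by visible name; keep only ambiguous groups."""
    groups = {}
    for name in names:
        groups.setdefault(visible_thread_name(name), []).append(name)
    return {short: full for short, full in groups.items() if len(full) > 1}


if __name__ == "__main__":
    print(find_collisions(["resource_report_poller", "resource_report_broadcaster"]))
```

Both names from the commit message collapse to `resource_report`, which is exactly the ambiguity the PR removes.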
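
Commit 6416c65505 restores chunked get requests in Ray Client and exits early once all chunks are received. The actual wire format is not described in the log, so the `(chunk_id, total_chunks, data)` tuples below are a hypothetical message shape used only to illustrate reassembly with an early exit.

```python
from typing import Iterable, Tuple


def assemble_chunks(stream: Iterable[Tuple[int, int, bytes]]) -> bytes:
    """Reassemble hypothetical (chunk_id, total_chunks, data) messages.

    Stops reading as soon as every chunk has arrived instead of waiting
    for the stream itself to end.
    """
    chunks = {}
    total = None
    for chunk_id, total_chunks, data in stream:
        total = total_chunks
        chunks[chunk_id] = data
        if len(chunks) == total:
            break  # exit early: all chunks received
    if total is None or len(chunks) != total:
        raise RuntimeError("stream ended before all chunks were received")
    # Chunks may arrive out of order, so join them by id.
    return b"".join(chunks[i] for i in range(total))
```

Keying chunks by id tolerates out-of-order delivery, and the early `break` means the reader never blocks on a stream that has nothing more to send.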
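
Commit 015181ab9a describes RandomAccessDataset as partitioning data across workers by a sort key and locating records via binary search. The single-process sketch below illustrates only that lookup strategy. It does not use Ray, actors, or the real `RandomAccessDataset` API, and the class name `RandomAccessSketch` is hypothetical.

```python
import bisect
from typing import Any, Dict, List, Optional


class RandomAccessSketch:
    """Hypothetical model of key-partitioned random access.

    Records are sorted by key and split into contiguous partitions (one per
    "worker"). A lookup binary-searches the partition lower bounds to pick a
    partition, then binary-searches that partition's sorted keys.
    """

    def __init__(self, records: List[Dict[str, Any]], key: str, num_workers: int):
        rows = sorted(records, key=lambda r: r[key])
        size = max(1, -(-len(rows) // num_workers))  # ceiling division
        self._parts = [rows[i:i + size] for i in range(0, len(rows), size)]
        self._part_keys = [[r[key] for r in p] for p in self._parts]
        # Smallest key in each partition, used to route lookups.
        self._bounds = [keys[0] for keys in self._part_keys]

    def get(self, k: Any) -> Optional[Dict[str, Any]]:
        # Last partition whose lower bound is <= k.
        p = bisect.bisect_right(self._bounds, k) - 1
        if p < 0:
            return None
        keys = self._part_keys[p]
        i = bisect.bisect_left(keys, k)
        if i < len(keys) and keys[i] == k:
            return self._parts[p][i]
        return None

    def multiget(self, ks: List[Any]) -> List[Optional[Dict[str, Any]]]:
        return [self.get(k) for k in ks]
```

In the real feature each partition is served by a separate worker actor with zero-copy access to its blocks; here the partitions are plain Python lists, so only the two-level binary search is represented.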
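
Commit c707ad8d73 makes GCP node termination skip 404s and resets `nodes_to_terminate` at the start of each autoscaler iteration. The sketch below illustrates both ideas in isolation. `NotFoundError` and `NodeTerminatorSketch` are hypothetical stand-ins, not the GCP client library or Ray's node provider.

```python
from typing import Callable, Dict, List


class NotFoundError(Exception):
    """Hypothetical stand-in for a cloud API's HTTP 404 error."""


class NodeTerminatorSketch:
    """Hypothetical sketch: 404-tolerant termination plus per-iteration reset."""

    def __init__(self, delete_fn: Callable[[str], None]):
        self._delete = delete_fn
        self.nodes_to_terminate: List[str] = []

    def start_iteration(self) -> None:
        # Reset state so entries left over from a failed termination in a
        # previous iteration do not leak into this one.
        self.nodes_to_terminate = []

    def terminate(self, node_ids: List[str]) -> Dict[str, str]:
        self.nodes_to_terminate.extend(node_ids)
        results = {}
        for node_id in node_ids:
            try:
                self._delete(node_id)
                results[node_id] = "terminated"
            except NotFoundError:
                # A 404 means the node is already gone; treat it as done.
                results[node_id] = "already gone"
        return results
```

Errors other than the 404 stand-in still propagate, which leaves stale entries in `nodes_to_terminate` until the next `start_iteration()` clears them.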
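
Commit c32649b85c makes `map` and `map_unordered` cancel previous tasks before submitting new ones, but the log does not show the mechanism. As an analogous sketch only, the hypothetical `CancellingPoolSketch` below uses `concurrent.futures` instead of Ray so that a new `map()` call never returns leftover results from an earlier one.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, List


class CancellingPoolSketch:
    """Hypothetical pool whose map() discards work left from a prior map()."""

    def __init__(self, max_workers: int = 2):
        self._executor = ThreadPoolExecutor(max_workers=max_workers)
        self._pending: List[Future] = []

    def _cancel_previous(self) -> None:
        # cancel() only succeeds for futures that have not started yet;
        # running ones are simply dropped so the new map() ignores them.
        for fut in self._pending:
            fut.cancel()
        self._pending = []

    def map(self, fn: Callable, values: Iterable) -> Iterator:
        # Runs lazily: cancellation and submission happen on first next().
        self._cancel_previous()
        self._pending = [self._executor.submit(fn, v) for v in values]
        for fut in list(self._pending):
            yield fut.result()

    def shutdown(self) -> None:
        self._executor.shutdown(wait=True)
```

Because `map` is a generator, cancellation happens when the caller starts iterating, not when `map` is called.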
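
Commit 8707eb6288 allows the `py_modules` field of a runtime_env to reference a local `.whl` file. The configuration sketch below shows the shape of such an entry. The wheel path is a hypothetical placeholder, and the `ray.init(runtime_env=...)` call appears only as a comment because it needs a Ray installation.

```python
# Hypothetical wheel path; replace with a wheel you have built locally.
runtime_env = {
    "py_modules": ["/path/to/my_module-0.1.0-py3-none-any.whl"],
}

# Typical usage (requires Ray to be installed):
#   import ray
#   ray.init(runtime_env=runtime_env)
```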
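
Commit d67c34256b optimizes out tail recursion in Ray Workflows, and the log does not include the implementation. The generic trampoline below illustrates the underlying technique, not the workflow code: a step returns its continuation instead of calling itself, and a loop runs continuations so the Python stack does not grow. `TailCall`, `run`, and `count_down` are hypothetical names.

```python
from typing import Any, Callable


class TailCall:
    """Marker meaning: continue by calling fn(*args) instead of returning."""

    def __init__(self, fn: Callable[..., Any], *args: Any):
        self.fn = fn
        self.args = args


def run(fn: Callable[..., Any], *args: Any) -> Any:
    """Trampoline: invoke continuations in a loop so stack depth stays flat."""
    result = fn(*args)
    while isinstance(result, TailCall):
        result = result.fn(*result.args)
    return result


def count_down(n: int, acc: int = 0):
    # "Recurses" by returning its continuation rather than calling itself.
    if n == 0:
        return acc
    return TailCall(count_down, n - 1, acc + n)


if __name__ == "__main__":
    # Far deeper than Python's default recursion limit of 1000 frames.
    print(run(count_down, 100_000))
```

Written as direct recursion, `count_down(100_000)` would raise `RecursionError`; the trampoline evaluates it with constant stack depth.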


11774 commits