Report only the memory used by primary copies of objects, since secondary copies are not evicted even when they are no longer needed on a node. Counting secondary copies prevents downscaling until all references to a shared object are removed.
Closes https://github.com/ray-project/ray/issues/21870
These changes add a set of improvements to enable automatic creation and updating of CloudWatch alarms when provisioning AWS Autoscaling clusters. With these improvements, AWS Autoscaler users can:
Set up alarms against Ray CloudWatch metrics to get notified about increased load, service outages, etc.
Update their CloudWatch alarm JSON configuration files at `ray up` execution time.
Notes:
This PR is a follow-up PR for #20266, which adds CloudWatch alarm support.
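Purely as an illustration of the kind of alarm these JSON configuration files describe (this is not the autoscaler's config format, and the metric/namespace names below are hypothetical), a CloudWatch metric alarm can be created with boto3 like this:

```python
import boto3

# Illustrative sketch only; namespace, metric name, and SNS topic are hypothetical.
cloudwatch = boto3.client("cloudwatch", region_name="us-west-2")
cloudwatch.put_metric_alarm(
    AlarmName="ray-high-cpu",
    Namespace="CustomRayMetrics",        # hypothetical namespace
    MetricName="node_cpu_utilization",   # hypothetical Ray metric name
    Statistic="Average",
    Period=300,
    EvaluationPeriods=2,
    Threshold=90.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-west-2:123456789012:ray-alerts"],
)
```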
Instead of installing dependencies in each Buildkite job, move this step into the Dockerfile.
This will update GPU tests to always use Python 3.7.
Currently, Tune trainables wrapped with `functools.partial` will log the following warnings:
INFO registry.py:66 -- Detected unknown callable for trainable. Converting to class.
WARNING experiment.py:295 -- No name detected on trainable. Using DEFAULT.
This PR propagates function names for functions wrapped with `functools.partial` and treats them as regular functions when wrapping.
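For context, a minimal sketch of the scenario (assuming a Ray 1.x-era Tune API; `train_fn` and `extra_arg` are hypothetical names):

```python
import functools
from ray import tune

# Hypothetical trainable; `extra_arg` is bound ahead of time via partial.
def train_fn(config, extra_arg=None):
    tune.report(score=config["x"] * 2)

trainable = functools.partial(train_fn, extra_arg="bound-value")

# Previously this logged the "unknown callable" / "No name detected" warnings;
# with this change the wrapped function's name is propagated instead.
tune.run(trainable, config={"x": tune.grid_search([1, 2, 3])})
```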
We recently added tests to this file, and it now occasionally exceeds the 300-second timeout (before the new tests it took about 260-270 seconds, so this is expected).
This PR promotes the test to large so that we can avoid this issue. (Let me know if you think it would be better to shard the test even further.)
As discussed, we need to separate the cluster resource management logic from the scheduling logic. In this PR, we create the cluster_resource_manager to handle resource management, so the cluster resource scheduler is only responsible for scheduling (see the conceptual sketch after the bullet list below).
* more clean up
* refactor
* address comments
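As a conceptual Python sketch of this separation of concerns (the real `cluster_resource_manager` and cluster resource scheduler live in Ray's C++ core and differ in detail):

```python
class ClusterResourceManager:
    """Tracks available resources per node; makes no scheduling decisions."""

    def __init__(self):
        self._available = {}  # node_id -> {"CPU": float, "GPU": float, ...}

    def update_node(self, node_id, resources):
        self._available[node_id] = dict(resources)

    def available(self, node_id):
        return self._available.get(node_id, {})

    def nodes(self):
        return list(self._available)


class ClusterResourceScheduler:
    """Only decides where a request fits, querying the manager for state."""

    def __init__(self, manager):
        self._manager = manager

    def pick_node(self, request):
        for node_id in self._manager.nodes():
            avail = self._manager.available(node_id)
            if all(avail.get(k, 0) >= v for k, v in request.items()):
                return node_id
        return None
```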
Instead of using a detached lifetime, tie the lifetime of `_DesignatedBlockOwner` to the lifetime of the context creator. Also, only create a `_DesignatedBlockOwner` if dynamic block splitting is enabled.
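For background on the lifetime change, here is a minimal sketch contrasting a detached actor with a regular, owner-fate-shared actor in Ray (the `Owner` class is hypothetical, not the actual `_DesignatedBlockOwner`):

```python
import ray

ray.init()

@ray.remote
class Owner:
    def ping(self):
        return "ok"

# A detached actor outlives its creator and must be cleaned up explicitly.
detached = Owner.options(name="designated_owner", lifetime="detached").remote()

# Without lifetime="detached", the actor is fate-shared with its creator,
# which is the behavior this change adopts.
owned = Owner.remote()
```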
This PR adds pandas block format support by implementing `PandasRow`, `PandasBlockBuilder`, `PandasBlockAccessor`.
Note that `sort_and_partition`, `combine`, `merge_sorted_blocks`, and `aggregate_combined_blocks` in `PandasBlockAccessor` redirect to the Arrow block format implementation for now. They'll be implemented in a later PR.
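Purely as an illustration of the idea (not the actual `PandasBlockBuilder` internals), a pandas-backed block builder could accumulate rows and materialize them into a DataFrame block along these lines:

```python
from typing import Any, Dict, List

import pandas as pd

class SimplePandasBlockBuilder:
    """Hypothetical sketch; Ray's PandasBlockBuilder may differ substantially."""

    def __init__(self) -> None:
        self._rows: List[Dict[str, Any]] = []

    def add(self, row: Dict[str, Any]) -> None:
        # Accumulate rows as plain dicts.
        self._rows.append(row)

    def build(self) -> pd.DataFrame:
        # Materialize the accumulated rows into a pandas DataFrame block.
        return pd.DataFrame(self._rows)

builder = SimplePandasBlockBuilder()
builder.add({"a": 1, "b": "x"})
builder.add({"a": 2, "b": "y"})
block = builder.build()
```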
* basic reuse functionality without valid node filtering
* Filtering, logging, and formatting for cache_stopped_nodes on Azure
* Updated formatter version
We regularly run tasks where we know our expected resource requirements at launch, so we call `request_resources` with the required number of CPUs. The number of machines doesn't scale back down as our tasks finish, and they just sit idle, which costs more in AWS hosting than necessary. The suggested fix is to not call `request_resources` and instead use a high `upscaling_speed` to instantly scale up to the required resources.
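For reference, a minimal sketch of the pattern described in this issue, using the public `ray.autoscaler.sdk.request_resources` API (the CPU counts are arbitrary):

```python
import ray
from ray.autoscaler.sdk import request_resources

ray.init(address="auto")

# Ask the autoscaler to provision capacity for the known workload up front.
request_resources(num_cpus=64)

# ... submit tasks ...

# Clear the resource request once the tasks finish so idle nodes can scale down.
request_resources(num_cpus=0)
```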
This PR is a minor adjustment to the K8s release tests.
Replace tasks with actors in scale test for reduced flakiness
Use an up-to-date Ray client API.
In some cases, we need to add custom fields in different code paths. `SetCustomFields` overwrites all existing items, which causes custom fields to be lost. This PR renames `SetCustomFields` to `UpdateCustomFields`, which keeps existing items and merges in new ones; if a key already exists, its value is replaced.
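An illustrative-only Python sketch of the intended semantics (the real `SetCustomFields`/`UpdateCustomFields` live in Ray core and are not these functions):

```python
def set_custom_fields(existing: dict, new: dict) -> dict:
    # Old behavior: replace everything, dropping fields set elsewhere.
    return dict(new)

def update_custom_fields(existing: dict, new: dict) -> dict:
    # New behavior: keep existing items and merge in new ones;
    # on key conflicts, the new value wins.
    merged = dict(existing)
    merged.update(new)
    return merged

existing = {"team": "serve", "priority": "high"}
new = {"priority": "low", "owner": "alice"}
assert update_custom_fields(existing, new) == {
    "team": "serve", "priority": "low", "owner": "alice"
}
```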
Passing tests: https://buildkite.com/ray-project/periodic-ci/builds/2560#_
Add an echo timestamp to the post-build commands of the ray lightning release tests to trigger a cluster env rebuild and get the latest version of ray lightning. Without this, the cluster env gets cached, so an outdated version is installed on the cluster that differs from the one on the driver, resulting in the failures below.
Closes #21871, closes #21863
Also reinstalls the dependencies in the post build commands so old versions are not cached in the Docker images
When the script terminates, it also terminates its cluster, including the dashboard, which prevents subsequent job submissions. Other long-running e2e tests do not terminate in smoke test mode, so make `serve_failure` behave the same.
Support hosting a serve instance under a path prefix.
Some clean-up should still be done for the different overlapping HttpOptions that now exist (host, port, root_path, root_url).
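A minimal sketch of what hosting Serve under a path prefix could look like, assuming the prefix is configured through the `root_path` HTTP option mentioned above (the values are illustrative):

```python
from ray import serve

# Assumes root_path is exposed via http_options; illustrative values only.
serve.start(http_options={"host": "0.0.0.0", "port": 8000, "root_path": "/api"})

@serve.deployment(route_prefix="/hello")
def hello(request):
    return "hello"

hello.deploy()
# With the prefix in place, the endpoint would be reachable under /api/hello.
```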
Preview: [docs](https://ray--21931.org.readthedocs.build/en/21931/data/dataset.html)
The Ray Data project's docs now have a clearer structure and have partly been rewritten/modified. In particular we have
- [x] A Getting Started Guide
- [x] An explicit User / How-To Guide
- [x] A dedicated Key Concepts page
- [x] A consistent naming convention of `Ray Data` whenever the project is referred to.
This surfaces quite clearly that, apart from the "Getting Started" sections, we really only have one real example. Once we have more, we can create an "Examples" section like many other sub-projects have. This will be addressed in https://github.com/ray-project/ray/issues/21838.