hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-06 10:31:39 -05:00

Author	SHA1	Message	Date
Archit Kulkarni	01ee9adbe8	[Serve] [Doc] Improve model composition snippet (#21961 )	2022-02-01 10:28:36 -08:00
Balaji Veeramani	7dcb0b6af6	[Train] Decorate `get_device` with `PublicAPI` (#22024 ) * Decorate `get_device` with `PublicAPI` * Add documentation * Update api.rst	2022-02-01 08:18:47 -08:00
Kai Fricke	b51b5afaea	[ci/gpu] Move ML dependency install to Dockerfile (#21711 ) Instead of installing dependencies in each Buildkite job, let's move this to the Dockerfile instead. This will update GPU tests to always use Python 3.7.	2022-02-01 12:04:55 +00:00
Kai Fricke	e508e9f75a	[tune] Support functools.partial names and treat as function in registry (#21518 ) Currently, tune trainables with functools.partial will raise the following warnings: INFO registry.py:66 -- Detected unknown callable for trainable. Converting to class. WARNING experiment.py:295 -- No name detected on trainable. Using DEFAULT. This PR propagates function names for function wrapped with partial and treat them as regular functions when wrapping.	2022-02-01 12:04:24 +00:00
SangBin Cho	19672688b0	[Test] Change `test_placement_group.py` to large test (#21997 ) We recently added tests to this file, and it seems to occasionally exceed 300 seconds timeout (before adding tests, it took about 260~270 seconds, so it is natural). This promotes this test to be large so that we can avoid this issue. (Lmk if you think it is better sharding test even more.)	2022-01-31 22:37:35 -08:00
SangBin Cho	3566cfd279	[Dashboard] Enable dashboard in the minimal ray installation (#21896 ) This is the last PR to enable dashboard in the minimal ray installation. See https://docs.google.com/document/d/12qP3x5uaqZSKS-A_kK0ylPOp0E02_l-deAbmm8YtdFw/edit# for more details.	2022-01-31 22:34:40 -08:00
SangBin Cho	fd20cf3239	[Nightly Test] Add more metadata to test result (#21990 ) Add columns: error code, commit url, stable, session url, and runtime.	2022-01-31 22:33:30 -08:00
Simon Mo	e3cf47d731	[Serve] Remove shard_key, http_method, and http_headers (#21590 )	2022-01-31 22:27:12 -08:00
Chen Shen	4b528a7255	[resource-reporting 4/n] Separate cluster resource manager from cluster resource scheduler (#21992 ) As discussed, we need to separate the cluster resource management logic from scheduling logic. In this PR, we create the cluster_resource_manager to handle the resource management; and the cluster resource scheduler is only responsible for scheduling. * more clean up * refactor * address comments	2022-01-31 21:16:58 -08:00
Clark Zinzow	b3fd3c6828	[Datasets] Fix spread resource prefix tasks with no CPU requested. (#22017 ) When applying the `_spread_resouce_prefix` hack, don't make the CPU resource a required resource when `num_cpus=0` is requested.	2022-01-31 18:30:47 -08:00
Clark Zinzow	00e1ac3a3c	[Datasets] Tie `_DesignatedBlockOwner` lifetime to context creator (#22007 ) Instead of using a detached lifetime, tie the lifetime of `_DesignatedBlockOwner` to the lifetime of the context creator. Also, only create a `_DesignatedBlockOwner` if dynamic block splitting is enabled.	2022-01-31 17:06:01 -08:00
SangBin Cho	2db71f72cc	[Doc] Remove the legacy doc (#21996 )	2022-01-31 15:26:19 -08:00
Clark Zinzow	03024b8951	[Datasets] Add `.iter_batches()` test for batch size larger than dataset. (#22000 )	2022-01-31 14:09:48 -08:00
Yi Cheng	0659d4a472	[nightly] Limit many drivers iteration to 4000 iterations (#21958 ) Due to faster running of many drivers, we limit the iteration to 4k for the test.	2022-01-31 13:26:02 -08:00
Kai Yang	2038cc96c6	Revert "Revert "[Dataset] [DataFrame 2/n] Add pandas block format implementation (partial) (#20988 ) (#21661 )" (#21894 ) This PR adds pandas block format support by implementing `PandasRow`, `PandasBlockBuilder`, `PandasBlockAccessor`. Note that `sort_and_partition`, `combine`, `merge_sorted_blocks`, `aggregate_combined_blocks` in `PandasBlockAccessor` redirects to arrow block format implementation for now. They'll be implemented in a later PR.	2022-01-31 12:09:51 -08:00
Eric Liang	45e03bd497	[data] Optimize dataset metadata read/write in Ray client (#21939 )	2022-01-31 01:41:45 -08:00
Eric Liang	b73a007ccd	Flag off RAY_legacy_scheduler_warnings (#21965 )	2022-01-30 17:12:45 -08:00
Eric Liang	fe167c94b1	Deflake occasional deadlock in test_dataset.py::test_basic_actors[True] (#21970 )	2022-01-30 17:11:54 -08:00
Balaji Veeramani	7f1bacc7dc	[CI] Format Python code with Black (#21975 ) See #21316 and #21311 for the motivation behind these changes.	2022-01-29 18:41:57 -08:00
Eric Liang	95877be8ee	[data] Serialize parquet piece metadata in batches to reduce overheads	2022-01-29 14:30:50 -08:00
DK.Pino	91171a194f	[Core] Extract a common method to get predefined resource index #21895	2022-01-29 14:18:09 -08:00
Jiajun Yao	a3ea4343b3	Remove work pipelining (#21964 )	2022-01-29 11:31:45 -08:00
Chen Shen	2939f153a1	address remaining comments (#21960 )	2022-01-28 18:09:45 -08:00
Junwen Yao	eb8adc6105	[train] add a utility function to turn off TF autosharding (#21887 ) This PR adds a utility function to turn off TF autosharding as a temporary solution. Closes #19324.	2022-01-28 16:09:06 -08:00
Mehul Raheja	fe1bf0261a	[autoscaler] Support cache_stopped_nodes on Azure (#21747 ) * basic reuse functionality without valid node filtering * Filtering, logging, and formatting for cache_stopped_nodes on Azure * Updated formatter version	2022-01-28 15:20:50 -08:00
Yi Cheng	570f67798a	[nightly] Move scheduling tests into one suite (#21959 ) For future convenience, we are moving scheduling-related tests into one suite for easier monitoring and benchmarking.	2022-01-28 13:32:34 -08:00
Chen Shen	bfe3e5f4a8	add check on shape (#21947 )	2022-01-28 12:27:43 -08:00
Archit Kulkarni	1f58ee3731	[1.10.0 Release] Add release logs for 1.10.0 (#21908 ) * Copy logs from 1.9.0 * Replace 1.9.0 data with 1.10.0 data * update with non-smoke-test results	2022-01-28 11:59:03 -08:00
Josh	4ab83345d0	[autoscaler] Ensure initial scaleup with high upscaling_speed isn't limited. (#21953 ) We regularly run tasks where we know our expected resource requirements at launch, so we call request_resources with the required number of cpus. The machines don't scale back down as our tasks finish, and just sit idle. This costs more in AWS hosting than necessary. The suggested fix is to not call request_resources and to use a high upscaling_speed to scale up instantly to the required resources.	2022-01-28 11:34:11 -08:00
Jialing He	6cb2dffcc0	[Bug][UT] fix python case `test_object_assign_owner` never run (#21945 )	2022-01-28 11:08:25 -08:00
Ian Rodney	75daf87aa0	[GCP] Add `roles/iam.roleViewer` (#21907 ) Allows bootstrap_gcp to be called from the Head Node. This is the case with Tune's DockerSyncClient.	2022-01-28 10:20:51 -08:00
chenk008	51393abc16	[Core]delete shim pid flag (#21853 ) Now we have `startup-token` to identify registering worker, so the shim pid flag is not needed any more.	2022-01-28 21:33:26 +08:00
Sven Mika	7fc1683bab	[RLlib] Some more `bandit` cleanup/tests. (#21932 )	2022-01-28 12:03:26 +01:00
Chen Shen	0ff8bfacec	[resource-reporting 3/n] further clean up LocalResourceManager (#21927 ) * clean up * address comments	2022-01-28 01:50:54 -08:00
Gagandeep Singh	069c499def	Unskipped tests for Windows (#21890 ) This is the third unskipping PR.	2022-01-27 23:06:44 -08:00
Dmitri Gekhtman	1fee0159b4	[test][k8s] Minor adjustment to manual K8s tests (#21924 ) This PR is a minor adjustment to the K8s release tests. * Replace tasks with actors in scale test for reduced flakiness * Use an up-to-date Ray client API	2022-01-27 20:07:14 -08:00
Guyang Song	937bf6933c	[event] redefine "SetCustomFields" to "UpdateCustomFields" (#21930 ) In some cases, we need to add custom fields in different code paths. `SetCustomFields` overwrites all existing items, which causes custom fields to be lost. This PR redefines `SetCustomFields` as `UpdateCustomFields`. `UpdateCustomFields` keeps existing items and merges in new items. If a key already exists, its value is replaced.	2022-01-28 11:54:44 +08:00
Amog Kamsetty	bd726aab02	[Release] Disable caching for `ray_lightning` (#21886 ) Passing tests: https://buildkite.com/ray-project/periodic-ci/builds/2560#_ Add an echo timestamp to the post build commands of the ray lightning release tests to trigger a cluster env rebuild and get the latest versions of ray lightning. Without this, the cluster env gets cached so an outdated version is installed on the cluster that is different than the one on the driver, resulting in the below failures. Closes #21871 Closes #21863 Also reinstalls the dependencies in the post build commands so old versions are not cached in the Docker images	2022-01-27 17:56:32 -08:00
mwtian	97f7e3d0e6	[e2e] do not terminate in `serve_failure` smoke test (#21925 ) When the script terminates, it will also terminate its cluster including dashboard, which will prevent subsequent job submissions. Other long running e2e tests do not terminate in smoke test mode, so make `serve_failure` behave the same.	2022-01-27 15:36:46 -08:00
Clark Zinzow	09fab70991	[Datasets] [Docs] Fix bug in Datasets locality-aware splitting example (#21937 ) Fixes bug in Datasets locality-aware splitting example.	2022-01-27 14:46:04 -08:00
iasoon	b0700e676b	[serve] add root_path setting (#21090 ) Support hosting a serve instance under a path prefix. Some clean-up should still be done for the different overlapping HttpOptions that now exist (host, port, root_path, root_url).	2022-01-27 16:36:22 -06:00
mwtian	559eefd06f	[Doc] update dask version for Ray 1.11.0 (#21933 ) This is needed for release 1.11.0.	2022-01-27 13:15:01 -08:00
Max Pumperla	4dd221f848	[Docs] Ray Data docs target state (#21931 ) Preview: [docs](https://ray--21931.org.readthedocs.build/en/21931/data/dataset.html) The Ray Data project's docs now have a clearer structure and have partly been rewritten/modified. In particular we have - [x] A Getting Started Guide - [x] An explicit User / How-To Guide - [x] A dedicated Key Concepts page - [x] A consistent naming convention, `Ray Data`, whenever the project is referred to. This surfaces quite clearly that, apart from the "Getting Started" sections, we really only have one real example. Once we have more, we can create an "Example" section like many other sub-projects have. This will be addressed in https://github.com/ray-project/ray/issues/21838.	2022-01-27 13:14:36 -08:00
Sven Mika	ee41800c16	[RLlib] Preparatory PR for multi-agent, multi-GPU learning agent (alpha-star style) #02 . (#21649 )	2022-01-27 22:07:05 +01:00
Jun Gong	8ebc50f844	[RLlib] Issue 21334: Fix APPO when kl_loss is enabled. (#21855 )	2022-01-27 20:08:58 +01:00
Sriram Sankar	b7391a1c39	[autoscaler] Optimize finding the node id (#21885 ) This is a simple refactoring change and my first PR in ray-project. This change moves an if statement outside of a loop. This way the check is not repeated for each iteration.	2022-01-27 10:51:59 -08:00
Victor Yap	8be5f016af	Add NVIDIA_TESLA_A100 to accelerator types (#21558 ) Adds Nvidia's A100 to the list of accelerator types. AWS offers this in the p4d.24xlarge instance type.	2022-01-27 10:47:09 -08:00
Jiajun Yao	cea80b1a5b	Don't advertise cpus on gpu nodes for pipelined ingestion tests (#21899 )	2022-01-27 09:17:01 -08:00
Sven Mika	893536ebd9	[RLlib] Move bandits into main agents folder; Make RecSim adapter more accessible; (#21773 )	2022-01-27 13:58:12 +01:00
Sven Mika	371fbb17e4	[RLlib] Make `policies_to_train` more flexible via callable option. (#20735 )	2022-01-27 12:17:34 +01:00

11229 commits