hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-06 10:31:39 -05:00

Author	SHA1	Message	Date
Clark Zinzow	b3fd3c6828	[Datasets] Fix spread resource prefix tasks with no CPU requested. (#22017) When applying the `_spread_resource_prefix` hack, don't make the CPU resource a required resource when `num_cpus=0` is requested.	2022-01-31 18:30:47 -08:00
Clark Zinzow	00e1ac3a3c	[Datasets] Tie `_DesignatedBlockOwner` lifetime to context creator (#22007 ) Instead of using a detached lifetime, tie the lifetime of `_DesignatedBlockOwner` to the lifetime of the context creator. Also, only create a `_DesignatedBlockOwner` if dynamic block splitting is enabled.	2022-01-31 17:06:01 -08:00
SangBin Cho	2db71f72cc	[Doc] Remove the legacy doc (#21996 )	2022-01-31 15:26:19 -08:00
Clark Zinzow	03024b8951	[Datasets] Add `.iter_batches()` test for batch size larger than dataset. (#22000 )	2022-01-31 14:09:48 -08:00
Yi Cheng	0659d4a472	[nightly] Limit many drivers iteration to 4000 iterations (#21958 ) Due to faster running of many drivers, we limit the iteration to 4k for the test.	2022-01-31 13:26:02 -08:00
Kai Yang	2038cc96c6	Revert "Revert "[Dataset] [DataFrame 2/n] Add pandas block format implementation (partial) (#20988) (#21661)" (#21894) This PR adds pandas block format support by implementing `PandasRow`, `PandasBlockBuilder`, and `PandasBlockAccessor`. Note that `sort_and_partition`, `combine`, `merge_sorted_blocks`, and `aggregate_combined_blocks` in `PandasBlockAccessor` redirect to the arrow block format implementation for now. They'll be implemented in a later PR.	2022-01-31 12:09:51 -08:00
Eric Liang	45e03bd497	[data] Optimize dataset metadata read/write in Ray client (#21939 )	2022-01-31 01:41:45 -08:00
Eric Liang	b73a007ccd	Flag off RAY_legacy_scheduler_warnings (#21965 )	2022-01-30 17:12:45 -08:00
Eric Liang	fe167c94b1	Deflake occasional deadlock in test_dataset.py::test_basic_actors[True] (#21970 )	2022-01-30 17:11:54 -08:00
Balaji Veeramani	7f1bacc7dc	[CI] Format Python code with Black (#21975 ) See #21316 and #21311 for the motivation behind these changes.	2022-01-29 18:41:57 -08:00
Eric Liang	95877be8ee	[data] Serialize parquet piece metadata in batches to reduce overheads	2022-01-29 14:30:50 -08:00
DK.Pino	91171a194f	[Core] Extract a common method to get predefined resource index #21895	2022-01-29 14:18:09 -08:00
Jiajun Yao	a3ea4343b3	Remove work pipelining (#21964 )	2022-01-29 11:31:45 -08:00
Chen Shen	2939f153a1	address remaining comments (#21960 )	2022-01-28 18:09:45 -08:00
Junwen Yao	eb8adc6105	[train] add a utility function to turn off TF autosharding (#21887 ) This PR adds a utility function to turn off TF autosharding as a temporary solution. Closes #19324.	2022-01-28 16:09:06 -08:00
Mehul Raheja	fe1bf0261a	[autoscaler] Support cache_stopped_nodes on Azure (#21747 ) * basic reuse functionality without valid node filtering * Filtering, logging, and formatting for cache_stopped_nodes on Azure * Updated formatter version	2022-01-28 15:20:50 -08:00
Yi Cheng	570f67798a	[nightly] Move scheduling tests into one suite (#21959 ) For future convenience, we are moving scheduling-related tests into one suite for easier monitoring and benchmarking.	2022-01-28 13:32:34 -08:00
Chen Shen	bfe3e5f4a8	add check on shape (#21947 )	2022-01-28 12:27:43 -08:00
Archit Kulkarni	1f58ee3731	[1.10.0 Release] Add release logs for 1.10.0 (#21908 ) * Copy logs from 1.9.0 * Replace 1.9.0 data with 1.10.0 data * update with non-smoke-test results	2022-01-28 11:59:03 -08:00
Josh	4ab83345d0	[autoscaler] Ensure initial scaleup with high upscaling_speed isn't limited. (#21953) We regularly run tasks where we know our expected resource requirements at launch, so we call request_resources with the required number of CPUs. The machines don't scale back down as our tasks finish; they just sit idle. This costs more in AWS hosting than necessary. The suggested fix is to not call request_resources and instead set a high upscaling_speed to instantly scale up to the required resources.	2022-01-28 11:34:11 -08:00
Jialing He	6cb2dffcc0	[Bug][UT] fix python case `test_object_assign_owner` never run (#21945 )	2022-01-28 11:08:25 -08:00
Ian Rodney	75daf87aa0	[GCP] Add `roles/iam.roleViewer` (#21907 ) Allows bootstrap_gcp to be called from the Head Node. This is the case with Tune's DockerSyncClient.	2022-01-28 10:20:51 -08:00
chenk008	51393abc16	[Core]delete shim pid flag (#21853 ) Now we have `startup-token` to identify registering worker, so the shim pid flag is not needed any more.	2022-01-28 21:33:26 +08:00
Sven Mika	7fc1683bab	[RLlib] Some more `bandit` cleanup/tests. (#21932 )	2022-01-28 12:03:26 +01:00
Chen Shen	0ff8bfacec	[resource-reporting 3/n] further clean up LocalResourceManager (#21927 ) * clean up * address comments	2022-01-28 01:50:54 -08:00
Gagandeep Singh	069c499def	Unskipped tests for Windows (#21890) This is the third unskipping PR.	2022-01-27 23:06:44 -08:00
Dmitri Gekhtman	1fee0159b4	[test][k8s] Minor adjustment to manual K8s tests (#21924) This PR is a minor adjustment to the K8s release tests. Replace tasks with actors in the scale test for reduced flakiness. Use an up-to-date Ray client API.	2022-01-27 20:07:14 -08:00
Guyang Song	937bf6933c	[event] redefine "SetCustomFields" to "UpdateCustomFields" (#21930) In some cases, we need to add custom fields in different code paths. `SetCustomFields` overwrites all existing items, which leads to losing custom fields. This PR redefines `SetCustomFields` to `UpdateCustomFields`. `UpdateCustomFields` keeps existing items and merges in new items. If a key already exists, its value is replaced.	2022-01-28 11:54:44 +08:00
Amog Kamsetty	bd726aab02	[Release] Disable caching for `ray_lightning` (#21886) Passing tests: https://buildkite.com/ray-project/periodic-ci/builds/2560#_ Add an echo timestamp to the post build commands of the ray lightning release tests to trigger a cluster env rebuild and get the latest versions of ray lightning. Without this, the cluster env gets cached, so an outdated version is installed on the cluster that is different than the one on the driver, resulting in the below failures. Closes #21871. Closes #21863. Also reinstalls the dependencies in the post build commands so old versions are not cached in the Docker images.	2022-01-27 17:56:32 -08:00
mwtian	97f7e3d0e6	[e2e] do not terminate in `serve_failure` smoke test (#21925 ) When the script terminates, it will also terminate its cluster including dashboard, which will prevent subsequent job submissions. Other long running e2e tests do not terminate in smoke test mode, so make `serve_failure` behave the same.	2022-01-27 15:36:46 -08:00
Clark Zinzow	09fab70991	[Datasets] [Docs] Fix bug in Datasets locality-aware splitting example (#21937 ) Fixes bug in Datasets locality-aware splitting example.	2022-01-27 14:46:04 -08:00
iasoon	b0700e676b	[serve] add root_path setting (#21090 ) Support hosting a serve instance under a path prefix. Some clean-up should still be done for the different overlapping HttpOptions that now exist (host, port, root_path, root_url).	2022-01-27 16:36:22 -06:00
mwtian	559eefd06f	[Doc] update dask version for Ray 1.11.0 (#21933 ) This is needed for release 1.11.0.	2022-01-27 13:15:01 -08:00
Max Pumperla	4dd221f848	[Docs] Ray Data docs target state (#21931) Preview: [docs](https://ray--21931.org.readthedocs.build/en/21931/data/dataset.html) The Ray Data project's docs now have a clearer structure and have partly been rewritten/modified. In particular we have - [x] A Getting Started Guide - [x] An explicit User / How-To Guide - [x] A dedicated Key Concepts page - [x] A consistent naming convention, `Ray Data`, whenever the project is referred to. This surfaces quite clearly that, apart from the "Getting Started" sections, we really only have one real example. Once we have more, we can create an "Example" section like many other sub-projects have. This will be addressed in https://github.com/ray-project/ray/issues/21838.	2022-01-27 13:14:36 -08:00
Sven Mika	ee41800c16	[RLlib] Preparatory PR for multi-agent, multi-GPU learning agent (alpha-star style) #02 . (#21649 )	2022-01-27 22:07:05 +01:00
Jun Gong	8ebc50f844	[RLlib] Issue 21334: Fix APPO when kl_loss is enabled. (#21855 )	2022-01-27 20:08:58 +01:00
Sriram Sankar	b7391a1c39	[autoscaler] Optimize finding the node id (#21885 ) This is a simple refactoring change and my first PR in ray-project. This change moves an if statement outside of a loop. This way the check is not repeated for each iteration.	2022-01-27 10:51:59 -08:00
Victor Yap	8be5f016af	Add NVIDIA_TESLA_A100 to accelerator types (#21558 ) Adds Nvidia's A100 to the list of accelerator types. AWS offers this in the p4d.24xlarge instance type.	2022-01-27 10:47:09 -08:00
Jiajun Yao	cea80b1a5b	Don't advertise cpus on gpu nodes for pipelined ingestion tests (#21899 ) * Don't advertise cpus on gpu nodes for pipelined ingestion tests * Don't advertise cpus on gpu nodes for pipelined ingestion tests * Don't advertise cpus on gpu nodes for pipelined ingestion tests	2022-01-27 09:17:01 -08:00
Sven Mika	893536ebd9	[RLlib] Move bandits into main agents folder; Make RecSim adapter more accessible; (#21773 )	2022-01-27 13:58:12 +01:00
Sven Mika	371fbb17e4	[RLlib] Make `policies_to_train` more flexible via callable option. (#20735 )	2022-01-27 12:17:34 +01:00
Kai Fricke	8dcd4a99ef	[tune/wandb] Use `resume=False` per default (#21892) The WandbLoggingCallback is run on the driver side, with the experiment directory as the cwd. Using resume=True will pick up state from other trials (as the file name is global), and thus lead to warning messages. Thus, we should default to resume=False when using the callback. This PR also incorporates changes from #20966. Co-authored by: Queimo <queimo@gmx.net> Co-authored by: Karim <karim.ben.hicham@rwth-aachen.de>	2022-01-27 07:58:01 +00:00
mwtian	634f897cb6	[e2e] improve output dir handling (#21906 ) Try to clear the result dir before running the e2e.py script, to avoid failures where the directory already exists, or a file cannot be overwritten due to permission issue.	2022-01-26 23:56:08 -08:00
Chen Shen	bdf9fa337d	[resource-reporting 2/n]separate local resource manager from cluster_resource_scheduler (#21772 ) * add * fix test * fix more tests	2022-01-26 22:53:05 -08:00
Yi Cheng	7d2237bc9f	[dashboard] Remove unused fields in dashboard actor table for better memory footprint (#21919 )	2022-01-26 22:48:17 -08:00
Yi Cheng	e6bbafc17a	[function table] Make sure FunctionsToRun are executed properly on all workers (#21867) This PR fixes the issue that FunctionsToRun are sometimes not executed. We isolated the Functions/Actors in the function table, but not the FunctionsToRun, so some functions were sometimes missed during import. This PR fixes that.	2022-01-26 21:58:43 -08:00
Yi Cheng	3560211ab5	[nightly] Temporarily stop the two scheduling pipelines until we have a good setup. (#21922) Right now these two tests always run out of time. We disable them for now and, after solid testing, we'll re-enable them with good parameters.	2022-01-26 20:15:59 -08:00
SangBin Cho	d363c37078	[Core] Stop Ray stop from killing redis that's not started by Ray (#21805) Currently, the `ray stop` logic is fragile: it kills Redis servers that were not started by Ray. This PR fixes the issue by better checking the executable name of redis-server (if it is a redis-server created by Ray, it should contain the Ray-specific path copied while wheels are built). I originally tried to obtain the ppid and kill a redis-server only when it is created from the same parent, but it turns out all processes started by ray start have no ppid. While the best solution is to have some "process manager" with which we can detect a redis server started by us, I think there's no need to put a lot of effort in here right now since Redis will be removed soon. We will eventually move in a better direction (process manager) to handle this sort of issue.	2022-01-26 18:12:38 -08:00
Dmitri Gekhtman	757b5a88ea	[autoscaler] Cap min and max workers for manually managed on-prem clusters. (#21710 ) Closes https://github.com/ray-project/ray/issues/19636 by capping min and max workers for manually managed on-prem clusters to the number of user-specified worker ips. See https://github.com/ray-project/ray/issues/19636#issuecomment-1016664169 for additional context.	2022-01-26 18:03:55 -08:00
Max Pumperla	b34099e764	[docs] landing page (fixes #21750 ) (#21859 ) Co-authored-by: Richard Liaw <rliaw@berkeley.edu>	2022-01-26 17:14:25 -08:00

11120 commits