Treats failures of `provider.internal_ip` during node drain as non-fatal.
For example, if a node is deleted by a third party between the time it's scheduled for termination and drained, there will now be no error on GCP.
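A minimal sketch of the intended behavior, assuming a hypothetical `drain_nodes` helper (the real autoscaler code differs):
```python
import logging

logger = logging.getLogger(__name__)

def drain_nodes(provider, node_ids):
    """Resolve internal IPs for draining, tolerating already-deleted nodes."""
    ips = []
    for node_id in node_ids:
        try:
            ips.append(provider.internal_ip(node_id))
        except Exception:
            # The node may have been deleted out of band between being
            # scheduled for termination and being drained; log and skip
            # instead of failing the whole drain.
            logger.warning("internal_ip failed for node %s; skipping.", node_id)
    return ips
```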
Closes #21151
With some compilers, the static Ray runtime held by the runtime holder may be a separate, uninitialized instance inside a dynamic library,
so we need to initialize the Ray runtime holder in the dynamic library to make sure the new instance is valid.
This fixes the issue that the concurrency group name of an actor task could not be specified at runtime, with the following usage:
```python
a.f2.options(concurrency_group="compute").remote()
```
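For context, a fuller sketch of the pattern; the actor definition below is illustrative, not from this PR:
```python
import ray

ray.init()

@ray.remote(concurrency_groups={"io": 2, "compute": 4})
class AsyncActor:
    @ray.method(concurrency_group="io")
    async def f1(self):
        return "f1"

    async def f2(self):
        return "f2"

a = AsyncActor.remote()
a.f1.remote()  # statically assigned to the "io" group at definition time
# With this fix, the group can also be chosen at call time:
a.f2.options(concurrency_group="compute").remote()
```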
This PR adds documentation for Workflow Metadata, support for which was recently added in https://github.com/ray-project/ray/pull/19372.
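A rough usage sketch, assuming the metadata API from that PR (`metadata=` on `run()` and `workflow.get_metadata`); consult the added docs for the authoritative version:
```python
from ray import workflow

workflow.init()

@workflow.step
def add(a: int, b: int) -> int:
    return a + b

# Attach user-provided metadata when launching the workflow.
add.step(1, 2).run(workflow_id="add_example", metadata={"owner": "alice"})

# Retrieve workflow-level metadata (status, user metadata, stats).
print(workflow.get_metadata("add_example"))
```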
Co-authored-by: Yi Cheng <74173148+iycheng@users.noreply.github.com>
I added a memory monitor to the scalability tests. This broke the tests because creating a memory monitor requires node resources (to be scheduled on the head node), which tripped the "resource leak" check. Ideally, that check should be more robust, but for now I fixed the issue the easier way. In the near future, the memory monitor will become a fixture, and at that point we should fix the resource leak check itself.
This reverts commit 968f08607b.
It is breaking e2e tests where worker nodes cannot start, e.g.:
```
Traceback (most recent call last):
  File "/home/ray/anaconda3/bin/ray", line 8, in <module>
    sys.exit(main())
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/scripts/scripts.py", line 1961, in main
    return cli()
  File "/home/ray/anaconda3/lib/python3.7/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/ray/anaconda3/lib/python3.7/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/autoscaler/_private/cli_logger.py", line 808, in wrapper
    return f(*args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/scripts/scripts.py", line 733, in start
    address_ip, password=redis_password)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/_private/services.py", line 593, in create_redis_client
    _, redis_ip_address, redis_port = validate_bootstrap_address(redis_address)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/_private/services.py", line 494, in validate_bootstrap_address
    raise ValueError("Malformed address. Expected '<host>:<port>'.")
ValueError: Malformed address. Expected '<host>:<port>'.
```
This PR unreverts #21115, fixing the handling of an `"auto"` address in the `RAY_ADDRESS` environment variable.
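An illustrative sketch (not the actual `ray._private.services` code) of the control flow being restored: `"auto"` must be resolved before the `<host>:<port>` validation runs.
```python
import os

def resolve_bootstrap_address(address=None):
    """Illustrative only: resolve "auto" before host:port validation."""
    address = address or os.environ.get("RAY_ADDRESS")
    if address == "auto":
        # Real Ray discovers a running local cluster here; a placeholder
        # stands in to show the control flow.
        return "127.0.0.1:6379"
    host, _, port = (address or "").rpartition(":")
    if not host or not port.isdigit():
        raise ValueError("Malformed address. Expected '<host>:<port>'.")
    return address
```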
Co-authored-by: Mingwei Tian <mwtian@anyscale.com>
Dask defaults to a disk-based shuffle even though we're using a distributed scheduler, which appears to result in dropped data since the filesystem isn't shared across nodes. Dask Distributed manually sets the shuffle algorithm in the global config to the task-based shuffle, which the Dask-on-Ray scheduler should probably do as well.
This PR adds a Dask config helper, `enable_dask_on_ray`, that sets Dask-on-Ray as the default scheduler and changes the default shuffle to a task-based shuffle. The shuffle method can still be overridden by the user by manually specifying e.g. `df.set_index(shuffle="disk")`.
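A usage sketch of the new helper, assuming `ray`, `dask`, and `pandas` are installed:
```python
import pandas as pd
import dask.dataframe as dd
import ray
from ray.util.dask import enable_dask_on_ray

ray.init()
# Make Dask-on-Ray the default scheduler and switch Dask's default
# shuffle from disk-based to task-based.
enable_dask_on_ray()

df = dd.from_pandas(pd.DataFrame({"a": range(100)}), npartitions=4)
df.set_index("a").compute()                  # task-based shuffle by default
df.set_index("a", shuffle="disk").compute()  # explicit disk-based override
```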
This change adds support for parsing `--address` as the bootstrap address, and treating `--port` as the GCS port, when using GCS for bootstrapping (see the example below).
Not launching Redis in GCS bootstrapping mode, and using GCS to fetch initial cluster information, will be implemented in a subsequent change.
Also made some cleanups.
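For example, the intended invocation looks roughly like this (port value illustrative):
```
ray start --head --port=6379             # --port is treated as the GCS port
ray start --address=<head-node-ip>:6379  # --address is the bootstrap (GCS) address
```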
* updating Azure autoscaler versions and backwards compatibility, and moving to azure-identity based authentication
* adding Azure SDK requirements for tests
* updating Azure test requirements and adding a wrapper function for Azure SDK function resolution (see the sketch after this list)
* adding a docstring to `get_azure_sdk_function`
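A sketch of what such a resolution wrapper can look like; this assumes the renaming pattern described above (newer Azure SDKs prefix long-running operations with `begin_`) and is illustrative rather than the exact code:
```python
def get_azure_sdk_function(client, function_name: str):
    """Retrieve a callable from an Azure SDK client object.

    Newer versions of the Azure SDKs renamed long-running operations
    to have a ``begin_`` prefix; try the old name first, then fall
    back to the prefixed one.
    """
    func = getattr(
        client, function_name, getattr(client, f"begin_{function_name}", None)
    )
    if func is None:
        raise AttributeError(
            f"{type(client).__name__} has neither '{function_name}' "
            f"nor 'begin_{function_name}'"
        )
    return func
```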
Co-authored-by: Scott Graham <scgraham@microsoft.com>
Currently, the URI reference logic in raylet is:
- For the job level, add a URI reference when the job starts and remove it when the job finishes.
- For the actor level, add and remove URI references for detached actors only.
In this PR, the logic is optimized to:
- For the job level, first check whether the runtime env should be installed eagerly. If true, add or remove the URI reference.
- For the actor level:
* First, add a URI reference when starting the worker process, to avoid the runtime env being GC'd before the worker registers.
* Second, add a URI reference for each worker thread of the worker process. We remove the reference when the worker disconnects.
- Besides, we move the instance of `RuntimeEnvManager` from `node_manager` to `worker_pool`.
- Enable the test `test_actor_level_gc` and add some tests in Python and the worker pool test.
Previously, GcsClient accepted only a Redis address. To make it work without Redis, we need to be able to pass the GCS address to the GCS client as well.
In this PR, we add GCS-related info to `GcsClientOptions` so that we can connect to GCS directly with the GCS address.
This PR is part of GCS bootstrap. In a follow-up PR, we'll add functionality to set the correct `GcsClientOptions` based on flags.