hiro/ray - Forgejo

mirror of https://github.com/vale981/ray synced 2025-03-06 18:41:40 -05:00

Author	SHA1	Message	Date
Jiajun Yao	a7b219fea1	[Core] Don't unpickle and run functions exported by other jobs (#19576)	2021-10-22 17:13:20 -07:00
Gagandeep Singh	358aa57474	Fixed usage of ``cv_.wait_for`` (#19582) * Fixed usage of cv.wait_for * Changed method to calculate remaining timeout * Modify timeout_ms -> remaining_timeout_ms	2021-10-22 16:23:13 -07:00
Edward Oakes	b4673daac6	[ray client] Add test that ray.init doesn't require resources to connect (#19635)	2021-10-22 18:21:53 -05:00
Alex Wu	31d89be926	[Workflow] Basic event support (#19239) * basics * . * . * a test * a test * tests * cleanup * concepts page * docs * polish * fix sleep * fix yi things * lint * fix * . * . * . * fix? * . Co-authored-by: Alex Wu <alex@anyscale.com>	2021-10-22 15:27:33 -07:00
Edward Oakes	c9258aff0f	Revert "[ActorGroup] Add `ActorGroup` (#18960)" (#19655) This reverts commit `4f05bac8fb`.	2021-10-22 14:55:17 -07:00
shrekris-anyscale	cfae64ebe8	[multiprocessing] Modify Ray's map_async() to match Multiprocessing's map_async() behavior (#19403)	2021-10-22 16:31:34 -05:00
Gagandeep Singh	2f8da8f8c8	Bumped timeout due to slow test times in Windows (#19595)	2021-10-22 13:48:15 -07:00
Jiao	f0be4cb390	[jobs] Add job manager class for simple jobs python APIs (#19567)	2021-10-22 14:18:11 -05:00
Jiajun Yao	43b8f8e522	Revert "Revert "[Test] Fix flaky test_gpu test (#19524)" (#19562)" (#19643) This reverts commit `7daf28f348`.	2021-10-22 11:48:57 -07:00
Yi Cheng	48fb86a978	[core] Fix the spilling back failure in case of node missing (#19564) ## Why are these changes needed? When Ray spills back, it checks through GCS whether the node exists, so there is a race condition and the raylet sometimes crashes because of it. This PR filters out nodes that are not available when selecting a node. ## Related issue number #19438	2021-10-22 11:22:07 -07:00
mwtian	530f2d7c5e	[Pubsub] Wrap Redis-based publisher in GCS to allow incrementally switching to the GCS-based publisher (#19600) ## Why are these changes needed? The most significant change of the PR is the `GcsPublisher` wrapper added to `src/ray/gcs/pubsub/gcs_pub_sub.h`. It forwards publishing to the underlying `GcsPubSub` (Redis-based) or `pubsub::Publisher` (GCS-based) depending on the migration status, so it allows incremental migration by channel. - Since it was decided that we want to use typed IDs and messages for GCS-based publishing, each member function of `GcsPublisher` accepts a typed message. Most of the modified files are from migrating publishing logic in GCS to use `GcsPublisher` instead of `GcsPubSub`. Later on, `GcsPublisher` member functions will be migrated to use GCS-based publishing. This change should make no functionality difference. If this looks ok, a similar change would be made for subscribers in the GCS client. ## Related issue number	2021-10-22 10:52:36 -07:00
Edward Oakes	0760fe869d	[runtime_env] Clean up working dir tests, add more test cases (#19597)	2021-10-22 12:35:27 -05:00
Amog Kamsetty	4f05bac8fb	[ActorGroup] Add `ActorGroup` (#18960) * move * fix * Revert "fix" This reverts commit 532660fc334ae96a0ff34c8ab1288488312300a3. * Revert "move" This reverts commit 54321f4a539c2ee873f17d988da5627588aeff97. * add * wip * wip * wip * wip * address comments * wip * add to build * fix * fix * fix	2021-10-22 10:22:31 -07:00
Simon Mo	1eb142b57c	[Serve] Fix shutdown protocol again (#19609)	2021-10-22 09:27:32 -07:00
Jiajun Yao	256bf0bf3a	[Release] Bump up dask to latest compatible version 2021.9.1 (#19592) * Bump up dask to latest compatible version 2021.9.1 * Bump up dask to latest compatible version 2021.9.1	2021-10-22 09:16:28 -07:00
architkulkarni	030acf3857	[Serve] [Serve Autoscaler] Add upscale and downscale delay (#19290)	2021-10-22 10:33:28 -05:00
xwjiang2010	a632cb439f	[Tune] Remove queue_trials. (#19472)	2021-10-22 09:24:54 +01:00
Qing Wang	580b58a68f	[Java] Update CodeOwners for Java worker. (#19594) Since some pom.xml files were removed before, let me update the CodeOwners about that.	2021-10-22 16:17:05 +08:00
Stephanie Wang	499d6e9fc1	Turn on reconstruction tests in CI (#19497)	2021-10-21 22:34:44 -07:00
Eric Liang	50e305e799	[data] Add take_all() and raise error if to_pandas() drops records (#19619)	2021-10-21 22:23:50 -07:00
Yi Cheng	59b2f1f3f2	[gcs] Update select nodes to save cpu utilization (#19608) ## Why are these changes needed? Recently we found that GCS uses a lot of CPU when scheduling actors because the code is not well organized. This PR improves the SelectNodes function. Profiling the many-nodes actor test shows that 50% of CPU is wasted here and could be saved. ## Related issue number	2021-10-21 22:15:17 -07:00
SangBin Cho	9a050c666d	[Test] Add a stronger resource leak check to pg unit tests. (#19586) * Add a stronger check to unit tests. * .	2021-10-21 21:40:00 -07:00
Edward Oakes	11b6019fb5	[ray client] Fix connecting to a cluster without available CPUs (#19604)	2021-10-21 21:21:50 -05:00
Jiajun Yao	920384f34e	[Doc] Fix Dataset __annotations__ (#19599)	2021-10-21 17:33:55 -07:00
SangBin Cho	cea7fda41a	Revert "Revert "[Dashboard] Disable unnecessary event messages. (#19490)" (#19574)" (#19577) This reverts commit `699c5aeac6`.	2021-10-21 15:36:22 -07:00
SangBin Cho	19e3280824	[Core] Fix shutdown Core worker crash when pg is removed. (#19549) * fix core worker crash * remove file * done	2021-10-21 14:30:54 -07:00
Simon Mo	30d9f8fbae	[Doc] [Serve] Fix code cutoff and broken links in deployment.rst (#19573)	2021-10-21 13:47:55 -07:00
Simon Mo	03805d4064	[Serve] Good error message when Serve not installed and ensure Serve installs ray[default] (#19570)	2021-10-21 13:47:29 -07:00
Simon Mo	32e648e5fa	[Serve][Doc] Add Failure Recovery Doc (#19166)	2021-10-21 13:32:42 -07:00
xwjiang2010	3e31526445	[tune] Print warning msg when TrialExecutor is directly inherited. (#17654)	2021-10-21 21:25:38 +01:00
Ameer Haj Ali	923adb6512	Update docs to make sure user does ssh port forwarding from another terminal (#19367)	2021-10-21 13:17:08 -07:00
Simon Mo	03406706b3	[Serve] [Doc] Add Autoscaling Documentation (#19559)	2021-10-21 13:11:29 -07:00
Ian Rodney	0cdf4ae8d0	[AWS] Stop Round Robining AZs (#19051) * round robin on failure to launch * still round-robin spot instances * prioritize first AZ * no more round-robining * doc updates * Order subnets by AZ * add spot instance advisor link * ensure we try all AZs * fix typos	2021-10-21 12:06:44 -07:00
Kai Fricke	7d8ea5e724	[tune] Remove magic results (e.g. config) before calculating trial result metrics (#19583)	2021-10-21 19:36:14 +01:00
Kai Fricke	15cdffe0ff	[tune] Only try to sync driver if sync_to_driver is actually enabled (#19589)	2021-10-21 19:35:35 +01:00
Eric Liang	eb24b08ced	Relax the check on object size changing	2021-10-21 11:05:54 -07:00
Oscar Knagg	15ca575078	Account for Windows return characters (#19590)	2021-10-21 10:05:20 -07:00
SangBin Cho	7cfd170d01	Temporarily disable event framework for 1.8 #19587 Although the event framework seems to work, it prints ERROR-severity events to stderr, which are eventually streamed to the driver. This should be fixed before the feature goes to production, so to allow enough time for the fix, the feature is turned off temporarily.	2021-10-21 09:51:02 -07:00
Travis Addair	c6e2161dbc	[Train] Fixed HorovodBackend to automatically detect network interfaces (#19533) * Moved Horovod into package * Move in Ludwig fix * Undo git mv * Cleanup * Cleanup * flake8 * Update python/ray/train/backends/horovod.py Co-authored-by: matthewdeng <matthew.j.deng@gmail.com> * Whitespace Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>	2021-10-21 09:13:11 -07:00
Amog Kamsetty	f1f334c348	[Train] Backwards Compatibility for `TrainingCallback` (#19566)	2021-10-21 09:11:34 -07:00
SangBin Cho	9000f41aa6	[Nightly Test] Support memory profiling on Ray + implement memory monitor for nightly tests (#19539) * random fixes * Done * done * update the doc * doc lint fix * . * .	2021-10-21 07:37:05 -07:00
matthewdeng	b3b739266e	[docs] add dask compatibility for 1.8.0 (#19578)	2021-10-21 07:26:07 -07:00
Tobias Kaymak	0e50701bbe	[Serve] Typo in kv_store.py (#19454) Fixing typo in init of RayS3KVStore class	2021-10-21 07:24:34 -07:00
Yi Cheng	7a7b356899	[Nightly test] add test for grpc broadcasting (#19579)	2021-10-21 07:01:41 -07:00
Qing Wang	048e7f7d5d	[Core] Port concurrency groups with asyncio (#18567) ## Why are these changes needed? This PR ports the concurrency groups functionality to asyncio actors in Python. ### API ```python @ray.remote(concurrency_groups={"io": 2, "compute": 4}) class AsyncActor: def __init__(self): pass @ray.method(concurrency_group="io") async def f1(self): pass @ray.method(concurrency_group="io") def f2(self): pass @ray.method(concurrency_group="compute") def f3(self): pass @ray.method(concurrency_group="compute") def f4(self): pass def f5(self): pass ``` The decorator on the actor class `AsyncActor` declares two concurrency groups with their maximum concurrencies; the actor also has a default concurrency group. Every concurrency group has its own asyncio event loop and Python thread to execute the methods assigned to it. Method `f1` is invoked in the `io` concurrency group, `f2` in `io`, and `f3` and `f4` in `compute`. Note that `f5` and `__init__` are invoked in the default concurrency group. The following call invokes `f2` in the `compute` concurrency group, because a group specified dynamically at call time takes priority: ```python a.f2.options(concurrency_group="compute").remote() ``` ### Implementation The implementation is straightforward: - Previously, an asyncio actor had 1 event loop bound to 1 Python thread. Now 1 event loop bound to 1 Python thread is created for every concurrency group of the asyncio actor. - Previously, there was 1 fiber state for every caller in the asyncio actor. Now a FiberStateManager is created for every caller in the asyncio actor, and it manages the fiber states for the concurrency groups. ## Related issue number #16047	2021-10-21 21:46:56 +08:00
Antoni Baum	a04b02e2e8	[tune] Better bad Stopper type message (#19496)	2021-10-21 14:31:27 +01:00
Kai Fricke	44fb7d09df	[tune] sync_client: Fix delete template formatting (#19553)	2021-10-21 10:59:54 +01:00
Patrick Ames	20d47873c9	[data] Add pickle support for PyArrow CSV WriteOptions (#19378)	2021-10-21 00:46:52 -07:00
Matti Picus	bacd5f92e2	MAINT: cleanups for windows (#19430) * dead processes should increment total_stopped * use psutil in testing to check pid * remove unneeded repetitions	2021-10-20 23:32:35 -07:00
Yi Cheng	cba8480616	[dashboard] Fix the wrong metrics for grpc query execution time in server side (#19500) ## Why are these changes needed? It looks like the metrics set on the server side are wrong: the time the query is constructed is sometimes not the time the query is received. This PR fixes this. ## Related issue number	2021-10-20 23:06:35 -07:00
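
Commit 358aa57474 changes how the remaining timeout is computed around `cv_.wait_for` (renaming `timeout_ms` to `remaining_timeout_ms`). The C++ diff is not reproduced here. The Python sketch below shows the general pattern, assuming the point of the fix is this: a wait that can wake up early (a spurious wakeup, or a notification that does not yet satisfy the condition) should re-wait only for the time left until a fixed deadline, not for the full timeout again. `wait_until` is a hypothetical helper. Python's own `threading.Condition.wait_for` already applies this logic internally.

```python
import threading
import time
from typing import Callable


def wait_until(cv: threading.Condition, predicate: Callable[[], bool], timeout_s: float) -> bool:
    """Hypothetical helper: wait on `cv` until `predicate()` holds or `timeout_s` elapses.

    The deadline is fixed once. Each re-wait after a wakeup uses only the
    remaining time, so repeated wakeups cannot stretch the total wait.
    """
    deadline = time.monotonic() + timeout_s
    with cv:
        while not predicate():
            remaining_timeout_s = deadline - time.monotonic()
            if remaining_timeout_s <= 0:
                return False  # Deadline passed and the condition is still false.
            cv.wait(remaining_timeout_s)
        return True
```

Without the recomputation, a loop that passes the original `timeout_s` to every `cv.wait` could block for a multiple of the intended timeout when wakeups keep arriving before the predicate becomes true.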
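
The subject of cfae64ebe8 says Ray's `map_async()` was changed to match the standard library's `multiprocessing` behavior, but it does not list the differences. For reference, this short sketch shows the standard-library contract:

- `map_async()` returns an `AsyncResult` immediately.
- `get()` returns the results in input order.
- `callback` is called once with the whole result list.
- On failure, `error_callback` receives the exception and `get()` re-raises it.

It uses `multiprocessing.pool.ThreadPool`, which exposes the same `map_async()` signature as `multiprocessing.Pool`, only to avoid spawning processes. The commit does not say which of these points it actually changed.

```python
from multiprocessing.pool import ThreadPool
from typing import List, Optional, Tuple


def square(x: int) -> int:
    return x * x


def fail_on_two(x: int) -> int:
    if x == 2:
        raise ValueError("bad input: 2")
    return x


def run_map_async_demo() -> Tuple[List[int], List[List[int]], List[BaseException], Optional[BaseException]]:
    """Return (values, callback_calls, error_calls, raised) from two map_async calls."""
    callback_calls: List[List[int]] = []
    error_calls: List[BaseException] = []
    with ThreadPool(processes=2) as pool:
        # map_async returns an AsyncResult right away; get() blocks for the full,
        # input-ordered list, and callback receives that same list exactly once.
        ok = pool.map_async(square, [1, 2, 3, 4], callback=callback_calls.append)
        values = ok.get(timeout=10)

        # On failure, error_callback receives the exception and get() re-raises it.
        bad = pool.map_async(fail_on_two, [1, 2, 3], error_callback=error_calls.append)
        raised: Optional[BaseException] = None
        try:
            bad.get(timeout=10)
        except ValueError as exc:
            raised = exc
    return values, callback_calls, error_calls, raised
```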
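
The message for 530f2d7c5e describes `GcsPublisher` forwarding each publish to either the Redis-based `GcsPubSub` or the GCS-based `pubsub::Publisher`, depending on per-channel migration status, behind typed member functions. Below is a minimal Python sketch of that forwarding pattern. It assumes migration status is simply a set of channel names. `ForwardingPublisher`, `InMemoryBackend`, `ActorUpdate`, `JobUpdate` and the channel names are hypothetical stand-ins, not Ray's C++ API.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class ActorUpdate:
    """Hypothetical typed message for the actor channel."""
    actor_id: str
    state: str


@dataclass(frozen=True)
class JobUpdate:
    """Hypothetical typed message for the job channel."""
    job_id: str
    is_dead: bool


class InMemoryBackend:
    """Hypothetical stand-in for a publisher backend; it records what it receives."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.published: List[Tuple[str, str, object]] = []

    def publish(self, channel: str, key: str, message: object) -> None:
        self.published.append((channel, key, message))


class ForwardingPublisher:
    """Routes each typed publish call to the old or the new backend, per channel."""

    def __init__(self, redis_based, gcs_based, migrated_channels: Iterable[str]) -> None:
        self._redis_based = redis_based
        self._gcs_based = gcs_based
        self._migrated = frozenset(migrated_channels)

    def _backend_for(self, channel: str):
        # Migrated channels go to the GCS-based publisher; all others keep the
        # Redis-based one, so migration can proceed one channel at a time.
        return self._gcs_based if channel in self._migrated else self._redis_based

    def publish_actor(self, message: ActorUpdate) -> None:
        self._backend_for("ACTOR").publish("ACTOR", message.actor_id, message)

    def publish_job(self, message: JobUpdate) -> None:
        self._backend_for("JOB").publish("JOB", message.job_id, message)
```

Callers see only the typed `publish_*` methods. Moving a channel to the new backend therefore changes only the migrated set, not any call site, which matches the "no functionality difference" goal stated in the message.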
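
The Implementation notes of 048e7f7d5d describe four pieces:

- One event loop bound to one Python thread per concurrency group.
- A per-group maximum concurrency.
- A default group.
- Call-time group selection that takes priority over the declared group.

The stdlib-only Python sketch below mirrors that design. It assumes one semaphore per group is an acceptable way to enforce the maximum concurrency. `ConcurrencyGroupExecutor`, `in_group` and `DEFAULT_GROUP` are hypothetical names, not Ray's API. The sketch handles only `async` functions, whereas the commit also places sync methods such as `f2` in groups, and it does not model fiber states.

```python
import asyncio
import threading
from concurrent.futures import Future
from typing import Any, Callable, Coroutine, Dict, Optional

DEFAULT_GROUP = "_default"  # Hypothetical name for the default concurrency group.


def in_group(name: str):
    """Hypothetical decorator that tags a coroutine function with a group name."""
    def wrap(fn):
        fn.concurrency_group = name
        return fn
    return wrap


class _GroupLoop:
    """One asyncio event loop running on its own thread, with a concurrency cap."""

    def __init__(self, name: str, max_concurrency: int) -> None:
        self.loop = asyncio.new_event_loop()
        self._max_concurrency = max_concurrency
        self._sem: Optional[asyncio.Semaphore] = None
        ready = threading.Event()
        self._thread = threading.Thread(
            target=self._run, args=(ready,), name=f"group-{name}", daemon=True)
        self._thread.start()
        ready.wait()  # Do not accept work until the loop thread has set up.

    def _run(self, ready: threading.Event) -> None:
        asyncio.set_event_loop(self.loop)
        # Created on the loop's own thread so it belongs to this loop.
        self._sem = asyncio.Semaphore(self._max_concurrency)
        ready.set()
        self.loop.run_forever()

    async def _guarded(self, fn, args, kwargs):
        async with self._sem:  # At most max_concurrency calls run at once.
            return await fn(*args, **kwargs)

    def submit(self, fn, args, kwargs) -> Future:
        return asyncio.run_coroutine_threadsafe(self._guarded(fn, args, kwargs), self.loop)

    def stop(self) -> None:
        self.loop.call_soon_threadsafe(self.loop.stop)
        self._thread.join()
        self.loop.close()


class ConcurrencyGroupExecutor:
    """Hypothetical executor: one loop and thread per declared group, plus a default one."""

    def __init__(self, groups: Dict[str, int], default_max_concurrency: int = 1) -> None:
        self._groups = {name: _GroupLoop(name, n) for name, n in groups.items()}
        self._groups[DEFAULT_GROUP] = _GroupLoop(DEFAULT_GROUP, default_max_concurrency)

    def submit(self, fn: Callable[..., Coroutine[Any, Any, Any]], *args,
               group: Optional[str] = None, **kwargs) -> Future:
        # A group passed at call time wins over the group declared on the function;
        # untagged functions fall back to the default group.
        name = group or getattr(fn, "concurrency_group", None) or DEFAULT_GROUP
        return self._groups[name].submit(fn, args, kwargs)

    def shutdown(self) -> None:
        for group_loop in self._groups.values():
            group_loop.stop()
```

Each group has its own loop and thread, so a burst of calls in one group (for example `io`) cannot occupy the event loop that serves another group (`compute`).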

10138 commits