hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-06 10:31:39 -05:00

Author	SHA1	Message	Date
Eric Liang	456d73754a	[data] Initial pass at support multiple-block returns for read and transform tasks (#19660 )	2021-10-29 14:21:56 -07:00
Philipp Moritz	0a5942d8b0	[Documentation] Fix quotes for windows installations (#19859 ) * [Documentation] Fix quotes for windows installations * update * formatting	2021-10-29 10:54:38 -07:00
Lixin Wei	56301e34b2	[Refactor] Remove ServiceBased Abstraction (#19694 )
## Why are these changes needed?
Prior to this PR, we have:
```cpp
class XxxAccessor {}
class ServiceBasedXxxAccessor : public XxxAccessor{}
class GcsClient {}
class ServiceBasedGcsClient : public GcsClient{}
```
However, XxxAccessor has only one implementation: ServiceBasedXxxAccessor. And GcsClient has only one implementation: ServiceBasedGcsClient. I think this abstraction is not necessary and will make development hard(I have to modify two files every time). This PR removes all ServiceBasedXxx and moves its implementations to the base class. Now we only have:
```cpp
class XxxAccessor {}
class GcsClient {}
```
	2021-10-29 10:16:14 -07:00
Gagandeep Singh	9460a5375b	Added retry logic in test_basic::test_ray_options (#19832 ) * Added retry logic in test_ray_options * Applied linting format * Made test consistent	2021-10-29 10:15:12 -07:00
Edward Oakes	bf23a31017	[job submission] Always generate and return job_id (#19851 )	2021-10-29 09:09:54 -05:00
SangBin Cho	16dcff4091	[Core/RuntimeEnv] Fix runtime environment hanging issues. (#19823 ) * done * Add a right test * Fix unit tests * fix issues	2021-10-29 07:01:56 -07:00
Antoni Baum	f2773267c7	[docs] Tune doc fixes (#19791 )	2021-10-29 11:45:29 +02:00
Sven Mika	902e854af2	[RLlib; Docs overhaul] Docstring cleanup: Environments. (#19784 ) * wip. * Test: Make a change in tune to trigger tune tests, which are not run otherwise, but seem to fail nevertheless with this PR's changes. * remove bare_metal_policy_with_custom_view_reqs from tests	2021-10-29 10:46:52 +02:00
Stephanie Wang	e6d60d7376	[core] Fail objects when pull/reconstruction hangs (#19789 )	2021-10-28 23:34:51 -07:00
Chris K. W	bd4ad84ead	[Client] Add deprecation warnings for direct ray.client().connect() calls (#18783 ) * add deprecation warning * Update wording * add test * actually connect * add env var tests * fix message and test * skip on windows * add _LocalBuilder case, update test_namespace * better variable name	2021-10-28 22:06:11 -07:00
Jiajun Yao	760878f950	Handle empty dataset for sort and groupby (#19849 )	2021-10-28 18:49:33 -07:00
Simon Mo	0433281ec8	[CI] Bump Serve test_regression to medium for windows (#19844 )	2021-10-28 17:49:50 -07:00
Edward Oakes	42ac906313	[job submission] Support passing metadata to the JobConfig (#19845 )	2021-10-28 16:40:03 -05:00
SangBin Cho	9126810c41	[Usabiilty] Improve the serialization failure message (#19691 ) * Done * done * Done * fix test * Adressed code review. * done * done * fix mistake * Skip tests on windows	2021-10-28 14:25:51 -07:00
matthewdeng	bfb0ef1b08	move jsonschema to core dependencies and update default AutoscalerPrometheusMetrics (#19831 )	2021-10-28 13:04:22 -07:00
SangBin Cho	96fc875a89	[Core] Improve scheduling observability and fix wrong resource deadlock report message. (#19746 )	2021-10-28 11:42:21 -07:00
Amog Kamsetty	1803d88943	[Train] Simplify single worker training (#19814 ) * wip * update * fix * fix * fix * fix	2021-10-28 10:54:35 -07:00
shrekris-anyscale	6e6fff8857	[serve] Enable deployment of functions/classes that take no parameters (#19708 )	2021-10-28 12:53:44 -05:00
Jiao	ed0e2e4fd7	[job submission] Add job_config in subprocess driver script (#19765 )	2021-10-28 12:12:51 -05:00
Jiajun Yao	fe8138bfc2	Listen to 127.0.0.1 if node ip is 127.0.0.1 (#19810 )	2021-10-28 08:44:23 -07:00
Eric Liang	f60d312259	Try fixing reference counting issue with manual _owner assignment (#19734 )	2021-10-28 02:26:35 -07:00
Patrick Ames	8a9f664d75	[data] Add support for custom dataset block write path providers. (#19347 ) Co-authored-by: Eric Liang <ekhliang@gmail.com>	2021-10-28 00:12:02 -07:00
Jiajun Yao	7fb65abae1	[data] Fix dataset doc (#19821 )	2021-10-27 22:41:09 -07:00
Jiajun Yao	11751a1d87	Arrow block dataset groupBy (#19673 )	2021-10-27 16:27:11 -07:00
Edward Oakes	b2e12dc43b	[runtime_env] Add basic support for python modules (#19651 )	2021-10-27 17:56:46 -05:00
matthewdeng	aa5499ef0f	[Train] implement CheckpointStrategy (#19111 ) * [SGD] implement CheckpointStrategy * address comments * update docs * Update doc/source/train/user_guide.rst Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com> * best checkpoint Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>	2021-10-27 11:31:04 -07:00
Edward Oakes	1f681981af	[serve] Bump controller max concurrency to 15k, make long poll timeout random (#19790 )	2021-10-27 13:28:16 -05:00
Edward Oakes	acc5702535	[runtime_env] Fix hash length in URI (#19777 )	2021-10-27 12:22:20 -05:00
Simon Mo	6afbd1f558	[Serve] /api/snapshot works with all Serve KVStores (#19772 )	2021-10-26 23:27:38 -07:00
Jiao	3f628d4f6b	increase long poll timeout and wrk trial cpu resource (#19768 )	2021-10-26 21:31:39 -07:00
architkulkarni	6bd49a8cd5	[runtime env] Improve working dir messaging (#18893 )	2021-10-26 20:58:02 -05:00
Jiajun Yao	47744d282c	[data] Fix arrow dataset sort on empty blocks (#19707 )	2021-10-26 15:30:23 -07:00
Eric Liang	2652ae7905	[client] Put of a list should not return a list, this is a client bug (#19737 )	2021-10-26 13:51:37 -07:00
iasoon	b5158ca0ab	[serve] Correctly set num_replicas when deploying autoscaling deployment (#19520 )	2021-10-26 12:10:59 -05:00
SangBin Cho	00ea716ada	Revert "Revert "[Core] [Placement Group] Fix bundle reconstruction when raylet fo after gcs fo (#19452 )" (#19724 )" (#19736 ) This reverts commit `d453afbab8`.	2021-10-26 08:25:09 -07:00
Jiao	aaef82920d	[serve] Add periodic timeouts to long poll client to avoid accumulating concurrent tasks in the controller (#19728 )	2021-10-26 09:44:00 -05:00
Kai Fricke	3081488a99	[tune] Fix local checkpoint deletion for remote trials (#19632 )	2021-10-26 09:18:07 +01:00
Eric Liang	81b0eb297c	Un-revert size estimator and fix Train test (#19719 )	2021-10-25 22:09:24 -07:00
SangBin Cho	d453afbab8	Revert "[Core] [Placement Group] Fix bundle reconstruction when raylet fo after gcs fo (#19452 )" (#19724 ) This reverts commit `e3ced0e59e`.	2021-10-26 09:14:25 +09:00
Simon Mo	5330aab27a	[CI] Deflake test metrics (#19711 )	2021-10-25 16:34:20 -07:00
Eric Liang	66818d11b8	Revert "[data] Add serialized size estimator to block builder (#19681 )" (#19717 ) This reverts commit `8c37311c41`.	2021-10-25 15:06:58 -07:00
Eric Liang	8c37311c41	[data] Add serialized size estimator to block builder (#19681 )	2021-10-25 14:58:49 -07:00
SangBin Cho	ecd5a622ef	[Tests] Add a memory usage on dask on ray tests (#19674 )	2021-10-25 14:58:26 -07:00
SangBin Cho	544f774245	[Autoscaler/Core] Drain node API (#19350 ) * Initial version done. Graceful shutdown is possible with direct raylet RPCs * . * . * ip * Done. * done tests might fail * fix lint + cpp tests * fix 2 * Fix issues. * Addressed code review. * Fix another cpp test failure * completed * Skip windows tests * Update the comment * complete * addressed code review.	2021-10-25 14:57:50 -07:00
Linsong Chu	13d4894789	[workflow] Add get_metadata() for workflow (#19372 ) ## Why are these changes needed? Add the functionality to retrieve metadata for a workflow or workflow step. Design: - Similar to `get_output`, this will either return the metadata for workflow (`workflow.get_metadata(workflow_id)`) or the metadata for a specific step (`workflow.get_metadata(workflow_id, step_id)`) - Exceptions will only be raised if workflow id or step id not exist. Canceled job, running job, etc. will return proper metadata by retrieving information from checkpoint. See [here](`8c8ca609d7/python/ray/workflow/tests/test_metadata_get.py (L67)`) for more details. - Returned metadata is an aggregated result from multiple checkpoint files based on previous [discussion](https://github.com/ray-project/ray/issues/17090#issuecomment-920481789). The aggregation logic is [here for step metadata](`8c8ca609d7/python/ray/workflow/workflow_storage.py (L451)`) and [here for workflow metadata](`8c8ca609d7/python/ray/workflow/workflow_storage.py (L484)`) which can be tuned with further discussion. 
Example:
```python
>>> user_step_metadata = {"k1": "v1"}
>>> user_run_metadata = {"k2": "v2"}
>>> step_name = "simple_step"
>>> workflow_id = "simple"
>>> @workflow.step
>>> def simple():
>>>     return 0
>>> simple.options(name=step_name, metadata=user_step_metadata).step().run(workflow_id, metadata=user_run_metadata)

# get workflow-level metadata
>>> workflow.get_metadata("simple")
{'status': 'SUCCESSFUL', 'user_metadata': {'k2': 'v2'}, 'stats': {'start_time': 1634173413.116535, 'end_time': 1634173413.149051}}

# get step-level metadata
>>> workflow.get_metadata("simple", "simple_step")
{'name': '__main__.simple', 'step_type': 'FUNCTION', 'workflows': [], 'max_retries': 3, 'workflow_refs': [], 'catch_exceptions': False, 'ray_options': {}, 'user_metadata': {'k1': 'v1'}, 'stats': {'start_time': 1634173413.131262, 'end_time': 1634173413.1347651}}
```
## Related issue number
https://github.com/ray-project/ray/issues/17090	2021-10-25 14:52:51 -07:00
Alex Wu	58b28f04cd	[docs/usability] Apple Silicon support (#19705 ) This PR puts the final touches on apple silicon support. There are 3 main caveats to supporting M1 macs right now (described in the docs): Requires using forge. Requires special installation instructions to get grpc working (this is an underlying grpc issue, so ideally it will be fixed upstream). We're only publishing release wheels, not nightlies right now. This also includes a grpc import check to ensure that we provide an actionable error message if the user tries the regular pip install ray process to properly install grpcio.	2021-10-25 14:49:28 -07:00
DK.Pino	e3ced0e59e	[Core] [Placement Group] Fix bundle reconstruction when raylet fo after gcs fo (#19452 ) * fixed * lint * add cxx ut * fix comment * Revert "fix comment" This reverts commit 32ea2558166a7674d7efe2e0c0a66ea7409c7d99. * fix comment	2021-10-25 14:15:36 -07:00
Eric Liang	27a5b546ad	Make ArrowRow less scary (#19686 )	2021-10-25 12:18:42 -07:00
Tao Wang	ff7d35d246	[Core]Add test case for cached named actor (#19510 ) ## Why are these changes needed? Recently we found a bug about named actor cache, only in internal codebase but not community, and the case is not covered by test case so we didn't know before user telling us. This add an extra test to cover it. Bug Detail: we didn't publish actor's name when the actor is dead so the cache keep the name to the old actor handle. The owner of this actor cannot sense this bug because the cache didn't apply to the owner currently.	2021-10-25 11:37:41 -07:00
xwjiang2010	46266b15f0	[tune] Avoid looping through _live_trials twice in _get_next_trial. (#19596 )	2021-10-25 19:26:55 +01:00

5580 commits