hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-06 02:21:39 -05:00

Author	SHA1	Message	Date
Alan Guo	be92dd72d5	[Dashboard] Fix edge cases for log file names in the dashboard log viewer (#27772)	2022-08-12 09:39:54 -07:00
SangBin Cho	def02bd4c9	Revert Revert "[Observability] Fix --follow lost connection when it is used for > 30 seconds" #26162 (#26163) * Revert "Revert "[Observability] Fix --follow lost connection when it is used for > 30 seconds (#26080)" (#26162)" This reverts commit `3017128d5e`.	2022-06-28 16:07:32 -07:00
Stephanie Wang	3017128d5e	Revert "[Observability] Fix --follow lost connection when it is used for > 30 seconds (#26080)" (#26162) This reverts commit `2d58bd5a50`.	2022-06-28 10:04:58 -07:00
SangBin Cho	2d58bd5a50	[Observability] Fix --follow lost connection when it is used for > 30 seconds (#26080) Why are these changes needed? This PR fixes an issue where --follow loses its connection after more than 30 seconds: the gRPC timeout is configured to 30 seconds and is not reset when --follow is set. The fix sets timeout=None when keepalive==True. Related issue number: closes https://github.com/ray-project/ray/issues/25721	2022-06-28 05:48:25 -07:00
Eric Liang	43aa2299e6	[api] Annotate as public / move ray-core APIs to _private and add enforcement rule (#25695) Enable checking of the ray core module, excluding serve, workflows, and tune, in ./ci/lint/check_api_annotations.py. This required moving many files to ray._private and associated fixes.	2022-06-21 15:13:29 -07:00
SangBin Cho	856bea31fb	[State Observability] Ray log CLI / API (#25481) This PR implements the basic log APIs. Higher-level APIs (such as `ray logs actors`) will be implemented after the internal API review is done. `ray logs [glob_filter] --node-id=[head node by default]`: if there is only one match, print the file content; otherwise, print all files that match the glob. Args: --tail: tail the last X lines; --follow: follow new logs; --actor-id: the actor ID; --pid; --node-ip: for worker logs; --node-id: the node ID of the log; --interval: when --follow is specified, logs are printed at this interval (should we remove it?)	2022-06-13 05:52:57 -07:00
SangBin Cho	00e3fd75f3	[State Observability] Ray log alpha API (#24964) This PR implements `ray log` on the server side, continuing from #24068. It supports two endpoints: `/api/v0/logs` lists the logs of the node ID, filtered by the given glob; `/api/v0/logs/{[file | stream]}?filename&pid&actor_id&task_id&interval&lines` streams the requested log file (the filename can be inferred from pid/actor_id/task_id). Some tests need to be rewritten; I will do this soon. Two follow-up PRs are planned: one to add the actual CLI, and one to remove in-memory cached logs and query actor/worker logs on demand.	2022-06-04 05:10:23 -07:00
Kai Yang	4a999777fa	[Core] Allow accepting gRPC HTTP proxy via env variable (#23526)	2022-05-10 11:30:46 +08:00
jon-chuang	ddcc252b51	[Core] Ray logs API (1/n) (#23435) Expose an HTTP endpoint to retrieve logs from a Ray cluster.	2022-04-20 23:11:02 -07:00
SangBin Cho	082baa2342	[Test] Fix test_log (#24004) The test verifies that bytes 43-51 of the first line are "dashboard". However, a recent code addition to head.py pushed the line number where the log is written from 2 digits to 3 digits. Previously: `2022-04-18 23:23:56,946 INFO head.py:[less than 100] -- Dashboard head grpc address: 127.0.0.1:57208` Now: `2022-04-18 23:23:56,946 INFO head.py:101 -- Dashboard head grpc address: 127.0.0.1:57208` The checked byte range therefore has to be increased.	2022-04-19 04:59:30 -07:00
mwtian	d5d2ef4249	[Core] Add a utility to check GCS / Ray cluster health (#23382) * Provide a utility to ping a Ray cluster and verify that it runs the same Ray version. This is useful for checking whether a Ray cluster is available at a given address without connecting to it through the more heavyweight ray.init(). The utility is integrated with ray memory to provide a better error message when the Ray cluster is unavailable. There seems to be user demand for exposing this as an API as well. * Improve the error message when the address provided to Ray does not contain a port.	2022-04-18 09:58:45 -07:00
Balaji Veeramani	7f1bacc7dc	[CI] Format Python code with Black (#21975) See #21316 and #21311 for the motivation behind these changes.	2022-01-29 18:41:57 -08:00
SangBin Cho	e62c0052a0	[Dashboard] Agent in minimal ray installation (#21817) This is the second part of https://docs.google.com/document/d/12qP3x5uaqZSKS-A_kK0ylPOp0E02_l-deAbmm8YtdFw/edit#. After this PR, dashboard agents will work fully with a minimal Ray installation. Note that this PR requires adding "aioredis", "frozenlist", and "aiosignal" to the minimal installation. These dependencies are very small (or will be removed soon), and including them in the minimal installation makes things much easier. Please see below for the reasoning.	2022-01-26 04:03:54 -08:00
SangBin Cho	1ae14ec513	[Dashboard] Make dashboard / agent work in minimal ray installation 1/3. (#21774) This is the doc that explains how to achieve this: https://docs.google.com/document/d/12qP3x5uaqZSKS-A_kK0ylPOp0E02_l-deAbmm8YtdFw/edit?usp=sharing The fully working e2e prototype is here (it passes all tests): `cdad913883` This PR is pure refactoring. It moves the util functions that require optional deps to `optional_utils` so that those functions are not used in the minimal installation. The steps are shown in the attached screenshot (Screen Shot 2022-01-21 at 4 38 44 AM).	2022-01-23 21:11:32 -08:00
Edward Oakes	7736cdd91d	[dashboard] Rename "new_dashboard" -> "dashboard" (#18214)	2021-09-15 11:17:15 -05:00
Simon Mo	e61160d514	[Dashboard] Move gcs health check to a separate thread to avoid crashing due to excessive CPU usage. (#18236)	2021-09-03 14:23:56 -07:00
Clark Zinzow	d958457d07	[Core] Second pass at privatizing APIs. (#17885) * gcs_utils * resource_spec * profiling * ray_perf and ray_cluster_perf * test_utils	2021-08-18 20:56:33 -07:00
fyrestone	a6d135a072	[Dashboard] Add GET /log_proxy API (#13165)	2021-01-08 11:45:07 +08:00
SangBin Cho	753cda2f28	[Dashboard] Delete old dashboard (#12144) * Delete old dashboard from repo. * Delete old dashboard from repo. 2	2020-11-25 11:31:02 -08:00
fyrestone	05ad4c7499	[Dashboard] Optimize dashboard datacenter (#11391) * Optimize dashboard datacenter * Fix tests * Fix tests * Fix * Fix CI * python/build-wheel-macos.sh Co-authored-by: 刘宝 <po.lb@antfin.com> Co-authored-by: Max Fitton <maxfitton@anyscale.com>	2020-10-27 23:49:31 -07:00
Max Fitton	caf3b04b27	[Dashboard] Turn on new dashboard by default pt 2 (#11510)	2020-10-23 15:52:14 -05:00
fyrestone	defd41aad7	[Dashboard] http route handler cache (#10921) * Add aiohttp_cache to dashboard * Add comments; Refine code * Keep NODE_STATS_UPDATE_INTERVAL_SECONDS 1 second; Change AIOHTTP_CACHE_TTL_SECONDS to 2 seconds * Update merge Co-authored-by: 刘宝 <po.lb@antfin.com>	2020-10-09 22:27:05 -07:00
fyrestone	50784e2496	[Dashboard] Dashboard node grouping (#10528) * Add RAY_NODE_ID environment var to agent * Node-related data uses node id as key * ray.init() returns node id; Pass test_reporter.py * Fix lint & CI * Fix comments * Minor fixes * Fix CI * Add const to ClientID in AgentManager::Options * Use fstring * Add comments * Fix lint * Add test_multi_nodes_info Co-authored-by: 刘宝 <po.lb@antfin.com>	2020-09-16 10:17:29 -07:00
fyrestone	e9b046306a	[Dashboard] Dashboard basic modules (#10303) * Improve reporter module * Add test_node_physical_stats to test_reporter.py * Add test_class_method_route_table to test_dashboard.py * Add stats_collector module for dashboard * Subscribe actor table data * Add log module for dashboard * Only enable test module in some test cases * CI run all dashboard tests * Reduce test timeout to 10s * Use fstring * Remove unused code * Remove blank line * Fix dashboard tests * Fix asyncio.create_task not available in py36; Fix lint * Add format_web_url to ray.test_utils * Update dashboard/modules/reporter/reporter_head.py Co-authored-by: Max Fitton <mfitton@berkeley.edu> * Add DictChangeItem type for Dict change * Refine logger.exception * Refine GET /api/launch_profiling * Remove disable_test_module fixture * Fix test_basic may fail Co-authored-by: 刘宝 <po.lb@antfin.com> Co-authored-by: Max Fitton <mfitton@berkeley.edu>	2020-08-29 23:09:34 -07:00

24 commits