hiro/ray - Forgejo: Beyond coding. We Forge.

hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-06 10:31:39 -05:00

Author	SHA1	Message	Date
Tobias Kaymak	0e50701bbe	[Serve] Typo in kv_store.py (#19454 ) Fixing typo in init of RayS3KVStore class	2021-10-21 07:24:34 -07:00
Qing Wang	048e7f7d5d	[Core] Port concurrency groups with asyncio (#18567 ) ## Why are these changes needed? This PR aims to port concurrency groups functionality with asyncio for Python. ### API ```python @ray.remote(concurrency_groups={"io": 2, "compute": 4}) class AsyncActor: def __init__(self): pass @ray.method(concurrency_group="io") async def f1(self): pass @ray.method(concurrency_group="io") def f2(self): pass @ray.method(concurrency_group="compute") def f3(self): pass @ray.method(concurrency_group="compute") def f4(self): pass def f5(self): pass ``` The annotation above the actor class `AsyncActor` defines this actor will have 2 concurrency groups and defines their max concurrencies, and it has a default concurrency group. Every concurrency group has an async eventloop and a pythread to execute the methods which is defined on them. Method `f1` will be invoked in the `io` concurrency group. `f2` in `io`, `f3` in `compute` and etc. TO BE NOTICED, `f5` and `__init__` will be invoked in the default concurrency. The following method `f2` will be invoked in the concurrency group `compute` since the dynamic specifying has a higher priority. ```python a.f2.options(concurrency_group="compute").remote() ``` ### Implementation The straightforward implementation details are: - Before we only have 1 eventloop binding 1 pythread for an asyncio actor. Now we create 1 eventloop binding 1 pythread for every concurrency group of the asyncio actor. - Before we have 1 fiber state for every caller in the asyncio actor. Now we create a FiberStateManager for every caller in the asyncio actor. And the FiberStateManager manages the fiber states for concurrency groups. ## Related issue number #16047	2021-10-21 21:46:56 +08:00
Antoni Baum	a04b02e2e8	[tune] Better bad Stopper type message (#19496 )	2021-10-21 14:31:27 +01:00
Kai Fricke	44fb7d09df	[tune] sync_client: Fix delete template formatting (#19553 )	2021-10-21 10:59:54 +01:00
Patrick Ames	20d47873c9	[data] Add pickle support for PyArrow CSV WriteOptions (#19378 )	2021-10-21 00:46:52 -07:00
Matti Picus	bacd5f92e2	MAINT: cleanups for windows (#19430 ) * dead processes should increment total_stopped * use psutil in testing to check pid * remove unneeded repititions	2021-10-20 23:32:35 -07:00
Oscar Knagg	5a05e89267	[Core] Add TLS/SSL support to gRPC channels (#18631 )	2021-10-20 22:39:11 -07:00
heng2j	6d23fb1ff1	[Tune] Support custom tags in MLflow logger callback (#19532 ) * Added Food Collector support to rllib/env/unity3d_env.py * feat(mlflow): added parameter tags to MLflowLoggerCallback * fix(unit_test): added tags tests in test_integration_mlflow.MLflowTest() * chore: lint the changes in this PR * update * Update python/ray/tune/integration/mlflow.py * fix * copy * fix Co-authored-by: zla0368 <zhongheng.li@stresearch.com> Co-authored-by: Li, Zhongheng <zhongheng.li@str.us> Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com> Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>	2021-10-20 22:31:33 -07:00
SangBin Cho	085162c68e	[Log] Print actor & task name upon stderr messages. (#19542 )	2021-10-20 22:16:32 -07:00
Eric Liang	699c5aeac6	Revert "[Dashboard] Disable unnecessary event messages. (#19490 )" (#19574 ) This reverts commit `7fb681a35d`.	2021-10-20 20:17:57 -07:00
Eric Liang	48ecb1f88a	[data] Fix O(n^2) issues in simple_block sort (#19543 )	2021-10-20 18:26:20 -07:00
SangBin Cho	7fb681a35d	[Dashboard] Disable unnecessary event messages. (#19490 ) * Disable unnecessary event messages. * use warning * Fix tests	2021-10-20 17:40:25 -07:00
Edward Oakes	bcf584294f	[runtime_env] Refactor working dir packaging code into runtime_env.packaging module (#19112 )	2021-10-20 18:38:50 -05:00
Eric Liang	7daf28f348	Revert "[Test] Fix flaky test_gpu test (#19524 )" (#19562 ) This reverts commit `39e54cd276`.	2021-10-20 12:21:19 -07:00
Clark Zinzow	88c5fcde8c	[Datasets] Unrevert Arrow table copy method change. (#19534 )	2021-10-20 11:57:36 -07:00
Jiao	c51f79bca6	[runtime_env] Support remote s3 package in runtime env (#19315 )	2021-10-20 10:41:54 -05:00
Jiajun Yao	39e54cd276	[Test] Fix flaky test_gpu test (#19524 )	2021-10-19 22:36:34 -07:00
Simon Mo	59eef6521b	[Serve] Use regular dict for handle caching (#19162 )	2021-10-19 21:27:01 -07:00
Jiajun Yao	4fc5b11c68	Simple block dataset groupBy (#19435 )	2021-10-19 19:53:13 -07:00
Eric Liang	eacfbf8be2	[data] Don't shuffle during repartition by default (#19379 )	2021-10-19 19:46:22 -07:00
SangBin Cho	3222d39fb8	[Dashboard] Dashboard memory improvement (#19385 ) * many ppo profiling * completed * improve memory usage lint * revert temporarily * Addressed code review * Fix a test	2021-10-19 19:34:42 -07:00
Simon Mo	48cf366dca	[Hotfix] Pin node version to 14 (#19522 )	2021-10-19 14:13:06 -07:00
matthewdeng	19eabd7a55	[train] remove default num_workers (#19518 ) * [train] remove default num_workers * fix tests	2021-10-19 13:53:23 -07:00
matthewdeng	56e46c3c23	[train] add callbacks package compatibility (#19519 )	2021-10-19 12:56:49 -07:00
Edward Oakes	4645893a5f	Add prototype of ray.serve.pipeline (#19278 )	2021-10-19 11:36:49 -07:00
xwjiang2010	a6f9c93db0	Revert "[Datasets] Add support for slicing Arrow blocks that contain tensor columns. (#19494 )" (#19517 ) This reverts commit `ad03917b8f`.	2021-10-19 11:35:04 -07:00
Tao He	1dde588702	[Dataset] Support dataset from a single dataframe/table. (#18205 )	2021-10-19 10:27:43 -07:00
Alex Wu	a819e417ac	Revert "[Hotfix] Revert "[Workflow] workflow.delete"" (#19248 ) * Revert "Revert "[Workflow] workflow.delete (#19178)" (#19247)" This reverts commit `b59317520d`. * fix * . * . * . * Revert "." This reverts commit 423b9b8e7e83f07cb0942b04e568e37ea0c62ba8. * . * . * done? * 4real Co-authored-by: Alex <alex@anyscale.com>	2021-10-19 09:47:56 -07:00
Gagandeep Singh	cc00ab74da	[Windows] Fix test_fair_queuing and test_wait_timing (#19456 ) * modified timeout in test_fair_qeueing * bump bounds to pass the tests	2021-10-19 09:27:04 -07:00
architkulkarni	b8941338d3	[runtime env] Raise error when creating runtime env when ray[default] is not installed (#19491 )	2021-10-19 09:16:04 -05:00
matthewdeng	4674c78050	[Train] Rename Ray SGD v2 to Ray Train (#19436 )	2021-10-18 22:27:46 -07:00
Guyang Song	46b4c7464d	runtime env eager install by default (#19449 )	2021-10-19 11:31:14 +08:00
Clark Zinzow	ad03917b8f	[Datasets] Add support for slicing Arrow blocks that contain tensor columns. (#19494 )	2021-10-18 20:07:06 -07:00
Simon Mo	6f2eb1f9fa	[Serve] Use ray core metrics for autoscaling (#19038 )	2021-10-18 19:32:49 -07:00
Gagandeep Singh	0b82135d2d	Use 127.0.0.1 in win32 as node ip addr (#19362 )	2021-10-18 15:51:15 -07:00
Ian Rodney	74db390d15	[Docker] Fix Rsync (#19020 ) * rsync down * Rsync up, but not delete * test fixes * Explicit rsync -e * Better copy check * quick comment * Additional fix to rsync_up	2021-10-18 14:35:22 -07:00
Kai Fricke	6798bdbb5d	Revert "Revert "[RLlib](deps): Bump tensorflow from 2.5.0 to 2.6.0 in /python/requirements/rllib"" (#19352 ) This reverts commit `bde9e058da`.	2021-10-18 22:29:16 +01:00
Eric Liang	1bb2b1fc49	[hotfix] Pin pyspark dep to 3.1.2	2021-10-18 13:10:06 -07:00
mwtian	9742abb749	[Debugging] Print Python stack trace in addition to C++ stack trace, when Python worker crashes (#19423 ) Why are these changes needed? Right now the failure signal handler registered in Python worker is skipped on crashes like segfault, because C++ core worker overrides the failure signal handler here and does not call the previously registered handler. This prevents Python stack trace from being printed on crashes. The fix is to make the C++ fault signal handler to call the previous signal handler registered in Python. For example with the script below which segfaults, import ray ray.init() @ray.remote def f(): import ctypes; ctypes.string_at(0) ray.get(f.remote()) Ray currently only prints the following stack trace: (pid=26693) * SIGSEGV received at time=1634418743 * (pid=26693) PC: @ 0x7fff203d9552 (unknown) _platform_strlen (pid=26693) [2021-10-16 14:12:23,331 E 26693 12194577] logging.cc:313: * SIGSEGV received at time=1634418743 * (pid=26693) [2021-10-16 14:12:23,331 E 26693 12194577] logging.cc:313: PC: @ 0x7fff203d9552 (unknown) _platform_strlen With this change, Python stack trace will be printed in addition to the stack trace above: (pid=26693) Fatal Python error: Segmentation fault (pid=26693) (pid=26693) Stack (most recent call first): (pid=26693) File "/Users/mwtian/opt/anaconda3/envs/ray/lib/python3.7/ctypes/__init__.py", line 505 in string_at (pid=26693) File "stack.py", line 7 in f (pid=26693) File "/Users/mwtian/work/ray-project/ray/python/ray/worker.py", line 425 in main_loop (pid=26693) File "/Users/mwtian/work/ray-project/ray/python/ray/workers/default_worker.py", line 212 in <module> This should make debugging crashes in Python worker easier, for users and Ray devs. Also, try to initialize symbolizer in GCS, Raylet and core worker. This is a no-op on MacOS and some Linux environments (e.g. Ray on Ubuntu 20.04 already produces symbolized stack traces), but should make Ray more likely to have symbolized stack traces on other platforms.	2021-10-18 09:05:08 -07:00
Guyang Song	c04fb62f1d	[C++ worker] set native library path for shared library search (#19376 )	2021-10-18 16:03:49 +08:00
Hao Zhang	c96c2e9b5f	[Collective] Enhance the collective group GC a bit (#19402 )	2021-10-15 18:47:54 -07:00
Chen Shen	a9c34d55e3	Throw if infinite (#19418 )	2021-10-15 18:01:53 -07:00
Gagandeep Singh	d226cbf21a	Added StartupToken to idenitfy a process at startup (#19014 ) * Added StartupToken to idenitfy a process at startup * Applied linting formats * Addressed reviews * Fixing worker_pool_test * Fixed worker_pool_test * Applied linting formatting * Added documentation for StartupToken * Fixed linting * Reordered initialisation of WorkerPool members * Fixed Python docs * Fixing bugs in cluster_mode_test * Fixing Java tests * Create and set shim process after verifying startup_token * shim_process.GetId() -> worker_shim_pid * Improvements in startup token and modifying java files * update io_ray_runtime_RayNativeRuntime.h * Fixed java tests by adding startup-token to conf * Applied linting * Increased arg count for startup_token * Attempt to fix streaming tests * Type correction * applied linting * Corrected index of startup token arg * Modified, mock_worker.cc to accept startup tokens * Applied linting * Applied linting changes from CI * Removed override from worker.h * Applied linting from scripts/format.sh * Addressed reviews and applied scripts/format.sh * Applied linting script from ci/travis * Removed unrequired methods from public scope * Applied linting	2021-10-15 15:13:13 -07:00
Chen Shen	acfbf4c170	Fix from Dask bug in Datasets (#19409 )	2021-10-15 15:04:52 -07:00
Gagandeep Singh	07064cddf9	Re-enabling tests from test_basic (#19384 ) Why are these changes needed? Related issue number ##19177 Quoting #19177 (comment) here, The following tests fail when not skipped, =================================== short test summary info ==================================== FAILED python\ray\tests\test_basic.py::test_user_setup_function - subprocess.CalledProcessErro... FAILED python\ray\tests\test_basic.py::test_disable_cuda_devices - subprocess.CalledProcessErr... FAILED python\ray\tests\test_basic.py::test_wait_timing - assert (1634209333.6099107 - 1634209... Results (395.22s): 36 passed 3 failed - ray\tests/test_basic.py:197 test_user_setup_function - ray\tests/test_basic.py:220 test_disable_cuda_devices - ray\tests/test_basic.py:265 test_wait_timing =================================== short test summary info ==================================== FAILED python\ray\tests\test_basic_3.py::test_fair_queueing - AssertionError: 23 Results (198.33s): 1 failed - ray\tests/test_basic_3.py:169 test_fair_queueing The following test passed when not skipped. Opening a PR to verify that. def test_oversized_function(ray_start_shared_local_modes)	2021-10-15 14:02:57 -07:00
Kai Fricke	bb38c5cb1f	[tune] Fix result buffering case check (fixes bug introduced in #19140 ) (#19399 )	2021-10-15 10:43:34 +01:00
Siyuan (Ryans) Zhuang	0d4b0ded27	[Serialization] Update cloudpickle to v2.0.0 (#19383 ) * update cloudpickle to v2.0.0	2021-10-15 02:37:29 -07:00
Hao Zhang	4b92f34ada	[Collective] Remove an unnecessary cuda.stream.synchornize (#19400 )	2021-10-14 21:33:59 -07:00
Matti Picus	f372bb07aa	Enable dashboard on Windows (#19319 )	2021-10-14 14:42:22 -07:00
architkulkarni	b3ccec5d76	[runtime_env] Fix bug when all working_dir contents are excluded with Ray Client (#19377 )	2021-10-14 11:20:45 -07:00

1 2 3 4 5 ...

5333 commits