This PR avoids some unnecessary copying and branching when recording event stats, improving/recovering ~10% of `single_client_get_calls_Plasma_Store` performance. On an AWS EC2 `m5.8xlarge`:
- `single_client_get_calls_Plasma_Store` current: ~5200/s
- `single_client_get_calls_Plasma_Store` with PR: ~5800/s
When `RAY_event_stats=0`, `single_client_get_calls_Plasma_Store` can reach ~6800/s. If we want to optimize further, we could record data in OpenCensus only at intervals, or only when the data are exported.
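For illustration, here is a minimal Python sketch of the "record at intervals" idea (all names are hypothetical; the actual stats code lives in C++): buffer increments locally and flush to the metrics backend at most once per interval.

```python
import time

class ThrottledCounter:
    """Sketch: batch stat updates and flush them at a fixed interval."""

    def __init__(self, flush_fn, interval_s=1.0):
        self._flush_fn = flush_fn      # e.g., the OpenCensus record call
        self._interval_s = interval_s
        self._pending = 0
        self._last_flush = time.monotonic()

    def record(self, value=1):
        self._pending += value
        now = time.monotonic()
        if now - self._last_flush >= self._interval_s:
            self._flush_fn(self._pending)
            self._pending = 0
            self._last_flush = now
```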
1. If a node is selected based on locality, we always run the task on that node, provided it is available.
2. For the spread scheduling strategy, we always select the local node as the first raylet to request a lease from; locality is not involved (sketched below).
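Roughly, the resulting policy looks like this (a pseudocode sketch with assumed names, not the actual C++ implementation):

```python
def pick_first_node(locality_node, local_node, strategy):
    if strategy == "SPREAD":
        # Spread: always start leasing from the local raylet.
        return local_node
    if locality_node is not None and locality_node.is_available():
        # Locality-aware: run the task where its arguments live, if available.
        return locality_node
    return local_node
```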
The `WandbTrainableMixin` doesn't work with RLlib trainables, as they won't recognize the `wandb` parameter. Thus, we should pop the `wandb` config before initializing the rest of the trainable.
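A minimal sketch of the intended order of operations (the method and helper names here are illustrative, not the exact mixin code):

```python
class WandbTrainableMixin:
    def setup(self, config):
        # Pop the wandb section first, so the RLlib trainable never sees
        # a key it doesn't recognize.
        wandb_config = config.pop("wandb", {})
        self._setup_wandb(wandb_config)  # hypothetical helper
        super().setup(config)            # initialize the rest, wandb-free
```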
We're introducing [MyST Notebooks](https://myst-nb.readthedocs.io/en/latest/index.html) here and demonstrating how they work by rewriting (and extending) the RLlib Serve tutorial. Benefits:
- [x] Write notebooks in markdown (see the snippet after this list); they can be converted into other formats, e.g. with `jupytext`.
- [x] Tutorials like this have a binderhub link added to the top nav (launch button).
- [x] Notebooks get executed when docs are built, so it's impossible to have stale docs.
- [x] But locally those builds are cached so that you don't have to wait too long.
- [x] The notebook cell outputs can be shown, hidden or removed. In particular, we can now avoid adding expected code output as comments in our scripts (which might get outdated).
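To illustrate, a MyST notebook is just markdown whose executable cells are marked with a directive; a minimal (hypothetical) example:

````markdown
# Serving an RLlib model

Cells marked with the `code-cell` directive are executed at build time:

```{code-cell} python3
import ray

ray.init()
```
````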
We're also clarifying #22022.
Old tutorial: [here](https://docs.ray.io/en/latest/serve/tutorials/rllib.html)
New tutorial (preview): [here](https://ray--22030.org.readthedocs.build/en/22030/serve/tutorials/rllib.html)
Co-authored-by: simon-mo <simon.mo@hey.com>
This adds some utility functions to make it easier to manipulate structured data in Datasets. While in principle you can already do this with `map_batches`, the new utilities make it a little easier to experiment during development.
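For context, the existing `map_batches` route looks something like this (illustrative only):

```python
import ray

# Structured-data manipulation via map_batches today: e.g., add a derived
# column to every batch, with batches requested in pandas format.
ds = ray.data.from_items([{"x": i} for i in range(100)])
ds = ds.map_batches(lambda df: df.assign(y=df["x"] * 2), batch_format="pandas")
```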
The new code takes a file lock before reading and writing `ports_by_node.json`. Without it, multiple nodes may write to `ports_by_node.json` at the same time.
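The pattern is roughly the following (a sketch using the `filelock` package; function and variable names are illustrative):

```python
import json
from filelock import FileLock

def record_port(path, node_id, port):
    # Hold an exclusive file lock for the whole read-modify-write cycle
    # so concurrent nodes can't interleave their updates.
    with FileLock(path + ".lock"):
        try:
            with open(path) as f:
                ports_by_node = json.load(f)
        except FileNotFoundError:
            ports_by_node = {}
        ports_by_node[node_id] = port
        with open(path, "w") as f:
            json.dump(ports_by_node, f)
```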
Previously, local files corresponding to runtime env URIs were eagerly garbage collected as soon as there were no more references to them. In this PR, we store this data in a cache instead: when the reference count for a URI drops to zero, instead of deleting it we simply mark it as unused in the cache. When the cache exceeds its size limit (default 10 GB), it deletes unused URIs until the cache is back under the size limit or there are no more unused URIs.
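Conceptually, the cache behaves like this sketch (simplified; the real implementation differs in details):

```python
class URICache:
    """Sketch: keep URIs on disk after their refcount hits zero; only
    delete unused ones once the total size exceeds the cap."""

    def __init__(self, max_total_size_bytes=10 * 1024**3):  # 10 GB default
        self.max_total_size_bytes = max_total_size_bytes
        self.sizes = {}      # uri -> size of its local files in bytes
        self.unused = set()  # uris with refcount zero (deletable)

    def add(self, uri, size_bytes):
        self.sizes[uri] = size_bytes

    def mark_used(self, uri):
        self.unused.discard(uri)

    def mark_unused(self, uri):
        self.unused.add(uri)
        self._evict_if_needed()

    def _evict_if_needed(self):
        # Delete unused URIs until under the limit or none are left.
        while sum(self.sizes.values()) > self.max_total_size_bytes and self.unused:
            uri = self.unused.pop()
            del self.sizes[uri]  # the real code also deletes the local files
```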
Design doc: https://docs.google.com/document/d/1x1JAHg7c0ewcOYwhhclbuW0B0UC7l92WFkF4Su0T-dk/edit
- Adds unit tests for caching and integration tests for working_dir caching
Proposal document: https://docs.google.com/document/d/1ln7_fUST18GOz4jJnI_zN00hfczXY48V5Ajy6fCmJCE/edit#
This PR changes the return value of `ray.init` when not in client mode to a `RayContext`, which acts as a context manager and has the same public fields as `ClientContext`, as well as a `disconnect` method (which calls `shutdown` under the hood).
To avoid breaking scripts that rely on dict-style access, `RayContext` also subclasses `collections.abc.Mapping` (so it can be treated as an immutable dict). This behavior will be removed in 2.0, so deprecation warnings are raised when `__getitem__` is used. To make migration simple, an additional dict field `address_info` is added with the same values as the original return value.
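For example, after this change (illustrative usage):

```python
import ray

# RayContext works as a context manager; shutdown happens on exit.
with ray.init() as ctx:
    print(ctx.address_info["node_ip_address"])  # new explicit dict field
    print(ctx["node_ip_address"])               # Mapping access still works,
                                                # but raises a deprecation warning
```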
It looks like infeasible placement groups in the placement group manager didn't work properly; it's unclear how this feature was added when it can't pass this simple test case.
Here's what was happening:
(1) A PG is not schedulable because it is infeasible.
(2) A new node is added.
(3) After the new node is added, the placement group manager tries rescheduling all infeasible PGs.
(4) However, when a new node is added, we don't report its resources (this seems very weird, since we report resources using a separate RPC). So when (3) happens, the PG is still unschedulable.
This PR fixes the issue by adding the resource information when the new node is added.
Note that in the long term, we'd like a resource reporting path separate from (4). This won't be addressed in this PR.
With the addition of https://github.com/ray-project/ray/pull/20988, the "native" format becomes ambiguous. This PR proposes to auto-promote Arrow blocks to pandas blocks when the user specifies the "native" format, to avoid the ambiguity.
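In other words, something like this hypothetical sketch of the promotion rule:

```python
import pyarrow

def to_native_batch(block):
    # Hypothetical sketch: with batch_format="native", Arrow-backed blocks
    # are promoted to pandas, so callers always get a pandas.DataFrame.
    if isinstance(block, pyarrow.Table):
        return block.to_pandas()
    return block  # already a pandas block
```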
Report only the memory used by primary copies of objects, since secondary copies are not evicted even when they are no longer needed on a node. Counting secondary copies prevents downscaling until all references to a shared object are removed.
Closes https://github.com/ray-project/ray/issues/21870
These changes add a set of improvements to enable automatic creation and update of CloudWatch alarms when provisioning AWS Autoscaling clusters. Successful implementation of these improvements will allow AWS Autoscaler users to:
- Set up alarms against Ray CloudWatch metrics to get notified about increased load or service outages.
- Update their CloudWatch alarm JSON configuration files during `ray up` execution.
Notes:
This PR is a follow-up to #20266, adding CloudWatch alarm support.
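For a sense of what one alarm entry expresses, here is a hedged boto3 equivalent (the metric name, namespace, and thresholds below are invented for illustration):

```python
import boto3

cloudwatch = boto3.client("cloudwatch")
cloudwatch.put_metric_alarm(
    AlarmName="ray-cluster-high-cpu",       # assumed alarm name
    Namespace="CWAgent",                    # assumed metric namespace
    MetricName="cpu_usage_active",          # assumed Ray/CloudWatch agent metric
    Statistic="Average",
    Period=300,                             # evaluate over 5-minute windows
    EvaluationPeriods=2,
    Threshold=90.0,
    ComparisonOperator="GreaterThanThreshold",
)
```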
Instead of installing dependencies in each Buildkite job, let's move this to the Dockerfile instead.
This will update GPU tests to always use Python 3.7.
Currently, Tune trainables wrapped with `functools.partial` raise the following warnings:
INFO registry.py:66 -- Detected unknown callable for trainable. Converting to class.
WARNING experiment.py:295 -- No name detected on trainable. Using DEFAULT.
This PR propagates function names for functions wrapped with `functools.partial` and treats them as regular functions when wrapping.
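The gist of the fix, as a simplified sketch (the helper name is assumed):

```python
import functools

def _resolve_name(trainable):
    # Unwrap functools.partial (possibly nested) to find the underlying
    # function, so its __name__ can be propagated to the experiment.
    while isinstance(trainable, functools.partial):
        trainable = trainable.func
    return trainable.__name__
```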
We recently added tests to this file, and it now occasionally exceeds the 300-second timeout (before adding the tests, it took about 260~270 seconds, so this is expected).
This PR promotes the test to large so that we can avoid the issue. (Let me know if you think it would be better to shard the test further.)
As discussed, we need to separate the cluster resource management logic from the scheduling logic. In this PR, we create the `cluster_resource_manager` to handle resource management; the cluster resource scheduler is now only responsible for scheduling.
* more clean up
* refactor
* address comments
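Conceptually, the split looks like this (a Python sketch of the C++ design; the interfaces are assumed):

```python
class ClusterResourceManager:
    """Owns the cluster resource view: per-node totals and availability."""

    def __init__(self):
        self.nodes = {}  # node_id -> available resources

    def update_node(self, node_id, resources):
        self.nodes[node_id] = resources


class ClusterResourceScheduler:
    """Pure scheduling policy: picks a feasible node from the manager's view."""

    def __init__(self, resource_manager):
        self.resource_manager = resource_manager

    def schedule(self, request):
        for node_id, avail in self.resource_manager.nodes.items():
            if all(avail.get(k, 0) >= v for k, v in request.items()):
                return node_id
        return None  # infeasible for now
```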
Instead of using a detached lifetime, tie the lifetime of `_DesignatedBlockOwner` to the lifetime of the context creator. Also, only create a `_DesignatedBlockOwner` if dynamic block splitting is enabled.
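In Ray actor terms, the change is roughly the following (illustrative; the flag name is assumed):

```python
import ray

@ray.remote
class _DesignatedBlockOwner:
    pass

# Before: a detached owner, which outlives its creator.
owner = _DesignatedBlockOwner.options(name="owner", lifetime="detached").remote()

# After: the default lifetime ties the actor to the context creator, and it
# is only created when dynamic block splitting is enabled.
dynamic_block_splitting_enabled = True  # assumed flag for illustration
if dynamic_block_splitting_enabled:
    owner = _DesignatedBlockOwner.remote()
```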
This PR adds pandas block format support by implementing `PandasRow`, `PandasBlockBuilder`, and `PandasBlockAccessor`.
Note that `sort_and_partition`, `combine`, `merge_sorted_blocks`, and `aggregate_combined_blocks` in `PandasBlockAccessor` redirect to the Arrow block format implementation for now; they'll be implemented in a later PR.
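For orientation, the builder's role can be sketched like this (heavily simplified relative to the real `BlockBuilder` interface):

```python
import collections
import pandas as pd

class PandasBlockBuilder:
    """Accumulates rows and materializes them as a pandas.DataFrame block."""

    def __init__(self):
        self._columns = collections.defaultdict(list)

    def add(self, row: dict) -> None:
        for name, value in row.items():
            self._columns[name].append(value)

    def build(self) -> pd.DataFrame:
        return pd.DataFrame(dict(self._columns))
```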