hiro/ray - Forgejo: Beyond coding. We Forge.

hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-06 10:31:39 -05:00

Author	SHA1	Message	Date
Sebastián Ramírez	298742d724	♻️ Refactor type annotations for `.remote()` to avoid incorrect autocompletion and checks (#25480 ) With the current type annotations for the `.remote()` method generated in decorated functions, editors understand that there are some keyword arguments `arg0`, `arg1`, etc. Which are incorrect as the actual function will probably have different names for its arguments. For example, this shouldn't autocomplete `arg0`, `arg1`, etc: <img width="407" alt="Screenshot 2022-06-04 at 06 13 46" src="https://user-images.githubusercontent.com/1326112/171996654-12248369-cf10-4fce-9ea2-5deb4ca8e2bd.png"> If anything, it should autocomplete `x` and `y` (although that's currently [not perfectly doable](https://github.com/python/typing/discussions/1163)). By updating the type annotations to use [arguments prefixed with double underscores](https://mypy.readthedocs.io/en/stable/protocols.html?highlight=double%20underscore#callback-protocols) at least it tells tooling to not provide autocompletion for those args (which would be incorrect). While still providing inline errors for invalid types. <img width="880" alt="Screenshot 2022-06-04 at 06 20 26" src="https://user-images.githubusercontent.com/1326112/171996806-560c0fa8-0ee3-477c-9906-71e880c84e56.png">	2022-06-05 16:21:53 -07:00
Eric Liang	48acbf0d69	[hotfix] Revert "[runtime env] runtime env inheritance refactor (#24538 )" (#25487 ) This reverts commit `eb2692c`. This is a temporary mitigation for #25484	2022-06-05 14:55:38 -07:00
Sebastián Ramírez	6e1248fb37	🚚 Move worker types to the module to improve static analysis (#25439 ) Currently, there are separated type annotations in `worker.pyi` that include the types for `func.remote()`, but they don't include types for the other things declared in `worker.py`. Because of that, editors can end up showing support only for the things in the `worker.pyi` file. For example: <img width="349" alt="Screenshot 2022-06-03 at 06 01 36" src="https://user-images.githubusercontent.com/1326112/171841977-ec7a0b9a-b4a5-4422-acd9-b73c1e263261.png"> After this change, the editor and other tools will be able to provide support for other things defined in the same file: <img width="760" alt="Screenshot 2022-06-03 at 06 04 24" src="https://user-images.githubusercontent.com/1326112/171842204-1915dd2a-6cc6-41b7-8785-5124beec37e8.png"> And the typing support for `func.remote()` keeps working as before: <img width="760" alt="Screenshot 2022-06-03 at 06 07 15" src="https://user-images.githubusercontent.com/1326112/171842528-f318753e-9f47-4236-b0a4-d86d00c0bb11.png"> This is the recommended approach by PyRight/Pylance/VS Code. I also recommend it as it's a lot easier to maintain types in the same file while editing than remembering to go to an external independent file to add those types. Also, to have proper support when using an external `.pyi` file all the things declared in `worker.py` would have to be declared in the `worker.pyi` file. Ref: https://github.com/microsoft/pyright/blob/main/docs/typed-libraries.md#inlined-type-annotations-and-type-stubs	2022-06-05 14:01:24 -07:00
Yi Cheng	acf210fcac	[flakey] Skip ray_syncer_test for ubsan. (#25477 ) From the message: ``` [ OK ] SyncerTest.TestMToN (13132 ms) [----------] 5 tests from SyncerTest (43175 ms total) [----------] Global test environment tear-down [==========] 8 tests from 2 test suites ran. (43176 ms total) [ PASSED ] 8 tests. external/com_github_grpc_grpc/src/core/lib/iomgr/ev_posix.cc:314:19: runtime error: member access within null pointer of type 'const struct grpc_event_engine_vtable' ``` This can only be reproduced by running with Bazel test so far. With gdb, it won't be reproduced. It seems like some issue with the grpc maybe the reactor API. Given that the ASAN test, which is supposed to catch the issue, runs well, and a considerable time has been spent investigating this one but no progress, skip this test for now.	2022-06-04 23:06:57 -07:00
matthewdeng	7dafb2e278	[air] remove invalid wandb symlink (#25488 )	2022-06-04 22:17:08 -07:00
Kai Fricke	f4d3daa3cc	[github] Codeowners for docs (#25479 )	2022-06-04 22:09:00 -07:00
Jun Gong	644b80c0ef	[RLlib] mark learning and examples tests exclusive. (#25445 )	2022-06-04 09:35:24 -07:00
SangBin Cho	00e3fd75f3	[State Observability] Ray log alpha API (#24964 ) This is the PR to implement ray log to the server side. The PR is continued from #24068. The PR supports two endpoints; /api/v0/logs # list logs of the node id filtered by the given glob. /api/v0/logs/{[file \| stream]}?filename&pid&actor_id&task_id&interval&lines # Stream the requested file log. The filename can be inferred by pid/actor_id/task_id Some tests need to be re-written, I will do it soon. As a follow-up after this PR, there will be 2 PRs. PR to add actual CLI PR to remove in-memory cached logs and do on-demand query for actor/worker logs	2022-06-04 05:10:23 -07:00
Sven Mika	a559efb7e4	[CI; LinkCheck] 3 RLlib fixes. (#25476 )	2022-06-04 11:54:56 +02:00
Yi Cheng	47c4f6f094	[flakey] Fix test_modin.py (#25469 ) test_modin.py is flakey right now. It complains about some modules can't be imported. This seems like a init issue where client mode and non-client mode are mixed. This test closes the cluster for each run. It slows the test a little bit, but it's more stable.	2022-06-04 08:34:37 +00:00
Max Pumperla	c5f4a82e3c	Add doc code owners (#24910 )	2022-06-03 23:59:32 -07:00
Sven Mika	b5bc2b93c3	[RLlib] Move all remaining algos into `algorithms` directory. (#25366 )	2022-06-04 07:35:24 +02:00
SangBin Cho	54496d7705	[State Observability API] Support Filtering (#25281 ) This PR adds a filtering support. The filtering is done from the API server side (not from the source side). Source side filtering is a bit complicated to write an elegant solution, and we will handle it in the future (no optimization for alpha APIs). We will also support limited types of columns for each API. The API is as follows ray list [resources] -- filter [key] [value] => filter data that's key==value. In the future, we can also support more complicated filtering like !=, And, Or , or etc.	2022-06-03 17:17:30 -07:00
Zhe Zhang	4cc202585a	[Docs] Document Ray downscaling behavior (#25466 )	2022-06-03 17:08:21 -07:00
Eric Liang	1f509ab331	[air] Add DatasetParallelTrainer.dataset_config for configuring dataset ingest (#25337 ) This adds a per-dataset config object to DataParallelTrainer. These configs define how the Dataset should be read into the DataParallelTrainer. It configures the preprocessing, splitting, and ingest strategy per-dataset. DataParallelTrainers declare default DatasetConfigs for each dataset passed in the ``datasets`` argument. Users have the opportunity to selectively override these configs by passing the ``dataset_config`` argument. Trainers can also define user customizable values (e.g., XGBoostTrainer doesn't support streaming ingest). This PR adds the minimal support for dataset configs. Future PRs will: - Add support for streaming ingest - Move this config from DataParallelTrainer to ml.Trainer	2022-06-03 16:32:53 -07:00
Eric Liang	22aaf47fda	[tune] Better error message for Tune nested tasks / actors (#25241 ) This PR uses a task/actor launch hook to generate better error messages for nested Tune tasks/actors in the case there are no extra resources reserved for them. The idea is that the Tune trial runner actor can set a hook prior to executing the user code. If the user code launches a task, and the placement group for the trial cannot possibly fit the task, then we raise TuneError right off to warn the user.	2022-06-03 14:53:40 -07:00
Sihan Wang	03ed27b9c1	[Serve] Fix the test_serve_start_different_http_checkpoint_options_warning flaky (#25452 )	2022-06-03 14:45:00 -07:00
Kai Fricke	4b9a89ad90	[air] Move python/ray/ml to python/ray/air (#25449 ) The package "ml" should be renamed to "air". Main question: Keep a `ml.py` with `from ray.air import *` for some level of backwards compatibility? I'd go for no to force people to use the new structure.	2022-06-03 21:53:44 +01:00
Yi Cheng	6b38b071e9	Revert "Revert "[core] Remove gcs addr updater in core worker. (#24747 )" (#25375 )" (#25391 ) This reverts commit `49efcab4fe`.	2022-06-03 12:26:27 -07:00
matthewdeng	2e05b62236	[AIR] Preprocessors feature guide (#25302 )	2022-06-03 11:43:51 -07:00
Kai Fricke	313e8730a2	[tune/docs] Trial executor doc fix (#25440 )	2022-06-03 16:25:38 +01:00
Kai Fricke	7186cd8b79	[tune] Remove various deprecated code paths (deprecation cycle) (#25407 ) This PR removes various deprecated code paths in Ray Tune that raised errors on usage before.	2022-06-03 15:01:40 +01:00
Sven Mika	6c7f781d8e	[RLlib] Unflake some CI-tests. (#25313 )	2022-06-03 14:51:50 +02:00
Kai Fricke	2e058380d7	[tune] Remove TrialExecutor base class (#25404 ) The TrialExecutor base class was a stub and has been deprecated long ago; direct inheritance was disabled. This PR removes the base class and moves the remaining functionality into the RayTrialExecutor.	2022-06-03 10:16:47 +01:00
Kai Fricke	f0fa8e54f8	[tune] Remove DurableTrainable class (#25405 ) The DurableTrainable is deprecated (every trainable is a durable trainable). This PR removes it from the Tune library and a related example.	2022-06-03 10:16:02 +01:00
Antoni Baum	84a9df9448	[AIR/Tune] Add `TempFileLock` (#25408 ) Adds a `TempFileLock` class that stores lockfiles inside a temporary directory.	2022-06-03 10:12:53 +01:00
Jun Gong	1d24d6af98	[RLlib] Fix MARWIL tf policy. (#25384 )	2022-06-03 10:50:36 +02:00
Qing Wang	99429b7a92	[Core] Remove thread local core worker instance 2/n (#25159 ) We removed the thread local core worker instance in this PR, which is the further arch cleaning stuff for removing multiple workers in one process. It also removes the unnecessary parameter `workerId` from JNI.	2022-06-03 14:08:44 +08:00
Yi Cheng	60587cf1dc	[flakey] Deflakey test_ray_shutdown.py (#25422 ) The main issue with this test is that the worker is trying to connect to the raylet but the raylet exits, and in this case, it'll hang there. This happens before the periodical check runs so the worker won't exit as well. This fix moves the hanging part to the place after the periodical check starts. Another issue is the pubsub timeout. The default one is 60s, and we need to adjust it to smaller value to make it work within 60s for the test.	2022-06-02 23:00:33 -07:00
Yi Cheng	fd0f967d2e	Revert "[RLlib] Move (A/DD)?PPO and IMPALA algos to `algorithms` dir and rename policy and trainer classes. (#25346 )" (#25420 ) This reverts commit `e4ceae19ef`. Reverts #25346 linux://python/ray/tests:test_client_library_integration never fail before this PR. In the CI of the reverted PR, it also fails (https://buildkite.com/ray-project/ray-builders-pr/builds/34079#01812442-c541-4145-af22-2a012655c128). So high likely it's because of this PR. And test output failure seems related as well (https://buildkite.com/ray-project/ray-builders-branch/builds/7923#018125c2-4812-4ead-a42f-7fddb344105b)	2022-06-02 20:38:44 -07:00
Jian Xiao	6589a4f8cb	[Datasets][UX Assessment] Add a section on how to write UDFs in Datasets (#25338 ) The Datasets UX assessment showed that users had difficulties in writing UDFs: what's input/output types, how to write the function etc. Co-authored-by: Ubuntu <ubuntu@ip-172-31-32-136.us-west-2.compute.internal>	2022-06-02 20:00:50 -07:00
SangBin Cho	ba90838b66	[Log monitor] Add unit tests + fix flaky test_logging (#25294 ) Looks like the test_logging fails when syncer is enabled. However, I found the test was badly written, and the failure might be a side effect of syncer (I am not sure why. Maybe syncer slows down ray.init()?) ray/python/ray/tests/test_logging.py Line 228 in `f75ede1` def test_log_monitor_backpressure(ray_start_cluster, monkeypatch): Anyway, it seems like the test will fail if there's a delay after log monitor is started. Testing this is not trivial. Instead, I made log_monitor unit testable and added full unit tests. This also adds a better exception message on another flaky test test_log_rotation . I need more data before actually fixing this issue.	2022-06-02 19:15:57 -07:00
Siyuan (Ryans) Zhuang	b5e71fde23	[workflow] Remove workflow virtual actor (#25394 ) * remove workflow virtual actor	2022-06-02 18:17:25 -07:00
Amog Kamsetty	c8b112ec46	[Train] Support amp for models with a custom `__getstate__` method (#25335 ) The current implementation of amp does not work if the model that is being wrapped defines a custom __getstate__ method. It would fail at the assertion like here: https://discuss.ray.io/t/ray-train-hangs-for-long-time/6333/7. This PR fixes amp for this case, and adds tests for it.	2022-06-02 18:13:13 -07:00
Stephanie Wang	473a962d89	[Datasets] [Docs] Add docs about fault tolerance in Datasets (#25371 ) Adds description of fault tolerance guarantees for Datasets. Related issue number Closes #24856.	2022-06-02 15:53:50 -07:00
Antoni Baum	f8551942bf	[AIR] Fix trainer allowed scaling config keys (#25350 ) Adds `resources_per_worker` to allowed scaling config keys in `DataParallelTrainer` and `GBDTTrainer`.	2022-06-02 11:20:37 -07:00
shrekris-anyscale	16bdfe6a39	Restore "[Serve] Deploy Serve deployment graphs via REST API" (#25073 ) (#25333 )	2022-06-02 11:06:53 -07:00
Stephanie Wang	ab8785ca5c	Revert "Revert "[core] Support generators for tasks with multiple return values (#25247 )" (#25380 )" (#25383 ) Duplicate for #25247. Adds a fix for Dask-on-Ray. Previously, for tasks with multiple return values, we implicitly allowed returning a dict with the return index as the key. This was used by Dask-on-Ray, but this is not documented behavior, and we now require task returns to be iterable instead.	2022-06-02 10:50:11 -07:00
Sihan Wang	3c9bd66485	[Serve][Doc] Add http endpoint for dag pattern doc (#25390 )	2022-06-02 09:01:37 -07:00
Sihan Wang	b024a9543e	[Serve] Support scale replica down to 0 (#24892 )	2022-06-02 08:06:46 -07:00
Sven Mika	e4ceae19ef	[RLlib] Move (A/DD)?PPO and IMPALA algos to `algorithms` dir and rename policy and trainer classes. (#25346 )	2022-06-02 16:47:05 +02:00
Steven Morad	f781622f86	[RLlib] Bandits (torch) Policy sub-class. (#25254 ) Co-authored-by: Steven Morad <smorad@anyscale.com>	2022-06-02 15:16:51 +02:00
Kai Fricke	6fe91885b0	[docs/lint] Fix reference to `dataset_tune` (#25402 )	2022-06-02 11:40:26 +01:00
Kai Fricke	8a9512bf62	[ci/release] Install local wheels in release test shell script (#25227 ) We're currently installing matching wheels on the fly in the python script for Ray client tests. However, we can't reload modules with changed protobuf configurations, and thus can't reload ray completely. Since the `anyscale` pacakge depends on Ray, this effectively prevents us from installing matching wheels within the python script. There are a few possible solutions to this. First, we could separate out the local environment preparation from the test running - this will duplicate some logic and is thus a bit more involved, but should be considered in the future. For now, we adjust the `run_release_tests.sh` shell script to install any passed `--ray-wheels` wheels locally. We only do this in CI instances per default as these wheels will not be compatible with e.g. MacOS. Link to successful build: https://buildkite.com/ray-project/release-tests-branch/builds/619#_	2022-06-02 10:32:51 +01:00
Antoni Baum	045c47f172	[CI] Check test files for `if __name__...` snippet (#25322 ) Bazel operates by simply running the python scripts given to it in `py_test`. If the script doesn't invoke pytest on itself in the `if _name__ == "__main__"` snippet, no tests will be ran, and the script will pass. This has led to several tests (indeed, some are fixed in this PR) that, despite having been written, have never ran in CI. This PR adds a lint check to check all `py_test` sources for the presence of `if _name__ == "__main__"` snippet, and will fail CI if there are any detected without it. This system is only enabled for libraries right now (tune, train, air, rllib), but it could be trivially extended to other modules if approved.	2022-06-02 10:30:00 +01:00
Qing Wang	64f9a9066f	[doc] Update document on `ray start` command. (#25306 )	2022-06-02 16:42:24 +08:00
Artur Niederfahrenhorst	71a8a443ce	[RLlib] Fix Policy global timesteps being off by init sample batch size. (#25349 )	2022-06-02 10:19:21 +02:00
kourosh hakhamaneshi	87c9fdd0f8	RLlib: Fix bug: `WorkerSet.stop()` will raise error if `self._local_worker` is None (e.g. in evaluation worker sets). (#25332 )	2022-06-02 09:41:43 +02:00
Eric Liang	6fe8f7e16b	fix lint (#25393 )	2022-06-01 22:35:30 -07:00
Eric Liang	51b295ad74	[docs] Improve Tune + Datasets documentation (#25389 )	2022-06-01 21:52:32 -07:00

1 2 3 4 5 ...

12833 commits