hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-06 18:41:40 -05:00

Author	SHA1	Message	Date
mwtian	6882ca0717	[Release branch] fix rllib test_catalog.py (#22387 )	2022-02-15 16:54:56 +01:00
Clark Zinzow	d51df512eb	[1.11.0] [Cherry-pick] [Datasets] Fix boolean tensor column representation and slicing. (#22358 ) Reformatted cherry-pick of `443416907e`. This PR fixes our {NumPy, Pandas} <--> Arrow interop for boolean tensor columns. NumPy and Pandas represent boolean arrays with a byte per boolean, while Arrow bit-packs booleans with 8 booleans per byte. Previously, when casting NumPy arrays to tensor columns, we were interpreting NumPy's boolean array buffers as being bit-packed when they were not. This PR completes support by packing and unpacking bits for boolean arrays when creating a boolean tensor column from an ndarray and when creating an ndarray from a boolean tensor column, respectively.	2022-02-14 11:45:50 -08:00
Edward Oakes	c48ad5cf13	[serve] Fix HTTP proxy controller namespace bug (#22287 ) (#22355 ) Closes https://github.com/ray-project/ray/issues/22265 This was caused by implicitly inferring the namespace from within the HTTP proxy when calling `get_handle`. This makes me think we really need to simplify the namespace handling logic.	2022-02-14 11:33:06 -08:00
Mingwei Tian	fee8947c23	[Release branch] Update Python version to 1.11.0rc0	2022-02-14 10:05:53 -08:00
mwtian	49b0d4d88f	[Release] Add release logs for 1.11.0rc0 (GCS KV & pubsub not enabled) (#22041 )	2022-02-14 10:05:53 -08:00
Chen Shen	a847fa3643	[Dataset] avoid pyarrow 7.0.0 for dataset (#22253 ) (#22330 )	2022-02-14 08:06:11 -08:00
Archit Kulkarni	789274c179	[runtime env] [1.11.0 release cherry-pick] fix bug where pip options don't work in `requirements.txt` (#22127 ) * [runtime env] Fix bug where options (e.g. `--extra-index-url`) could not be specified in `requirements.txt` (#22065) In https://github.com/ray-project/ray/pull/20341 the behavior of `pip` was changed to install the specified packages in the existing environment rather than in a new environment. This posed a problem when specifying Ray libraries like "ray[serve]" in the `pip` field, because the installer would install Ray at runtime and this new Ray would take precedence over the Ray existing on the cluster. This could cause version mismatch issues. Skipping some details, the approach taken in that PR was essentially to parse the `pip` list and remove Ray. However, not every line in a `pip` `requirements.txt` file is a requirements specifier; a line can also just specify options, like `--extra-index-url my-index-url.com`. This caused the parsing library to raise an exception when trying to parse the line. This PR fixes this by catching the exception and skipping the line in this case, since it's not a line that specifies `ray` and that's all we're looking for when parsing. * lint using old linter from pre-1.11.0-branch-cut	2022-02-14 07:13:37 -08:00
Kai Fricke	c500c5b1ed	[ci/release] Fix job submission command (#22093 ) Ray job submission does not accept quoted commands anymore (#22011). This PR updates the command to fix job submission within e2e tests.	2022-02-13 20:08:03 -08:00
mwtian	2b257189a1	[Release 1.11 Cherrypick] [e2e] do not terminate in `serve_failure` smoke test (#21955 ) Original PR #21925 This makes `serve_failure` pass its smoke test step. Without it, the test fails early and does not get to exercise the logic for 24 hr.	2022-01-28 20:19:18 -08:00
mwtian	e23b27c173	[ci/release] Increase long running timeout, fix artifacts copy (#21905 ) (#21943 ) With the new job-based file copy, fetching results takes longer. We thus have to increase the long running update test check times in order not to run into bogus release test failures. Also fixes artifact uploading issues. Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>	2022-01-27 22:30:43 -08:00
mwtian	32cf407fc4	[release] Fix broken `pip_download_test.sh` script for non-M1 Macs (#21542 ) (#21944 ) Fixes a typo that caused the script to exit early without running any sanity checks when not using an M1 Mac. Co-authored-by: Archit Kulkarni <architkulkarni@users.noreply.github.com>	2022-01-27 22:30:28 -08:00
mwtian	413447cbdf	[Doc] update dask version for Ray 1.11.0 (#21933 ) (#21942 ) This is needed for release 1.11.0.	2022-01-27 22:29:52 -08:00
Mingwei Tian	c80aceaf0a	Update version to 1.11.0rc0	2022-01-25 20:51:13 -08:00
Lingxuan Zuo	0c33ff718d	Remove generated streaming pb and pom files. (#21851 ) There are some auto-generated streaming files, which are not removed. This PR removes them totally. Co-authored-by: 林濯 <lingxuzn.zlx@antgroup.com>	2022-01-26 10:05:23 +08:00
Alex Wu	7a45f60dbc	[autoscaler] Fix ray.autoscaler.sdk import issue (#21795 ) This PR moves the sdk to its own folder, then includes everything in `import ray.autoscaler.sdk` in ray's import path. Note that naively doing this created circular dependencies, because the ray core now uses constants that were defined in the autoscaler for internal kv operations (and the autoscaler similarly calls into the ray core). The solution was to move those internal kv keys into ray core constants so the imports flow (more) one way. Co-authored-by: Alex Wu <alex@anyscale.com>	2022-01-25 14:43:24 -08:00
Wilson Wang	30a4761592	Two issues fix for GCS connecting logic in monitor.py and log_monitor.py (#21790 ) This patch fixes two issues. 1. log_monitor.py can crash when gcs is temporarily unavailable. Added retry logic in gcs_pubsub.py. 2. The signal handler can raise another exception during exception handling.	2022-01-25 14:07:26 -08:00
Ian Rodney	257bd2d1e7	[Cleanup] Use `mkstemp` (#21676 ) `tempfile.mktemp` is technically deprecated in favor of `tempfile.mkstemp`. Ref: https://docs.python.org/3/library/tempfile.html#deprecated-functions-and-variables.	2022-01-25 13:42:12 -08:00
shrekris-anyscale	e4370720cc	[Serve] Add "Serve" team tag to untagged release tests (#21861 )	2022-01-25 11:46:03 -08:00
Dhruv Nair	3d79815cd0	Comet Integration (#20766 ) This PR adds a `CometLoggerCallback` to the Tune Integrations, allowing users to log runs from Ray to [Comet](https://www.comet.ml/site/). Co-authored-by: Michael Cullan <mjcullan@gmail.com> Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>	2022-01-25 11:42:00 -08:00
Clark Zinzow	1971a08b7d	[RFC] [Core] Support disabling log redirection via `RAY_LOG_TO_STDERR` environment variable. (#21767 )	2022-01-25 10:52:53 -08:00
Gagandeep Singh	395297a9bd	Unskip tests for Windows in `test_output` (#21775 )	2022-01-25 09:25:01 -08:00
Matti Picus	d3d1e8559c	enable passing metric tests on windows (#21755 ) Resubmitting #21705 which was merged then reverted. It seems somehow sphinx building broke in the meantime, not clear how it is connected to this PR. Here is the original description: >Part of the effort to enable tests on windows, this enables test_metrics and test_metric_agents, which pass locally.	2022-01-25 09:20:16 -08:00
Sven Mika	d5bfb7b7da	[RLlib] Preparatory PR for multi-agent multi-GPU learner (alpha-star style) #03 (#21652 )	2022-01-25 14:16:58 +01:00
SangBin Cho	b2cd123522	[Runtime Env] Suppress the log messages when RAY_RUNTIME_ENV_LOG_TO_DRIVER_ENABLED=0 (#21806 ) There was a user request to disable runtime env logs. This is the first PR that allows users to disable runtime env logs through an env var. If users specify `RAY_RUNTIME_ENV_LOG_TO_DRIVER_ENABLED=0`, runtime env logs are disabled. Note that in the log monitor RAY_RUNTIME_ENV_LOG_TO_DRIVER_ENABLED=1 by default. This is temporary, and I'd like to make this 0 by default after improving runtime error failure messages. Once we disable log msgs by default, we can unify `RAY_RUNTIME_ENV_LOG_TO_DRIVER_ENABLED` and `RAY_RUNTIME_ENV_LOCAL_DEV_MODE`.	2022-01-25 00:42:52 -08:00
Gagandeep Singh	290f3172ad	Unskipped tests for Windows in `test_client.py` (#21824 ) All the tests in `test_client.py` pass on Windows without issues, so unskipping them here.	2022-01-24 22:51:54 -08:00
Lixin Wei	bc55a958c4	[Core] Support UTF-8 Actor Creation Exceptions (#21807 ) Currently, if an actor throws an exception containing non-ASCII characters, the actor does not die but stays alive. This is because the following exception occurs while handling the user's exception: ``` File "python/ray/_raylet.pyx", line 587, in ray._raylet.task_execution_handler File "python/ray/_raylet.pyx", line 434, in ray._raylet.execute_task File "python/ray/_raylet.pyx", line 551, in ray._raylet.execute_task File "/home/admin/.local/lib/python3.6/site-packages/ray/utils.py", line 96, in push_error_to_driver worker.core_worker.push_error(job_id, error_type, message, time.time()) File "python/ray/_raylet.pyx", line 1636, in ray._raylet.CoreWorker.push_error UnicodeEncodeError: 'ascii' codec can't encode characters in position 2597-2600: ordinal not in range(128) An unexpected internal error occurred while the worker was executing a task. ``` This PR fixes this issue.	2022-01-24 20:27:43 -08:00
Guyang Song	089f49f554	[doc] fix doc of container-based runtime env (#21815 )	2022-01-25 12:23:15 +08:00
isaac-vidas	236fe58259	[Doc] Update requests calls to ray job submission api (#21802 )	2022-01-24 17:44:31 -08:00
Max Pumperla	7953c9ca57	[docs] integrate algolia docsearch, move to sphinx panels (#21814 )	2022-01-24 17:00:41 -08:00
Andrew A. Naguib	f026376556	[Tune] PTL replace deprecated `running_sanity_check` with `sanity_checking` (#21831 ) `running_sanity_check` was deprecated and removed in https://github.com/PyTorchLightning/pytorch-lightning/pull/9209 in favor of `sanity_checking`	2022-01-24 16:14:05 -08:00
Siyuan (Ryans) Zhuang	99b287d236	[workflow] Fix workflow recovery issue due to a bug of dynamic output (#21571 ) * Fix workflow recovery issue due to a bug of dynamic output * add tests	2022-01-24 15:34:57 -08:00
DK.Pino	c2199a50e3	[Placement Group] Fix remove pg flaky when worker startup slow (#20474 ) Currently, when we destroy a created placement group, we kill all workers related to it. However, only the workers running at that moment are killed. If a worker starts up very slowly and its placement group is destroyed before the worker finishes starting, that worker is leaked.	2022-01-24 15:30:04 -08:00
SangBin Cho	7d4287a6ab	[Test] Move long running tests to run everyday (#21813 ) Long running tests are cheap and low overhead (small number of node usage). We should just promote this to run every day so we can catch regressions quickly.	2022-01-24 15:10:27 -08:00
SangBin Cho	ac5f38d7fd	[Test] Fix dask on ray test on K8s (#21816 ) Fix the dask on ray large scale test on K8s. chmod requires root access, which we don't have by default in the k8s cluster. I don't think we need chmod (I verified the test passes without it).	2022-01-24 15:09:22 -08:00
mwtian	a10d05ce27	[Bootstrap] fix log format (#21826 )	2022-01-24 15:06:41 -08:00
Yi Cheng	57afb2f75a	[gcs/ha] Skip raydb test when it's gcs bootstrap mode (#21771 ) RayDP needs to be updated to work with redisless ray. To be more specific, this [line](`c08a786770/python/raydp/spark/ray_cluster_master.py (L146)` ) needs to be updated to use `node.address`. We should update this after the release with the feature being turned on by default.	2022-01-24 14:43:31 -08:00
shrekris-anyscale	03d93ba7ee	Add a new End-to-End tutorial in Serve that walks users through deploying a model (#20765 ) Currently, the docs have an [end-to-end tutorial](https://web.archive.org/web/20211122152843/https://docs.ray.io/en/latest/serve/tutorial.html) walking users through deploying a `Counter` function on Serve. This PR adds an end-to-end tutorial walking users through deploying an entire Hugging Face model using Serve, providing a better understanding of how to deploy an actual model via Serve. Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com> Co-authored-by: Simon Mo <simon.mo@hey.com>	2022-01-24 16:36:04 -06:00
Sven Mika	c288b97e5f	[RLlib] Issue 21629: Video recorder env wrapper not working. Added test case. (#21670 )	2022-01-24 19:38:21 +01:00
SangBin Cho	2010f13175	Fix dashboard test bug (#21742 ) Currently `wait_until_succeeded_without_exception` is used in the dashboard tests, and it returns True/False. Unfortunately, a lot of code doesn't assert on this method's return value (which means things are not actually tested).	2022-01-24 11:38:51 -06:00
Antoni Baum	850eb88cde	[tune] Fix analysis without registered trainable (#21475 ) This PR fixes issues with loading ExperimentAnalysis from path or pickle if the trainable used in the trials is not registered. Chiefly, it ensures that the stub attribute set in load_trials_from_experiment_checkpoint doesn't get overridden by the state of the loaded trial, and that when pickling, all trials in ExperimentAnalysis are turned into stubs if they aren't already. A test has also been added.	2022-01-24 08:27:08 -08:00
Guyang Song	08b8f3065b	add runtime env code owners (#21803 )	2022-01-24 19:25:16 +08:00
Guyang Song	f8e41215b3	[1/n][cross-language runtime env] runtime env protobuf refactor (#21551 ) We need to support runtime env for Java, C++ and cross-language. This PR only refactors the protobuf. Related issue #21731	2022-01-24 19:24:59 +08:00
SangBin Cho	6b4aac7a08	Promote unstable tests to stable (#21811 ) Promote tests that have passed 100% of the time over the last week to stable.	2022-01-24 02:10:37 -08:00
Chen Shen	a60251f47a	[Core] Fix 16GB mac perf issue by limit the plasma store size to 2GB (#21224 ) * add changes * as title * fix * max to min * fix tests	2022-01-24 01:52:59 -08:00
Shawn	6603ad450a	[Java] print hang test case name (#21804 ) * print hang test case name * use getFullTestName	2022-01-23 23:56:44 -08:00
SangBin Cho	1ae14ec513	[Dashboard] Make dashboard / agent work in minimal ray installation 1/3. (#21774 ) This is the doc that explains how to achieve this: https://docs.google.com/document/d/12qP3x5uaqZSKS-A_kK0ylPOp0E02_l-deAbmm8YtdFw/edit?usp=sharing The fully working e2e prototype is here (it passes all tests): `cdad913883` This PR is pure refactoring. It moves some of the util functions that require optional_deps to `optional_utils` so that optional deps' util functions are not used in the minimal installation. The steps are shown in this screenshot: https://user-images.githubusercontent.com/18510752/150528494-c3cdedf4-3a66-4557-b540-61436b1dbab6.png	2022-01-23 21:11:32 -08:00
SangBin Cho	babc03edf2	Add a threaded actor k8s test (#21739 ) Add threaded actor flaky test to k8s.	2022-01-23 20:12:57 -08:00
DK.Pino	8cd7a5c438	[Placement Group] Make placement group commit resource rpc request batched (#21240 ) This is one part of this refactor, #20715 , make the commit resource RPC requests batched per node.	2022-01-23 06:16:09 -08:00
Jiao	5d382cfeb3	[nit] remove decorator in test_cli.py (#21792 ) For full context, see https://github.com/ray-project/ray/issues/21791. pytest works in "some" environments for this test and on CI master, but this decorator is still unnecessary and was introduced by mistake. So just remove it and see what happens with the original issue.	2022-01-23 06:05:05 -08:00
Lingxuan Zuo	ec62d7f510	[Streaming]Farewell : remove all of streaming related from ray repo. (#21770 ) New repo url is https://github.com/ray-project/mobius Co-authored-by: 林濯 <lingxuzn.zlx@antgroup.com>	2022-01-23 17:53:41 +08:00
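
The boolean tensor column fix in `d51df512eb` rests on the difference between NumPy's byte-per-boolean layout and Arrow's bit-packed layout. The following is only a minimal sketch of that packing and unpacking step using plain NumPy; it is not the code from the PR, and the helper names `pack_bool_array` and `unpack_bool_array` are hypothetical. It assumes least-significant-bit-first ordering, which is how Arrow lays out boolean buffers.

```python
import numpy as np


def pack_bool_array(arr):
    """Pack a boolean ndarray into bytes, 8 booleans per byte (LSB first)."""
    flat = np.ascontiguousarray(arr, dtype=np.bool_).ravel()
    return np.packbits(flat, bitorder="little")


def unpack_bool_array(packed, shape):
    """Rebuild a boolean ndarray of the given shape from packed bytes."""
    n = int(np.prod(shape))
    # `count` drops the padding bits that packbits added to fill the last byte.
    bits = np.unpackbits(packed, count=n, bitorder="little")
    return bits.astype(np.bool_).reshape(shape)


mask = np.array([[True, False, True], [False, False, True]])
packed = pack_bool_array(mask)          # 6 booleans fit in a single byte
restored = unpack_bool_array(packed, mask.shape)
```

Reading the unpacked byte-per-boolean buffer as if it were already bit-packed (the bug described in the commit) would misinterpret every element, which is why an explicit conversion is needed in both directions.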

11073 commits
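
As an illustration of the cleanup in `257bd2d1e7`, here is a hedged sketch of replacing `tempfile.mktemp` with `tempfile.mkstemp`. It is not taken from the PR; the helper name `write_temp_file` is hypothetical. `mktemp` only returns a name, so another process can create the file first, whereas `mkstemp` creates the file and returns an open file descriptor together with the path.

```python
import os
import tempfile


def write_temp_file(data: bytes, suffix: str = ".tmp") -> str:
    """Create a temporary file safely, write data to it, and return its path."""
    # mkstemp creates the file atomically and hands back an OS-level descriptor.
    fd, path = tempfile.mkstemp(suffix=suffix)
    # Wrap the descriptor so it is closed once writing finishes.
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return path


path = write_temp_file(b"hello")
with open(path, "rb") as f:
    content = f.read()
# Unlike NamedTemporaryFile, mkstemp leaves cleanup to the caller.
os.remove(path)
```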