hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-06 10:31:39 -05:00

Author	SHA1	Message	Date
mwtian	45cddef2d3	[GCS] disable tests related to GCS restarting in GCS pubsub mode (#21534 ) `test_failure_2.py::test_gcs_server_failiure_report` and `test_gcs_fault_tolerance.py::test_gcs_server_restart_during_actor_creation` cannot pass in GCS pubsub mode with the existing logic. Disable these tests in GCS pubsub mode and add comment about how we may fix them. Also, suppress exceptions when sync subscribers are disconnected from GCS. I can push changes in this PR to #21513 as well.	2022-01-11 14:14:05 -08:00
Kai Fricke	084bda87a5	[ci/multinode] Fix resource popping resulting in empty resource head nodes (#21531 ) Fixes a small bug where we pop from the resources dict without making a copy, emptying the head node resources. This sometimes leads to empty head node resources.	2022-01-11 13:20:58 -08:00
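The bug class described in the commit above (popping from a shared dict) can be illustrated with a minimal sketch. The function name `node_spec` and the resource keys are hypothetical and are not taken from the Ray code base; the point is only that popping from a copy leaves the caller's dict intact.

```python
def node_spec(resources):
    """Split a resources dict into CPU/GPU counts and custom resources.

    Hypothetical helper, not Ray's implementation. The input is copied
    first so that pop() does not empty the caller's dict.
    """
    custom = dict(resources)  # shallow copy; pops below touch only the copy
    num_cpus = custom.pop("CPU", 0)
    num_gpus = custom.pop("GPU", 0)
    return {"num_cpus": num_cpus, "num_gpus": num_gpus, "resources": custom}


head_resources = {"CPU": 4, "GPU": 1, "custom": 2}
spec = node_spec(head_resources)
```

Without the `dict(resources)` copy, a second call with the same `head_resources` would see no `CPU` or `GPU` entries at all.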
Yi Cheng	d2d749b6f9	[workflow] Fix test_serialization.py (#21522 ) The new version of responses will introduce some errors in the test. This PR fixed responses. It also fixed moto in case of future updates upstream.	2022-01-11 11:45:18 -08:00
Sven Mika	f94bd99ce4	[RLlib] Issue 21044: Improve error message for "multiagent" dict checks. (#21448 )	2022-01-11 19:50:03 +01:00
mwtian	0e5de61c18	remove unnecessary test filter (#21510 ) (Comment from the PR:) If a GRPC call exceeds its timeout, the call is cancelled on the client side but the server may still reply to it, leading to missed messages and test failures. Using a sequence number to ensure no message is dropped can be the long-term solution, but its complexity and the fact that Ray subscribers do not use deadlines in production make it less preferred. Therefore, a simpler workaround is used instead: a different subscriber is used for each get_error_message() call. Also, re-enable some additional tests in GCS HA mode.	2022-01-11 10:17:03 -08:00
Gagandeep Singh	d47b82883a	Unskipped non-cluster tests in test_actor_resources.py (#21500 )	2022-01-11 09:46:03 -08:00
Gagandeep Singh	a5a8156198	Unskipped tests in test_actor_failures (#21498 )	2022-01-11 09:42:12 -08:00
Gagandeep Singh	e8df34af08	Unskipped test in `test_autoscaling_policy` (#21497 ) The test passes on my Windows Azure VM. P.S. Is it related to cluster tests? I am not sure.	2022-01-11 09:40:37 -08:00
SangBin Cho	097706b35d	[Internal Observability] Re-enable event stats again. (#21515 ) I tried reproducing the many pg mini integration failure from this PR; https://github.com/ray-project/ray/pull/21216, but I failed to do that. (this was the only test that became flaky when we turned on the flag last time). I tried - Run tests:test_placement_group_mini_integration 5 times instead of 3 (the default) - Re-run the PR 3 times. So I think it is worth trying re-enabling it again.	2022-01-11 09:00:27 -08:00
Jamie Slome	a68bd2fcfd	Create SECURITY.md (#21521 )	2022-01-11 08:54:51 -08:00
Qing Wang	bb647626cf	[Xlang][Java] Fix Java overrided `default` method cannot be invoked. (#21491 ) In Xlang (Python calling Java), a Java method that overrides a `default` method of the super class cannot be invoked successfully, because we treat it as an overloaded method instead of an overriding method. This PR handles the case where a method overrides a `default` method correctly. Before this PR, the following usage could not be invoked from Python -> Java. ```Java public interface ExampleInterface { default String echo(String inp) { return inp; } } public class ExampleImpl implements ExampleInterface { @Override public String echo(String inp) { return inp + " echo"; } } ``` ```python # Invoke it in Python. cls = ray.java_actor_class("io.ray.serve.util.ExampleImpl") handle = cls.remote() print(ray.get(handle.echo.remote("hi"))) ```	2022-01-11 23:11:24 +08:00
Eric Liang	9ac34ecc94	Revert "[workflow] Skip saving outputs of "workflow.wait"" (#21520 ) This is breaking linux://python/ray/workflow:tests/test_wait per https://flakey-tests.ray.io/	2022-01-10 20:51:42 -08:00
Kai Fricke	5a7f6e4fdd	[rfc][ci] create fake docker-compose cluster environment (#20256 ) Following #18987 this PR adds a docker-compose based local multi node cluster. The fake multinode docker comprises two parts. The docker_monitor.py script is a watch script calling docker compose up whenever the docker-compose.yaml changes. The node provider creates and updates the docker compose according to the autoscaling requirements. This mode fully supports autoscaling and comes with test utilities to start and connect to docker-compose autoscaling environments. There's also a sample test case showing how this can be used.	2022-01-11 04:35:36 +00:00
Gagandeep Singh	4a8a8b30b0	Skipped `test_reference_counting_2` and `test_actor` (#21507 )	2022-01-10 20:34:03 -08:00
Yi Cheng	65598b3bb0	[gcs] Re-enable release tests with GCS HA (#21511 ) Re-enable release tests with GCS HA mode.	2022-01-10 16:35:57 -08:00
hckuo	7955333ffd	[runtime env] allow working_dir to be a zipped package (#20826 ) Check if working_dir is a zip, unzip it if so.	2022-01-10 18:29:01 -06:00
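The "check if working_dir is a zip, unzip it if so" step above could look roughly like the following sketch. It uses only the standard library; `resolve_working_dir` and `extract_root` are hypothetical names for illustration and do not reflect how Ray's runtime env code is actually structured.

```python
import zipfile
from pathlib import Path


def resolve_working_dir(working_dir, extract_root):
    """Return a directory path for working_dir.

    If working_dir is a zip archive, extract it under extract_root and
    return the extracted directory; otherwise return the path unchanged.
    Hypothetical helper, not Ray's implementation.
    """
    path = Path(working_dir)
    if path.is_file() and zipfile.is_zipfile(path):
        target = Path(extract_root) / path.stem
        with zipfile.ZipFile(path) as archive:
            archive.extractall(target)
        return target
    return path
```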
Siyuan (Ryans) Zhuang	6e568d2c02	[workflow] Skip saving outputs of "workflow.wait" (#21183 )	2022-01-10 15:37:13 -08:00
Jiajun Yao	aec37d4b60	Add container utils (#21444 ) - Add debug_string helper functions for common containers. - Add map_find_or_die helper function	2022-01-10 15:29:29 -08:00
Amog Kamsetty	bcae6ba6c9	[Train] `_WrappedDataLoader` yield tuples (#21467 ) Fixes bug with _WrappedDataLoader that yields a generator instead of a tuple. Addresses https://discuss.ray.io/t/ray-train-creates-typeerror-generator-object-is-not-subscriptable/4605/10	2022-01-10 12:40:36 -08:00
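The generator-versus-tuple problem named in the commit above can be reproduced without any framework. The sketch below is an assumption-laden stand-in: `WrappedLoader` and `move` are hypothetical, and `move` is a no-op placeholder for whatever per-item transfer the real wrapper performs.

```python
def move(item):
    # Placeholder for a per-item device transfer; here it is a no-op.
    return item


class WrappedLoader:
    """Hypothetical wrapper that yields each batch as a tuple."""

    def __init__(self, loader):
        self._loader = loader

    def __iter__(self):
        for batch in self._loader:
            # tuple(...) materializes the items so callers can index them.
            # Yielding the bare generator expression instead would raise
            # "TypeError: 'generator' object is not subscriptable" on batch[0].
            yield tuple(move(x) for x in batch)


batches = list(WrappedLoader([([1, 2], [0, 1]), ([3, 4], [1, 0])]))
```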
Qing Wang	57ff13461c	[Java] Use localhost instead of public ip (#21462 ) Use the localhost IP address instead of the public IP to avoid security popups on macOS. This also reverts commit `e4542be0d1`.	2022-01-11 02:58:22 +08:00
Zyiqin-Miranda	71fae21e8e	[autoscaler] AWS Autoscaler CloudWatch Dashboard support (#20266 ) These changes add a set of improvements to enable automatic creation and update of CloudWatch dashboards when provisioning AWS Autoscaling clusters. Successful implementation of these improvements will allow AWS Autoscaler users to: 1. Get rapid insights into their cluster state via CloudWatch dashboards. 2. Update their CloudWatch dashboard JSON configuration files during Ray up execution time. Notes: 1. This PR is a follow-up to #18619 and adds dashboard support.	2022-01-10 10:18:53 -08:00
Gagandeep Singh	6420c75fd2	Unskipped test in test_advanced_2.py (#21503 )	2022-01-10 09:06:44 -08:00
Sven Mika	92f030331e	[RLlib] Initial code/comment cleanups in preparation for decentralized multi-agent learner. (#21420 )	2022-01-10 11:22:55 +01:00
Sven Mika	4eaf70942d	[RLlib] Issue 21297: Ignore PPO KL-loss term completely if kl-coeff == 0.0 to avoid NaN values due to some discrete action probs==0.0 (#21456 )	2022-01-10 11:22:40 +01:00
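Why a zero coefficient still needs an explicit skip: in IEEE floating point, `0.0 * inf` is NaN, so multiplying an infinite KL value by a zero coefficient poisons the loss. The sketch below uses plain Python floats; `total_loss` is a hypothetical function, not RLlib's PPO loss.

```python
import math


def total_loss(policy_loss, kl, kl_coeff):
    """Combine a policy loss with an optional KL penalty.

    Hypothetical sketch: when kl_coeff is exactly 0.0 the KL term is
    dropped entirely instead of being multiplied by zero.
    """
    if kl_coeff == 0.0:
        return policy_loss
    return policy_loss + kl_coeff * kl


# A KL value of inf can arise when some discrete action probability is 0.0.
naive = 1.0 + 0.0 * math.inf
guarded = total_loss(1.0, math.inf, 0.0)
```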
Sven Mika	35af30a446	[RLlib] Issue 21109: Action unsquashing causes inf/NaN actions for unbounded action spaces. (#21110 )	2022-01-10 11:20:37 +01:00
Sven Mika	b10d5533be	[RLlib] Issue 20920 (partial solution): contrib/MADDPG + pettingzoo coop-pong-v4 not working. (#21452 )	2022-01-10 11:19:40 +01:00
qicosmos	f8244a4cc0	[C++ Worker]fix uninit worker context (#21371 )	2022-01-10 17:17:41 +08:00
Matti Picus	5aef1e1708	remove deprecated unittest aliases (#21455 ) In a [recent review](https://discuss.python.org/t/experience-with-python-3-11-in-fedora/12911) of the experience of the Fedora team porting packages to the upcoming python 3.11, they remarked that most of the work was in removing deprecated aliases in unittest. I came across a few of these when looking at unrelated test failures; the DeprecationWarnings caught my eye. So I made a quick sweep of the code, using `git grep` to find occurrences of the deprecated aliases: old \| new ---\|--- assertEquals \| assertEqual assertNotEquals \| assertNotEqual assertRaisesRegexp \| assertRaisesRegex	2022-01-09 20:29:54 -08:00
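For reference, the three replacement names from the table above are used as follows. The test case itself is an illustrative sketch and does not come from the Ray test suite.

```python
import unittest


class AliasExamples(unittest.TestCase):
    def test_equal(self):
        # Replaces the deprecated assertEquals.
        self.assertEqual(1 + 1, 2)

    def test_not_equal(self):
        # Replaces the deprecated assertNotEquals.
        self.assertNotEqual(1 + 1, 3)

    def test_raises_regex(self):
        # Replaces the deprecated assertRaisesRegexp.
        with self.assertRaisesRegex(ValueError, "invalid literal"):
            int("not a number")


def run_examples():
    """Run the example test case and return the unittest.TestResult."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(AliasExamples)
    result = unittest.TestResult()
    suite.run(result)
    return result
```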
Gagandeep Singh	c43d4cc028	Unskipped test in `test_kv_store.py` (#21451 ) Co-authored-by: Philipp Moritz <pcmoritz@gmail.com>	2022-01-09 14:38:55 -08:00
Sven Mika	34cee199b1	[RLlib] from remote_vector_env import ... -> from remote_base_env import ... (avoid deprecation warning). (#21460 )	2022-01-08 17:13:04 +01:00
Yi Cheng	4ab059eaa1	[gcs] Fix the server standalone tests in HA mode (#21480 ) CoreWorker hangs before exiting if gcs exits first, due to incorrect ordering of destruction. This PR fixes this. It stops the gcs client first and then joins the thread.	2022-01-07 22:54:50 -08:00
Yi Cheng	bdfba88082	[2/3][kv] Add delete by prefix support for internal kv (#21442 ) Delete by prefix for internal kv is necessary to cleanup the function table. This will be used to fix the issue #8822	2022-01-07 22:54:24 -08:00
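Delete-by-prefix semantics can be sketched with an in-memory store. `PrefixKV` and the `del_by_prefix` flag below are hypothetical names chosen for illustration; they are not Ray's internal kv API, and the key layout is invented.

```python
class PrefixKV:
    """Toy in-memory key-value store with optional delete-by-prefix."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def delete(self, key, del_by_prefix=False):
        """Delete one key, or every key starting with `key`.

        Returns the number of entries removed.
        """
        if del_by_prefix:
            doomed = [k for k in self._data if k.startswith(key)]
        else:
            doomed = [key] if key in self._data else []
        for k in doomed:
            del self._data[k]
        return len(doomed)


kv = PrefixKV()
kv.put(b"fn:job1:a", b"x")
kv.put(b"fn:job1:b", b"y")
kv.put(b"fn:job2:a", b"z")
removed = kv.delete(b"fn:job1:", del_by_prefix=True)
```

Grouping a job's entries under a shared prefix is what makes a single call sufficient to clean them all up when the job finishes.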
mwtian	4a34233a90	[Core] allow message in deprecation annotation (#21466 )	2022-01-07 21:52:31 -08:00
Clark Zinzow	d6c02f46b9	Fix raylet command line arg descriptions. (#21478 )	2022-01-07 21:46:36 -08:00
Simon Mo	f5ac915ed5	[Serve] Detect http.disconnect can cancel handle requests (#21438 )	2022-01-07 21:01:34 -08:00
Yi Cheng	8fa9fddaa0	[1/3][kv] move some internal kv py logic into cpp (#21386 ) This PR moves the internal kv namespace logic into cpp to reduce logic in python for the following reasons: - internal kv is used in x-lang so we have to move it to cpp so that all langs can benefit. - for https://github.com/ray-project/ray/issues/8822 we need to delete resources in gcs when a job finishes. One extra field for del is also added so that, when deleting, we are able to delete by prefix instead of just a single key.	2022-01-07 17:35:06 -08:00
Jiajun Yao	501b78feaa	Remove dead tests related to the old scheduler (#21465 )	2022-01-07 12:55:54 -08:00
Amog Kamsetty	123aa7cd2b	[Train] Improve usability for GPU Training (#21464 ) Minor changes to improve the user experience for GPU Training. Addresses https://discuss.ray.io/t/ray-train-doesnt-detect-gpu/4608	2022-01-07 11:53:53 -08:00
Gagandeep Singh	cc1000886a	[serve] Unskip tests in `test_fastapi.py` (#21422 ) These tests pass on my machine. Unskipping them here for CI verification.	2022-01-07 11:27:15 -08:00
mwtian	bbf23ec59f	[GCS] enhance error message when failing to fetch GCS address or connecting to GCS (#21396 ) Some tests are flaky because the GCS client fails to be created, but there is not enough information for debugging. The exception message will now be printed after a GCS client creation failure. Also, this PR breaks down GCS client creation into two steps: reading the GCS address from Redis, and creating the GCS client, which should help locate the issue.	2022-01-07 09:56:23 -08:00
Sven Mika	3a3d0a4a2b	[RLlib] Issue 21340: SampleBatch __init__ docstring wrong. (#21447 )	2022-01-07 15:48:14 +01:00
Jun Gong	83955a9407	[RLlib] Extend CQL perf test to 1hr. (#21449 )	2022-01-07 11:35:16 +01:00
Gagandeep Singh	51e4880477	[serve] Unskipped tests in `test_constructor_failure.py` & `test_ray_client.py` (#21423 ) These tests pass on my machine. Unskipping them here for CI.	2022-01-07 01:53:13 -08:00
Gagandeep Singh	39697cf69c	Unskipped `test_snapshot_always_written_to_internal_kv` (#21350 )	2022-01-07 00:57:23 -08:00
Matti Picus	f3dcd1fac1	WINDOWS: re-enable runtime_env tests, skip cluster tests in serve (#21398 ) After enabling tests of test_runtime_env_plugin and test_runtime_env_env_vars (PR #21252) and python/ray/serve:* tests (PR #21107), the analysis at flaky-tests.ray.io started showing failing tests in the windows://python/ray/test/serv:test_standalone. PR #21352 reverted 21252 (runtime_env tests), but the problem was more likely in the serve tests. Specifically, `test_standalone` has a test that uses Cluster, which should be skipped on windows because it is flaky. So this PR - re-enables the runtime_env tests for windows - skips the Cluster test in serve/tests/test_standalone.py	2022-01-06 21:43:58 -08:00
Eric Liang	e9068c45fa	[data] Instrument most remaining dataset functions and add docs (#21412 ) This PR finishes most of the stats todos for dataset. The main thing punted for future work is instrumentation of split(), which is particularly tricky since only certain blocks are transformed. Co-authored-by: Clark Zinzow <clarkzinzow@gmail.com>	2022-01-06 17:08:56 -08:00
Alex Wu	8cf4071759	[core] Nested tasks on by default (#20800 ) This PR turns worker capping on by default. Note that there are a couple of faulty tests that this uncovers which are fixed here. Co-authored-by: Alex Wu <alex@anyscale.com>	2022-01-06 15:00:03 -08:00
Avnish Narayan	39f8072eac	[RLlib] [MultiAgentEnv Refactor #2 ] Change space types for `BaseEnvs` and `MultiAgentEnvs` (#21063 )	2022-01-06 14:34:20 -08:00
Amog Kamsetty	8b4cb45088	[Docs] Update Ray Lightning API (#21428 ) Update ray lightning api docs to reflect new changes in ray lightning master. Making this quick change to fix CI and unblock the release, but will follow up on a proper fix for this. Closes #21426	2022-01-06 12:14:33 -08:00
Avnish Narayan	f7a5fc36eb	[rllib] Give rnnsac_stateless cartpole gpu, increase timeout (#21407 ) Increase test_preprocessors runtimes.	2022-01-06 11:54:19 -08:00

11057 commits