In a [recent review](https://discuss.python.org/t/experience-with-python-3-11-in-fedora/12911) of the Fedora team's experience porting packages to the upcoming Python 3.11, they remarked that most of the work was in removing deprecated aliases in unittest. I came across a few of these when looking at unrelated test failures; the DeprecationWarnings caught my eye. So I made a quick sweep of the code, using `git grep` to find occurrences of the deprecated aliases:
old | new
---|---
assertEquals | assertEqual
assertNotEquals | assertNotEqual
assertRaisesRegexp | assertRaisesRegex
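For example, a minimal test case showing the deprecated spellings next to their replacements (the old aliases were removed from `unittest` in Python 3.11):

```python
import unittest


class TestAliases(unittest.TestCase):
    def test_values(self):
        # Deprecated aliases, removed in Python 3.11:
        #   self.assertEquals(1 + 1, 2)
        #   self.assertNotEquals(1 + 1, 3)
        #   with self.assertRaisesRegexp(ValueError, "invalid literal"):
        #       int("not a number")
        # Current spellings:
        self.assertEqual(1 + 1, 2)
        self.assertNotEqual(1 + 1, 3)
        with self.assertRaisesRegex(ValueError, "invalid literal"):
            int("not a number")


if __name__ == "__main__":
    unittest.main()
```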
CoreWorker hangs before exiting if GCS exits first, due to incorrect ordering of destruction. This PR fixes this: it stops the GCS client first and then joins the thread.
This PR moves the internal KV namespace logic into C++ to reduce the logic in Python, for the following reasons:
- Internal KV is used cross-language, so it has to live in C++ so that all languages can benefit.
- For https://github.com/ray-project/ray/issues/8822 we need to delete resources in GCS when a job finishes.
An extra field is also added to the delete operation so that we can delete by prefix instead of just by a single key.
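To make the prefix-delete semantics concrete, here is a small, purely illustrative Python sketch of a namespaced KV store whose delete takes a prefix flag; the real store lives in the GCS and is implemented in C++, and the names here are hypothetical:

```python
class IllustrativeInternalKV:
    """In-memory stand-in for the GCS internal KV, keyed by (namespace, key)."""

    def __init__(self):
        self._data = {}

    def put(self, ns: str, key: str, value: bytes) -> None:
        self._data[(ns, key)] = value

    def get(self, ns: str, key: str):
        return self._data.get((ns, key))

    def delete(self, ns: str, key: str, del_by_prefix: bool = False) -> int:
        """Delete one key, or every key starting with `key` when del_by_prefix is set."""
        if not del_by_prefix:
            return 1 if self._data.pop((ns, key), None) is not None else 0
        victims = [k for k in self._data if k[0] == ns and k[1].startswith(key)]
        for k in victims:
            del self._data[k]
        return len(victims)
```

This is roughly the shape a job-scoped cleanup needs: delete every key under a job's prefix once the job has finished.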
There is test flakiness where GCS client creation fails, but there is not enough information for debugging. With this PR, the exception message is printed after a GCS client creation failure. The PR also breaks GCS client creation down into two steps: reading the GCS address from Redis, and creating the GCS client, which should help locate the issue.
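As a rough sketch of the two-step structure (the callables and names below are hypothetical stand-ins, not Ray's actual internals), each step now reports its own failure:

```python
import logging

logger = logging.getLogger(__name__)


def create_gcs_client(read_gcs_address, connect, timeout_s: float = 30.0):
    """Resolve the GCS address, then connect, logging the exception for each step.

    `read_gcs_address` stands in for the Redis lookup and `connect` for the
    actual GCS client constructor; both are placeholders for illustration.
    """
    try:
        gcs_address = read_gcs_address()
    except Exception:
        logger.exception("Failed to read the GCS address from Redis")
        raise
    try:
        return connect(gcs_address, timeout_s=timeout_s)
    except Exception:
        logger.exception("Failed to create a GCS client for %s", gcs_address)
        raise
```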
After enabling the test_runtime_env_plugin and test_runtime_env_env_vars tests (PR #21252) and the python/ray/serve:* tests (PR #21107), the analysis at flaky-tests.ray.io started showing failing tests in windows://python/ray/test/serv:test_standalone. PR #21352 reverted #21252 (the runtime_env tests), but the problem was more likely in the serve tests. Specifically, `test_standalone` has a test that uses Cluster, which should be skipped on Windows because it is flaky. So this PR:
- re-enables the runtime_env tests on Windows
- skips the Cluster test in serve/tests/test_standalone.py (a minimal skip pattern is sketched below)
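The skip itself can follow the usual pytest pattern; a minimal sketch, not the exact code in test_standalone.py:

```python
import sys

import pytest


@pytest.mark.skipif(
    sys.platform == "win32",
    reason="Cluster-based test is flaky on Windows.",
)
def test_standalone_with_cluster():
    # The real test spins up a multi-node Cluster fixture; elided here.
    ...
```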
This PR finishes most of the stats TODOs for Datasets. The main thing punted to future work is instrumentation of split(), which is particularly tricky since only certain blocks are transformed.
Co-authored-by: Clark Zinzow <clarkzinzow@gmail.com>
This PR turns worker capping on by default. Note that this uncovers a couple of faulty tests, which are fixed here.
Co-authored-by: Alex Wu <alex@anyscale.com>
Update the Ray Lightning API docs to reflect new changes in Ray Lightning master.
Making this quick change to fix CI and unblock the release, but will follow up on a proper fix for this.
Closes #21426
Currently when the "conda" field of runtime_env is specified, we automatically insert the currently running Ray wheel in the conda dependencies (in the nested `pip` list). This Ray wheel is specified by a URL to Amazon S3, which is where we store our Ray wheels.
Unfortunately, the M1 wheels are currently built manually and uploaded directly to PyPI, and this only happens once for each stable release (in contrast to non-M1 wheels, which are auto-built and uploaded to S3 for every commit on master and release branches). So prior to this PR, if you tried to use the `"conda"` field on M1, it would fail with a message saying it couldn't find the appropriate wheel for the platform.
To fix this, in the case of a Ray cluster running on an M1 Mac, the only thing we can do for now is insert `"ray=={ray.__version__}"` as our `pip` specifier, instead of the (nonexistent) S3 URL.
The downside of this approach is (1) nightly wheels and wheels built from commits on master remain unsupported for M1, and (2) we cannot end-to-end test this codepath on a new stable version of Ray before that version is actually released to PyPI. However, this PR adds a unit test.
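For context, a hedged sketch of the user-facing setup: the `conda` field takes an environment dict with a nested `pip` list, and Ray conceptually appends its own wheel to that list. On most platforms the appended entry is an S3 wheel URL for the running commit; on M1 macOS (after this PR) it is a plain PyPI version pin instead. The environment dict below is illustrative:

```python
import ray

# A user-provided "conda" runtime_env (illustrative dependency pins).
runtime_env = {
    "conda": {
        "dependencies": ["pip", {"pip": ["requests==2.26.0"]}],
    }
}

# The specifier Ray now injects into the nested pip list on M1 macOS,
# instead of a (nonexistent) S3 wheel URL:
m1_ray_specifier = f"ray=={ray.__version__}"
print(m1_ray_specifier)  # e.g. "ray==1.9.2"
```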
- This PR moves the `ObjectManager::Wait`-related logic into a separate `WaitManager` class.
- It fixes the wait hang issue by not relying on the async object location notification; instead, it checks whether the wait is complete when the local object is added, at which point the object is guaranteed to be local (see the sketch below).
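A hypothetical Python sketch of the idea (the real implementation is C++ inside the object manager): pending waits are re-checked whenever an object is added locally, rather than on asynchronous location notifications:

```python
class WaitManagerSketch:
    """Illustrative only: resolves pending waits as objects become local."""

    def __init__(self):
        self._local_objects = set()
        self._pending = []  # (all_ids, remaining_ids, num_required, callback)

    def wait(self, object_ids, num_required, callback):
        all_ids = set(object_ids)
        remaining = {o for o in all_ids if o not in self._local_objects}
        if len(all_ids) - len(remaining) >= num_required:
            callback(all_ids - remaining)
            return
        self._pending.append((all_ids, remaining, num_required, callback))

    def handle_object_local(self, object_id):
        # Called when the object is added to the local store; it is
        # guaranteed to be local here, so completing waits now is safe.
        self._local_objects.add(object_id)
        still_pending = []
        for all_ids, remaining, num_required, callback in self._pending:
            remaining.discard(object_id)
            ready = all_ids - remaining
            if len(ready) >= num_required:
                callback(ready)
            else:
                still_pending.append((all_ids, remaining, num_required, callback))
        self._pending = still_pending
```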
This adds a test for potential resource deadlocks in experiments with heterogeneous placement group factories (PGFs). If the PGF of a later trial becomes ready before that of a previous trial, we could run into a deadlock. This is currently avoided, but untested; the code path is flagged for removal in #21387.
We need more than just the ray_namespace config of a job. In this PR, we cache the job_configs instead of the ray_namespaces, so that they can be used by other PRs (for example, PR #21249 needs the num_java_worker_per_process item).
Also, before this PR, the ray_namespaces_ cache was never cleared; this PR clears the cache.
RAY_GCS_ACTOR_SCHEDULING_ENABLED is wrong; it should be RAY_gcs_actor_scheduling_enabled. Since GCS-based actor scheduling is not enabled yet, I just removed this flag.
After https://github.com/ray-project/ray/pull/21232 we are able to start Ray without Redis. We need to bake the tests for a while before turning the flag on by default.
This PR adds tests for this.
`PublisherClient` is a more reasonable name than `SubscriberClient` since XClient means ‘client used to access X’, like GcsClient.
Besides, the current codebase already calls this client `publisher_client` (lines 329/333), while the actual class name is `SubscriberClient`, which is inconsistent.
a8d7897a56/src/ray/pubsub/subscriber.cc (L326-L339)
This PR is added to handle this comment: https://github.com/ray-project/ray/pull/20903#discussion_r772635662
The PR:
- Unifies the multiple actor-died errors into a single schema (the runtime env and creation task exceptions cannot be unified).
- Improves each actor error message to include more metadata.
- Includes actor information in the actor death cause.
After this change, in GCS bootstrapping mode, Redis no longer starts and `address` is treated as the GCS address of the Ray cluster.
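As a usage sketch (the host and port below are placeholders), a driver connecting to an existing cluster now passes the GCS address rather than a Redis address:

```python
import ray

# In GCS bootstrapping mode, `address` is interpreted as the GCS host:port
# of the head node, not a Redis endpoint. Placeholder address shown here.
ray.init(address="10.0.0.1:6379")
```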
Co-authored-by: Yi Cheng <chengyidna@gmail.com>
Co-authored-by: Yi Cheng <74173148+iycheng@users.noreply.github.com>