hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-06 02:21:39 -05:00

Author	SHA1	Message	Date
chenk008	74fa267c72	Enable worker in container CI test (#20174 )	2021-11-11 16:11:06 -08:00
mwtian	0330852baf	[Core][Pubsub] Implement Python GCS publisher and subscriber (#20111 ) ## Why are these changes needed? This change adds a Python publisher and subscriber in `gcs_utils.py`, and a GRPC handler on GCS for publishing via GCS. Error info is migrated to use the GCS-based pubsub if the feature flag `RAY_gcs_grpc_based_pubsub=true` is set. Also, add a `--gcs-address` flag to some Python processes. It is not set anywhere yet, but will be set after the Redis-less bootstrapping work. Unit tests are added for the Python publisher and subscriber. Migrated error info publishers and subscribers are tested with existing unit tests, e.g. tests calling `ray._private.test_utils.get_error_message()` to ensure error info is published. GCS-based pubsub has gaps in handling deadlines, cancelled requests and GCS restarts, so 3 more unit tests are disabled in the `HA GCS` mode. They will be addressed in a separate change.	2021-11-11 14:59:57 -08:00
xwjiang2010	883fbd003c	[CI; Tune] Split Tune tests and examples (#20210 ) * Split Tune tests and examples part 1 into tests and examples separate. * fix typo. * fix typo. * Add docs.	2021-11-11 10:50:51 +01:00
Sven Mika	ebd56b57db	[RLlib; documentation] "RLlib in 60sec" overhaul. (#20215 )	2021-11-10 22:20:06 +01:00
matthewdeng	78e9ff7c91	[train][datasets] add example for big data training (#20042 ) * [train][datasets] add example for big data training * add title docstring * lint and dependencies * add dask_ml requirement	2021-11-05 09:28:48 -07:00
Sven Mika	50c30f89c6	[Tune; RLlib] Move Tune tests that use RLlib into separate buildkite job. (#20016 )	2021-11-04 20:40:57 +01:00
Yi Cheng	65d3054a09	[build] fix the wrong flag for gcs ha test (#20052 ) ## Why are these changes needed? It should be `RAY_gcs_grpc_based_pubsub` instead of `Ray_gcs_grpc_based_pubsub`	2021-11-04 09:59:11 -07:00
Sven Mika	4cb23d1c95	[Tune; Testing] Revert to 3.7 (undone by accident by previous PR); + some minor comment cleanups. (#20031 )	2021-11-04 10:58:34 +01:00
mwtian	a26474156d	Use GCC 9 in GPU docker (#20024 )	2021-11-03 22:53:17 -07:00
mwtian	f83195a1e1	[Build] Add GCS HA builds (#20008 ) ## Why are these changes needed? Add builds for Python tests with GCS pubsub enabled.	2021-11-03 11:58:16 -07:00
Jiajun Yao	5de4a38948	[CI] Run Java CI on Mac (#19757 ) Why are these changes needed? Enable Java tests on Mac CI to avoid more breakages. Related issue number Closes #19700	2021-11-03 23:40:05 +08:00
Avnish Narayan	026bf01071	[RLlib] Upgrade gym version to 0.21 and deprecate pendulum-v0. (#19535 ) * Fix QMix, SAC, and MADDPG too. * Unpin gym and deprecate pendulum v0 Many tests in rllib depended on pendulum v0, however in gym 0.21, pendulum v0 was deprecated in favor of pendulum v1. This may change reward thresholds, so will have to potentially rerun all of the pendulum v1 benchmarks, or use another environment instead. The same applies to frozen lake v0 and frozen lake v1. Lastly, all of the RLlib tests have been moved to python 3.7 * Add gym installation based on python version. Pin python<= 3.6 to gym 0.19 due to install issues with atari roms in gym 0.20 * Reformatting * Fixing tests * Move atari-py install conditional to req.txt * migrate to new ale install method * Make parametric_actions_cartpole return float32 actions/obs * Adding type conversions if obs/actions don't match space * Add utils to make elements match gym space dtypes Co-authored-by: Jun Gong <jungong@anyscale.com> Co-authored-by: sven1977 <svenmika1977@gmail.com>	2021-11-03 16:24:00 +01:00
Sven Mika	e6ae08f416	[RLlib] Optionally don't drop last ts in v-trace calculations (APPO and IMPALA). (#19601 )	2021-11-03 10:01:34 +01:00
Sven Mika	2d24ef0d32	[RLlib] Add all simple learning tests as `framework=tf2`. (#19273 ) * Unpin gym and deprecate pendulum v0 Many tests in rllib depended on pendulum v0, however in gym 0.21, pendulum v0 was deprecated in favor of pendulum v1. This may change reward thresholds, so will have to potentially rerun all of the pendulum v1 benchmarks, or use another environment instead. The same applies to frozen lake v0 and frozen lake v1. Lastly, all of the RLlib tests and Tune tests have been moved to python 3.7 * fix tune test_sampler::testSampleBoundsAx * fix re-install ray for py3.7 tests Co-authored-by: avnishn <avnishn@uw.edu>	2021-11-02 12:10:17 +01:00
xwjiang2010	c48d86e469	[CI] change git protocol to use https. (#19964 )	2021-11-01 19:38:58 -07:00
mwtian	7afdfdc6dd	[CI] narrow down tests that run when files change (#19656 )	2021-10-29 16:47:54 -07:00
matthewdeng	bfb0ef1b08	move jsonschema to core dependencies and update default AutoscalerPrometheusMetrics (#19831 )	2021-10-28 13:04:22 -07:00
Simon Mo	5e927b01ad	Revert "[CI] Remove config that disables Bazel test result cache" (#19818 ) * Revert "[CI] Remove config that disables Bazel test result cache (#18701)" This reverts commit `098ff36faa`. * Remove all RLlib tests from BUILD that currently fail. Co-authored-by: sven1977 <svenmika1977@gmail.com>	2021-10-28 15:54:53 +02:00
Amog Kamsetty	db863aafc0	Revert "Revert "[Docker] Support multiple CUDA Versions (#19505 )" (#19756 )" (#19763 ) This reverts commit `e58fcca404`.	2021-10-26 17:32:56 -07:00
Amog Kamsetty	e58fcca404	Revert "[Docker] Support multiple CUDA Versions (#19505 )" (#19756 ) This reverts commit `f0053d405b`.	2021-10-26 12:55:20 -07:00
Avnish Narayan	ad87ddf93e	[rllib] Add deterministic test to gpu (#19306 ) Co-authored-by: sven1977 <svenmika1977@gmail.com>	2021-10-26 10:11:39 -07:00
Amog Kamsetty	f0053d405b	[Docker] Support multiple CUDA Versions (#19505 ) * wip * wip * update * finish * deprecate * debug * fix and address comments * try catch * fix * split tests * force * merge * docs * wip * fix and check * update readme * fix * fix * fix sanity checking * format	2021-10-25 18:57:05 -07:00
Jiajun Yao	256bf0bf3a	[Release] Bump up dask to latest compatible version 2021.9.1 (#19592 ) * Bump up dask to latest compatible version 2021.9.1 * Bump up dask to latest compatible version 2021.9.1	2021-10-22 09:16:28 -07:00
Simon Mo	03805d4064	[Serve] Good error message when Serve not installed and ensure Serve installs ray[default] (#19570 )	2021-10-21 13:47:29 -07:00
mwtian	098ff36faa	[CI] Remove config that disables Bazel test result cache (#18701 )	2021-10-19 13:31:42 -07:00
architkulkarni	b8941338d3	[runtime env] Raise error when creating runtime env when ray[default] is not installed (#19491 )	2021-10-19 09:16:04 -05:00
matthewdeng	4674c78050	[Train] Rename Ray SGD v2 to Ray Train (#19436 )	2021-10-18 22:27:46 -07:00
Antoni Baum	e9df253f5d	[CI/docs] Remove [default] from xgboost-ray (#19186 ) Co-authored-by: Kai Fricke <kai@anyscale.com>	2021-10-14 16:29:55 +01:00
Kai Fricke	d8d8901192	[ci/tune] Remove deprecated `jenkins_only` tag from test tags (#19287 )	2021-10-12 10:05:46 +01:00
Matti Picus	9ca34c7192	add dependencies to BUILD.bazel and update windows bazel to 4.2.1 (#19132 ) * add dependencies to BUILD.bazel and update windows bazel to 4.2.1 * fixes from review	2021-10-11 10:25:19 -07:00
SangBin Cho	0ef0d9a77d	Revert "[core] Assign tasks to the first available worker (#18167 )" (#19180 ) This reverts commit `545db13800`.	2021-10-07 10:38:37 -07:00
Stephanie Wang	545db13800	[core] Assign tasks to the first available worker (#18167 ) * Convert worker pool to queue * Start up to backlog size more workers * fixes * Prestart workers according to num available CPUs * lint * x * Update src/ray/raylet/worker_pool.h Co-authored-by: Eric Liang <ekhliang@gmail.com> * Update src/ray/raylet/worker_pool.h Co-authored-by: Eric Liang <ekhliang@gmail.com> * dedicated workers * Fix tests * x * fix * asan * asan * Workers can only exec tasks with same job ID * size_t for runtime env hash, fix unit tests * include job ID in runtime env hash, remove from worker registration msg * x * conflict * debug * Schedule and dispatch periodically, skip if no new tasks * Update src/ray/common/task/task_spec.h Co-authored-by: Eric Liang <ekhliang@gmail.com> * Update src/ray/raylet/scheduling/cluster_task_manager.h Co-authored-by: Eric Liang <ekhliang@gmail.com> * Update src/ray/raylet/worker_pool.h Co-authored-by: Eric Liang <ekhliang@gmail.com> Co-authored-by: Eric Liang <ekhliang@gmail.com>	2021-10-05 13:45:50 -07:00
Kai Fricke	3dc176c42e	[ci/tune] Add SGD and Tune GPU pipeline step to CI (#18469 ) * [ci/tune] Add Tune GPU pipeline step to CI * cont. * add sgd gpu tests * format yaml, fix imports * install horovod; fix line wrapping * set GPU per worker to 0.5 * fix import * move test to 4gpu machine * fix lint * lint * set visible devices * pull in tf gpu fix * Fix Tune GPU pipeline step * nit * Disable GPU tests until we have some * Re-add empty rllib tests Co-authored-by: Matthew Deng <matthew.j.deng@gmail.com>	2021-10-01 18:34:05 -07:00
architkulkarni	0f0b161ea1	Revert "Revert "[Serve] [doc] Improve runtime env doc"" (#18943 ) * Revert "Revert "[Serve] [doc] Improve runtime env doc (#18782)" (#18935)" This reverts commit `e4f4c79252`.	2021-09-30 13:28:44 -05:00
Yi Cheng	e4f4c79252	Revert "[Serve] [doc] Improve runtime env doc (#18782 )" (#18935 ) This reverts commit `d4d71985d5`.	2021-09-27 21:52:13 -07:00
architkulkarni	d4d71985d5	[Serve] [doc] Improve runtime env doc (#18782 )	2021-09-27 16:12:03 -05:00
mwtian	43ac18bbc0	[Build] include minimal debug info in C++ build; upgrade clang-format to 12 (#18888 ) * Revert "Revert "[Build] include minimal debug info in C++ build; upgrade clang-format to 12 (#18840)" (#18886)" This reverts commit `f851a072f3`. * use gcc 8	2021-09-24 17:59:05 -07:00
Chen Shen	f851a072f3	Revert "[Build] include minimal debug info in C++ build; upgrade clang-format to 12 (#18840 )" (#18886 ) This reverts commit `07e1366383`.	2021-09-24 12:55:08 -07:00
mwtian	07e1366383	[Build] include minimal debug info in C++ build; upgrade clang-format to 12 (#18840 ) * debug info and clang-format * doc * fix * no clang-format on all files * gcc * keep gcc 7	2021-09-24 12:26:33 -07:00
Chen Shen	35aa944ef4	Fix thread-safety in global state accessor (#18746 )	2021-09-19 12:01:31 -07:00
mwtian	efdbfcfdfb	[Build] Generate Bazel config for compiling with clang and libc++ in CI (#18622 ) * Add Bazel config for building with llvm. Upgrade C++ std to 17. * Fix redis. Try fixing asan and tsan * Fix asan and format * Update comments. Co-authored-by: Chen Shen <scv119@gmail.com>	2021-09-17 19:01:07 -07:00
Sven Mika	8a72824c63	[RLlib Testing] Split and unflake more CI tests (make sure all jobs are < 30min). (#18591 )	2021-09-15 22:16:48 +02:00
Antoni Baum	7e95f330d5	[ci] Fix xgboost_ray install from git (#18640 )	2021-09-15 18:07:15 +01:00
Edward Oakes	7736cdd91d	[dashboard] Rename "new_dashboard" -> "dashboard" (#18214 )	2021-09-15 11:17:15 -05:00
Antoni Baum	eeb67a42cc	pip install xgboost_ray -> xgboost_ray[default] (#18607 ) Co-authored-by: Kai Fricke <kai@anyscale.com>	2021-09-15 14:45:56 +01:00
Simon Mo	497c5f56fa	[CI] Temporarily disable worker-in-container test (#18606 ) * revert again * disable tmp	2021-09-14 22:38:20 -07:00
SangBin Cho	0684531e22	[Test] Break down placement group tests (#18612 )	2021-09-14 21:55:18 -07:00
mwtian	a3f399ef10	[Client] fix propagating errors to async calls during disconnect, and other cleanup (#18539 ) * cleanup tests and errors for clients * Fix lock and async get * rerun * Avoid running callback under lock. Make lock non-reentrant * Add all necessary apis * Removed unused APIs	2021-09-14 18:48:27 +03:00
Yi Cheng	7d1f408de9	[workflow] Move `experimental/workflow` to `workflow` (#18521 )	2021-09-13 17:45:18 -07:00
Chen Shen	5f57079041	use clang for C++ debug testing (#18343 )	2021-09-09 15:48:36 -07:00

236 commits