hiro/ray - Forgejo: Beyond coding. We Forge.

mirror of https://github.com/vale981/ray synced 2025-03-05 18:11:42 -05:00

Author	SHA1	Message	Date
Antoni Baum	e9df253f5d	[CI/docs] Remove [default] from xgboost-ray (#19186) Co-authored-by: Kai Fricke <kai@anyscale.com>	2021-10-14 16:29:55 +01:00
Kai Fricke	9cee83c919	[tune] PBT: Add burn-in period (#19321)	2021-10-14 16:28:29 +01:00
Edward Oakes	888fb24c25	Remove deprecated ray.services package (#18475) Co-authored-by: Kai Fricke <kai@anyscale.com>	2021-10-14 16:28:16 +01:00
Kai Fricke	312dc369a7	Revert "[Hotfix] Revert "[tune/wip] Exclude trial checkpoints in experiment sync"" (#19285) This reverts commit `a92f1fedf4` and fixes the failing test.	2021-10-14 11:18:48 +01:00
Qing Wang	2cc164e616	[Java] Fix incomplete core worker dynamic library. (#19342) * Fix incomplete core worker dynamic library. * Fix lint.	2021-10-14 14:42:05 +08:00
mwtian	12100015d9	[Lint] Disable `modernize-use-override` (#19368) This lint rule cannot be applied only to changed lines because Ray currently builds with the `-Winconsistent-missing-override` flag: either all or none of the member functions of a derived class can have the `override` / `final` annotation.	2021-10-13 20:20:08 -07:00
Carlo Grisetti	5cee8a1985	[release tests] Switch from yaml.load to yaml.safe_load (#19365)	2021-10-13 17:27:25 -07:00
Edward Oakes	2ac81f336a	[serve] Remove BackendConfig broadcasting (#19154)	2021-10-13 16:25:34 -07:00
Chen Shen	b8c201b7cb	[Core][CoreWorker] Make WorkerContext thread safe, fix race condition. (#19343) Why are these changes needed? The theory behind #19270 is that two create-actor requests are sent to the same threaded actor due to retry logic. Specifically: the first request arrives and calls CoreWorkerDirectTaskReceiver::HandleTask, which queues it for execution by the thread pool; the second request then arrives and calls CoreWorkerDirectTaskReceiver::HandleTask again before the first request has been executed and has called worker_context_.SetCurrentTask. This defeats the current dedupe logic, so SetMaxActorConcurrency is called twice and the RAY_CHECK fails. This PR fixes the dedupe logic by adding SetCurrentActorId and calling it in the task execution thread, which ensures the dedupe logic works for threaded actors. We also noticed that WorkerContext is not actually thread safe in threaded actors, so this PR makes it thread safe as well. Related issue number: Closes #19270	2021-10-13 16:12:36 -07:00
Linsong Chu	b86a5fcb96	[workflow] fix workflow user metadata return when None is given (#19356) ## Why are these changes needed? Quick fix for metadata put. Currently, when workflow-level metadata is not given, `null` is written to `user_run_metadata.json`; this fix makes it write `{}` instead. ## Related issue number Original issue: https://github.com/ray-project/ray/issues/17090 Original PR: https://github.com/ray-project/ray/pull/19195	2021-10-13 15:52:12 -07:00
Yi Cheng	1dc03cd49d	[nightly] Put many nodes actor test back (#19313) ## Why are these changes needed? There are two issues fixed in this PR: - make sure wait for session count alive node - upgrade the machine to match what's tested in oss ray. ## Related issue number https://github.com/ray-project/ray/issues/19084	2021-10-13 15:51:12 -07:00
matthewdeng	d998373968	[release] fix test by pinning filelock (#19334) Co-authored-by: Kai Fricke <kai@anyscale.com>	2021-10-13 22:27:04 +01:00
architkulkarni	b0716f66ae	[runtime env] Fix handling of runtime env with None fields (#19300)	2021-10-13 13:57:55 -07:00
Jiao	893f76daf9	[serve] Add serve FT nightly test to buildkite (#19361)	2021-10-13 13:56:55 -07:00
Antoni Baum	3cb0862152	Fix double gym in requirements (#19357)	2021-10-13 21:43:41 +01:00
Omkar Pangarkar	f1b9b16ae9	[tune] Fix `DistributedTrainable` restore (#19349)	2021-10-13 21:29:05 +01:00
Carlo Grisetti	da7a485786	[Windows] use dynamic temp path (#19096)	2021-10-13 13:02:45 -04:00
hazeone	c2f0035fd2	[Java] Support getGpuIds API (#19031) Add a Java getGpuIds() API, equivalent to get_gpu_ids in Python. It lets a worker get the device IDs of the GPUs allocated to it.	2021-10-13 23:40:26 +08:00
Kai Fricke	bde9e058da	Revert "[RLlib](deps): Bump tensorflow from 2.5.0 to 2.6.0 in /python/requirements/rllib (#18183)" (#19351) This reverts commit `74ee99ff99`.	2021-10-13 13:06:36 +01:00
Linsong Chu	ce64e6dc45	[workflow] add metadata put in workflow (#19195) ## Why are these changes needed? Add metadata to workflow. Currently there is no option for users to attach metadata to a step or workflow run, and workflow running metrics (except status) are neither captured nor checkpointed. We are adding several kinds of metadata: 1. step-level user metadata, which can be set with `step.options(metadata={})` 2. step-level pre-run metadata, which captures pre-run metadata such as step_start_time; more metrics can be added later. 3. step-level post-run metadata, which captures post-run metadata such as step_end_time; more metrics can be added later. 4. workflow-level user metadata, which can be set with `workflow.run(metadata={})` 5. workflow-level pre-run metadata, which captures pre-run metadata such as workflow_start_time; more metrics can be added later. 6. workflow-level post-run metadata, which captures post-run metadata such as workflow_end_time; more metrics can be added later. ## Related issue number https://github.com/ray-project/ray/issues/17090 Co-authored-by: Yi Cheng <chengyidna@gmail.com>	2021-10-12 21:01:24 -07:00
Clark Zinzow	1b179adfa1	[Core] [Hotfix] Handle logging redirected to stdout when configuring log file (#19301)	2021-10-12 19:03:21 -07:00
SangBin Cho	84118c9659	Revert "Revert "[Placement Group] Fix the high load bug from the plac… (#19330)	2021-10-12 19:02:30 -07:00
Eric Liang	430a5f4a21	[doc] Bump dataset to beta for 1.8 and add backlink to SGD (#19332)	2021-10-12 18:32:29 -07:00
Clark Zinzow	df6d06bd41	Fix for LazyBlockList refactor. (#19333)	2021-10-12 18:18:45 -07:00
Jasha10	53e791d136	[Docs] Fix Typo in walkthrough (#19335) There is one backtick too many in walkthrough.rst, which causes a formatting issue.	2021-10-12 17:47:28 -07:00
Amog Kamsetty	09d8049584	[SGD] Make actor creation async (#19325) * fix * fix * fix	2021-10-12 16:15:59 -07:00
Jiajun Yao	d99b095eac	Set default max_pending_lease_requests_per_scheduling_category to 1 (#19328)	2021-10-12 15:59:32 -07:00
Eric Liang	9f1cd9e867	[docs] Document fake multi-node autoscaler (#19329)	2021-10-12 15:59:07 -07:00
Amog Kamsetty	f6f2435b91	[SGD] Sgd v2 Dataset Integration (#17626) * wip * wip * wip * draft * disable tf autosharding * wip * wip * wip * wip * add example * wip * wip * wip * use dataset.split * add unit tests * add linear example * concatenate tensors and fix example * WIP tune example * add tensorflow example * wip * random_shuffle_each_window * fault tolerance test * GPU, examples, CI * formatting * fix * Update python/ray/util/sgd/v2/tests/test_trainer.py Co-authored-by: matthewdeng <matthew.j.deng@gmail.com> * wip * type hints * wip * update user guide * fix * fix immediate issues * update example * update * fix tune gpu test * fix resources for smoke test - 1 CPU for dataset tasks * update tests, docs, examples * Apply suggestions from code review Co-authored-by: Clark Zinzow <clarkzinzow@gmail.com> * address comments * add warning * fix tests * minor doc updates * update example in doc * configure tests * Update doc/source/raysgd/v2/user_guide.rst Co-authored-by: Clark Zinzow <clarkzinzow@gmail.com> * Update python/ray/data/dataset.py Co-authored-by: matthewdeng <matthew.j.deng@gmail.com> * fix docstring Co-authored-by: Matthew Deng <matthew.j.deng@gmail.com> Co-authored-by: matthewdeng <matt@anyscale.com> Co-authored-by: Clark Zinzow <clarkzinzow@gmail.com>	2021-10-12 14:03:10 -07:00
Carlo Grisetti	7651cc782a	Change prometheus warning filename source (#19275) * Change prometheus warning filename source * Fix linting	2021-10-12 14:02:51 -07:00
Eric Liang	8c152bd17c	Revert "[Placement Group] Fix the high load bug from the placement group (#19277)" (#19327) This reverts commit `4360b99803`.	2021-10-12 12:41:51 -07:00
Akash Patel	b897b7b3be	add missing <memory> include (#19083)	2021-10-12 12:03:07 -07:00
Lixin Wei	f2f9c749cb	[Build] Add an Option to Skip Bazel Build (#19265)	2021-10-12 12:01:58 -07:00
Yi Cheng	bce6a498f3	Ensure job registered first before return. (#19307) ## Why are these changes needed? Before this PR, there was a race condition: - job registration starts - driver starts to launch an actor - GCS registers the actor ===> crash - job registration ends. Actor registration must happen after driver registration; this PR enforces that. ## Related issue number Closes #19172	2021-10-12 11:26:58 -07:00
Eric Liang	0ab6749602	Support iter_epochs for Datasets (#19217)	2021-10-12 11:05:00 -07:00
SangBin Cho	4360b99803	[Placement Group] Fix the high load bug from the placement group (#19277)	2021-10-12 11:04:14 -07:00
Clark Zinzow	6ca3c02041	[Datasets] Parallelize Parquet metadata fetches. (#19211)	2021-10-12 11:02:30 -07:00
dependabot[bot]	74ee99ff99	[RLlib](deps): Bump tensorflow from 2.5.0 to 2.6.0 in /python/requirements/rllib (#18183) * [RLlib](deps): Bump tensorflow in /python/requirements/rllib Bumps [tensorflow](https://github.com/tensorflow/tensorflow) from 2.5.0 to 2.6.0. - [Release notes](https://github.com/tensorflow/tensorflow/releases) - [Changelog](https://github.com/tensorflow/tensorflow/blob/master/RELEASE.md) - [Commits](https://github.com/tensorflow/tensorflow/compare/v2.5.0...v2.6.0) --- updated-dependencies: - dependency-name: tensorflow dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> * wip. Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: sven1977 <svenmika1977@gmail.com>	2021-10-12 17:56:36 +02:00
SangBin Cho	2c93708324	Migrating to flat hash map [Raylet] (#19220) * done * Fix all unit tests * done * . * Fix the build issue * fix the compilation bug	2021-10-12 07:41:51 -07:00
Qing Wang	b6d67d2ba9	Use javac -h instead of javah. (#19311)	2021-10-12 22:37:14 +08:00
Wansoo Kim	0f6d4661d7	[tune] Port all MNIST examples to specify data_dir (#19033) Co-authored-by: Kai Fricke <kai@anyscale.com>	2021-10-12 15:36:06 +01:00
Antoine Galataud	edb338ff7c	[RLlib] Check `training_enabled` on PolicyServer (#19007)	2021-10-12 16:21:02 +02:00
SangBin Cho	cbbd349df9	add versions as a bug report requirement. (#19282) * add versions as a req * .	2021-10-12 07:09:23 -07:00
gjoliver	9226f9bddc	[RLlib] Report timesteps_this_iter to Tune, so it can track/checkpoint/restore total timesteps trained. (#19264) * Report timesteps_this_iter to Tune, so it can track/checkpoint/restore total timesteps trained. * Trigger Build * lint	2021-10-12 16:03:41 +02:00
gjoliver	5d14904b9b	[Tune] catch HTTPError when logging to wandb. (#19314)	2021-10-12 14:38:17 +01:00
Akash Patel	8241a03d31	resolve maybe uninitialized error (#19103)	2021-10-12 04:06:48 -07:00
Kai Fricke	d8d8901192	[ci/tune] Remove deprecated `jenkins_only` tag from test tags (#19287)	2021-10-12 10:05:46 +01:00
Chris K. W	35230ea9fa	[client] deflake test_stdout_log_stream (#19232) * deflake test_stdout_log_stream * add assert message	2021-10-11 22:22:39 -07:00
architkulkarni	cc16e8f8c5	[runtime env] Validate "excludes" field (#19302)	2021-10-11 20:05:22 -07:00
Jiajun Yao	a781b10a50	[Release] Centralize c++ ray version string definition (#19297) * Centralize c++ ray version string definition * Centralize c++ ray version string definition	2021-10-12 11:09:29 +09:00

9858 commits