hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-08 19:41:38 -05:00

Author	SHA1	Message	Date
architkulkarni	b0716f66ae	[runtime env] Fix handling of runtime env with None fields (#19300)	2021-10-13 13:57:55 -07:00
Jiao	893f76daf9	[serve] Add serve FT nightly test to buildkite (#19361)	2021-10-13 13:56:55 -07:00
Antoni Baum	3cb0862152	Fix double gym in requirements (#19357)	2021-10-13 21:43:41 +01:00
Omkar Pangarkar	f1b9b16ae9	[tune] Fix `DistributedTrainable` restore (#19349)	2021-10-13 21:29:05 +01:00
Carlo Grisetti	da7a485786	[Windows] use dynamic temp path (#19096)	2021-10-13 13:02:45 -04:00
hazeone	c2f0035fd2	[Java] Support getGpuIds API (#19031) Add a Java getGpuIds() API that mirrors get_gpu_ids in Python. It returns the device ID when a GPU has been allocated to the worker.	2021-10-13 23:40:26 +08:00
Kai Fricke	bde9e058da	Revert "[RLlib](deps): Bump tensorflow from 2.5.0 to 2.6.0 in /python/requirements/rllib (#18183)" (#19351) This reverts commit `74ee99ff99`.	2021-10-13 13:06:36 +01:00
Linsong Chu	ce64e6dc45	[workflow] add metadata put in workflow (#19195) ## Why are these changes needed? Add metadata to workflow. Currently there is no way for a user to attach metadata to a step or workflow run, and workflow run metrics (except status) are neither captured nor checkpointed. This change adds several kinds of metadata: 1. Step-level user metadata, which can be set with `step.options(metadata={})`. 2. Step-level pre-run metadata, which captures values such as step_start_time; more metrics can be added later. 3. Step-level post-run metadata, which captures values such as step_end_time; more metrics can be added later. 4. Workflow-level user metadata, which can be set with `workflow.run(metadata={})`. 5. Workflow-level pre-run metadata, which captures values such as workflow_start_time; more metrics can be added later. 6. Workflow-level post-run metadata, which captures values such as workflow_end_time; more metrics can be added later. ## Related issue number https://github.com/ray-project/ray/issues/17090 Co-authored-by: Yi Cheng <chengyidna@gmail.com>	2021-10-12 21:01:24 -07:00
Clark Zinzow	1b179adfa1	[Core] [Hotfix] Handle logging redirected to stdout when configuring log file (#19301)	2021-10-12 19:03:21 -07:00
SangBin Cho	84118c9659	Revert "Revert "[Placement Group] Fix the high load bug from the plac… (#19330)	2021-10-12 19:02:30 -07:00
Eric Liang	430a5f4a21	[doc] Bump dataset to beta for 1.8 and add backlink to SGD (#19332)	2021-10-12 18:32:29 -07:00
Clark Zinzow	df6d06bd41	Fix for LazyBlockList refactor. (#19333)	2021-10-12 18:18:45 -07:00
Jasha10	53e791d136	[Docs] Fix Typo in walkthrough (#19335) There is one backtick too many in walkthrough.rst, which causes a formatting issue.	2021-10-12 17:47:28 -07:00
Amog Kamsetty	09d8049584	[SGD] Make actor creation async (#19325) * fix * fix * fix	2021-10-12 16:15:59 -07:00
Jiajun Yao	d99b095eac	Set default max_pending_lease_requests_per_scheduling_category to 1 (#19328)	2021-10-12 15:59:32 -07:00
Eric Liang	9f1cd9e867	[docs] Document fake multi-node autoscaler (#19329)	2021-10-12 15:59:07 -07:00
Amog Kamsetty	f6f2435b91	[SGD] Sgd v2 Dataset Integration (#17626) * wip * wip * wip * draft * disable tf autosharding * wip * wip * wip * wip * add example * wip * wip * wip * use dataset.split * add unit tests * add linear example * concatenate tensors and fix example * WIP tune example * add tensorflow example * wip * random_shuffle_each_window * fault tolerance test * GPU, examples, CI * formatting * fix * Update python/ray/util/sgd/v2/tests/test_trainer.py Co-authored-by: matthewdeng <matthew.j.deng@gmail.com> * wip * type hints * wip * update user guide * fix * fix immediate issues * update example * update * fix tune gpu test * fix resources for smoke test - 1 CPU for dataset tasks * update tests, docs, examples * Apply suggestions from code review Co-authored-by: Clark Zinzow <clarkzinzow@gmail.com> * address comments * add warning * fix tests * minor doc updates * update example in doc * configure tests * Update doc/source/raysgd/v2/user_guide.rst Co-authored-by: Clark Zinzow <clarkzinzow@gmail.com> * Update python/ray/data/dataset.py Co-authored-by: matthewdeng <matthew.j.deng@gmail.com> * fix docstring Co-authored-by: Matthew Deng <matthew.j.deng@gmail.com> Co-authored-by: matthewdeng <matt@anyscale.com> Co-authored-by: Clark Zinzow <clarkzinzow@gmail.com>	2021-10-12 14:03:10 -07:00
Carlo Grisetti	7651cc782a	Change prometheus warning filename source (#19275) * Change prometheus warning filename source * Fix linting	2021-10-12 14:02:51 -07:00
Eric Liang	8c152bd17c	Revert "[Placement Group] Fix the high load bug from the placement group (#19277)" (#19327) This reverts commit `4360b99803`.	2021-10-12 12:41:51 -07:00
Akash Patel	b897b7b3be	add missing <memory> include (#19083)	2021-10-12 12:03:07 -07:00
Lixin Wei	f2f9c749cb	[Build] Add an Option to Skip Bazel Build (#19265)	2021-10-12 12:01:58 -07:00
Yi Cheng	bce6a498f3	Ensure job registered first before return. (#19307) ## Why are these changes needed? Before this PR, there was a race condition in which: - job registration starts - the driver starts to launch an actor - GCS registers the actor ===> crash - job registration ends. Actor registration must happen after driver registration. This PR enforces that. ## Related issue number Closes #19172	2021-10-12 11:26:58 -07:00
Eric Liang	0ab6749602	Support iter_epochs for Datasets (#19217)	2021-10-12 11:05:00 -07:00
SangBin Cho	4360b99803	[Placement Group] Fix the high load bug from the placement group (#19277)	2021-10-12 11:04:14 -07:00
Clark Zinzow	6ca3c02041	[Datasets] Parallelize Parquet metadata fetches. (#19211)	2021-10-12 11:02:30 -07:00
dependabot[bot]	74ee99ff99	[RLlib](deps): Bump tensorflow from 2.5.0 to 2.6.0 in /python/requirements/rllib (#18183) * [RLlib](deps): Bump tensorflow in /python/requirements/rllib Bumps [tensorflow](https://github.com/tensorflow/tensorflow) from 2.5.0 to 2.6.0. - [Release notes](https://github.com/tensorflow/tensorflow/releases) - [Changelog](https://github.com/tensorflow/tensorflow/blob/master/RELEASE.md) - [Commits](https://github.com/tensorflow/tensorflow/compare/v2.5.0...v2.6.0) --- updated-dependencies: - dependency-name: tensorflow dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> * wip. Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: sven1977 <svenmika1977@gmail.com>	2021-10-12 17:56:36 +02:00
SangBin Cho	2c93708324	Migrating to flat hash map [Raylet] (#19220) * done * Fix all unit tests * done * . * Fix the build issue * fix the compilation bug	2021-10-12 07:41:51 -07:00
Qing Wang	b6d67d2ba9	Use javac -h instead of javah. (#19311)	2021-10-12 22:37:14 +08:00
Wansoo Kim	0f6d4661d7	[tune] Port all MNIST examples to specify data_dir (#19033) Co-authored-by: Kai Fricke <kai@anyscale.com>	2021-10-12 15:36:06 +01:00
Antoine Galataud	edb338ff7c	[RLlib] Check `training_enabled` on PolicyServer (#19007)	2021-10-12 16:21:02 +02:00
SangBin Cho	cbbd349df9	add versions as a bug report requirement. (#19282) * add versions as a req * .	2021-10-12 07:09:23 -07:00
gjoliver	9226f9bddc	[RLlib] Report timesteps_this_iter to Tune, so it can track/checkpoint/restore total timesteps trained. (#19264) * Report timesteps_this_iter to Tune, so it can track/checkpoint/restore total timesteps trained. * Trigger Build * lint	2021-10-12 16:03:41 +02:00
gjoliver	5d14904b9b	[Tune] catch HTTPError when logging to wandb. (#19314)	2021-10-12 14:38:17 +01:00
Akash Patel	8241a03d31	resolve maybe uninitialized error (#19103)	2021-10-12 04:06:48 -07:00
Kai Fricke	d8d8901192	[ci/tune] Remove deprecated `jenkins_only` tag from test tags (#19287)	2021-10-12 10:05:46 +01:00
Chris K. W	35230ea9fa	[client] deflake test_stdout_log_stream (#19232) * deflake test_stdout_log_stream * add assert message	2021-10-11 22:22:39 -07:00
architkulkarni	cc16e8f8c5	[runtime env] Validate "excludes" field (#19302)	2021-10-11 20:05:22 -07:00
Jiajun Yao	a781b10a50	[Release] Centralize c++ ray version string definition (#19297) * Centralize c++ ray version string definition * Centralize c++ ray version string definition	2021-10-12 11:09:29 +09:00
Jiao	85b8a6de5f	[Serve] Add nightly test for Serve failure recovery (#19125)	2021-10-11 18:33:20 -07:00
Carlo Grisetti	c2377fb725	[Serve] Call without loop parameter if python 3.10+ (#19298)	2021-10-11 18:31:13 -07:00
Eric Liang	6cacc54774	[RFC] Fake multi-node mode for autoscaler (#18987)	2021-10-11 18:27:29 -07:00
architkulkarni	1ee3b4136c	[Serve] [Doc] Serve fix tracing snippet (#19296)	2021-10-11 16:59:04 -07:00
SangBin Cho	0d7a7a06c0	[Placement group] Warm up the cluster before running the unit test #19286 (#19286)	2021-10-11 16:26:52 -07:00
Carlo Grisetti	2d0355548e	[Dashboard] Try to work around aiohttp 4.0.0 breaking changes (#19120)	2021-10-11 16:25:52 -07:00
Patrick Ames	a43193b9e5	[data] Add support for Arrow open input/output stream kwargs. (#19197)	2021-10-11 15:38:15 -07:00
Chen Shen	c740aae54c	[Core][Dataset] adding example for large scale data ingestion (#18998)	2021-10-11 15:37:09 -07:00
Jiajun Yao	92516981ea	[core] Increase worker lease parallelism (#18647)	2021-10-11 15:34:32 -07:00
Matti Picus	9ca34c7192	add dependencies to BUILD.bazel and update windows bazel to 4.2.1 (#19132) * add dependencies to BUILD.bazel and update windows bazel to 4.2.1 * fixes from review	2021-10-11 10:25:19 -07:00
Amog Kamsetty	b3ad72643c	[Tune] Call on_trial_complete after final checkpoint (#19243)	2021-10-11 09:47:39 -07:00
Kai Fricke	6252a6c1f9	[tune] Force no result buffering for hyperband schedulers (#19140)	2021-10-11 16:56:11 +01:00

9946 commits