architkulkarni
b0716f66ae
[runtime env] Fix handling of runtime env with None fields ( #19300 )
2021-10-13 13:57:55 -07:00
Jiao
893f76daf9
[serve] Add serve FT nightly test to buildkite ( #19361 )
2021-10-13 13:56:55 -07:00
Antoni Baum
3cb0862152
Fix double gym in requirements ( #19357 )
2021-10-13 21:43:41 +01:00
Omkar Pangarkar
f1b9b16ae9
[tune] Fix DistributedTrainable restore ( #19349 )
2021-10-13 21:29:05 +01:00
Carlo Grisetti
da7a485786
[Windows] use dynamic temp path ( #19096 )
2021-10-13 13:02:45 -04:00
hazeone
c2f0035fd2
[Java] Support getGpuIds API ( #19031 )
Add a Java getGpuIds() API, equivalent to get_gpu_ids() in Python. It returns the device IDs of any GPUs allocated to the worker.
2021-10-13 23:40:26 +08:00
Kai Fricke
bde9e058da
Revert "[RLlib](deps): Bump tensorflow from 2.5.0 to 2.6.0 in /python/requirements/rllib ( #18183 )" ( #19351 )
This reverts commit 74ee99ff99.
2021-10-13 13:06:36 +01:00
Linsong Chu
ce64e6dc45
[workflow] add metadata put in workflow ( #19195 )
## Why are these changes needed?
Add metadata to workflows. Currently, users have no way to attach metadata to a step or a workflow run, and workflow runtime metrics (other than status) are neither captured nor checkpointed.
This PR adds the following kinds of metadata:
1. Step-level user metadata, which can be set with `step.options(metadata={})`.
2. Step-level pre-run metadata, which captures metrics such as step_start_time; more metrics can be added later.
3. Step-level post-run metadata, which captures metrics such as step_end_time; more metrics can be added later.
4. Workflow-level user metadata, which can be set with `workflow.run(metadata={})`.
5. Workflow-level pre-run metadata, which captures metrics such as workflow_start_time; more metrics can be added later.
6. Workflow-level post-run metadata, which captures metrics such as workflow_end_time; more metrics can be added later.
## Related issue number
https://github.com/ray-project/ray/issues/17090
Co-authored-by: Yi Cheng <chengyidna@gmail.com>
2021-10-12 21:01:24 -07:00
Clark Zinzow
1b179adfa1
[Core] [Hotfix] Handle logging redirected to stdout when configuring log file ( #19301 )
2021-10-12 19:03:21 -07:00
SangBin Cho
84118c9659
Revert "Revert "[Placement Group] Fix the high load bug from the plac… ( #19330 )
2021-10-12 19:02:30 -07:00
Eric Liang
430a5f4a21
[doc] Bump dataset to beta for 1.8 and add backlink to SGD ( #19332 )
2021-10-12 18:32:29 -07:00
Clark Zinzow
df6d06bd41
Fix for LazyBlockList refactor. ( #19333 )
2021-10-12 18:18:45 -07:00
Jasha10
53e791d136
[Docs] Fix Typo in walkthrough ( #19335 )
There is one backtick too many in walkthrough.rst, which causes a formatting issue.
2021-10-12 17:47:28 -07:00
Amog Kamsetty
09d8049584
[SGD] Make actor creation async ( #19325 )
* fix
* fix
* fix
2021-10-12 16:15:59 -07:00
Jiajun Yao
d99b095eac
Set default max_pending_lease_requests_per_scheduling_category to 1 ( #19328 )
2021-10-12 15:59:32 -07:00
Eric Liang
9f1cd9e867
[docs] Document fake multi-node autoscaler ( #19329 )
2021-10-12 15:59:07 -07:00
Amog Kamsetty
f6f2435b91
[SGD] Sgd v2 Dataset Integration ( #17626 )
* wip
* wip
* wip
* draft
* disable tf autosharding
* wip
* wip
* wip
* wip
* add example
* wip
* wip
* wip
* use dataset.split
* add unit tests
* add linear example
* concatenate tensors and fix example
* WIP tune example
* add tensorflow example
* wip
* random_shuffle_each_window
* fault tolerance test
* GPU, examples, CI
* formatting
* fix
* Update python/ray/util/sgd/v2/tests/test_trainer.py
Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
* wip
* type hints
* wip
* update user guide
* fix
* fix immediate issues
* update example
* update
* fix tune gpu test
* fix resources for smoke test - 1 CPU for dataset tasks
* update tests, docs, examples
* Apply suggestions from code review
Co-authored-by: Clark Zinzow <clarkzinzow@gmail.com>
* address comments
* add warning
* fix tests
* minor doc updates
* update example in doc
* configure tests
* Update doc/source/raysgd/v2/user_guide.rst
Co-authored-by: Clark Zinzow <clarkzinzow@gmail.com>
* Update python/ray/data/dataset.py
Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
* fix docstring
Co-authored-by: Matthew Deng <matthew.j.deng@gmail.com>
Co-authored-by: matthewdeng <matt@anyscale.com>
Co-authored-by: Clark Zinzow <clarkzinzow@gmail.com>
2021-10-12 14:03:10 -07:00
Carlo Grisetti
7651cc782a
Change prometheus warning filename source ( #19275 )
* Change prometheus warning filename source
* Fix linting
2021-10-12 14:02:51 -07:00
Eric Liang
8c152bd17c
Revert "[Placement Group] Fix the high load bug from the placement group ( #19277 )" ( #19327 )
This reverts commit 4360b99803.
2021-10-12 12:41:51 -07:00
Akash Patel
b897b7b3be
add missing <memory> include ( #19083 )
2021-10-12 12:03:07 -07:00
Lixin Wei
f2f9c749cb
[Build] Add an Option to Skip Bazel Build ( #19265 )
2021-10-12 12:01:58 -07:00
Yi Cheng
bce6a498f3
Ensure job registered first before return. ( #19307 )
## Why are these changes needed?
Before this PR, there was a race condition:
- job registration starts
- the driver starts launching an actor
- the GCS registers the actor ===> crash
- job registration ends
Actor registration must happen only after driver registration has completed. This PR enforces that ordering.
## Related issue number
Closes #19172
2021-10-12 11:26:58 -07:00
Eric Liang
0ab6749602
Support iter_epochs for Datasets ( #19217 )
2021-10-12 11:05:00 -07:00
SangBin Cho
4360b99803
[Placement Group] Fix the high load bug from the placement group ( #19277 )
2021-10-12 11:04:14 -07:00
Clark Zinzow
6ca3c02041
[Datasets] Parallelize Parquet metadata fetches. ( #19211 )
2021-10-12 11:02:30 -07:00
dependabot[bot]
74ee99ff99
[RLlib](deps): Bump tensorflow from 2.5.0 to 2.6.0 in /python/requirements/rllib ( #18183 )
* [RLlib](deps): Bump tensorflow in /python/requirements/rllib
Bumps [tensorflow](https://github.com/tensorflow/tensorflow) from 2.5.0 to 2.6.0.
- [Release notes](https://github.com/tensorflow/tensorflow/releases)
- [Changelog](https://github.com/tensorflow/tensorflow/blob/master/RELEASE.md)
- [Commits](https://github.com/tensorflow/tensorflow/compare/v2.5.0...v2.6.0)
---
updated-dependencies:
- dependency-name: tensorflow
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
* wip.
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: sven1977 <svenmika1977@gmail.com>
2021-10-12 17:56:36 +02:00
SangBin Cho
2c93708324
Migrating to flat hash map [Raylet] ( #19220 )
* done
* Fix all unit tests
* done
* .
* Fix the build issue
* fix the compilation bug
2021-10-12 07:41:51 -07:00
Qing Wang
b6d67d2ba9
Use javac -h instead of javah. ( #19311 )
2021-10-12 22:37:14 +08:00
Wansoo Kim
0f6d4661d7
[tune] Port all MNIST examples to specify data_dir ( #19033 )
Co-authored-by: Kai Fricke <kai@anyscale.com>
2021-10-12 15:36:06 +01:00
Antoine Galataud
edb338ff7c
[RLlib] Check training_enabled on PolicyServer ( #19007 )
2021-10-12 16:21:02 +02:00
SangBin Cho
cbbd349df9
add versions as a bug report requirement. ( #19282 )
* add versions as a req
* .
2021-10-12 07:09:23 -07:00
gjoliver
9226f9bddc
[RLlib] Report timesteps_this_iter to Tune, so it can track/checkpoint/restore total timesteps trained. ( #19264 )
* Report timesteps_this_iter to Tune, so it can track/checkpoint/restore total timesteps trained.
* Trigger Build
* lint
2021-10-12 16:03:41 +02:00
gjoliver
5d14904b9b
[Tune] catch HTTPError when logging to wandb. ( #19314 )
2021-10-12 14:38:17 +01:00
Akash Patel
8241a03d31
resolve maybe uninitialized error ( #19103 )
2021-10-12 04:06:48 -07:00
Kai Fricke
d8d8901192
[ci/tune] Remove deprecated jenkins_only tag from test tags ( #19287 )
2021-10-12 10:05:46 +01:00
Chris K. W
35230ea9fa
[client] deflake test_stdout_log_stream ( #19232 )
* deflake test_stdout_log_stream
* add assert message
2021-10-11 22:22:39 -07:00
architkulkarni
cc16e8f8c5
[runtime env] Validate "excludes" field ( #19302 )
2021-10-11 20:05:22 -07:00
Jiajun Yao
a781b10a50
[Release] Centralize c++ ray version string definition ( #19297 )
* Centralize c++ ray version string definition
* Centralize c++ ray version string definition
2021-10-12 11:09:29 +09:00
Jiao
85b8a6de5f
[Serve] Add nightly test for Serve failure recovery ( #19125 )
2021-10-11 18:33:20 -07:00
Carlo Grisetti
c2377fb725
[Serve] Call without loop parameter if python 3.10+ ( #19298 )
2021-10-11 18:31:13 -07:00
Eric Liang
6cacc54774
[RFC] Fake multi-node mode for autoscaler ( #18987 )
2021-10-11 18:27:29 -07:00
architkulkarni
1ee3b4136c
[Serve] [Doc] Serve fix tracing snippet ( #19296 )
2021-10-11 16:59:04 -07:00
SangBin Cho
0d7a7a06c0
[Placement group] Warm up the cluster before running the unit test #19286 ( #19286 )
2021-10-11 16:26:52 -07:00
Carlo Grisetti
2d0355548e
[Dashboard] Try to work around aiohttp 4.0.0 breaking changes ( #19120 )
2021-10-11 16:25:52 -07:00
Patrick Ames
a43193b9e5
[data] Add support for Arrow open input/output stream kwargs. ( #19197 )
2021-10-11 15:38:15 -07:00
Chen Shen
c740aae54c
[Core][Dataset] adding example for large scale data ingestion ( #18998 )
2021-10-11 15:37:09 -07:00
Jiajun Yao
92516981ea
[core] Increase worker lease parallelism ( #18647 )
2021-10-11 15:34:32 -07:00
Matti Picus
9ca34c7192
add dependencies to BUILD.bazel and update windows bazel to 4.2.1 ( #19132 )
* add dependencies to BUILD.bazel and update windows bazel to 4.2.1
* fixes from review
2021-10-11 10:25:19 -07:00
Amog Kamsetty
b3ad72643c
[Tune] Call on_trial_complete after final checkpoint ( #19243 )
2021-10-11 09:47:39 -07:00
Kai Fricke
6252a6c1f9
[tune] Force no result buffering for hyperband schedulers ( #19140 )
2021-10-11 16:56:11 +01:00