Ian Rodney
8fe7111a7b
[Client] Bump Proto Version ( #17879 )
2021-08-16 17:08:36 -07:00
Yi Cheng
03a82d733a
Revert "Revert "Export useful metrics"" ( #17755 )
...
* Revert "Revert "[Observability] Export useful metrics (#17578 )" (#17752 )"
This reverts commit 02e79f3fe5.
* Update metric.h
* up
* up
* Update server_call.h
* Update test_metrics_agent.py
* up
* fix comment
2021-08-16 17:05:56 -07:00
Ian Rodney
2f200e5c2b
[Client] Pass ray.init() args to the remote server ( #17776 )
2021-08-16 12:34:01 -07:00
architkulkarni
7d690e7231
[serve] Add deployment and replica tags to logs ( #17830 )
2021-08-16 11:00:39 -05:00
Sven Mika
0bc0e17712
CUDA 11.2 in docker images
2021-08-16 12:31:19 +02:00
dependabot[bot]
91d01f7211
[RLlib](deps): Bump tensorflow from 2.4.1 to 2.5.0 in /python/requirements/rllib ( #15849 )
2021-08-16 10:55:48 +02:00
Eric Liang
eb4239160a
Add an experimental flag to disable CUDA_VISIBLE_DEVICES ( #17847 )
...
* wip
* skip windows
2021-08-15 17:17:55 -07:00
dependabot[bot]
f6922b1768
[tune](deps): Bump optuna in /python/requirements/tune ( #17853 )
...
Bumps [optuna](https://github.com/optuna/optuna ) from 2.8.0 to 2.9.1.
- [Release notes](https://github.com/optuna/optuna/releases )
- [Commits](https://github.com/optuna/optuna/compare/v2.8.0...v2.9.1 )
---
updated-dependencies:
- dependency-name: optuna
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>
2021-08-15 01:22:58 -07:00
dependabot[bot]
b29d05c79e
[tune](deps): Bump smart-open in /python/requirements/tune ( #17852 )
...
Bumps [smart-open](https://github.com/piskvorky/smart_open ) from 5.0.0 to 5.1.0.
- [Release notes](https://github.com/piskvorky/smart_open/releases )
- [Changelog](https://github.com/RaRe-Technologies/smart_open/blob/develop/CHANGELOG.md )
- [Commits](https://github.com/piskvorky/smart_open/compare/v5.0.0...v5.1.0 )
---
updated-dependencies:
- dependency-name: smart-open
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-08-14 23:48:34 -07:00
dependabot[bot]
5d486e4214
[tune](deps): Bump h5py from 3.2.1 to 3.3.0 in /python/requirements/tune ( #17850 )
...
Bumps [h5py](https://github.com/h5py/h5py ) from 3.2.1 to 3.3.0.
- [Release notes](https://github.com/h5py/h5py/releases )
- [Changelog](https://github.com/h5py/h5py/blob/master/docs/release_guide.rst )
- [Commits](https://github.com/h5py/h5py/compare/3.2.1...3.3.0 )
---
updated-dependencies:
- dependency-name: h5py
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-08-14 23:32:16 -07:00
dependabot[bot]
afe6ec658e
[tune](deps): Bump nevergrad in /python/requirements/tune ( #17817 )
...
Bumps [nevergrad](https://github.com/facebookresearch/nevergrad ) from 0.4.3.post3 to 0.4.3.post7.
- [Release notes](https://github.com/facebookresearch/nevergrad/releases )
- [Changelog](https://github.com/facebookresearch/nevergrad/blob/master/CHANGELOG.md )
- [Commits](https://github.com/facebookresearch/nevergrad/compare/0.4.3.post3...0.4.3.post7 )
---
updated-dependencies:
- dependency-name: nevergrad
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com>
2021-08-14 21:01:48 -07:00
dependabot[bot]
030214ce5b
[tune](deps): Bump ax-platform in /python/requirements/tune ( #17815 )
...
Bumps [ax-platform](https://github.com/facebook/Ax ) from 0.1.20 to 0.2.1.
- [Release notes](https://github.com/facebook/Ax/releases )
- [Changelog](https://github.com/facebook/Ax/blob/master/CHANGELOG.md )
- [Commits](https://github.com/facebook/Ax/compare/v0.1.20...0.2.1 )
---
updated-dependencies:
- dependency-name: ax-platform
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com>
2021-08-14 21:01:28 -07:00
Yi Cheng
9136bb95d9
[workflow] Allow function without __module__ and __qualname__ ( #17804 )
2021-08-14 11:18:07 -07:00
Hasan Genc
6957ce66f6
Revert "Shutdown clusters when large number of nodes ( #17642 )" ( #17836 )
...
This reverts commit a33dc75105.
2021-08-14 04:57:22 +03:00
Thomas Desrosiers
3e48df89f7
[Client] Fix mismatched debug log ID formats ( #17597 )
2021-08-13 13:28:20 -07:00
Amog Kamsetty
9f5dc5ec9f
[Docker] Downgrade to CUDA 11.0 ( #17806 )
2021-08-13 20:39:06 +02:00
architkulkarni
fcac416933
[Serve] [Dashboard] Add start times and replica tags to cluster snapshot ( #17749 )
2021-08-13 09:49:12 -07:00
Eric Liang
7ec52ca311
Make the namespace argument explicit instead of implicit in actor names ( #17758 )
2021-08-13 09:24:13 -07:00
Hasan Genc
a33dc75105
Shutdown clusters when large number of nodes ( #17642 )
...
* Allow clusters with over 1000 nodes to be shut down
* Add unit-test for terminating large number of nodes on AWS
* Fix lint
* Add max_terminate_nodes to the NodeProvider abstract class, and refactor terminate_nodes to reduce repetition
* lint
* Update comment
Co-authored-by: Ameer Haj Ali <ameerh@berkeley.edu>
* lint
* lint
* Unit test previously required internet access. This commit removes that requirement.
Co-authored-by: Ameer Haj Ali <ameerh@berkeley.edu>
2021-08-13 17:09:19 +03:00
Kai Fricke
96b620bc01
[docker] Pin matplotlib, fix docker build ( #17819 )
2021-08-13 14:59:50 +01:00
xwjiang2010
0be9f06ab6
[tune] Output insufficient resources warning msg when trials are in pending for extended amount of time. ( #17533 )
2021-08-13 01:37:56 -07:00
Hao Zhang
61de23cbae
[Collective] silence the pygloo warning as it is not commonly used ( #17792 )
2021-08-13 00:47:45 -07:00
qicosmos
a2a1c46c83
[C++ Worker]Fix for mac ( #17633 )
...
* linkopts shared
* replace gflags with absl flags
* fix
* add test option
* fix
* add cpp worker to mac ci
* fix
* support empty redis password;mod arc argv
* add encoding
* test
* ignore example test on mac
* support mac
* fix
* fix and update doc
* fix
* fix run.sh
* fix init
* fix typo
* fix run.sh
* fix lint
Co-authored-by: 久龙 <guyang.sgy@antfin.com>
2021-08-13 12:22:37 +08:00
Simon Mo
242a5d1a8d
[Serve] Add support for root_url ( #17765 )
2021-08-12 17:54:53 -07:00
Simon Mo
22b030d79f
[Serve] Remove serve.start(http_*) arguments ( #17762 )
2021-08-12 17:50:12 -07:00
Eric Liang
7fc62a1529
Support dataset union ( #17793 )
2021-08-12 14:01:40 -07:00
Chen Shen
9565fa549e
[Core][RFC] limit the total number of inlined bytes in task request rpc
...
Co-authored-by: Clark Zinzow <clarkzinzow@gmail.com>
2021-08-12 13:55:54 -07:00
Simon Mo
6879293b6b
[CI] Mark some tests exclusive ( #17650 )
2021-08-12 10:28:03 -07:00
SangBin Cho
8fd7e025be
Skip raylet kill windows #17682 ( #17683 )
...
* Try fixing it?
* Done
* skip raylet signal
2021-08-12 09:35:44 -07:00
matthewdeng
55680a1f9e
[SGD] v2 initial checkpoint functionality ( #17632 )
...
* [SGD] initial checkpoint functionality
* remove thread implementation and merge with fetch_next_result
* Update comment
Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>
* address comments
* add additional tests
* fix imports
* load most recently saved checkpoint
Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>
2021-08-12 08:52:04 -07:00
Clark Zinzow
d6eeb5dc70
[Datasets] Add local and S3 filesystem test coverage for file-based datasources. ( #17158 )
2021-08-12 08:39:31 -07:00
architkulkarni
00f6b30684
[Serve] [Dashboard] Support nondetached and multiple Serve instances in cluster snapshot ( #17747 )
2021-08-11 22:26:54 -05:00
Eric Liang
ce171f10a1
Remove legacy plasma unlimited and pull manager pinning flag ( #17753 )
2021-08-11 20:19:12 -07:00
Clark Zinzow
623db7c47b
[Datasets] Add support for reading partitioned Parquet datasets. ( #17716 )
2021-08-11 15:55:49 -07:00
Jiao
3c64a1a3c1
Add micro benchmark to releaser repo ( #17727 )
2021-08-11 15:15:33 -07:00
architkulkarni
9a70e83e90
[hotfix] pin tensorflow==2.5.1 ( #17760 )
...
* pin tensorflow==1.5.1
* Update python/requirements.txt
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-08-11 15:15:22 -07:00
Yi Cheng
aa96e59faf
[workflow] Examples of function chaining ( #17715 )
2021-08-11 13:15:51 -07:00
Yi Cheng
02e79f3fe5
Revert "[Observability] Export useful metrics ( #17578 )" ( #17752 )
...
This reverts commit bd4db53df2.
2021-08-11 12:21:50 -07:00
Jiao
e38db5875b
Add serve external kv store ( #17622 )
2021-08-11 12:06:14 -07:00
Amog Kamsetty
ed24bae644
[SGD] Fail if num_workers is not greater than 0 ( #17723 )
2021-08-11 10:05:19 -07:00
Ian Rodney
97f7ae5e06
[Cluster Launcher] Allow attach/exec on uninitialized head node ( #17688 )
2021-08-11 09:43:23 -07:00
chenk008
f0fc26960d
[sgd] Wait for placement_group deletion when shutdown worker_group ( #17698 )
...
* fix
* fix ut
* delete sleep
* fix according to comment
* fix according to comment
* use pg in test_resize
* fix
2021-08-11 08:47:49 -07:00
J K Terry
48e32555c8
[rllib] Update PettingZoo dependency versions ( #17702 )
...
* update pettingzoo dependency versions
* pettingzoo version
* fix tests
2021-08-11 01:19:19 -07:00
Shantanu
abc593561c
[client] fix ClientRemoteMethod error message ( #17726 )
...
Co-authored-by: hauntsaninja <>
2021-08-11 00:43:17 -07:00
Yi Cheng
bd4db53df2
[Observability] Export useful metrics ( #17578 )
...
* up
* up
* up
* up
* up
* up
* up
* up
* up
* up
* up
* up
* up
* checkpoint
* up
* up
* up
* up
* fix
* up
* up
* up
* up
* up
* up
* up
* up
* up
* up
* add comments
* up
* up
* up
* up
* add tests
2021-08-10 17:14:42 -07:00
SongGuyang
63c15d7ced
[core] make 'PopWorker' to be an async function ( #17202 )
...
* make 'PopWorker' to be an async function
* pop worker async works
* fix
* address comments
* bugfix
* fix cluster_task_manager_test
* fix
* bugfix of detached actor
* address comments
* fix
* address comments
* fix aioredis
* Revert "fix aioredis"
This reverts commit 041b983eac95b105ab0e853e84c4cf2647008431.
* bug fix
* fix
* fix test_step_resources test
* format
* add unit test
* fix
* add test case PopWorkerStatus
* address commit
* fix lint
* address comments
* add python test
* address comments
* make an independent function
* Update test_basic_3.py
Co-authored-by: Hao Chen <chenh1024@gmail.com>
2021-08-10 17:03:17 -07:00
xwjiang2010
932f038644
[tune] Type hint TrialExecutor. Use Abstract Base Class. ( #17584 )
2021-08-10 14:17:22 -07:00
Clark Zinzow
78d23434e6
[Datasets] Fix write_json so roundtrip writing + reading works. ( #17691 )
...
* Write out dataset blocks as newline-delimited JSON.
* Add roundtrip JSON reading + writing test.
* Formatting.
2021-08-10 13:24:33 -07:00
SangBin Cho
705a7192b3
Unflake multi node 3 ( #17694 )
2021-08-10 13:16:52 -07:00
SangBin Cho
6160c06c69
[Core] Fix a bug where get_actor crashes gcs if the actor is already killed. ( #17670 )
...
* Fix a bug where get_actor crashes gcs if the actor is already killed.
* Test the restart code path.
* Add an additional test
* Add a comment
* addressed code review.
2021-08-10 09:58:09 -07:00