9110 commits

Author SHA1 Message Date
Sven Mika
f18213712f
[RLlib] Redo: "fix self play example scripts" PR (17566) (#17895)
* wip.

* wip.

* wip.

* wip.

* wip.

* wip.

* wip.

* wip.

* wip.
2021-08-17 09:13:35 -07:00
Antoni Baum
2b7d907762
Print description in --help (#17871) 2021-08-17 17:29:01 +02:00
Hasan Genc
adc0c47b4f
Shutdown clusters on AWS with >1000 nodes (#17841)
* Revert "Revert "Shutdown clusters when large number of nodes (#17642)" (#17836)"

This reverts commit 6957ce66f6.

* Update unit test and fix terminate_nodes
2021-08-17 16:26:10 +03:00
Chris Bamford
58a73821fb
[RLlib] IMPALA sample throughput calculation and full queue slowdown fixes (#17822) 2021-08-17 14:01:41 +02:00
chenk008
c3764ffd7d
[Core] Unified worker initiators (#17401)
* use setup_worker as starter

* use setup_worker as starter

* add java test

* fix

* fix

* lint

* sleep in ci

* sleep in ci

* fix ut

* fix

* fix

* fix

* fix

* fix

* fix

* change test size

* test

* fix

* fix

* fix ut

* restore sgd test

* change test size

* fix merge conflict

* restore cpp worker flag

* fix

* fix

* add worker-languange in setup_runtime_env.py

* lint

* fix java command

Co-authored-by: root <chenk008>
2021-08-17 19:37:26 +08:00
simonsays1980
7b33dc21dc
[RLlib] Fix update model view requirements from init state for bare-metal policies with custom view-reqs. (#17867)
* Changed '_update_model_view_requirements_from_init_state()' to adopt the 'shift' in view_requirements from a user-defined policy that inherits directly from Policy.

* Added a slightly modified version of Sven's suggestion. This way, any user-defined attributes of the state's ViewRequirement are preserved.

* I saw that the code in _update_model_view_requirements_from_init_state() had changed and no longer matched my locally installed version. In the new version, the view_requirements from the model and the policy are merged, and a loop runs over this merged list. The code should now run in the present version.

* Apply suggestions from code review
2021-08-17 11:49:24 +02:00
gjoliver
1dbe7fc26a
[RLlib] Config dict should use true instead of True in docs/examples. (#17889) 2021-08-17 11:46:10 +02:00
Guyang Song
8227e24424
[event] event framework integration in raylet, gcs server and core worker (#17671) 2021-08-17 11:21:23 +08:00
Hao Chen
ddb0dc8ad2
Fix client_test_enabled (#17699)
* Fix client_test_enabled

* fix

* trigger CI
2021-08-17 10:59:50 +08:00
Chen Shen
a9757a86b3
[Core] Fix nested ref count bug: add NestedIds to reference_counter once a task returns (#17802)
* add nested reference

* fix bug
2021-08-16 19:02:26 -07:00
Alex Wu
dde8250744
Better error message on docker wheel build (#17881)
* Better error message

* Apply suggestions from code review

Co-authored-by: Clark Zinzow <clarkzinzow@gmail.com>

2021-08-16 18:07:10 -07:00
Ian Rodney
8fe7111a7b
[Client] Bump Proto Version (#17879) 2021-08-16 17:08:36 -07:00
Yi Cheng
03a82d733a
Revert "Revert "Export useful metrics"" (#17755)
* Revert "Revert "[Observability] Export useful metrics (#17578)" (#17752)"

This reverts commit 02e79f3fe5.

* Update metric.h

* up

* up

* Update server_call.h

* Update test_metrics_agent.py

* up

* fix comment
2021-08-16 17:05:56 -07:00
Navneet Nandan
35d86ebfee
Added support to use tolerations for head and worker nodes (#17608)
* Added support to use tolerations for head and worker nodes

* removed the imagePullSecret configuration

* Update comments

* minor comment change

* add back rayproject/ray:nightly comment

Co-authored-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>
2021-08-16 17:06:15 -04:00
Thomas Lecat
c02f91fa2d
[RLlib] Ape-X doesn't take the value of prioritized_replay into account (#17541) 2021-08-16 22:18:08 +02:00
Stefan Schneider
eab9c25856
[RLlib] Better example scripts: Description --no-tune and --local-mode CLI options (autoregressive_action_dist.py) (#17705) 2021-08-16 22:08:13 +02:00
Sven Mika
f3bbe4ea44
[RLlib] Test cases/BUILD cleanup; split "everything else" (longest running one rn) tests in 2. (#17640) 2021-08-16 22:01:01 +02:00
Ian Rodney
2f200e5c2b
[Client] Pass ray.init() args to the remote server (#17776) 2021-08-16 12:34:01 -07:00
Alex Wu
1209a87ead
[core] Remove push based resource report code path (#17825) 2021-08-16 12:03:38 -07:00
Chen Shen
b349c6bc4f
[object store refactor 4/n] object lifecycle manager (#17344)
* lifecycle

* address comments
2021-08-16 09:58:35 -07:00
architkulkarni
7d690e7231
[serve] Add deployment and replica tags to logs (#17830) 2021-08-16 11:00:39 -05:00
architkulkarni
e1ffc0fd73
[Serve] [Doc] Delete docs for removed automatic conda activation feature (#17832) 2021-08-16 10:59:49 -05:00
Lingxuan Zuo
f2a3085ce2
[Metric] Java metric API enhancement (#17811)
* Java metric api enhancement:
make tagkey transparent for upper level users

* add java metric tags test

* mark Deprecated
2021-08-16 22:38:27 +08:00
Kai Fricke
8580e450cb
[release] update/unify base images (#17859) 2021-08-16 12:44:25 +02:00
Sven Mika
0bc0e17712
CUDA 11.2 in docker images 2021-08-16 12:31:19 +02:00
dependabot[bot]
91d01f7211
[RLlib](deps): Bump tensorflow from 2.4.1 to 2.5.0 in /python/requirements/rllib (#15849) 2021-08-16 10:55:48 +02:00
Sven Mika
2bd2ee7a73
[RLlib] SampleBatch: Docstring- and API cleanups; Add support for nested data. (#17485) 2021-08-16 06:08:14 +02:00
Eric Liang
eb4239160a
Add an experimental flag to disable CUDA_VISIBLE_DEVICES (#17847)
* wip

* skip windows
2021-08-15 17:17:55 -07:00
Holden Karau
e0f8e18173
Make the ray logs visible (#17810) 2021-08-15 17:16:55 -04:00
dependabot[bot]
f6922b1768
[tune](deps): Bump optuna in /python/requirements/tune (#17853)
Bumps [optuna](https://github.com/optuna/optuna) from 2.8.0 to 2.9.1.
- [Release notes](https://github.com/optuna/optuna/releases)
- [Commits](https://github.com/optuna/optuna/compare/v2.8.0...v2.9.1)

---
updated-dependencies:
- dependency-name: optuna
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>
2021-08-15 01:22:58 -07:00
dependabot[bot]
b29d05c79e
[tune](deps): Bump smart-open in /python/requirements/tune (#17852)
Bumps [smart-open](https://github.com/piskvorky/smart_open) from 5.0.0 to 5.1.0.
- [Release notes](https://github.com/piskvorky/smart_open/releases)
- [Changelog](https://github.com/RaRe-Technologies/smart_open/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/piskvorky/smart_open/compare/v5.0.0...v5.1.0)

---
updated-dependencies:
- dependency-name: smart-open
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-08-14 23:48:34 -07:00
dependabot[bot]
5d486e4214
[tune](deps): Bump h5py from 3.2.1 to 3.3.0 in /python/requirements/tune (#17850)
Bumps [h5py](https://github.com/h5py/h5py) from 3.2.1 to 3.3.0.
- [Release notes](https://github.com/h5py/h5py/releases)
- [Changelog](https://github.com/h5py/h5py/blob/master/docs/release_guide.rst)
- [Commits](https://github.com/h5py/h5py/compare/3.2.1...3.3.0)

---
updated-dependencies:
- dependency-name: h5py
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-08-14 23:32:16 -07:00
dependabot[bot]
afe6ec658e
[tune](deps): Bump nevergrad in /python/requirements/tune (#17817)
Bumps [nevergrad](https://github.com/facebookresearch/nevergrad) from 0.4.3.post3 to 0.4.3.post7.
- [Release notes](https://github.com/facebookresearch/nevergrad/releases)
- [Changelog](https://github.com/facebookresearch/nevergrad/blob/master/CHANGELOG.md)
- [Commits](https://github.com/facebookresearch/nevergrad/compare/0.4.3.post3...0.4.3.post7)

---
updated-dependencies:
- dependency-name: nevergrad
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com>
2021-08-14 21:01:48 -07:00
dependabot[bot]
030214ce5b
[tune](deps): Bump ax-platform in /python/requirements/tune (#17815)
Bumps [ax-platform](https://github.com/facebook/Ax) from 0.1.20 to 0.2.1.
- [Release notes](https://github.com/facebook/Ax/releases)
- [Changelog](https://github.com/facebook/Ax/blob/master/CHANGELOG.md)
- [Commits](https://github.com/facebook/Ax/compare/v0.1.20...0.2.1)

---
updated-dependencies:
- dependency-name: ax-platform
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com>
2021-08-14 21:01:28 -07:00
Yi Cheng
9136bb95d9
[workflow] Allow function without __module__ and __qualname__ (#17804) 2021-08-14 11:18:07 -07:00
Hasan Genc
6957ce66f6
Revert "Shutdown clusters when large number of nodes (#17642)" (#17836)
This reverts commit a33dc75105.
2021-08-14 04:57:22 +03:00
Sven Mika
c2ea2c01bb
[RLlib] Redo: Add support for multi-GPU to DDPG. (#17789)
* wip.

* wip.

* wip.

* wip.

* wip.

* wip.
2021-08-13 18:01:24 -07:00
Ian Rodney
54f107f559
[Docker] Exclude +dbg builds when finding Wheels (#17826) 2021-08-13 16:04:57 -07:00
Simon Mo
61ac06cc6d
[Buildkite] Fix zsh bug so latest wheels are pushed correctly (#17831) 2021-08-13 15:33:14 -07:00
Abhishek Malvankar
393926d619
[Docs] Tips for mac and RHELv8 (#17263)
Co-authored-by: asmalvan@us.ibm.com <asmalvan@us.ibm.com>
2021-08-13 13:45:13 -07:00
Ivorius
8b6edcb1c9
[Docker] Add --base-image argument to build-docker.sh (#17574) 2021-08-13 13:29:33 -07:00
Thomas Desrosiers
3e48df89f7
[Client] Fix mismatched debug log ID formats (#17597) 2021-08-13 13:28:20 -07:00
Huaiwei Sun
14365e111d
Update README.rst (#17471)
Add Slack channel info in the "Getting Involved" section
2021-08-13 13:25:51 -07:00
akern40
0cb2c602db
[rllib] Fixes typo in RolloutWorker.__init__ (#17583)
Fixes the typo in RolloutWorker.__init__, closes #17582
2021-08-13 13:17:36 -07:00
Amog Kamsetty
9f5dc5ec9f
[Docker] Downgrade to CUDA 11.0 (#17806) 2021-08-13 20:39:06 +02:00
architkulkarni
fcac416933
[Serve] [Dashboard] Add start times and replica tags to cluster snapshot (#17749) 2021-08-13 09:49:12 -07:00
Simon Mo
7d482fe099
[Doc] Update macos nightly wheel names (#17813) 2021-08-13 09:45:10 -07:00
Eric Liang
7ec52ca311
Make the namespace argument explicit instead of implicit in actor names (#17758) 2021-08-13 09:24:13 -07:00
Hasan Genc
a33dc75105
Shutdown clusters when large number of nodes (#17642)
* Allow clusters with over 1000 nodes to be shut down

* Add unit-test for terminating large number of nodes on AWS

* Fix lint

* Add max_terminate_nodes to the NodeProvider abstract class, and refactor terminate_nodes to reduce repetition

* lint

* Update comment

* lint

* lint

* Unit test previously required internet access. This commit removes that requirement.

Co-authored-by: Ameer Haj Ali <ameerh@berkeley.edu>
2021-08-13 17:09:19 +03:00
Kai Fricke
96b620bc01
[docker] Pin matplotlib, fix docker build (#17819) 2021-08-13 14:59:50 +01:00