Sven Mika
7e260edb07
[RLlib] Fix small memory leak in SimpleListCollector (already superseded by Bam4d's PR + small fix in error message). ( #15783 )
2021-05-18 16:02:03 +02:00
Chris Bamford
0be83d9a95
[RLlib] Fixing Memory Leak In Multi-Agent environments. Adding tooling for finding memory leaks in workers. ( #15815 )
2021-05-18 13:23:00 +02:00
Sven Mika
d2c755ccef
[RLlib] Examples scripts add argparse help and replace --torch with --framework. ( #15832 )
2021-05-18 13:18:12 +02:00
Sven Mika
2303851c3c
[RLlib] Torch multi-GPU + LSTM/RNN bug fix. ( #15492 )
2021-05-18 11:51:05 +02:00
Sven Mika
4e9555cad3
[RLlib] Issue 15724: Breaking example script in docs due to outdated eager config flag (use framework='tf2|tfe' instead). ( #15736 )
2021-05-18 11:34:46 +02:00
dependabot[bot]
4c8813f2e8
[RLlib](deps): Bump pettingzoo in /python/requirements/rllib ( #15846 )
Bumps [pettingzoo](https://github.com/PettingZoo-Team/PettingZoo ) from 1.8.1 to 1.8.2.
- [Release notes](https://github.com/PettingZoo-Team/PettingZoo/releases )
- [Commits](https://github.com/PettingZoo-Team/PettingZoo/compare/1.8.1...1.8.2 )
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-05-18 11:25:52 +02:00
Sven Mika
839fc59224
[RLlib] CQL TensorFlow support ( #15841 )
2021-05-18 11:10:46 +02:00
Sven Mika
a36b9305d4
[RLlib] Better error message when deep-learning framework not installed. ( #15735 )
2021-05-18 11:06:05 +02:00
Sven Mika
6f4d988713
[RLlib] Issue 15556: Fix R2D2 using chunks from previous episodes in the "burn-in" window. ( #15737 )
2021-05-18 11:05:42 +02:00
Sven Mika
308ea62430
[RLlib] Fix "seed" setting to work in all frameworks and w/ all CUDA versions. ( #15682 )
2021-05-18 11:00:24 +02:00
architkulkarni
194c5e3a96
[Core] Cache workers by runtime_env in worker pool ( #15782 )
* pass RuntimeEnv in task spec as opaque string
* lint
* set correct empty value for json: "{}" not ""
* add comment for field in proto
* fix worker pool test by checking both "" and "{}"
* add RAY_CHECK todo
* make dict empty if all values null
* remove unnecessary ser/de
* fix
* address comments
* add WorkerCacheKey with hash function
* clean up
* add naive impl., dedicated workers never killed
* put dedicated workers in idle_of_all_languages
* pipe env hash from worker.py -> Worker
* fully pipe through hash, basic cache test passing
* use int type for runtime env hash
* convert Worker env hash type from size_t to int
* fix
* add method to MockWorker to fix cpp tests
* make compatible with java streaming test
* restore old dynamic_options code to fix java test
* address comments
* add comment about sorting before hash
* add comments for private members of WorkerCacheKey
2021-05-18 00:19:27 -07:00
Yi Cheng
863532af0a
[core] API for pre-run customized functions ( #15749 )
* run customer setup fn
* fix
* lint
* skip on w32
* fix comment
* up
* up
2021-05-17 22:52:36 -07:00
Alex Wu
69f228d22d
[core] Record actor+job start/end times and metadata ( #15803 )
2021-05-17 21:38:39 -07:00
Frank Luan
0dc34566fe
Refactor raylet to allocate+write+seal one return object at a time ( #15757 )
* Refactor raylet to allocate+write+seal one return object at a time
* Fix build
* Fix C++ and Java runtime
* Skip Windows testing
* Fix java and cpp runtime
* Fix warnings
* Fix cpp and java tests
* Fix cpp and java runtime
Co-authored-by: Stephanie Wang <swang@cs.berkeley.edu>
2021-05-17 20:06:08 -07:00
SangBin Cho
ff461634b0
[Core] Improved bad error message. ( #15663 )
* Improved bad error message.
* Update src/ray/raylet/node_manager.cc
Co-authored-by: Clark Zinzow <clarkzinzow@gmail.com>
* lint.
* Add a pid
Co-authored-by: Alex Wu <itswu.alex@gmail.com>
Co-authored-by: Clark Zinzow <clarkzinzow@gmail.com>
2021-05-17 19:38:05 -07:00
Dmitri Gekhtman
95c3d88cac
[autoscaler][kubernetes] Helm chart ( #15614 )
2021-05-17 16:55:10 -07:00
Amog Kamsetty
c97594aca3
[CI] Update dependencies on travis flaky build ( #15858 )
2021-05-17 16:07:00 -07:00
Dmitri Gekhtman
c2b8381015
[autoscaler][gcp] Migrate GCP config to available node types ( #15805 )
2021-05-17 15:45:47 -07:00
dependabot[bot]
434465e477
[tune](deps): Bump gpy from 1.9.9 to 1.10.0 in /python/requirements/tune ( #15850 )
Bumps [gpy](https://github.com/SheffieldML/GPy ) from 1.9.9 to 1.10.0.
- [Release notes](https://github.com/SheffieldML/GPy/releases )
- [Changelog](https://github.com/SheffieldML/GPy/blob/devel/CHANGELOG.md )
- [Commits](https://github.com/SheffieldML/GPy/compare/v1.9.9...v1.10.0 )
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-05-17 11:33:11 -07:00
Alex Wu
3744026897
Fix test_scheduling ( #15823 )
* done
* Update python/ray/tests/test_scheduling.py
Co-authored-by: Ian Rodney <ian.rodney@gmail.com>
* Update python/ray/tests/test_scheduling.py
Co-authored-by: Ian Rodney <ian.rodney@gmail.com>
* lint
Co-authored-by: Alex Wu <alex@anyscale.com>
Co-authored-by: Ian Rodney <ian.rodney@gmail.com>
2021-05-17 10:08:01 -07:00
Alex Wu
3e94114336
Namespaces ( #15774 )
2021-05-17 10:04:22 -07:00
Sven Mika
f25d58492d
[Testing] Dependabot for RLlib. ( #15812 )
2021-05-17 18:24:13 +02:00
architkulkarni
78c26ac9fe
[runtime_env] Use sys executable in shim if conda not specified ( #15834 )
2021-05-17 11:19:58 -05:00
lanlin
5d2ed47978
[tune] Allow to set buffer_length via tune.run ( #15810 )
2021-05-17 13:11:26 +01:00
wzl
5247c0a5b8
[doc] Fix typo ( #15828 )
2021-05-16 16:08:14 -07:00
Sven Mika
d89fb82bfb
[RLlib] Add simple curriculum learning API and example script. ( #15740 )
2021-05-16 17:35:10 +02:00
Sven Mika
ebc6d8692a
[RLlib] Docs: Example scripts and blogs documentation update. ( #15763 )
2021-05-16 15:24:38 +02:00
Sven Mika
469f5227da
[RLlib] CQL bug fix: Normalize actions for atanh in BC part of the CQL loss. ( #15814 )
2021-05-16 15:21:06 +02:00
Sven Mika
bc09e75b78
[RLlib] Fix 3 flakey test cases. ( #15785 )
2021-05-16 12:20:33 +02:00
Edward Oakes
cd32a92edc
[serve] Avoid exporting actor class for every replica ( #15788 )
2021-05-15 09:04:09 -05:00
fcardoso75
b3428bd09e
Adjust bazel number of build jobs ( #15784 )
2021-05-14 21:33:14 -07:00
Dmitri Gekhtman
d1b1ae0f45
[test][client][dask] Run dask tests in client mode. ( #15806 )
2021-05-14 17:15:59 -07:00
Edward Oakes
f6be6dbcdc
[Serve] batch slow warning for multiple replicas ( #15798 )
2021-05-14 13:12:32 -07:00
Ian Rodney
00c913cbc6
[Flaky] Mark test_nested_observation_spaces as Flaky ( #15794 )
2021-05-14 12:08:52 -07:00
Ian Rodney
7b1c5dbe0a
[Hotfix][Lint] Pin other ESlint Deps ( #15816 )
2021-05-14 09:18:43 -07:00
Ian Rodney
ec5322a463
[Client] ray.client.connect() and ray.ClientBuilder ( #15706 )
2021-05-14 00:08:39 -07:00
Ian Rodney
395c0ea03c
[Testing] Pin Tensorflow Version in requirements.txt ( #15799 )
2021-05-13 17:09:15 -07:00
Edward Oakes
28f2962bb2
[serve] Add helpful log messages when deploying ( #15689 )
2021-05-13 18:10:23 -05:00
Ian Rodney
42f99541d4
[Tests] Mark test_scheduling & test_memstat as Flaky ( #15789 )
2021-05-13 15:46:12 -07:00
Edward Oakes
6a0f087643
[serve] Randomly shuffle replicas to avoid cross-handle synchronization ( #15792 )
2021-05-13 17:19:27 -05:00
Richard Liaw
c624e89483
[tune] Support numpy types in TBXlogger ( #15760 )
2021-05-13 14:54:47 -07:00
Edward Oakes
77d713ac78
[serve] Fix shutdown logic + add test ( #15790 )
2021-05-13 16:43:07 -05:00
Edward Oakes
d107cca1aa
[serve] Don't deserialize backend classes in the controller ( #15741 )
2021-05-13 16:01:09 -05:00
Ian Rodney
859703e993
[RuntimeEnv] Log which file caused an Exception ( #15772 )
2021-05-13 13:48:59 -07:00
mwtian
5462c6e7de
Fix link to release checklist from release process doc. ( #15793 )
2021-05-13 13:34:54 -07:00
Simon Mo
838cfec122
[Tracing] Fix kwargs replacement ( #15742 )
2021-05-13 12:44:35 -07:00
Ian Rodney
82876ecc2a
[rllib] [testing] make kill failure non fatal ( #15771 )
2021-05-13 12:24:49 -07:00
mwtian
dce13d3a81
Explicitly set protobuf dependency version to allow building ray with bazel 4.0.0 ( #15756 )
Java protobuf dependency version is made to be consistent as well.
2021-05-13 10:34:09 -07:00
SangBin Cho
259fcbd5bd
[Pubsub] Generalize the pubsub interface and adapt it for ref counting protocol ( #15446 )
* Add mock code first
* In the initial progress.
* Fix the number error
* In progress.
* in more progress.
* in progress.
* lint.
* Prototype done.
* Fix compilation bug.
* Now it is working with reference counting.
* Remove template.
* lint.
* Fixed issues.
* Fix reference count test.
* Reference count test passes now.
* Fixed the test array problem
* Addressed code review.
* lint.
* Addressed half of code review.
* Fix tests.
* Addressed the most critical issue.
* Make subscriber thread-safe.
* Revert "Make subscriber thread-safe."
This reverts commit 9a6a52197cfa8463ab60dfaae9530ad3c0ed8790.
* Fixed test failures. The only failure now is the asan failure.
* Reset test suites and see if it fixes the issue.
* Fix a flaky test
* Addressed code review.
2021-05-13 09:29:02 -07:00
architkulkarni
a0c1cfe034
[Core] Pass RuntimeEnv as opaque string in the task spec ( #15658 )
2021-05-13 10:32:00 -05:00