Chen Shen
89f988e9cc
add dataset shuffle data loader ( #17917 )
2021-08-20 11:26:01 -07:00
Edward Oakes
30541025e5
[serve] Remove deprecated APIs from code & docs ( #17754 )
2021-08-20 11:59:45 -05:00
Stephanie Wang
b8fe776638
[core] Fix inlined nested ids ( #17834 )
...
* test
* Use ObjectRef instead of ObjectID in nested refs
* java
* doc
* java
* build
* build
* x
* lint
* simplify
* fix
2021-08-20 08:58:29 -07:00
Amog Kamsetty
9416fce91b
[SGD] v2 Tune integration + iterator API ( #17839 )
...
* [SGD] implement SGD Trainer.to_tune_trainable
* address some comments
* add RESULT_DUPLICATE
* extract trainable creation logic out of Trainer
* add 1 CPU for driver
* use class attribute to fix serialization issues
* add examples
* add test for tune error
* tune
* test tune_linear
* run_iterator
* add to build file
* Update python/ray/util/sgd/v2/trainer.py
Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
* Update python/ray/util/sgd/v2/trainer.py
Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
* address comments
* fix tests & address comments
* resolve merge
* lint
* fix
* add team tag to tests
* fix tests
* lint
Co-authored-by: Matthew Deng <matthew.j.deng@gmail.com>
2021-08-20 08:31:21 -07:00
simonsays1980
60aee4a330
[RLlib] Add example script for bare metal Policy with custom view_requirements. ( #17896 )
2021-08-20 12:17:13 +02:00
Jingyu-Peng
40330ca439
Fix loading dynamic functions/classes when using code_search_path ( #17605 )
2021-08-20 17:24:11 +08:00
Antoni Baum
0a1228ef6e
Add configurable autosuspend for connect tests ( #17958 )
2021-08-20 10:57:41 +02:00
Sven Mika
8248ba531b
[RLlib] Redo #17410 : Example script: Remote worker envs with inference done on main node. ( #17960 )
2021-08-20 08:02:18 +02:00
Eric Liang
236b772465
Revert "[GCS] GCS Based Actor Scheduler ( #16580 )" ( #17941 )
...
This reverts commit a9b4545502.
2021-08-19 21:46:52 -07:00
Eric Liang
661ac4e37b
Remove last traces of ref-counting flag ( #17932 )
2021-08-19 21:08:13 -07:00
architkulkarni
36c26578a7
[runtime env] [test] Add nightly test to verify Ray wheel URLs are valid ( #17938 )
2021-08-19 15:48:37 -07:00
Chen Shen
a16a25852a
[Core] fix event race condition ( #17947 )
2021-08-19 14:20:34 -07:00
matthewdeng
d081ee9d87
[SGD v2] Save checkpoints to disk ( #17807 )
...
* [SGD] save checkpoints to disk
* fix test; add logs
* rename log_dir to logdir for consistency with tune
* address comments: add run level directories, add CheckpointConfig
* check for empty strings
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
* address comments - refactor CheckpointStrategy, remove run_dir and checkpoint_dir configurability
* fix Trainer docs
* Update python/ray/util/sgd/v2/checkpoint.py
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
* remove construct_path_with_default
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-08-19 14:18:51 -07:00
Sven Mika
a2d96c513a
[RLlib] Expand machine for nightly multi-gpu learning tests. ( #17955 )
2021-08-19 22:27:30 +02:00
Eric Liang
238941f857
Ray workflow comparison examples + add to tests ( #17880 )
2021-08-19 12:19:08 -07:00
architkulkarni
5ed3f0ce35
[Serve] [Dashboard] Add end times and DELETED state for endpoints ( #17898 )
2021-08-19 11:10:42 -05:00
Kai Fricke
21d90a0e9a
Increase disk for serve tests ( #17606 )
2021-08-19 17:51:19 +02:00
Kai Fricke
651aae76b9
[release] Ask for configuration in buildkite ( #17948 )
2021-08-19 17:51:05 +02:00
Alex Wu
318ba6fae0
Revert "[RLlib] Add example script for how to have n remote (parallel) envs with inference happening on "main" (possibly GPU) node. ( #17410 )" ( #17951 )
...
This reverts commit 8fc16b9a18.
2021-08-19 07:55:10 -07:00
Kai Fricke
622f724f61
Update release process ( #17888 )
2021-08-19 13:34:51 +02:00
souravraha
f5fcb3c576
Fixes bug #17424 . ( #17437 )
2021-08-19 12:23:36 +02:00
Sven Mika
8fc16b9a18
[RLlib] Add example script for how to have n remote (parallel) envs with inference happening on "main" (possibly GPU) node. ( #17410 )
2021-08-19 12:14:50 +02:00
Kai Fricke
0eee355d2e
Terminate session instead of stop ( #17946 )
2021-08-19 10:26:59 +02:00
Alex Wu
497446063c
[hotfix] Fix test owners lint ( #17945 )
...
Co-authored-by: Alex <alex@anyscale.com>
2021-08-18 23:41:58 -07:00
Chong-Li
5e22257cec
[GCS] Fix: GCS Based Actor Scheduler ( #17944 )
2021-08-18 23:40:35 -07:00
Clark Zinzow
d958457d07
[Core] Second pass at privatizing APIs. ( #17885 )
...
* gcs_utils
* resource_spec
* profiling
* ray_perf and ray_cluster_perf
* test_utils
2021-08-18 20:56:33 -07:00
architkulkarni
4c6a695dab
[Doc] Runtime env docstring fix monospace formatting ( #17929 )
2021-08-18 20:53:41 -07:00
Simon Mo
b573864928
[CI] Add test owners ( #17893 )
2021-08-18 18:38:31 -07:00
Eric Liang
a9073d16f4
Revert "[Core] Unified worker initiators ( #17401 )" ( #17935 )
...
This reverts commit c3764ffd7d.
2021-08-18 18:06:24 -07:00
Chen Shen
89d83228f6
[Core][Plasma-store] add stats-collector that eagerly collect stats
2021-08-18 13:47:50 -07:00
Chong-Li
a9b4545502
[GCS] GCS Based Actor Scheduler ( #16580 )
2021-08-18 13:44:59 -07:00
Clark Zinzow
e2c7706f76
Add support for an app config override to the release test script, allowing better integration with compile-on-product. ( #17913 )
2021-08-18 13:35:27 -07:00
Yi Cheng
ddc2e59af5
[workflow] Simplify the workflow storage layer ( #17883 )
2021-08-18 13:26:50 -07:00
Kai Fricke
bf3eaa9264
[RLlib] Dreamer fixes and reinstate Dreamer test. ( #17821 )
...
Co-authored-by: sven1977 <svenmika1977@gmail.com>
2021-08-18 18:47:08 +02:00
architkulkarni
6e8ff30de4
[Doc] [runtime env] Add note to install ray[default] ( #17869 )
2021-08-18 10:57:45 -05:00
Simon Mo
8fe970f4e7
[Buildkite] Cleanup test wheel environment ( #17912 )
...
The macOS builders are shared and reused across commits.
@clarkzinzow found a bug that the installed version of the wheel
is not the one in the PR. This should fix it.
https://buildkite.com/ray-project/ray-builders-pr/builds/11628#be6c5fd6-14a2-449c-8f35-e3382a6ee647
2021-08-18 08:32:35 -07:00
Sven Mika
a428f10ebe
[RLlib] Add multi-GPU learning tests to nightly. ( #17778 )
2021-08-18 17:21:01 +02:00
architkulkarni
7e109a3266
[hotfix] [runtime env] change MacOS wheel URL from 10_13 to 10_15 ( #17902 )
2021-08-18 09:16:09 +02:00
Holden Karau
b9dae93bfa
Add ephemeral-storage: 1Gi requests but no limits. ( #17854 )
...
* Add ephemeral-storage: 1Gi requests but no limits. This is useful when scheduling in a storage constrained env since ray assumes it has ephemeral storage to use.
* Add ephemeral-storage: 1Gi to b/deploy/charts/ray/templates/operator_cluster_scoped.yaml b/deploy/charts/ray/templates/operator_namespaced.yaml
2021-08-17 21:10:39 -04:00
Eric Liang
5536c5fff6
Add namespace argument to Ray client get actor call ( #17878 )
2021-08-17 16:41:18 -07:00
Richard Liaw
c2c855b38b
Add codeowners for setup.py ( #17884 )
...
* add-czar
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
* setup
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
2021-08-17 16:29:32 -07:00
Chen Shen
880797d5c2
[Core][Test] Add ubsan support for C++ tests ( #17812 )
...
* support ubsan
* update
2021-08-17 10:22:03 -07:00
SangBin Cho
4971e13941
[Build] Asan wheel test ( #17685 )
...
* in progress
* ASAN tests.
* d
* in progress
* in progress without the asan wheel
* Support the asan wheel.
* Support the asan wheels
* Not build a binary for asan
* Fix issues
* Remove a wrong build
* Separate out asan wheel build
* Try preparing more deps.
* ip
* Try different version
* done
* d
* Trial
* Another try
* Another try
* skip cpp build to see what happens
* add more des
* ip
* abc
* Try next
* completed
* try
* Try without static libasan
* dbg
* Try static link
* Fix issues
* abc
2021-08-17 10:21:41 -07:00
Sven Mika
f18213712f
[RLlib] Redo: "fix self play example scripts" PR (17566) ( #17895 )
...
* wip.
* wip.
* wip.
* wip.
* wip.
* wip.
* wip.
* wip.
* wip.
2021-08-17 09:13:35 -07:00
Antoni Baum
2b7d907762
Print description in --help ( #17871 )
2021-08-17 17:29:01 +02:00
Hasan Genc
adc0c47b4f
Shutdown clusters on AWS with >1000 nodes ( #17841 )
...
* Revert "Revert "Shutdown clusters when large number of nodes (#17642 )" (#17836 )"
This reverts commit 6957ce66f6.
* Update unit test and fix terminate_nodes
2021-08-17 16:26:10 +03:00
Chris Bamford
58a73821fb
[RLlib] IMPALA sample throughput calculation and full queue slowdown fixes ( #17822 )
2021-08-17 14:01:41 +02:00
chenk008
c3764ffd7d
[Core] Unified worker initiators ( #17401 )
...
* use setup_worker as starter
* use setup_worker as starter
* add java test
* fix
* fix
* lint
* sleep in ci
* sleep in ci
* fix ut
* fix
* fix
* fix
* fix
* fix
* fix
* change test size
* test
* fix
* fix
* fix ut
* restore sgd test
* change test size
* fix merge conflict
* restore cpp worker flag
* fix
* fix
* add worker-languange in setup_runtime_env.py
* lint
* fix java command
Co-authored-by: root <chenk008>
2021-08-17 19:37:26 +08:00
simonsays1980
7b33dc21dc
[RLlib] Fix update model view requirements from init state for bare-metal policies with custom view-reqs. ( #17867 )
...
* Changed '_update_model_view_requirements_from_init_state()' to adopt the 'shift' in view_requirements from a user-defined policy that inherits directly from Policy.
* Added a slightly modified version of Sven's suggestion. This way, any user-defined attributes of the state's ViewRequirement are preserved.
* I saw that the code in _update_model_view_requirements_from_init_state() had changed and is not identical to my locally installed version. In the new version view_requirements from the model and the policy get united and therefore a loop runs through this unified list. Code should run now in the present version
* Apply suggestions from code review
2021-08-17 11:49:24 +02:00
gjoliver
1dbe7fc26a
[RLlib] Config dict should use true instead of True in docs/examples. ( #17889 )
2021-08-17 11:46:10 +02:00