Simon Mo
a024effac7
[Serve] Add InMemoryMetricsStore for Autoscaling ( #18458 )
2021-09-16 11:08:42 -07:00
Simon Mo
317a34c523
[Serve] Use BackendConfig Protobuf ( #17835 )
2021-09-16 11:08:23 -07:00
Jiao
ca3be60291
[Release] change headnode type for serve benchmark ( #18672 )
...
Co-authored-by: Jiao Dong <jiaodong@anyscale.com>
2021-09-16 10:57:36 -07:00
Sven Mika
ba1c489b79
[RLlib Testing] Lower --smoke-test "time_total_s" to make sure it doesn't time out. ( #18670 )
2021-09-16 18:22:23 +02:00
Edward Oakes
e7ea1f9a82
[runtime_env] Remove global logger from working_dir code ( #18605 )
2021-09-16 10:37:45 -05:00
Guyang Song
187e4a86ca
[C++ API] expose C++ task failure event ( #18596 )
2021-09-16 19:20:16 +08:00
Jernej Makovsek
b5c5247ad4
Update example yaml file for running local clusters ( #18530 )
2021-09-16 02:24:45 -07:00
Sasha Sobol
2f0e22aa4e
prioritize non-gpu nodes when scheduling CPU-only requests ( #18615 )
2021-09-16 09:57:24 +01:00
gjoliver
df32ed35fd
Extend --smoke-test deadlines for learning and stress regression tests. ( #18667 )
2021-09-16 09:18:39 +01:00
DK.Pino
99043e5045
[Hotfix] [Issue template] Fix the yaml grammar in feature request issue template ( #18624 )
2021-09-15 23:01:48 -07:00
xwjiang2010
ea48b1227f
[Tune] Do not crash when resources are insufficient. ( #18611 )
2021-09-15 23:00:53 -07:00
Stephanie Wang
be7cb70c30
[core] Fix ref counting during actor construction ( #18646 )
...
* test
* fix
* cpp
* skip windows
Co-authored-by: Eric Liang <ekhliang@gmail.com>
2021-09-15 22:16:53 -07:00
liuyang-my
ed04ab7140
Define protobuf for RequestMetadata and HTTPRequestWrapper ( #18203 )
2021-09-15 14:39:27 -07:00
Chris K. W
7df3441ae9
[client] Fix credential generation when secure=True but no credentials provided ( #18636 )
...
* set self._credentials if not provided
* fix credential generation
2021-09-16 00:37:33 +03:00
Sven Mika
8a72824c63
[RLlib Testing] Split and unflake more CI tests (make sure all jobs are < 30min). ( #18591 )
2021-09-15 22:16:48 +02:00
Chen Shen
28c9c1fd98
fix windows pg test by skipping ( #18649 )
2021-09-15 11:39:13 -07:00
Antoni Baum
7e95f330d5
[ci] Fix xgboost_ray install from git ( #18640 )
2021-09-15 18:07:15 +01:00
Antoni Baum
d50ff16ccf
[ci] Fix HEBO breaking Tune tests ( #18629 )
2021-09-15 10:01:29 -07:00
Kai Fricke
0223ae9605
[xgboost] Bump xgboost_ray requirements_upstream.txt version to 0.1.3 ( #18632 )
2021-09-15 18:01:15 +01:00
Edward Oakes
7736cdd91d
[dashboard] Rename "new_dashboard" -> "dashboard" ( #18214 )
2021-09-15 11:17:15 -05:00
Edward Oakes
7d0a2b39e3
[runtime_env] Remove dynamically imported setup_hook ( #18601 )
2021-09-15 10:19:55 -05:00
Antoni Baum
eeb67a42cc
pip install xgboost_ray -> xgboost_ray[default] ( #18607 )
...
Co-authored-by: Kai Fricke <kai@anyscale.com>
2021-09-15 14:45:56 +01:00
Kai Fricke
15a83d104d
[ci/release] remove legacy release tests ( #18592 )
2021-09-15 14:42:58 +01:00
Kai Fricke
c186253fc5
[github] fix feature request template ( #18627 )
2021-09-15 11:33:19 +01:00
DK.Pino
9d41aafcce
Adapt GitHub new issue template ( #18516 )
2021-09-15 00:57:57 -07:00
Sven Mika
8a00154038
[RLlib] Bump tf version in ML docker to tf==2.5.0; add tfp to ML-docker. ( #18544 )
2021-09-15 08:46:37 +02:00
Sven Mika
c5d20849ae
[RLlib] Rename rllib rollout into rllib evaluate (backward compatible) to match Trainer API. ( #18467 )
2021-09-15 08:45:17 +02:00
qicosmos
d7c631209b
[C++ Worker]Add api get placement group ( #18535 )
2021-09-15 14:11:31 +08:00
qicosmos
15881acffd
[C++ Worker]Update cpp worker doc ( #18537 )
2021-09-15 14:11:17 +08:00
Simon Mo
497c5f56fa
[CI] Temporarily disable worker-in-container test ( #18606 )
...
* revert again
* disable tmp
2021-09-14 22:38:20 -07:00
SangBin Cho
0684531e22
[Test] Break down placement group tests ( #18612 )
2021-09-14 21:55:18 -07:00
SangBin Cho
b8c361d3fb
[Test] Mark app config failure as an infra failure ( #18614 )
2021-09-14 17:20:05 -07:00
Eric Liang
d1f348cd9d
[RFC] Split the list of libraries into ML vs production
2021-09-14 16:32:07 -07:00
Chris K. W
cc1d7b8174
[client] Refactors for Reconnect PR ( #18484 )
...
* add refactors
* add worker annotation
* Regenerate credentials by default
* use self._secure
* infer secure if credentials provided
* separate _shutdown
2021-09-14 16:13:35 -07:00
Eric Liang
15512c27c2
Revert "Revert "Route core worker ERROR/FATAL logs to driver logs (#1… ( #18604 )
2021-09-14 13:32:07 -07:00
SangBin Cho
31e1638fb3
[CLI] Improve ray status for placement groups ( #18289 )
2021-09-14 11:29:13 -07:00
Stephanie Wang
344f2d9073
[core] Fix race condition in distributed ref counting ( #18584 )
2021-09-14 11:02:59 -07:00
Kai Fricke
c8188ea70e
[ci/rllib] wait for stress test cluster ( #18603 )
2021-09-14 19:01:22 +01:00
Kai Fricke
6777e24293
[ci] Add release test owner overview file ( #18590 )
2021-09-14 11:00:31 -07:00
Sven Mika
08c09737fa
[RLlib] Fix R2D2 (torch) multi-GPU issue. ( #18550 )
2021-09-14 19:58:10 +02:00
SangBin Cho
51d94ebee0
[Tests] Make nightly test work + Remove work stealing logs ( #18300 )
...
* make tests work
* .
2021-09-14 09:52:58 -07:00
Edward Oakes
644f7bd7fa
[runtime_env] Remove no-longer-used mock setup function ( #18600 )
2021-09-14 11:35:09 -05:00
matthewdeng
380a653787
[SGD] update SGDv2 user guide docs ( #18270 )
...
* [SGD] update SGDv2 user guide docs
* Update doc/source/raysgd/v2/user_guide.rst
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
* add new line
* update docs
* fix header line length
* lint
* lint
* lint
* lint
* fix remaining lint issues
* Update doc/source/raysgd/v2/user_guide.rst
Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>
* Update doc/source/raysgd/v2/user_guide.rst
Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>
* address comments
* address comments
* add TODO for iterator API
* Update doc/source/raysgd/v2/user_guide.rst
Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>
* address comments
* address comments
* add tune doc
* restructure table of contents
* add examples; rename example files to include example suffix
* add quick start, porting code
* address comments
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>
2021-09-14 09:07:25 -07:00
mwtian
a3f399ef10
[Client] fix propagating errors to async calls during disconnect, and other cleanup ( #18539 )
...
* cleanup tests and errors for clients
* Fix lock and async get
* rerun
* Avoid running callback under lock. Make lock non-reentrant
* Add all necessary apis
* Removed unused APIs
2021-09-14 18:48:27 +03:00
Edward Oakes
7f8cdce67d
Revert "Route core worker ERROR/FATAL logs to driver logs ( #18577 )" ( #18602 )
...
This reverts commit 3e0ae38e11.
2021-09-14 10:41:10 -05:00
Antoni Baum
65d5deae60
[tests] Increase golden notebook test timeout to 20 mins ( #18554 )
2021-09-14 16:27:56 +01:00
Jiao
d3734d803d
[serve] Change nightly test docker image and enable micro benchmark ( #18566 )
2021-09-14 09:41:21 -05:00
Jiao
18bbf044a7
[serve] Add reconfigure with exception test and ensure it can rollback ( #18568 )
2021-09-14 08:39:46 -05:00
Kai Fricke
e4754f1e19
[ci] wheel URLs - give some time for wheels to be built ( #18505 )
2021-09-14 09:56:34 +01:00
Ameer Haj Ali
e6807ecb43
Change tests owners for ml tests ( #18417 )
2021-09-14 01:04:52 -07:00