Guyang Song
|
89ce8a3a02
|
support 'CustomFields' tooltip in dashboard (#18698)
|
2021-09-17 17:48:32 +08:00 |
|
architkulkarni
|
a9cce8a34b
|
[serve] Add basic calculate_desired_num_replicas function for autoscaling (#18658)
|
2021-09-17 00:18:51 -07:00 |
|
Qing Wang
|
11291029b1
|
Add Codeowners for Java API. (#18663)
|
2021-09-17 14:48:55 +08:00 |
|
Simon Mo
|
3029812b8b
|
[Serve] Autoscaling metric store take 2 (#18683)
|
2021-09-16 22:28:13 -07:00 |
|
qicosmos
|
4af3d86d8a
|
Remove abi flag (#18538)
|
2021-09-17 12:13:56 +08:00 |
|
Eric Liang
|
c9ca980c83
|
Check dataset pipeline is not read multiple times by accident (#18682)
|
2021-09-16 20:33:24 -07:00 |
|
Amog Kamsetty
|
84e958f330
|
[ML] Consolidate and upgrade Deep Learning Dependencies (#18574)
* wip
* upgrade requirements
* add file
* fix
* fixes
* Apply suggestions from code review
Try mlagents==0.21.0 for now (works with torch 1.9).
* Apply suggestions from code review
* wip
* wip
* fix
* fix
* upgrade lightning bolts
* address comment
Co-authored-by: Sven Mika <sven@anyscale.io>
|
2021-09-16 20:16:40 -07:00 |
|
DK.Pino
|
12b3b1f723
|
[core] Log resource name not id (#18598)
|
2021-09-16 16:28:09 -07:00 |
|
Amog Kamsetty
|
de050e8187
|
[SGD] v2 Class API (#18571)
* wip
* wip
* add horovod example
* add example
* lint
* fix
* address comments
* updates
* lint
* update example
* address comment
* address comment
* update
* fix
* Update python/ray/util/sgd/v2/examples/horovod/horovod_stateful_example.py
Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
* address comments
* add back name mangling
* fix tests
* Update python/ray/util/sgd/v2/trainer.py
* fix
* lint
* fix
* fix docstring
* Update python/ray/util/sgd/v2/tests/test_trainer.py
Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
* update
Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
|
2021-09-16 12:33:38 -07:00 |
|
Simon Mo
|
eeaae5aa08
|
Revert "[Serve] Add InMemoryMetricsStore for Autoscaling (#18458)" (#18675)
This reverts commit a024effac7.
|
2021-09-16 11:37:31 -07:00 |
|
Simon Mo
|
a024effac7
|
[Serve] Add InMemoryMetricsStore for Autoscaling (#18458)
|
2021-09-16 11:08:42 -07:00 |
|
Simon Mo
|
317a34c523
|
[Serve] Use BackendConfig Protobuf (#17835)
|
2021-09-16 11:08:23 -07:00 |
|
Jiao
|
ca3be60291
|
[Releaes] change headnode type for serve benchmark (#18672)
Co-authored-by: Jiao Dong <jiaodong@anyscale.com>
|
2021-09-16 10:57:36 -07:00 |
|
Sven Mika
|
ba1c489b79
|
[RLlib Testing] Lower --smoke-test "time_total_s" to make sure it doesn't time out. (#18670)
|
2021-09-16 18:22:23 +02:00 |
|
Edward Oakes
|
e7ea1f9a82
|
[runtime_env] Remove global logger from working_dir code (#18605)
|
2021-09-16 10:37:45 -05:00 |
|
Guyang Song
|
187e4a86ca
|
[C++ API] expose C++ task failure event (#18596)
|
2021-09-16 19:20:16 +08:00 |
|
Jernej Makovsek
|
b5c5247ad4
|
Update example yaml file for running local clusters (#18530)
|
2021-09-16 02:24:45 -07:00 |
|
Sasha Sobol
|
2f0e22aa4e
|
prioritize non-gpu nodes when scheduling CPU-only requests (#18615)
|
2021-09-16 09:57:24 +01:00 |
|
gjoliver
|
df32ed35fd
|
Extend --smoke-test deadlines for learning and stress regression tests. (#18667)
|
2021-09-16 09:18:39 +01:00 |
|
DK.Pino
|
99043e5045
|
[Hotfix] [Issue template] Fix the yaml grammer in feature request issue template (#18624)
|
2021-09-15 23:01:48 -07:00 |
|
xwjiang2010
|
ea48b1227f
|
[Tune] Do not crash when resources are insufficient. (#18611)
|
2021-09-15 23:00:53 -07:00 |
|
Stephanie Wang
|
be7cb70c30
|
[core] Fix ref counting during actor construction (#18646)
* test
* fix
* cpp
* skip windows
Co-authored-by: Eric Liang <ekhliang@gmail.com>
|
2021-09-15 22:16:53 -07:00 |
|
liuyang-my
|
ed04ab7140
|
Define protobuf for RequestMetadata and HTTPRequestWrapper (#18203)
|
2021-09-15 14:39:27 -07:00 |
|
Chris K. W
|
7df3441ae9
|
[client] Fix credential generation when secure=True but no credentials provided (#18636)
* set self._credentials if not provided
* fix credential generation
|
2021-09-16 00:37:33 +03:00 |
|
Sven Mika
|
8a72824c63
|
[RLlib Testig] Split and unflake more CI tests (make sure all jobs are < 30min). (#18591)
|
2021-09-15 22:16:48 +02:00 |
|
Chen Shen
|
28c9c1fd98
|
fix windows pg test by skipping (#18649)
|
2021-09-15 11:39:13 -07:00 |
|
Antoni Baum
|
7e95f330d5
|
[ci] Fix xgboost_ray install from git (#18640)
|
2021-09-15 18:07:15 +01:00 |
|
Antoni Baum
|
d50ff16ccf
|
[ci] Fix HEBO breaking Tune tests (#18629)
|
2021-09-15 10:01:29 -07:00 |
|
Kai Fricke
|
0223ae9605
|
[xgboost] Bump xgboost_ray requirements_upstream.txt version to 0.1.3 (#18632)
|
2021-09-15 18:01:15 +01:00 |
|
Edward Oakes
|
7736cdd91d
|
[dashboard] Rename "new_dashboard" -> "dashboard" (#18214)
|
2021-09-15 11:17:15 -05:00 |
|
Edward Oakes
|
7d0a2b39e3
|
[runtime_env] Remove dynamically imported setup_hook (#18601)
|
2021-09-15 10:19:55 -05:00 |
|
Antoni Baum
|
eeb67a42cc
|
pip install xgboost_ray -> xgboost_ray[default] (#18607)
Co-authored-by: Kai Fricke <kai@anyscale.com>
|
2021-09-15 14:45:56 +01:00 |
|
Kai Fricke
|
15a83d104d
|
[ci/release] remove legacy release tests (#18592)
|
2021-09-15 14:42:58 +01:00 |
|
Kai Fricke
|
c186253fc5
|
[github] fix feature request template (#18627)
|
2021-09-15 11:33:19 +01:00 |
|
DK.Pino
|
9d41aafcce
|
Adapt GitHub new issue template (#18516)
|
2021-09-15 00:57:57 -07:00 |
|
Sven Mika
|
8a00154038
|
[RLlib] Bump tf version in ML docker to tf==2.5.0; add tfp to ML-docker. (#18544)
|
2021-09-15 08:46:37 +02:00 |
|
Sven Mika
|
c5d20849ae
|
[RLlib] Rename rllib rollout into rllib evaluate (backward compatible) to match Trainer API. (#18467)
|
2021-09-15 08:45:17 +02:00 |
|
qicosmos
|
d7c631209b
|
[C++ Worker]Add api get placement group (#18535)
|
2021-09-15 14:11:31 +08:00 |
|
qicosmos
|
15881acffd
|
[C++ Worker]Update cpp worker doc (#18537)
|
2021-09-15 14:11:17 +08:00 |
|
Simon Mo
|
497c5f56fa
|
[CI] Temporary disable worker-in-container test (#18606)
* revert again
* disable tmp
|
2021-09-14 22:38:20 -07:00 |
|
SangBin Cho
|
0684531e22
|
[Test] Break down placement group tests (#18612)
|
2021-09-14 21:55:18 -07:00 |
|
SangBin Cho
|
b8c361d3fb
|
[Test] Mark app config failure as a infra failure (#18614)
|
2021-09-14 17:20:05 -07:00 |
|
Eric Liang
|
d1f348cd9d
|
[RFC] Split the list of libraries into ML vs production
|
2021-09-14 16:32:07 -07:00 |
|
Chris K. W
|
cc1d7b8174
|
[client] Refactors for Reconnect PR (#18484)
* add refactors
* add worker annotation
* Regenerate credentials by default
* use self._secure
* infer secure if credentials provided
* separate _shutdown
|
2021-09-14 16:13:35 -07:00 |
|
Eric Liang
|
15512c27c2
|
Revert "Revert "Route core worker ERROR/FATAL logs to driver logs (#1… (#18604)
|
2021-09-14 13:32:07 -07:00 |
|
SangBin Cho
|
31e1638fb3
|
[CLI] Improve ray status for placement groups (#18289)
|
2021-09-14 11:29:13 -07:00 |
|
Stephanie Wang
|
344f2d9073
|
[core] Fix race condition in distributed ref counting (#18584)
|
2021-09-14 11:02:59 -07:00 |
|
Kai Fricke
|
c8188ea70e
|
[ci/rllib] wait for stress test cluster (#18603)
|
2021-09-14 19:01:22 +01:00 |
|
Kai Fricke
|
6777e24293
|
[ci] Add release test owner overview file (#18590)
|
2021-09-14 11:00:31 -07:00 |
|
Sven Mika
|
08c09737fa
|
[RLlib] Fix R2D2 (torch) multi-GPU issue. (#18550)
|
2021-09-14 19:58:10 +02:00 |
|