SangBin Cho
666fcde8ca
[Placement group] Input validation ( #11152 )
...
* Add a basic input validation.
* Addressed code review.
2020-10-14 13:56:41 -07:00
Ameer Haj Ali
a10e36ca04
Make the logging of gc.collect() freed refs appear in DEBUG not INFO ( #11353 )
2020-10-14 13:14:35 -07:00
Alex Wu
7466ce82df
[Autoscaler] Placement group autoscaling ( #11243 )
2020-10-14 13:11:46 -07:00
Eric Liang
aefcf901d3
[docs] Add sklearn integration link
2020-10-14 13:07:23 -07:00
SangBin Cho
b1481c6acf
Revert "[PlacementGroup]Add node manager test framework ( #11174 )" ( #11398 )
...
This reverts commit 241e765d3a.
2020-10-14 11:09:20 -07:00
Lingxuan Zuo
149ec5f6bf
[Log] dump stacktrace from glog lib ( #11360 )
...
* dump stacktrace from glog lib
* fix windows compile
* add comments for getcallstack
2020-10-14 10:52:12 -07:00
Kai Yang
abc6126814
[Java] Release actor instance reference when Ray.exitActor() is invoked ( #11324 )
2020-10-14 13:12:59 +08:00
fangfengbin
c926838411
[GCS]Fix GcsActorManagerTest multithreading bug ( #11361 )
...
Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-10-13 21:36:40 -07:00
Simon Mo
5637093f44
Add Serve load testing tool to long running test yaml ( #11386 )
2020-10-13 20:24:57 -07:00
Simon Mo
866193b01c
Fix cluster yaml for serve benchmarks ( #11383 )
...
- Separate out single node and multiple node yamls
- Remove cluster_synced_files, somehow it breaks for me
2020-10-13 19:30:18 -07:00
fangfengbin
241e765d3a
[PlacementGroup]Add node manager test framework ( #11174 )
...
* add part code
* add part code
* add part code
* add part code
* add part code
* add part code
* fix ut bug
* fix ut bug
Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-10-13 19:27:11 -07:00
Max Fitton
cd9dcfca0d
[Dashboard] CPU/GPU usage details in actor pane ( #11269 )
2020-10-13 20:23:23 -05:00
Amog Kamsetty
933cf6675c
[Tune] Changes for Pytorch Lightning 1.0 ( #11375 )
2020-10-13 15:50:11 -07:00
Sven Mika
a6a94d3206
[RLlib] Fix test_env_with_subprocess.py. ( #11356 )
2020-10-13 12:42:20 -07:00
J Seppänen
63fa0a53a3
[k8s] Fix kubernetes cloud cluster example configuration ( #11364 )
2020-10-13 12:28:55 -07:00
Ian Rodney
84617f6ff6
[docker] Script for quickly fixing all Latest images ( #11351 )
2020-10-13 09:36:40 -07:00
Simon Mo
39e809fa03
Update microbenchmark script to use Python 3.8 wheel ( #11357 )
2020-10-13 09:27:52 -07:00
fangfengbin
0c02427da2
[GCS]Eviction of destroyed actors cached in GCS ( #11338 )
2020-10-13 15:34:35 +08:00
Lingxuan Zuo
c84a9b457c
[Streaming] add barrier helper tests ( #11107 )
2020-10-13 09:55:55 +08:00
Ian Rodney
6426fb3fff
[CI] Fix-Up Docker Build (Use Python) ( #11139 )
2020-10-12 14:22:51 -07:00
Sven Mika
1ebcdf236f
[RLlib] Add support for custom MultiActionDistributions. ( #11311 )
2020-10-12 13:50:43 -07:00
Sven Mika
0c0f67c14d
[RLlib] ARS/ES eval workers not working: Issue 9933. ( #11308 )
2020-10-12 13:49:48 -07:00
Sven Mika
8ea1bc5ff9
[RLlib] Allow for more than 2^31 policy timesteps. ( #11301 )
2020-10-12 13:49:11 -07:00
Sven Mika
f5e2cda68a
[RLlib] SAC: log_alpha not being learnt when on GPU. ( #11298 )
2020-10-12 13:48:44 -07:00
Julius Frost
7dcfd258cd
[RLlib] Assert LongTensor in SAC Discrete PyTorch ( #11245 )
2020-10-12 13:47:21 -07:00
Sven Mika
580820a530
[RLlib] Create ci/rllib_tests and organize a little ( #11342 )
2020-10-12 12:05:09 -07:00
SangBin Cho
c107eea551
[Core] Do not report stats when worker is already dead. ( #11167 )
...
* Fix.
* Addressed code review.
* Done.
2020-10-12 11:57:04 -07:00
SangBin Cho
56f69543d0
Try to deflake test_failure ( #11293 )
2020-10-12 12:03:36 -05:00
Ameer Haj Ali
06fe690682
[autoscaler] Limit max launch concurrency per node type ( #11242 )
2020-10-12 09:45:52 -07:00
Sumanth Ratna
92a58aabce
[tune][docs] Fix learning rate bounds in FAQ ( #11345 )
2020-10-12 09:44:53 -07:00
Alex Wu
175fc41fbc
[Autoscaler] Account for resource backlog size ( #11261 )
2020-10-12 09:43:48 -07:00
Sven Mika
d3bc20b727
[RLlib] ConvTranspose2D module ( #11231 )
2020-10-12 15:00:42 +02:00
fangfengbin
d1579819e9
[GCS]Eviction of dead nodes cached in GCS ( #11323 )
2020-10-12 15:54:32 +08:00
fangfengbin
31117b5e96
[GCS]Add job id to log ( #11331 )
2020-10-12 13:53:08 +08:00
Simon Mo
0d09a17c64
Skip set_result if the future is done ( #11256 )
2020-10-11 22:33:58 -07:00
Alex V. Kotlar
f9a29a6d26
[docs] Fix pip install commands ( #11326 )
2020-10-11 22:12:18 -07:00
Sven Mika
957877ad3f
Tf version of VisionNet (ray/rllib/model/tf/vision_net.py) crashes iff len(conv-filters)=1. ( #11330 )
2020-10-11 12:49:47 +02:00
Richard Liaw
56f858ed1a
[tune][docs/util] gputil check, docs ( #11260 )
...
Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>
2020-10-10 00:54:31 -07:00
fyrestone
defd41aad7
[Dashboard] http route handler cache ( #10921 )
...
* Add aiohttp_cache to dashboard
* Add comments; Refine code
* Keep NODE_STATS_UPDATE_INTERVAL_SECONDS 1 second; Change AIOHTTP_CACHE_TTL_SECONDS to 2 seconds
* Update merge
Co-authored-by: 刘宝 <po.lb@antfin.com>
2020-10-09 22:27:05 -07:00
SangBin Cho
9dd4561d1b
[Placement Group] Fix stress tests to pass when actors are scheduled. ( #11151 )
...
* Fix stress tests to pass when actors are created.
* Addressed code review.
2020-10-09 21:52:26 -07:00
chaokunyang
0737e78445
[Java] upgrade common-collections version ( #10613 )
2020-10-10 11:16:12 +08:00
Gekho457
48db6f8858
[autoscaler/k8s] namespace permissions problem ( #11270 )
2020-10-09 19:22:20 -05:00
Gekho457
92b4059cad
Replace read_namespaced_pod_status with read_namespaced_pod ( #11278 )
2020-10-09 19:21:39 -05:00
Ian Rodney
5ef1784024
[Autoscaler] Fix sdk ( #11314 )
...
* Use
* [Hotfix] Make Optional[str] default to None
* Fix TempFile
* context manager (with statement)
* use throughout
* drop try/finally
2020-10-09 12:34:29 -07:00
fangfengbin
3eb2b9e216
[GCS]Random eviction of destroyed actors cached in GCS ( #11189 )
...
* add part code
* fix lint error
Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-10-09 11:54:47 -07:00
fangfengbin
ca36105d77
[TEST]Fix TestActorSubscribeAll bug ( #11297 )
...
Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-10-09 11:54:27 -07:00
Sumanth Ratna
40071739dc
[docs] fix version warning banner location ( #11286 )
2020-10-08 21:21:42 -07:00
Kai Fricke
b450cb030a
[tune] reuse actors for function API ( #11230 )
...
Co-authored-by: Kristian Hartikainen <kristian.hartikainen@gmail.com>
2020-10-08 16:15:02 -07:00
Thomas Tumiel
587319debc
[tune] move _SCHEDULERS to tune.schedulers and add all available schedulers ( #11218 )
...
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-10-08 16:10:23 -07:00
SangBin Cho
6cb00208f7
[Placement Group] Export bundle reservation check method only once. ( #11153 )
...
* Export bundle reservation check method only once.
* Addressed code review.
2020-10-08 16:08:28 -07:00