Eric Liang
380df89069
Lazily initialize the global state accessor in Python workers ( #12054 )
...
* wip
* fix
* fix
2020-11-16 21:35:12 -08:00
Max Fitton
90574b66cc
pin aiohttp to the 3.x.x version ( #12051 )
2020-11-16 21:54:16 -05:00
Richard Liaw
51d277f2e4
[tests] fix mock for test_cli ( #12055 )
...
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
2020-11-16 18:44:15 -08:00
Stephanie Wang
c49554fb7a
Abstract plasma store creation request queue ( #12039 )
2020-11-16 17:09:15 -08:00
Kai Fricke
9f5986ee58
[tune] logger migration to ExperimentLogger classes ( #11984 )
2020-11-16 15:08:37 -08:00
Alan Guo
3dc68533a9
make some private rsync, and exec_cluster arguments public ( #11958 )
...
* make some private rsync, and exec_cluster arguments public
* fix format issue
* undo make all_nodes public
2020-11-16 14:31:41 -08:00
Ameer Haj Ali
8d599bb3f5
[autoscaler] Move fill out resources to bootstrap config to cache the resources and avoid expensive boto3 calls ( #12028 )
2020-11-16 13:28:57 -08:00
fyrestone
0c6bb745cd
Fix dashboard agent using incorrect IP ( #12038 )
2020-11-16 14:02:20 -06:00
SangBin Cho
f56d7c1a76
[Logging] Remove per worker job log file / support worker log rotation ( #11927 )
...
* In progress.
* MVP done.
* In Progress.
* Remove unnecessary code.
* Fix some issues.
* Fix test failures.
* Addressed code review + fix object spilling test failure.
2020-11-16 11:29:43 -08:00
Kai Fricke
8609e2dd90
[tune] refactor verbosity levels ( #11767 )
...
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-11-16 10:32:53 -08:00
Keqiu Hu
a50128079d
[tune/placement group] dist. training placement group support ( #11934 )
...
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-11-16 01:11:39 -08:00
fangfengbin
8fb926565c
[Placement Group] Placement Group supports GCS failover (Part 1) ( #11933 )
2020-11-16 14:42:56 +08:00
dHannasch
d35de2272d
[Core] Allow redis.ResponseError instead of redis.AuthenticationError ( #12024 )
...
* redis.ResponseError
* there really is no way to make this look good, is there
2020-11-15 15:04:56 -08:00
Simon Mo
ac9610b19d
[Autoscaler] Precisely match docker HOME ( #12020 )
...
* [Autoscaler] Precisely match docker HOME
The current grep matches any env variable whose name contains HOME. This
includes unwanted variables like PYTHONHOME, PROJECT_HOME, etc.
Depending on the order of the environment variables, the subsequent
docker setup command might fail.
* fstring
2020-11-15 11:49:50 -08:00
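A minimal sketch of the matching problem described in #12020, not the code from the commit itself. It contrasts a loose substring match on HOME with a match anchored at the start of the line. The helper name `find_home` and the sample environment text are hypothetical, chosen only for illustration.

```python
import re
from typing import Optional


def find_home(env_output: str) -> Optional[str]:
    """Return the value of HOME from `env`-style output, or None if absent.

    Hypothetical helper for illustration; not part of Ray.
    """
    for line in env_output.splitlines():
        # re.match anchors at the start of the line, so PYTHONHOME=...
        # and PROJECT_HOME=... are skipped.
        match = re.match(r"HOME=(.*)", line)
        if match:
            return match.group(1)
    return None


# Made-up sample of what `env` might print inside a container.
sample = "PYTHONHOME=/opt/python\nPROJECT_HOME=/srv/project\nHOME=/root\n"

# A loose substring match picks up all three variables.
loose_matches = [line for line in sample.splitlines() if "HOME" in line]

# The anchored match picks up only the real HOME entry.
home = find_home(sample)
```

With the loose match, which variable a later command sees first depends on the order of the output, which is the failure mode the commit message describes.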
Richard Liaw
8b3f79f307
[tune] refactor and add examples ( #11931 )
2020-11-14 20:43:28 -08:00
dHannasch
5891759a3e
Clarify get_node_ip_address docstring ( #11881 )
2020-11-14 15:20:58 -08:00
dHannasch
9fbeefd604
Distinguish a bad --redis-password from any other Redis error ( #11893 )
2020-11-13 17:39:44 -06:00
Simon Mo
277558895d
[Serve] Introduce Long Polling ( #11905 )
2020-11-13 13:17:20 -08:00
Eric Liang
00ef1179c0
[object spilling] Autocreate dir if not exists ( #11999 )
2020-11-13 12:13:06 -08:00
Ian Rodney
f936ea35fe
[hotfix] Fix ResourceDemandScheduler ( #11996 )
...
* [hotfix] Fix ResourceDemandScheduler
* fix test_autoscaler
2020-11-13 00:42:16 -08:00
Ian Rodney
3b56a1a522
[docker] auto-populate shared memory size ( #11953 )
2020-11-12 17:22:42 -08:00
Barak Michener
272edcca94
[ray_client]: Implement function calls ( #11922 )
2020-11-12 16:49:34 -08:00
Eric Liang
a6a8e777f3
[autoscaler] Interpret autoscaling_speed as 1/x-1 of previous target util fraction ( #11961 )
...
* tweak
* update
2020-11-12 16:23:50 -08:00
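A small worked sketch of the conversion named in the title of #11961, assuming it reads as speed = 1/x - 1, where x is the previous target utilization fraction. That parenthesization, the function name and the accepted range of x are assumptions made for illustration and are not taken from the commit.

```python
def speed_from_target_utilization(x: float) -> float:
    """Map a target utilization fraction x in (0, 1] to a speed of 1/x - 1.

    Hypothetical helper; assumes the title means (1 / x) - 1.
    """
    if not 0 < x <= 1:
        raise ValueError("target utilization fraction must be in (0, 1]")
    return 1 / x - 1


# A lower target utilization maps to a higher speed under this reading.
half = speed_from_target_utilization(0.5)   # 1/0.5 - 1
full = speed_from_target_utilization(1.0)   # 1/1.0 - 1
```

Under this reading, a target utilization of 1.0 corresponds to a speed of 0, and the speed grows as the target fraction shrinks.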
Ian Rodney
9254de0b02
[autoscaler] Fix custom node resources on head ( #11896 )
2020-11-12 10:30:04 -08:00
Gekho457
ad639f12d8
[autoscaler/k8s] Preliminary k8s operator ( #11929 )
2020-11-12 11:58:02 -06:00
Kai Fricke
02c02369ca
[tune] Fix hpo randint limits ( #11946 )
...
Co-authored-by: Sumanth Ratna <sumanthratna@gmail.com>
2020-11-12 08:45:49 -08:00
Kristian Hartikainen
07f401d99d
[tune] Fix unflatten dict ( #11948 )
2020-11-12 08:43:15 -08:00
Lee moon soo
9920933e31
[docker] Support non-root container ( #11407 )
2020-11-12 08:41:50 -08:00
SangBin Cho
f80d812799
[Object Spilling] Introduce SpillWorker & RestoreWorker Pool to avoid IO worker deadlock. ( #11885 )
2020-11-11 18:20:14 -08:00
Edward Oakes
73a1cb702b
Split _get_node_provider_cls off from _get_node_provider ( #11949 )
2020-11-11 16:10:46 -06:00
Ameer Haj Ali
85197deece
[autoscaler] Remove legacy autoscaler ( #11802 )
2020-11-11 13:36:48 -08:00
dHannasch
396ae0b7c2
Add docstring for find_redis_address ( #11884 )
2020-11-11 12:24:36 -06:00
Siyuan (Ryans) Zhuang
b8dda0e3d0
[Serialization] Fix buffer alignment issues ( #11888 )
...
* fix buffer alignment issues
* remove unused fields
* aligned memory allocation
* windows compat
* license. fix compiler warnings
* fix compilation error
* reinterpret_cast
2020-11-10 23:44:16 -08:00
Alex Wu
8afd2acdc1
[Autoscaler] simulator placement groups ( #11777 )
2020-11-10 18:10:36 -08:00
Eric Liang
46f3652102
Remove repeat push timeout from object manager ( #11874 )
2020-11-10 16:26:53 -08:00
Keqiu Hu
0c1bdaef59
[tune] TensorFlow Distributed Trainable ( #11876 )
...
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-11-10 14:59:08 -08:00
Richard Liaw
50dbf1a307
[core] Support configurable number of "check for redis" attempts ( #11902 )
...
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
2020-11-10 14:57:57 -08:00
Ian Rodney
1d158dda32
[serve] Rename to use replicas, not workers ( #11822 )
2020-11-10 11:36:15 -08:00
Eric Liang
9b8218aabd
[docs] Move all /latest links to /master ( #11897 )
...
* use master link
* remae
* revert non-ray
* more
* more
2020-11-10 10:53:28 -08:00
Nikita Vemuri
aba9288615
[Autoscaler] Introduce callback system ( #11674 )
...
Co-authored-by: Nikita Vemuri <nikitavemuri@Nikitas-MacBook-Pro.local>
Co-authored-by: Xiayue Charles Lin <xcl@anyscale.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-11-09 20:03:15 -08:00
Eric Liang
ee2da0cf45
[Core] PushManager for reliable broadcast ( #11869 )
2020-11-09 18:01:47 -08:00
Benjamin Black
1999266bba
Updated pettingzoo env to accommodate API changes and fixes ( #11873 )
...
* Updated pettingzoo env to accommodate API changes and fixes
* fixed test failure
* fixed linting issue
* fixed test failure
2020-11-09 16:09:49 -08:00
Eric Liang
a9cf0141a0
[autoscaler] Fix semantics of request_resources ( #11820 )
2020-11-09 14:57:40 -08:00
Edward Oakes
1c132f2ff8
[serve] Improve DEBUG logging for understanding perf ( #11838 )
2020-11-09 14:10:42 -06:00
architkulkarni
adcaabcd64
[Serve] Reconfigure backend class at runtime ( #11709 )
2020-11-09 14:04:51 -06:00
Kai Fricke
287aba6dc3
[tune] schedulers: Add test for context finalization ( #11889 )
2020-11-09 11:37:05 -08:00
Richard Liaw
a09e49ee94
[core] Add retry for reading session name ( #11844 )
2020-11-09 11:22:50 -08:00
Kai Fricke
88be1ea20b
[tune] Handle infinite and NaN values ( #11835 )
2020-11-09 11:18:31 -08:00
Eric Liang
0932320eb3
Move test_joblib back to new_scheduler_broken category ( #11872 )
2020-11-07 20:08:41 -08:00
Stephanie Wang
61e41257e7
[Object spilling] Queue failed object creation requests until objects have been spilled ( #11796 )
...
* Queue creation requests
* Cleanup disconnected clients
* Remove unused
* todo
* FIFO order for create requests, remove warmup for IO workers
* test and lint
* disable test
* lint
* Skip on windows
2020-11-06 18:22:19 -05:00