Philipp Moritz
ff82af1588
Clean up requirements.txt ( #12136 )
2020-11-19 09:27:09 -08:00
Xianyang Liu
9481ecd180
[data] MLDataset based on ParallelIterator ( #11849 )
2020-11-19 00:33:37 -08:00
Barak Michener
2fe1321c3f
[ray_client] __getattr__ for the API Import interface ( #12089 )
...
* move all things that import real-ray into the server folder
* change the import line and have a __getattr__-able API stub
* formatting
* remove unused (duplicated) util file
* Remove module methods (but leave comment on why)
2020-11-18 22:42:02 -08:00
Ian Rodney
a74f1885db
Revert "[CLI] Fix ray commands when RAY_ADDRESS used ( #11989 )" ( #12135 )
...
* Revert "[CLI] Fix ray commands when RAY_ADDRESS used (#11989 )"
This reverts commit d23d326560.
* only check environment for CLI commands
* use new fns
* fixing docs
* rename and return "auto"
* Update python/ray/_private/services.py
Co-authored-by: Eric Liang <ekhliang@gmail.com>
* Update services.py
* Update services.py
Co-authored-by: Eric Liang <ekhliang@gmail.com>
2020-11-18 22:41:10 -08:00
dHannasch
5bc4976550
More informative error message if ray start fails to connect to Redis ( #11880 )
...
* Chain original redis.ConnectionError. More importantly, print out the address so people don't have to dig out --logging-level debug to get the number wait_for_redis_to_start() already knows.
Check the Redis password.
* f
2020-11-18 19:28:10 -08:00
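The exception chaining described in the entry above can be sketched in plain Python. This is a minimal illustration, not Ray's actual code: `connect_once`, `wait_for_server`, and the address used below are hypothetical stand-ins, and the built-in `ConnectionError` stands in for `redis.ConnectionError`.

```python
import time


def connect_once(address):
    """Hypothetical stand-in for a Redis ping; it always fails here."""
    raise ConnectionError("connection refused")


def wait_for_server(address, num_retries=3, delay_s=0.0):
    """Retry the connection, then fail with the address in the message."""
    last_error = None
    for _ in range(num_retries):
        try:
            connect_once(address)
            return
        except ConnectionError as err:
            last_error = err
            time.sleep(delay_s)
    # "raise ... from" keeps the original error available as __cause__,
    # so the traceback shows both the summary and the root failure.
    raise RuntimeError(
        f"Unable to connect to server at {address} "
        f"after {num_retries} retries."
    ) from last_error
```

Calling `wait_for_server("127.0.0.1:6379")` in this sketch raises a `RuntimeError` whose message names the address and whose `__cause__` is the original `ConnectionError`.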
Richard Liaw
0d388c4d31
[autoscaler] remove unnecessary print output ( #12131 )
2020-11-18 18:33:48 -08:00
Richard Liaw
2bb6db5e64
[tune] temporary revert of verbosity changes ( #12132 )
2020-11-18 18:27:41 -08:00
Ameer Haj Ali
4717fcd9c0
[autoscaler] give max_workers precedence over min_workers in resource demand scheduler ( #12106 )
2020-11-18 16:24:48 -08:00
Ameer Haj Ali
d826452e0b
[autoscaler] fix max_workers bug in resource_demand_scheduler by counting the head node ( #12123 )
2020-11-18 15:24:38 -08:00
Ian Rodney
e086ddc18f
[core] Add Recursive task cancelation ( #11923 )
2020-11-18 15:18:40 -08:00
Alex Wu
e9c9ba9c9f
[New Scheduler] Don't start tasks if the owner is dead ( #12050 )
2020-11-18 11:34:19 -08:00
Ameer Haj Ali
eef624750c
[ray client] ray wait() implementation ( #12072 )
2020-11-18 11:33:57 -08:00
Kai Fricke
2b60c5774b
[tune] cache checkpoint serialization ( #12064 )
2020-11-18 09:03:53 -08:00
Ian Rodney
d23d326560
[CLI] Fix ray commands when RAY_ADDRESS used ( #11989 )
...
* [CLI] Fix ray commands when RAY_ADDRESS used
* erics suggestion
2020-11-17 23:44:59 -08:00
Philipp Moritz
b96516e9d3
[core] Remove google dependency ( #12085 )
2020-11-17 19:01:00 -08:00
fangfengbin
f400333841
[Placement Group]Placement Group supports gcs failover(Part2) ( #12003 )
...
* add testcase
* fix ut
* fix review comment
* fix review comment
* fix review comments
* fix ut bug
* add part code
* add part code
* add part code
* add testcase
* add part code
* fix ut bug
* fix ut timeout bug
* fix ut bug
Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>
2020-11-18 10:59:26 +08:00
Simon Mo
c476037c97
[Core] Async API should raise on all RayError ( #12043 )
...
Before this PR we raised only RayTaskError, which means errors
like RayActorError (actor died) were not propagated and thrown at
`await object_ref`. This PR fixes that.
2020-11-17 17:20:30 -08:00
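The behavior described in the entry above can be sketched with plain asyncio. This is an assumption-laden illustration rather than the code from the PR: the three error classes are local stand-ins that only reuse the names from the commit message, and `resolve` and `demo` are hypothetical helpers.

```python
import asyncio


class RayError(Exception):
    """Local stand-in for the base error class (not imported from ray)."""


class RayTaskError(RayError):
    """Local stand-in for a task failure."""


class RayActorError(RayError):
    """Local stand-in for an actor failure."""


def resolve(future, result):
    """Complete an asyncio future, raising on any RayError subclass."""
    # Checking the base class covers RayActorError as well as RayTaskError.
    if isinstance(result, RayError):
        future.set_exception(result)
    else:
        future.set_result(result)


async def demo():
    loop = asyncio.get_running_loop()
    future = loop.create_future()
    resolve(future, RayActorError("actor died"))
    try:
        await future
    except RayError as err:
        return type(err).__name__
    return None
```

Checking against the base class instead of one subclass is the design point: any new error type that inherits from it is raised at the `await` without further changes.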
Stephanie Wang
f6bdd5ab17
[New Scheduler] Spillback from the queue of tasks assigned to the local node ( #12084 )
2020-11-17 16:13:59 -08:00
Richard Liaw
ca44222e03
[minor] log info instead of error upon ray.init rerun ( #12025 )
2020-11-17 12:59:24 -08:00
fangfengbin
7f050c706b
[PlacementGroup]Skip flaky testcase ( #12065 )
...
Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>
2020-11-17 12:21:34 -08:00
Simon Mo
d7c95a4a90
[Serve] Rewrite Router to be Embeddable ( #12019 )
2020-11-17 08:28:18 -08:00
Maksim Smolin
23926f3e6e
[CLI] Docker Support ( #11761 )
...
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-11-17 00:04:39 -08:00
chaokunyang
bea0031491
fix linux wheel build ( #9896 )
2020-11-17 15:49:42 +08:00
Amog Kamsetty
f10cef93c7
[sgd] support operator.device ( #12056 )
2020-11-16 21:44:27 -08:00
Eric Liang
380df89069
Lazily initialize the global state accessor in Python workers ( #12054 )
...
* wip
* fix
* fix
2020-11-16 21:35:12 -08:00
Max Fitton
90574b66cc
pin aiohttp to the 3.x.x version ( #12051 )
2020-11-16 21:54:16 -05:00
Richard Liaw
51d277f2e4
[tests] fix mock for test_cli ( #12055 )
...
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
2020-11-16 18:44:15 -08:00
Stephanie Wang
c49554fb7a
Abstract plasma store creation request queue ( #12039 )
2020-11-16 17:09:15 -08:00
Kai Fricke
9f5986ee58
[tune] logger migration to ExperimentLogger classes ( #11984 )
2020-11-16 15:08:37 -08:00
Alan Guo
3dc68533a9
make some private rsync, and exec_cluster arguments public ( #11958 )
...
* make some private rsync, and exec_cluster arguments public
* fix format issue
* undo make all_nodes public
2020-11-16 14:31:41 -08:00
Ameer Haj Ali
8d599bb3f5
[autoscaler] Move fill out resources to bootstrap config to cache the resources and avoid expensive boto3 calls ( #12028 )
2020-11-16 13:28:57 -08:00
fyrestone
0c6bb745cd
Fix dashboard agent use incorrect ip ( #12038 )
2020-11-16 14:02:20 -06:00
SangBin Cho
f56d7c1a76
[Logging] Remove per worker job log file / support worker log rotation ( #11927 )
...
* In progress.
* MVP done.
* In Progress.
* Remove unnecessary code.
* Fix some issues.
* Fix test failures.
* Addressed code review + fix object spilling test failure.
2020-11-16 11:29:43 -08:00
Kai Fricke
8609e2dd90
[tune] refactor verbosity levels ( #11767 )
...
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-11-16 10:32:53 -08:00
Keqiu Hu
a50128079d
[tune/placement group] dist. training placement group support ( #11934 )
...
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-11-16 01:11:39 -08:00
fangfengbin
8fb926565c
[Placement Group]Placement Group supports gcs failover (Part1) ( #11933 )
2020-11-16 14:42:56 +08:00
dHannasch
d35de2272d
[Core] Allow redis.ResponseError instead of redis.AuthenticationError ( #12024 )
...
* redis.ResponseError
* there really is no way to make this look good, is there
2020-11-15 15:04:56 -08:00
Simon Mo
ac9610b19d
[Autoscaler] Precisely match docker HOME ( #12020 )
...
* [Autoscaler] Precisely match docker HOME
The current grep will match any env variable whose name contains HOME. This
includes unwanted variables like PYTHONHOME, PROJECT_HOME, etc.
Depending on the order of the environment variables, the subsequent
docker setup command might fail.
* fstring
2020-11-15 11:49:50 -08:00
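The matching problem described in the entry above can be reproduced with a short Python sketch. The environment lines are made-up sample data, and the regular expression is an illustrative assumption, not the pattern used in the PR.

```python
import re

# Made-up sample of "NAME=value" lines, as printed by `env`.
env_lines = [
    "PYTHONHOME=/usr/lib/python3",
    "PROJECT_HOME=/srv/project",
    "HOME=/root",
]

# Loose match: any line containing "HOME" is picked up.
loose = [line for line in env_lines if "HOME" in line]

# Anchored match: only the variable named exactly HOME.
exact = [line for line in env_lines if re.match(r"^HOME=", line)]
```

With the loose match, whichever line happens to come first would be treated as the home directory; anchoring the pattern at the start of the line and including the `=` removes that dependence on ordering.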
Richard Liaw
8b3f79f307
[tune] refactor and add examples ( #11931 )
2020-11-14 20:43:28 -08:00
dHannasch
5891759a3e
Clarify get_node_ip_address docstring ( #11881 )
2020-11-14 15:20:58 -08:00
dHannasch
9fbeefd604
Distinguish a bad --redis-password from any other Redis error ( #11893 )
2020-11-13 17:39:44 -06:00
Simon Mo
277558895d
[Serve] Introduce Long Polling ( #11905 )
2020-11-13 13:17:20 -08:00
Eric Liang
00ef1179c0
[object spilling] Autocreate dir if not exists ( #11999 )
2020-11-13 12:13:06 -08:00
Ian Rodney
f936ea35fe
[hotfix] Fix ResourceDemandScheduler ( #11996 )
...
* [hotfix] Fix ResourceDemandScheduler
* fix test_autoscaler
2020-11-13 00:42:16 -08:00
Ian Rodney
3b56a1a522
[docker] auto-populate shared memory size ( #11953 )
2020-11-12 17:22:42 -08:00
Barak Michener
272edcca94
[ray_client]: Implement function calls ( #11922 )
2020-11-12 16:49:34 -08:00
Eric Liang
a6a8e777f3
[autoscaler] Interpret autoscaling_speed as 1/x-1 of previous target util fraction ( #11961 )
...
* tweak
* update
2020-11-12 16:23:50 -08:00
Ian Rodney
9254de0b02
[autoscaler] Fix custom node resources on head ( #11896 )
2020-11-12 10:30:04 -08:00
Gekho457
ad639f12d8
[autoscaler/k8s] Preliminary k8s operator ( #11929 )
2020-11-12 11:58:02 -06:00
Kai Fricke
02c02369ca
[tune] Fix hpo randint limits ( #11946 )
...
Co-authored-by: Sumanth Ratna <sumanthratna@gmail.com>
2020-11-12 08:45:49 -08:00