Sven Mika
124c8318a8
[RLlib] Fix broken test_distributions.py (test_categorical) ( #12915 )
2020-12-17 17:44:26 -06:00
dHannasch
d747071dd9
Test shard_context on already-created boost::asio::io_service. ( #12917 )
2020-12-17 14:26:30 -08:00
Edward Oakes
c7a59b239f
Remove unused endpoints_to_remove ( #12946 )
2020-12-17 15:04:11 -06:00
Gekho457
82f9c7014e
[K8s] Retry getting home directory in command runner. ( #12925 )
2020-12-17 09:41:48 -08:00
Allen
e6cb4f4bd7
[Core] Add log of address and port ( #12908 )
Co-authored-by: Allen Yin <allenyin@anyscale.io>
2020-12-17 00:25:29 -08:00
Yi Cheng
40032541dc
[core] Introduce fetch_local to ray.wait ( #12526 )
2020-12-16 23:44:28 -08:00
Tao Wang
12231ec2a6
Optimize heartbeat manager initialization ( #12911 )
2020-12-17 14:24:23 +08:00
SangBin Cho
057687e534
[New Scheduler] Fix test_failure.py by supporting infeasible tasks ( #12738 )
* Fix the first issue.
* ip
* In Progress.
* In progress.
* done.
* Remove unnecessary logs.
* Addressed code review + fix some test failures.
* Try fixing issues.
* Fix issues.
* Fix test issues.
* Fix issues.
* done.
2020-12-16 21:27:50 -08:00
Philipp Moritz
ad036fd564
Fix continue for debugger ( #12862 )
2020-12-16 16:09:13 -08:00
Amog Kamsetty
dd522a71a1
[SGD] Disable Elastic Training by default when using with Tune ( #12927 )
2020-12-16 15:37:44 -08:00
Alex Wu
8b783ecafa
Fix pull manager retry ( #12907 )
2020-12-16 14:18:43 -08:00
Ameer Haj Ali
c677b9e201
[autoscaler] Fix flaky autoscaler test ( #12918 )
2020-12-16 14:18:27 -08:00
Edward Oakes
aedcf0c9d9
Disable test_distributions ( #12919 )
2020-12-16 14:17:49 -08:00
Edward Oakes
fdb4c6eb1c
Better message for too little /dev/shm memory ( #12896 )
2020-12-16 10:30:20 -06:00
fangfengbin
91878d18b5
[PlacementGroup]Fix placement group wait api disorder bug ( #12827 )
* [PlacementGroup]Fix placment group wait api disorder bug
* fix review comment
* fix review comment
* fix review comment
* fix review comments
* increase num_heartbeats_timeout
Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>
2020-12-16 18:45:53 +08:00
Eric Liang
7ff314a5df
[New scheduler] Also unsubscribe get dependencies on unblock
2020-12-15 20:29:44 -08:00
Richard Liaw
a7caa14d3d
[k8s] avoid bad error messages ( #12871 )
2020-12-15 15:00:02 -08:00
Edward Oakes
f4b5a8b2f7
[serve] Re-enable test_failure.py ( #12891 )
2020-12-15 16:02:04 -06:00
Richard Liaw
87cf1a97e5
[core] recover startup logs ( #12876 )
2020-12-15 13:49:45 -08:00
Edward Oakes
6795d7c75c
[serve] Fix flaky test_api.py::test_backend_user_config ( #12892 )
2020-12-15 15:35:30 -06:00
Kai Fricke
ea1228074d
[tune] enable points_to_eval for all search algorithms ( #12790 )
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-12-15 11:51:53 -08:00
Simon Mo
fdd85e3af4
[Serve] Add benchmark for async handles ( #12858 )
2020-12-15 11:21:51 -08:00
Alex Wu
0031723ace
[New scheduler] Object spilling ( #12857 )
2020-12-15 11:05:38 -08:00
Edward Oakes
cde711aaf1
Revert "[RLLib] Execution-Folder Type Annotations ( #12760 )" ( #12886 )
This reverts commit becca1424d.
2020-12-15 11:03:02 -08:00
architkulkarni
ba12fb1451
Fix for RLIMIT patch ( #12882 )
Implement new soft limit introduced by https://github.com/ray-project/ray/pull/12853.
2020-12-15 10:38:46 -08:00
SangBin Cho
de7848231c
[Doc] Fix placement group doc ( #12875 )
2020-12-15 10:36:51 -08:00
Edward Oakes
261b2f9053
Check for raylet PID as ppid in dashboard agent fate-sharing ( #12867 )
2020-12-15 12:13:11 -06:00
Max Fitton
e077bc4206
[Release] Bump master to 1.2.0 for 1.1.0 release ( #12856 )
2020-12-15 09:40:26 -08:00
Simon Mo
b291dd4486
[Metrics] Call GetMeasureDoubleByName to prevent override ( #12860 )
2020-12-15 09:39:39 -08:00
Gekho457
5a142d5bd6
Use nightly images in all kubernetes examples. ( #12868 )
2020-12-14 20:49:41 -08:00
fangfengbin
43b9259d40
[GCS]GCS resource manager support scheduling resource ( #12780 )
* add part code
* add part code
* fix review comments
* rebase master
* add part code
* add part code
* fix review comments
* add part code
* fix code style
* fix ut bug
* fix ut bug
* fix review comments
* fix review comment
Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>
2020-12-15 10:27:55 +08:00
Gekho457
8cebe5cbe9
[docs][autoscaler][k8s][minor] quotes #12866
2020-12-14 18:24:13 -08:00
Gekho457
44f5be04ca
[autoscaler][k8s][doc][minor] Fix typo in k8s doc. ( #12865 )
2020-12-14 17:30:43 -08:00
Simon Mo
b56db5a22f
[Serve] Wait for actor name to be cleaned up ( #12215 )
2020-12-14 15:09:43 -08:00
architkulkarni
231518e86f
[Serve] Support basic Starlette response types ( #12811 )
2020-12-14 17:03:56 -06:00
Max Fitton
d0813c1c58
[Dashboard] Add dashboard multi-node churn test ( #11768 )
2020-12-14 17:03:33 -06:00
Richard Liaw
c56799e3da
disable-for-now ( #12838 )
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
2020-12-14 14:18:31 -08:00
Eric Liang
1eb4ac12b1
Clip RLIMIT_NOFILE increase to avoid redis failing to start on Big Sur
2020-12-14 14:05:19 -08:00
SangBin Cho
69b0bc2132
[Logging] Use file handle temporalily ( #12839 )
2020-12-14 11:42:44 -08:00
Tao Wang
ac53e2f857
[GCS]Tell dead nodes to commit suicide ( #12792 )
* [GCS]Tell dead nodes to commit suicide
* fix comment, add ut
2020-12-14 11:42:00 -08:00
Michael Luo
becca1424d
[RLLib] Execution-Folder Type Annotations ( #12760 )
2020-12-14 19:16:44 +01:00
Gekho457
11ce1dc743
Ray cluster CRD and example CR + multi-ray-cluster operator ( #12098 )
2020-12-14 10:26:01 -06:00
Tao Wang
35f7d84dbe
Revert heartbeat interval to keep ci stable ( #12836 )
* Revert heartbeat interval to keep ci stable
* fix missing one
2020-12-14 16:58:40 +08:00
Eric Squires
22c1968d62
Runing -> Running ( #12826 )
2020-12-13 22:23:48 -08:00
Ameer Haj Ali
aaa11941f6
[autoscaler] Fix flaky autoscaler test ( #12829 )
2020-12-13 17:09:30 -08:00
Sven Mika
3c808835a5
[RLlib] Issue 12831: AttributeError: 'NoneType' object has no attribute 'id' when using custom Atari env. ( #12832 )
2020-12-13 16:15:54 +01:00
fangfengbin
1e02b28abe
[GCS]Move node resource info to gcs resource manager ( #12775 )
* add part code
* add part code
* fix review comments
* fix ut bug
* rebase master
* add part code
* fix ut bug
* fix ut bug
* fix review comments
* fix review comment
Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>
2020-12-13 20:37:34 +08:00
Max Fitton
ac24d1db30
[Dashboard][Bugfix] Fix GPU List Bug ( #12666 )
* Fix bug where None was passed as the empty value for ActorInfo.gpu_stats instead of an empty list
* lint
* dashboard/modules/logical_view
* fix test
* trigger build
2020-12-12 23:34:24 -08:00
DK.Pino
153b24746c
[Placement Group] Refactor pg resource constrain in node manager ( #12538 )
* first version by pointer
* second version reference
* clean up
* add cpp ut
* lint
* extract LocalPlacementGroupManagerInterface
* lint
* fix commemt
* add idempotency test
* lint
* fix pg ut
* fix pg ut
* python lint
* fix pg ut timeout
* python lint
* fix comment
* lint
* lint
2020-12-12 23:32:15 -08:00
Eric Liang
bdc6624da8
Revert "[PlacementGroup]Add PlacementGroup wait python api ( #12601 )" ( #12825 )
This reverts commit 401d342602.
2020-12-12 12:13:48 -08:00