Commit graph

3460 commits

Author SHA1 Message Date
Kai Fricke
3d72000826
[tune] Add points_to_evaluate to BasicVariantGenerator (#12916)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-12-17 19:16:03 -08:00
Edward Oakes
c7a59b239f
Remove unused endpoints_to_remove (#12946) 2020-12-17 15:04:11 -06:00
Gekho457
82f9c7014e
[K8s] Retry getting home directory in command runner. (#12925) 2020-12-17 09:41:48 -08:00
Yi Cheng
40032541dc
[core] Introduce fetch_local to ray.wait (#12526) 2020-12-16 23:44:28 -08:00
SangBin Cho
057687e534
[New Scheduler] Fix test_failure.py by supporting infeasible tasks (#12738)
* Fix the first issue.

* ip

* In Progress.

* In progress.

* done.

* Remove unnecessary logs.

* Addressed code review + fix some test failures.

* Try fixing issues.

* Fix issues.

* Fix test issues.

* Fix issues.

* done.
2020-12-16 21:27:50 -08:00
Philipp Moritz
ad036fd564
Fix continue for debugger (#12862) 2020-12-16 16:09:13 -08:00
Amog Kamsetty
dd522a71a1
[SGD] Disable Elastic Training by default when using with Tune (#12927) 2020-12-16 15:37:44 -08:00
Alex Wu
8b783ecafa
Fix pull manager retry (#12907) 2020-12-16 14:18:43 -08:00
Ameer Haj Ali
c677b9e201
[autoscaler] Fix flaky autoscaler test (#12918) 2020-12-16 14:18:27 -08:00
Edward Oakes
fdb4c6eb1c
Better message for too little /dev/shm memory (#12896) 2020-12-16 10:30:20 -06:00
fangfengbin
91878d18b5
[PlacementGroup]Fix placement group wait api disorder bug (#12827)
* [PlacementGroup]Fix placement group wait api disorder bug

* fix review comment

* fix review comment

* fix review comment

* fix review comments

* increase num_heartbeats_timeout

Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>
2020-12-16 18:45:53 +08:00
Richard Liaw
a7caa14d3d
[k8s] avoid bad error messages (#12871) 2020-12-15 15:00:02 -08:00
Edward Oakes
f4b5a8b2f7
[serve] Re-enable test_failure.py (#12891) 2020-12-15 16:02:04 -06:00
Richard Liaw
87cf1a97e5
[core] recover startup logs (#12876) 2020-12-15 13:49:45 -08:00
Edward Oakes
6795d7c75c
[serve] Fix flaky test_api.py::test_backend_user_config (#12892) 2020-12-15 15:35:30 -06:00
Kai Fricke
ea1228074d
[tune] enable points_to_eval for all search algorithms (#12790)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-12-15 11:51:53 -08:00
Simon Mo
fdd85e3af4
[Serve] Add benchmark for async handles (#12858) 2020-12-15 11:21:51 -08:00
Alex Wu
0031723ace
[New scheduler] Object spilling (#12857) 2020-12-15 11:05:38 -08:00
architkulkarni
ba12fb1451
Fix for RLIMIT patch (#12882)
Implement new soft limit introduced by https://github.com/ray-project/ray/pull/12853.
2020-12-15 10:38:46 -08:00
Max Fitton
e077bc4206
[Release] Bump master to 1.2.0 for 1.1.0 release (#12856) 2020-12-15 09:40:26 -08:00
Simon Mo
b291dd4486
[Metrics] Call GetMeasureDoubleByName to prevent override (#12860) 2020-12-15 09:39:39 -08:00
Gekho457
5a142d5bd6
Use nightly images in all kubernetes examples. (#12868) 2020-12-14 20:49:41 -08:00
Simon Mo
b56db5a22f
[Serve] Wait for actor name to be cleaned up (#12215) 2020-12-14 15:09:43 -08:00
architkulkarni
231518e86f
[Serve] Support basic Starlette response types (#12811) 2020-12-14 17:03:56 -06:00
Eric Liang
1eb4ac12b1
Clip RLIMIT_NOFILE increase to avoid redis failing to start on Big Sur 2020-12-14 14:05:19 -08:00
SangBin Cho
69b0bc2132
[Logging] Use file handle temporarily (#12839) 2020-12-14 11:42:44 -08:00
Gekho457
11ce1dc743
Ray cluster CRD and example CR + multi-ray-cluster operator (#12098) 2020-12-14 10:26:01 -06:00
Tao Wang
35f7d84dbe
Revert heartbeat interval to keep ci stable (#12836)
* Revert heartbeat interval to keep ci stable

* fix missing one
2020-12-14 16:58:40 +08:00
Eric Squires
22c1968d62
Runing -> Running (#12826) 2020-12-13 22:23:48 -08:00
Ameer Haj Ali
aaa11941f6
[autoscaler] Fix flaky autoscaler test (#12829) 2020-12-13 17:09:30 -08:00
DK.Pino
153b24746c
[Placement Group] Refactor pg resource constraint in node manager (#12538)
* first version by pointer

* second version reference

* clean up

* add cpp ut

* lint

* extract LocalPlacementGroupManagerInterface

* lint

* fix comment

* add idempotency test

* lint

* fix pg ut

* fix pg ut

* python lint

* fix pg ut timeout

* python lint

* fix comment

* lint

* lint
2020-12-12 23:32:15 -08:00
Eric Liang
bdc6624da8
Revert "[PlacementGroup]Add PlacementGroup wait python api (#12601)" (#12825)
This reverts commit 401d342602.
2020-12-12 12:13:48 -08:00
Richard Liaw
2f2bd884a3
[tune] upgrade gpytorch, bump default pytorch to 1.7.0 (#12776)
* upgrade gpytorch

Signed-off-by: Richard Liaw <rliaw@berkeley.edu>

* pin

Signed-off-by: Richard Liaw <rliaw@berkeley.edu>

* version-torch

Signed-off-by: Richard Liaw <rliaw@berkeley.edu>

* fix-build

Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
2020-12-12 10:35:33 -08:00
Richard Liaw
7e09f1d934
remove-xgboost-build (#12822)
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
2020-12-12 10:34:56 -08:00
Kai Fricke
5f04ade6ef
[tune] add more stoppers and stopper documentation (#12750)
* Add new stoppers & docs

* Add tests for maximum iteration stopper and trial plateau stopper

* Update python/ray/tune/stopper.py

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>

* Update doc/source/tune/api_docs/stoppers.rst

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>

* Update doc/source/tune/api_docs/stoppers.rst

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>

* Apply suggestions from code review

* Apply suggestions from code review

* Update python/ray/tune/stopper.py

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-12-12 01:47:19 -08:00
Kai Fricke
905652cdd6
[tune] migrate xgboost callback api (#12745)
* Migrate to new-style xgboost callbacks

* Fix flaky progress reporter test

* Fix import error

* Take last value (not first)
2020-12-12 01:42:20 -08:00
Kai Fricke
42c70be073
[tune] Hyperopt: Directly accept category variables instead of indices (#12715)
* [tune] Hyperopt: Directly accept category variables instead of indices

* Fix interrupt test

* Update python/ray/tune/suggest/hyperopt.py

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>

* Apply suggestions from code review

* Update python/ray/tune/suggest/hyperopt.py

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>

* lint

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-12-12 01:40:53 -08:00
Hao Zhang
0b1fbc5e83
[PR 1/6] Collective in Ray (#12637)
Co-authored-by: YLJALDC <dal177@ucsd.edu>
2020-12-12 01:26:36 -08:00
Alex Wu
aa64cd4534
[New scheduler] Fix test_global_state (#12586) 2020-12-11 21:47:01 -08:00
Edward Oakes
03d869d51c
Hold GIL while submitting (actor) tasks (#12803) 2020-12-11 21:47:16 -06:00
Edward Oakes
aec5c9879e
Add tests for atexit handler behavior (#12808) 2020-12-11 21:47:05 -06:00
Edward Oakes
6262ee1f76
Clarify docs for atexit behavior when using ray.kill (#12807) 2020-12-11 21:45:39 -06:00
Eric Liang
1ce745cf44
Add automatic local GC and plasma debug logs every 10 minutes by default (#12804) 2020-12-11 17:09:58 -08:00
Simon Mo
3d8c1cbae6
[Serve] Fix Serve Release Tests (#12777) 2020-12-11 11:53:47 -08:00
fangfengbin
9ded69fdaa
[Hotfix] Fix python client lint error (#12783) 2020-12-11 10:15:53 -08:00
Simon Mo
68d7fa2137
Fix exit_actor in asyncio mode (#12693) 2020-12-11 09:35:17 -08:00
Edward Oakes
699ded5328
[serve] Initial commit for CLI (#12770) 2020-12-11 10:31:29 -06:00
Tao Wang
295b6e5ce4
Split heartbeat message (#12535)
* first

* xxx

* Split heartbeat message

* only report resource usage when changed

* Fix GetAllResourceUsage

* Fix report resource usage

* Increase default heartbeat interval

* regularize heartbeat interval in test case
2020-12-11 21:19:57 +08:00
Stephanie Wang
86b0741026
[new scheduler] Allocate resources for spilled back task to a local view of the remote node (#12711)
* Force report heartbeats if remote resources may be dirty

* lint

* typo

* typo

* unit test

* debug

* Revert "lint"

This reverts commit 6dc7e982ffee98185665eb7c3c8fde0d91938919.

* Revert "Force report heartbeats if remote resources may be dirty"

This reverts commit cbfa9405197df62f874107d55b46715ceae2abd2.

* Local view of resources

* debug travis

* debug

* debug

* debug

* weaken test

* cleanups

* lint

* Revert "debug travis"

This reverts commit 11ff5f4f84e64e9fbd4eecda5b3c7fd07a7130a4.

* revert

* const view, remove unused
2020-12-10 22:43:29 -05:00
Barak Michener
b7f246c451
[ray_client] Include multiple facets of the Ray API (#12736) 2020-12-10 19:09:34 -08:00