Commit graph

5545 commits

Author SHA1 Message Date
Sven Mika
bab9c0f670
[RLlib; Docs overhaul] Redo: Docstring cleanup: Trainer, trainer_template, Callbacks. (#19830) 2021-11-01 21:45:11 +01:00
Alex Wu
80fb3f10ae
[ci] Script for building M1 wheels (#19925)
This PR includes a script for building wheels for Macs with M1 processors. It roughly follows the pattern of the other scripts with a few differences.

* Manually installs nvm
* Uses miniforge conda to install Python/pip instead of the Python Foundation .pkgs
* Doesn't pin numpy (we probably shouldn't be pinning it in the other scripts either...)
* Commit detection falls back to git instead of erroring
All of these changes were made so that the script works on a laptop, which comes with only a subset of the dependencies that the x86 buildkite image comes with.
2021-11-01 11:44:59 -07:00
Hao Zhang
a03c4363b5
[Collective] Allow send/recv partial tensors in Send/Recv primitives (#19921) 2021-11-01 10:25:43 -07:00
Edward Oakes
ee57025be6
[serve] Rename BackendConfig -> DeploymentConfig (#19923) 2021-11-01 10:24:02 -07:00
architkulkarni
702bffe072
[runtime env] [test] Enable runtime env nightly test with working_dir reconnection (#19906) 2021-10-31 10:48:48 -05:00
architkulkarni
de8a9b5151
[runtime env] Always print package pushing logs regardless of size (#19897) 2021-10-31 10:47:37 -05:00
Edward Oakes
e507b7ba6e
[serve] Rename BackendVersion -> DeploymentVersion (#19798) 2021-10-31 10:27:19 -05:00
Chen Shen
961742f8e7
[Core] deflake windows test failure (test_task_retry_mini_integration) #19916 2021-10-30 15:13:38 -07:00
Sven Mika
4d945fe651
[RLlib] Issue 19878: Re-instate bare_metal_policy example script (#19881) 2021-10-30 12:50:39 -07:00
Stephanie Wang
630a8cacb3
Revert "[core] Fail objects when pull/reconstruction hangs (#19789)" (#19904)
This reverts commit e6d60d7376.
2021-10-30 10:54:39 -07:00
chenk008
57363995f3
[runtime env] Move container related code to runtime env (#19067) 2021-10-29 16:31:11 -07:00
Jiao
bb0ebb7903
[job submission] Temporarily make pydantic imports conditional (#19827) 2021-10-29 18:09:18 -05:00
Gagandeep Singh
f549e528c7
Bumped time limit in test_cancel::test_comprehensive (#19871) 2021-10-29 15:51:49 -07:00
SangBin Cho
99b5932d06
Add a simple node failure integration test + clean up spammy logs upon node failures (#19695)
* .

* Done

* clean up

* lint

* fix a bug

* lint

* fix issue

* Remove no-op from StartRayLog

* Addressed code review.
2021-10-29 18:42:35 -04:00
architkulkarni
16d3afc665
[serve] Base autoscaling decisions on target num replicas, not current num replicas (#19869) 2021-10-29 17:03:53 -05:00
Eric Liang
456d73754a
[data] Initial pass at supporting multiple-block returns for read and transform tasks (#19660) 2021-10-29 14:21:56 -07:00
Philipp Moritz
0a5942d8b0
[Documentation] Fix quotes for windows installations (#19859)
* [Documentation] Fix quotes for windows installations

* update

* formatting
2021-10-29 10:54:38 -07:00
Lixin Wei
56301e34b2
[Refactor] Remove ServiceBased Abstraction (#19694)
## Why are these changes needed?

Prior to this PR, we had:
```cpp
class XxxAccessor {}
class ServiceBasedXxxAccessor : public XxxAccessor{}

class GcsClient {}
class ServiceBasedGcsClient : public GcsClient{}
```

However, XxxAccessor has only one implementation: ServiceBasedXxxAccessor. And GcsClient has only one implementation: ServiceBasedGcsClient.

I think this abstraction is unnecessary and makes development harder (I have to modify two files every time).

This PR removes all ServiceBasedXxx classes and moves their implementations into the base classes.

Now we only have:
```cpp
class XxxAccessor {}
class GcsClient {}
```
2021-10-29 10:16:14 -07:00
Gagandeep Singh
9460a5375b
Added retry logic in test_basic::test_ray_options (#19832)
* Added retry logic in test_ray_options

* Applied linting format

* Made test consistent
2021-10-29 10:15:12 -07:00
Edward Oakes
bf23a31017
[job submission] Always generate and return job_id (#19851) 2021-10-29 09:09:54 -05:00
SangBin Cho
16dcff4091
[Core/RuntimeEnv] Fix runtime environment hanging issues. (#19823)
* done

* Add a right test

* Fix unit tests

* fix issues
2021-10-29 07:01:56 -07:00
Antoni Baum
f2773267c7
[docs] Tune doc fixes (#19791) 2021-10-29 11:45:29 +02:00
Sven Mika
902e854af2
[RLlib; Docs overhaul] Docstring cleanup: Environments. (#19784)
* wip.

* Test: Make a change in tune to trigger tune tests, which are not run otherwise, but seem to fail nevertheless with this PR's changes.

* remove bare_metal_policy_with_custom_view_reqs from tests
2021-10-29 10:46:52 +02:00
Stephanie Wang
e6d60d7376
[core] Fail objects when pull/reconstruction hangs (#19789) 2021-10-28 23:34:51 -07:00
Chris K. W
bd4ad84ead
[Client] Add deprecation warnings for direct ray.client().connect() calls (#18783)
* add deprecation warning

* Update wording

* add test

* actually connect

* add env var tests

* fix message and test

* skip on windows

* add _LocalBuilder case, update test_namespace

* better variable name
2021-10-28 22:06:11 -07:00
Jiajun Yao
760878f950
Handle empty dataset for sort and groupby (#19849) 2021-10-28 18:49:33 -07:00
Simon Mo
0433281ec8
[CI] Bump Serve test_regression to medium for windows (#19844) 2021-10-28 17:49:50 -07:00
Edward Oakes
42ac906313
[job submission] Support passing metadata to the JobConfig (#19845) 2021-10-28 16:40:03 -05:00
SangBin Cho
9126810c41
[Usability] Improve the serialization failure message (#19691)
* Done

* done

* Done

* fix test

* Addressed code review.

* done

* done

* fix mistake

* Skip tests on windows
2021-10-28 14:25:51 -07:00
matthewdeng
bfb0ef1b08
move jsonschema to core dependencies and update default AutoscalerPrometheusMetrics (#19831) 2021-10-28 13:04:22 -07:00
SangBin Cho
96fc875a89
[Core] Improve scheduling observability and fix wrong resource deadlock report message. (#19746) 2021-10-28 11:42:21 -07:00
Amog Kamsetty
1803d88943
[Train] Simplify single worker training (#19814)
* wip

* update

* fix

* fix

* fix

* fix
2021-10-28 10:54:35 -07:00
shrekris-anyscale
6e6fff8857
[serve] Enable deployment of functions/classes that take no parameters (#19708) 2021-10-28 12:53:44 -05:00
Jiao
ed0e2e4fd7
[job submission] Add job_config in subprocess driver script (#19765) 2021-10-28 12:12:51 -05:00
Jiajun Yao
fe8138bfc2
Listen to 127.0.0.1 if node ip is 127.0.0.1 (#19810) 2021-10-28 08:44:23 -07:00
Eric Liang
f60d312259
Try fixing reference counting issue with manual _owner assignment (#19734) 2021-10-28 02:26:35 -07:00
Patrick Ames
8a9f664d75
[data] Add support for custom dataset block write path providers. (#19347)
Co-authored-by: Eric Liang <ekhliang@gmail.com>
2021-10-28 00:12:02 -07:00
Jiajun Yao
7fb65abae1
[data] Fix dataset doc (#19821) 2021-10-27 22:41:09 -07:00
Jiajun Yao
11751a1d87
Arrow block dataset groupBy (#19673) 2021-10-27 16:27:11 -07:00
Edward Oakes
b2e12dc43b
[runtime_env] Add basic support for python modules (#19651) 2021-10-27 17:56:46 -05:00
matthewdeng
aa5499ef0f
[Train] implement CheckpointStrategy (#19111)
* [SGD] implement CheckpointStrategy

* address comments

* update docs

* Update doc/source/train/user_guide.rst

Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>

* best checkpoint

Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>
2021-10-27 11:31:04 -07:00
Edward Oakes
1f681981af
[serve] Bump controller max concurrency to 15k, make long poll timeout random (#19790) 2021-10-27 13:28:16 -05:00
Edward Oakes
acc5702535
[runtime_env] Fix hash length in URI (#19777) 2021-10-27 12:22:20 -05:00
Simon Mo
6afbd1f558
[Serve] /api/snapshot works with all Serve KVStores (#19772) 2021-10-26 23:27:38 -07:00
Jiao
3f628d4f6b
increase long poll timeout and wrk trial cpu resource (#19768) 2021-10-26 21:31:39 -07:00
architkulkarni
6bd49a8cd5
[runtime env] Improve working dir messaging (#18893) 2021-10-26 20:58:02 -05:00
Jiajun Yao
47744d282c
[data] Fix arrow dataset sort on empty blocks (#19707) 2021-10-26 15:30:23 -07:00
Eric Liang
2652ae7905
[client] Put of a list should not return a list, this is a client bug (#19737) 2021-10-26 13:51:37 -07:00
iasoon
b5158ca0ab
[serve] Correctly set num_replicas when deploying autoscaling deployment (#19520) 2021-10-26 12:10:59 -05:00
SangBin Cho
00ea716ada
Revert "Revert "[Core] [Placement Group] Fix bundle reconstruction when raylet fo after gcs fo (#19452)" (#19724)" (#19736)
This reverts commit d453afbab8.
2021-10-26 08:25:09 -07:00