Commit graph

9966 commits

Author SHA1 Message Date
Edward Oakes
11b6019fb5
[ray client] Fix connecting to a cluster without available CPUs (#19604) 2021-10-21 21:21:50 -05:00
Jiajun Yao
920384f34e
[Doc] Fix Dataset __annotations__ (#19599) 2021-10-21 17:33:55 -07:00
SangBin Cho
cea7fda41a
Revert "Revert "[Dashboard] Disable unnecessary event messages. (#19490)" (#19574)" (#19577)
This reverts commit 699c5aeac6.
2021-10-21 15:36:22 -07:00
SangBin Cho
19e3280824
[Core] Fix shutdown Core worker crash when pg is removed. (#19549)
* fix core worker crash

* remove file

* done
2021-10-21 14:30:54 -07:00
Simon Mo
30d9f8fbae
[Doc] [Serve] Fix code cutoff and broken links in deployment.rst (#19573) 2021-10-21 13:47:55 -07:00
Simon Mo
03805d4064
[Serve] Good error message when Serve not installed and ensure Serve installs ray[default] (#19570) 2021-10-21 13:47:29 -07:00
Simon Mo
32e648e5fa
[Serve][Doc] Add Failure Recovery Doc (#19166) 2021-10-21 13:32:42 -07:00
xwjiang2010
3e31526445
[tune] Print warning msg when TrialExecutor is directly inherited. (#17654) 2021-10-21 21:25:38 +01:00
Ameer Haj Ali
923adb6512
Update docs to make sure user does ssh port forwarding from another terminal (#19367) 2021-10-21 13:17:08 -07:00
Simon Mo
03406706b3
[Serve] [Doc] Add Autoscaling Documentation (#19559) 2021-10-21 13:11:29 -07:00
Ian Rodney
0cdf4ae8d0
[AWS] Stop Round Robining AZs (#19051)
* round robin on failure to launch

* still round-robin spot instances

* prioritize first AZ

* no more round-robining

* doc updates

* Order subnets by AZ

* add spot instance advisor link

* ensure we try all AZs

* fix typos
2021-10-21 12:06:44 -07:00
Kai Fricke
7d8ea5e724
[tune] Remove magic results (e.g. config) before calculating trial result metrics (#19583) 2021-10-21 19:36:14 +01:00
Kai Fricke
15cdffe0ff
[tune] Only try to sync driver if sync_to_driver is actually enabled (#19589) 2021-10-21 19:35:35 +01:00
Eric Liang
eb24b08ced
Relax the check on object size changing 2021-10-21 11:05:54 -07:00
Oscar Knagg
15ca575078
Account for Windows return characters (#19590) 2021-10-21 10:05:20 -07:00
SangBin Cho
7cfd170d01
Temporarily disable event framework for 1.8 #19587
Although the event framework seems to work, it prints ERROR-severity events to stderr, which is eventually streamed to the driver. We should fix this issue before enabling the feature in production. To allow enough time for the fix, we will turn the feature off temporarily.
2021-10-21 09:51:02 -07:00
Travis Addair
c6e2161dbc
[Train] Fixed HorovodBackend to automatically detect network interfaces (#19533)
* Moved Horovod into package

* Move in Ludwig fix

* Undo git mv

* Cleanup

* Cleanup

* flake8

* Update python/ray/train/backends/horovod.py

Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>

* Whitespace

Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
2021-10-21 09:13:11 -07:00
Amog Kamsetty
f1f334c348
[Train] Backwards Compatibility for TrainingCallback (#19566) 2021-10-21 09:11:34 -07:00
SangBin Cho
9000f41aa6
[Nightly Test] Support memory profiling on Ray + implement memory monitor for nightly tests (#19539)
* random fixes

* Done

* done

* update the doc

* doc lint fix

* .

* .
2021-10-21 07:37:05 -07:00
matthewdeng
b3b739266e
[docs] add dask compatibility for 1.8.0 (#19578) 2021-10-21 07:26:07 -07:00
Tobias Kaymak
0e50701bbe
[Serve] Typo in kv_store.py (#19454)
Fixing typo in init of RayS3KVStore class
2021-10-21 07:24:34 -07:00
Yi Cheng
7a7b356899
[Nightly test] add test for grpc broadcasting (#19579) 2021-10-21 07:01:41 -07:00
Qing Wang
048e7f7d5d
[Core] Port concurrency groups with asyncio (#18567)
## Why are these changes needed?
This PR aims to port concurrency groups functionality with asyncio for Python.

### API
```python
import ray


@ray.remote(concurrency_groups={"io": 2, "compute": 4})
class AsyncActor:
    def __init__(self):
        pass

    @ray.method(concurrency_group="io")
    async def f1(self):
        pass

    @ray.method(concurrency_group="io")
    def f2(self):
        pass

    @ray.method(concurrency_group="compute")
    def f3(self):
        pass

    @ray.method(concurrency_group="compute")
    def f4(self):
        pass

    def f5(self):
        pass
```
The decorator on the actor class `AsyncActor` declares two concurrency groups, `io` and `compute`, with maximum concurrencies of 2 and 4 respectively. The actor also has a default concurrency group. Every concurrency group has its own async event loop and its own Python thread to execute the methods defined on it.

Methods `f1` and `f2` will be invoked in the `io` concurrency group, and `f3` and `f4` in `compute`.
Note that `f5` and `__init__` will be invoked in the default concurrency group.

The following call invokes `f2` in the `compute` concurrency group, because a group specified dynamically at call time takes priority over the one declared on the method.
```python
a.f2.options(concurrency_group="compute").remote()
```
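The priority rule can be sketched in plain Python. This is only an illustration of the ordering described above, not Ray's code: `resolve_group` and `DEFAULT_GROUP` are hypothetical names introduced here, and the `declared` mapping simply mirrors the decorators in the `AsyncActor` example.

```python
from typing import Optional

# Hypothetical label for the default concurrency group.
DEFAULT_GROUP = "default"


def resolve_group(declared: Optional[str], per_call: Optional[str] = None) -> str:
    """Pick the group for a call: per-call override, then declaration, then default."""
    if per_call is not None:
        return per_call
    if declared is not None:
        return declared
    return DEFAULT_GROUP


# Groups declared on the methods of AsyncActor in the example above.
declared = {"f1": "io", "f2": "io", "f3": "compute", "f4": "compute", "f5": None}

print(resolve_group(declared["f2"]))                      # io
print(resolve_group(declared["f2"], per_call="compute"))  # compute
print(resolve_group(declared["f5"]))                      # default
```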

### Implementation
The main implementation details are:
- Previously there was 1 event loop bound to 1 Python thread for an asyncio actor. Now we create 1 event loop bound to 1 Python thread for every concurrency group of the asyncio actor.
- Previously there was 1 fiber state for every caller in the asyncio actor. Now we create a FiberStateManager for every caller in the asyncio actor, and the FiberStateManager manages the fiber states for the concurrency groups.
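The first point can be modelled with the standard library alone. The sketch below is not the Ray implementation: `ConcurrencyGroup` is a hypothetical class, and it assumes that "one event loop bound to one thread per group" can be approximated with `asyncio` and `threading`, using a semaphore as the concurrency cap. Fiber states are not modelled.

```python
import asyncio
import threading


class ConcurrencyGroup:
    """Hypothetical model of one group: one event loop bound to one thread."""

    def __init__(self, name: str, max_concurrency: int):
        self.name = name
        self.max_concurrency = max_concurrency
        self.loop = asyncio.new_event_loop()
        self._sem = None  # created lazily on the loop's own thread
        self.thread = threading.Thread(
            target=self._run, name=f"group-{name}", daemon=True
        )
        self.thread.start()

    def _run(self):
        # Bind the loop to this thread and serve it until stop() is called.
        asyncio.set_event_loop(self.loop)
        self.loop.run_forever()

    async def _guarded(self, coro_fn, *args):
        # Cap the number of coroutines in flight at max_concurrency.
        if self._sem is None:
            self._sem = asyncio.Semaphore(self.max_concurrency)
        async with self._sem:
            return await coro_fn(*args)

    def submit(self, coro_fn, *args):
        """Schedule coro_fn on this group's loop from any thread."""
        return asyncio.run_coroutine_threadsafe(
            self._guarded(coro_fn, *args), self.loop
        )

    def stop(self):
        self.loop.call_soon_threadsafe(self.loop.stop)
        self.thread.join()
        self.loop.close()


async def current_thread_name():
    await asyncio.sleep(0)
    return threading.current_thread().name


# Same group names and limits as in the AsyncActor example.
groups = {
    "io": ConcurrencyGroup("io", 2),
    "compute": ConcurrencyGroup("compute", 4),
}
io_thread = groups["io"].submit(current_thread_name).result(timeout=5)
compute_thread = groups["compute"].submit(current_thread_name).result(timeout=5)
print(io_thread, compute_thread)

for group in groups.values():
    group.stop()
```

Each group runs its coroutines on its own thread, so a slow method in one group does not block the event loop of another.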


## Related issue number
#16047
2021-10-21 21:46:56 +08:00
Antoni Baum
a04b02e2e8
[tune] Better bad Stopper type message (#19496) 2021-10-21 14:31:27 +01:00
Kai Fricke
44fb7d09df
[tune] sync_client: Fix delete template formatting (#19553) 2021-10-21 10:59:54 +01:00
Patrick Ames
20d47873c9
[data] Add pickle support for PyArrow CSV WriteOptions (#19378) 2021-10-21 00:46:52 -07:00
Matti Picus
bacd5f92e2
MAINT: cleanups for windows (#19430)
* dead processes should increment total_stopped

* use psutil in testing to check pid

* remove unneeded repetitions
2021-10-20 23:32:35 -07:00
Yi Cheng
cba8480616
[dashboard] Fix the wrong metrics for grpc query execution time in server side (#19500)
## Why are these changes needed?
It looks like the metrics set on the server side are wrong: the time at which the query is constructed is sometimes not the time at which we receive the query. This PR fixes that.

## Related issue number
2021-10-20 23:06:35 -07:00
Oscar Knagg
5a05e89267
[Core] Add TLS/SSL support to gRPC channels (#18631) 2021-10-20 22:39:11 -07:00
heng2j
6d23fb1ff1
[Tune] Support custom tags in MLflow logger callback (#19532)
* Added Food Collector support to rllib/env/unity3d_env.py

* feat(mlflow): added parameter tags to MLflowLoggerCallback

* fix(unit_test): added tags tests in test_integration_mlflow.MLflowTest()

* chore:  lint the changes in this PR

* update

* Update python/ray/tune/integration/mlflow.py

* fix

* copy

* fix

Co-authored-by: zla0368 <zhongheng.li@stresearch.com>
Co-authored-by: Li, Zhongheng <zhongheng.li@str.us>
Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com>
Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>
2021-10-20 22:31:33 -07:00
SangBin Cho
085162c68e
[Log] Print actor & task name upon stderr messages. (#19542) 2021-10-20 22:16:32 -07:00
Eric Liang
699c5aeac6
Revert "[Dashboard] Disable unnecessary event messages. (#19490)" (#19574)
This reverts commit 7fb681a35d.
2021-10-20 20:17:57 -07:00
Eric Liang
48ecb1f88a
[data] Fix O(n^2) issues in simple_block sort (#19543) 2021-10-20 18:26:20 -07:00
Philipp Moritz
45f1ff0fa9
[Windows] Update react-scripts dependency for dashboard (#19489) 2021-10-20 17:57:30 -07:00
SangBin Cho
7fb681a35d
[Dashboard] Disable unnecessary event messages. (#19490)
* Disable unnecessary event messages.

* use warning

* Fix tests
2021-10-20 17:40:25 -07:00
Edward Oakes
bcf584294f
[runtime_env] Refactor working dir packaging code into runtime_env.packaging module (#19112) 2021-10-20 18:38:50 -05:00
gjoliver
44a4e42172
[rllib] Add entropy_coeff_schedule support for APPO. (#19544)
* Add entropy_coeff_schedule support for APPO.

* lint
2021-10-20 14:18:01 -07:00
Eric Liang
7daf28f348
Revert "[Test] Fix flaky test_gpu test (#19524)" (#19562)
This reverts commit 39e54cd276.
2021-10-20 12:21:19 -07:00
Clark Zinzow
88c5fcde8c
[Datasets] Unrevert Arrow table copy method change. (#19534) 2021-10-20 11:57:36 -07:00
Jiao
c51f79bca6
[runtime_env] Support remote s3 package in runtime env (#19315) 2021-10-20 10:41:54 -05:00
mwtian
aaff6901dd
[Pubsub] refactor pubsub to support different channel types (#19498)
* refactor pubsub to support different channel types

* fix

* use std::string for key id

* fix mock

* fix
2021-10-20 07:02:55 -07:00
Kai Fricke
71564040ec
[ci/release] Unwrap after installing pip packages (#19552) 2021-10-20 13:41:16 +01:00
Jiajun Yao
39e54cd276
[Test] Fix flaky test_gpu test (#19524) 2021-10-19 22:36:34 -07:00
Yi Cheng
01b899dafb
[nightly] Fix broken test due to bad syntax #19536 (#19536) 2021-10-19 21:43:46 -07:00
Simon Mo
59eef6521b
[Serve] Use regular dict for handle caching (#19162) 2021-10-19 21:27:01 -07:00
Yi Cheng
7a9cedfc5c
[nightly] Add grpc based broadcasting into nightly test for decision_tree (#19531)
* dbg

* up

* check

* up

* up

* put grpc based one into nightly test

* up
2021-10-19 19:59:39 -07:00
Jiajun Yao
4fc5b11c68
Simple block dataset groupBy (#19435) 2021-10-19 19:53:13 -07:00
Eric Liang
eacfbf8be2
[data] Don't shuffle during repartition by default (#19379) 2021-10-19 19:46:22 -07:00
SangBin Cho
3222d39fb8
[Dashboard] Dashboard memory improvement (#19385)
* many ppo profiling

* completed

* improve memory usage lint

* revert temporarily

* Addressed code review

* Fix a test
2021-10-19 19:34:42 -07:00
Simon Mo
30c8c073a2
[Doc] Generate sitemap (#19375) 2021-10-19 14:14:17 -07:00