
5349 commits

Author SHA1 Message Date
Eric Liang
50e305e799
[data] Add take_all() and raise error if to_pandas() drops records (#19619) 2021-10-21 22:23:50 -07:00
SangBin Cho
9a050c666d
[Test] Add a stronger resource leak check to pg unit tests. (#19586)
* Add a stronger check to unit tests.

* .
2021-10-21 21:40:00 -07:00
Edward Oakes
11b6019fb5
[ray client] Fix connecting to a cluster without available CPUs (#19604) 2021-10-21 21:21:50 -05:00
Jiajun Yao
920384f34e
[Doc] Fix Dataset __annotations__ (#19599) 2021-10-21 17:33:55 -07:00
SangBin Cho
cea7fda41a
Revert "Revert "[Dashboard] Disable unnecessary event messages. (#19490)" (#19574)" (#19577)
This reverts commit 699c5aeac6.
2021-10-21 15:36:22 -07:00
SangBin Cho
19e3280824
[Core] Fix shutdown Core worker crash when pg is removed. (#19549)
* fix core worker crash

* remove file

* done
2021-10-21 14:30:54 -07:00
Simon Mo
30d9f8fbae
[Doc] [Serve] Fix code cutoff and broken links in deployment.rst (#19573) 2021-10-21 13:47:55 -07:00
Simon Mo
03805d4064
[Serve] Good error message when Serve not installed and ensure Serve installs ray[default] (#19570) 2021-10-21 13:47:29 -07:00
xwjiang2010
3e31526445
[tune] Print warning msg when TrialExecutor is directly inherited. (#17654) 2021-10-21 21:25:38 +01:00
Ian Rodney
0cdf4ae8d0
[AWS] Stop Round Robining AZs (#19051)
* round robin on failure to launch

* still round-robin spot instances

* prioritize first AZ

* no more round-robining

* doc updates

* Order subnets by AZ

* add spot instance advisor link

* ensure we try all AZs

* fix typos
2021-10-21 12:06:44 -07:00
Kai Fricke
7d8ea5e724
[tune] Remove magic results (e.g. config) before calculating trial result metrics (#19583) 2021-10-21 19:36:14 +01:00
Kai Fricke
15cdffe0ff
[tune] Only try to sync driver if sync_to_driver is actually enabled (#19589) 2021-10-21 19:35:35 +01:00
Oscar Knagg
15ca575078
Account for Windows return characters (#19590) 2021-10-21 10:05:20 -07:00
Travis Addair
c6e2161dbc
[Train] Fixed HorovodBackend to automatically detect network interfaces (#19533)
* Moved Horovod into package

* Move in Ludwig fix

* Undo git mv

* Cleanup

* Cleanup

* flake8

* Update python/ray/train/backends/horovod.py

Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>

* Whitespace

Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
2021-10-21 09:13:11 -07:00
Amog Kamsetty
f1f334c348
[Train] Backwards Compatibility for TrainingCallback (#19566) 2021-10-21 09:11:34 -07:00
SangBin Cho
9000f41aa6
[Nightly Test] Support memory profiling on Ray + implement memory monitor for nightly tests (#19539)
* random fixes

* Done

* done

* update the doc

* doc lint fix

* .

* .
2021-10-21 07:37:05 -07:00
Tobias Kaymak
0e50701bbe
[Serve] Typo in kv_store.py (#19454)
Fixing typo in init of RayS3KVStore class
2021-10-21 07:24:34 -07:00
Qing Wang
048e7f7d5d
[Core] Port concurrency groups with asyncio (#18567)
## Why are these changes needed?
This PR aims to port concurrency groups functionality with asyncio for Python.

### API
```python
@ray.remote(concurrency_groups={"io": 2, "compute": 4})
class AsyncActor:
    def __init__(self):
        pass

    @ray.method(concurrency_group="io")
    async def f1(self):
        pass

    @ray.method(concurrency_group="io")
    def f2(self):
        pass

    @ray.method(concurrency_group="compute")
    def f3(self):
        pass

    @ray.method(concurrency_group="compute")
    def f4(self):
        pass

    def f5(self):
        pass
```
The decorator on the actor class `AsyncActor` declares two concurrency groups, `io` and `compute`, along with their maximum concurrencies (2 and 4). The actor also has a default concurrency group. Every concurrency group has its own asyncio event loop and a Python thread that execute the methods assigned to it.

Methods `f1` and `f2` are invoked in the `io` concurrency group; `f3` and `f4` are invoked in `compute`.
Note that `f5` and `__init__` are invoked in the default concurrency group.

The following call invokes `f2` in the `compute` concurrency group, because a group specified dynamically at call time takes priority over the one given in the method's decorator.
```python
a.f2.options(concurrency_group="compute").remote()
```

### Implementation
The main implementation details are:
- Previously, an asyncio actor had a single event loop bound to a single Python thread. Now one event loop bound to one Python thread is created for every concurrency group of the asyncio actor.
- Previously, there was one fiber state for every caller in the asyncio actor. Now a FiberStateManager is created for every caller in the asyncio actor, and it manages the fiber states for the concurrency groups.
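
The "one event loop bound to one thread per concurrency group" idea can be illustrated with a minimal standard-library sketch. This is not Ray's implementation: the `GroupLoops` class, its `submit` and `shutdown` methods, and the `group-<name>` thread names are hypothetical and exist only for illustration. A per-group semaphore stands in for the group's max concurrency.

```python
import asyncio
import threading


class GroupLoops:
    """Hypothetical helper: one asyncio event loop per group, each on its own thread."""

    def __init__(self, groups):
        # groups maps a group name to its max concurrency, e.g. {"io": 2, "compute": 4}.
        self._limits = dict(groups)
        self._loops = {}
        self._sems = {}
        for name in groups:
            loop = asyncio.new_event_loop()
            thread = threading.Thread(
                target=loop.run_forever, name=f"group-{name}", daemon=True
            )
            thread.start()
            self._loops[name] = (loop, thread)

    def submit(self, group, coro_fn, *args):
        """Schedule coro_fn(*args) on the group's loop; returns a concurrent Future."""
        loop, _ = self._loops[group]

        async def bounded():
            # Create the semaphore lazily, on the group's own loop and thread.
            sem = self._sems.get(group)
            if sem is None:
                sem = self._sems[group] = asyncio.Semaphore(self._limits[group])
            async with sem:  # caps concurrent executions within this group
                return await coro_fn(*args)

        return asyncio.run_coroutine_threadsafe(bounded(), loop)

    def shutdown(self):
        """Stop every loop, wait for its thread to exit, then close the loop."""
        for loop, thread in self._loops.values():
            loop.call_soon_threadsafe(loop.stop)
            thread.join()
            loop.close()


async def which_thread():
    # Report the thread that runs this coroutine.
    return threading.current_thread().name


loops = GroupLoops({"io": 2, "compute": 4})
io_thread = loops.submit("io", which_thread).result(timeout=5)
compute_thread = loops.submit("compute", which_thread).result(timeout=5)
loops.shutdown()
```

Because each group owns a separate loop and thread, a slow method in one group does not block methods scheduled in another group.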


## Related issue number
#16047
2021-10-21 21:46:56 +08:00
Antoni Baum
a04b02e2e8
[tune] Better bad Stopper type message (#19496) 2021-10-21 14:31:27 +01:00
Kai Fricke
44fb7d09df
[tune] sync_client: Fix delete template formatting (#19553) 2021-10-21 10:59:54 +01:00
Patrick Ames
20d47873c9
[data] Add pickle support for PyArrow CSV WriteOptions (#19378) 2021-10-21 00:46:52 -07:00
Matti Picus
bacd5f92e2
MAINT: cleanups for windows (#19430)
* dead processes should increment total_stopped

* use psutil in testing to check pid

* remove unneeded repetitions
2021-10-20 23:32:35 -07:00
Oscar Knagg
5a05e89267
[Core] Add TLS/SSL support to gRPC channels (#18631) 2021-10-20 22:39:11 -07:00
heng2j
6d23fb1ff1
[Tune] Support custom tags in MLflow logger callback (#19532)
* Added Food Collector support to rllib/env/unity3d_env.py

* feat(mlflow): added parameter tags to MLflowLoggerCallback

* fix(unit_test): added tags tests in test_integration_mlflow.MLflowTest()

* chore:  lint the changes in this PR

* update

* Update python/ray/tune/integration/mlflow.py

* fix

* copy

* fix

Co-authored-by: zla0368 <zhongheng.li@stresearch.com>
Co-authored-by: Li, Zhongheng <zhongheng.li@str.us>
Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com>
Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>
2021-10-20 22:31:33 -07:00
SangBin Cho
085162c68e
[Log] Print actor & task name upon stderr messages. (#19542) 2021-10-20 22:16:32 -07:00
Eric Liang
699c5aeac6
Revert "[Dashboard] Disable unnecessary event messages. (#19490)" (#19574)
This reverts commit 7fb681a35d.
2021-10-20 20:17:57 -07:00
Eric Liang
48ecb1f88a
[data] Fix O(n^2) issues in simple_block sort (#19543) 2021-10-20 18:26:20 -07:00
SangBin Cho
7fb681a35d
[Dashboard] Disable unnecessary event messages. (#19490)
* Disable unnecessary event messages.

* use warning

* Fix tests
2021-10-20 17:40:25 -07:00
Edward Oakes
bcf584294f
[runtime_env] Refactor working dir packaging code into runtime_env.packaging module (#19112) 2021-10-20 18:38:50 -05:00
Eric Liang
7daf28f348
Revert "[Test] Fix flaky test_gpu test (#19524)" (#19562)
This reverts commit 39e54cd276.
2021-10-20 12:21:19 -07:00
Clark Zinzow
88c5fcde8c
[Datasets] Unrevert Arrow table copy method change. (#19534) 2021-10-20 11:57:36 -07:00
Jiao
c51f79bca6
[runtime_env] Support remote s3 package in runtime env (#19315) 2021-10-20 10:41:54 -05:00
Jiajun Yao
39e54cd276
[Test] Fix flaky test_gpu test (#19524) 2021-10-19 22:36:34 -07:00
Simon Mo
59eef6521b
[Serve] Use regular dict for handle caching (#19162) 2021-10-19 21:27:01 -07:00
Jiajun Yao
4fc5b11c68
Simple block dataset groupBy (#19435) 2021-10-19 19:53:13 -07:00
Eric Liang
eacfbf8be2
[data] Don't shuffle during repartition by default (#19379) 2021-10-19 19:46:22 -07:00
SangBin Cho
3222d39fb8
[Dashboard] Dashboard memory improvement (#19385)
* many ppo profiling

* completed

* improve memory usage lint

* revert temporarily

* Addressed code review

* Fix a test
2021-10-19 19:34:42 -07:00
Simon Mo
48cf366dca
[Hotfix] Pin node version to 14 (#19522) 2021-10-19 14:13:06 -07:00
matthewdeng
19eabd7a55
[train] remove default num_workers (#19518)
* [train] remove default num_workers

* fix tests
2021-10-19 13:53:23 -07:00
matthewdeng
56e46c3c23
[train] add callbacks package compatibility (#19519) 2021-10-19 12:56:49 -07:00
Edward Oakes
4645893a5f
Add prototype of ray.serve.pipeline (#19278) 2021-10-19 11:36:49 -07:00
xwjiang2010
a6f9c93db0
Revert "[Datasets] Add support for slicing Arrow blocks that contain tensor columns. (#19494)" (#19517)
This reverts commit ad03917b8f.
2021-10-19 11:35:04 -07:00
Tao He
1dde588702
[Dataset] Support dataset from a single dataframe/table. (#18205) 2021-10-19 10:27:43 -07:00
Alex Wu
a819e417ac
Revert "[Hotfix] Revert "[Workflow] workflow.delete"" (#19248)
* Revert "Revert "[Workflow] workflow.delete (#19178)" (#19247)"

This reverts commit b59317520d.

* fix

* .

* .

* .

* Revert "."

This reverts commit 423b9b8e7e83f07cb0942b04e568e37ea0c62ba8.

* .

* .

* done?

* 4real

Co-authored-by: Alex <alex@anyscale.com>
2021-10-19 09:47:56 -07:00
Gagandeep Singh
cc00ab74da
[Windows] Fix test_fair_queuing and test_wait_timing (#19456)
* modified timeout in test_fair_queuing

* bump bounds to pass the tests
2021-10-19 09:27:04 -07:00
architkulkarni
b8941338d3
[runtime env] Raise error when creating runtime env when ray[default] is not installed (#19491) 2021-10-19 09:16:04 -05:00
matthewdeng
4674c78050
[Train] Rename Ray SGD v2 to Ray Train (#19436) 2021-10-18 22:27:46 -07:00
Guyang Song
46b4c7464d
runtime env eager install by default (#19449) 2021-10-19 11:31:14 +08:00
Clark Zinzow
ad03917b8f
[Datasets] Add support for slicing Arrow blocks that contain tensor columns. (#19494) 2021-10-18 20:07:06 -07:00
Simon Mo
6f2eb1f9fa
[Serve] Use ray core metrics for autoscaling (#19038) 2021-10-18 19:32:49 -07:00