Commit graph

5381 commits

Author SHA1 Message Date
xwjiang2010
46266b15f0
[tune] Avoid looping through _live_trials twice in _get_next_trial. (#19596) 2021-10-25 19:26:55 +01:00
chenk008
b65aca9002
flush stdout/stderr to avoid empty log in docker start block (#19546) 2021-10-25 10:58:48 -07:00
architkulkarni
f101f7cc02
[runtime_env] Allow specifying runtime env in @ray.remote decorator with Ray Client (#19626) 2021-10-25 10:32:31 -05:00
Kai Fricke
6e455e59d8
[tune] Verbosely/gracefully handle empty experiment checkpoints (#19641) 2021-10-25 13:41:18 +01:00
Kai Fricke
0cfa267fde
[tune] Fix shim error message for scheduler (#19642) 2021-10-25 11:16:16 +01:00
SangBin Cho
aa9eb6499c
[Test] skip pg restart test (#19670) 2021-10-24 16:53:29 -07:00
Philipp Moritz
22eef65134
[Windows] Suppress 'Windows fatal exception: access violation' (#19561)
* suppress 'Windows fatal exception: access violation'

* lint

* update

* Update python/ray/_private/log_monitor.py

Co-authored-by: Matti Picus <matti.picus@gmail.com>

* fixE

* re-introduce mattip's fix again

* update

* .

Co-authored-by: Matti Picus <matti.picus@gmail.com>
Co-authored-by: Alex <alex@anyscale.com>
2021-10-24 11:23:23 -07:00
dependabot[bot]
5ed1530170
[tune](deps): Bump starlette in /python/requirements/ml (#18691)
Bumps [starlette](https://github.com/encode/starlette) from 0.14.2 to 0.16.0.
- [Release notes](https://github.com/encode/starlette/releases)
- [Changelog](https://github.com/encode/starlette/blob/master/docs/release-notes.md)
- [Commits](https://github.com/encode/starlette/compare/0.14.2...0.16.0)

---
updated-dependencies:
- dependency-name: starlette
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-10-23 18:34:14 -07:00
dependabot[bot]
55ab8da3c8
[tune](deps): Bump accelerate in /python/requirements/ml (#19057)
Bumps [accelerate](https://github.com/huggingface/accelerate) from 0.3.0 to 0.5.1.
- [Release notes](https://github.com/huggingface/accelerate/releases)
- [Commits](https://github.com/huggingface/accelerate/compare/v0.3.0...v0.5.1)

---
updated-dependencies:
- dependency-name: accelerate
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-10-23 18:33:01 -07:00
dependabot[bot]
9201687b34
[tune](deps): Bump pytorch-lightning in /python/requirements/ml (#19059)
Bumps [pytorch-lightning](https://github.com/PyTorchLightning/pytorch-lightning) from 1.4.5 to 1.4.9.
- [Release notes](https://github.com/PyTorchLightning/pytorch-lightning/releases)
- [Changelog](https://github.com/PyTorchLightning/pytorch-lightning/blob/master/CHANGELOG.md)
- [Commits](https://github.com/PyTorchLightning/pytorch-lightning/compare/1.4.5...1.4.9)

---
updated-dependencies:
- dependency-name: pytorch-lightning
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-10-23 18:30:59 -07:00
dependabot[bot]
bea802cb80
[tune](deps): Bump wandb in /python/requirements/ml (#19646)
Bumps [wandb](https://github.com/wandb/client) from 0.10.29 to 0.12.5.
- [Release notes](https://github.com/wandb/client/releases)
- [Changelog](https://github.com/wandb/client/blob/master/CHANGELOG.md)
- [Commits](https://github.com/wandb/client/compare/v0.10.29...v0.12.5)

---
updated-dependencies:
- dependency-name: wandb
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-10-23 18:23:52 -07:00
Amog Kamsetty
97878162f4
[ActorGroup] Retry ActorGroup (#19658) 2021-10-23 16:37:29 -05:00
Edward Oakes
445fb0ee99
[runtime_env] Deflake test_runtime_env_working_dir.py (#19665) 2021-10-23 16:35:42 -05:00
Eric Liang
875d19f838
[data] Fix inconsistent naming of to_refs() methods, remove to_arrow() (#19620) 2021-10-23 12:20:23 -07:00
Jiao
e53fecfbd5
[jobs] Initial http jobs server on head node (#19657) 2021-10-23 12:48:16 -05:00
Philipp Moritz
dbd61b9e6b
[Dashboard] Include the dashboard in Windows wheels (#19575) 2021-10-22 17:57:36 -07:00
Jiajun Yao
a7b219fea1
[Core] Don't unpickle and run functions exported by other jobs (#19576) 2021-10-22 17:13:20 -07:00
Gagandeep Singh
358aa57474
Fixed usage of `cv_.wait_for` (#19582)
* Fixed usage of cv.wait_for

* Changed method to calculate remaining time out

* Modify timeout_ms -> remaining_timeout_ms
2021-10-22 16:23:13 -07:00
Edward Oakes
b4673daac6
[ray client] Add test that ray.init doesn't require resources to connect (#19635) 2021-10-22 18:21:53 -05:00
Alex Wu
31d89be926
[Workflow] Basic event support (#19239)
* basics

* .

* .

* a test

* a test

* tests

* cleanup

* concepts page

* docs

* polish

* fix sleep

* fix yi things

* lint

* fix

* .

* .

* .

* fix?

* .

Co-authored-by: Alex Wu <alex@anyscale.com>
2021-10-22 15:27:33 -07:00
Edward Oakes
c9258aff0f
Revert "[ActorGroup] Add ActorGroup (#18960)" (#19655)
This reverts commit 4f05bac8fb.
2021-10-22 14:55:17 -07:00
shrekris-anyscale
cfae64ebe8
[multiprocessing] Modify Ray's map_async() to match Multiprocessing's map_async() behavior (#19403) 2021-10-22 16:31:34 -05:00
Gagandeep Singh
2f8da8f8c8
Bumped timeout due to slow test times in Windows (#19595) 2021-10-22 13:48:15 -07:00
Jiao
f0be4cb390
[jobs] Add job manager class for simple jobs python APIs (#19567) 2021-10-22 14:18:11 -05:00
Jiajun Yao
43b8f8e522
Revert "Revert "[Test] Fix flaky test_gpu test (#19524)" (#19562)" (#19643)
This reverts commit 7daf28f348.
2021-10-22 11:48:57 -07:00
Edward Oakes
0760fe869d
[runtime_env] Clean up working dir tests, add more test cases (#19597) 2021-10-22 12:35:27 -05:00
Amog Kamsetty
4f05bac8fb
[ActorGroup] Add ActorGroup (#18960)
* move

* fix

* Revert "fix"

This reverts commit 532660fc334ae96a0ff34c8ab1288488312300a3.

* Revert "move"

This reverts commit 54321f4a539c2ee873f17d988da5627588aeff97.

* add

* wip

* wip

* wip

* wip

* address comments

* wip

* add to build

* fix

* fix

* fix
2021-10-22 10:22:31 -07:00
Simon Mo
1eb142b57c
[Serve] Fix shutdown protocol again (#19609) 2021-10-22 09:27:32 -07:00
Jiajun Yao
256bf0bf3a
[Release] Bump up dask to latest compatible version 2021.9.1 (#19592)
* Bump up dask to latest compatible version 2021.9.1

* Bump up dask to latest compatible version 2021.9.1
2021-10-22 09:16:28 -07:00
architkulkarni
030acf3857
[Serve] [Serve Autoscaler] Add upscale and downscale delay (#19290) 2021-10-22 10:33:28 -05:00
xwjiang2010
a632cb439f
[Tune] Remove queue_trials. (#19472) 2021-10-22 09:24:54 +01:00
Stephanie Wang
499d6e9fc1
Turn on reconstruction tests in CI (#19497) 2021-10-21 22:34:44 -07:00
Eric Liang
50e305e799
[data] Add take_all() and raise error if to_pandas() drops records (#19619) 2021-10-21 22:23:50 -07:00
SangBin Cho
9a050c666d
[Test] Add a stronger resource leak check to pg unit tests. (#19586)
* Add a stronger check to unit tests.

* .
2021-10-21 21:40:00 -07:00
Edward Oakes
11b6019fb5
[ray client] Fix connecting to a cluster without available CPUs (#19604) 2021-10-21 21:21:50 -05:00
Jiajun Yao
920384f34e
[Doc] Fix Dataset __annotations__ (#19599) 2021-10-21 17:33:55 -07:00
SangBin Cho
cea7fda41a
Revert "Revert "[Dashboard] Disable unnecessary event messages. (#19490)" (#19574)" (#19577)
This reverts commit 699c5aeac6.
2021-10-21 15:36:22 -07:00
SangBin Cho
19e3280824
[Core] Fix shutdown Core worker crash when pg is removed. (#19549)
* fix core worker crash

* remove file

* done
2021-10-21 14:30:54 -07:00
Simon Mo
30d9f8fbae
[Doc] [Serve] Fix code cutoff and broken linkes in deployment.rst (#19573) 2021-10-21 13:47:55 -07:00
Simon Mo
03805d4064
[Serve] Good error message when Serve not installed and ensure Serve installs ray[default] (#19570) 2021-10-21 13:47:29 -07:00
xwjiang2010
3e31526445
[tune] Print warning msg when TrialExecutor is directly inherited. (#17654) 2021-10-21 21:25:38 +01:00
Ian Rodney
0cdf4ae8d0
[AWS] Stop Round Robining AZs (#19051)
* round robin on failure to launch

* still round-robin spot instances

* prioritize first AZ

* no more round-robining

* doc updates

* Order subnets by AZ

* add spot instance advisor link

* ensure we try all AZs

* fix typos
2021-10-21 12:06:44 -07:00
Kai Fricke
7d8ea5e724
[tune] Remove magic results (e.g. config) before calculating trial result metrics (#19583) 2021-10-21 19:36:14 +01:00
Kai Fricke
15cdffe0ff
[tune] Only try to sync driver if sync_to_driver is actually enabled (#19589) 2021-10-21 19:35:35 +01:00
Oscar Knagg
15ca575078
Account for Windows return characters (#19590) 2021-10-21 10:05:20 -07:00
Travis Addair
c6e2161dbc
[Train] Fixed HorovodBackend to automatically detect network interfaces (#19533)
* Moved Horovod into package

* Move in Ludwig fix

* Undo git mv

* Cleanup

* Cleanup

* flake8

* Update python/ray/train/backends/horovod.py

Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>

* Whitespace

Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
2021-10-21 09:13:11 -07:00
Amog Kamsetty
f1f334c348
[Train] Backwards Compatibility for TrainingCallback (#19566) 2021-10-21 09:11:34 -07:00
SangBin Cho
9000f41aa6
[Nightly Test] Support memory profiling on Ray + implement memory monitor for nightly tests (#19539)
* random fixes

* Done

* done

* update the doc

* doc lint fix

* .

* .
2021-10-21 07:37:05 -07:00
Tobias Kaymak
0e50701bbe
[Serve] Typo in kv_store.py (#19454)
Fixing typo in init of RayS3KVStore class
2021-10-21 07:24:34 -07:00
Qing Wang
048e7f7d5d
[Core] Port concurrency groups with asyncio (#18567)
## Why are these changes needed?
This PR ports concurrency group functionality to asyncio actors in Python.

### API
```python
@ray.remote(concurrency_groups={"io": 2, "compute": 4})
class AsyncActor:
    def __init__(self):
        pass

    @ray.method(concurrency_group="io")
    async def f1(self):
        pass

    @ray.method(concurrency_group="io")
    def f2(self):
        pass

    @ray.method(concurrency_group="compute")
    def f3(self):
        pass

    @ray.method(concurrency_group="compute")
    def f4(self):
        pass

    def f5(self):
        pass
```
The decorator on the actor class `AsyncActor` declares two concurrency groups, `io` and `compute`, with max concurrencies of 2 and 4 respectively; the actor also has a default concurrency group. Every concurrency group has its own asyncio event loop and its own Python thread, which execute the methods assigned to that group.

Methods `f1` and `f2` are invoked in the `io` concurrency group, and `f3` and `f4` in `compute`.
Note that `f5` and `__init__` are invoked in the default concurrency group.

In the following call, `f2` is invoked in the `compute` concurrency group, because a group specified dynamically at call time takes priority over the one set in the decorator.
```python
a = AsyncActor.remote()  # create the actor defined above
a.f2.options(concurrency_group="compute").remote()
```
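
The priority rule above can be sketched in plain Python, without Ray. This is only an illustration of the lookup order (call-time option, then method decorator, then default group); `METHOD_GROUPS`, `DEFAULT_GROUP`, and `resolve_group` are hypothetical names and are not part of Ray's API or of this PR.

```python
from typing import Optional

# Hypothetical label for the default concurrency group.
DEFAULT_GROUP = "default"

# Groups assigned by the method decorators in the AsyncActor example.
# f5 and __init__ are absent on purpose: they carry no decorator.
METHOD_GROUPS = {"f1": "io", "f2": "io", "f3": "compute", "f4": "compute"}


def resolve_group(method_name: str, call_time_group: Optional[str] = None) -> str:
    """Return the group for one call.

    Order of precedence: group given at call time, then the group set
    on the method, then the default group.
    """
    if call_time_group is not None:
        return call_time_group
    return METHOD_GROUPS.get(method_name, DEFAULT_GROUP)
```

With this sketch, `resolve_group("f2")` picks the decorator's group, while `resolve_group("f2", "compute")` lets the call-time choice win.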

### Implementation
The implementation details are straightforward:
- Previously, an asyncio actor had a single event loop bound to a single Python thread. Now, each concurrency group of the asyncio actor gets its own event loop bound to its own Python thread.
- Previously, the asyncio actor kept one fiber state per caller. Now it creates a FiberStateManager per caller, and the FiberStateManager manages the fiber states for the concurrency groups.
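
The first point can be illustrated with a minimal standard-library sketch. It is not the code from this PR: `GroupLoop`, `submit`, `stop`, and `where_am_i` are hypothetical names, and enforcing the max concurrency with an `asyncio.Semaphore` is an assumption made for the illustration. It only shows the shape of "one event loop bound to one thread per concurrency group".

```python
import asyncio
import threading
from concurrent.futures import Future
from typing import Any, Callable, Coroutine


class GroupLoop:
    """One asyncio event loop bound to one dedicated thread (hypothetical helper)."""

    def __init__(self, name: str, max_concurrency: int) -> None:
        self.loop = asyncio.new_event_loop()
        self._max_concurrency = max_concurrency
        self._ready = threading.Event()
        self.thread = threading.Thread(
            target=self._run, name=f"group-{name}", daemon=True
        )
        self.thread.start()
        self._ready.wait()  # block until the loop thread has finished its setup

    def _run(self) -> None:
        # Runs on the group's own thread: bind the loop, then serve forever.
        asyncio.set_event_loop(self.loop)
        # Assumption: a semaphore caps how many coroutines of this group run at once.
        self._sem = asyncio.Semaphore(self._max_concurrency)
        self._ready.set()
        self.loop.run_forever()

    def submit(self, make_coro: Callable[[], Coroutine[Any, Any, Any]]) -> Future:
        """Schedule a coroutine on this group's loop from any thread."""

        async def limited() -> Any:
            async with self._sem:
                return await make_coro()

        return asyncio.run_coroutine_threadsafe(limited(), self.loop)

    def stop(self) -> None:
        """Stop the loop, wait for its thread to exit, and release the loop."""
        self.loop.call_soon_threadsafe(self.loop.stop)
        self.thread.join()
        self.loop.close()


async def where_am_i() -> str:
    # Reports which thread executes the coroutine.
    return threading.current_thread().name
```

A caller would build one `GroupLoop` per declared group, for example `{"io": GroupLoop("io", 2), "compute": GroupLoop("compute", 4)}`, and route each method call to the loop of its group via `submit(...)`, reading the value with `.result()`.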


## Related issue number
#16047
2021-10-21 21:46:56 +08:00