Commit graph

10126 commits

Author SHA1 Message Date
Alex Wu
045d72cdc0
[docs] Fix typo in installation instructions (#19721) 2021-10-25 15:30:34 -07:00
Eric Liang
66818d11b8
Revert "[data] Add serialized size estimator to block builder (#19681)" (#19717)
This reverts commit 8c37311c41.
2021-10-25 15:06:58 -07:00
Eric Liang
8c37311c41
[data] Add serialized size estimator to block builder (#19681) 2021-10-25 14:58:49 -07:00
SangBin Cho
ecd5a622ef
[Tests] Add a memory usage on dask on ray tests (#19674) 2021-10-25 14:58:26 -07:00
SangBin Cho
544f774245
[Autoscaler/Core] Drain node API (#19350)
* Initial version done. Graceful shutdown is possible with direct raylet RPCs

* .

* .

* ip

* Done.

* done tests might fail

* fix lint + cpp tests

* fix 2

* Fix issues.

* Addressed code review.

* Fix another cpp test failure

* completed

* Skip windows tests

* Update the comment

* complete

* addressed code review.
2021-10-25 14:57:50 -07:00
Linsong Chu
13d4894789
[workflow] Add get_metadata() for workflow (#19372)
## Why are these changes needed?

Add the functionality to retrieve metadata for a workflow or workflow step.

Design:
- Similar to `get_output`, this will either return the metadata for workflow (`workflow.get_metadata(workflow_id)`) or the metadata for a specific step (`workflow.get_metadata(workflow_id, step_id)`)
- Exceptions are raised only if the workflow ID or step ID does not exist. Canceled jobs, running jobs, etc. return proper metadata retrieved from the checkpoint. See [here](8c8ca609d7/python/ray/workflow/tests/test_metadata_get.py (L67)) for more details.
- Returned metadata is an aggregated result from multiple checkpoint files based on previous [discussion](https://github.com/ray-project/ray/issues/17090#issuecomment-920481789). The aggregation logic is [here for step metadata](8c8ca609d7/python/ray/workflow/workflow_storage.py (L451)) and [here for workflow metadata](8c8ca609d7/python/ray/workflow/workflow_storage.py (L484)) which can be tuned with further discussion.

Example:
```python
>>> from ray import workflow
>>> user_step_metadata = {"k1": "v1"}
>>> user_run_metadata = {"k2": "v2"}
>>> step_name = "simple_step"
>>> workflow_id = "simple"

>>> @workflow.step
... def simple():
...     return 0

>>> simple.options(name=step_name, metadata=user_step_metadata).step().run(workflow_id, metadata=user_run_metadata)

# get workflow-level metadata
>>> workflow.get_metadata("simple")
{'status': 'SUCCESSFUL',
 'user_metadata': {'k2': 'v2'},
 'stats': {'start_time': 1634173413.116535, 'end_time': 1634173413.149051}}

# get step-level metadata
>>> workflow.get_metadata("simple", "simple_step")
{'name': '__main__.simple',
 'step_type': 'FUNCTION',
 'workflows': [],
 'max_retries': 3,
 'workflow_refs': [],
 'catch_exceptions': False,
 'ray_options': {},
 'user_metadata': {'k1': 'v1'},
 'stats': {'start_time': 1634173413.131262, 'end_time': 1634173413.1347651}}
```
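To make the aggregation described above concrete, here is a minimal sketch of merging separate checkpoint records into a workflow-level dict shaped like the one in the example. It is not the code in `workflow_storage.py`: the `pre_run`/`post_run` record layout, the in-memory `store`, and the helper names `aggregate_workflow_metadata` and `lookup_workflow_metadata` are hypothetical, and raising `ValueError` for an unknown ID is this sketch's choice rather than Ray's documented exception type.

```python
from typing import Any, Dict, Optional


def aggregate_workflow_metadata(
    status: str,
    pre_run: Dict[str, Any],
    post_run: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Merge hypothetical pre-run and post-run checkpoint records."""
    stats: Dict[str, Any] = {"start_time": pre_run.get("start_time")}
    # A running or canceled workflow may not have a post-run record yet,
    # so end_time is only reported once that record exists.
    if post_run is not None and "end_time" in post_run:
        stats["end_time"] = post_run["end_time"]
    return {
        "status": status,
        "user_metadata": pre_run.get("user_metadata", {}),
        "stats": stats,
    }


def lookup_workflow_metadata(
    store: Dict[str, Dict[str, Any]], workflow_id: str
) -> Dict[str, Any]:
    """Raise only when the ID is unknown; otherwise return what is checkpointed."""
    if workflow_id not in store:
        raise ValueError(f"No workflow with id {workflow_id!r}")
    record = store[workflow_id]
    return aggregate_workflow_metadata(
        record["status"], record["pre_run"], record.get("post_run")
    )


store = {
    "simple": {
        "status": "SUCCESSFUL",
        "pre_run": {"user_metadata": {"k2": "v2"}, "start_time": 1.0},
        "post_run": {"end_time": 2.0},
    },
    "still_running": {
        "status": "RUNNING",
        "pre_run": {"user_metadata": {}, "start_time": 5.0},
    },
}
print(lookup_workflow_metadata(store, "simple"))
print(lookup_workflow_metadata(store, "still_running"))
```

A running workflow still gets a result, just without `end_time`; only an unknown ID raises.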

## Related issue number
https://github.com/ray-project/ray/issues/17090
2021-10-25 14:52:51 -07:00
Alex Wu
58b28f04cd
[docs/usability] Apple Silicon support (#19705)
This PR puts the final touches on Apple Silicon support. There are 3 main caveats to supporting M1 Macs right now (described in the docs):

- Requires using forge.
- Requires special installation instructions to get grpc working (this is an underlying grpc issue, so ideally it will be fixed upstream).
- We're only publishing release wheels, not nightlies, right now.

This also includes a grpc import check so that a user who tries the regular `pip install ray` process gets an actionable error message explaining how to properly install grpcio.
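The grpc import check can be sketched as a guarded import that re-raises with an actionable hint. This is an illustration only: the helper name `import_or_explain` and the wording of the hint are assumptions, not the check or message Ray actually ships, and testing `platform.system()`/`platform.machine()` is just one plausible way to detect an M1 Mac.

```python
import importlib
import platform
from types import ModuleType


def import_or_explain(module_name: str = "grpc") -> ModuleType:
    """Import a module, re-raising ImportError with an actionable hint."""
    try:
        return importlib.import_module(module_name)
    except ImportError as e:
        message = f"Failed to import {module_name!r}."
        # With a native arm64 Python on an M1 Mac, platform.machine() is "arm64".
        if platform.system() == "Darwin" and platform.machine() == "arm64":
            message += (
                " On Apple Silicon, grpcio needs special installation steps;"
                " see the Apple Silicon section of the Ray installation docs."
            )
        raise ImportError(message) from e


json_module = import_or_explain("json")  # succeeds: returns the module
```

Chaining with `from e` keeps the original import failure visible in the traceback beneath the friendlier message.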
2021-10-25 14:49:28 -07:00
DK.Pino
e3ced0e59e
[Core] [Placement Group] Fix bundle reconstruction when raylet fails over after GCS failover (#19452)
* fixed

* lint

* add cxx ut

* fix comment

* Revert "fix comment"

This reverts commit 32ea2558166a7674d7efe2e0c0a66ea7409c7d99.

* fix comment
2021-10-25 14:15:36 -07:00
architkulkarni
2c64b2b0e8
[Doc] Move all contribution info to getting-involved.html and link to it from CONTRIBUTING.rst (#19571) 2021-10-25 14:23:23 -05:00
Eric Liang
6081cf870e
Try enabling event stats by default (#19650) 2021-10-25 12:19:34 -07:00
Eric Liang
27a5b546ad
Make ArrowRow less scary (#19686) 2021-10-25 12:18:42 -07:00
Jiajun Yao
e4542be0d1
[Java] Run java on mac with public ip (#19701) 2021-10-25 11:38:33 -07:00
Tao Wang
ff7d35d246
[Core]Add test case for cached named actor (#19510)
## Why are these changes needed?
Recently we found a bug in the named actor cache. It affected only our internal codebase, not the community version, and because no test case covered it, we did not know about it until a user reported it.

This adds an extra test to cover it.

Bug detail: we did not publish the actor's name when the actor died, so the cache kept mapping the name to the old actor handle. The owner of the actor could not observe this bug because the cache currently does not apply to the owner.
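The failure mode described above can be reproduced with a toy cache. In this sketch, a hypothetical `NamedActorCache` maps names to handles (plain strings here) and evicts an entry only when the death notification carries the actor's name; passing `None` models the unpublished name, which leaves the stale handle in place. The class and method names are illustrative and do not correspond to Ray's internal cache.

```python
from typing import Dict, Optional


class NamedActorCache:
    """Toy name -> actor handle cache (illustrative, not Ray's implementation)."""

    def __init__(self) -> None:
        self._handles: Dict[str, str] = {}

    def put(self, name: str, handle: str) -> None:
        self._handles[name] = handle

    def get(self, name: str) -> Optional[str]:
        return self._handles.get(name)

    def on_actor_dead(self, name: Optional[str]) -> None:
        # If the death notification omits the name, there is no key to
        # evict, so get() keeps returning the handle of the dead actor.
        if name is not None:
            self._handles.pop(name, None)


cache = NamedActorCache()
cache.put("counter", "handle-v1")
cache.on_actor_dead(None)          # name not published: stale entry survives
stale = cache.get("counter")
cache.on_actor_dead("counter")     # name published: entry is evicted
evicted = cache.get("counter")
cache.put("counter", "handle-v2")  # actor re-created under the same name
fresh = cache.get("counter")
```

A regression test in this spirit kills a named actor, re-creates it under the same name, and checks that a non-owner lookup returns the new handle.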
2021-10-25 11:37:41 -07:00
xwjiang2010
46266b15f0
[tune] Avoid looping through _live_trials twice in _get_next_trial. (#19596) 2021-10-25 19:26:55 +01:00
chenk008
b65aca9002
flush stdout/stderr to avoid empty log in docker start block (#19546) 2021-10-25 10:58:48 -07:00
architkulkarni
414910b7fc
[test] [runtime env] Add release test with Ray Client and local pip files (#19026) 2021-10-25 11:49:27 -05:00
architkulkarni
f101f7cc02
[runtime_env] Allow specifying runtime env in @ray.remote decorator with Ray Client (#19626) 2021-10-25 10:32:31 -05:00
Sven Mika
b213565783
[RLlib] Fix failing test cases: Soft-deprecate ModelV2.from_batch (in favor of ModelV2.__call__). (#19693) 2021-10-25 15:00:00 +02:00
Kai Fricke
6e455e59d8
[tune] Verbosely/gracefully handle empty experiment checkpoints (#19641) 2021-10-25 13:41:18 +01:00
Kai Fricke
0cfa267fde
[tune] Fix shim error message for scheduler (#19642) 2021-10-25 11:16:16 +01:00
gjoliver
89fbfc00f8
[RLlib] Some minor cleanups (buffer buffer_size -> capacity and others). (#19623) 2021-10-25 09:42:39 +02:00
roireshef
9b0352f363
[RLlib] Added LearningRateSchedule and EntropyCoeffSchedule to TF and Torch versions of A3C and PPO (#19276) 2021-10-25 09:39:35 +02:00
gjoliver
c3c42278e4
[RLlib] clean up all the SampleBatch['is_training'] deprecation warnings (#19652)
* [RLlib] clean up all the SampleBatch['is_training'] deprecation warnings.

* wip
2021-10-25 09:38:56 +02:00
Renos Zabounidis
41dd037ae9
[RLlib; Docs] Correcting documentation with respect to postprocess_trajectory (#19672)
postprocess_trajectory is referred to incorrectly in the rllib-environments documentation. When defining a custom policy, a user never directly modifies Policy.postprocess_trajectory, they define postprocess_fn, which is in turn called by postprocess_trajectory.
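The relationship described above is the template-method pattern: the framework owns `postprocess_trajectory`, and the user only supplies the function it calls. The sketch below shows that shape in plain Python without RLlib; `SketchPolicy` and `add_reward_to_go` are hypothetical, and a dict of lists stands in for RLlib's `SampleBatch`. In RLlib's policy templates of this era, such as `build_tf_policy`, the user function is passed as the `postprocess_fn` argument.

```python
from typing import Callable, Dict, List, Optional

Batch = Dict[str, List[float]]  # simplified stand-in for RLlib's SampleBatch


class SketchPolicy:
    def __init__(
        self,
        postprocess_fn: Optional[Callable[["SketchPolicy", Batch], Batch]] = None,
    ) -> None:
        self._postprocess_fn = postprocess_fn

    def postprocess_trajectory(self, sample_batch: Batch) -> Batch:
        # Framework-owned hook: users do not override this method; it
        # delegates to the user-supplied postprocess_fn when one is given.
        if self._postprocess_fn is None:
            return sample_batch
        return self._postprocess_fn(self, sample_batch)


def add_reward_to_go(policy: SketchPolicy, batch: Batch) -> Batch:
    """User-defined postprocess_fn: append undiscounted reward-to-go."""
    reward_to_go: List[float] = []
    running = 0.0
    for reward in reversed(batch["rewards"]):
        running += reward
        reward_to_go.append(running)
    return {**batch, "reward_to_go": reward_to_go[::-1]}


policy = SketchPolicy(postprocess_fn=add_reward_to_go)
processed = policy.postprocess_trajectory({"rewards": [1.0, 0.0, 2.0]})
```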
2021-10-25 09:37:58 +02:00
Jiajun Yao
f6a0165286
Add dependabot for data processing (#19682) 2021-10-24 20:49:43 -07:00
SangBin Cho
aa9eb6499c
[Test] skip pg restart test (#19670) 2021-10-24 16:53:29 -07:00
Philipp Moritz
22eef65134
[Windows] Suppress 'Windows fatal exception: access violation' (#19561)
* suppress 'Windows fatal exception: access violation'

* lint

* update

* Update python/ray/_private/log_monitor.py

Co-authored-by: Matti Picus <matti.picus@gmail.com>

* fix

* re-introduce mattip's fix again

* update

* .

Co-authored-by: Matti Picus <matti.picus@gmail.com>
Co-authored-by: Alex <alex@anyscale.com>
2021-10-24 11:23:23 -07:00
dependabot[bot]
0cd05403b0
Bump pillow from 7.2.0 to 8.3.2 in /doc (#18422)
Bumps [pillow](https://github.com/python-pillow/Pillow) from 7.2.0 to 8.3.2.
- [Release notes](https://github.com/python-pillow/Pillow/releases)
- [Changelog](https://github.com/python-pillow/Pillow/blob/master/CHANGES.rst)
- [Commits](https://github.com/python-pillow/Pillow/compare/7.2.0...8.3.2)

---
updated-dependencies:
- dependency-name: pillow
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-10-23 18:36:14 -07:00
dependabot[bot]
5ed1530170
[tune](deps): Bump starlette in /python/requirements/ml (#18691)
Bumps [starlette](https://github.com/encode/starlette) from 0.14.2 to 0.16.0.
- [Release notes](https://github.com/encode/starlette/releases)
- [Changelog](https://github.com/encode/starlette/blob/master/docs/release-notes.md)
- [Commits](https://github.com/encode/starlette/compare/0.14.2...0.16.0)

---
updated-dependencies:
- dependency-name: starlette
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-10-23 18:34:14 -07:00
dependabot[bot]
55ab8da3c8
[tune](deps): Bump accelerate in /python/requirements/ml (#19057)
Bumps [accelerate](https://github.com/huggingface/accelerate) from 0.3.0 to 0.5.1.
- [Release notes](https://github.com/huggingface/accelerate/releases)
- [Commits](https://github.com/huggingface/accelerate/compare/v0.3.0...v0.5.1)

---
updated-dependencies:
- dependency-name: accelerate
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-10-23 18:33:01 -07:00
dependabot[bot]
9201687b34
[tune](deps): Bump pytorch-lightning in /python/requirements/ml (#19059)
Bumps [pytorch-lightning](https://github.com/PyTorchLightning/pytorch-lightning) from 1.4.5 to 1.4.9.
- [Release notes](https://github.com/PyTorchLightning/pytorch-lightning/releases)
- [Changelog](https://github.com/PyTorchLightning/pytorch-lightning/blob/master/CHANGELOG.md)
- [Commits](https://github.com/PyTorchLightning/pytorch-lightning/compare/1.4.5...1.4.9)

---
updated-dependencies:
- dependency-name: pytorch-lightning
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-10-23 18:30:59 -07:00
dependabot[bot]
bea802cb80
[tune](deps): Bump wandb in /python/requirements/ml (#19646)
Bumps [wandb](https://github.com/wandb/client) from 0.10.29 to 0.12.5.
- [Release notes](https://github.com/wandb/client/releases)
- [Changelog](https://github.com/wandb/client/blob/master/CHANGELOG.md)
- [Commits](https://github.com/wandb/client/compare/v0.10.29...v0.12.5)

---
updated-dependencies:
- dependency-name: wandb
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-10-23 18:23:52 -07:00
Amog Kamsetty
97878162f4
[ActorGroup] Retry ActorGroup (#19658) 2021-10-23 16:37:29 -05:00
Edward Oakes
445fb0ee99
[runtime_env] Deflake test_runtime_env_working_dir.py (#19665) 2021-10-23 16:35:42 -05:00
Eric Liang
875d19f838
[data] Fix inconsistent naming of to_refs() methods, remove to_arrow() (#19620) 2021-10-23 12:20:23 -07:00
Jiao
e53fecfbd5
[jobs] Initial http jobs server on head node (#19657) 2021-10-23 12:48:16 -05:00
mwtian
d656b3a6d7
[Doc] Update instruction on starting Ray cluster for Ray client (#19653) 2021-10-22 19:14:07 -07:00
Philipp Moritz
dbd61b9e6b
[Dashboard] Include the dashboard in Windows wheels (#19575) 2021-10-22 17:57:36 -07:00
Jiajun Yao
a7b219fea1
[Core] Don't unpickle and run functions exported by other jobs (#19576) 2021-10-22 17:13:20 -07:00
Gagandeep Singh
358aa57474
Fixed usage of `cv_.wait_for` (#19582)
* Fixed usage of cv.wait_for

* Changed method to calculate remaining time out

* Modify timeout_ms -> remaining_timeout_ms
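The fix above follows a common pattern for waiting on a condition variable with a timeout: compute one deadline up front and pass only the remaining time to each wait, so spurious or unrelated wakeups do not restart the full timeout. The original code is C++ (`cv_.wait_for`); the sketch below shows the same idea with Python's `threading.Condition`, and the helper name `wait_until` is hypothetical. Python's own `Condition.wait_for(predicate, timeout)` already implements this loop internally.

```python
import threading
import time
from typing import Callable


def wait_until(
    cond: threading.Condition, predicate: Callable[[], bool], timeout_s: float
) -> bool:
    """Wait until predicate() is true, bounded by a single overall deadline."""
    deadline = time.monotonic() + timeout_s
    with cond:
        while not predicate():
            # Recompute the remaining budget on every wakeup instead of
            # reusing the original timeout, which could wait far too long.
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                return False
            cond.wait(timeout=remaining)
        return True


ready = []
cond = threading.Condition()


def mark_ready() -> None:
    time.sleep(0.05)
    with cond:
        ready.append(True)
        cond.notify_all()


worker = threading.Thread(target=mark_ready)
worker.start()
became_ready = wait_until(cond, lambda: bool(ready), timeout_s=2.0)
worker.join()
```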
2021-10-22 16:23:13 -07:00
Edward Oakes
b4673daac6
[ray client] Add test that ray.init doesn't require resources to connect (#19635) 2021-10-22 18:21:53 -05:00
Alex Wu
31d89be926
[Workflow] Basic event support (#19239)
* basics

* .

* .

* a test

* a test

* tests

* cleanup

* concepts page

* docs

* polish

* fix sleep

* fix yi things

* lint

* fix

* .

* .

* .

* fix?

* .

Co-authored-by: Alex Wu <alex@anyscale.com>
2021-10-22 15:27:33 -07:00
Edward Oakes
c9258aff0f
Revert "[ActorGroup] Add ActorGroup (#18960)" (#19655)
This reverts commit 4f05bac8fb.
2021-10-22 14:55:17 -07:00
shrekris-anyscale
cfae64ebe8
[multiprocessing] Modify Ray's map_async() to match Multiprocessing's map_async() behavior (#19403) 2021-10-22 16:31:34 -05:00
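For reference, this is the standard-library behavior that `ray.util.multiprocessing.Pool` is meant to mirror: `map_async` returns an `AsyncResult` immediately, and `callback` is called once with the complete result list rather than once per element. The commit title does not say which aspect of Ray's version differed, so treat this only as the target semantics. `ThreadPool` is used so the snippet runs without pickling; `multiprocessing.Pool.map_async` has the same signature and callback semantics.

```python
from multiprocessing.pool import ThreadPool


def square(x: int) -> int:
    return x * x


callback_calls = []
with ThreadPool(processes=2) as pool:
    async_result = pool.map_async(square, [1, 2, 3], callback=callback_calls.append)
    # get() blocks until every chunk has been processed.
    values = async_result.get(timeout=10)
```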
Gagandeep Singh
2f8da8f8c8
Bumped timeout due to slow test times in Windows (#19595) 2021-10-22 13:48:15 -07:00
Jiao
f0be4cb390
[jobs] Add job manager class for simple jobs python APIs (#19567) 2021-10-22 14:18:11 -05:00
Jiajun Yao
43b8f8e522
Revert "Revert "[Test] Fix flaky test_gpu test (#19524)" (#19562)" (#19643)
This reverts commit 7daf28f348.
2021-10-22 11:48:57 -07:00
Yi Cheng
48fb86a978
[core] Fix the spilling back failure in case of node missing (#19564)
## Why are these changes needed?
When Ray spills back a task, it checks through the GCS whether the target node exists, so there is a race condition and the raylet sometimes crashes because of it.

This PR filters out unavailable nodes when selecting a node.
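The fix can be sketched as filtering the candidate set before choosing a spillback target. The raylet's scheduler is C++; this Python sketch is illustrative only, and the function name, the `available_cpus` map, and the choose-the-most-free-CPUs policy are assumptions rather than Ray's actual scheduling logic.

```python
from typing import Dict, Optional, Set


def select_spillback_node(
    available_cpus: Dict[str, float],
    alive_nodes: Set[str],
    local_node: str,
) -> Optional[str]:
    """Pick a remote node to spill back to, skipping nodes that are gone."""
    # Filter out the local node and any node not currently known to be alive,
    # so a node removed between scheduling and spillback is never selected.
    eligible = {
        node: cpus
        for node, cpus in available_cpus.items()
        if node != local_node and node in alive_nodes and cpus > 0
    }
    if not eligible:
        return None
    return max(eligible, key=lambda node: eligible[node])


resources = {"local": 1.0, "node-a": 4.0, "node-b": 8.0}
# node-b has the most free CPUs but has just been removed from the cluster.
target = select_spillback_node(
    resources, alive_nodes={"local", "node-a"}, local_node="local"
)
```

Returning `None` when nothing is eligible lets the caller keep the task queued locally instead of crashing on a missing node.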

## Related issue number
#19438
2021-10-22 11:22:07 -07:00
mwtian
530f2d7c5e
[Pubsub] Wrap Redis-based publisher in GCS to allow incrementally switching to the GCS-based publisher (#19600)
## Why are these changes needed?
The most significant change of the PR is the `GcsPublisher` wrapper added to `src/ray/gcs/pubsub/gcs_pub_sub.h`. It forwards publishing to the underlying `GcsPubSub` (Redis-based) or `pubsub::Publisher` (GCS-based) depending on the migration status, so it allows incremental migration by channel.
   -  Since it was decided that we want to use typed ID and messages for GCS-based publishing, each member function of `GcsPublisher` accepts a typed message.

Most of the modified files are from migrating publishing logic in GCS to use `GcsPublisher` instead of `GcsPubSub`.

Later on, `GcsPublisher` member functions will be migrated to use GCS-based publishing.

This change should make no functionality difference. If this looks ok, a similar change would be made for subscribers in GCS client.
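The per-channel forwarding that `GcsPublisher` performs can be sketched in Python as a wrapper that owns both backends and routes each typed publish call according to a set of already-migrated channels. The real wrapper is C++ in `src/ray/gcs/pubsub/gcs_pub_sub.h`; the class names, channel strings, and the `migrated_channels` set below are hypothetical stand-ins for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List, Tuple


@dataclass
class RecordingBackend:
    """Stand-in for either GcsPubSub (Redis) or pubsub::Publisher (GCS)."""

    name: str
    published: List[Tuple[str, str, Any]] = field(default_factory=list)

    def publish(self, channel: str, key: str, message: Any) -> None:
        self.published.append((channel, key, message))


class PublisherWrapper:
    def __init__(
        self,
        redis_backend: RecordingBackend,
        gcs_backend: RecordingBackend,
        migrated_channels: Iterable[str],
    ) -> None:
        self._redis = redis_backend
        self._gcs = gcs_backend
        self._migrated = set(migrated_channels)

    def _backend_for(self, channel: str) -> RecordingBackend:
        # Channels move to the GCS-based publisher one at a time.
        return self._gcs if channel in self._migrated else self._redis

    # One method per message type, mirroring the typed member functions.
    def publish_actor(self, actor_id: str, actor_info: Dict[str, Any]) -> None:
        self._backend_for("ACTOR").publish("ACTOR", actor_id, actor_info)

    def publish_node_info(self, node_id: str, node_info: Dict[str, Any]) -> None:
        self._backend_for("NODE_INFO").publish("NODE_INFO", node_id, node_info)


redis_backend = RecordingBackend("redis")
gcs_backend = RecordingBackend("gcs")
publisher = PublisherWrapper(redis_backend, gcs_backend, migrated_channels={"ACTOR"})
publisher.publish_actor("actor-1", {"state": "ALIVE"})
publisher.publish_node_info("node-1", {"state": "ALIVE"})
```

Callers only see the wrapper, so moving a channel from Redis to the GCS changes one set entry rather than every call site.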

## Related issue number
2021-10-22 10:52:36 -07:00
Edward Oakes
0760fe869d
[runtime_env] Clean up working dir tests, add more test cases (#19597) 2021-10-22 12:35:27 -05:00