2174 commits

Author SHA1 Message Date
Archit Kulkarni
fa7a934bb9
[Doc] [Serve] Add note about relationship between serve autoscaler and ray autoscaler (#24414) 2022-05-03 13:54:19 -07:00
Eric Liang
d178645f18
[docs] Add documentation on how to handle read-only arrays and actor reprs (#24410) 2022-05-02 23:52:54 -07:00
Antoni Baum
cf1c5f2ccf
[docs] Restore external markdown stubs (#24357)
This PR introduces a modification to the external markdown logic in doc build to restore the original file content after build is finished. This ensures that the files are not accidentally committed.
2022-05-02 15:37:40 +01:00
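The restore-after-build behaviour described in this commit can be sketched with a context manager. This is a minimal illustration only, not the actual doc-build code: `restore_after` and the `stub.md` file are hypothetical names.

```python
import contextlib
import pathlib
import tempfile


@contextlib.contextmanager
def restore_after(paths):
    """Snapshot each file's text on entry and write it back on exit."""
    originals = {p: p.read_text() for p in paths}
    try:
        yield
    finally:
        # Runs even if the build step raises, so stubs are never left modified.
        for p, text in originals.items():
            p.write_text(text)


with tempfile.TemporaryDirectory() as tmp:
    stub = pathlib.Path(tmp) / "stub.md"  # hypothetical stub file
    stub.write_text("stub placeholder\n")
    with restore_after([stub]):
        # Stand-in for the build overwriting the stub with external content.
        stub.write_text("# downloaded external content\n")
        during_build = stub.read_text()
    after_build = stub.read_text()
```

Because the restore happens in a `finally` clause, the working tree ends up clean whether the build succeeds or fails, which is what keeps the generated content out of accidental commits.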
fede
9a6e0538ea
Pythonic assert for initialization (#24378) 2022-05-01 22:01:10 -07:00
Simon Mo
3378e1924e
[Serve] Rename input_schema to http_adapter and clarify it in doc (#24353) 2022-04-29 16:14:04 -07:00
Antoni Baum
ff0ced1a64
[AIR] HuggingFaceTrainer&Predictor implementation (#23876)
Implements HuggingFaceTrainer & HuggingFacePredictor.
2022-04-29 14:31:54 -07:00
Balaji Veeramani
2190f7ff25
[Datasets] Add SimpleTensorFlowDatasource (#24022)
This PR makes it easier to use TensorFlow datasets with Ray Datasets.
2022-04-29 12:15:30 -07:00
Shawn
43ed78f6fd
[Datasets] Integrate Mars-on-Ray with Datasets; improve docs and add tests (#23402)
Add Mars-on-Ray + Datasets integration; improve Mars-on-Ray docs and add tests.
2022-04-29 09:43:52 -07:00
Sven Mika
ba14f0a41b
[RLlib] PGTrainer config object class (PGConfig). (#24295) 2022-04-28 22:25:16 +02:00
Balaji Veeramani
2fdea6e24f
[Datasets] Add SimpleTorchDatasource (#23926)
It's difficult to use torchvision datasets with Ray ML. This PR makes it easier to use Torch datasets with Ray Data.
2022-04-28 11:56:45 -07:00
Dmitri Gekhtman
d68c1ecaf9
[kuberay] Test Ray client and update autoscaler image (#24195)
This PR adds KubeRay e2e testing for Ray client and updates the suggested autoscaler image to one running the merge commit of PR #23883.
2022-04-27 18:02:12 -07:00
Simon Mo
ee528957c7
[Serve][Doc] Update docs about input schema, and json_request adapter (#24191) 2022-04-27 14:51:07 -07:00
Max Pumperla
553c8a85b6
[docs] [serve] Extended Gradio notebook example for Ray Serve deployments (#23494) 2022-04-27 10:03:28 -07:00
Kai Fricke
61a9de732f
[docs/tune] Small fixes to tune-distributed for new restore modes (#24220)
We've updated restore modes, so we should reflect that in the docs.
2022-04-26 22:19:49 +01:00
Kai Fricke
c0ec20dc3a
[tune] Next deprecation cycle (#24076)
Rolling out next deprecation cycle:

- DeprecationWarnings that were `warnings.warn` or `logger.warn` before are now raised errors
- Raised Deprecation warnings are now removed
- Notably, this involves deprecating the TrialCheckpoint functionality and associated cloud tests
- Added annotations to deprecation warning for when to fully remove
2022-04-26 09:30:15 +01:00
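The first step of the cycle above (warnings becoming raised errors) might look roughly like the following. This is a sketch under assumptions: `deprecated_soft` and `deprecated_hard` are hypothetical helpers, not functions from Ray Tune.

```python
import warnings


def deprecated_soft(name):
    """Earlier stage: the old API still works but emits a DeprecationWarning."""
    warnings.warn(f"{name} is deprecated", DeprecationWarning, stacklevel=2)
    return "old behaviour"


def deprecated_hard(name):
    """Next stage: the same call raises instead of warning."""
    raise DeprecationWarning(f"{name} is deprecated and no longer works")


# Earlier stage: the call succeeds and the warning is recorded.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    soft_result = deprecated_soft("old_api")

# Next stage: the call fails loudly.
try:
    deprecated_hard("old_api")
    hard_raised = False
except DeprecationWarning:
    hard_raised = True
```

In the final stage of such a cycle the raising stub is deleted altogether, which corresponds to the "raised deprecation warnings are now removed" item.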
Amog Kamsetty
ae9c68e75f
[Train] Fully deprecate Ray SGD v1 (#24038)
Ray SGD v1 has been marked as a deprecated API for a while. This PR fully deprecates Ray SGD v1. An error is now raised on any attempt to import the ray.util.sgd package.

Closes #16435
2022-04-25 16:12:57 -07:00
matthewdeng
cc08c01ade
[ml] add more preprocessors (#23904)
Adding some more common preprocessors:
* MaxAbsScaler
* RobustScaler
* PowerTransformer
* Normalizer
* FeatureHasher
* Tokenizer
* HashingVectorizer
* CountVectorizer

API docs: https://ray--23904.org.readthedocs.build/en/23904/ray-air/getting-started.html

Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>
2022-04-25 21:12:59 +01:00
Chen Shen
cb8d216e62
[Doc][Ray collectives] fix example in the doc. #24162
The example is broken. This PR fixes it.
2022-04-25 11:20:51 -07:00
Brett Göhre
9e0a59d94a
[docs] search algorithm notebook examples (#23924)
Co-authored-by: brettskymind <brett@pathmind.com>
Co-authored-by: Max Pumperla <max.pumperla@googlemail.com>
2022-04-25 11:10:58 -07:00
Jeroen Bédorf
1263015931
[RLlib] Add support for writing env 'info' dicts to output datasets for TFPolicies (for TorchPolicies, these are part of the view-requirements by default and thus written either way). (#24041) 2022-04-25 11:17:50 +02:00
Chen Shen
1d981e0cf1
[doc] fix /cluster/config.html #23720
closes #23560
2022-04-22 10:13:12 -07:00
Kai Fricke
bb341eb1e4
Revert "Revert "[tune] Also interrupt training when SIGUSR1 received"" (#24101)
* Revert "Revert "[tune] Also interrupt training when SIGUSR1 received" (#24085)"

This reverts commit 00595653ed.

The failure on Windows has been addressed by registering the signal handler only if the signal is available.
2022-04-22 11:27:38 +01:00
Dmitri Gekhtman
8c5fe44542
[KubeRay] Fix autoscaling with GPUs and custom resources, with e2e tests (#23883)
- Closes #23874 by fixing a typo ("num_gpus" -> "num-gpus").
- Adds end-to-end test logic confirming the fix.
- Adds end-to-end test logic confirming autoscaling with custom resources works.
- Slightly refines developer instructions.
- Deflakes test logic a bit by allowing for the event that the head pod changes its identity as the Ray cluster starts up.
2022-04-21 14:54:37 -07:00
xwjiang2010
00595653ed
Revert "[tune] Also interrupt training when SIGUSR1 received" (#24085) 2022-04-21 13:27:34 -07:00
Zyiqin-Miranda
e4a66c0e2e
[doc] Add CloudWatch integration documentation (#22638)
This PR adds documentation for Ray CloudWatch integration.
2022-04-21 09:44:41 -07:00
Kai Fricke
f376dd8902
[tune] Also interrupt training when SIGUSR1 received (#24015)
Ray Tune currently gracefully stops training on SIGINT. However, the Ray core worker prevents SIGINT (and SIGTERM) from being processed by child tasks, which means that Ray Tune runs that are started in remote tasks (e.g. via Ray client) cannot be gracefully interrupted.

In k8s-based cloud tests that used the Ray client to kick off a Ray Tune run, this led to test flakiness, as the final experiment state could not be gracefully persisted to cloud storage.

This PR adds support for SIGUSR1 in addition to SIGINT to interrupt training gracefully.
2022-04-21 13:07:29 +01:00
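A graceful-interrupt handler that listens for both signals could be sketched as follows. This is an illustration, not Ray Tune's implementation: `StopFlag` and `install_handlers` are hypothetical names. The `getattr` guard reflects that SIGUSR1 does not exist on every platform (Windows, for example).

```python
import signal


class StopFlag:
    """Records that a graceful stop was requested."""

    def __init__(self):
        self.requested = False

    def handle(self, signum, frame):
        # Only set a flag; the training loop checks it and shuts down cleanly.
        self.requested = True


def install_handlers(flag, names=("SIGINT", "SIGUSR1")):
    """Register the handler for every listed signal the platform provides."""
    installed = []
    for name in names:
        sig = getattr(signal, name, None)
        if sig is None:  # signal not available on this platform
            continue
        signal.signal(sig, flag.handle)  # must be called from the main thread
        installed.append(name)
    return installed


flag = StopFlag()
installed = install_handlers(flag)
```

A training loop would then poll `flag.requested` between steps and persist its state before exiting, rather than being killed mid-step.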
Antoni Baum
9364ec39e4
[joblib] Make PoolActor's Ray options configurable (#24009)
Makes it possible to configure joblib/multiprocessing `PoolActor`s' Ray options for greater user control. Also adds some type hints.
2022-04-20 06:38:30 -07:00
Amog Kamsetty
7a3ccb93ee
[CI] Separate out banned words check from formatting script (#23998)
The recursive grep in the banned words check can get really messy when running locally depending on each person's directory structure or where the format script is being called from.

Moves the banned words check into its own script so that it is not called by default in ./format.sh. Also adds this to the documentation.
2022-04-19 13:30:37 -07:00
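A standalone check that takes an explicit root directory avoids depending on where it is invoked from. The sketch below is not the script added by this commit: `find_banned`, the suffix list, and the sample word `RLLib` are assumptions chosen for illustration.

```python
import pathlib
import tempfile


def find_banned(root, banned, suffixes=(".py", ".rst", ".md")):
    """Return (file name, line number, word) for each banned word under root."""
    hits = []
    for path in sorted(pathlib.Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        lines = path.read_text(errors="ignore").splitlines()
        for lineno, line in enumerate(lines, start=1):
            for word in banned:
                if word in line:  # case-sensitive match
                    hits.append((path.name, lineno, word))
    return hits


with tempfile.TemporaryDirectory() as tmp:
    doc = pathlib.Path(tmp) / "a.rst"
    doc.write_text("Use RLlib here.\nRLLib is the wrong spelling.\n")
    hits = find_banned(tmp, banned=["RLLib"])
```

Passing the root explicitly means the result is the same regardless of the caller's working directory or surrounding directory layout.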
Michael (Mike) Gelbart
7f3031f451
[docs] Fix links and add clarifications to docs contributing page (#23693)
In the [docs contributing page](https://docs.ray.io/en/master/ray-contribute/docs.html), the links to other docs pages point to master/ instead of latest/, which can be a bit confusing since this is not the live version of the docs that people will be used to seeing.

I added a couple additional clarifications and fixed a typo as well. I also mentioned the need for an image and linked to the image directory (though some subprojects have their own image directories as well, which I did not mention).
2022-04-19 17:47:16 +01:00
Edward Oakes
669b38a2d6
[serve] Make monitoring section top-level in the docs (#23919) 2022-04-18 14:46:41 -05:00
Clark Zinzow
395a1c9aa2
[Doc] Fix actor fault tolerance link. (#23972) 2022-04-18 11:49:53 -07:00
Chen Shen
cb02e2f713
[linter] fix broken link in rllib examples #23959
fix broken link in rllib examples
2022-04-17 19:34:38 -07:00
Akash Patel
8eb99428ce
remove unmaintained blist (#23957)
This PR removes the unused `blist` dep, which was causing issues on the `py310` upgrade path.
2022-04-17 16:06:04 -07:00
Jian Xiao
57f620bd05
[Datasets] Add missing public APIs to Datasets API docs (#23935) 2022-04-16 11:57:38 -07:00
Kai Fricke
bc558eb81d
[docs] Fix link to outdated ci/travis (#23917)
Currently linkcheck is broken because it points to an outdated URI from the recent ci/ folder refactoring.
2022-04-15 07:20:21 +01:00
Archit Kulkarni
0673bde594
[Doc] [runtime env] [Serve] Update serve pip runtime_env doc (#23792) 2022-04-14 15:11:14 -05:00
Siyuan (Ryans) Zhuang
85542c9911
Revert "Revert "[serialization] Enable debugging into pickle backend (#23854)"(#23877)" (#23878)
* Revert "Revert "[serialization] Enable debugging into pickle backend (#23854)" (#23877)"

This reverts commit 12f0dc1faf.

* fix
2022-04-14 11:07:54 -07:00
Kai Fricke
65d9a410f7
[ci] Clean up ci/ directory (refactor ci/travis) (#23866)
Clean up the ci/ directory. This means getting rid of the travis/ path completely and moving the files into sensible subdirectories.

Details:

- Moves everything under ci/travis into subdirectories, e.g. ci/build, ci/lint, etc.
- Minor adjustments to some scripts (variable renames)
- Removes the outdated (unused) asan tests
2022-04-13 18:11:30 +01:00
Kai Fricke
40d3a62aa1
[air/wip] Add batch predictor class (#23808)
What: This PR adds a generic BatchPredictor class that offers an interface to run batch inference on Ray datasets. It takes a Predictor class and a checkpoint as input, and provides a predict(dataset) method to run scalable scoring inference.

Why: Currently users have to implement scorers themselves. This is mostly boilerplate and prone to errors, so we should provide a simple solution instead.

Note that this predictor also implements the Predictor interface.
2022-04-13 08:58:08 +01:00
Jiajun Yao
95714cc281
Node affinity scheduling strategy (#23381)
Instead of relying on the node-ip custom resource for static task-to-node placement, this PR introduces an explicit NodeAffinitySchedulingStrategy with the following benefits:

1. Specify node using id instead of ip since ip may not be unique for each node.
2. Support soft constraint so the task can be tolerant to node failures.

After this PR, the node-ip custom resource can be deprecated.
2022-04-12 21:31:26 -07:00
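The difference between a hard and a soft node-affinity constraint can be modelled with a toy placement function. This is a conceptual sketch only and says nothing about how Ray's scheduler is implemented: `Affinity` and `pick_node` are hypothetical names, and the fallback choice is arbitrary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Affinity:
    node_id: str  # nodes are identified by id, not by IP
    soft: bool    # if True, tolerate the target node being unavailable


def pick_node(strategy, alive_nodes):
    """Return the node a task would be placed on in this toy model."""
    if strategy.node_id in alive_nodes:
        return strategy.node_id
    if strategy.soft and alive_nodes:
        # Soft constraint: fall back to some other live node.
        return sorted(alive_nodes)[0]
    raise RuntimeError(f"node {strategy.node_id} is unavailable")


alive = {"node-a", "node-b"}
preferred = pick_node(Affinity("node-b", soft=False), alive)
fallback = pick_node(Affinity("node-z", soft=True), alive)
try:
    pick_node(Affinity("node-z", soft=False), alive)
    hard_failed = False
except RuntimeError:
    hard_failed = True
```

With `soft=False` a missing node is an error, while `soft=True` lets the task run elsewhere, which is the tolerance to node failures the commit describes.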
Clark Zinzow
983ef1f2a7
[Datasets] Make from_numpy() more user-friendly. (#23871)
`ray.data.from_numpy()` currently expects to be given a list of ndarray futures, instead of handling concrete ndarrays, as expected (and as allowed by other `from_*` APIs, e.g. `from_pandas`). This PR renames the existing `from_numpy` API to `from_numpy_refs`, and exposes `ray.data.from_numpy`, which takes concrete ndarrays (not object references).
2022-04-12 18:37:59 -07:00
Jian Xiao
6d93e9f0f5
Cleanup the DatasetPipeline references in Getting Started; rename Exchanging to Accessing (#23786) 2022-04-12 17:10:14 -07:00
Clark Zinzow
12f0dc1faf
Revert "[serialization] Enable debugging into pickle backend (#23854)" (#23877)
This reverts commit ef7180365d.
2022-04-12 16:53:20 -07:00
Eric Liang
191c83305b
[minor] Fix minor spelling issue on actor task execution 2022-04-12 16:18:25 -07:00
Edward Oakes
de227ac407
[serve] Add component logger + basic access logging (#23558)
Adds a "component logger" to standardize logging across the HTTP proxy, controller, and deployment replicas.
2022-04-12 18:16:58 -05:00
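One way to standardize log lines across components is to attach the component name to every record. The sketch below uses only the standard `logging` module and is not Serve's actual logger: `get_component_logger`, the `demo.` logger prefix, and the format string are assumptions.

```python
import io
import logging


def get_component_logger(component, stream):
    """Return a logger whose records all carry the given component name."""
    logger = logging.getLogger(f"demo.{component}")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep output confined to our handler
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(levelname)s %(component)s - %(message)s")
    )
    logger.handlers = [handler]
    # LoggerAdapter injects the extra dict into every record it emits.
    return logging.LoggerAdapter(logger, {"component": component})


buf = io.StringIO()
proxy_log = get_component_logger("http_proxy", buf)
replica_log = get_component_logger("replica", buf)

proxy_log.info("GET /app 200")  # a basic access-log style line
replica_log.info("handled request")
output = buf.getvalue()
```

Every component shares one format, so lines from the proxy, controller, and replicas can be filtered by the component field.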
Siyuan (Ryans) Zhuang
ef7180365d
[serialization] Enable debugging into pickle backend (#23854)
* enable debugging cloudpickle
2022-04-12 13:48:35 -07:00
Antoni Baum
40646eecd4
[AIR] SklearnTrainer & Predictor interfaces (#23803)
Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>
2022-04-11 15:11:42 -07:00
Eric Liang
1ff874e8e8
[spelling] Add linter rule for mis-capitalizations of RLLib -> RLlib (#23817) 2022-04-10 16:12:53 -07:00
Eric Liang
858d607b19
[data] Fix small doc issues (#23813) 2022-04-09 12:09:08 -07:00
Amog Kamsetty
5a41fb18bd
[Docs] Automatically render latest ray_lightning docs (#23729)
Automatically pull the latest ray_lightning README to render on Ray docs. (#23505)

Depends on ray-project/ray_lightning#135
2022-04-08 16:57:23 -07:00