Updates TensorflowPredictor to use the new _predict_pandas API.
Also, as agreed offline, removes the extra configuration options from TensorflowPredictor (column selection, concatenation) in favor of doing this via a Preprocessor.
We added the `drop_columns()` API to Datasets in #26200, so this updates the documentation (doc/source/data/examples/nyc_taxi_basic_processing.ipynb) to use the new API. It also fixes some minor typos found while proofreading the Datasets documentation.
Uses the new AIR Train API for examples and tests.
The `Result` object gets a new attribute, `log_dir`, which points to the Trial's `logdir`, allowing users to access TensorBoard logs and artifacts of other loggers.
This PR only deals with "low hanging fruit": tests that need substantial rewriting and the Train user guide are not touched. Those will be updated in follow-up PRs.
Tests and examples that concern deprecated features or which are duplicated in AIR have been removed or disabled.
Requires https://github.com/ray-project/ray/pull/25943 to be merged in first
This PR unifies the semantics of some workflow APIs.
These workflow APIs act on workflow tasks, so they can block for a long time. We therefore provide both blocking and non-blocking versions of them: `xxx` for the blocking API and `xxx_async` for the non-blocking one.
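The blocking/non-blocking pairing can be sketched with the standard library alone. This is only an illustration of the naming convention, not the workflow implementation: `fetch_output`, `fetch_output_async`, and `_run_task` are hypothetical names, and a thread pool stands in for the workflow engine.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor

# Assumption: a thread pool stands in for whatever executes workflow tasks.
_executor = ThreadPoolExecutor(max_workers=2)


def _run_task(task_id: str) -> str:
    # Hypothetical long-running workflow task.
    time.sleep(0.05)
    return f"output of {task_id}"


def fetch_output_async(task_id: str) -> Future:
    """Non-blocking variant: returns a Future immediately."""
    return _executor.submit(_run_task, task_id)


def fetch_output(task_id: str) -> str:
    """Blocking variant: defined in terms of the async one, then waits."""
    return fetch_output_async(task_id).result()
```

Defining the blocking call on top of the `_async` one keeps the two variants from drifting apart in behavior.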
This is a simple example that shows how to do OCR with Ray Datasets. It includes:
- How to upload and download the dataset to and from S3
- How to run OCR on the dataset with tesseract
- How to use actors to keep around and re-use a spaCy context for doing NLP on the data
Co-authored-by: Clark Zinzow <clarkzinzow@gmail.com>
The existing docs didn't work for me and these updates did. 🤷♀️ I selectively pulled this stuff out of the CI (which ideally would just be runnable locally).
In Ray 2.0, we want to achieve high availability (HA) for the API server.
Currently, the Serve endpoints live on the head node.
This PR moves the Serve endpoints to the dashboard agents, so they become HA because the dashboard agent runs as multiple replicas.
This PR adds support for specifying an exception allowlist (List[Exception]) as the retry_exceptions argument, such that an application-level exception will only be retried if it is in the allowlist.
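The allowlist semantics can be illustrated in plain Python. This is a minimal sketch, not Ray's retry machinery: `run_with_retries` and `flaky` are hypothetical, and only the `retry_exceptions` name comes from the description above.

```python
from typing import Callable, Sequence, Type


def run_with_retries(
    fn: Callable[[], object],
    max_retries: int,
    retry_exceptions: Sequence[Type[Exception]],
):
    """Call fn, retrying only on exceptions that appear in the allowlist."""
    attempts = 0
    while True:
        try:
            return fn()
        except Exception as exc:
            retryable = isinstance(exc, tuple(retry_exceptions))
            # Anything outside the allowlist, or past the budget, propagates.
            if not retryable or attempts >= max_retries:
                raise
            attempts += 1


calls = {"n": 0}


def flaky():
    # Hypothetical task: fails twice with a transient error, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return "ok"


result = run_with_retries(flaky, max_retries=3, retry_exceptions=[ConnectionError])
```

With `retry_exceptions=[ValueError]` instead, the first `ConnectionError` would propagate immediately, since it is not in the allowlist.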
Update documentation to use `session.report`.
Next steps:
1. Update our internal callers to use `session.report`. Most importantly, CheckpointManager and DataParallelTrainer.
2. Update `get_trial_resources` to use PGF notions to incorporate the requirement of ResourceChangingScheduler. @Yard1
3. After 2 is done, change all `tune.get_trial_resources` to `session.get_trial_resources`
4. [internal implementation] remove special checkpoint handling logic from huggingface trainer. Optimize the flow for checkpoint conversion with `session.report`.
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
1. Update `DummyTrainer` to take `num_epochs` instead of `runtime_seconds`.
   1. Ray Train expects an equal number of calls to `train.report()` from every worker. Different workers may run at different speeds and terminate after different epoch numbers, which causes an error.
2. Add `generate_epochs` to support `DatasetPipeline` when `use_stream_api` is True.
3. Update `__main__` code to support testing different configurations.
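The motivation for switching from `runtime_seconds` to `num_epochs` can be seen with a little arithmetic. This is a sketch with made-up per-epoch timings, not the `DummyTrainer` code; `epochs_within_budget` is a hypothetical helper.

```python
def epochs_within_budget(seconds_per_epoch: float, runtime_seconds: float) -> int:
    """Full epochs (and hence report calls) a worker completes in a time budget."""
    return int(runtime_seconds // seconds_per_epoch)


# Assumption: two workers with different (hypothetical) speeds.
seconds_per_epoch = [0.5, 0.75]

# Time-based stopping: each worker reports a different number of times.
time_based = [epochs_within_budget(s, runtime_seconds=3.0) for s in seconds_per_epoch]

# Count-based stopping: every worker reports exactly num_epochs times.
num_epochs = 5
count_based = [num_epochs for _ in seconds_per_epoch]
```

Here `time_based` is `[6, 4]`, so the workers would disagree on the number of reports, while `count_based` is `[5, 5]`.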
This PR:
* Allows the user to set `keep_checkpoints_num` and `checkpoint_score_attr` in `RunConfig` using the `CheckpointStrategy` dataclass
* Adds two new fields to the `Result` object - `best_checkpoints` - a list of saved best checkpoints as determined by `CheckpointingConfig`.
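As a rough illustration of what the two settings mean, the sketch below keeps the top `keep_checkpoints_num` checkpoints ranked by the metric named in `checkpoint_score_attr`. It assumes higher scores are better and represents checkpoints as plain dicts; `keep_best` is a hypothetical helper, not how Ray tracks checkpoints internally.

```python
import heapq
from typing import Dict, List


def keep_best(
    checkpoints: List[Dict],
    keep_checkpoints_num: int,
    checkpoint_score_attr: str,
) -> List[Dict]:
    """Return the best-scoring checkpoints, best first (higher is better)."""
    return heapq.nlargest(
        keep_checkpoints_num,
        checkpoints,
        key=lambda ckpt: ckpt[checkpoint_score_attr],
    )


# Hypothetical reported checkpoints with an "accuracy" metric.
reported = [
    {"path": "ckpt_0", "accuracy": 0.71},
    {"path": "ckpt_1", "accuracy": 0.85},
    {"path": "ckpt_2", "accuracy": 0.80},
]
best = keep_best(reported, keep_checkpoints_num=2, checkpoint_score_attr="accuracy")
```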
As the integration logging callbacks are commonly used with AIR Trainers, they should be moved from the tune package to the air package. The old imports will still work, but raise a deprecation warning.
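One common way to keep an old import path working while warning about it is a module-level `__getattr__` (PEP 562). The sketch below shows that pattern in isolation. Whether this PR uses the same mechanism is an assumption, and `legacy_pkg`, `new_pkg`, and `LoggerCallback` are placeholder names.

```python
import types
import warnings


class LoggerCallback:
    """Stand-in for a callback class that now lives in the new package."""


def make_legacy_module(name: str, new_location: str, exports: dict) -> types.ModuleType:
    """Build a module that forwards attribute lookups and warns on each one."""
    legacy = types.ModuleType(name)

    def __getattr__(attr: str):
        if attr in exports:
            warnings.warn(
                f"Importing {attr} from {name} is deprecated; "
                f"use {new_location} instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            return exports[attr]
        raise AttributeError(f"module {name!r} has no attribute {attr!r}")

    # Stored in the module's __dict__, so it acts as a PEP 562 hook.
    legacy.__getattr__ = __getattr__
    return legacy


legacy_pkg = make_legacy_module(
    "legacy_pkg", "new_pkg", {"LoggerCallback": LoggerCallback}
)
```

Accessing `legacy_pkg.LoggerCallback` returns the same class as the new location and emits a `DeprecationWarning`.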
This PR:
* Adds a warning about a known issue to the KubeRay section of the Ray docs.
* Updates the description of the feature state of KubeRay integration.
* Adds some links to the KubeRay docs.
Currently, an unqualified `conda install` installs 1.44.0, whereas `ray` requires 1.43.0 in `pip install`. The instructions therefore cancel each other out, and you end up with an unusable installation because of missing symbols for `grpcio` on ARM.
Co-authored-by: Simon Mo <simon.mo@hey.com>
This PR renames the `suggest` package to `search` and alters the layout slightly.
In the new package, the higher-level abstractions are on the top level and the search algorithms have their own subdirectories.
In a future refactor, we can turn algorithms such as PBT into actual `SearchAlgorithm` classes and move them into the `search` package.
The main reason to keep algorithms and searchers in the same directory is to avoid user confusion - for a user, `Bayesopt` is as much a search algorithm as e.g. `PBT`, so it doesn't make sense to split them up.