We shouldn't promote Runtime Environments as the only way to do things until all Core nightly and release tests are run using runtime environments.
This PR adds the prior approach (using cluster launcher commands) to the doc on equal footing, describing the differences between the two.
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
This PR properly exposes `TableRow` as a public API (API docs + the "Public" tag), since it's already exposed to the user in our row-based ops. In addition, the following changes are made:
1. During row-based ops, we also choose a batch format that lines up with the current dataset format in order to eliminate unnecessary copies and type conversions.
2. `TableRow` now derives from `collections.abc.Mapping`, which lets `TableRow` better interop with code expecting a mapping, and includes a few helpful mixins so we only have to implement `__getitem__`, `__iter__`, and `__len__`.
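
As a rough illustration of the `Mapping` pattern described above (names and fields here are hypothetical, not the actual `TableRow` implementation), implementing only `__getitem__`, `__iter__`, and `__len__` is enough for the mixins to provide the rest of the dict-like interface:

```python
from collections.abc import Mapping


class Row(Mapping):
    """Minimal read-only, dict-like view over a single row (illustrative only)."""

    def __init__(self, columns):
        # columns: mapping of column name -> value for this row.
        self._columns = columns

    def __getitem__(self, key):
        return self._columns[key]

    def __iter__(self):
        return iter(self._columns)

    def __len__(self):
        return len(self._columns)


row = Row({"x": 1, "y": "a"})
# The Mapping mixins provide keys(), values(), items(), get(), __contains__, etc.
assert row["x"] == 1 and "y" in row and dict(row.items()) == {"x": 1, "y": "a"}
```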
This PR fixes our {NumPy, Pandas} <--> Arrow interop for boolean tensor columns. NumPy and Pandas represent boolean arrays with a byte per boolean, while Arrow bit-packs booleans with 8 booleans per byte. Previously, when casting NumPy arrays to tensor columns, we were interpreting NumPy's boolean array buffers as being bit-packed when they were not. This PR completes support by packing and unpacking bits for boolean arrays when creating a boolean tensor column from an ndarray and when creating an ndarray from a boolean tensor column, respectively.
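
As a rough sketch of the packing/unpacking involved (not the actual casting code), NumPy's `packbits`/`unpackbits` convert between the byte-per-boolean layout and an Arrow-style bit-packed layout:

```python
import numpy as np

# NumPy/Pandas layout: one byte per boolean.
bools = np.array([True, False, True, True, False], dtype=bool)

# Arrow-style layout: bit-packed, 8 booleans per byte (LSB-first bit order).
packed = np.packbits(bools, bitorder="little")

# Unpacking must be truncated to the original length, since packing pads the
# final byte out to 8 bits.
unpacked = np.unpackbits(packed, count=len(bools), bitorder="little").astype(bool)
assert np.array_equal(bools, unpacked)
```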
S3 reads sometimes fail (#22214). The file in question actually exists in S3, so the failure is most likely transient. This PR adds a retry mechanism to work around the issue.
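
A minimal sketch of this kind of retry wrapper (names, exception types, and backoff parameters are illustrative, not the exact implementation):

```python
import time


def call_with_retry(f, max_attempts=3, initial_backoff_s=1.0):
    """Retry a flaky call (e.g. an S3 read) with simple exponential backoff."""
    backoff = initial_backoff_s
    for attempt in range(1, max_attempts + 1):
        try:
            return f()
        except OSError:
            # Treat I/O errors as transient; re-raise once retries are exhausted.
            if attempt == max_attempts:
                raise
            time.sleep(backoff)
            backoff *= 2
```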
Helps resolve the issues users are facing when running Lightning examples with Ray Tune (PyTorchLightning/pytorch-lightning#10407).
Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com>
Updates to address @worldveil's feedback:
1. Include `import train.torch` in the docs.
2. Allow methods in `session.py` to be called outside of a session; instead of raising an error, they now return sensible defaults (a rough sketch follows).
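
A hedged sketch of the "sensible defaults" pattern (hypothetical names, not the actual `session.py` code): module-level accessors fall back to single-worker defaults when no session is active.

```python
# Module-level session handle; set while a training session is running.
_session = None


def world_rank() -> int:
    # Outside of a session, behave like a single-worker run instead of raising.
    return _session.world_rank if _session is not None else 0


def world_size() -> int:
    return _session.world_size if _session is not None else 1
```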
Co-authored-by: Balaji Veeramani <bveeramani@berkeley.edu>
`ray.data.read_text()` currently doesn't take care of empty lines; this PR adds a flag to enable an empty-line filter.
With this change, `read_text` returns only non-empty lines by default, unless `drop_empty_line` is set to `False`.
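
A usage sketch (the flag name follows the description above; the exact signature may differ, and `lines.txt` is a placeholder path):

```python
import ray

# Default: empty lines are dropped.
ds = ray.data.read_text("lines.txt")

# Keep empty lines by disabling the filter.
ds_with_blanks = ray.data.read_text("lines.txt", drop_empty_line=False)
```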
Co-authored-by: Eric Liang <ekhliang@gmail.com>
Co-authored-by: Jialin Liu <jialin.liu@bytedance.com>
Adding a minimal test suite to catch any regressions from accidentally adding backend imports (e.g. `torch`, `tensorflow`, `horovod`) to the main import path.
**Example:** If I'm running Ray Train with `tensorflow`, I should not be required to have `torch` installed.
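
A minimal sketch of such a regression test (illustrative, not the exact test suite added):

```python
import sys


def test_ray_train_import_pulls_in_no_backends():
    # Importing Ray Train itself should not eagerly import any backend framework.
    import ray.train  # noqa: F401

    for backend in ("torch", "tensorflow", "horovod"):
        assert backend not in sys.modules, f"{backend} was imported eagerly"
```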
This PR adds a comment to build_pipeline.py reminding anyone who makes changes to the test suites to also update the release process doc if necessary.
This is an action item from the Ray 1.10.0 release retrospective.
To use Jobs on a remote cluster, you need to set up port forwarding. When using the cluster launcher, the `ray dashboard` command provides this automatically. This PR adds a how-to to the docs for this feature.
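
As a hedged example of the workflow (the script name is a placeholder, and the SDK call assumes `ray dashboard cluster.yaml` has already forwarded the remote dashboard to the default port 8265):

```python
from ray.job_submission import JobSubmissionClient

# With `ray dashboard cluster.yaml` forwarding the remote dashboard to
# localhost:8265, the Jobs SDK can talk to the remote cluster locally.
client = JobSubmissionClient("http://127.0.0.1:8265")
job_id = client.submit_job(entrypoint="python my_script.py")
print(client.get_job_status(job_id))
```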
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
Adds a test to make sure a failed job runtime env creation doesn't hang the cluster (i.e., tasks can still be scheduled on the job as long as the tasks' runtime envs can be created). Test requested by @rkooo567, good idea!
After GCS restarts, metadata is loaded from Redis. Currently, the Redis callback returns a `const &`, which requires copying the loaded data. Changing it to `&&` and using `std::move` reduces the data copies.
Closes https://github.com/ray-project/ray/issues/22265
This was caused by implicitly inferring the namespace from within the HTTP proxy when calling `get_handle`. This makes me think we really need to simplify the namespace handling logic.
Currently, Serve deployments must store their class or function definitions in the `Deployment` object's `func_or_class` attribute. However, the declarative API must be able to initiate deployments using only their import path. This lets users define their functions or classes separately and pull them into their clusters via [remote URIs](https://docs.ray.io/en/releases-1.9.2/handling-dependencies.html#remote-uris). With this change, `Deployment` objects can store an import path string as their `func_or_class`. The import path is then used to import the deployment's code definition when the `Deployment`'s replica is created.
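
A minimal sketch of how a dotted import path string can be resolved to the underlying function or class (an illustrative helper, not Serve's actual resolution code):

```python
import importlib


def import_from_path(import_path: str):
    """Resolve a dotted path like 'my_pkg.my_module.MyDeployment' to an object."""
    module_name, _, attr_name = import_path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


# Example: resolve a callable from the standard library.
sqrt = import_from_path("math.sqrt")
assert sqrt(4) == 2.0
```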
When logs are not intended for the current driver, skip the warning about too many logs being generated and clear the log rate counters.
Ideally the log subscriber should only subscribe to logs from the current job and to system logs, but that change carries risk, so we can do it in another PR.
Code formatting is disabled in several modules with the explanation
> [The module] ignores yapf because yapf doesn't allow comments right after code blocks,
> but we put comments right after code blocks to prevent large white spaces
> in the documentation.
Since we no longer use YAPF, it may be possible to re-enable code formatting on
these modules. I've added "FIXME" comments requesting developers to check
whether code formatter appeasements are still necessary.
For consistency and safety, we set an explicit port of 6379 in all default and example configs for Ray on K8s.
The documentation is updated to recommend matching Ray versions in the operator and the Ray cluster.