Commit graph

14133 commits

Author SHA1 Message Date
Amog Kamsetty
ea6d53dbf3
[CI/AIR] Cleanup Deprecated ml_utils (#28278)
ray.util.ml_utils was deprecated in Ray 2.0. This PR does some final cleanup of our CI pipeline.

Signed-off-by: Amog Kamsetty <amogkamsetty@yahoo.com>
2022-09-06 08:39:21 +01:00
Philipp Moritz
2a0ff1b4d8
[docs] Document using a different separator for read_csv (#27850)
See discussion in https://github.com/ray-project/ray/issues/27738

Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
2022-09-05 16:47:22 -07:00
Yi Cheng
10e9422f8f
[core] Rename OVERRIDE_NODE_ID_FOR_TESTING to RAYLET_NODE_ID to make it a feature (#28275)
This PR renames OVERRIDE_NODE_ID_FOR_TESTING to RAYLET_NODE_ID and makes it a feature: a raylet can be started with a given raylet ID by setting the OS environment variable RAY_RAYLET_NODE_ID.
2022-09-05 14:22:06 -07:00
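As a rough illustration of the commit above, the sketch below prepares an environment in which `RAY_RAYLET_NODE_ID` is set before a raylet would be launched. Only the variable name comes from the commit message; the helper `env_with_node_id` and the 28-byte hex format of the ID are assumptions, and the code does not start Ray itself.

```python
import os
import secrets


def env_with_node_id(node_id_hex, base_env=None):
    """Return a copy of the environment with RAY_RAYLET_NODE_ID set.

    Hypothetical helper: only the variable name is taken from the commit
    message; how the raylet consumes it is not shown here.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["RAY_RAYLET_NODE_ID"] = node_id_hex
    return env


# Assumption: a node ID is written as the hex string of 28 random bytes.
example_node_id = secrets.token_hex(28)
example_env = env_with_node_id(example_node_id, base_env={"PATH": "/usr/bin"})
```

The resulting mapping could then be passed as `env=` to `subprocess.Popen` when launching the process that starts the raylet.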
XiaodongLv
63ab063997
Change_notes_in_setting_async_flag_for_python_actor_in_java (#28282)
Signed-off-by: lvxiaodong <lvxiaodong.lxd@antgroup.com>
2022-09-05 00:16:41 +08:00
Jialing He
ce70b8b96e
[Job Submission][refactor 2/N] introduce job agent (#28203) 2022-09-03 18:42:02 +08:00
XiaodongLv
a31be7cef1
[Ray][xlang]Setting async flag for Python actor in Java (#28149)
Setting the async flag for a Python actor from Java is important for us,
so we added an API named "PyActorCreator setAsync(boolean enabled)" to PyActorCreator.
To avoid misuse by users, we check the flag before the ActorCreationTask is executed.
2022-09-03 11:09:19 +08:00
shrekris-anyscale
3b7346ab50
[Runtime Environment] Parse special characters in private Git URIs (#28250) 2022-09-02 16:37:04 -07:00
Ricky Xu
8c0b0272ce
Make state API release tests stable (#28274)
Make state API release tests stable; they have been passing for the last few days.

Signed-off-by: rickyyx <rickyx@anyscale.com>
2022-09-02 13:43:49 -07:00
Dmitri Gekhtman
59be31d558
Update links. (#28269)
Signed-off-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>

This PR updates the quickstart configuration in the Ray docs to reflect the fixes from
ray-project/kuberay#529

To provide access to the fixed version, we update the link to point to KubeRay master rather than the 0.3.0 branch.
After the next KubeRay release (0.4.0), we can update these links to point to a fixed release version again.
2022-09-02 12:18:04 -07:00
Justin Yu
9cf5df2c81
Add ray.widgets to be linked in setup dev script (#27984) 2022-09-02 11:44:06 -07:00
Kai Fricke
57484b28cf
[ci/air] Only run examples that need credentials in branch builds (#28260) 2022-09-02 09:10:36 -07:00
Kai Fricke
5d31f2d4bc
[tune] Run SigOpt tests in CI (#28225) 2022-09-02 09:10:01 -07:00
kourosh hakhamaneshi
5779ee764d
[RLlib] Fix ope v_gain (#28136) 2022-09-02 08:27:05 -07:00
Kai Fricke
3590a86db0
[tune] Add timeout to retry_fn to catch hanging syncs (#28155)
Syncing sometimes hangs in pyarrow for unknown reasons. We should introduce a timeout for these syncing operations.

Signed-off-by: Kai Fricke <kai@anyscale.com>
2022-09-02 12:52:26 +01:00
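The idea of putting a timeout around a retried operation can be sketched as follows. This is a minimal stand-in, not Ray Tune's `retry_fn`: the name `retry_with_timeout` and its parameters are hypothetical, and a hung attempt is abandoned in a background thread rather than killed.

```python
import concurrent.futures
import time


def retry_with_timeout(fn, num_retries=3, timeout=5.0, sleep_time=0.0):
    """Call fn() until it succeeds, treating a hang as a failed attempt.

    Hypothetical helper, not Ray Tune's retry_fn. Returns True as soon as
    one attempt finishes within `timeout` seconds without raising, and
    False if every attempt either raised or timed out.
    """
    for _ in range(num_retries):
        # A fresh single-thread pool per attempt, so a hung call cannot
        # block the next attempt.
        pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
        future = pool.submit(fn)
        try:
            future.result(timeout=timeout)
            return True
        except concurrent.futures.TimeoutError:
            pass  # the attempt hung; fall through and retry
        except Exception:
            pass  # the attempt raised; fall through and retry
        finally:
            # Do not wait for a possibly hung worker thread.
            pool.shutdown(wait=False)
        time.sleep(sleep_time)
    return False
```

Because Python threads cannot be forcibly stopped, a call that never returns would still keep its thread alive; a process-based variant would be needed to reclaim it.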
Amog Kamsetty
8692eb6208
Update build-docker-images.py (#28249)
Upgrade to a more recent Ubuntu version based on user feedback and to match the version we use for testing in CI.

Signed-off-by: Amog Kamsetty <amogkamsetty@yahoo.com>
2022-09-02 10:30:11 +01:00
Kilian Lieret
77722b86fd
[AIR] Fix deprecated import of MLflowLoggerCallback (#28247)
Signed-off-by: Kilian Lieret <kilian.lieret@posteo.de>
2022-09-01 17:55:59 -07:00
Amog Kamsetty
b83f10dbde
[Docs] [Train] Update Train API reference and docs (#28192)
Signed-off-by: Amog Kamsetty <amogkamsetty@yahoo.com>

Adds back more Ray Train APIs to Ray Train docs.

Also makes updates to the user guide for better references.
2022-09-01 17:47:42 -07:00
Yi Cheng
118b76218a
[core] Disable some gcs_heartbeat_manager_test in Mac. (#28229)
This test is very unstable, and it seems hard to make it stable because it depends heavily on timing.
Fixing it on one platform makes it fail on another. Given that

- we did not have this test for a long time and things were OK
- we are switching the heartbeat to pull mode in the near future

this test does not seem important for now, so just disable it on the broken platform.
2022-09-01 15:11:13 -07:00
Simon Mo
2b732dd1be
[CI] Skip windows://python/ray/serve:test_air_integrations_gpu (#28243)
No GPU on Windows.

Signed-off-by: simon-mo <simon.mo@hey.com>
2022-09-01 12:08:04 -07:00
Ricky Xu
5e0cf74377
remove env (#28218)
Try not to set special flags for nightly test.

Signed-off-by: rickyyx <rickyx@anyscale.com>
2022-09-01 11:58:13 -07:00
zcin
4c970cc882
[serve] Visualize Deployment Graph with Gradio (#27897) 2022-09-01 10:46:15 -07:00
Antoni Baum
48898aa03d
[AIR][CI] Speed up HF CI by ~20% (#28208)
Speeds up HuggingFaceTrainer/Predictor tests in CI by ~20% by switching to a different GPT model. This is the same model the Hugging Face team uses for their own CI.

Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>
2022-09-01 18:18:10 +01:00
clarng
ac6d63e397
on (#28014)
Signed-off-by: Clarence Ng <clarence.wyng@gmail.com>
2022-08-31 22:30:41 -07:00
Philipp Moritz
1bba65705a
[doc] Convert custom datetime column when reading a CSV file (#27854)
Signed-off-by: Philipp Moritz <pcmoritz@gmail.com>
2022-08-31 21:25:28 -07:00
Yi Cheng
d0b879cdb1
[workflow] Change name in step to task_id (#28151)
We've deprecated the name options and now use task_id. This cleanup fixes everything that was left.
2022-08-31 20:27:32 -07:00
shrekris-anyscale
f747415d80
[Serve] [Doc] Restore documentation about host and port in Serve config (#28219) 2022-08-31 20:27:00 -07:00
Alan Guo
91cacd6214
Don't unfold first node in dashboard unless there is only one node in the cluster (#28108)
fixes #28107

Also moves the Host / Cmd Line column to be the first column so nodes and workers can be more easily distinguished.
2022-08-31 19:05:24 -07:00
Stephanie Wang
213e24cafd
[tests] Remove unnecessary sleep time from pipelined ingest tests #28182
Signed-off-by: Stephanie Wang <swang@cs.berkeley.edu>
2022-08-31 17:43:58 -07:00
Justin Yu
5cec2492bb
Fix tune resources example code (#28210)
The tune resources user guide contained broken code snippets. This PR fixes those, adds some extra clarifying comments, and improves the code style for readability.

Signed-off-by: Justin Yu <justinvyu@berkeley.edu>
2022-08-31 14:48:41 -07:00
Ricky Xu
ed2929185c
[Core][State Observability] Wait for all nodes in release test (#28190)
Release tests are failing in the Buildkite run, but they succeed reliably on manual retry.
We suspect this is because not all nodes are available when running with a large number of actors.
2022-08-31 13:52:19 -07:00
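Waiting for all nodes before starting a test can be sketched as a simple polling loop. The helper below is hypothetical and takes the node count from a caller-supplied function; in a Ray cluster that function might count the alive entries of `ray.nodes()`, but that wiring is an assumption and is not shown.

```python
import time


def wait_for_nodes(count_alive_nodes, expected, timeout_s=60.0, poll_s=1.0):
    """Poll until count_alive_nodes() >= expected or the timeout expires.

    Hypothetical helper. Returns True if enough nodes showed up in time,
    False otherwise.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if count_alive_nodes() >= expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

A test could call this first and fail fast with a clear message when it returns False, instead of failing later with a less obvious error.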
clarng
65fdd720f9
[core] memory monitor observability improvements: add metrics and log message (#27716)
Add more observability and record events when the raylet kills a task or actor due to memory usage going above threshold.
2022-08-31 13:50:40 -07:00
Artur Niederfahrenhorst
f420407b0d
[ML] Pin Pydantic <= 1.9.2 (#28205)
CI is red because of a dependency issue around dataclass_transform.

Signed-off-by: Artur Niederfahrenhorst <artur@anyscale.com>
Signed-off-by: Kai Fricke <kai@anyscale.com>
Co-authored-by: Kai Fricke <kai@anyscale.com>
2022-08-31 13:35:18 -07:00
xwjiang2010
958c22a0b0
[tune] Update GPU warning message in tune. (#28167)
Mention scaling config / with resources

Signed-off-by: xwjiang2010 <xwjiang2010@gmail.com>
Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>
2022-08-31 12:29:09 -07:00
Alex Wu
dc08ce55ee
Add autoscaler code owners (#28213)
We already had these on docs; it was a bit of an oversight not to add them to the autoscaler itself too.

Signed-off-by: Alex Wu <alex@anyscale.io>
2022-08-31 12:02:09 -07:00
Jiajun Yao
5e2437923d
[Core] Remove unused args for default_worker.py (#28177)
Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
2022-08-30 21:43:02 -07:00
Yi Cheng
4bff702e7b
[deflakey] Deflakey gcs_heartbeat_manager_test (#28142)
The heartbeat check runs every second, so it could happen in < 1s, which means it could happen very soon. This PR decreases the check period.
2022-08-30 15:26:57 -07:00
Peyton Murray
ffe12a5f10
[Tune] Add rich output for ray tune progress updates in notebooks (#26263)
These changes are part of a series intended to improve integration with notebooks. This PR modifies the tune progress status shown to the user if tuning is run from a notebook.

Previously, part of the trial progress was reported in an HTML table; now, all progress is displayed in an organized HTML template.

Signed-off-by: pdmurray <peynmurray@gmail.com>
2022-08-30 15:09:40 -07:00
Balaji Veeramani
dad98dcabd
[AIR] Add TorchCheckpoint.from_state_dict (#27970)
PyTorch recommends saving state dictionaries instead of modules, but we don't support any way to do this.

Signed-off-by: Balaji Veeramani <balaji@anyscale.com>
2022-08-30 13:05:30 -07:00
Antoni Baum
8a30606308
[AIR][Docs] Improve Hugging Face notebook example (#28121)
Improves the HF notebook by making use of preprocessors and adding a section on tuning. Brings it in line with the Ray Summit 2022 demo.

Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>
2022-08-30 12:36:41 -07:00
Antoni Baum
d7f712d202
[AIR] Split train dataset in HuggingFaceTrainer (#28170)
https://github.com/ray-project/ray/pull/25428 inadvertently turned off train dataset splitting for the `HuggingFaceTrainer`, which meant it wasn't actually running in a data parallel fashion. This PR fixes that.

Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>
2022-08-30 12:35:44 -07:00
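To make "data parallel" concrete: each worker should train on its own disjoint shard of the dataset rather than on the full dataset. The plain-Python sketch below shows the idea with a hypothetical `split_into_shards` helper; it is a stand-in for illustration and is not how `HuggingFaceTrainer` or Ray Datasets perform the split.

```python
def split_into_shards(rows, num_workers):
    """Round-robin split of rows into num_workers disjoint shards.

    Hypothetical helper for illustration only.
    """
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    # Worker i receives rows i, i + num_workers, i + 2 * num_workers, ...
    return [rows[i::num_workers] for i in range(num_workers)]


shards = split_into_shards(list(range(10)), 3)
```

Without splitting, every worker would see all ten rows, so adding workers would repeat the same work instead of dividing it.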
SangBin Cho
f74f155af4
Revert "Revert "Revert "[serve][xlang]Support deploying Python deploy… (#28153)
This starts breaking the Mac Java build with new errors; I think it is the same issue that made us revert this PR before.

…ment from Java. …" (#27945)"

This reverts commit af488e1.
2022-08-30 12:00:29 -07:00
Kai Fricke
42dc034503
[ci] Pin moto to >= 4.0.0, adjust API (#28099)
If this passes, it should be preferred over #28098.

Adjust moto setup to use new API.

Signed-off-by: Kai Fricke <kai@anyscale.com>
Co-authored-by: Sven Mika <svenmika1977@gmail.com>
2022-08-30 11:39:32 -07:00
Antoni Baum
13457dab03
[AIR] Fix HF checkpointing with same-node workers (#28154)
If we schedule multiple workers on the head node with HuggingFaceTrainer, a race condition can occur where they will begin moving the checkpoint files from their respective rank folders to one checkpoint folder, causing an exception. This PR fixes that and adds a test that would fail without this change.

Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>
2022-08-30 11:24:13 -07:00
Alex Wu
e643b75129
[release][ci] Update disk size on release tests (#28156)
The minimum size is 300GB

Signed-off-by: Alex Wu <alex@anyscale.io>
Signed-off-by: Kai Fricke <krfricke@users.noreply.github.com>
Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>
2022-08-30 09:29:11 -07:00
Ian Rodney
adf875b4ce
[Cleanup] Update Put error message (#28050)
We allow tasks to return ObjectRefs. I'm not sure when this support was added, but I think it has been there for quite a while.
2022-08-30 08:35:20 -07:00
Jiajun Yao
2c6a960733
Don't include script directory in sys.path if it's started via python -m (#28140)
Redo #28043

Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
2022-08-30 08:33:27 -07:00
Yi Cheng
4d91f516ca
[nightly] Add serve ha chaos test into nightly test. (#27413)
This PR adds a serve ha test. The flow of the tests is:

1. check the kube ray build
2. start ray service
3. warm up the cluster
4. start killing nodes
5. get the stats and make sure it's good
2022-08-29 16:55:36 -07:00
Ian Rodney
8934a8d32b
[Raylet][Cleanup] Remove Extra Indent & Fix Typo (#28073)
* Rename `is_existing` to `is_exiting`
* Redundant `if statement`. This is covered by: 

6bedaa5c87/python/ray/_raylet.pyx (L581)
2022-08-29 15:32:36 -07:00
shrekris-anyscale
a15442a510
[docs] Omit bash prompt (#28028)
Signed-off-by: Shreyas Krishnaswamy <shrekris@anyscale.com>
2022-08-29 14:10:02 -07:00
Amog Kamsetty
acc4903db1
[AIR/Serve] Auto-enable GPU Prediction (#26549)
Automatically enable GPU prediction for Predictors if num_gpus is set for the PredictorDeployment.

Signed-off-by: Amog Kamsetty <amogkamsetty@yahoo.com>
2022-08-29 13:47:56 -07:00
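The behavior described in the commit above can be sketched as a small decision function. The helper name `resolve_predictor_kwargs` and the option keys `num_gpus` and `use_gpu` are assumptions made for illustration; the actual logic inside `PredictorDeployment` may differ.

```python
def resolve_predictor_kwargs(ray_actor_options, predictor_kwargs):
    """Return predictor kwargs with GPU use enabled when GPUs are requested.

    Hypothetical helper: if the deployment requests any GPUs and the caller
    did not set `use_gpu` explicitly, default it to True.
    """
    kwargs = dict(predictor_kwargs)
    # Treat a missing or None value as "no GPUs requested".
    num_gpus = ray_actor_options.get("num_gpus") or 0
    if num_gpus > 0:
        kwargs.setdefault("use_gpu", True)
    return kwargs
```

Using `setdefault` keeps an explicit `use_gpu=False` from the caller intact, so the automatic behavior only fills in a missing choice.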