hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-04 09:31:43 -05:00

Author	SHA1	Message	Date
Amog Kamsetty	ea6d53dbf3	[CI/AIR] Cleanup Deprecated ml_utils (#28278) ray.util.ml_utils was deprecated in Ray 2.0. This PR does some final cleanup of our CI pipeline Signed-off-by: Amog Kamsetty <amogkamsetty@yahoo.com>	2022-09-06 08:39:21 +01:00
Philipp Moritz	2a0ff1b4d8	[docs] Document using a different separator for read_csv (#27850) See discussion in https://github.com/ray-project/ray/issues/27738 Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>	2022-09-05 16:47:22 -07:00
Yi Cheng	10e9422f8f	[core] Rename `OVERRIDE_NODE_ID_FOR_TESTING` to `RAYLET_NODE_ID` to make it a feature (#28275) This PR changed the OVERRIDE_NODE_ID_FOR_TESTING to RAYLET_NODE_ID so that this is a feature which can be used to start raylet with a given raylet id by setting os env RAY_RAYLET_NODE_ID.	2022-09-05 14:22:06 -07:00
XiaodongLv	63ab063997	Change_notes_in_setting_async_flag_for_python_actor_in_java (#28282) Signed-off-by: lvxiaodong <lvxiaodong.lxd@antgroup.com>	2022-09-05 00:16:41 +08:00
Jialing He	ce70b8b96e	[Job Submission][refactor 2/N] introduce job agent (#28203)	2022-09-03 18:42:02 +08:00
XiaodongLv	a31be7cef1	[Ray][xlang]Setting async flag for Python actor in Java (#28149) Setting the async flag for Python actors in Java is important for us, so we added an API named "PyActorCreator setAsync(boolean enabled)" based on PyActorCreator. To avoid misuse by users, we check the flag before the ActorCreationTask is executed.	2022-09-03 11:09:19 +08:00
shrekris-anyscale	3b7346ab50	[Runtime Environment] Parse special characters in private Git URIs (#28250)	2022-09-02 16:37:04 -07:00
Ricky Xu	8c0b0272ce	Make state API release tests stable (#28274) Make state API release tests stable - it has been passing in the last few days. Signed-off-by: rickyyx <rickyx@anyscale.com>	2022-09-02 13:43:49 -07:00
Dmitri Gekhtman	59be31d558	Update links. (#28269) Signed-off-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com> This PR updates the quickstart configuration in the Ray docs to reflect the fixes from ray-project/kuberay#529 To provide access to the fixed version, we update the link to point to KubeRay master rather than the 0.3.0 branch. After the next KubeRay release (0.4.0), we can update these links to point to a fixed release version again.	2022-09-02 12:18:04 -07:00
Justin Yu	9cf5df2c81	Add ray.widgets to be linked in setup dev script (#27984)	2022-09-02 11:44:06 -07:00
Kai Fricke	57484b28cf	[ci/air] Only run examples that need credentials in branch builds (#28260)	2022-09-02 09:10:36 -07:00
Kai Fricke	5d31f2d4bc	[tune] Run SigOpt tests in CI (#28225)	2022-09-02 09:10:01 -07:00
kourosh hakhamaneshi	5779ee764d	[RLlib] Fix ope v_gain (#28136)	2022-09-02 08:27:05 -07:00
Kai Fricke	3590a86db0	[tune] Add timeout to retry_fn to catch hanging syncs (#28155) Syncing sometimes hangs in pyarrow for unknown reasons. We should introduce a timeout for these syncing operations. Signed-off-by: Kai Fricke <kai@anyscale.com>	2022-09-02 12:52:26 +01:00
Amog Kamsetty	8692eb6208	Update build-docker-images.py (#28249) Upgrade to a more recent Ubuntu version based on user feedback and also matching the version we use for testing in CI. Signed-off-by: Amog Kamsetty <amogkamsetty@yahoo.com>	2022-09-02 10:30:11 +01:00
Kilian Lieret	77722b86fd	[AIR] Fix deprecated import of MLflowLoggerCallback (#28247) Signed-off-by: Kilian Lieret <kilian.lieret@posteo.de>	2022-09-01 17:55:59 -07:00
Amog Kamsetty	b83f10dbde	[Docs] [Train] Update Train API reference and docs (#28192) Signed-off-by: Amog Kamsetty amogkamsetty@yahoo.com Adds back more Ray Train APIs to Ray Train docs. Also makes updates to the user guide for better references.	2022-09-01 17:47:42 -07:00
Yi Cheng	118b76218a	[core] Disable some `gcs_heartbeat_manager_test` in Mac. (#28229) This test is very unstable and it seems hard to make it stable given that it highly depends on the time. Fixing it on one platform will fail it on another platform. Given that - we don't have this test for a long time and things are ok - we are replacing the heartbeat to pull mode in the near future This test seems not important for now, so just disable it on the broken platform.	2022-09-01 15:11:13 -07:00
Simon Mo	2b732dd1be	[CI] Skip windows://python/ray/serve:test_air_integrations_gpu (#28243) No GPU on Windows. Signed-off-by: simon-mo <simon.mo@hey.com>	2022-09-01 12:08:04 -07:00
Ricky Xu	5e0cf74377	remove env (#28218) Try not to set special flags for nightly test. Signed-off-by: rickyyx <rickyx@anyscale.com>	2022-09-01 11:58:13 -07:00
zcin	4c970cc882	[serve] Visualize Deployment Graph with Gradio (#27897)	2022-09-01 10:46:15 -07:00
Antoni Baum	48898aa03d	[AIR][CI] Speed up HF CI by ~20% (#28208) Speeds up HuggingFaceTrainer/Predictor tests in CI by around ~20% by switching to a different GPT model. This is the same model Hugging Face team uses for their own CI. Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>	2022-09-01 18:18:10 +01:00
clarng	ac6d63e397	on (#28014) Signed-off-by: Clarence Ng <clarence.wyng@gmail.com>	2022-08-31 22:30:41 -07:00
Philipp Moritz	1bba65705a	[doc] Convert custom datetime column when reading a CSV file (#27854) Signed-off-by: Philipp Moritz <pcmoritz@gmail.com>	2022-08-31 21:25:28 -07:00
Yi Cheng	d0b879cdb1	[workflow] Change name in step to task_id (#28151) We've deprecated the name options and use task_id. This is the cleanup to fix everything left.	2022-08-31 20:27:32 -07:00
shrekris-anyscale	f747415d80	[Serve] [Doc] Restore documentation about host and port in Serve config (#28219)	2022-08-31 20:27:00 -07:00
Alan Guo	91cacd6214	Don't unfold first node in dashboard unless there is only one node in the cluster (#28108) fixes #28107 Also moves the Host / Cmd Line column to be the first column so nodes and workers can be more easily distinguished.	2022-08-31 19:05:24 -07:00
Stephanie Wang	213e24cafd	[tests] Remove unnecessary sleep time from pipelined ingest tests #28182 Signed-off-by: Stephanie Wang <swang@cs.berkeley.edu>	2022-08-31 17:43:58 -07:00
Justin Yu	5cec2492bb	Fix tune resources example code (#28210) The tune resources user guide contained broken code snippets. This PR fixes those, adds some extra clarifying comments, and improves the code style for readability. Signed-off-by: Justin Yu <justinvyu@berkeley.edu>	2022-08-31 14:48:41 -07:00
Ricky Xu	ed2929185c	[Core][State Observability] Wait for all nodes in release test (#28190) Release tests are failing in buildkite run - however succeeds reliably in manual retry. Suspected it's because not all nodes available when running with large number of actors.	2022-08-31 13:52:19 -07:00
clarng	65fdd720f9	[core] memory monitor observability improvements: add metrics and log message (#27716) Add more observability and record events when the raylet kills a task or actor due to memory usage going above threshold.	2022-08-31 13:50:40 -07:00
Artur Niederfahrenhorst	f420407b0d	[ML] Pin Pydantic <= 1.9.2 (#28205) CI is red because of a dependency issue around dataclass_transform. Signed-off-by: Artur Niederfahrenhorst <artur@anyscale.com> Signed-off-by: Kai Fricke <kai@anyscale.com> Co-authored-by: Kai Fricke <kai@anyscale.com>	2022-08-31 13:35:18 -07:00
xwjiang2010	958c22a0b0	[tune] Update GPU warning message in tune. (#28167) Mention scaling config / with resources Signed-off-by: xwjiang2010 <xwjiang2010@gmail.com> Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>	2022-08-31 12:29:09 -07:00
Alex Wu	dc08ce55ee	Add autoscaler code owners (#28213) We already had these on docs, a bit of an oversight not adding this to the autoscaler itself too. Signed-off-by: Alex Wu <alex@anyscale.io> Signed-off-by: Alex Wu <alex@anyscale.io>	2022-08-31 12:02:09 -07:00
Jiajun Yao	5e2437923d	[Core] Remove unused args for default_worker.py (#28177) Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>	2022-08-30 21:43:02 -07:00
Yi Cheng	4bff702e7b	[deflakey] Deflakey gcs_heartbeat_manager_test (#28142) The heartbeat check runs every second, so it could happen < 1s, which means it could happen very soon. This PR decreases the check period.	2022-08-30 15:26:57 -07:00
Peyton Murray	ffe12a5f10	[Tune] Add rich output for ray tune progress updates in notebooks (#26263) These changes are part of a series intended to improve integration with notebooks. This PR modifies the tune progress status shown to the user if tuning is run from a notebook. Previously, part of the trial progress was reported in an HTML table; now, all progress is displayed in an organized HTML template. Signed-off-by: pdmurray <peynmurray@gmail.com>	2022-08-30 15:09:40 -07:00
Balaji Veeramani	dad98dcabd	[AIR] Add `TorchCheckpoint.from_state_dict` (#27970) PyTorch recommends saving state dictionaries instead of modules, but we don't support any way to do this. Signed-off-by: Balaji Veeramani balaji@anyscale.com	2022-08-30 13:05:30 -07:00
Antoni Baum	8a30606308	[AIR][Docs] Improve Hugging Face notebook example (#28121) Improves the HF notebook by making use of preprocessors and adding a section on tuning. Brings it in line with the Ray Summit 2022 demo. Signed-off-by: Antoni Baum antoni.baum@protonmail.com	2022-08-30 12:36:41 -07:00
Antoni Baum	d7f712d202	[AIR] Split train dataset in `HuggingFaceTrainer` (#28170) https://github.com/ray-project/ray/pull/25428 inadvertently turned off train dataset splitting for the `HuggingFaceTrainer`, which meant it wasn't actually running in a data parallel fashion. This PR fixes that. Signed-off-by: Antoni Baum antoni.baum@protonmail.com	2022-08-30 12:35:44 -07:00
SangBin Cho	f74f155af4	Revert "Revert "Revert "[serve][xlang]Support deploying Python deployment from Java. …" (#27945)" (#28153) This reverts commit `af488e1`. This starts breaking the Mac Java build with new errors; I think it is the same issue that caused us to revert this PR before.	2022-08-30 12:00:29 -07:00
Kai Fricke	42dc034503	[ci] Pin moto to >= 4.0.0, adjust API (#28099) If this passes, it should be preferred over #28098. Adjust moto setup to use new API. Signed-off-by: Kai Fricke <kai@anyscale.com> Co-authored-by: Sven Mika <svenmika1977@gmail.com>	2022-08-30 11:39:32 -07:00
Antoni Baum	13457dab03	[AIR] Fix HF checkpointing with same-node workers (#28154) If we schedule multiple workers on the head node with HuggingFaceTrainer, a race condition can occur where they will begin moving the checkpoint files from their respective rank folders to one checkpoint folder, causing an exception. This PR fixes that and adds a test that would fail without this change. Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>	2022-08-30 11:24:13 -07:00
Alex Wu	e643b75129	[release][ci] Update disk size on release tests (#28156) The minimum size is 300GB Signed-off-by: Alex Wu <alex@anyscale.io> Signed-off-by: Alex Wu <alex@anyscale.io> Signed-off-by: Kai Fricke <krfricke@users.noreply.github.com> Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>	2022-08-30 09:29:11 -07:00
Ian Rodney	adf875b4ce	[Cleanup] Update Put error message (#28050) We allow tasks to return ObjectRefs. I'm not sure when this support was added, but I think for quite a while.	2022-08-30 08:35:20 -07:00
Jiajun Yao	2c6a960733	Don't include script directory in sys.path if it's started via python -m (#28140) Redo #28043 Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>	2022-08-30 08:33:27 -07:00
Yi Cheng	4d91f516ca	[nightly] Add serve ha chaos test into nightly test. (#27413) This PR adds a serve ha test. The flow of the tests is: 1. check the kube ray build 2. start ray service 3. warm up the cluster 4. start killing nodes 5. get the stats and make sure it's good	2022-08-29 16:55:36 -07:00
Ian Rodney	8934a8d32b	[Raylet][Cleanup] Remove Extra Indent & Fix Typo (#28073) * Rename `is_existing` to `is_exiting` * Redundant `if statement`. This is covered by: `6bedaa5c87/python/ray/_raylet.pyx (L581)`	2022-08-29 15:32:36 -07:00
shrekris-anyscale	a15442a510	[docs] Omit bash prompt (#28028) Signed-off-by: Shreyas Krishnaswamy <shrekris@anyscale.com> Signed-off-by: Shreyas Krishnaswamy <shrekris@anyscale.com>	2022-08-29 14:10:02 -07:00
Amog Kamsetty	acc4903db1	[AIR/Serve] Auto-enable GPU Prediction (#26549) Automatically enable GPU prediction for Predictors if num_gpus is set for the PredictorDeployment. Signed-off-by: Amog Kamsetty <amogkamsetty@yahoo.com>	2022-08-29 13:47:56 -07:00

14133 commits