hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-06 10:31:39 -05:00

Author	SHA1	Message	Date
Dmitri Gekhtman	41af24c2ea	Head pod identity can change (#20566 )	2021-11-19 09:57:48 -08:00
Eric Liang	79911510d3	Raise better error message when workers are killed with SIGTERM in k8s (#20557 ) In k8s, sigterm almost always means the pod was killed due to memory limits. Raise a better error message there.	2021-11-19 09:36:37 -08:00
Chen Shen	f0e8d66a85	[Core][Refactor CoreWorker 1/n] move CoreWorkerOptions to its own file #19675 This is one of a series of PRs (#19675, #19677, #19678, #19679) to make CoreWorkerProcess thread-safe and the CoreWorker code easier to read. This PR moves CoreWorkerOptions out of core_worker.h, which makes the code easier to read. Next PR: #19677	2021-11-19 09:24:30 -08:00
Alex Wu	24f27203ba	[hotfix] Fix inference nightly test by upgrading numpy (#20546 ) The ray-ml image depends on numpy ~=1.19.2 via the tensorflow==2.6 requirement. Unfortunately that's incompatible with Dataset (see here #20258 (comment)). This PR upgrades the numpy dependency only for the nightly test.	2021-11-19 08:15:23 -08:00
shrekris-anyscale	b910d7e9e1	[runtime_env] Remove deprecated username-password GitHub use case from doc (#20558 )	2021-11-19 10:03:44 -06:00
Artur Niederfahrenhorst	d07e50e957	[RLlib] Replay buffer API (cleanups; docstrings; renames; move into `rllib/execution/buffers` dir) (#20552 )	2021-11-19 11:57:37 +01:00
gjoliver	18862f9f44	[RLlib] Add a comment in the doc string of `on_learn_on_batch` callback function. (#20456 )	2021-11-19 10:49:07 +01:00
Ameer Haj Ali	3c308667f1	[Tune] Fix checkpointing error message on K8s (#20559 ) This commit improves the error message to guide users to set up cloud checkpointing if trial checkpoint syncing failed.	2021-11-19 09:17:38 +00:00
Sven Mika	9d5c4a9d21	[RLlib] API reference pages: `rllib/env` package only. (#20486 )	2021-11-19 10:06:40 +01:00
Avnish Narayan	b6077a36d4	[RLlib; Pre-checks/better failure behavior]: Env Checker for Gym Environments (#20481 )	2021-11-19 09:41:03 +01:00
Alex Wu	88266a6fce	Revert "Revert "[Docs] More detailed M1 Mac installation instructions"" (#20549 ) Reverts ray-project/ray#20547	2021-11-18 20:18:37 -08:00
Eric Liang	65a8698e82	Raise the dataset block size limit to 2GiB (#20551 ) The default block size of 500MiB seems too low for some common workloads, e.g. shuffling 500GB. This creates 1000 blocks which means 1 million intermediate shuffle objects until we implement #20500.	2021-11-18 19:36:10 -08:00
Clark Zinzow	2d50bf1302	[Datasets] Bump NumPy version to >= 1.19.0 for Python 3.6. (#20542 ) Datasets groupby boundary sampling requires `numpy>=1.19.0` otherwise it fails to concatenate the Arrow table columns.	2021-11-18 17:33:06 -08:00
Clark Zinzow	462e389791	[Datasets] Fix empty Dataset.iter_batches() when trying to prefetch more blocks than exist in the dataset (#20480 ) Before this PR, `ds.iter_batches()` would yield no batches if `prefetch_blocks > ds.num_blocks()` was given, since the sliding window semantics were to return no windows if `window_size > len(iterable)`. This PR tweaks the sliding window implementation to always return at least one window, even if the one window is smaller than the given window size.	2021-11-18 17:02:54 -08:00
Simon Mo	add2450b92	[CI] [Hotfix] Skip test_standalone (#20556 )	2021-11-18 16:47:18 -08:00
Richard Liaw	c964455642	Revert "[Docs] More detailed M1 Mac installation instructions" (#20547 ) Reverts ray-project/ray#20512 due to lint errors.	2021-11-18 12:06:57 -08:00
Alex Wu	a811b2b6d7	[hotfix] Fix stress_test_many_tasks cluster environment (#20519 ) This should fix the long running release tests that are failing to build their app configs. It seems that `pip install ray[all]` now downgrades the Ray version. It's unclear why, but most likely a dependency has now pinned the Ray version. This PR explicitly installs the version of Ray that we want after the `pip install ray[all]` to fix the problem.	2021-11-18 11:51:46 -08:00
Amog Kamsetty	3f1092fb3d	[Release] Revert impala app config (#20397 )	2021-11-18 11:24:22 -08:00
Antoni Baum	0b14f38ac7	[tune] Multi-objective support for Optuna (#20489 ) This PR adds multi-objective support for Optuna searchers, including a test and example. Co-authored-by: gjoliver <jungong@anyscale.com>	2021-11-18 18:47:29 +00:00
Simon Mo	7143d5d494	[Serve] Bump timeout for test_standalone to fix windows (#20543 )	2021-11-18 10:00:23 -08:00
Alex Wu	540c9e35d1	[Docs] More detailed M1 Mac installation instructions (#20512 ) This PR adds more detail to the M1 Mac installation instructions following the bug bash.	2021-11-18 09:35:43 -08:00
Sven Mika	7a585fb275	[RLlib; Documentation] RLlib README overhaul. (#20249 )	2021-11-18 18:08:40 +01:00
Edward Oakes	d26c9e67e8	[job submission] Add a `message` to the JobStatus to return more detailed errors (#20491 )	2021-11-18 10:15:23 -06:00
shrekris-anyscale	a91ddbdeb9	Add `smart_open` dependency to `ray[default]` (#20420 )	2021-11-18 10:00:30 -06:00
Chen Shen	2012b469f6	fix gcs client hang (#20531 )	2021-11-18 07:28:15 -08:00
qicosmos	a49c1d5f55	[C++] Deprecated global named actor and global PGs. (#20468 ) This PR removes global named actors and global PGs. Related issue: #20460	2021-11-18 23:21:59 +08:00
Simon Mo	d7f208dea4	[Release] Make e2e.py link clickable on buildkite (#20436 ) Adds log formatting to output clickable links to buildkite console logs	2021-11-18 12:45:59 +00:00
SangBin Cho	140a180ebb	[xgboost] Fix flaky train_small test (#20529 ) XGBoost's train_small timed out because of a CPU borrowing feature related to placement groups. The root bug will be fixed in the coming weeks, but this PR makes the release test pass consistently by requesting 0 CPUs for the remote wrapper script.	2021-11-18 10:20:08 +00:00
shrekris-anyscale	65a023ef71	[runtime_env][docs] Add documentation on using remote URIs for runtime environments (#20352 )	2021-11-17 23:17:48 -06:00
Edward Oakes	eae523159f	[job submission] Prefix job ID with `raysubmit_` and pass `job_name` metadata (#20490 )	2021-11-17 21:48:22 -06:00
Amog Kamsetty	9796ae56d5	[Train][Data] Change usages of `iter_datasets` to `iter_epochs` (#20487 )	2021-11-17 18:05:51 -08:00
Gagandeep Singh	33b4245df2	Fix race condition when starting redis (#19836 ) Co-authored-by: Philipp Moritz <pcmoritz@gmail.com>	2021-11-17 17:43:35 -08:00
Simon Mo	c85e9e69b3	[Serve] Change multi_deployment_1k_noop_replica threshold (#20514 )	2021-11-17 17:25:54 -08:00
Yi Cheng	cbf5826040	[workflow] Fix workflow event doc typo (#20465 ) In the example, it says `after_checkpoint`, but this should be `event_checkpointed`	2021-11-17 16:18:20 -08:00
Amog Kamsetty	4cbcb11458	[Docker] Add commit as label (#20504 ) Adds the Ray commit sha as a label for the docker image.	2021-11-17 15:20:41 -08:00
Richard Liaw	1cadd61917	Fix horovod failing tests by pinning down (#20484 )	2021-11-17 13:54:25 -08:00
Sven Mika	56619b955e	[RLlib; Documentation] Some docstring cleanups; Rename RemoteVectorEnv into RemoteBaseEnv for clarity. (#20250 )	2021-11-17 21:40:16 +01:00
gjoliver	724a140795	[rllib] Make sure json can serialize result dict (#20439 ) We may have fields in the result dict that are or None. Make sure our results are json serializable.	2021-11-17 10:27:00 -08:00
xwjiang2010	03aec4e04a	[Tune] Remove `runner` argument in start_trial. (#20464 ) This internal legacy argument was not used by any code.	2021-11-17 16:59:57 +00:00
Alex Wu	d1c624901f	Add hiredis dependency on supported platforms (#20437 ) This PR adds the hiredis dependency for non-M1 machines and removes the `redis < 4.0` pin. Since hiredis doesn't have M1 Mac wheels yet, users there will see extra warning messages in their output if they use redis 4.0. Co-authored-by: Alex Wu <alex@anyscale.com>	2021-11-17 07:40:58 -08:00
Alex Wu	3d668768de	[docker] Upgrade numpy version (#20450 ) The change in #20374 was interpreted by Docker as a file redirect, not a "greater than" (strangely enough, differently from how bash interprets it locally). Co-authored-by: Alex <alex@anyscale.com>	2021-11-17 07:15:18 -08:00
Qing Wang	e01f14d7df	[DOC] Add namespace doc for Java part. (#20428 )	2021-11-17 23:02:47 +08:00
Devon Proctor	dba84546d9	[GCP] Filter GCP TPUs by cluster name, matching behavior for GCP compute nodes. (#20311 ) Ray currently does not filter GCP TPU nodes based on the cluster name, resulting in conflicts when multiple ray clusters are running on the same GCP account. This change updates the TPU behavior to match the GCP compute node behavior, i.e. filtering to TPU nodes for the current cluster.	2021-11-17 01:39:58 -08:00
Simon Mo	18d605fa7c	[Serve] Add experimental CLI for `serve deploy` (#20371 )	2021-11-16 20:22:09 -08:00
Larry	454db6902c	[Java] Add timeout parameter for Ray.get() API (#20282 ) Adds a timeout (in ms) parameter to the Java Ray.get() API. The API changes have been added to the docs ([Ray Core Walkthrough] -> [Fetching Results]). E.g.: `ObjectRef<Integer> objRef = Ray.put(1); objRef.get(1000);` and `Ray.get(Ray.task(MyRayApp::slowFunction).remote(), 3000);` Related issue: #20247	2021-11-17 11:02:17 +08:00
Simon Mo	2dc7a6c9f8	[CI] Pin manylinux image (#20451 )	2021-11-16 17:52:51 -08:00
Antoni Baum	20fc9f907d	[CI] Fix tune dashboard, increase timeout for `test_commands` (#20453 )	2021-11-16 17:52:17 -08:00
Avnish Narayan	dc17f0a241	Add error messages for missing tf and torch imports (#20205 ) Co-authored-by: Sven Mika <sven@anyscale.io> Co-authored-by: sven1977 <svenmika1977@gmail.com>	2021-11-16 16:30:53 -08:00
Simon Mo	5fccad4cc9	[Serve] Add experimental pipeline docs (#20292 )	2021-11-16 16:13:55 -08:00
Simon Mo	32a4f48aa2	[CI] Don't test tune dashboard (#20452 )	2021-11-16 15:07:56 -08:00
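
The sliding-window behavior described in commit 462e389791 (#20480) can be illustrated with a minimal sketch. This is not Ray's implementation: the function name `sliding_window`, its signature, and the choice to yield nothing for an empty input are assumptions made for illustration. It only shows the stated semantics, namely that a non-empty input produces at least one window even when the requested window size exceeds the number of items.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def sliding_window(iterable: Iterable[T], n: int) -> Iterator[Tuple[T, ...]]:
    """Yield overlapping windows of up to n items (hypothetical helper).

    If the input has fewer than n items, a single shorter window is
    yielded instead of no windows at all.
    """
    if n < 1:
        raise ValueError("window size must be at least 1")
    it = iter(iterable)
    window = deque(maxlen=n)
    # Fill the first window with up to n items.
    for _ in range(n):
        try:
            window.append(next(it))
        except StopIteration:
            break
    # Yield the first window even if it is shorter than n.
    if window:
        yield tuple(window)
    # Slide by one item at a time; the deque drops the oldest item.
    for item in it:
        window.append(item)
        yield tuple(window)
```

With this sketch, `list(sliding_window([1, 2], 5))` produces one short window rather than an empty list, which mirrors the case where `prefetch_blocks > ds.num_blocks()`.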

10474 commits