hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-06 10:31:39 -05:00

Author	SHA1	Message	Date
Kai Fricke	3a90804713	[Testing] Add RLlib release tests (#16651 )	2021-08-03 12:34:27 -04:00
Sven Mika	924f11cd45	[RLlib] Torch algos use now-framework-agnostic MultiGPUTrainOneStep execution op (~33% speedup for PPO-torch + GPU). (#17371 )	2021-08-03 11:35:49 -04:00
Sasha Sobol	5dbbaf7261	[autoscaler] Enforce per-node-type max workers (#17352 ) * Enforce per-node-type max workers * type annonation Co-authored-by: Ameer Haj Ali <ameerh@berkeley.edu> * cleanup. comments. type annotations * additional type annotation Co-authored-by: Ameer Haj Ali <ameerh@berkeley.edu> * additional cleanup. comments. type annotations * _get_nodes_needed_for_request_resources to use FrozenSet * comments * whitespace * [Placement Group] Fix resource index assignment between with bundle index and without bundle index pg (#17318) * [serve] Add Ray API stability annotations (#17295) * Support streaming output of runtime env setup to logger/driver (#17306) * [SGD] v2 prototype: ``WorkerGroup`` implementation (#17330) * wip * formatting * increase timeouts * address comments * comments * fix * address comments * Update python/ray/util/sgd/v2/worker_group.py Co-authored-by: Richard Liaw <rliaw@berkeley.edu> * Update python/ray/util/sgd/v2/worker_group.py Co-authored-by: Richard Liaw <rliaw@berkeley.edu> * address comments * formatting * fix * avoid race condition Co-authored-by: Richard Liaw <rliaw@berkeley.edu> * [RLlib] Discussion 3001: Fix comment on internal state shape (must be [B x S=state dim]). (#17341) * [autoscaler] GCP TPU VM autoscaler (#17278) * [Rllib] set self._allow_unknown_config (#17335) Co-authored-by: Sven Mika <sven@anyscale.io> * [RLlib] Discussion 2294: Custom vector env example and fix. 
(#16083) * [docs] Link broken in Tune's page (#17394) (#17407) * [Serve] Fix response_model for class based view routes as well (#17376) * [serve] Fix single deployment nightly test (#17368) * [RLlib] SAC tuple observation space fix (#17356) * Support schema on read for csv/json (#17354) * [RLlib] New and changed version of parametric actions cartpole example + small suggested update in policy_client.py (#15664) * [gcs] Fix GCS related issues: ByteSizeLong and redis connection (#17373) * [runtime_env] Gracefully fail tasks when an environment fails to be set up (#17249) * [docs] update docs with pip requirements (#17317) * removed nodes_to_keep. cleanup * formatting * +comment * treat max_workers=0 as 0 workers (as opposed to unlimited) * fix wrong comment * warning for inconsistent config * terminate nodes with no matching node type right away * quotes * special handling for head node when enforcing max_workers per type. tests. cleanup * cleanup comments and prints * comments * cleanup. removed special handling of head node. 
* adding an eplicit non-None check in schedule_node_termination * raise the exception Co-authored-by: Ameer Haj Ali <ameerh@berkeley.edu> Co-authored-by: Ameer Haj Ali <ameerh@berkeley.edu> Co-authored-by: DK.Pino <loushang.ls@antfin.com> Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com> Co-authored-by: Simon Mo <simon.mo@hey.com> Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com> Co-authored-by: Richard Liaw <rliaw@berkeley.edu> Co-authored-by: Sven Mika <sven@anyscale.io> Co-authored-by: Antoni Baum <antoni.baum@protonmail.com> Co-authored-by: Rohan138 <66227218+Rohan138@users.noreply.github.com> Co-authored-by: amavilla <takashi.tameshige.jj@hitachi.com> Co-authored-by: Jiao <sophchess@gmail.com> Co-authored-by: Julius Frost <33183774+juliusfrost@users.noreply.github.com> Co-authored-by: Eric Liang <ekhliang@gmail.com> Co-authored-by: kk-55 <63732956+kk-55@users.noreply.github.com> Co-authored-by: Yi Cheng <74173148+iycheng@users.noreply.github.com> Co-authored-by: matthewdeng <matt@anyscale.com>	2021-08-03 11:31:32 -04:00
Antoni Baum	c40555c82b	[tune] Add define-by-run support to `OptunaSearcher` (#17464 )	2021-08-03 16:11:58 +01:00
Kai Fricke	81d3d8705e	[tune] fix docs example for tune qloguniform (#17539 )	2021-08-03 14:48:22 +01:00
Ian Rodney	7b1c207be3	[Dashboard] Allow agent to bind to Wildcard address. (#17393 )	2021-08-03 02:03:19 -07:00
Antoni Baum	df2fce9ab6	[tune] Allow to pass searcher/scheduler string names to `tune.run` (#17517 )	2021-08-03 09:28:03 +01:00
Eric Liang	fbd3f11533	OBOD log source error properly	2021-08-02 20:57:01 -07:00
Jiao	b13892e82a	Add initial 1.5.0 benchmark (#17513 ) * add initial 1.5.0 benchmark * add more logs Co-authored-by: Jiao Dong <jiaodong@anyscale.com>	2021-08-02 18:19:02 -07:00
Eric Liang	f9552765cb	Avoid re-exporting same function repeatedly in dataset (#17522 )	2021-08-02 18:15:25 -07:00
SangBin Cho	f1ccadbb27	Skip flaky windows object spilling tests (#17510 )	2021-08-02 15:53:07 -07:00
matthewdeng	e89195bfb9	[SGD] add SGDv2 Trainer prototype implementation (#17440 ) * wip * formatting * increase timeouts * wip * address comments * comments * fix * address comments * Update python/ray/util/sgd/v2/worker_group.py Co-authored-by: Richard Liaw <rliaw@berkeley.edu> * Update python/ray/util/sgd/v2/worker_group.py Co-authored-by: Richard Liaw <rliaw@berkeley.edu> * address comments * formatting * fix * wip * finish * fix * formatting * remove reporting * split TorchBackend * fix tests * address comments * add file * more fixes * remove default value * update run method doc * add comment * minor doc fixes * lint * add args to BaseWorker.execute * address comments * remove extra parentheses * properly instantiate backend * fix some of the tests * fix torch setup * fix type hint * [SGD] add SGDv2 Trainer prototype implementation * add fashion mnist test * add HuggingFace example * format * formatting * address comment * address comments * update comment * Update python/ray/util/sgd/v2/examples/transformers/cluster.yaml Co-authored-by: Antoni Baum <antoni.baum@protonmail.com> * update huggingface transformers * update hugging face transformers * fix shutdown on worker failure * Update python/requirements/tune/requirements_tune.txt * Update python/requirements/tune/requirements_tune.txt * Update python/requirements/tune/requirements_tune.txt * Update python/requirements/tune/requirements_tune.txt * address comment and fix test Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com> Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com> Co-authored-by: Richard Liaw <rliaw@berkeley.edu> Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>	2021-08-02 15:27:42 -07:00
Sven Mika	8a844ff840	[RLlib] Issues: 17397, 17425, 16715, 17174. When on driver, Torch\|TFPolicy should not use `ray.get_gpu_ids()` (b/c no GPUs assigned by ray). (#17444 )	2021-08-02 17:29:59 -04:00
Alex Wu	af880378da	Lower threshold on scalability envelope many tasks (#17511 )	2021-08-02 11:50:08 -07:00
Eric Liang	748cbbb23d	[hotfix] Parquet S3 reads broken due to pyarrow.lib.ArrowInvalid: S3 subsystem not initialized (#17492 )	2021-08-02 11:48:48 -07:00
Ian Rodney	acde351cba	[GCP][GPU] Specify GPU name, not the full URL (#17409 ) * convert gpu_name to URL * update examples * comment about scheduling * fix node.py * add test	2021-08-02 11:01:24 -04:00
Qingyun Wu	7678503d84	[Tune][docs]Correct reference name to CFO example (#17503 )	2021-08-02 14:46:10 +01:00
Ian Rodney	b26ba7ba9e	[Dashboard] Allow Agent HTTP listening port to be specified. (#17392 )	2021-08-02 02:09:50 -07:00
Richard Liaw	ecc7cf4c5e	[sgd] v2 documentation draft (#17253 ) Co-authored-by: matthewdeng <matthew.j.deng@gmail.com> Co-authored-by: Matthew Deng <matthew.j.deng@gmail.com> Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com>	2021-08-02 01:47:14 -07:00
Eric Liang	e812691909	Support top-level tensor values in dataset (#17439 )	2021-08-01 22:45:21 -07:00
Lixin Wei	6f4c8ebdb2	[Core] Remove the GetActorInfo RPC for Current Actor When Creating Actors (#17334 )	2021-08-01 22:10:40 -07:00
Chen Shen	1b89fa8624	[object store refactor 2/n] More refactor on PlasmaAllocator, and add unit tests	2021-08-01 22:10:03 -07:00
Alex Wu	d9cd3800c7	Dataset speed up read (#17435 )	2021-08-01 18:03:46 -07:00
Ivorius	6703091cdc	[Docs] Update example-full.yaml for ulimits as supported by docker. (#17408 ) Co-authored-by: Richard Liaw <rliaw@berkeley.edu>	2021-08-01 01:36:16 -07:00
Chen Shen	96c69f8c77	[object store refactor 1/n] Introduce IAllocator and PlasmaAllocator (#17307 ) * initial commit * address comments	2021-07-30 19:08:20 -07:00
Stephanie Wang	c9a2046287	[core] Update error message for hanging `ray.get` (#17449 ) * Update error message * x	2021-07-30 17:57:10 -07:00
Jin Dong	7197b26a3c	Update streaming example to use wait	2021-07-30 16:14:30 -07:00
matthewdeng	3a1aed28b7	[torch] fix process group timedelta (#17468 )	2021-07-30 15:47:33 -07:00
SangBin Cho	9a696cc66a	Pin aioredis version (#17472 )	2021-07-30 12:04:14 -07:00
Jiao	d67c57007b	change placement group report size to 1k (#17216 ) Co-authored-by: Jiao Dong <jiaodong@anyscale.com>	2021-07-30 11:29:41 -07:00
Chen Shen	32803b53b0	Fix potential dead-lock (#17396 )	2021-07-30 11:28:49 -07:00
Alex Wu	9e79301d35	Split scalability envelope + smoke tests (#17455 ) * . * done? * done? * sang comments * . Co-authored-by: Alex Wu <alex@anyscale.com>	2021-07-30 10:20:19 -07:00
Kai Fricke	b0f00b1b4b	[default] pin aioredis < 2 (#17465 )	2021-07-30 17:57:17 +01:00
Eric Liang	cd13059691	[dataset] Implement random_shuffle() and split(equal=True) (#17448 )	2021-07-30 09:51:21 -07:00
Patrick Ames	131710f9f9	[autoscaler] Add support for EC2 launch templates. (#17236 )	2021-07-30 08:05:59 -07:00
Antoni Baum	23bdad01be	Fix XGBoost-Ray and LightGBM-Ray docs properly (#17433 )	2021-07-30 15:47:41 +01:00
wanxing	705248f4ee	[CoreWorker]Remove plasma_objects_only parameter (#17384 )	2021-07-30 14:48:36 +08:00
Qing Wang	b8baac3cb0	[Java] Filter error log for intentional system exit. (#17289 )	2021-07-30 13:17:38 +08:00
Chen Shen	d856abb70d	[Test] increase memory for 5000 partitions shuffle (#17429 )	2021-07-29 21:56:16 -07:00
matthewdeng	58c4fe727c	[SGD] TrainerV2 API interface (#17447 ) Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com>	2021-07-29 19:39:39 -07:00
Eric Liang	0373c54b3e	Add warning if get_gpu_ids() is called on the driver. (#17436 )	2021-07-29 19:39:22 -07:00
Siyuan (Ryans) Zhuang	17c25345d0	[Workflow] Virtual actor writer - Part 2 (#17336 ) * virtual actor writer pass step_type around simplify readonly actor return different thing for a virtual actor return state and output WorkflowExecutionResult simplify workflow execution initial virtual actor writer workflow_ref deeper integration resume a step of a workflow cache step output Support dynamic workflow ref * fix recovery tests * fix * fix get_output * better error message * pressure test * fix * verbose error message * verbose error message * fix get_cached_step issue * update tests * simplify readonly virtual actor * fix storage tests * workflow.resume returns state of an actor * fix verbose * fix comment * make it more clear by renaming * comment * test init error in virtual actor * update docs * update docs * update test_actor_manager/list_all * fix comment	2021-07-29 19:29:28 -07:00
Amog Kamsetty	ff04a923ea	[SGD] v2 prototype: `BackendExecutor` and `TorchBackend` implementation (#17357 ) * wip * formatting * increase timeouts * wip * address comments * comments * fix * address comments * Update python/ray/util/sgd/v2/worker_group.py Co-authored-by: Richard Liaw <rliaw@berkeley.edu> * Update python/ray/util/sgd/v2/worker_group.py Co-authored-by: Richard Liaw <rliaw@berkeley.edu> * address comments * formatting * fix * wip * finish * fix * formatting * remove reporting * split TorchBackend * fix tests * address comments * add file * more fixes * remove default value * update run method doc * add comment * minor doc fixes * lint * add args to BaseWorker.execute * address comments * remove extra parentheses * properly instantiate backend * fix some of the tests * fix torch setup * fix type hint Co-authored-by: Richard Liaw <rliaw@berkeley.edu> Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>	2021-07-29 14:38:44 -07:00
Kai Fricke	2ae6b944a2	[release tests] limit number of results fetched for alerting (#17430 )	2021-07-29 18:43:44 +01:00
Tao Wang	411c49746d	Remove deprecated HEARTBEAT table (#17405 ) * Remove deprecated HEARTBEAT table * incr by 1	2021-07-29 10:14:59 -07:00
Jiao	3dc49c0b79	[serve] Add multi deployment to serve nightly tests (#17411 ) Co-authored-by: Jiao Dong <jiaodong@anyscale.com>	2021-07-29 11:47:58 -05:00
Kai Fricke	44d209dd5f	[tune] re-enable tensorboardx without torch installed (#17403 )	2021-07-29 10:39:38 +01:00
kimikuri	93172b535f	[doc][sgd] Broken Link in SGD's page. (#17404 ) (#17423 )	2021-07-29 01:13:23 -07:00
xwjiang2010	93d12b1b5e	[CLI] Fix `ray submit` when --stop is supplied. (#17385 ) * [cli] Fix `ray submit` when --stop is supplied. * syntax sugar.	2021-07-29 00:01:25 -07:00
Eric Liang	7ed62ea0ad	Initial implementation of Dataset pipelining and docs (#17309 )	2021-07-28 21:12:01 -07:00

9163 commits