hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-06 10:31:39 -05:00

Author	SHA1	Message	Date
Will Drevo	fa878e2d4d	Added example to user guide for cloud checkpointing (#20045) Co-authored-by: will <will@anyscale.com> Co-authored-by: Antoni Baum <antoni.baum@protonmail.com> Co-authored-by: Kai Fricke <kai@anyscale.com>	2021-11-15 15:43:06 +00:00
matthewdeng	4674c78050	[Train] Rename Ray SGD v2 to Ray Train (#19436)	2021-10-18 22:27:46 -07:00
Amog Kamsetty	f6f2435b91	[SGD] Sgd v2 Dataset Integration (#17626) * wip * wip * wip * draft * disable tf autosharding * wip * wip * wip * wip * add example * wip * wip * wip * use dataset.split * add unit tests * add linear example * concatenate tensors and fix example * WIP tune example * add tensorflow example * wip * random_shuffle_each_window * fault tolerance test * GPU, examples, CI * formatting * fix * Update python/ray/util/sgd/v2/tests/test_trainer.py Co-authored-by: matthewdeng <matthew.j.deng@gmail.com> * wip * type hints * wip * update user guide * fix * fix immediate issues * update example * update * fix tune gpu test * fix resources for smoke test - 1 CPU for dataset tasks * update tests, docs, examples * Apply suggestions from code review Co-authored-by: Clark Zinzow <clarkzinzow@gmail.com> * address comments * add warning * fix tests * minor doc updates * update example in doc * configure tests * Update doc/source/raysgd/v2/user_guide.rst Co-authored-by: Clark Zinzow <clarkzinzow@gmail.com> * Update python/ray/data/dataset.py Co-authored-by: matthewdeng <matthew.j.deng@gmail.com> * fix docstring Co-authored-by: Matthew Deng <matthew.j.deng@gmail.com> Co-authored-by: matthewdeng <matt@anyscale.com> Co-authored-by: Clark Zinzow <clarkzinzow@gmail.com>	2021-10-12 14:03:10 -07:00
Amog Kamsetty	db0483a29a	[SGD] SGD Namespace Consistency (#19048) * wip * update * add callbacks * fix * fix * update * add * address comments	2021-10-05 15:56:42 -07:00
Eric Liang	032a420ee6	Rename Dataset.pipeline to Dataset.window (#19050)	2021-10-01 19:55:29 -07:00
Amog Kamsetty	98ac3f601c	[SGD] v1 to v2 Migration Guide (#18887) * wip * add guide * fix test * address comments * add to docs * fix * remove markdown * add warning to all pages * formatting * fix * links * Update doc/source/raysgd/v2/migration-guide.rst Co-authored-by: matthewdeng <matthew.j.deng@gmail.com> * Update doc/source/raysgd/v2/migration-guide.rst Co-authored-by: matthewdeng <matthew.j.deng@gmail.com> * Update doc/source/raysgd/v2/migration-guide.rst Co-authored-by: matthewdeng <matthew.j.deng@gmail.com> * Update doc/source/raysgd/v2/migration-guide.rst Co-authored-by: matthewdeng <matthew.j.deng@gmail.com> * Update doc/source/raysgd/v2/migration-guide.rst Co-authored-by: matthewdeng <matthew.j.deng@gmail.com> * address comments * address comments * fix * address comments Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>	2021-09-30 09:15:21 -07:00
matthewdeng	d2caa00be8	[SGD] add SGDv2 survey link to docs (#18934)	2021-09-27 19:15:37 -07:00
Antoni Baum	72cc0c9bda	[SGDv2] Add Tune-Cifar-PyTorch-PBT example (#18860) * [SGDv2] Add Tune-Cifar-PyTorch-PBT example * Update python/ray/util/sgd/v2/BUILD * Lint * Update example * Update docs	2021-09-27 09:22:40 -07:00
Amog Kamsetty	99b1d8c95f	[SGD] Update Docs (#18839)	2021-09-23 07:52:57 -07:00
Amog Kamsetty	d354161528	[SGD] Link `ray.sgd` namespace to `ray.util.sgd.v2` (#18732) * wip * add symlink * update * remove from init * no require tune * try fix * change * * import * fix docs * address comment	2021-09-22 18:49:41 -07:00
Amog Kamsetty	00dd190df9	[SGD] Retry `sgd.local_rank()` (#18824) * finish * fix * wip * address comment * update * fix test * fix failing test * address comments * fix test * fix	2021-09-22 15:48:38 -07:00
Amog Kamsetty	d9b166252b	Revert "[SGD] `sgd.local_rank`" (#18822)	2021-09-22 13:50:00 -07:00
Amog Kamsetty	39bcbe03bc	[SGD] `sgd.local_rank` (#18686) * finish * fix * wip * address comment * update * fix test * fix failing test * address comments * fix test	2021-09-22 08:10:49 -07:00
matthewdeng	380a653787	[SGD] update SGDv2 user guide docs (#18270) * [SGD] update SGDv2 user guide docs * Update doc/source/raysgd/v2/user_guide.rst Co-authored-by: Antoni Baum <antoni.baum@protonmail.com> * add new line * update docs * fix header line length * lint * lint * lint * lint * fix remaining lint issues * Update doc/source/raysgd/v2/user_guide.rst Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com> * Update doc/source/raysgd/v2/user_guide.rst Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com> * address comments * address comments * add TODO for iterator API * Update doc/source/raysgd/v2/user_guide.rst Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com> * address comments * address comments * add tune doc * restructure table of contents * add examples; rename example files to include example suffix * add quick start, porting code * address comments Co-authored-by: Antoni Baum <antoni.baum@protonmail.com> Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>	2021-09-14 09:07:25 -07:00
Amog Kamsetty	3b77840c1b	PyTorch Lightning Updates (#17876)	2021-08-27 23:15:51 -07:00
Richard Liaw	ecc7cf4c5e	[sgd] v2 documentation draft (#17253) Co-authored-by: matthewdeng <matthew.j.deng@gmail.com> Co-authored-by: Matthew Deng <matthew.j.deng@gmail.com> Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com>	2021-08-02 01:47:14 -07:00
kimikuri	93172b535f	[doc][sgd] Broken Link in SGD's page. (#17404) (#17423)	2021-07-29 01:13:23 -07:00
Eric Liang	38bddc3f2b	First cut at dataset documentation (#16956)	2021-07-14 23:27:13 -07:00
Antoni Baum	2fb10e6730	[SGD] Add support for native Torch AMP in SGD (#16382) * SGD native AMP initial commit * SGD native amp second pass * Update docs * Update TorchTrainer doc * Temp fix release test * Update release/sgd_tests/sgd_gpu/sgd_gpu_app_config.yaml Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>	2021-06-15 17:48:21 -07:00
YeahNew	9a93dd9682	Adding a RaySGD and DGL ( Deep Graph Library) integration example(gat… (#15718) * Adding a RaySGD and DGL ( Deep Graph Library) integration example(gat_dgl.py) * Update gat_dgl.py * Update gat_dgl.py * Update gat_dgl.py * the gat_dgl.py has been formated by the format.sh script * delet useless code in the gat_dgl.py * add 'import numpy as np', modified the output form of accuracy in the validate method * Modified the code for better readability and added the README.md file * Update README.md * Update README.md * Update README.md * updates * formatting Co-authored-by: YeahNew <1650996069@qq.com> Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com>	2021-05-20 08:47:19 -07:00
Richard Liaw	6c77aeb98a	[docs] ray slack remove banners (#13898) Signed-off-by: Richard Liaw <rliaw@berkeley.edu>	2021-02-04 01:14:34 -08:00
Simon Mo	8e0a2f669b	[Doc] Remove trailing whitespaces (#13390)	2021-01-12 20:35:38 -08:00
Amog Kamsetty	8a406e1f9a	[SGD] Add PTL Docs (#12440) Co-authored-by: Richard Liaw <rliaw@berkeley.edu>	2020-11-28 10:09:38 -08:00
Amog Kamsetty	92718de40c	[SGD] Better support for custom DDP (#11771)	2020-11-04 13:58:51 -08:00
Amog Kamsetty	d87c186721	[RaySGD] Docs for SGD+Tune usage (#11479)	2020-10-22 13:32:27 -07:00
Amog Kamsetty	d5a7c53908	[Ray SGD] use_local flag + Worker group abstraction (#10539) Co-authored-by: Richard Liaw <rliaw@berkeley.edu>	2020-09-15 11:58:57 -07:00
Amog Kamsetty	415be78cc0	[RaySGD] Simplify Builder Process (#10321) Co-authored-by: Richard Liaw <rliaw@berkeley.edu>	2020-09-08 15:19:40 -07:00
Richard Liaw	3f98a8bfcb	[docs] Fix warnings for sphinx 1.8 (#10476) * fix-build-for-sphinx18 * jnilit	2020-09-01 13:37:35 -07:00
Amog Kamsetty	9ff687c093	[SGD][Docs] docs for training/ validation results (#10181)	2020-08-19 17:22:28 -07:00
Richard Liaw	0c3b9ebeef	[tune/sgd] Document func_trainable and add checkpoint context (#9739) Co-authored-by: krfricke <krfricke@users.noreply.github.com> Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>	2020-07-30 09:46:37 -07:00
Richard Liaw	56d934bc18	[docs] Revised Cluster documentation (#9062) Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>	2020-06-26 09:29:22 -07:00
Alex Wu	dcf58a43dc	[SGD] Dataset API (#7839)	2020-06-01 15:48:15 -07:00
Bill Chambers	b3d686b78f	[docs] Add Overview Section & Gentle Introduction (#8517)	2020-05-26 10:39:34 -05:00
Eric Liang	eabb801a40	less important (#8439)	2020-05-13 22:52:38 -07:00
Richard Liaw	857e4dba2f	[sgd] HuggingFace GLUE Fine-tuning Example (#7792) * Init fp16 * fp16 and schedulers * scheduler linking and fp16 * to fp16 * loss scaling and documentation * more documentation * add tests, refactor config * moredocs * more docs * fix logo, add test mode, add fp16 flag * fix tests * fix scheduler * fix apex * improve safety * fix tests * fix tests * remove pin memory default * rm * fix * Update doc/examples/doc_code/raysgd_torch_signatures.py * fix * migrate changes from other PR * ok thanks * pass * signatures * lint' * Update python/ray/experimental/sgd/pytorch/utils.py * Apply suggestions from code review Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com> * should address most comments * comments * fix this ci * first_pass * add overrides * override * fixing up operators * format * sgd * constants * rm * revert * save * failures * fixes * trainer * run test * operator * code * op * ok done * operator * sgd test fixes * ok * trainer * format * Apply suggestions from code review Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com> * Update doc/source/raysgd/raysgd_pytorch.rst * docstring * dcgan * doc * commits * nit * testing * revert * Start renaming pytorch to torch * Rename PyTorchTrainer to TorchTrainer * Rename PyTorch runners to Torch runners * Finish renaming API * Rename to torch in tests * Finish renaming docs + tests * Run format + fix DeprecationWarning * fix * move tests up * benchmarks * rename * remove some args * better metrics output * fix up the benchmark * benchmark-yaml * horovod-benchmark * benchmarks * Remove benchmark code for cleanups * benchmark-code * nits * benchmark yamls * benchmark yaml * ok * ok * ok * benchmark * nit * finish_bench * makedatacreator * relax * metrics * autosetsampler * profile * movements * OK * smoothen * fix * nitdocs * loss * envflag * comments * nit * format * visible * images * move_images * fix * rernder * rrender * rest * multgpu * fix * nit * finish * extrra * setup * experimental * as_trainable * fix * ok * format * create_torch_pbt * setup_pbt * ok * format * ok * format * docs * ok * Draft head-is-worker * Fix missing concurrency between local and remote workers * Fix tqdm to work with head-is-worker * Cleanup * Implement state_dict and load_state_dict * Reserve resources on the head node for the local worker * Update the development cluster setup * Add spot block reservation to the development yaml * ok * Draft the fault tolerance fix * Small fixes to local-remote concurrency * Cleanup + fix typo * fixes * worker_counts * some formatting and asha * fix * okme * fixactorkill * unify * Revert the cluster mounts * Cut the handler-reporter API * Fix most tests * Rm tqdm_handler.py * Re-add tune test * Automatically force-shutdown on actor errors on shutdown * Formatting * fix_tune_test * Add timeout error verification * Rename tqdm to use_tqdm * fixtests * ok * remove_redundant * deprecated * deactivated * ok_try_this * lint * nice * done * retries * fixes * kill * retry * init_transformer * init * deployit * improve_example * trans * rename * formats * format-to-py37 * time_to_test * more_changes * ok * update_args_and_script * fp16_epoch * huggingface * training stats * distributed * Apply suggestions from code review * transformer Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com> Co-authored-by: Maksim Smolin <maximsmol@gmail.com>	2020-04-17 15:17:30 -07:00
Maksim Smolin	d6f4e5b3e1	[SGD] Imagenet example (basic) (#8020) * Checkpoint the image-models example * Update cluster definition * Fix copyright info * Use original args * Checkpoint fixes * Add README * Add some missing features * Format * Get rid of the unused Namespace class * Address comments * Link the imagenet example in docs * Cleanup * Fix lint	2020-04-17 13:33:55 -07:00
Richard Liaw	dd63178e91	[sgd] Semantic Segmentation Example (#7825) * better_example * test * improve some usability things * submit * fix * making a segmentation example * segmentation_example * segmentation * device * flake * Update python/ray/util/sgd/torch/training_operator.py * uti * finished_example * block * format * locationg * fix * ok * revert * segmentation * lint_and_test * address_comments	2020-04-10 20:35:45 -07:00
Richard Liaw	f63b4c1110	[sgd] make ddp optional (#7875) * loosen * devices * tryitout * fix * fix * fix * easy * test * fix * fix * better visibility * fix	2020-04-06 11:41:36 -07:00
Richard Liaw	314250d072	[docs] Make Ray slack more prominent (#7870)	2020-04-02 11:14:02 -07:00
Richard Liaw	24bf6ad607	[raysgd] Improve raysgd examples (#7818) * better_example * test * improve some usability things * submit * fix * flake * Update python/ray/util/sgd/torch/training_operator.py * trythis * fix * fix * smoke * fail * fix * fix	2020-04-01 08:58:39 -07:00
Richard Liaw	86cff17e7e	[tune/raysgd] Tune API for TorchTrainer + Fix State Restoration (#7547)	2020-03-30 12:58:49 -05:00
Richard Liaw	d046faeb9c	[sgd] Readme fix (#7564) * readme fix * replicas	2020-03-11 13:40:18 -07:00
Richard Liaw	b70f31339c	[sgd] Benchmark Fixes (#7553) * fix * fix	2020-03-11 13:08:27 -07:00
Richard Liaw	fbac256982	[sgd] Add benchmarks (#7454) * Init fp16 * fp16 and schedulers * scheduler linking and fp16 * to fp16 * loss scaling and documentation * more documentation * add tests, refactor config * moredocs * more docs * fix logo, add test mode, add fp16 flag * fix tests * fix scheduler * fix apex * improve safety * fix tests * fix tests * remove pin memory default * rm * fix * Update doc/examples/doc_code/raysgd_torch_signatures.py * fix * migrate changes from other PR * ok thanks * pass * signatures * lint' * Update python/ray/experimental/sgd/pytorch/utils.py * Apply suggestions from code review Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com> * should address most comments * comments * fix this ci * first_pass * add overrides * override * fixing up operators * format * sgd * constants * rm * revert * save * failures * fixes * trainer * run test * operator * code * op * ok done * operator * sgd test fixes * ok * trainer * format * Apply suggestions from code review Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com> * Update doc/source/raysgd/raysgd_pytorch.rst * docstring * dcgan * doc * commits * nit * testing * revert * Start renaming pytorch to torch * Rename PyTorchTrainer to TorchTrainer * Rename PyTorch runners to Torch runners * Finish renaming API * Rename to torch in tests * Finish renaming docs + tests * Run format + fix DeprecationWarning * fix * move tests up * benchmarks * rename * remove some args * better metrics output * fix up the benchmark * benchmark-yaml * horovod-benchmark * benchmarks * Remove benchmark code for cleanups * benchmark-code * nits * benchmark yamls * benchmark yaml * ok * ok * ok * benchmark * nit * finish_bench * makedatacreator * relax * metrics * autosetsampler * profile * movements * OK * smoothen * fix * nitdocs * loss * envflag * comments * nit * format * visible * images * move_images * fix * rernder * rrender * rest * multgpu * fix * nit * finish * extrra * setup * revert Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com> Co-authored-by: Maksim Smolin <maximsmol@gmail.com>	2020-03-11 01:09:08 -07:00
Richard Liaw	d192ef0611	[raysgd] Cleanup User API (#7384) * Init fp16 * fp16 and schedulers * scheduler linking and fp16 * to fp16 * loss scaling and documentation * more documentation * add tests, refactor config * moredocs * more docs * fix logo, add test mode, add fp16 flag * fix tests * fix scheduler * fix apex * improve safety * fix tests * fix tests * remove pin memory default * rm * fix * Update doc/examples/doc_code/raysgd_torch_signatures.py * fix * migrate changes from other PR * ok thanks * pass * signatures * lint' * Update python/ray/experimental/sgd/pytorch/utils.py * Apply suggestions from code review Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com> * should address most comments * comments * fix this ci * first_pass * add overrides * override * fixing up operators * format * sgd * constants * rm * revert * save * failures * fixes * trainer * run test * operator * code * op * ok done * operator * sgd test fixes * ok * trainer * format * Apply suggestions from code review Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com> * Update doc/source/raysgd/raysgd_pytorch.rst * docstring * dcgan * doc * commits * nit * testing * revert * Start renaming pytorch to torch * Rename PyTorchTrainer to TorchTrainer * Rename PyTorch runners to Torch runners * Finish renaming API * Rename to torch in tests * Finish renaming docs + tests * Run format + fix DeprecationWarning * fix * move tests up * benchmarks * rename * remove some args * better metrics output * fix up the benchmark * benchmark-yaml * horovod-benchmark * benchmarks * Remove benchmark code for cleanups * makedatacreator * relax * metrics * autosetsampler * profile * movements * OK * smoothen * fix * nitdocs * loss * comments * fix * fix * runner_tests * codes * example * fix_test * fix * tests Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com> Co-authored-by: Maksim Smolin <maximsmol@gmail.com>	2020-03-10 08:41:42 -07:00
Maksim Smolin	3a134c7224	[RaySGD] Rename PyTorch API endpoints to start with Torch (#7425) * Start renaming pytorch to torch * Rename PyTorchTrainer to TorchTrainer * Rename PyTorch runners to Torch runners * Finish renaming API * Rename to torch in tests * Finish renaming docs + tests * Run format + fix DeprecationWarning * fix * move tests up * rename Co-authored-by: Richard Liaw <rliaw@berkeley.edu>	2020-03-03 16:44:42 -08:00
Richard Liaw	48cdca843f	[raysgd] Custom training operator (#7211)	2020-03-01 21:22:48 -08:00
Eric Liang	5df801605e	Add ray.util package and move libraries from experimental (#7100)	2020-02-18 13:43:19 -08:00
Richard Liaw	94e2fcea2e	[sgd] fp16 (apex) and scheduler support + move examples page (#7061) * Init fp16 * fp16 and schedulers * scheduler linking and fp16 * to fp16 * loss scaling and documentation * more documentation * add tests, refactor config * moredocs * more docs * fix logo, add test mode, add fp16 flag * fix tests * fix scheduler * fix apex * improve safety * fix tests * fix tests * remove pin memory default * rm * fix * Update doc/examples/doc_code/raysgd_torch_signatures.py * fix * migrate changes from other PR * ok thanks * pass * signatures * lint' * Update python/ray/experimental/sgd/pytorch/utils.py * Apply suggestions from code review Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com> * should address most comments * comments * fix this ci * fix tests' * testmode Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>	2020-02-16 19:04:08 -08:00
Richard Liaw	037aa2b961	[sgd] Refactor PyTorch SGD Documentation. (#6910) * Refactor documentation and directory structurre * update loss * ,ore examples * fix comments * more code * svgs * formatting * more_docs * more writing * comments ready * move * whitespace * examples * fix * bold * pytorch * batch * fix * fix test * Apply suggestions from code review * quarantinegp * tests/ * fix missing	2020-01-29 08:51:01 -08:00

52 commits