hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-06 10:31:39 -05:00

Author	SHA1	Message	Date
mehrdadn	56d2cf6479	Shellcheck rewrites (#9597 ) * Fix SC2001: See if you can use ${variable//search/replace} instead. * Fix SC2010: Don't use ls | grep. Use a glob or a for loop with a condition to allow non-alphanumeric filenames. * Fix SC2012: Use find instead of ls to better handle non-alphanumeric filenames. * Fix SC2015: Note that A && B || C is not if-then-else. C may run when A is true. * Fix SC2028: echo may not expand escape sequences. Use printf. * Fix SC2034: variable appears unused. Verify use (or export if used externally). * Fix SC2035: Use ./glob or -- glob so names with dashes won't become options. * Fix SC2071: > is for string comparisons. Use -gt instead. * Fix SC2154: variable is referenced but not assigned * Fix SC2164: Use 'cd ... || exit' or 'cd ... || return' in case cd fails. * Fix SC2188: This redirection doesn't have a command. Move to its command (or use 'true' as no-op). * Fix SC2236: Use -n instead of ! -z. * Fix SC2242: Can only exit with status 0-255. Other data should be written to stdout/stderr. * Fix SC2086: Double quote to prevent globbing and word splitting. Co-authored-by: Mehrdad <noreply@github.com>	2020-07-24 17:24:19 -05:00
mehrdadn	b14728d999	Shellcheck quoting (#9596 ) * Fix SC2006: Use $(...) notation instead of legacy backticked `...`. * Fix SC2016: Expressions don't expand in single quotes, use double quotes for that. * Fix SC2046: Quote this to prevent word splitting. * Fix SC2053: Quote the right-hand side of == in [[ ]] to prevent glob matching. * Fix SC2068: Double quote array expansions to avoid re-splitting elements. * Fix SC2086: Double quote to prevent globbing and word splitting. * Fix SC2102: Ranges can only match single chars (mentioned due to duplicates). * Fix SC2140: Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A\"B\"C"? * Fix SC2145: Argument mixes string and array. Use * or separate argument. * Fix SC2209: warning: Use var=$(command) to assign output (or quote to assign string). Co-authored-by: Mehrdad <noreply@github.com>	2020-07-21 21:56:41 -05:00
mehrdadn	4f470c3fc1	Shellcheck comments (#9595 )	2020-07-21 16:47:09 -05:00
Richard Liaw	7e3ded7439	[tune] pin tune-sklearn (#9498 )	2020-07-17 21:25:12 -07:00
krfricke	deba082cb4	[tune] PyTorch CIFAR10 example (#9338 ) Co-authored-by: Richard Liaw <rliaw@berkeley.edu> Co-authored-by: Kai Fricke <kai@anyscale.com>	2020-07-13 23:16:05 -07:00
Richard Liaw	dfe3ebe4a2	[tune] sklearn comment out (#9454 )	2020-07-13 16:06:44 -07:00
Richard Liaw	139d21e068	[tune] Docs for tune-sklearn (#9129 ) Co-authored-by: krfricke <krfricke@users.noreply.github.com>	2020-07-06 15:35:10 -07:00
krfricke	e0b6984dce	[tune] pytorch lightning template and walkthrough (#9151 ) Co-authored-by: Kai Fricke <kai@anyscale.com>	2020-06-29 16:52:07 -07:00
Amog Kamsetty	f95ab4f506	[Testing] Multi-node Training+Tune Long Running Test (#8966 )	2020-06-22 14:49:16 -07:00
Richard Liaw	6c49c01837	[tune] Function API checkpointing (#8471 ) Co-authored-by: krfricke <krfricke@users.noreply.github.com>	2020-06-15 10:42:54 -07:00
Simon Mo	b93d6813ae	Build from source in Jenkins (#8255 )	2020-05-28 09:38:16 -07:00
Maksim Smolin	c2acb7ffe2	[SGD] Add imagenet example CI (#8150 )	2020-05-02 16:48:35 -07:00
Richard Liaw	87557a00fa	[tune] Refactor search algorithms (#7037 ) * start refactoring of search algorithms * format * needs tests * fix * suggestions * Fix PBT * lint * refactoring * hyperopt_working * dragonfly * hyperopt * change_half_of_algs * save * code-removed * remove_lots_of_unneccessary * changes * formatting * suggest * reset * rm * tests * search-change * exception * refactor-doc * search * py * moredocs * Update doc/source/tune-searchalg.rst * concurrency * max * tune * betterwarning * bohb * tests * test-change Co-authored-by: ujvl <misraujval@gmail.com>	2020-04-27 08:51:13 -07:00
Richard Liaw	9f3e9e7e9f	[tune] Add more intensive tests (#7667 ) * make_heavier_tests * help	2020-04-20 11:14:44 -07:00
Richard Liaw	6545534805	[tune/sgd] DCGAN example self-contained, turn example into modu… (#8012 ) * ok * done * run_benchmarks * should_make_examples_usable	2020-04-16 17:55:27 -07:00
Servon	5c274fe631	[Tune] Add ZOOpt search algorithm (#7960 ) * add zoopt * add zoopt search algo * add zoopt * fix zoopt * add zoopt requirements * fix zoopt * remove generated guides * Apply suggestions from code review Co-authored-by: Richard Liaw <rliaw@berkeley.edu>	2020-04-15 21:13:29 -07:00
Simon Mo	59867dad75	Move Jenkins test to Github action (#7342 )	2020-04-09 10:27:19 -07:00
Richard Liaw	24bf6ad607	[raysgd] Improve raysgd examples (#7818 ) * better_example * test * improve some usability things * submit * fix * flake * Update python/ray/util/sgd/torch/training_operator.py * trythis * fix * fix * smoke * fail * fix * fix	2020-04-01 08:58:39 -07:00
Ujval Misra	6022eb53c4	[tune] Use newest checkpoint in normal operation (#7563 ) * Use persistent checkpoint for failures * Fix test * Add unpause test * move test * Fix tests * remove debug statement * Mark test as flaky	2020-03-12 22:21:42 -07:00
Richard Liaw	d192ef0611	[raysgd] Cleanup User API (#7384 ) * Init fp16 * fp16 and schedulers * scheduler linking and fp16 * to fp16 * loss scaling and documentation * more documentation * add tests, refactor config * moredocs * more docs * fix logo, add test mode, add fp16 flag * fix tests * fix scheduler * fix apex * improve safety * fix tests * fix tests * remove pin memory default * rm * fix * Update doc/examples/doc_code/raysgd_torch_signatures.py * fix * migrate changes from other PR * ok thanks * pass * signatures * lint' * Update python/ray/experimental/sgd/pytorch/utils.py * Apply suggestions from code review Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com> * should address most comments * comments * fix this ci * first_pass * add overrides * override * fixing up operators * format * sgd * constants * rm * revert * save * failures * fixes * trainer * run test * operator * code * op * ok done * operator * sgd test fixes * ok * trainer * format * Apply suggestions from code review Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com> * Update doc/source/raysgd/raysgd_pytorch.rst * docstring * dcgan * doc * commits * nit * testing * revert * Start renaming pytorch to torch * Rename PyTorchTrainer to TorchTrainer * Rename PyTorch runners to Torch runners * Finish renaming API * Rename to torch in tests * Finish renaming docs + tests * Run format + fix DeprecationWarning * fix * move tests up * benchmarks * rename * remove some args * better metrics output * fix up the benchmark * benchmark-yaml * horovod-benchmark * benchmarks * Remove benchmark code for cleanups * makedatacreator * relax * metrics * autosetsampler * profile * movements * OK * smoothen * fix * nitdocs * loss * comments * fix * fix * runner_tests * codes * example * fix_test * fix * tests Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com> Co-authored-by: Maksim Smolin <maximsmol@gmail.com>	2020-03-10 08:41:42 -07:00
Anthony Yu	89ec4adb72	[tune] Dragonfly Optimizer (#5955 ) * Add sample example * Copy relevant lines of ask from inherited Optimizer * Ignore strategy * Additional changes * Add DragonflySearch for tune connector for Dragonfly * Add example and fix small errors * lint * Remove skopt references * Update example based off of Dragonfly changes * Edit example for final Dragonfly edits * Formatting and documentation edits * Add documentation and add to test pipeline * Address PR comments * Fix Jenkins test * Adjust Dragonfly to PR#7366 * Lint * fix_tests Co-authored-by: Richard Liaw <rliaw@berkeley.edu>	2020-03-10 08:40:36 -07:00
Maksim Smolin	3a134c7224	[RaySGD] Rename PyTorch API endpoints to start with Torch (#7425 ) * Start renaming pytorch to torch * Rename PyTorchTrainer to TorchTrainer * Rename PyTorch runners to Torch runners * Finish renaming API * Rename to torch in tests * Finish renaming docs + tests * Run format + fix DeprecationWarning * fix * move tests up * rename Co-authored-by: Richard Liaw <rliaw@berkeley.edu>	2020-03-03 16:44:42 -08:00
Eric Liang	5df801605e	Add ray.util package and move libraries from experimental (#7100 )	2020-02-18 13:43:19 -08:00
mehrdadn	3bd82d0bcd	Fix various issues/warnings that come up on Jenkins (#7147 ) * Avoid warning about swap being unlimited Currently we get the following message on Jenkins: "Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap." Since we're not limiting swap anyway, we might as well avoid trying to. https://docs.docker.com/config/containers/resource_constraints/#--memory-swap-details * Fix escaping in re.search() * Fix escaping in _noisy_layer() * Raise a more descriptive error when dashboard data isn't found * Don't error on dashboard files not being found when webui isn't required * Change dashboard error to a warning instead	2020-02-17 16:08:55 -08:00
Richard Liaw	94e2fcea2e	[sgd] fp16 (apex) and scheduler support + move examples page (#7061 ) * Init fp16 * fp16 and schedulers * scheduler linking and fp16 * to fp16 * loss scaling and documentation * more documentation * add tests, refactor config * moredocs * more docs * fix logo, add test mode, add fp16 flag * fix tests * fix scheduler * fix apex * improve safety * fix tests * fix tests * remove pin memory default * rm * fix * Update doc/examples/doc_code/raysgd_torch_signatures.py * fix * migrate changes from other PR * ok thanks * pass * signatures * lint' * Update python/ray/experimental/sgd/pytorch/utils.py * Apply suggestions from code review Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com> * should address most comments * comments * fix this ci * fix tests' * testmode Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>	2020-02-16 19:04:08 -08:00
Sven Mika	2e60f0d4d8	[RLlib] Move all jenkins RLlib-tests into bazel (rllib/BUILD). (#7178 ) * commit * comment	2020-02-15 14:50:44 -08:00
Sven Mika	5518a738b3	[RLlib] Fix erroneous use of LinearSchedule (in DDPG's exploration annealing). (#7125 ) * Fix erroneous use of LinearSchedule (in DDPG's exploration annealing). Erase schedules_obsoleted.py. * Trigger re-test. * Re-test.	2020-02-12 23:46:49 -08:00
Sven Mika	6e1c3ea824	[RLlib] Exploration API (+EpsilonGreedy sub-class). (#6974 )	2020-02-10 15:22:07 -08:00
Eric Liang	fbc545c03b	[rllib] Support parallel, parameterized evaluation (#6981 ) * eval api * update * sync eval filters * sync fix * docs * update * docs * update * link * nit * doc updates * format	2020-02-01 22:12:12 -08:00
Richard Liaw	037aa2b961	[sgd] Refactor PyTorch SGD Documentation. (#6910 ) * Refactor documentation and directory structurre * update loss * ,ore examples * fix comments * more code * svgs * formatting * more_docs * more writing * comments ready * move * whitespace * examples * fix * bold * pytorch * batch * fix * fix test * Apply suggestions from code review * quarantinegp * tests/ * fix missing	2020-01-29 08:51:01 -08:00
Sven Mika	446cbdf2e0	[RLlib] Fix issue (bug): LSTM + non-shared vf + PPO + tuple actions (#6890 ) * Add `RandomEnv` example to examples folder. Convert warning into Error message when using an LSTM in a non-shared-vf network (after the warning, the program would crash). * LINT. * Fix issue #6884. LSTM + non-shared vf NN + PPO crashes when using a Tuple action space. * LINT * Change warning message for Model: shared_vf=False, LSTM=True cases. * Bug fix. * Add examples/random_env.py test to Jenkins.	2020-01-24 10:29:35 -08:00
Sven Mika	ae9a3a2237	[RLlib] from_config util method for framework agnostic components; start moving RLlib tests into Bazel. (#6865 )	2020-01-22 17:02:58 -08:00
Sven Mika	c957ed58ed	[RLlib] Implement PPO torch version. (#6826 )	2020-01-20 23:06:50 -08:00
Sven	60d4d5e1aa	Remove future imports (#6724 ) * Remove all __future__ imports from RLlib. * Remove (object) again from tf_run_builder.py::TFRunBuilder. * Fix 2xLINT warnings. * Fix broken appo_policy import (must be appo_tf_policy) * Remove future imports from all other ray files (not just RLlib). * Remove future imports from all other ray files (not just RLlib). * Remove future import blocks that contain `unicode_literals` as well. Revert appo_tf_policy.py to appo_policy.py (belongs to another PR). * Add two empty lines before Schedule class. * Put back __future__ imports into determine_tests_to_run.py. Fails otherwise on a py2/print related error.	2020-01-09 00:15:48 -08:00
Sven	f1b56fa5ee	PG unify/cleanup tf vs torch and PG functionality test cases (tf + torch). (#6650 ) * Unifying the code for PGTrainer/Policy wrt tf vs torch. Adding loss function test cases for the PGAgent (confirm equivalence of tf and torch). * Fix LINT line-len errors. * Fix LINT errors. * Fix `tf_pg_policy` imports (formerly: `pg_policy`). * Rename tf_pg_... into pg_tf_... following <alg>_<framework>_... convention, where ...=policy/loss/agent/trainer. Retire `PGAgent` class (use PGTrainer instead). * - Move PG test into agents/pg/tests directory. - All test cases will be located near the classes that are tested and then built into the Bazel/Travis test suite. * Moved post_process_advantages into pg.py (from pg_tf_policy.py), b/c the function is not a tf-specific one. * Fix remaining import errors for agents/pg/... * Fix circular dependency in pg imports. * Add pg tests to Jenkins test suite.	2020-01-02 16:08:03 -08:00
Richard Liaw	5719a05757	[sgd] Add support for multi-model multi-optimizer training (#6317 )	2019-12-15 15:19:45 -08:00
Yuhao Yang	ad4da17899	[Tune] Add example and tutorial for DCGAN (#6400 )	2019-12-13 14:15:44 -08:00
Eric Liang	be5dd8eb5e	Enable direct calls by default (#6367 ) * wip * add * timeout fix * const ref * comments * fix * fix * Move actor state into actor handle * comments 2 * enable by default * temp reorder * some fixes * add debug code * tmp * fix * wip * remove dbg * fix compile * fix * fix check * remove non direct tests * Increment ref count before resolving value * rename * fix another bug * tmp * tmp * Fix object pinning * build change * lint * ActorManager * tmp * ActorManager * fix test component failures * Remove old code * Remove unused * fix * fix * fix resources * fix advanced * eric's diff * blacklist * blacklist * cleanup * annotate * disable tests for now * remove * fix * fix * clean up verbosity * fix test * fix concurrency test * Update .travis.yml * Update .travis.yml * Update .travis.yml * split up analysis suite * split up trial runner suite * fix detached direct actors * fix * split up advanced tesT * lint * fix core worker test hang * fix bad check fail which breaks test_cluster.py in tune * fix some minor diffs in test_cluster * less workers * make less stressful * split up test * retry flaky tests * remove old test flags * fixes * lint * Update worker_pool.cc * fix race * fix * fix bugs in node failure handling * fix race condition * fix bugs in node failure handling * fix race condition * nits * fix test * disable heartbeatS * disable heartbeatS * fix * fix * use worker id * fix max fail * debug exit * fix merge, and apply [PATCH] fix concurrency test * [patch] fix core worker test hang * remove NotifyActorCreation, and return worker on completion of actor creation task * remove actor diied callback * Update core_worker.cc * lint * use task manager * fix merge * fix deadlock * wip * merge conflits * fix * better sysexit handling * better sysexit handling * better sysexit handling * check id * better debug * task failed msg * task failed msg * retry failed tasks with delay * retry failed tasks with delay * clip deps * fix * fix core worker tests * fix task manager test * fix all tests * cleanup * set to 0 for direct tests * dont check worker id for ownership rpc * dont check worker id for ownership rpc * debug messages * add comment * remove debug statements * nit * check worker id * fix test * owner * fix tests	2019-12-13 13:58:04 -08:00
Victor Le	4e24c805ee	AlphaZero and Ranked reward implementation (#6385 )	2019-12-07 12:08:40 -08:00
Eric Liang	4c6739476b	[rllib] Raise an error if GPUs are enabled but not tf.test.is_gpu_available() (#6365 )	2019-12-05 10:13:54 -08:00
Eric Liang	e5863d7914	Force tune tests to run in direct call mode (#6301 ) * force tune direct mode * force tune * fix * Update run_multi_node_tests.sh	2019-11-27 19:58:33 -08:00
Eric Liang	64a3a7239e	Set RAY_FORCE_DIRECT=1 for run_rllib_tests, test_basic (#6171 )	2019-11-25 14:12:11 -08:00
daiyaanarfeen	8f6d73a93a	[sgd] Extend distributed pytorch functionality (#5675 ) * raysgd * apply fn * double quotes * removed duplicate TimerStat * removed duplicate find_free_port * imports in pytorch_trainer * init doc * ray.experimental * remove resize example * resnet example * cifar * Fix up after kwargs * data_dir and dataloader_workers args * formatting * loss * init * update code * lint * smoketest * better_configs * fix * fix * fix * train_loader * fixdocs * ok * ok * fix * fix_update * fix * fix * done * fix * fix * fix * small * lint * fix * fix * fix_test * fix * validate * fix * fi	2019-11-05 11:16:46 -08:00
Richard Liaw	e94bebb1de	[tune] Fix Jenkins tests (#6028 )	2019-11-01 16:42:04 -07:00
Richard Liaw	48ba484640	[tune] Test TF2.0, TF1.14, TF1.12 Tensorboard support (#5931 )	2019-10-18 13:50:42 -07:00
Richard Liaw	d52a4983af	Update TF documentation (#5918 )	2019-10-16 01:31:27 -07:00
Richard Liaw	9f23620412	[tune] tf2.0 mnist example (#5898 ) * tfmnistexample * tfmnist * add_to_ci * format * exampledownlaod * fix	2019-10-15 22:25:01 -07:00
Richard Liaw	1650f7b174	[tune] Remove TF MNIST example + add TrialRunner hook to execut… (#5868 ) * remove test * add trial runner * remvoerestore * Remove other mnist examples * tunetest * revert * v1 * Revert "v1" This reverts commit c8bddaf2db7a8270c43c02021cac0e75df15ed20. * Revert "revert" This reverts commit b58f56884a0c288d3a6f997d149ab4d496ddd7a3. * errors * format	2019-10-13 20:33:56 -07:00
Eric Liang	04e997fe0d	Fix TF2 / rllib test (#5846 )	2019-10-07 14:25:16 -07:00
Anthony Yu	b99cdf4e39	[tune] PBT + Memnn example (#5723 ) * Add example file * Move into train function * Somewhat working example of MemNN, still has some failed trials * Reorganize into a class * Small fixes * Iteration decrease and fix hyperparam_mutations * Add example file * Move into train function * Somewhat working example of MemNN, still has some failed trials * Reorganize into a class * Small fixes * Iteration decrease and fix hyperparam_mutations * Some style edits * Address PR changes without modifying learning rate * Add configs and hyperparameter mutations * Add tune test * Modify import locations * Some parameter changes for testing * Update memnn example * Add tensorboard support and address PR comment * Final changes * lint * generator	2019-10-05 09:22:37 -07:00

116 commits