hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-06 18:41:40 -05:00

Author	SHA1	Message	Date
Edward Oakes	9a721ed71a	Link to serve in tune overview (#8487 )	2020-05-18 11:29:38 -05:00
Sven Mika	796a834c48	[RLlib] Attention Net integration into ModelV2 and learning RL example. (#8371 )	2020-05-18 17:26:40 +02:00
Richard Liaw	87cbf2aedd	[docs][tune] Make search algorithm, scheduler docs better! (#8179 )	2020-05-17 12:19:44 -07:00
SangBin Cho	2f01776d09	Fix ray memory example (#8462 )	2020-05-17 11:34:11 -05:00
Tao Wang	acffdb2349	[TEST]use cc_test to run core_worker_test, enforce/reuse RedisServiceManagerForTest (#8443 )	2020-05-17 18:43:00 +08:00
Edward Oakes	fb23bd6fc0	[serve] Optionally namespace serve clusters (#8447 )	2020-05-17 00:14:42 -05:00
Richard Liaw	67c01455fe	[tune] `tune.track` -> `tune.report` (#8388 )	2020-05-16 12:55:08 -07:00
Stephanie Wang	bd169749e0	Option to retry failed actor tasks (#8330 ) * Python * Consolidate state in the direct actor transport, set the caller starts at * todo * Remove unused * Update and unit tests * Doc * Remove unused * doc * Remove debug * Update src/ray/core_worker/transport/direct_actor_transport.h Co-authored-by: Eric Liang <ekhliang@gmail.com> * Update src/ray/core_worker/transport/direct_actor_transport.cc Co-authored-by: Eric Liang <ekhliang@gmail.com> * lint and fix build * Update * Fix build * Fix tests * Unit test for max_task_retries=0 * Fix java? * Fix bad test * Cross language fix * fix java Co-authored-by: Eric Liang <ekhliang@gmail.com>	2020-05-15 20:15:15 -07:00
Edward Oakes	ef498e8aa5	[serve] Add basic session affinity via shard key (#8449 )	2020-05-15 16:18:52 -05:00
Max Fitton	00325eb2b2	Rename max_reconstructions to max_restarts and use -1 for infinite (#8274 ) Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>	2020-05-14 10:30:29 -05:00
Eric Liang	eabb801a40	less important (#8439 )	2020-05-13 22:52:38 -07:00
Siyuan (Ryans) Zhuang	ab278071ac	Update serialization doc (#8381 ) * update serialization doc	2020-05-12 16:47:00 -07:00
Jason McGhee	24ced808cd	Fix config key in docs for using PyTorch (#8300 ) Docs improperly suggest using "torch" when the actual flag is called "use_pytorch"	2020-05-11 12:41:21 -07:00
Eric Liang	f48da50e1c	[rllib] observation function api for multi-agent (#8236 )	2020-05-04 22:13:49 -07:00
Rüdiger Busche	e93ec3134a	Use kubectl delete pod in example (#8295 ) Co-authored-by: rbusche <rbusche@inserve.de>	2020-05-04 21:39:30 -05:00
Sven Mika	b95e28faea	[RLlib] APEX_DDPG (PyTorch) test case and docs. (#8288 ) APEX_DDPG (PyTorch) test case and docs.	2020-05-04 09:36:27 +02:00
Sven Mika	166bb5d690	[RLlib] IMPALA PyTorch (#8287 ) This PR adds an IMPALA PyTorch implementation. - adds compilation tests for LSTM and w/o LSTM. - adds learning test for CartPole.	2020-05-03 13:44:25 +02:00
Sven Mika	42991d723f	[RLlib] rllib/examples folder restructuring (#8250 ) Cleans up of the rllib/examples folder by moving all example Envs into rllibexamples/env (so they can be used by other scripts and tests as well).	2020-05-01 22:59:34 +02:00
Edward Oakes	6373c70661	[serve] Refactor BackendConfig (#8202 )	2020-04-30 22:31:07 -05:00
Edward Oakes	95d187e556	[serve] Add delete_endpoint call (#8256 )	2020-04-30 20:59:07 -05:00
Edward Oakes	43be73e4cf	[serve] Add delete_backend call (#8252 )	2020-04-30 13:10:39 -05:00
mehrdadn	254b1ec370	Set up testing and wheels for Windows on GitHub Actions (#8131 ) * Move some Java tests into ci.sh * Move C++ worker tests into ci.sh * Define run() * Prepare to move Python tests into ci.sh * Fix issues in install-dependencies.sh * Reload environment for GitHub Actions * Move wheels to ci.sh and fix related issues * Don't bypass failures in install-ray.sh anymore * Make CI a little quieter * Move linting into ci.sh * Add vitals test right after build * Fix os.uname() unavailability on Windows Co-authored-by: Mehrdad <noreply@github.com>	2020-04-29 21:19:02 -07:00
Simon Mo	101255f782	[Serve] RayServe TF, PyTorch, Sklearn Examples (#8156 )	2020-04-28 22:24:55 -07:00
Simon Mo	af3d3e778e	[RayServe] Specify installation instruction in doc (#8220 )	2020-04-28 14:38:10 -07:00
Richard Liaw	be5235d982	[tune] Clarify Intro Tune Documentation (#8201 )	2020-04-27 18:01:00 -07:00
ijrsvt	a77e5a8cbf	[Doc] Fix Docstring for Task Cancellation (#8198 )	2020-04-27 17:06:08 -07:00
Robert Zangnan Yu	a77b19e4f2	[docs] Comments on potential srun orders during Slurm Deployment (#8183 )	2020-04-27 09:30:16 -07:00
Richard Liaw	87557a00fa	[tune] Refactor search algorithms (#7037 ) * start refactoring of search algorithms * format * needs tests * fix * suggestions * Fix PBT * lint * refactoring * hyperopt_working * dragonfly * hyperopt * change_half_of_algs * save * code-removed * remove_lots_of_unneccessary * changes * formatting * suggest * reset * rm * tests * search-change * exception * refactor-doc * search * py * moredocs * Update doc/source/tune-searchalg.rst * concurrency * max * tune * betterwarning * bohb * tests * test-change Co-authored-by: ujvl <misraujval@gmail.com>	2020-04-27 08:51:13 -07:00
Richard Liaw	b506f87117	[tune] New Doc edits, add Concepts page (#8083 ) Co-Authored-By: Sven Mika <sven@anyscale.io>	2020-04-25 18:25:56 -07:00
Sven Mika	499ad5fbe4	[RLlib] PyTorch version of APPO. (#8120 ) - Translate all vtrace functionality to torch and added torch to the framework_iterator-loop in all existing vtrace test cases. - Add learning test cases for APPO torch (both w/ and w/o v-trace). - Add quick compilation tests for APPO (tf and torch, v-trace and no v-trace).	2020-04-23 09:11:12 +02:00
Edward Oakes	505f3a8714	[serve] Remove serve.link(), rename serve.split() -> serve.set_traffic() (#8072 )	2020-04-21 14:26:07 -05:00
Sven Mika	d15609ba2a	[RLlib] PyTorch version of ARS (Augmented Random Search). (#8106 ) This PR implements a PyTorch version of RLlib's ARS algorithm using RLlib's functional algo builder API. It also adds a regression test for ARS (torch) on CartPole.	2020-04-21 09:47:52 +02:00
Sven Mika	3812bfedda	[RLlib] PyTorch version of ES (Evolution Strategies). (#8104 ) PyTorch version of Evolution Strategies (ES) Algo.	2020-04-20 21:47:28 +02:00
Bill Chambers	77655749fb	[RayServe] RayServe Introduction and Overview (#8038 )	2020-04-20 12:05:59 -05:00
Sven Mika	165a86f1ab	[RLlib] SAC MuJoCo instability issues (tf and torch versions). (#8063 ) SAC (both torch and tf versions) are showing issues (crashes) due to numeric instabilities in the SquashedGaussian distribution (sampling + logp after extreme NN outputs). This PR fixes these. Stable MuJoCo learning (HalfCheetah) has been confirmed on both tf and torch versions. A Distribution stability test (using extreme NN outputs) has been added for SquashedGaussian (can be used for any other type of distribution as well).	2020-04-19 10:20:23 +02:00
Sumanth Ratna	bdb03a0544	[tune] Update dragonfly installation instructions (#8086 ) Closes #8084	2020-04-18 20:25:38 -07:00
Richard Liaw	857e4dba2f	[sgd] HuggingFace GLUE Fine-tuning Example (#7792 ) * Init fp16 * fp16 and schedulers * scheduler linking and fp16 * to fp16 * loss scaling and documentation * more documentation * add tests, refactor config * moredocs * more docs * fix logo, add test mode, add fp16 flag * fix tests * fix scheduler * fix apex * improve safety * fix tests * fix tests * remove pin memory default * rm * fix * Update doc/examples/doc_code/raysgd_torch_signatures.py * fix * migrate changes from other PR * ok thanks * pass * signatures * lint' * Update python/ray/experimental/sgd/pytorch/utils.py * Apply suggestions from code review Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com> * should address most comments * comments * fix this ci * first_pass * add overrides * override * fixing up operators * format * sgd * constants * rm * revert * save * failures * fixes * trainer * run test * operator * code * op * ok done * operator * sgd test fixes * ok * trainer * format * Apply suggestions from code review Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com> * Update doc/source/raysgd/raysgd_pytorch.rst * docstring * dcgan * doc * commits * nit * testing * revert * Start renaming pytorch to torch * Rename PyTorchTrainer to TorchTrainer * Rename PyTorch runners to Torch runners * Finish renaming API * Rename to torch in tests * Finish renaming docs + tests * Run format + fix DeprecationWarning * fix * move tests up * benchmarks * rename * remove some args * better metrics output * fix up the benchmark * benchmark-yaml * horovod-benchmark * benchmarks * Remove benchmark code for cleanups * benchmark-code * nits * benchmark yamls * benchmark yaml * ok * ok * ok * benchmark * nit * finish_bench * makedatacreator * relax * metrics * autosetsampler * profile * movements * OK * smoothen * fix * nitdocs * loss * envflag * comments * nit * format * visible * images * move_images * fix * rernder * rrender * rest * multgpu * fix * nit * finish * extrra * setup * experimental * as_trainable * fix * ok * format * create_torch_pbt * setup_pbt * ok * format * ok * format * docs * ok * Draft head-is-worker * Fix missing concurrency between local and remote workers * Fix tqdm to work with head-is-worker * Cleanup * Implement state_dict and load_state_dict * Reserve resources on the head node for the local worker * Update the development cluster setup * Add spot block reservation to the development yaml * ok * Draft the fault tolerance fix * Small fixes to local-remote concurrency * Cleanup + fix typo * fixes * worker_counts * some formatting and asha * fix * okme * fixactorkill * unify * Revert the cluster mounts * Cut the handler-reporter API * Fix most tests * Rm tqdm_handler.py * Re-add tune test * Automatically force-shutdown on actor errors on shutdown * Formatting * fix_tune_test * Add timeout error verification * Rename tqdm to use_tqdm * fixtests * ok * remove_redundant * deprecated * deactivated * ok_try_this * lint * nice * done * retries * fixes * kill * retry * init_transformer * init * deployit * improve_example * trans * rename * formats * format-to-py37 * time_to_test * more_changes * ok * update_args_and_script * fp16_epoch * huggingface * training stats * distributed * Apply suggestions from code review * transformer Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com> Co-authored-by: Maksim Smolin <maximsmol@gmail.com>	2020-04-17 15:17:30 -07:00
Maksim Smolin	d6f4e5b3e1	[SGD] Imagenet example (basic) (#8020 ) * Checkpoint the image-models example * Update cluster definition * Fix copyright info * Use original args * Checkpoint fixes * Add README * Add some missing features * Format * Get rid of the unused Namespace class * Address comments * Link the imagenet example in docs * Cleanup * Fix lint	2020-04-17 13:33:55 -07:00
roireshef	dbcad35022	[RLlib] Added DefaultCallbacks which replaces old callbacks dict interface (#6972 )	2020-04-16 16:06:42 -07:00
Richard Liaw	2cb3355495	[docs] Move css to right location (#8053 )	2020-04-16 13:46:50 -07:00
Richard Liaw	d5f517b2f5	[docs] Hotfix for missing css files. (#8051 )	2020-04-16 11:44:55 -07:00
Richard Liaw	4d8bf5635d	[hotfix] Lint formatting for new Tune optimizer ZOOpt (#8040 ) * formatting * removedill * lint	2020-04-16 09:24:30 -07:00
Sven Mika	d0fab84e4d	[RLlib] DDPG PyTorch version. (#7953 ) The DDPG/TD3 algorithms currently do not have a PyTorch implementation. This PR adds PyTorch support for DDPG/TD3 to RLlib. This PR: - Depends on the re-factor PR for DDPG (Functional Algorithm API). - Adds learning regression tests for the PyTorch version of DDPG and a DDPG (torch) - Updates the documentation to reflect that DDPG and TD3 now support PyTorch. * Learning Pendulum-v0 on torch version (same config as tf). Wall time a little slower (~20% than tf). * Fix GPU target model problem.	2020-04-16 10:20:01 +02:00
Servon	5c274fe631	[Tune] Add ZOOpt search algorithm (#7960 ) * add zoopt * add zoopt search algo * add zoopt * fix zoopt * add zoopt requirements * fix zoopt * remove generated guides * Apply suggestions from code review Co-authored-by: Richard Liaw <rliaw@berkeley.edu>	2020-04-15 21:13:29 -07:00
Simon Mo	7455610d5a	Serve Doc: Quickstart (#7940 )	2020-04-15 12:25:37 -07:00
Robert Nishihara	d985d7537e	Replace all instances of ray.readthedocs.io with ray.io (#7994 )	2020-04-13 16:17:05 -07:00
Richard Liaw	e97adba6ac	[autoscaler] Improve argument handling for submit (#7986 ) * docs * Apply suggestions from code review Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com> * ok Co-authored-by: Kristian Hartikainen <kristian.hartikainen@gmail.com>	2020-04-13 15:53:42 -07:00
Richard Liaw	e68d601ec7	[docs] Add link master <-> latest via sphinx version warnings (#8010 )	2020-04-13 15:21:08 -07:00
Richard Liaw	dd63178e91	[sgd] Semantic Segmentation Example (#7825 ) * better_example * test * improve some usability things * submit * fix * making a segmentation example * segmentation_example * segmentation * device * flake * Update python/ray/util/sgd/torch/training_operator.py * uti * finished_example * block * format * locationg * fix * ok * revert * segmentation * lint_and_test * address_comments	2020-04-10 20:35:45 -07:00
Sven Mika	d2b5c171cb	[RLlib] Add pytorch sigils to toc and add links to algo overview table. (#7950 ) * Add torch sigils to toc-tree for DQN/APEX. * WIP.	2020-04-09 10:40:18 -07:00

801 commits