hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-06 18:41:40 -05:00

Author	SHA1	Message	Date
Philipp Moritz	a3f8fa426b	Start integrating new GCS APIs (#1379 ) * Start integrating new GCS calls * fixes * tests * cleanup * cleanup and valgrind fix * update tests * fix valgrind * fix more valgrind * fixes * add separate tests for GCS * fix linting * update tests * cleanup * fix python linting * more fixes * fix linting * add plasma manager callback * add some documentation * fix linting * fix linting * fixes * update * fix linting * fix * add spillback count * fixes * linting * fixes * fix linting * fix * fix * fix	2018-01-31 11:01:12 -08:00
Robert Nishihara	4c6dae5517	Raise an exception in Jenkins tests after a timeout. (#1477 )	2018-01-27 20:21:27 -08:00
Robert Nishihara	3195c6aa63	Fix local scheduler crash when driver creates actor and exits. (#1474 ) * Make check failures in redis.cc more informative. * Fix bug by calling task_table_add_task. * Add test.	2018-01-26 14:29:53 -08:00
Kaahan	7aa979a024	[tune] Added Population Based Training (#1355 ) Adds a Population-Based Training (as described in https://arxiv.org/abs/1711.09846) scheduler to Ray.tune. Currently mutates hyperparameters according to either a user-defined list of possible values to mutate to (necessary if hyperparameters can only be certain values ex. sgd_batch_size), or by a factor of 0.8 or 1.2.	2018-01-25 21:38:37 -08:00
Richard Liaw	e5c4d9ea0c	[tune] Fix Trial Logging File name (#1466 )	2018-01-25 17:57:40 -08:00
Robert Nishihara	ab5d4a6010	Bring cloudpickle inside the repository. (#1445 ) * Bring cloudpickle version 0.5.2 inside the repo. * Use internal copy of cloudpickle everywhere. * Fix linting. * Import ordering. * Change __init__.py. * Set pickler in serialization context. * Don't check ray location.	2018-01-25 11:36:37 -08:00
Eric Liang	173f1d629a	[tune] Ray Tune API cleanup (#1454 ) Remove rllib dep: trainable is now a standalone abstract class that can be easily subclassed. Clean up hyperband: fix debug string and add an example. Remove YAML api / ScriptRunner: this was never really used. Move ray.init() out of run_experiments(): This provides greater flexibility and should be less confusing since there isn't an implicit init() done there. Note that this is a breaking API change for tune.	2018-01-24 16:55:17 -08:00
Richard Liaw	a7d544424c	[tune] Experiment Management API (#1328 ) * init for exposing external interface * revisions * http server * small * simplify * ui * fixes * test * nit * nit * merge * untested * nits * nit * init tests * tests * more tests * nit * fix hyperband * cleanup * nits * good stuff * cleanup * comments and need to test * nit * notebook * testing * test and expose server * server_tests * docs * periods * fix tests * committing test * fi	2018-01-24 13:45:10 -08:00
Eric Liang	1d2a28ab07	[rllib] test all combinations of {obs_space} x {action_space} (#1449 )	2018-01-24 11:03:43 -08:00
Robert Nishihara	f32c0c8ec1	Move calls to ray.worker.cleanup into tearDown part of tests for isolation. (#1433 )	2018-01-22 22:54:56 -08:00
Devin Petersohn	4aca016bff	Adding series and a way to validate our API. (#1435 ) * Adding series and a way to validate our API. * Moving partitions into protected status	2018-01-21 19:20:54 -08:00
Stephanie Wang	74718efa73	Nondeterministic reconstruction for actors (#1344 ) * Add failing unit test for nondeterministic reconstruction * Retry scheduling actor tasks if reassigned to local scheduler * Update execution edges asynchronously upon dispatch for nondeterministic reconstruction * Fix bug for updating checkpoint task execution dependencies * Update comments for deterministic reconstruction * cleanup * Add (and skip) failing test case for nondeterministic reconstruction * Suppress test output	2018-01-21 13:44:13 -08:00
eugenevinitsky	37076a9ff8	Multiagent model using concatenated observations (#1416 ) * working multi action distribution and multiagent model * currently working but the splits arent done in the right place * added shared models * added categorical support and mountain car example * now compatible with generalized advantage estimation * working multiagent code with discrete and continuous example * moved reshaper to utils * code review changes made, ppo action placeholder moved to model catalog, all multiagent code moved out of fcnet * added examples in * added PEP8 compliance * examples are mostly pep8 compliant * removed all flake errors * added examples to jenkins tests * fixed custom options bug * added lines to let docker file find multiagent tests * shortened example run length * corrected nits * fixed flake errors	2018-01-18 19:51:31 -08:00
Richard Liaw	d4592382a4	[tune][minor] Fixes (#1383 )	2018-01-11 18:14:20 -08:00
Philipp Moritz	44792530a9	fix autoscaler test (#1411 )	2018-01-10 13:18:34 -08:00
Devin Petersohn	112ef07563	Adding all DataFrame methods with NotImplementedErrors (#1403 ) * Adding all DataFrame methods with NotImplementedErrors * Moving dataframe creation into function call	2018-01-07 12:00:16 -08:00
Eric Liang	b6c42f96be	Auto-scale ray clusters based on GCS load metrics (#1348 ) This adds (experimental) auto-scaling support for Ray clusters based on GCS load metrics. The auto-scaling algorithm is as follows: Based on current (instantaneous) load information, we compute the approximate number of "used workers". This is based on the bottleneck resource, e.g. if 8/8 GPUs are used in a 8-node cluster but all the CPUs are idle, the number of used nodes is still counted as 8. This number can also be fractional. We scale that number by 1 / target_utilization_fraction and round up to determine the target cluster size (subject to the max_workers constraint). The autoscaler control loop takes care of launching new nodes until the target cluster size is met. When a node is idle for more than idle_timeout_minutes, we remove it from the cluster if that would not drop the cluster size below min_workers. Note that we'll need to update the wheel in the example yaml file after this PR is merged.	2017-12-31 14:39:57 -08:00
Devin Petersohn	a75a473d7f	Add a distributed Dataframe API to Ray (#1330 ) * Adding dataframe object and minor APIs * Adding reduce functionality * Adding some print and making reduce work on current Ray * Cleanup * Added new functionality and docs. * Adding more functionality. * New functionality with older cleanup * Complying with flake8 formatting * Added tests and addressed reviewer comments * Complying with flake8. * Adding pandas to travis and requirements doc * Fixing flake8 failures * Fixing flake8 errors from imports * Fixing import error * Fixing import errors * Addressing reviewer comments * Addressing lint error	2017-12-20 09:31:22 -08:00
Eric Liang	47b1f02d3e	[rllib] Pull out multi-gpu optimizer as a generic class (#1313 )	2017-12-17 15:59:57 -08:00
Eric Liang	f5ea44338e	EC2 cluster setup scripts and initial version of auto-scaler (#1311 )	2017-12-15 23:56:39 -08:00
Eric Liang	fbf1806b8a	[tune] Clean up result logging: move out of /tmp, add timestamp (#1297 )	2017-12-15 14:19:08 -08:00
Robert Nishihara	f75b51d178	Register Common.error with local scheduler extension module. (#1316 ) * Register Common.error with local scheduler extension module. * Add test.	2017-12-13 11:55:54 -08:00
Peter Schafhalter	20d6b74aa6	[rllib] Added evaluation script to RLLib (#1295 )	2017-12-11 11:59:44 -08:00
Robert Nishihara	96463c680c	Allow actor methods to return multiple object IDs. (#1296 ) * Allow actor methods to return multiple object IDs. * Add test. * Fixes * Remove outdated comment. * Add comment and assert	2017-12-09 10:37:57 -08:00
Philipp Moritz	26125e1547	Fixing the jenkins tests (#1299 ) * trying to fix jenkins tests * comment out more tests * remove pytorch stuff * use non-monotonic clock (monotonic not supported on python 2.7) * whitespace	2017-12-07 17:03:58 -08:00
Eric Liang	2d543b6e19	[rllib] Refactor DQN to use an Evaluator abstraction (#1276 ) This introduces rllib.Evaluator and rllib.Optimizer classes. Optimizers encapsulate a particular distributed optimization strategy for RL. Evaluators encapsulate the model graph, and once implemented, any Optimizer may be "plugged in" to any algorithm that implements the Evaluator interface.	2017-12-06 17:51:57 -08:00
Robert Nishihara	c21e189371	Allow scheduling with arbitrary user-defined resource labels. (#1236 ) * Enable scheduling with custom resource labels. * Fix. * Minor fixes and ref counting fix. * Linting * Use .data() instead of .c_str(). * Fix linting. * Fix ResourcesTest.testGPUIDs test by waiting for workers to start up. * Sleep in test so that all tasks are submitted before any completes.	2017-12-01 11:41:40 -08:00
Eric Liang	37831ae0c3	Add a nicer warning message when you pass the wrong thing to ray.wait() (#1239 ) * add warnings * fix python mode * Small changes and add tests. * Fix test failure.	2017-11-27 22:57:33 -08:00
Robert Nishihara	2865128df0	Remove counter from run_function_on_all_workers. Also remove utilitie… (#1260 ) * Remove counter from run_function_on_all_workers. Also remove utilities for copying directories across machines. * Fix linting.	2017-11-26 18:29:10 -08:00
Robert Nishihara	0b4961b161	Provide flag for setting redis maxclients. (#1257 ) * Add flag for attempting to increase ulimit -n and the redis maxclients. * Don't bother trying to set ulimit -n. * Fix linting. * Add basic test.	2017-11-26 18:25:55 -08:00
Robert Nishihara	7af5292646	Give error if a worker has a version mismatch for Python Ray, or clou… (#1245 ) * Give error if a worker has a version mismatch for Python Ray, or cloudpickle. * Check version when attaching driver to cluster. * Only do check if the version info is present. * Bug fix. * Fix typo.	2017-11-23 23:31:03 -08:00
Robert Nishihara	477a40f76d	Prohibit returning actor handles and also update actor documentation. (#1246 ) * Prohibit returning actor handles and also update actor documentation. * Clarify documentation.	2017-11-23 09:37:24 -08:00
shane	9af8dc568a	testing with --rm and docker run (#1240 ) Add --rm to docker run for Jenkins tests.	2017-11-22 10:20:04 -08:00
Eric Liang	316f9e2bb7	[tune] Support user-defined trainable functions / classes / envs with a shared object registry (#1226 )	2017-11-20 17:52:43 -08:00
Eric Liang	9233e496cc	Raise exception when getting the task results of workers that died (#1224 ) * wip * with test * add timeout * also add test for f * remove on cleanup * update * wip * fix tests * mark actor removed in redis * clang-format * fix bug when no-inprogress tasks * try to set task status done * Add comment.	2017-11-20 15:18:39 -08:00
Eric Liang	28f1e12940	[rllib] [build-fix] ES iterations get unexpectedly long (#1235 ) * fix very long es * Revert prior change. * Shorten ES jenkins tests.	2017-11-20 14:42:42 -08:00
Robert Nishihara	0eae917766	[rllib] Clean up evolution strategies example. (#1225 ) * Remove ES observation statistics. * Consolidate policy classes. * Remove random stream. * Move rollout function out of policy. * Consolidate policy initialization. * Replace act implementation with sess.run. * Remove tf_utils. * Remove variable scope. * Remove unused imports. * Use regular TF session. * Use MeanStdFilter. * Minor. * Clarify naming. * Update documentation. * eps -> episodes * Report noiseless evaluation runs. * Clean up naming. * Update documentation. * Fix some bugs. * Make it run on atari. * Don't add action noise during evaluation runs. * Add ES to checkpoint/restore test. * Small cleanups and remove redundant calls to get_weights. * Remove outdated comment.	2017-11-16 21:58:30 -08:00
Richard Liaw	eadb998643	[tune] Make HyperBand Usable (#1215 )	2017-11-16 10:31:42 -08:00
Richard Liaw	71f8cd2403	[tune] Fixing up Hyperband (#1207 ) * Fixing up Hyperband * nit * cleanup * Timing test Added * added_exception_back * fixup_tests * reverse placement * fixes_and_tests * fix * fix * fixlint * cleanup_timing * lint * Update hyperband.py	2017-11-12 12:05:32 -08:00
Eric Liang	7c38f964b7	[tune] Add command line support for choosing early stopping schedulers (#1209 ) * command line support * add checkpoint freq * fix other flags * fix * docs * doc	2017-11-12 12:05:18 -08:00
Richard Liaw	afdc87323f	[rllib] PyTorch Models for A3C (#1187 ) * fixing policy * Compute Action is singular, fixed weird issue with arrays * remove vestige * extraneous ipdb * Can Drop in Pytorch Model * lint * introducing models * fix base policy * Missed this from last time * lint * removedolds * getting vision working * LINT * trying to fix test dependencies * requiremnets * try * tryconda * yes * shutup * flake_passes * changes * removing weight initializer for lstm for now * unused * adam * clip * zero * properscaling * weight * try * fix up pytorch visionnet * bias correction * fix model * same visionnet * matching_bad_things * test * try locking * fixing_linear * naming * lint * FORJENKINS * clouds * lint * Lint + removed dependencies * removed dependencies * format	2017-11-12 00:20:33 -08:00
Daniel Suo	4f0da6f81c	Add basic functionality for Cython functions and actors (#1193 ) * Add basic functionality for Cython functions and actors * Fix up per @pcmoritz comments * Fixes per @richardliaw comments * Fixes per @robertnishihara comments * Forgot double quotes when updating masked_log * Remove import typing for Python 2 compatibility	2017-11-09 17:49:06 -08:00
Richard Liaw	6197b260b8	Fix Jenkins issue introduced by Variant Generator (#1194 ) * try fix * shorten * added a flag * finish * Fix linting.	2017-11-09 00:56:20 -08:00
Eric Liang	52888e4c6f	[tune] Improve the tune Python API and variant generation (#1154 ) * new variant gen * wip * Sat Oct 21 18:21:34 PDT 2017 * update * comment * fix * update * update readme * fix * Update README.rst * Update README.rst * fix repeat * update * note on restore	2017-11-06 23:41:17 -08:00
Richard Liaw	6222ec3bd7	[tune] hyperband (#1156 ) * trial scheduler interface * remove * wip median stopping * remove * median stopping rule * update * docs * update * Revrt * update * hyperband untested * small changes before moving on * added endpoints * good changes * init tests * smore tests * unfinished tests * testing * testing code * morbugs * fixes * end * tests and typo * nit * try this * tests * testing * lint * lint * lint * comments and docs * almost screwed up * lint	2017-11-06 22:30:25 -08:00
Eric Liang	d06beacd84	[tune] Implement median stopping rule (#1170 ) * trial scheduler interface * remove * wip median stopping * remove * median stopping rule * update * docs * update * Revrt * update * comments * fix tesT	2017-11-03 11:25:02 -07:00
Robert Nishihara	3317d38278	Replace hostnames with numerical IP addresses in redis address. (#1177 ) * Replace hostnames with numerical IP addresses in redis address. * Also do conversion for node_ip_address. Add test. * Simplifications.	2017-11-01 17:13:22 -07:00
Robert Nishihara	6852e8839e	Expose custom serializers through the API. (#1147 ) * Expose custom serializers through the API. * minor renaming * Add test. * Remove comment. * Clean up assertions.	2017-10-29 00:08:55 -07:00
Richard Liaw	797f4fcbf3	Fixing Lint after flake upgrade (#1162 ) * Fixing Lint after flake upgrade * more lint fixes	2017-10-26 21:02:07 -05:00
Eric Liang	cd9dc398ff	[rllib] Support discrete observation spaces such as FrozenLake-v0 (#1140 ) * add * remove transform_shape * fix test * fix	2017-10-23 23:16:52 -07:00
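
Commit 7aa979a024 above describes the Population Based Training mutation step: a hyperparameter is either resampled from a user-defined list of allowed values or multiplied by a factor of 0.8 or 1.2. The following is a minimal sketch of that rule only, not the code from the commit; the function name `mutate_config`, the `allowed_values` argument, and the `rng` parameter are hypothetical names chosen for illustration.

```python
import random


def mutate_config(config, allowed_values, rng=random):
    """Return a mutated copy of `config` (hypothetical helper, not Ray's API).

    config: dict mapping hyperparameter name -> current numeric value.
    allowed_values: dict mapping name -> list of permitted values, for
        hyperparameters that may only take certain values.
    rng: object with a `choice` method (the `random` module by default).
    """
    mutated = {}
    for name, value in config.items():
        if name in allowed_values:
            # Restricted hyperparameter: pick one of the permitted values.
            mutated[name] = rng.choice(allowed_values[name])
        else:
            # Unrestricted hyperparameter: scale by a factor of 0.8 or 1.2.
            mutated[name] = value * rng.choice([0.8, 1.2])
    return mutated


config = {"lr": 1e-3, "sgd_batch_size": 128}
new_config = mutate_config(
    config,
    {"sgd_batch_size": [64, 128, 256]},
    rng=random.Random(0),
)
```

Here `sgd_batch_size` always stays within its permitted list, while `lr` becomes either 0.8 or 1.2 times its previous value.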

318 commits
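
Commit b6c42f96be above describes how the autoscaler picks a target cluster size: take the utilization of the bottleneck resource, convert it to a (possibly fractional) number of used nodes, scale by 1 / target_utilization_fraction, round up, and cap at max_workers. The following is a minimal sketch of that arithmetic under stated assumptions, not the autoscaler's actual code; the function name `target_cluster_size` and the dictionary-based inputs are hypothetical.

```python
import math


def target_cluster_size(used, total, num_nodes,
                        target_utilization_fraction, max_workers):
    """Sketch of the target-size computation (hypothetical helper).

    used, total: dicts mapping resource name -> amount in use / available.
    num_nodes: current number of nodes in the cluster.
    """
    # The bottleneck resource is the one with the highest used fraction.
    fractions = [used.get(name, 0) / amount
                 for name, amount in total.items() if amount > 0]
    bottleneck = max(fractions, default=0.0)

    # Approximate number of used nodes; this may be fractional.
    used_nodes = bottleneck * num_nodes

    # Scale by 1 / target_utilization_fraction and round up.
    ideal = math.ceil(used_nodes / target_utilization_fraction)

    # The result is subject to the max_workers constraint.
    return min(ideal, max_workers)


# 8/8 GPUs busy on an 8-node cluster with idle CPUs still counts as 8 used nodes.
size = target_cluster_size(
    used={"CPU": 0, "GPU": 8},
    total={"CPU": 64, "GPU": 8},
    num_nodes=8,
    target_utilization_fraction=0.5,
    max_workers=20,
)
```

The sketch omits the control loop that launches nodes and the idle_timeout_minutes / min_workers logic for removing idle nodes, both of which the commit message describes separately.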