Cleanup: `TFPolicyGraph` now automatically adds loss input entries for `state_in_*`, so that graph sub-classes don't need to worry about it.
Multi-GPU support:
* Allow setting up model tower replicas with existing state input tensors.
* Truncate the per-device minibatch slices so that they are always a multiple of `max_seq_len`.
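To make the truncation rule concrete, here is a minimal sketch (not the actual multi-GPU optimizer code; the helper name is made up) that trims a per-device slice down to a whole number of `max_seq_len`-sized sequences:
```
# Minimal sketch only: trim a per-device minibatch slice so its length is an
# exact multiple of max_seq_len, dropping the trailing partial sequence.
import numpy as np

def truncate_to_multiple(batch, max_seq_len):
    usable = (len(batch) // max_seq_len) * max_seq_len
    return batch[:usable]

slice_ = np.arange(37)  # e.g. a 37-step per-device slice
print(len(truncate_to_multiple(slice_, max_seq_len=8)))  # -> 32
```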
## What do these changes do?
**Vectorized envs**: Users can either implement `VectorEnv`, or alternatively set `num_envs=N` to auto-vectorize gym envs (this vectorizes just the action computation part).
```
# CartPole-v0 on single core with 64x64 MLP:
# vector_width=1:
Actions per second 2720.1284458322966
# vector_width=8:
Actions per second 13773.035334888269
# vector_width=64:
Actions per second 37903.20472563333
```
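To make the vectorization concrete, here is a minimal hand-rolled sketch of the idea (illustration only; the wrapper class and the `vector_reset`/`vector_step` method names are assumptions for this sketch, not necessarily RLlib's exact `VectorEnv` API):
```
# Illustration only: a hand-rolled vectorized wrapper over N copies of a gym
# env. Class and method names here are assumptions, not RLlib's API.
import gym

class SimpleVectorEnv:
    def __init__(self, env_name, num_envs):
        self.envs = [gym.make(env_name) for _ in range(num_envs)]

    def vector_reset(self):
        return [env.reset() for env in self.envs]

    def vector_step(self, actions):
        # Step each sub-env with its own action; the policy can then compute
        # the next batch of actions in a single forward pass over all
        # observations, which is where the speedup above comes from.
        results = [env.step(a) for env, a in zip(self.envs, actions)]
        obs, rewards, dones, infos = map(list, zip(*results))
        return obs, rewards, dones, infos

envs = SimpleVectorEnv("CartPole-v0", num_envs=8)
obs_batch = envs.vector_reset()
actions = [env.action_space.sample() for env in envs.envs]
obs_batch, rewards, dones, infos = envs.vector_step(actions)
```
Note that this batches only the action computation; the sub-envs themselves still step one after another on a single core, matching the single-core numbers above.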
**Async envs**: The more general form of `VectorEnv` is `AsyncVectorEnv`, which allows agents to execute out of lockstep. We use this as an adapter to support `ServingEnv`. Since we can convert any other form of env to `AsyncVectorEnv`, `utils.sampler` has been rewritten to run against this interface.
**Policy serving**: This provides an env that is not stepped. Instead, the env executes in its own thread, querying the policy for actions via `self.get_action(obs)` and reporting results via `self.log_returns(rewards)`. We also support logging of off-policy actions via `self.log_action(obs, action)`. This is a more convenient API for some use cases, and it also provides parallelizable support for policy serving (for example, if you start an HTTP server in the env) and for ingesting offline logs (if the env reads from serving logs).
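As a self-contained conceptual toy of this serving pattern (illustration only, not the real `ServingEnv` class; the thread/queue plumbing and the `ToyServingLoop` name are invented here), the environment loop runs in its own thread, asks the policy for an action, and logs the resulting reward:
```
# Conceptual toy of the serving pattern described above.
import queue
import random
import threading

class ToyServingLoop(threading.Thread):
    def __init__(self, get_action, log_returns):
        super().__init__(daemon=True)
        self.get_action = get_action    # plays the role of self.get_action(obs)
        self.log_returns = log_returns  # plays the role of self.log_returns(reward)

    def run(self):
        obs = 0.0
        for _ in range(100):
            action = self.get_action(obs)              # query the policy
            obs += action + random.uniform(-0.1, 0.1)  # fake external dynamics
            self.log_returns(-abs(obs))                # report a reward

rewards = queue.Queue()
loop = ToyServingLoop(get_action=lambda obs: -1.0 if obs > 0 else 1.0,
                      log_returns=rewards.put)
loop.start()
loop.join()
print("logged", rewards.qsize(), "rewards")
```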
Any of these types of envs can be passed to RLlib agents. RLlib handles conversions internally in `CommonPolicyEvaluator`, for example:
```
gym.Env => rllib.VectorEnv => rllib.AsyncVectorEnv
rllib.ServingEnv => rllib.AsyncVectorEnv
```
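As a rough illustration of what running against an async interface looks like (the `poll`/`send_actions` names and this toy class are assumptions for the sketch, not necessarily the exact `AsyncVectorEnv` API): the sampler repeatedly polls for whichever observations are ready and sends back actions keyed by env id, so sub-envs are free to be at different points in their episodes.
```
# Illustration only: a toy async-style env. poll() returns observations for
# the sub-envs currently waiting on an action, keyed by env id, and
# send_actions() steps just those sub-envs.
import gym

class ToyAsyncEnv:
    def __init__(self, num_envs):
        self.envs = {i: gym.make("CartPole-v0") for i in range(num_envs)}
        self.ready = {i: env.reset() for i, env in self.envs.items()}

    def poll(self):
        obs, self.ready = self.ready, {}
        return obs

    def send_actions(self, actions):
        for i, action in actions.items():
            obs, reward, done, _ = self.envs[i].step(action)
            self.ready[i] = self.envs[i].reset() if done else obs

env = ToyAsyncEnv(num_envs=4)
for _ in range(10):
    obs = env.poll()  # whatever is ready right now
    env.send_actions({i: env.envs[i].action_space.sample() for i in obs})
```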
* Use F.softmax instead of a pointless network layer
Stateless functions should not be network layers.
* Use correct pytorch functions
* Rename argument to `out_size`
Matches `in_size` and makes more sense.
* Fix shapes of tensors
Advantages and rewards are both scalars, so a list of them should be 1D.
* Fmt
* replace deprecated function
* rm unnecessary Variable wrapper
* rm all use of torch Variables
Torch does this for us now.
* Ensure that values are a flat list
* Fix shape error in conv nets
* fmt
* Fix shape errors
Reshaping the action before stepping in the env fixes a few errors.
* Add TODO
* Use correct filter size
Works when `self.config['model']['channel_major'] = True`.
* Add missing channel major
* Revert reshape of action
This should be handled by the agent or at least in a cleaner way that doesn't
break existing envs.
* Squeeze action
* Squeeze actions along first dimension
This handles cases such as CartPole, where actions are scalars, while leaving
alone cases where actions are arrays (some robotics tasks).
* try adding pytorch tests
* typo
* fixup docker messages
* Fix A3C for some envs
Pendulum doesn't work since it's an edge case (expects singleton arrays, which
`.squeeze()` collapses to scalars).
* fmt
* nit flake
* small lint
* fix
* add ex
* patch up pbt
* add pbt
* clean up test
* review
* try out a ppo example
* some tweaks to ppo example
* add postprocess hook
* clean up custom explore fn
* improve tune doc
* concepts
* update humanoid
* fix example
* show error file
* working multi action distribution and multiagent model
* currently working but the splits arent done in the right place
* added shared models
* added categorical support and mountain car example
* now compatible with generalized advantage estimation
* working multiagent code with discrete and continuous example
* moved reshaper to utils
* code review changes: PPO action placeholder moved to the model catalog, all multiagent code moved out of fcnet
* added examples in
* added PEP8 compliance
* examples are mostly pep8 compliant
* removed all flake errors
* added examples to jenkins tests
* fixed custom options bug
* added lines to let docker file find multiagent tests
* shortened example run length
* corrected nits
* fixed flake errors
* trying to fix jenkins tests
* comment out more tests
* remove pytorch stuff
* use non-monotonic clock (a monotonic clock is not available on Python 2.7)
* whitespace
This introduces the `rllib.Evaluator` and `rllib.Optimizer` classes. Optimizers encapsulate a particular distributed optimization strategy for RL; Evaluators encapsulate the model graph. Once an algorithm implements the Evaluator interface, any Optimizer can be "plugged in" to it.
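As a rough sketch of that plug-in pattern (the method names below are assumptions for illustration, not necessarily the exact Evaluator interface):
```
# Sketch of the Evaluator/Optimizer plug-in pattern. Method names
# (sample, compute_gradients, apply_gradients) are assumptions.
from abc import ABC, abstractmethod

class Evaluator(ABC):
    @abstractmethod
    def sample(self):
        """Collect a batch of experience with the current policy."""

    @abstractmethod
    def compute_gradients(self, batch):
        """Return gradients of the loss on the given batch."""

    @abstractmethod
    def apply_gradients(self, grads):
        """Apply gradients to the local model."""

class LocalSyncOptimizer:
    """One possible strategy: sample, then apply gradients on one evaluator.
    Any algorithm whose evaluator implements the interface above can use it."""

    def __init__(self, evaluator):
        self.evaluator = evaluator

    def step(self):
        batch = self.evaluator.sample()
        grads = self.evaluator.compute_gradients(batch)
        self.evaluator.apply_gradients(grads)
```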
* fix yaml bug
* add ext agent
* gpus
* update
* tuning
* docs
* lint
* update
* fix
* Update tune_mnist_ray.py
* remove script trial
* fix
* reorder
* fix ex
* py2 support
* upd
* comments
* comments
* cleanup readme
* fix trial
* annotate
* Update rllib.rst
* make information available for GAE
* buggy version of GAE estimator (a reference sketch of GAE appears at the end of these notes)
* fix
* add more logging and reweight losses
* fix logging
* fix loss
* adapt advantage calculation
* update gae
* standardize returns
* don't normalize the TD(lambda) return
* fix
* don't standardize advantages
* do standardization earlier
* different standardization
* initializer
* drop into the debugger
* fix tensorflow broadcasting bug
* vf clipping
* don't standardize the TD(lambda) return
* different standardization
* use huber loss for value function
* refactor -- first half
* it runs
* fix
* update
* documentation
* linting and tests
* fix linting
* naming
* fix
* linting
* fix
* remove prefix madness
* fixes
* fix
* add value function example
* fix linting
* remove newline
* Test example applications in Jenkins.
* Fix default upload_dir argument for Algorithm class.
* Fix evolution strategies.
* Comment out policy gradient example which doesn't seem to work.
* Set --env-name for evolution strategies.
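Several commits above rework the GAE estimator and the advantage standardization. For reference, here is a minimal sketch of the standard GAE computation (textbook formulation for illustration, not code copied from this change; episode boundaries / done masking are omitted for brevity):
```
# Reference sketch of Generalized Advantage Estimation (GAE) plus advantage
# standardization; done masking omitted for brevity.
import numpy as np

def gae(rewards, values, last_value, gamma=0.99, lam=0.95):
    values_ext = np.append(values, last_value)           # bootstrap with V(s_T)
    deltas = rewards + gamma * values_ext[1:] - values_ext[:-1]
    advantages = np.zeros_like(deltas)
    running = 0.0
    for t in reversed(range(len(deltas))):
        running = deltas[t] + gamma * lam * running      # A_t = d_t + g*l*A_{t+1}
        advantages[t] = running
    td_lambda_returns = advantages + values              # value-function targets
    # Standardize advantages only; the value targets are left unnormalized.
    advantages = (advantages - advantages.mean()) / (advantages.std() + 1e-8)
    return advantages, td_lambda_returns

adv, vtargets = gae(rewards=np.ones(5), values=np.zeros(5), last_value=0.0)
```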