hiro/ray - Forgejo: Beyond coding. We Forge.

mirror of https://github.com/vale981/ray synced 2025-03-06 10:31:39 -05:00

Author	SHA1	Message	Date
Eric Liang	7ab890f4a1	[tune] [rllib] Automatically determine RLlib resources and add queueing mechanism for autoscaling (#1848)	2018-04-16 16:58:15 -07:00
Robert Nishihara	7792032ee3	Fix UI issue for non-json-serializable task arguments. (#1892) * Fix UI issue for non-json-serializable task arguments. * Simplify approach.	2018-04-15 13:54:42 -07:00
alvkao58	15a668dd12	[RLLib] DDPG (#1685)	2018-04-11 15:08:39 -07:00
Philipp Moritz	74162d1492	Lint Python files with Yapf (#1872)	2018-04-11 10:11:35 -07:00
Robert Nishihara	256389dc59	Use new task spec for computing IDs in raylet code path. (#1830) * Use new task spec for computing IDs in raylet code path. * Fix linting. * Fixes * Fix test.	2018-04-08 13:31:55 -07:00
Stephanie Wang	bf194db4bc	[xray] Basic actor support (#1835)	2018-04-06 00:17:14 -07:00
Robert Nishihara	5bde5e75e7	Implement unsafe method for flushing entire object table and task table. (#1824) * Implement unsafe method for flushing entire object table and task table. * Add test. * Fix test.	2018-04-04 18:29:24 -07:00
Richard Liaw	888e70f1be	[tune] HyperOpt Support (v2) (#1763)	2018-04-04 11:08:26 -07:00
Robert Nishihara	fbfbb1c079	[xray] Integrate worker.py with raylet. (#1810) * Integrate worker with raylet. * Begin allowing worker to attach to cluster. * Fix linting and documentation. * Fix linting. * Comment tests back in. * Fix type of worker command. * Remove xray python files and tests. * Fix from rebase. * Add test. * Copy over raylet executable. * Small cleanup.	2018-04-03 02:38:56 -07:00
Robert Nishihara	0fc989c6c1	Don't use 127.0.0.1 for local ip address. (#1596) * Don't use 127.0.0.1 for ip address. * Update test	2018-04-02 00:34:20 -07:00
Robert Nishihara	0c835a379f	Fix resource bookkeeping for blocked actor methods. (#1766)	2018-03-21 20:48:04 -07:00
Robert Nishihara	c6ad71fc9d	Fix bug when connecting another driver in local case. (#1760) * Allow connecting another driver when using ip address 127.0.0.1. * Add test.	2018-03-21 11:49:53 -07:00
Robert Nishihara	4bccabd910	Redirect output of all processes by default. (#1752) * Redirect output of all processes by default. * Add separate flag for redirecting worker output. * Fix tests.	2018-03-20 18:14:54 -07:00
Robert Nishihara	2922e1c388	Add API for getting total cluster resources. (#1736) * Add API for getting total cluster resources. * Add test.	2018-03-20 15:57:00 -07:00
Robert Nishihara	4658d0a180	Print error when actor takes too long to start, and refactor error me… (#1747) * Print error when actor takes too long to start, and refactor error message pushing. * Print warning every ten seconds. * Fix linting and tests. * Fix tests.	2018-03-19 20:24:35 -07:00
Robert Nishihara	73bb149c8a	Remove unnecessary file. (#1742)	2018-03-19 19:36:18 -07:00
Robert Nishihara	d78de0d41f	Provide experimental API for changing number of return values and res… (#1735) * Provide experimental API for changing number of return values and resource requirements at task submission time. * Remove code duplication and add tests.	2018-03-19 15:32:23 -07:00
Philipp Moritz	7b493aa4a1	Register credis with redis (#1730)	2018-03-18 14:02:19 -07:00
Christian Barra	070e27ea7a	Add external module as a node scaler. (#1703) * WIP: add external module as a node scaler. * Fix style. * Add tests, fix style issues. * Fix typos. * Fix test error. * Fix node provider path. * Add function to spli pkg from class. * Add doc. * Correct documentation. * Debugging.... * Debugging.... * Add __init__.py to tests. * add more output for debugging * Add more test, fix error with import. * Add a small detail to the documentation. * Update autoscaler.py	2018-03-17 16:59:13 -07:00
Richard Liaw	9b361115c3	[tune] Added Async HyperBand example (#1709)	2018-03-16 13:25:29 -07:00
Robert Nishihara	96913be939	Treat actor creation like a regular task. (#1668) * Treat actor creation like a regular task. * Small cleanups. * Change semantics of actor resource handling. * Bug fix. * Minor linting * Bug fix * Fix jenkins test. * Fix actor tests * Some cleanups * Bug fix * Fix bug. * Remove cached actor tasks when a driver is removed. * Add more info to taskspec in global state API. * Fix cyclic import bug in tune. * Fix * Fix linting. * Fix linting. * Don't schedule any tasks (especially actor creaiton tasks) on local schedulers with 0 CPUs. * Bug fix. * Add test for 0 CPU case * Fix linting * Address comments. * Fix typos and add comment. * Add assertion and fix test.	2018-03-16 11:18:07 -07:00
Philipp Moritz	a9acfab3a6	Start chain replicated GCS with Ray (#1538)	2018-03-07 10:18:58 -08:00
Richard Liaw	162d063f0d	[autoscaler/tune] Optional YAML Fields + Fix Pretty Printing for Tune (#1541)	2018-03-04 23:35:58 -08:00
Richard Liaw	78716094b5	[tune] Async Hyperband (#1595)	2018-03-04 14:05:56 -08:00
Eric Liang	ecb811c26e	[rllib] Ape-X implementation and DQN refactor to handle replay in policy optimizer (#1604) * minimal apex checkin * cleanup dqn options * actor utils * Sun Feb 25 17:39:54 PST 2018 * update * compression refactor * fix * add test * fix models * Sun Feb 25 21:46:27 PST 2018 * Wed Feb 28 10:26:34 PST 2018 * Wed Feb 28 10:28:09 PST 2018 * Wed Feb 28 10:42:59 PST 2018 * refactor * Wed Feb 28 11:17:19 PST 2018 * Wed Feb 28 11:42:08 PST 2018 * Wed Feb 28 11:42:13 PST 2018 * Wed Feb 28 11:59:02 PST 2018 * Wed Feb 28 11:59:58 PST 2018 * Wed Feb 28 12:00:08 PST 2018 * Wed Feb 28 12:02:19 PST 2018 * Wed Feb 28 13:44:31 PST 2018 * Wed Feb 28 17:01:20 PST 2018 * Sat Mar 3 14:55:59 PST 2018 * make optimizer construction explicit * Sat Mar 3 18:23:08 PST 2018 * Sat Mar 3 18:24:28 PST 2018 * Sat Mar 3 18:49:28 PST 2018 * Sat Mar 3 18:50:42 PST 2018 * Sat Mar 3 18:56:10 PST 2018	2018-03-04 12:25:25 -08:00
Eric Liang	80d7def9dc	[autoscaler] [tune] More doc fixes (#1560) * Fri Feb 16 13:53:50 PST 2018 * Sat Feb 17 15:32:08 PST 2018 * Sat Feb 17 15:44:59 PST 2018 * fix * Sun Feb 18 14:46:24 PST 2018 * Sun Feb 18 14:46:37 PST 2018 * Sun Feb 18 14:55:52 PST 2018 * Sun Feb 18 15:14:32 PST 2018 * Wed Feb 21 17:34:17 PST 2018 * Sun Feb 25 17:51:17 PST 2018 * Sun Feb 25 22:18:40 PST 2018 * Wed Feb 28 13:19:05 PST 2018 * Wed Feb 28 13:22:13 PST 2018 * Wed Feb 28 13:33:29 PST 2018 * Wed Feb 28 13:35:33 PST 2018 * add ex * Fri Mar 2 12:50:17 PST 2018 * Fri Mar 2 12:54:31 PST 2018	2018-03-03 13:01:49 -08:00
Richard Liaw	c2ad800cbf	[rllib] Registry fix for DQN Replay Evaluators (#1593)	2018-02-25 22:30:11 -08:00
Robert Nishihara	330159d8bd	Allow setting redis shard ports through ray start (also object store memory). (#1581) * Allow passing in --object-store-memory to ray start. * Allow setting ports for the redis shards. * Reorder arguments and infer number of shards from ports. * Move code block into only the head node case. * Add test.	2018-02-22 11:05:37 -08:00
Richard Liaw	1cd2703cac	[autoscaler] Docker Support (#1505)	2018-02-20 00:24:01 -08:00
Alexey Tumanov	844a6afcdd	Implement simple random spillback policy. (#1493) * spillback policy implementation: global + local scheduler * modernize global scheduler policy state; factor out random number engine and generator * Minimal version. * Fix test. * Make load balancing test less strenuous.	2018-02-13 00:09:35 -08:00
William Paul	f2b6a7b58d	Polished TensorFlowVariables code and documentation (#566)	2018-02-12 15:38:58 -08:00
alvkao58	81a4be8f65	[rllib] Added vanilla policy gradient (#1497)	2018-02-10 13:54:51 -08:00
Stephanie Wang	ff8e7f8259	Actor checkpointing for distributed actor handles (#1498) * Expose calls to get and set the actor frontier * Remove fields used for old checkpointing prototype, change actor_checkpoint_failed -> succeeded * Prototype for actor checkpointing * Filter out duplicate tasks on the local scheduler * Clean up some of the Python checkpointing code * More cleanups * Documentation * cleanup and fix unit test * Allow remote checkpoint calls through actor handle * Check whether object is local before reconstructing * Enable checkpointing for distributed actor handles, refactor tests * Fix local scheduler tests * lint * Address comments * lint * Skip tests that fail on new GCS * style * Don't put same object twice when setting the actor frontier * Address Philipp's comments, cleaner fbs naming	2018-02-07 11:19:32 -08:00
Eric Liang	b948405532	[tune] clean up population based training prototype (#1478) * patch up pbt * Sat Jan 27 01:00:03 PST 2018 * Sat Jan 27 01:04:14 PST 2018 * Sat Jan 27 01:04:21 PST 2018 * Sat Jan 27 01:15:15 PST 2018 * Sat Jan 27 01:15:42 PST 2018 * Sat Jan 27 01:16:14 PST 2018 * Sat Jan 27 01:38:42 PST 2018 * Sat Jan 27 01:39:21 PST 2018 * add pbt * Sat Jan 27 01:41:19 PST 2018 * Sat Jan 27 01:44:21 PST 2018 * Sat Jan 27 01:45:46 PST 2018 * Sat Jan 27 16:54:42 PST 2018 * Sat Jan 27 16:57:53 PST 2018 * clean up test * Sat Jan 27 18:01:15 PST 2018 * Sat Jan 27 18:02:54 PST 2018 * Sat Jan 27 18:11:18 PST 2018 * Sat Jan 27 18:11:55 PST 2018 * Sat Jan 27 18:14:09 PST 2018 * review * try out a ppo example * some tweaks to ppo example * add postprocess hook * Sun Jan 28 15:00:40 PST 2018 * clean up custom explore fn * Sun Jan 28 15:10:21 PST 2018 * Sun Jan 28 15:14:53 PST 2018 * Sun Jan 28 15:17:04 PST 2018 * Sun Jan 28 15:33:13 PST 2018 * Sun Jan 28 15:56:40 PST 2018 * Sun Jan 28 15:57:36 PST 2018 * Sun Jan 28 16:00:35 PST 2018 * Sun Jan 28 16:02:58 PST 2018 * Sun Jan 28 16:29:50 PST 2018 * Sun Jan 28 16:30:36 PST 2018 * Sun Jan 28 16:31:44 PST 2018 * improve tune doc * concepts * update humanoid * Fri Feb 2 18:03:33 PST 2018 * fix example * show error file	2018-02-02 23:03:12 -08:00
Robert Nishihara	ed77a4c415	Make ray.get_gpu_ids() respect existing CUDA_VISIBLE_DEVICES. (#1499) * Make ray.get_gpu_ids() respect existing CUDA_VISIBLE_DEVICES. * Comment out failing GPUID check. * Add import. * Fix test. * Remove test. * Factor out environment variable setting/getting into utils.	2018-02-01 21:29:14 -08:00
Philipp Moritz	a3f8fa426b	Start integrating new GCS APIs (#1379) * Start integrating new GCS calls * fixes * tests * cleanup * cleanup and valgrind fix * update tests * fix valgrind * fix more valgrind * fixes * add separate tests for GCS * fix linting * update tests * cleanup * fix python linting * more fixes * fix linting * add plasma manager callback * add some documentation * fix linting * fix linting * fixes * update * fix linting * fix * add spillback count * fixes * linting * fixes * fix linting * fix * fix * fix	2018-01-31 11:01:12 -08:00
Robert Nishihara	4c6dae5517	Raise an exception in Jenkins tests after a timeout. (#1477)	2018-01-27 20:21:27 -08:00
Robert Nishihara	3195c6aa63	Fix local scheduler crash when driver creates actor and exits. (#1474) * Make check failures in redis.cc more informative. * Fix bug by calling task_table_add_task. * Add test.	2018-01-26 14:29:53 -08:00
Kaahan	7aa979a024	[tune] Added Population Based Training (#1355) Adds a Population-Based Training (as described in https://arxiv.org/abs/1711.09846) scheduler to Ray.tune. Currently mutates hyperparameters according to either a user-defined list of possible values to mutate to (necessary if hyperparameters can only be certain values ex. sgd_batch_size), or by a factor of 0.8 or 1.2.	2018-01-25 21:38:37 -08:00
Richard Liaw	e5c4d9ea0c	[tune] Fix Trial Logging File name (#1466)	2018-01-25 17:57:40 -08:00
Robert Nishihara	ab5d4a6010	Bring cloudpickle inside the repository. (#1445) * Bring cloudpickle version 0.5.2 inside the repo. * Use internal copy of cloudpickle everywhere. * Fix linting. * Import ordering. * Change __init__.py. * Set pickler in serialization context. * Don't check ray location.	2018-01-25 11:36:37 -08:00
Eric Liang	173f1d629a	[tune] Ray Tune API cleanup (#1454) Remove rllib dep: trainable is now a standalone abstract class that can be easily subclassed. Clean up hyperband: fix debug string and add an example. Remove YAML api / ScriptRunner: this was never really used. Move ray.init() out of run_experiments(): This provides greater flexibility and should be less confusing since there isn't an implicit init() done there. Note that this is a breaking API change for tune.	2018-01-24 16:55:17 -08:00
Richard Liaw	a7d544424c	[tune] Experiment Management API (#1328) * init for exposing external interface * revisions * http server * small * simplify * ui * fixes * test * nit * nit * merge * untested * nits * nit * init tests * tests * more tests * nit * fix hyperband * cleanup * nits * good stuff * cleanup * comments and need to test * nit * notebook * testing * test and expose server * server_tests * docs * periods * fix tests * committing test * fi	2018-01-24 13:45:10 -08:00
Eric Liang	1d2a28ab07	[rllib] test all combinations of {obs_space} x {action_space} (#1449)	2018-01-24 11:03:43 -08:00
Robert Nishihara	f32c0c8ec1	Move calls to ray.worker.cleanup into tearDown part of tests for isolation. (#1433)	2018-01-22 22:54:56 -08:00
Devin Petersohn	4aca016bff	Adding series and a way to validate our API. (#1435) * Adding series and a way to validate our API. * Moving partitions into protected status	2018-01-21 19:20:54 -08:00
Stephanie Wang	74718efa73	Nondeterministic reconstruction for actors (#1344) * Add failing unit test for nondeterministic reconstruction * Retry scheduling actor tasks if reassigned to local scheduler * Update execution edges asynchronously upon dispatch for nondeterministic reconstruction * Fix bug for updating checkpoint task execution dependencies * Update comments for deterministic reconstruction * cleanup * Add (and skip) failing test case for nondeterministic reconstruction * Suppress test output	2018-01-21 13:44:13 -08:00
eugenevinitsky	37076a9ff8	Multiagent model using concatenated observations (#1416) * working multi action distribution and multiagent model * currently working but the splits arent done in the right place * added shared models * added categorical support and mountain car example * now compatible with generalized advantage estimation * working multiagent code with discrete and continuous example * moved reshaper to utils * code review changes made, ppo action placeholder moved to model catalog, all multiagent code moved out of fcnet * added examples in * added PEP8 compliance * examples are mostly pep8 compliant * removed all flake errors * added examples to jenkins tests * fixed custom options bug * added lines to let docker file find multiagent tests * shortened example run length * corrected nits * fixed flake errors	2018-01-18 19:51:31 -08:00
Richard Liaw	d4592382a4	[tune][minor] Fixes (#1383)	2018-01-11 18:14:20 -08:00
Philipp Moritz	44792530a9	fix autoscaler test (#1411)	2018-01-10 13:18:34 -08:00

353 commits