hiro/ray

Mirror of https://github.com/vale981/ray, synced 2025-03-06 10:31:39 -05:00

Author	SHA1	Message	Date
Crystal	8fc7dc3ed4	Change Python examples in documentation to use 4 space indentation. (#736 ) * Ray doc - changed python indentation to 4 spaces in documentation files actors.rst, api.rst, and example-.rst Ray documentation - changed Python to 4 space indentation for files install-.rst, installation-troubleshooting.rst, internals-overview.rst, serialization.rst, troubleshootin.rst, tutorial.rst, using-ray-.rst	2017-07-16 22:19:33 -07:00
Eric Liang	86a7909149	make es worker count independent (#740 )	2017-07-16 16:23:56 -07:00
Robert Nishihara	80e8426b5e	Test example applications and rllib in jenkins tests. (#707 ) * Test example applications in Jenkins. * Fix default upload_dir argument for Algorithm class. * Fix evolution strategies. * Comment out policy gradient example which doesn't seem to work. * Set --env-name for evolution strategies.	2017-07-16 18:51:33 +00:00
Robert Nishihara	4349f1f966	Fix broken links in example documentation. (#732 )	2017-07-14 20:31:53 +00:00
Robert Nishihara	e0867c8845	Switch Python indentation from 2 spaces to 4 spaces. (#726 ) * 4 space indentation for actor.py. * 4 space indentation for worker.py. * 4 space indentation for more files. * 4 space indentation for some test files. * Check indentation in Travis. * 4 space indentation for some rl files. * Fix failure test. * Fix multi_node_test. * 4 space indentation for more files. * 4 space indentation for remaining files. * Fixes.	2017-07-13 21:53:57 +00:00
Robert Nishihara	310ba82131	Use miniconda for all travis tests. (#728 ) * Use miniconda for all travis tests. * Fix. * Fix.	2017-07-13 16:23:04 +00:00
Philipp Moritz	c24c07613c	[rllib] unify writing performance metrics and make it queryable (#708 ) * write config to s3 * add train file * write performance to S3 * writing needs to be fixed, replacing result.json at the moment * update * add experiment_id * more logging and example queries * update * add info * fill in other algorithms * fix linting * convert readme to rst * fixes * simplejson -> json * make files executable * edit README.rst * unify storing logs in S3 and on local filesystem * use 'info' entry in TrainingResult for algorithm specific info * don't install smart_open with ray * fixes * linting fixes	2017-07-11 01:36:14 +02:00
alanamarzoev	8464d77c76	Change event logs to store one Redis ZSET per worker. (#705 ) * Changing to zset * Fixed bug. * Fixed another bug. * Modified task_profiles. * Removed extra file. * Modified task_profiles test. * WIP * WIP * Undid changes * Updated * WIP * Made changes according to comments. * Removed unneeded print. * Removed ujson usage. * failing test * tests passing * Fixed linting errors and modified style. * Fixed bug. * Fixed linting * Fixed according to comments. * Redis crashing? * Fixed linting * Fixed linting	2017-07-09 01:42:29 +02:00
Eric Liang	cd12ea7e09	[rllib] Pull out the GPU-parallel optimizer from policy gradients to common (#711 ) * refactor * docs * cleanup * clean up more * Update parallel.py * add imports from future	2017-07-07 22:20:02 +00:00
Robert Nishihara	5b3d0c00f2	Create /tmp/ray directory in services.py. (#715 )	2017-07-07 18:41:56 +00:00
Eric Liang	f012e597c2	[rllib] Basic port of baselines/deepq to rllib (#709 ) * rllib v0 * fix imports * lint * comments * update docs * a3c wip * a3c wip * report stats * update doc * add common logdir attr * name is too long * fix small bug * propagate exception on error * fetch metrics * initial port * fix lint * add right license * port to common alg format * fix lint * rename dqn * add imports from future * fix lint	2017-07-07 18:37:00 +00:00
Robert Nishihara	6c45657280	Reset the SIGCHLD handler after forking a worker to avoid influencing the worker. (#713 )	2017-07-07 14:50:37 +00:00
Eric Liang	66734847bb	[rllib] Standardize writing output logs and other files to /tmp/ray (#706 ) * rllib v0 * fix imports * lint * comments * update docs * a3c wip * a3c wip * report stats * update doc * add common logdir attr * name is too long * fix small bug * propagate exception on error * fetch metrics * fix small nits	2017-07-03 16:01:47 +00:00
alanamarzoev	2b11a7bca2	Add task ID and object ID search boxes to web UI. (#704 ) * Task search box. * Cleaned up. * Small reformatting. * Add object table search box.	2017-07-01 17:48:23 -04:00
alanamarzoev	716469160e	Enable dumping profiling information to timeline format viewable by chrome tracing. (#703 ) * Chrome tracing timeline. * Modified decode statement. * Some cleanups and add test. * Remove example. * Fix test.	2017-06-30 12:14:11 -04:00
Eric Liang	2d81edfcdc	[rllib] Move a3c implementation from examples/ to python/ray/rllib/ (#698 ) * rllib v0 * fix imports * lint * comments * update docs * a3c wip * a3c wip * report stats * update doc * name is too long * fix small bug * propagate exception on error * fetch metrics * fix lint	2017-06-29 15:49:56 +00:00
Robert Nishihara	efce49cfbc	Bump version to 0.1.2 in preparation for uploading wheels to PyPI. (#700 )	2017-06-27 04:35:42 +00:00
Robert Nishihara	1941e0f7b1	Fix compilation on CentOS. (#699 )	2017-06-26 05:54:21 +00:00
Robert Nishihara	0926550661	Remove -mtune and -march compiler flags. (#697 )	2017-06-26 05:52:45 +00:00
Eric Liang	a674ec958c	[rllib] Move policy gradient and evolution strategies algorithms from examples/ to ray/rllib/ (#694 ) * rllib v0 * fix imports * lint * comments * update docs	2017-06-25 22:13:03 +00:00
Robert Nishihara	8bc9c275fa	Increase the number of log file names and handle errors better in log monitor. (#693 )	2017-06-25 05:20:50 +00:00
Robert Nishihara	ad480f8165	Don't reconstruct all objects in every fetch request in local scheduler. (#686 ) * Don't reconstruct all objects in every fetch request in local scheduler. * Separate out fetch timer and reconstruction timer. * Fix bug. * Bug fix. * Fix naming convention for global variables. * Address comments. * Make reconstruct_counter a static variable. * Fix linting. * Redo reconstruct handler using a set of objects to fetch. * Fix linting. * Replace set with vector.	2017-06-23 21:08:02 +00:00
alanamarzoev	e16df6da9a	Updated task_profiles function to avoid future repetitive parsing. (#691 ) * Updated task_profiles function to avoid future repetitive parsing. * Fix indentation. * Fixed according to comments. * Included updated test for task_profiles function. * Simplify test. * Fix indentation. * Fix.	2017-06-22 19:21:18 -07:00
Robert Nishihara	2d636d9278	Kill jupyter in ray stop. (#689 ) * Kill jupyter in ray stop. * Terminate jupyter notebook in ray stop. * Fix linting.	2017-06-21 05:58:34 +00:00
Robert Nishihara	5bb07cb01b	Remove old UI code. (#688 )	2017-06-21 05:54:21 +00:00
Robert Nishihara	5ebc2f3f2e	Do resource bookkeeping for actor methods. (#682 ) * Dispatch regular and actor tasks when resources become available. * Make actor methods do resource bookkeeping and add test. * Remove unnecessary field. * Fix linting. * Fix actor test. * Maintain set of actors with pending tasks to speed up task dispatch. * Exit early from task dispatch if there are no resources available. * Fix linting. * Fix error. * Fix bug related to iterator invalidation. * When an actor is removed, remove it from the set of actors with pending tasks.	2017-06-21 05:52:45 +00:00
alanamarzoev	ed9380d73d	Automatically start web UI in ray.init(). (#687 ) * Start up webui on ray.init * Removed .ipynb checkpoint folders. * Removed print statements in cleanup function. * Fixed * Removed extra file. * Cleaned up ui. * Don't start browser automatically in ray.init(), also copy the notebook every time so that changes don't persist. * Update setup.py and installation instructions to install jupyter. * Don't automatically install jupyter, don't start the UI if jupyter is not installed. * Improve error message when failing to start UI.	2017-06-20 10:32:55 -07:00
Robert Nishihara	3052ce25a6	Divide up large fetch requests from local scheduler, also print warni… (#683 ) * Divide up large fetch requests from local scheduler, also print warning if fetch handler is slow. * Fix linting. * Fix typo.	2017-06-19 22:57:51 +00:00
Robert Nishihara	9e4a3e4972	Replace some UT data structures in local scheduler with C++ STL. (#680 ) * Replace a local scheduler ut_array with a std::vector. * Replace vector of sizes in local scheduler with std::pair. * Remove utarray include. * Replace utarray with std::vector for reading local scheduler input messages. * Remove more UT data structures. * Remove UT includes. * Fix linting. * Include stdlib.h to find size_t. * Remove includes of stdbool.h. * Replace std::pair with TaskQueueEntry. * Fix redis tests. * Reinstate tests.	2017-06-19 21:58:42 +00:00
Philipp Moritz	9bcaaaeaf5	Debugging for policy gradients (#681 ) * configuration option for tensorflow debugger * add model checkpointing * fix linting * make it possible to run without checkpointing * fix * loading from checkpoint and expose debugger through cli * todo for filters * Fix typo.	2017-06-18 17:58:41 -07:00
Robert Nishihara	f12db5f0e2	Divide large plasma requests into smaller chunks, and wait longer before reissuing large requests. (#678 ) * Divide large get requests into smaller chunks. * Divide fetches into smaller chunks. * Wait longer in worker and manager before reissuing fetch requests if there are many outstanding fetch requests. * Log warning if a handler in the local scheduler or plasma manager takes more than one second.	2017-06-18 04:42:15 +00:00
alanamarzoev	4d5ac9dad5	Include object size and hash in the table returned by the object_table function in the GlobalStateAPI. (#665 ) * added log_table function and a test * fixed log_files and added task_profiles * fixed formatting * fixed linting errors * fixes * removed file * more fixes * hopefully fixed * Small changes. * Fix linting. * Fix bug in log monitor. * Small changes. * Fix bug in travis. * Including data_size and hash in the ResultTableReply. * Included data_size and hash info in object_table. * Fixed bugs in ray_redis_module.cc. * Removing commented out code. * Fixes * Freed hash and data_size strings after using, and checked if they're null along with task_id and is_put. * Changed it so that data_size is set correctly. * Removed iostream import. * Included a check to ensure that the Redis string to long long conversion was successful. * Included separate data_size and hash null checks. * Fixed bug. * Made linting changes. * Another linting error. * Slight simplication.	2017-06-16 23:17:11 -07:00
Robert Nishihara	019ba07e9c	Correct actor class name and module. (#675 ) * Correct actor class name and module. * Add test. * Fix linting.	2017-06-17 05:44:42 +00:00
Robert Nishihara	96962cdee0	Log fatal error if plasma manager or local scheduler heartbeats take too long. (#676 ) * Log fatal error if plasma manager or local scheduler take too long to send heartbeat. * Fix linting. * Use int64_t for milliseconds since unix epoch.	2017-06-16 19:11:01 +00:00
Alexey Tumanov	8317025987	reducing the size of objects created for the global scheduler test (#674 )	2017-06-15 10:02:46 -07:00
Philipp Moritz	8798f4e690	fix flaky mac os x plasma store component_failure_test (#673 ) Fix flaky mac os x plasma store component_failure_test.	2017-06-15 00:31:50 -07:00
Philipp Moritz	c343df832e	use multiple threads for memcpy (#669 )	2017-06-14 19:14:24 -07:00
alanamarzoev	cc4990b543	Task profiles function and test (#647 ) Expose some task profiling information through global state API.	2017-06-13 17:53:34 -07:00
alanamarzoev	43bae46e47	Included worker_id in task event logs. (#668 )	2017-06-13 17:30:43 -07:00
Robert Nishihara	fb119bb50c	Automatically add ip addresses to list of known hosts in cluster usage documentation. (#667 )	2017-06-14 00:13:33 +00:00
Philipp Moritz	54925996ca	Allow remote functions to specify max executions and kill worker once limit is reached. (#660 ) * implement restarting workers after certain number of task executions * Clean up python code. * Don't start new worker when an actor disconnects. * Move wait_for_pid_to_exit to test_utils.py. * Add test. * Fix linting errors. * Fix linting. * Fix typo.	2017-06-13 00:34:58 -07:00
Eric Liang	4374ad1453	Policy gradient example: Support multi-GPU training (#584 ) * add tf metrics * comments * fix network scopes * add doc * initial work * try with 3 virtual cpus * clean up metrics * use format string * fix trace level * back to pong * always run summary on cpu * plot intermediate and final sgd stats * add back a global step * update * add timeline * use staging area and reuse weights properly * stage at cpu * whoops, stage only the batch * clean up a bit * fix py flake * wip * create an optimizer graph per device * print timeline on 5th batch instead * print examples per second * log placement for training ops * force placement on cpu:0 * try separating weights onto different gpus * try using nccl * add cpu fallback * remove space from date * check has gpu device * fix flag config * checkpoint * wip * update * add some timing * trace loading * try cpu * revert that * remove expensive test * lint * cleanups * clean up timers * clean it up a bit * fix code for non-scalar action spaces * address some nits * fix quotes * efficient shuffling between sgd epochs	2017-06-13 06:03:25 +00:00
Robert Nishihara	1916475e14	Increase socket listen backlog from 5 to 128. (#661 )	2017-06-11 06:34:16 +00:00
Richard Liaw	8d350f628a	Fixing Redis Key Consistencies for Actor, FunctionTable, FunctionsToRun, and RemoteFunction (#659 ) * consistencies for Actor, FunctionTable, and FunctionsToRun * NOT WORKING: changing remote fn keys	2017-06-10 23:45:22 +00:00
Eric Liang	d4d2c03ac5	Remove timeout for Redis commands. (#649 ) * update * Remove interaction between callback data identifier and event loop. * Remove tests that no longer apply.	2017-06-09 15:55:36 -07:00
alanamarzoev	ee1d4e5ea2	Redirect worker stdout/stderr to log files. (#646 ) * local scheduler * redirect output files to be associated with workers rather than the local scheduler * fixed formatting * fixes * Moved output redirection logic to worker.py. * Changed write mode. * Fixed formatting. * Added comment. * Reuse log file creation in services.py. * Fix linting. * Fix problem in which multiple processes attempt to create /tmp/raylogs at the same time.	2017-06-08 18:30:48 -07:00
Crystal	fff50d824c	Doc using ray with gpu (#644 ) * Added to troubleshooting documentation about whether redefining remote functions runs the new code version * Minor correction to troubleshooting documentation * Writing new documentation page for using Ray with GPUs * Wrote new documentation page on using ray with gpus * Add some more details.	2017-06-08 00:12:44 -07:00
alanamarzoev	f0339f3386	Expose log files through global state API. (#641 ) * added log_table function and a test * fixed log_files and added task_profiles * fixed formatting * fixed linting errors * fixes * removed file * more fixes * hopefully fixed * Small changes. * Fix linting. * Fix bug in log monitor. * Small changes. * Fix bug in travis.	2017-06-08 00:08:10 -07:00
Robert Nishihara	fde843a636	Update installation documentation to recommend installing Ray with pip. (#637 )	2017-06-07 05:51:06 +00:00
Crystal	60161f276b	Added to troubleshooting documentation about whether redefining remot… (#640 ) * Added to troubleshooting documentation about whether redefining remote functions runs the new code version * Minor correction to troubleshooting documentation * Small rewordings.	2017-06-06 22:49:53 -07:00

995 commits