
hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-07 02:51:39 -05:00

Author	SHA1	Message	Date
Robert Nishihara	5bb07cb01b	Remove old UI code. (#688)	2017-06-21 05:54:21 +00:00
Robert Nishihara	5ebc2f3f2e	Do resource bookkeeping for actor methods. (#682) * Dispatch regular and actor tasks when resources become available. * Make actor methods do resource bookkeeping and add test. * Remove unnecessary field. * Fix linting. * Fix actor test. * Maintain set of actors with pending tasks to speed up task dispatch. * Exit early from task dispatch if there are no resources available. * Fix linting. * Fix error. * Fix bug related to iterator invalidation. * When an actor is removed, remove it from the set of actors with pending tasks.	2017-06-21 05:52:45 +00:00
alanamarzoev	ed9380d73d	Automatically start web UI in ray.init(). (#687) * Start up webui on ray.init * Removed .ipynb checkpoint folders. * Removed print statements in cleanup function. * Fixed * Removed extra file. * Cleaned up ui. * Don't start browser automatically in ray.init(), also copy the notebook every time so that changes don't persist. * Update setup.py and installation instructions to install jupyter. * Don't automatically install jupyter, don't start the UI if jupyter is not installed. * Improve error message when failing to start UI.	2017-06-20 10:32:55 -07:00
Robert Nishihara	3052ce25a6	Divide up large fetch requests from local scheduler, also print warni… (#683) * Divide up large fetch requests from local scheduler, also print warning if fetch handler is slow. * Fix linting. * Fix typo.	2017-06-19 22:57:51 +00:00
Robert Nishihara	9e4a3e4972	Replace some UT data structures in local scheduler with C++ STL. (#680) * Replace a local scheduler ut_array with a std::vector. * Replace vector of sizes in local scheduler with std::pair. * Remove utarray include. * Replace utarray with std::vector for reading local scheduler input messages. * Remove more UT data structures. * Remove UT includes. * Fix linting. * Include stdlib.h to find size_t. * Remove includes of stdbool.h. * Replace std::pair with TaskQueueEntry. * Fix redis tests. * Reinstate tests.	2017-06-19 21:58:42 +00:00
Philipp Moritz	9bcaaaeaf5	Debugging for policy gradients (#681) * configuration option for tensorflow debugger * add model checkpointing * fix linting * make it possible to run without checkpointing * fix * loading from checkpoint and expose debugger through cli * todo for filters * Fix typo.	2017-06-18 17:58:41 -07:00
Robert Nishihara	f12db5f0e2	Divide large plasma requests into smaller chunks, and wait longer before reissuing large requests. (#678) * Divide large get requests into smaller chunks. * Divide fetches into smaller chunks. * Wait longer in worker and manager before reissuing fetch requests if there are many outstanding fetch requests. * Log warning if a handler in the local scheduler or plasma manager takes more than one second.	2017-06-18 04:42:15 +00:00
alanamarzoev	4d5ac9dad5	Include object size and hash in the table returned by the object_table function in the GlobalStateAPI. (#665) * added log_table function and a test * fixed log_files and added task_profiles * fixed formatting * fixed linting errors * fixes * removed file * more fixes * hopefully fixed * Small changes. * Fix linting. * Fix bug in log monitor. * Small changes. * Fix bug in travis. * Including data_size and hash in the ResultTableReply. * Included data_size and hash info in object_table. * Fixed bugs in ray_redis_module.cc. * Removing commented out code. * Fixes * Freed hash and data_size strings after using, and checked if they're null along with task_id and is_put. * Changed it so that data_size is set correctly. * Removed iostream import. * Included a check to ensure that the Redis string to long long conversion was successful. * Included separate data_size and hash null checks. * Fixed bug. * Made linting changes. * Another linting error. * Slight simplication.	2017-06-16 23:17:11 -07:00
Robert Nishihara	019ba07e9c	Correct actor class name and module. (#675) * Correct actor class name and module. * Add test. * Fix linting.	2017-06-17 05:44:42 +00:00
Robert Nishihara	96962cdee0	Log fatal error if plasma manager or local scheduler heartbeats take too long. (#676) * Log fatal error if plasma manager or local scheduler take too long to send heartbeat. * Fix linting. * Use int64_t for milliseconds since unix epoch.	2017-06-16 19:11:01 +00:00
Alexey Tumanov	8317025987	reducing the size of objects created for the global scheduler test (#674)	2017-06-15 10:02:46 -07:00
Philipp Moritz	8798f4e690	fix flaky mac os x plasma store component_failure_test (#673) Fix flaky mac os x plasma store component_failure_test.	2017-06-15 00:31:50 -07:00
Philipp Moritz	c343df832e	use multiple threads for memcpy (#669)	2017-06-14 19:14:24 -07:00
alanamarzoev	cc4990b543	Task profiles function and test (#647) Expose some task profiling information through global state API.	2017-06-13 17:53:34 -07:00
alanamarzoev	43bae46e47	Included worker_id in task event logs. (#668)	2017-06-13 17:30:43 -07:00
Robert Nishihara	fb119bb50c	Automatically add ip addresses to list of known hosts in cluster usage documentation. (#667)	2017-06-14 00:13:33 +00:00
Philipp Moritz	54925996ca	Allow remote functions to specify max executions and kill worker once limit is reached. (#660) * implement restarting workers after certain number of task executions * Clean up python code. * Don't start new worker when an actor disconnects. * Move wait_for_pid_to_exit to test_utils.py. * Add test. * Fix linting errors. * Fix linting. * Fix typo.	2017-06-13 00:34:58 -07:00
Eric Liang	4374ad1453	Policy gradient example: Support multi-GPU training (#584) * add tf metrics * comments * fix network scopes * add doc * initial work * try with 3 virtual cpus * clean up metrics * use format string * fix trace level * back to pong * always run summary on cpu * plot intermediate and final sgd stats * add back a global step * update * add timeline * use staging area and reuse weights properly * stage at cpu * whoops, stage only the batch * clean up a bit * fix py flake * wip * create an optimizer graph per device * print timeline on 5th batch instead * print examples per second * log placement for training ops * force placement on cpu:0 * try separating weights onto different gpus * try using nccl * add cpu fallback * remove space from date * check has gpu device * fix flag config * checkpoint * wip * update * add some timing * trace loading * try cpu * revert that * remove expensive test * lint * cleanups * clean up timers * clean it up a bit * fix code for non-scalar action spaces * address some nits * fix quotes * efficient shuffling between sgd epochs	2017-06-13 06:03:25 +00:00
Robert Nishihara	1916475e14	Increase socket listen backlog from 5 to 128. (#661)	2017-06-11 06:34:16 +00:00
Richard Liaw	8d350f628a	Fixing Redis Key Consistencies for Actor, FunctionTable, FunctionsToRun, and RemoteFunction (#659) * consistencies for Actor, FunctionTable, and FunctionsToRun * NOT WORKING: changing remote fn keys	2017-06-10 23:45:22 +00:00
Eric Liang	d4d2c03ac5	Remove timeout for Redis commands. (#649) * update * Remove interaction between callback data identifier and event loop. * Remove tests that no longer apply.	2017-06-09 15:55:36 -07:00
alanamarzoev	ee1d4e5ea2	Redirect worker stdout/stderr to log files. (#646) * local scheduler * redirect output files to be associated with workers rather than the local scheduler * fixed formatting * fixes * Moved output redirection logic to worker.py. * Changed write mode. * Fixed formatting. * Added comment. * Reuse log file creation in services.py. * Fix linting. * Fix problem in which multiple processes attempt to create /tmp/raylogs at the same time.	2017-06-08 18:30:48 -07:00
Crystal	fff50d824c	Doc using ray with gpu (#644) * Added to troubleshooting documentation about whether redefining remote functions runs the new code version * Minor correction to troubleshooting documentation * Writing new documentation page for using Ray with GPUs * Wrote new documentation page on using ray with gpus * Add some more details.	2017-06-08 00:12:44 -07:00
alanamarzoev	f0339f3386	Expose log files through global state API. (#641) * added log_table function and a test * fixed log_files and added task_profiles * fixed formatting * fixed linting errors * fixes * removed file * more fixes * hopefully fixed * Small changes. * Fix linting. * Fix bug in log monitor. * Small changes. * Fix bug in travis.	2017-06-08 00:08:10 -07:00
Robert Nishihara	fde843a636	Update installation documentation to recommend installing Ray with pip. (#637)	2017-06-07 05:51:06 +00:00
Crystal	60161f276b	Added to troubleshooting documentation about whether redefining remot… (#640) * Added to troubleshooting documentation about whether redefining remote functions runs the new code version * Minor correction to troubleshooting documentation * Small rewordings.	2017-06-06 22:49:53 -07:00
Philipp Moritz	690fe10bb6	Save policies for Evolution Strategies (#638) Save policies for evolution strategies.	2017-06-04 16:21:19 -07:00
Crystal	4c94d6c3b9	Rewrote and reordered the examples in the Actor documentation for cla… (#635) * Rewrote and reordered the examples in the Actor documentation for clarity. Also added an introduction to Gym * Minor tweaks to actor documentation * Small changes to wording.	2017-06-02 23:42:41 -07:00
Philipp Moritz	6adf39959c	put back large python object tests (commented out) (#636)	2017-06-02 20:36:10 -07:00
Robert Nishihara	301e0b0db8	Bump version to 0.1.1 in preparation for uploading wheels to PyPI. (#630)	2017-06-03 02:17:39 +00:00
Philipp Moritz	0254efa5e8	Use parallel memcopy from arrow (#633) * use parallel memcopy from arrow * fix linting * remove memory.h	2017-06-02 18:18:41 -07:00
Robert Nishihara	2694337c0f	Fix large memory tests. (#632) * Log the driver ID in hex instead of binary. * Fix large memory test and add more tests to it. * Remove tests that are too stressful.	2017-06-03 01:12:56 +00:00
Robert Nishihara	23b0c80967	Rename linux wheels so they can be uploaded to PyPI. (#629)	2017-06-02 20:20:34 +00:00
Robert Nishihara	1a682e2807	Enable starting and stopping ray with "ray start" and "ray stop". (#628) * Install start_ray and stop_ray scripts in setup.py. * Update documentation. * Fix docker tests. * Implement stop_ray script in python. * Fix linting.	2017-06-02 20:17:48 +00:00
Robert Nishihara	a4d8e13094	Suppress excess warning messages related to intentional actor deaths. (#627) * Don't submit the actor destructor tasks when the job is exiting. * Don't propagate error messages to the driver when an actor exits intentionally.	2017-06-01 20:10:40 +00:00
Robert Nishihara	d0bfc0a849	Clean up actor workers when actor handle goes out of scope. (#617)	2017-06-01 07:02:43 +00:00
Robert Nishihara	dd7f866a92	Fix compilation error on CentOS. (#622) * Fix compilation error on CentOS. * add TODO	2017-06-01 06:51:00 +00:00
Robert Nishihara	5f193afb87	Tell local scheduler to ignore SIGCHLD so that workers don't become zombies. (#620)	2017-06-01 06:37:28 +00:00
Robert Nishihara	4d51ed37b2	Fix bug in which plasma client file descriptors were not closed. (#618) * Fix bug in which plasma client file descriptors were not closed. * Add logging statement when disconnecting client from plasma store. * Fix after rebasing. * Add more checks to plasma disconnect client.	2017-06-01 05:37:29 +00:00
Robert Nishihara	bcaab78908	Add script for building MacOS wheels. (#601) * Add script for building MacOS wheels. * Small cleanups to script. * Fix setting of PATH before building wheel. * Create symbolic link to correct Python executable so Ray installation finds the right Python. * Address comments. * Rename readme.	2017-06-01 00:30:46 +00:00
Philipp Moritz	b94b4a35e0	Make the Plasma store ready for Arrow integration (#579) * port plasma to arrow * fixes * refactor plasma client * more modernization * fix plasma manager tests * everything compiles * fix plasma client tests * update plasma serialization tests * fix plasma manager tests * fix bug * updates * fix bug * fix tests * fix rebase * address comments * fix travis valgrind build * fix linting * fix include order again * fix linting * address comments	2017-05-31 16:24:23 -07:00
Richard Shin	609b5c1a4c	Add script to build manylinux1 .whl files (#600) * Add manylinux setup * Switch to cp27mu * python/MANIFEST.in * Fix MANIFEST.in * Add build-wheel-manylinux1.sh * Update readme * Install correct version of numpy * Fix typo in README-manylinux1.md * Don't install cmake * Remove commented line from setup.py * Delete unused manylinux1.sh * Run setup.py bdist_wheel twice * Don't use package_data and MANIFEST.in. * Small aesthetic change. * Trigger build_ext in setup.py. * Remove nonexistent file from MANIFEST.in. * Manually copy files in MANIFEST.in to where Python expects them in order to prevent setup.py from having to be run twice. * Only run setup.py once when building wheels. * Aesthetic change to readme. * Copy generated flatbuffer Python files in build_ext. * Fix permission denied error by making sure to preserve executableness when copying files. * Remove unnecessary argument to setup.py. * Remove MANIFEST.in and move files to include into list in setup.py. * Fix numpy version when building wheels and replace rm with git clean.	2017-05-27 21:35:48 -07:00
Robert Nishihara	97af3b34d8	Use string instead of list in tutorial example to make it clearer. (#586)	2017-05-26 15:32:51 -07:00
Philipp Moritz	647e1d9fc3	Fix runtest.py on the ubuntu system python 3 (#599) * fix runtest.py on the ubuntu system python 3 * less strict version of the test	2017-05-26 15:22:36 -07:00
Richard Shin	16050eca8d	Don't link Python extensions to libpython*.so (#598)	2017-05-25 19:01:12 -07:00
Chelsea Finn	f97d0393cc	Fix to json decoding bug (#597) * fix json decoding bug * Fix linting error.	2017-05-25 18:48:39 -07:00
Michael Whittaker	1985838a30	Fixed small typo in actors.rst. (#595)	2017-05-25 11:30:45 -07:00
Philipp Moritz	3885d1b286	make builds with CMake incremental (#592)	2017-05-24 21:52:33 -07:00
Robert Nishihara	997aa35721	Remove cloudpickle customization and just use plain cloudpickle. (#588) * Remove augmentations of cloudpickle. * Entirely remove cloudpickle modifications. Just use plain cloudpickle.	2017-05-24 20:22:28 -07:00
Philipp Moritz	679910496e	fix policy gradients for mujoco domains (#589)	2017-05-24 18:39:37 -07:00

1171 commits