hiro/ray - Forgejo: Beyond coding. We Forge.

hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-06 10:31:39 -05:00

Author	SHA1	Message	Date
Robert Nishihara	3934d5f6eb	Remove old files and remove old documentation for copying files around cluster. (#274 )	2017-02-13 11:20:04 -08:00
Robert Nishihara	cb7f6ca9b5	Attempt to start web UI when starting Ray. (#269 ) * Attempt to start web UI when starting Ray. * Add instructions for using web UI to cluster documentation. * Don't check if port 8080 is open. * Remove print statement.	2017-02-12 15:17:58 -08:00
Robert Nishihara	f6ce9dfa6c	Allow start_ray.sh to take an object manager port. (#272 ) * Allow start_ray.sh to take a object manager port. * Fix typo and add test. * Small cleanups.	2017-02-12 12:39:32 -08:00
Johann Schleier-Smith	7bf80b6b22	bug fix on printing exception traceback (#268 )	2017-02-10 21:05:05 -08:00
Stephanie Wang	2b8e6485e3	Start and clean up workers from the local scheduler. (#250 ) * Start and clean up workers from the local scheduler Ability to kill workers in photon scheduler Test for old method of starting workers Common codepath for killing workers Common codepath for killing workers Photon test case for starting and killing workers fix build Fix component failure test Register a worker's pid as part of initial connection Address comments and revert photon_connect Set PATH during travis install Fix * Fix photon test case to accept clients on plasma manager fd	2017-02-10 12:46:23 -08:00
Robert Nishihara	ec175b7dfb	Check if processes are alive in test. (#261 )	2017-02-09 23:40:39 -08:00
Robert Nishihara	249b667b0e	Raise exception in Python if wait is called with duplicate object IDs. (#262 )	2017-02-09 23:32:19 -08:00
Robert Nishihara	0aa234fb9c	Fix CXX numbuf error message for Anaconda 3.6. (#258 )	2017-02-09 23:29:43 -08:00
Johann Schleier-Smith	883bedf46e	Add documentation for upgrading a Ray cluster. (#256 ) * add documentation for upgrading a Ray cluster * Update documentation and link to it from README.	2017-02-09 11:55:37 -08:00
Alexey Tumanov	dfb6107b22	General attribute-based heterogeneity support with hard and soft constraints (#248 ) * attribute-based heterogeneity-awareness in global scheduler and photon * minor post-rebase fix * photon: enforce dynamic capacity constraint on task dispatch * globalsched: cap the number of times we try to schedule a task in round robin * propagating ability to specify resource capacity to ray.init * adding resources to remote function export and fetch/register * globalsched: remove unused functions; update cached photon resource capacity (until next photon heartbeat) * Add some integration tests. * globalsched: cleanup + factor out constraint checking * lots of style * task_spec_required_resource: global refactor * clang format * clang format + comment update in photon * clang format photon comment * valgrind * reduce verbosity for Travis * Add test for scheduler load balancing. * addressing comments * refactoring global scheduler algorithm * Minor cleanups. * Linting. * Fix array_test.py and linting. * valgrind fix for photon tests * Attempt to fix stress tests. * fix hashmap free * fix hashmap free comment * memset photon resource vectors to 0 in case they get used before the first heartbeat * More whitespace changes. * Undo whitespace error I introduced.	2017-02-09 01:34:14 -08:00
Wapaul1	1a7e1c47cb	Added example for compute grads in ray tutorial (#238 ) * Added example for compute grads in ray * Added formatting * Removed need for placeholders in apply gradient * Streamlined examples * Fixed docs * Added formatting * Removed old references * Simplified code some * Addressed comments * Changes to first code block * Added test for training and updated code snippets * Formatting * Removed mean * Removed all mention of mean * Added comments * Added comments	2017-02-07 18:07:21 -08:00
Robert Nishihara	1fec94ef00	Display drivers in web UI. (#252 ) * Display drivers in web UI. * Display more rows in grid and factor out function in webui backend.	2017-02-07 14:21:25 -08:00
Philipp Moritz	fefc7d9b49	fix segfault in photon.Task (#253 )	2017-02-07 11:17:11 -08:00
Robert Nishihara	2d1c980ad7	Refactor local scheduler to remove worker indices. (#245 ) * Refactor local scheduler to remove worker indices. * Change scheduling state enum to int in all function signatures. * Bug fix, don't use pointers into a resizable array. * Remove total_num_workers. * Fix tests.	2017-02-05 14:52:28 -08:00
Philipp Moritz	ca254b8689	Fix stack overflow if many objects are fetched. (#237 ) * fix stack overflow if many objects are fetched * fix other stack allocations * add tests and fix linting * address stephanie's comments * fix linting * fix tests	2017-02-04 16:49:36 -08:00
Johann Schleier-Smith	e5a9fc0032	Cluster setup instructions (#233 ) * start updating cluster documentation with parallel ssh * add using ray on a large cluster * revert changes to using ray on a cluster * update cluster documentation * update title * Some formatting changes, and added some notes. * clarification * Add warning about public versus private IP addresses. * Typos and wording. * Clarifications. * Clarifications.	2017-02-02 16:10:26 -08:00
Robert Nishihara	7a7e14ef85	Visualize recent tasks in timeline. (#240 )	2017-02-02 15:53:56 -08:00
Stephanie Wang	241b539ff8	Reconstruction for evicted objects (#181 ) * First pass at reconstruction in the worker Modify reconstruction stress testing to start Plasma service before rest of Ray cluster TODO about reconstructing ray.puts Fix ray.put error for double creates Distinguish between empty entry and no entry in object table Fix test case Fix Python test Fix tests * Only call reconstruct on objects we have not yet received * Address review comments * Fix reconstruction for Python3 * remove unused code * Address Robert's comments, stress tests are crashing * Test and update the task's scheduling state to suppress duplicate reconstruction requests. * Split result table into two lookups, one for task ID and the other as a test-and-set for the task state * Fix object table tests * Fix redis module result_table_lookup test case * Multinode reconstruction tests * Fix python3 test case * rename * Use new start_redis * Remove unused code * lint * indent * Address Robert's comments * Use start_redis from ray.services in state table tests * Remove unnecessary memset	2017-02-01 19:18:46 -08:00
Robert Nishihara	f69d4aaaa7	Change fetch requests in plasma manager to use a single timer. (#234 ) * Change fetch requests in plasma manager to use a single timer. * Fix manager tests, other cleanups.	2017-02-01 12:21:52 -08:00
Johann Schleier-Smith	6ad2b5d87a	Add Redis port option to startup script (#232 ) * specify redis address when starting head * cleanup * update starting cluster documentation * Whitespace. * Address Philipp's comments. * Change redis_host -> redis_ip_address.	2017-01-31 00:28:00 -08:00
Wapaul1	db7297865f	Added functionality for retrieving variables from control dependencies (#220 ) * Added test for retriving variables from an optimizer * Added comments to test * Addressed comments * Fixed travis bug * Added fix to circular controls * Added set for explored operations and duplicate prefix stripping * Removed embeded ipython * Removed prefix, use seperate graph for each network * Removed redundant imports * Addressed comments and added separate graph to initializer * fix typos * get rid of prefix in documentation	2017-01-30 19:17:42 -08:00
Robert Nishihara	6703f7be6f	Provide functionality for local scheduler to start new workers. (#230 ) * Provide functionality for local scheduler to start new workers. * Pass full command for starting new worker in to local scheduler. * Separate out configuration state of local scheduler.	2017-01-27 01:28:48 -08:00
Stephanie Wang	a5c8f28f33	Plasma subscribe (#227 ) * Use object_info as notification, not just the object_id * Add a regression test for plasma managers connecting to store after some objects have been created * Send notifications for existing objects to new plasma subscribers * Continuously try the request to the plasma manager instead of setting a timeout in the test case * Use ray.services to start Redis in plasma test cases * fix test case	2017-01-25 22:57:15 -08:00
Robert Nishihara	ab8c3432f7	Add driver ID to task spec and add driver ID to Python error handling. (#225 ) * Add driver ID to task spec and add driver ID to Python error handling. * Make constants global variables. * Add test for error isolation.	2017-01-25 22:53:48 -08:00
Stephanie Wang	3c6686db08	Photon optimizations (#219 ) * Optimizations: - Track mapping of missing object to dependent tasks to avoid iterating over task queue - Perform all fetch requests for missing objects using the same timer * Fix bug and add regression test * Record task dependencies and active fetch requests in the same hash table * fix typo * Fix memory leak and add test cases for scheduling when dependencies are evicted * Fix python3 test case * Minor details.	2017-01-23 19:44:15 -08:00
Richard Liaw	4575cd88b2	Improve error messages when nodes can't communicate with each other. (#223 ) * Good error messages when nodes can't communicate with each other * Print more information when starting the head node. * Change retries back to 5.	2017-01-22 14:53:15 -08:00
Robert Nishihara	7151ed5cdf	Fix bug in tensorflow tests. (#218 ) * Fix bug in tensorflow tests. * Address comment.	2017-01-19 20:29:05 -08:00
Robert Nishihara	9bb8162621	Improvements to documentation and error messages. (#221 )	2017-01-19 20:27:46 -08:00
Richard Liaw	b3b294e3ad	updated cluster documentation (#216 )	2017-01-19 13:59:54 -08:00
Robert Nishihara	b98a63fd3a	Change get to take a timeout and multiple object IDs. (#212 ) * Change plasma_get to take a timeout and an array of object IDs. * Address comments. * Bug fix related to computing object hashes. * Add test. * Fix file descriptor leak. * Fix valgrind. * Formatting. * Remove call to plasma_contains from the plasma client. Use timeout internally in ray.get. * small fixes	2017-01-19 12:21:12 -08:00
Johann Schleier-Smith	4f6100b67f	fix docker build bug (#207 )	2017-01-18 23:23:34 -08:00
Robert Nishihara	677a019cbd	Remove unnecessary bookkeepping in utlist in plasma client. (#215 )	2017-01-18 23:03:08 -08:00
Robert Nishihara	58570b4981	Give better error message if ray.put or ray.get are called before ray.init. (#214 )	2017-01-18 22:37:37 -08:00
Stephanie Wang	f1987cdc16	Split local scheduler task queue (#211 ) * Split local scheduler task queue into waiting and dispatch queue * Fix memory leak * Add a new task scheduling status for when a task has been queued locally * Fix global scheduler test case and add task status doc * Documentation * Address Philipp's comments * Move tasks back to the waiting queue if their dependencies become unavailable * Update existing task table entries instead of overwriting	2017-01-18 20:27:40 -08:00
Wapaul1	6fe69bec11	Selects from all variables now independent of graph, and uses standar… (#199 ) * Smarter variable retrieval and doc update * doc update and small fixes * addressing robert's comments	2017-01-18 17:36:58 -08:00
Robert Nishihara	303d0fed3e	Prevent plasma store and manager from dying when a client dies. (#203 ) * Prevent plasma store and manager from dying when a worker dies. * Check errno inside of warn_if_sigpipe. Passing in errno doesn't work because the arguments to warn_if_sigpipe can be evaluated out of order.	2017-01-17 20:34:31 -08:00
Philipp Moritz	7f329db4b2	wait until kill operation was successful (#210 )	2017-01-17 20:15:48 -08:00
Philipp Moritz	a708e36225	Switch build system to use CMake completely. (#200 ) * switch to CMake completely ... * cleanup * Run C tests, update installation instructions.	2017-01-17 16:56:40 -08:00
Robert Nishihara	ba8933e10f	Update tutorial. (#196 ) * Update tutorial. * Small updates to documentation and code.	2017-01-10 23:52:38 -08:00
Robert Nishihara	87d8d05792	Rename reusable variables -> environment variables. (#195 )	2017-01-10 20:14:33 -08:00
Wapaul1	aaf3be3c53	Fixed lbfgs for ray-cluster (#180 ) * Updated lbfgs example to include TensorflowVariables * Whitespace.	2017-01-10 18:40:06 -08:00
Robert Nishihara	be4a37bf37	Various cleanups: remove start_ray_local from ray.init, remove unused code, fix "pip install numbuf". (#193 ) * Remove start_ray_local from ray.init and change default number of workers to 10. * Remove alexnet example. * Move array methods to experimental. * Remove TRPO example. * Remove old files. * Compile plasma when we build numbuf. * Address comments.	2017-01-10 17:35:27 -08:00
Wapaul1	b9d6135aa1	Added option for user to not pass in the session and error messages if so (#192 ) * Added option for user to not pass in the session * Small changes.	2017-01-09 21:03:22 -08:00
Philipp Moritz	ab3448a9b4	Plasma Optimizations (#190 ) * bypass python when storing objects into the object store * clang-format * Bug fixes. * fix include paths * Fixes. * fix bug * clang-format * fix * fix release after disconnect	2017-01-09 20:15:54 -08:00
Robert Nishihara	0320902787	Fix Python reference counting bug. (#191 )	2017-01-09 13:08:02 -08:00
Robert Nishihara	973716d310	Use cloudpickle 0.2.2. (#189 )	2017-01-08 17:30:06 -08:00
Alexey Tumanov	674ec3a3cb	generate pytask from string and string from pytask (#188 ) * pytask creation from bytestring: saving work * pytask now works * documentation and tests * linting * Lint and fix test case	2017-01-08 02:16:40 -08:00
Wapaul1	c45342e39d	Updated code to mesh with get_weights returning a dict and new tf code (#187 ) * Updated code to mesh with get_weights returning a dict and new tf code * Added tf.global_variables_initalizer to hyperopt example as well * Small fix. * Small name change.	2017-01-07 14:25:45 -08:00
Wapaul1	0ac2abee51	Added helper class for getting tf variables from loss function (#184 ) * Added helper class for getting tf variables from loss function * Updated usage and documentation * Removed try-catches * Added futures * Added documentation * fixes and tests * more tests * install tensorflow in travis	2017-01-07 01:54:11 -08:00
Stephanie Wang	c13d73b4c9	Suppress duplicate transfer requests (#185 )	2017-01-06 22:14:51 -08:00

1 2 3 4 5 ...

772 commits