hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-07 02:51:39 -05:00

Author	SHA1	Message	Date
Robert Nishihara	249b667b0e	Raise exception in Python if wait is called with duplicate object IDs. (#262 )	2017-02-09 23:32:19 -08:00
Alexey Tumanov	dfb6107b22	General attribute-based heterogeneity support with hard and soft constraints (#248 ) * attribute-based heterogeneity-awareness in global scheduler and photon * minor post-rebase fix * photon: enforce dynamic capacity constraint on task dispatch * globalsched: cap the number of times we try to schedule a task in round robin * propagating ability to specify resource capacity to ray.init * adding resources to remote function export and fetch/register * globalsched: remove unused functions; update cached photon resource capacity (until next photon heartbeat) * Add some integration tests. * globalsched: cleanup + factor out constraint checking * lots of style * task_spec_required_resource: global refactor * clang format * clang format + comment update in photon * clang format photon comment * valgrind * reduce verbosity for Travis * Add test for scheduler load balancing. * addressing comments * refactoring global scheduler algorithm * Minor cleanups. * Linting. * Fix array_test.py and linting. * valgrind fix for photon tests * Attempt to fix stress tests. * fix hashmap free * fix hashmap free comment * memset photon resource vectors to 0 in case they get used before the first heartbeat * More whitespace changes. * Undo whitespace error I introduced.	2017-02-09 01:34:14 -08:00
Wapaul1	1a7e1c47cb	Added example for compute grads in ray tutorial (#238 ) * Added example for compute grads in ray * Added formatting * Removed need for placeholders in apply gradient * Streamlined examples * Fixed docs * Added formatting * Removed old references * Simplified code some * Addressed comments * Changes to first code block * Added test for training and updated code snippets * Formatting * Removed mean * Removed all mention of mean * Added comments * Added comments	2017-02-07 18:07:21 -08:00
Philipp Moritz	ca254b8689	Fix stack overflow if many objects are fetched. (#237 ) * fix stack overflow if many objects are fetched * fix other stack allocations * add tests and fix linting * address stephanie's comments * fix linting * fix tests	2017-02-04 16:49:36 -08:00
Stephanie Wang	241b539ff8	Reconstruction for evicted objects (#181 ) * First pass at reconstruction in the worker Modify reconstruction stress testing to start Plasma service before rest of Ray cluster TODO about reconstructing ray.puts Fix ray.put error for double creates Distinguish between empty entry and no entry in object table Fix test case Fix Python test Fix tests * Only call reconstruct on objects we have not yet received * Address review comments * Fix reconstruction for Python3 * remove unused code * Address Robert's comments, stress tests are crashing * Test and update the task's scheduling state to suppress duplicate reconstruction requests. * Split result table into two lookups, one for task ID and the other as a test-and-set for the task state * Fix object table tests * Fix redis module result_table_lookup test case * Multinode reconstruction tests * Fix python3 test case * rename * Use new start_redis * Remove unused code * lint * indent * Address Robert's comments * Use start_redis from ray.services in state table tests * Remove unnecessary memset	2017-02-01 19:18:46 -08:00
Wapaul1	db7297865f	Added functionality for retrieving variables from control dependencies (#220 ) * Added test for retrieving variables from an optimizer * Added comments to test * Addressed comments * Fixed travis bug * Added fix to circular controls * Added set for explored operations and duplicate prefix stripping * Removed embedded ipython * Removed prefix, use separate graph for each network * Removed redundant imports * Addressed comments and added separate graph to initializer * fix typos * get rid of prefix in documentation	2017-01-30 19:17:42 -08:00
Robert Nishihara	ab8c3432f7	Add driver ID to task spec and add driver ID to Python error handling. (#225 ) * Add driver ID to task spec and add driver ID to Python error handling. * Make constants global variables. * Add test for error isolation.	2017-01-25 22:53:48 -08:00
Robert Nishihara	7151ed5cdf	Fix bug in tensorflow tests. (#218 ) * Fix bug in tensorflow tests. * Address comment.	2017-01-19 20:29:05 -08:00
Robert Nishihara	9bb8162621	Improvements to documentation and error messages. (#221 )	2017-01-19 20:27:46 -08:00
Robert Nishihara	b98a63fd3a	Change get to take a timeout and multiple object IDs. (#212 ) * Change plasma_get to take a timeout and an array of object IDs. * Address comments. * Bug fix related to computing object hashes. * Add test. * Fix file descriptor leak. * Fix valgrind. * Formatting. * Remove call to plasma_contains from the plasma client. Use timeout internally in ray.get. * small fixes	2017-01-19 12:21:12 -08:00
Wapaul1	6fe69bec11	Selects from all variables now independent of graph, and uses standar… (#199 ) * Smarter variable retrieval and doc update * doc update and small fixes * addressing robert's comments	2017-01-18 17:36:58 -08:00
Robert Nishihara	303d0fed3e	Prevent plasma store and manager from dying when a client dies. (#203 ) * Prevent plasma store and manager from dying when a worker dies. * Check errno inside of warn_if_sigpipe. Passing in errno doesn't work because the arguments to warn_if_sigpipe can be evaluated out of order.	2017-01-17 20:34:31 -08:00
Robert Nishihara	87d8d05792	Rename reusable variables -> environment variables. (#195 )	2017-01-10 20:14:33 -08:00
Wapaul1	aaf3be3c53	Fixed lbfgs for ray-cluster (#180 ) * Updated lbfgs example to include TensorflowVariables * Whitespace.	2017-01-10 18:40:06 -08:00
Robert Nishihara	be4a37bf37	Various cleanups: remove start_ray_local from ray.init, remove unused code, fix "pip install numbuf". (#193 ) * Remove start_ray_local from ray.init and change default number of workers to 10. * Remove alexnet example. * Move array methods to experimental. * Remove TRPO example. * Remove old files. * Compile plasma when we build numbuf. * Address comments.	2017-01-10 17:35:27 -08:00
Robert Nishihara	973716d310	Use cloudpickle 0.2.2. (#189 )	2017-01-08 17:30:06 -08:00
Wapaul1	0ac2abee51	Added helper class for getting tf variables from loss function (#184 ) * Added helper class for getting tf variables from loss function * Updated usage and documentation * Removed try-catches * Added futures * Added documentation * fixes and tests * more tests * install tensorflow in travis	2017-01-07 01:54:11 -08:00
Robert Nishihara	651aa6007a	Log profiling information from worker. (#178 ) * Log timing events on workers. * Have workers log to the event log through the local scheduler. * Fixes and address comments. * bug fix * styling	2017-01-05 16:47:16 -08:00
Robert Nishihara	509685d240	Let the worker know about remote functions that failed to unpickle. (#175 ) * Let the worker know about remote functions that failed to unpickle. * Cleanup.	2017-01-03 18:41:03 -08:00
Stephanie Wang	c403ab11ab	Allow ray.init to take in address information about existing services. (#161 ) * Refactor ray.init and ray.services to allow processes that are already running * Fix indexing error * Address Robert's comments	2016-12-28 14:17:29 -08:00
Robert Nishihara	10e067e5e5	Delay releasing a maximum number of bytes in the plasma client. (#160 ) * Send message from plasma client to get plasma store capacity. * Release objects from plasma client if they are too large. * Use doubly-linked list instead of ring buffer for plasma client release history. * Address comments. * Fix problem with slicing PlasmaBuffer objects. * Fix crash in plasma manager during transfer. * Formatting. * Make plasma client cache larger and make caching test not throw exceptions on Travis.	2016-12-27 19:51:26 -08:00
Robert Nishihara	8d90c9f432	Experimental utils for copying directories to other machines in the c… (#150 ) * Experimental utils for copying directories to other machines in the cluster using Ray. * Test copying directory functionality. * Small fix.	2016-12-23 00:43:16 -08:00
Robert Nishihara	86b211f5c2	Allow run_function_on_all_workers to take a worker_info dictionary including a counter. (#149 ) * Suppress Redis warnings and remove some global scheduler logging. * Pass a counter into run_function_on_all_workers indicating how many workers have begun executing this function.	2016-12-22 22:05:58 -08:00
Robert Nishihara	79dd1815a2	Python 3 compatibility. (#121 ) * Make common module Python 3 compatible. * Make plasma module Python 3 compatible. * Make photon module Python 3 compatible. * Make numbuf module Python 3 compatible. * Remaining changes for Python 3 compatibility. * Test Python 3 in Travis. * Fixes.	2016-12-16 14:40:37 -08:00
Robert Nishihara	ddba1df802	Start working toward Python3 compatibility. (#117 )	2016-12-11 12:25:31 -08:00
Robert Nishihara	86973059de	Switch to new wait implementation. (#113 ) * Duplicate wait1 implementation and separate out wait data structures. * Address Philipp's comments. * Temporarily address test failure problem by increasing timeout and reducing load in tests. * Update stress tests to include distributed wait.	2016-12-09 19:26:11 -08:00
Robert Nishihara	6441571d31	Introduce some stress tests. (#106 ) * Retry first connection to redis in db_connect. * Declare usleep. * Formatting. * Introduce some stress tests.	2016-12-09 17:49:31 -08:00
Robert Nishihara	b3c05655a0	Enable fetching objects from remote object stores. (#87 ) * Fetch missing dependencies from local scheduler. * Factor out global scheduler policy state. * Use object_table_subscribe instead of object_table_lookup. * Fix bug in which timer was being created twice for a single fetch request. * Free old manager vector.	2016-12-06 15:47:31 -08:00
Philipp Moritz	58e8bbcb34	Fix bug in serializing arguments of tasks that are more complex objects (#72 ) * Give more informative error message when we do not know how to serialize a class. * Check that passing arguments to remote functions and getting them does not change their values. * fix serialization bug * fix tests for common module * Formatting. * Bug fix in init_pickle_module signature. * Use pickle with HIGHEST_PROTOCOL.	2016-11-30 23:21:53 -08:00
Robert Nishihara	d77b685a90	Global scheduler skeleton (#45 ) * Initial scheduler commit * global scheduler * add global scheduler * Implement global scheduler skeleton. * Formatting. * Allow local scheduler to be started without a connection to redis so that we can test it without a global scheduler. * Fail if there are no local schedulers when the global scheduler receives a task. * Initialize uninitialized value and formatting fix. * Generalize local scheduler table to db client table. * Remove code duplication in local scheduler and add flag for whether a task came from the global scheduler or not. * Queue task specs in the local scheduler instead of tasks. * Simple global scheduler tests, including valgrind. * Factor out functions for starting processes. * Fixes.	2016-11-18 19:57:51 -08:00
Robert Nishihara	336a904404	Implement repr, hash, and richcompare for ObjectIDs. (#33 ) * Implement repr, hash, and richcompare for ObjectIDs. * Addressing comments. * Partially fix example applications.	2016-11-11 09:18:36 -08:00
Robert Nishihara	90f88af902	Fix bug in which worker import counters were treated incorrectly. (#28 ) * Fix bug in which worker import counters were treated incorrectly. * Fix bug in which cached functions-to-run were double counted as exports. This also runs the functions-to-run on the driver only after ray.init is called. * Only define reusable variables locally after ray.init has been called. * Remove flaky reference counting tests. It's not clear that these tests make sense. * Make numbuf pip install verbose. * Export cached reusable variables before cached remote functions. * Fix bug causing the worker to hang sometimes. This happens when the worker is trying to run a task, but it hasn't imported enough imports to run the task, so it continually acquires and releases a lock while checking if it has enough imports. However, for some reason, the import thread is waiting to acquire the same lock and never does so (or takes a very long time to do so). By dropping the lock before sleeping, this makes it easier for other threads to acquire the lock. * Acquire locks using 'with' statements. * Fix possible test failure. * Try to start Redis multiple times with different random ports if the original attempt failed. * Fix test in which we redefine a remote function.	2016-11-06 22:24:39 -08:00
Philipp Moritz	1147c4d34b	Keep objects in cache between tasks (#29 ) * fix caching behavior * fixes	2016-11-06 17:31:14 -08:00
Robert Nishihara	072f442c1f	Update worker.py and services.py to use plasma and the local scheduler. (#19 ) * Update worker code and services code to use plasma and the local scheduler. * Cleanups. * Fix bug in which threads were started before the worker mode was set. This caused remote functions to be defined on workers before the worker knew it was in WORKER_MODE. * Fix bug in install-dependencies.sh. * Lengthen timeout in failure_test.py. * Cleanups. * Cleanup services.start_ray_local. * Clean up random name generation. * Cleanups.	2016-11-02 00:39:35 -07:00
Robert Nishihara	09a3ff7173	Pip install numbuf. (#8 )	2016-10-28 14:30:20 -07:00
Robert Nishihara	0a44145906	Fix the resetting of reusable variables on the driver and cache functions to run on all workers. (#446 ) * Properly reset reusable variables on the driver when remote functions are run locally on the driver. * Cache functions to run on all workers that occur before ray.init is called.	2016-10-12 22:17:22 -07:00
Robert Nishihara	9a6991116f	Small fix in test. (#441 )	2016-09-25 23:08:27 -07:00
Robert Nishihara	de6ec47f9e	Add a recursion depth for serialization to prevent infinite loops. (#440 )	2016-09-19 17:17:42 -07:00
Robert Nishihara	91f16a3df0	Migrate repositories to ray-project. (#438 ) * Migrate repositories to ray-project. * Update numbuf to the migrated version.	2016-09-17 00:52:05 -07:00
Robert Nishihara	1aa89a4ae6	Update numbuf to properly handle Python floats. (#435 )	2016-09-15 15:44:11 -07:00
Wapaul1	d5815673a5	Changed ray.select() to ray.wait() and its functionality (#426 ) * Re-implemented select, changed name to wait * Changed tests for select to tests for wait * Updated the hyperopt example to match wait * Small fixes and improve example readme. * Make tests pass.	2016-09-14 17:14:11 -07:00
Robert Nishihara	3b47a15ebd	Fix naming in tests. (#424 )	2016-09-10 21:12:09 -07:00
Robert Nishihara	ba56b08474	Reintroduce passing arguments by value to remote functions. (#425 ) * Reintroduce passing arguments by value to remote functions. * Check size of arguments passed by value. * Fix computation graph visualization.	2016-09-10 21:11:18 -07:00
Robert Nishihara	0191d42751	Check in runtest.py that the correct version of cloudpickle is installed. (#421 )	2016-09-09 16:46:18 -07:00
Robert Nishihara	11a8914684	Allow users to serialize custom classes. (#393 ) * Allow serialization of custom classes. * Add documentation and test cases, also fix pickle case. * Don't allow old-style classes.	2016-09-06 13:28:24 -07:00
Robert Nishihara	d5cb3ac090	Propagate error messages from functions that run on all workers. (#410 )	2016-09-06 10:06:43 -07:00
Robert Nishihara	327d7ff689	Fix bug to enable calling ray.get multiple times on same ObjectID. (#409 )	2016-09-04 13:32:55 -07:00
Philipp Moritz	68cec55a98	Refcount without modifying objects (#407 ) * refcount without modifying objects * add documentation * Update tests and documentation. * Remove extraneous code. * Update numbuf version.	2016-09-04 12:07:52 -07:00
Robert Nishihara	81f40774a7	Remove ObjectID aliasing from the API. (#406 ) * Remove ObjectID aliasing from the API. * Update documentation to remove aliasing.	2016-09-03 19:34:45 -07:00
Philipp Moritz	3548797202	[API] Implement get for multiple objects (#398 ) * [API] Implement get for multiple objects * Small fixes.	2016-09-02 18:02:44 -07:00

358 commits