hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-06 10:31:39 -05:00

Author	SHA1	Message	Date
Robert Nishihara	b98a63fd3a	Change get to take a timeout and multiple object IDs. (#212 ) * Change plasma_get to take a timeout and an array of object IDs. * Address comments. * Bug fix related to computing object hashes. * Add test. * Fix file descriptor leak. * Fix valgrind. * Formatting. * Remove call to plasma_contains from the plasma client. Use timeout internally in ray.get. * small fixes	2017-01-19 12:21:12 -08:00
Wapaul1	6fe69bec11	Selects from all variables now independent of graph, and uses standar… (#199 ) * Smarter variable retrieval and doc update * doc update and small fixes * addressing robert's comments	2017-01-18 17:36:58 -08:00
Robert Nishihara	303d0fed3e	Prevent plasma store and manager from dying when a client dies. (#203 ) * Prevent plasma store and manager from dying when a worker dies. * Check errno inside of warn_if_sigpipe. Passing in errno doesn't work because the arguments to warn_if_sigpipe can be evaluated out of order.	2017-01-17 20:34:31 -08:00
Robert Nishihara	87d8d05792	Rename reusable variables -> environment variables. (#195 )	2017-01-10 20:14:33 -08:00
Wapaul1	aaf3be3c53	Fixed lbfgs for ray-cluster (#180 ) * Updated lbfgs example to include TensorflowVariables * Whitespace.	2017-01-10 18:40:06 -08:00
Robert Nishihara	be4a37bf37	Various cleanups: remove start_ray_local from ray.init, remove unused code, fix "pip install numbuf". (#193 ) * Remove start_ray_local from ray.init and change default number of workers to 10. * Remove alexnet example. * Move array methods to experimental. * Remove TRPO example. * Remove old files. * Compile plasma when we build numbuf. * Address comments.	2017-01-10 17:35:27 -08:00
Robert Nishihara	973716d310	Use cloudpickle 0.2.2. (#189 )	2017-01-08 17:30:06 -08:00
Wapaul1	0ac2abee51	Added helper class for getting tf variables from loss function (#184 ) * Added helper class for getting tf variables from loss function * Updated usage and documentation * Removed try-catches * Added futures * Added documentation * fixes and tests * more tests * install tensorflow in travis	2017-01-07 01:54:11 -08:00
Robert Nishihara	651aa6007a	Log profiling information from worker. (#178 ) * Log timing events on workers. * Have workers log to the event log through the local scheduler. * Fixes and address comments. * bug fix * styling	2017-01-05 16:47:16 -08:00
Robert Nishihara	509685d240	Let the worker know about remote functions that failed to unpickle. (#175 ) * Let the worker know about remote functions that failed to unpickle. * Cleanup.	2017-01-03 18:41:03 -08:00
Stephanie Wang	c403ab11ab	Allow ray.init to take in address information about existing services. (#161 ) * Refactor ray.init and ray.services to allow processes that are already running * Fix indexing error * Address Robert's comments	2016-12-28 14:17:29 -08:00
Robert Nishihara	10e067e5e5	Delay releasing a maximum number of bytes in the plasma client. (#160 ) * Send message from plasma client to get plasma store capacity. * Release objects from plasma client if they are too large. * Use doubly-linked list instead of ring buffer for plasma client release history. * Address comments. * Fix problem with slicing PlasmaBuffer objects. * Fix crash in plasma manager during transfer. * Formatting. * Make plasma client cache larger and make caching test not throw exceptions on Travis.	2016-12-27 19:51:26 -08:00
Robert Nishihara	8d90c9f432	Experimental utils for copying directories to other machines in the c… (#150 ) * Experimental utils for copying directories to other machines in the cluster using Ray. * Test copying directory functionality. * Small fix.	2016-12-23 00:43:16 -08:00
Robert Nishihara	86b211f5c2	Give run_function_on_all_workers to take a worker_info dictionary including a counter. (#149 ) * Suppress Redis warnings and remove some global scheduler logging. * Pass a counter into run_function_on_all_workers indicating how many workers have begun executing this function.	2016-12-22 22:05:58 -08:00
Robert Nishihara	79dd1815a2	Python 3 compatibility. (#121 ) * Make common module Python 3 compatible. * Make plasma module Python 3 compatible. * Make photon module Python 3 compatible. * Make numbuf module Python 3 compatible. * Remaining changes for Python 3 compatibility. * Test Python 3 in Travis. * Fixes.	2016-12-16 14:40:37 -08:00
Robert Nishihara	ddba1df802	Start working toward Python3 compatibility. (#117 )	2016-12-11 12:25:31 -08:00
Robert Nishihara	86973059de	Switch to new wait implementation. (#113 ) * Duplicate wait1 implementation and separate out wait data structures. * Address Philipp's comments. * Temporarily address test failure problem by increasing timeout and reducing load in tests. * Update stress tests to include distributed wait.	2016-12-09 19:26:11 -08:00
Robert Nishihara	6441571d31	Introduce some stress tests. (#106 ) * Retry first connection to redis in db_connect. * Declare usleep. * Formatting. * Introduce some stress tests.	2016-12-09 17:49:31 -08:00
Robert Nishihara	b3c05655a0	Enable fetching objects from remote object stores. (#87 ) * Fetch missing dependencies from local scheduler. * Factor out global scheduler policy state. * Use object_table_subscribe instead of object_table_lookup. * Fix bug in which timer was being created twice for a single fetch request. * Free old manager vector.	2016-12-06 15:47:31 -08:00
Philipp Moritz	58e8bbcb34	Fix bug in serializing arguments of tasks that are more complex objects (#72 ) * Give more informative error message when we do not know how to serialize a class. * Check that passing arguments to remote functions and getting them does not change their values. * fix serialization bug * fix tests for common module * Formatting. * Bug fix in init_pickle_module signature. * Use pickle with HIGHEST_PROTOCOL.	2016-11-30 23:21:53 -08:00
Robert Nishihara	d77b685a90	Global scheduler skeleton (#45 ) * Initial scheduler commit * global scheduler * add global scheduler * Implement global scheduler skeleton. * Formatting. * Allow local scheduler to be started without a connection to redis so that we can test it without a global scheduler. * Fail if there are no local schedulers when the global scheduler receives a task. * Initialize uninitialized value and formatting fix. * Generalize local scheduler table to db client table. * Remove code duplication in local scheduler and add flag for whether a task came from the global scheduler or not. * Queue task specs in the local scheduler instead of tasks. * Simple global scheduler tests, including valgrind. * Factor out functions for starting processes. * Fixes.	2016-11-18 19:57:51 -08:00
Robert Nishihara	336a904404	Implement repr, hash, and richcompare for ObjectIDs. (#33 ) * Implement repr, hash, and richcompare for ObjectIDs. * Addressing comments. * Partially fix example applications.	2016-11-11 09:18:36 -08:00
Robert Nishihara	90f88af902	Fix bug in which worker import counters were treated incorrectly. (#28 ) * Fix bug in which worker import counters were treated incorrectly. * Fix bug in which cached functions-to-run were double counted as exports. This also runs the functions-to-run on the driver only after ray.init is called. * Only define reusable variables locally after ray.init has been called. * Remove flaky reference counting tests. It's not clear that these tests make sense. * Make numbuf pip install verbose. * Export cached reusable variables before cached remote functions. * Fix bug causing the worker to hang sometimes. This happens when the worker is trying to run a task, but it hasn't imported enough imports to run the task, so it continually acquires and releases a lock while checking if it has enough imports. However, for some reason, the import thread is waiting to acquire the same lock and never does so (or takes a very long time to do so). By dropping the lock before sleeping, this makes it easier for other threads to acquire the lock. * Acquire locks using 'with' statements. * Fix possible test failure. * Try to start Redis multiple times with different random ports if the original attempt failed. * Fix test in which we redefine a remote function.	2016-11-06 22:24:39 -08:00
Philipp Moritz	1147c4d34b	Keep objects in cache between tasks (#29 ) * fix caching behavior * fixes	2016-11-06 17:31:14 -08:00
Robert Nishihara	072f442c1f	Update worker.py and services.py to use plasma and the local scheduler. (#19 ) * Update worker code and services code to use plasma and the local scheduler. * Cleanups. * Fix bug in which threads were started before the worker mode was set. This caused remote functions to be defined on workers before the worker knew it was in WORKER_MODE. * Fix bug in install-dependencies.sh. * Lengthen timeout in failure_test.py. * Cleanups. * Cleanup services.start_ray_local. * Clean up random name generation. * Cleanups.	2016-11-02 00:39:35 -07:00
Robert Nishihara	09a3ff7173	Pip install numbuf. (#8 )	2016-10-28 14:30:20 -07:00
Robert Nishihara	0a44145906	Fix the resetting of reusable variables on the driver and cache functions to run on all workers. (#446 ) * Properly reset reusable variables on the driver when remote functions are run locally on the driver. * Cache functions to run on all workers that occur before ray.init is called.	2016-10-12 22:17:22 -07:00
Robert Nishihara	9a6991116f	Small fix in test. (#441 )	2016-09-25 23:08:27 -07:00
Robert Nishihara	de6ec47f9e	Add a recursion depth for serialization to prevent infinite loops. (#440 )	2016-09-19 17:17:42 -07:00
Robert Nishihara	91f16a3df0	Migrate repositories to ray-project. (#438 ) * Migrate repositories to ray-project. * Update numbuf to the migrated version.	2016-09-17 00:52:05 -07:00
Robert Nishihara	1aa89a4ae6	Update numbuf to properly handle Python floats. (#435 )	2016-09-15 15:44:11 -07:00
Wapaul1	d5815673a5	Changed ray.select() to ray.wait() and its functionality (#426 ) * Re-implemented select, changed name to wait * Changed tests for select to tests for wait * Updated the hyperopt example to match wait * Small fixes and improve example readme. * Make tests pass.	2016-09-14 17:14:11 -07:00
Robert Nishihara	3b47a15ebd	Fix naming in tests. (#424 )	2016-09-10 21:12:09 -07:00
Robert Nishihara	ba56b08474	Reintroduce passing arguments by value to remote functions. (#425 ) * Reintroduce passing arguments by value to remote functions. * Check size of arguments passed by value. * Fix computation graph visualization.	2016-09-10 21:11:18 -07:00
Robert Nishihara	0191d42751	Check in runtest.py that the correct version of cloudpickle is installed. (#421 )	2016-09-09 16:46:18 -07:00
Robert Nishihara	11a8914684	Allow users to serialize custom classes. (#393 ) * Allow serialization of custom classes. * Add documentation and test cases, also fix pickle case. * Don't allow old-style classes.	2016-09-06 13:28:24 -07:00
Robert Nishihara	d5cb3ac090	Propagate error messages from functions that run on all workers. (#410 )	2016-09-06 10:06:43 -07:00
Robert Nishihara	327d7ff689	Fix bug to enable calling ray.get multiple times on same ObjectID. (#409 )	2016-09-04 13:32:55 -07:00
Philipp Moritz	68cec55a98	Refcount without modifying objects (#407 ) * refcount without modifying objects * add documentation * Update tests and documentation. * Remove extraneous code. * Update numbuf version.	2016-09-04 12:07:52 -07:00
Robert Nishihara	81f40774a7	Remove ObjectID aliasing from the API. (#406 ) * Remove ObjectID aliasing from the API. * Update documentation to remove aliasing.	2016-09-03 19:34:45 -07:00
Philipp Moritz	3548797202	[API] Implement get for multiple objects (#398 ) * [API] Implement get for multiple objects * Small fixes.	2016-09-02 18:02:44 -07:00
Robert Nishihara	fb7ccef493	Allow remote decorator to be used with no parentheses.	2016-08-30 16:38:26 -07:00
Robert Nishihara	ce4e5ec544	Fix failure_test.py.	2016-08-29 22:52:13 -07:00
Robert Nishihara	d7f313a026	Remove type information from remote decorator.	2016-08-29 22:05:59 -07:00
Philipp Moritz	93e6c9947b	update numbuf (#392 ) * update numbuf * Augment serialization tests.	2016-08-25 20:05:48 -07:00
Wapaul1	420bcc0477	Remote function returning non-serializable type no longer shuts worker down (#384 ) * Moved put_objects in main_loop to inside of try block * Added test for failed serialization * Fixed naming * Minor	2016-08-25 15:26:22 -07:00
Robert Nishihara	314bc9e980	Test blocking behavior of select. (#379 )	2016-08-16 14:54:54 -07:00
Robert Nishihara	e06311d415	Automatically add relevant directories to Python paths of workers (#380 ) * Make ray.init set python paths of workers. * Decouple starting cluster from copying user source code * also add current directory to path * Add comments about deallocation. * Add test for new code path.	2016-08-16 14:53:55 -07:00
Wapaul1	7246013008	Implement select to enable waiting for a specific number of remote objects to be ready. (#369 )	2016-08-15 16:51:59 -07:00
Robert Nishihara	87bb7a8f67	[WIP] Large changes to make the tests pass. (#376 ) * Revert "Make tests more informative (#372)" This reverts commit `fd353250c8`. * fix bugs, in particular deactivate worker service on driver and remove condition variables * changes to minimize the changes in this PR * switch from faulty mutex synchronization to using atomics * Increase the default size of the message queues, to accommodate exporting large numbers of remote functions. This is a temporary fix, but not a long term solution. * Reorganize the scheduler export code to queue up exports. This does not solve the underlying problem yet, but sets up a solution. * Start a separate thread on driver to print error messages by constantly querying the scheduler. This is a temporary solution because the solution based on starting a worker service for the driver which the scheduler can push error messages to is buggy. * Fix segfault in taskcapsule destructor. * Move tests for catching errors into a separate test file. * Revert "roll back grpc (#368)" This reverts commit `c01ef95d04`.	2016-08-15 11:02:54 -07:00

249 commits