hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-08 11:31:40 -05:00

Author	SHA1	Message	Date
gycn	a432285e77	Disable parallelization for Actors and ray.wait for debugging (#961) Support actors and ray.wait in PYTHON_MODE.	2017-09-17 00:12:50 -07:00
Eric Liang	d8aa826e63	[webui] Scalability fixes for the task timeline and visualizations (#935) * fixes * comments * fix test * Update ui.py * upd * Fix linting.	2017-09-10 15:47:44 -07:00
Alexey Tumanov	fc885bd918	Adding basic support for a user-interpretable resource label (#761) * adding support for the user-interpretable label(UIR) * more plumbing for num_uirs further upstream; set to infty when specified on cmd line * pass default num_uirs for actors; update GlobalStateAPI * support num_uirs in ray.init() * local scheduler resource accounting: support num_uirs; prep for vectorized resource accounting * global scheduler test updated * Fix bug introduced by rebase. * Rename UIR -> CustomResource and add test. * Small changes and use constexpr instead of macros. * Linting and some renaming. * Reorder some code. * Remove cpus_in_use and fix bug. * Add another test and make a small change. * Rephrase documentation about feature stability.	2017-08-08 02:53:59 -07:00
Robert Nishihara	d7b10a84b6	Fallback to custom serializer for very long python ints. (#821) * Fallback to custom serializer for very long python ints. * Fix linting. * Fix naming convention and add RETURN_NOT_OK.	2017-08-07 17:21:06 -07:00
Robert Nishihara	1fe49d7676	Simplify testMultipleLocalSchedulers by having it start only one worker. (#789)	2017-07-31 17:44:45 -07:00
alanamarzoev	2b3190ad13	Chrome trace timeline with sliders. (#731) * Trace timeline with sliders. * Trace. * Switched ujson to json. * Fixed tests. * linting fixes * Fixed bug. * Cleaned up code. * Fixes according to comments. * removed checkpoints. * Undid accidental delete. * Fixed linting error. * Added documentation to notebook. * Undid accidental deletes. * Add comments and small formatting fixes. * Small fix.	2017-07-17 19:59:49 -07:00
Robert Nishihara	e0867c8845	Switch Python indentation from 2 spaces to 4 spaces. (#726) * 4 space indentation for actor.py. * 4 space indentation for worker.py. * 4 space indentation for more files. * 4 space indentation for some test files. * Check indentation in Travis. * 4 space indentation for some rl files. * Fix failure test. * Fix multi_node_test. * 4 space indentation for more files. * 4 space indentation for remaining files. * Fixes.	2017-07-13 21:53:57 +00:00
alanamarzoev	8464d77c76	Change event logs to store one Redis ZSET per worker. (#705) * Changing to zset * Fixed bug. * Fixed another bug. * Modified task_profiles. * Removed extra file. * Modified task_profiles test. * WIP * WIP * Undid changes * Updated * WIP * Made changes according to comments. * Removed unneeded print. * Removed ujson usage. * failing test * tests passing * Fixed linting errors and modified style. * Fixed bug. * Fixed linting * Fixed according to comments. * Redis crashing? * Fixed linting * Fixed linting	2017-07-09 01:42:29 +02:00
alanamarzoev	716469160e	Enable dumping profiling information to timeline format viewable by chrome tracing. (#703) * Chrome tracing timeline. * Modified decode statement. * Some cleanups and add test. * Remove example. * Fix test.	2017-06-30 12:14:11 -04:00
alanamarzoev	e16df6da9a	Updated task_profiles function to avoid future repetitive parsing. (#691) * Updated task_profiles function to avoid future repetitive parsing. * Fix indentation. * Fixed according to comments. * Included updated test for task_profiles function. * Simplify test. * Fix indentation. * Fix.	2017-06-22 19:21:18 -07:00
alanamarzoev	cc4990b543	Task profiles function and test (#647) Expose some task profiling information through global state API.	2017-06-13 17:53:34 -07:00
Philipp Moritz	54925996ca	Allow remote functions to specify max executions and kill worker once limit is reached. (#660) * implement restarting workers after certain number of task executions * Clean up python code. * Don't start new worker when an actor disconnects. * Move wait_for_pid_to_exit to test_utils.py. * Add test. * Fix linting errors. * Fix linting. * Fix typo.	2017-06-13 00:34:58 -07:00
alanamarzoev	f0339f3386	Expose log files through global state API. (#641) * added log_table function and a test * fixed log_files and added task_profiles * fixed formatting * fixed linting errors * fixes * removed file * more fixes * hopefully fixed * Small changes. * Fix linting. * Fix bug in log monitor. * Small changes. * Fix bug in travis.	2017-06-08 00:08:10 -07:00
Philipp Moritz	647e1d9fc3	Fix runtest.py on the ubuntu system python 3 (#599) * fix runtest.py on the ubuntu system python 3 * less strict version of the test	2017-05-26 15:22:36 -07:00
Robert Nishihara	c5bc76193f	Remove Ray environment variables from codebase. (#590)	2017-05-24 18:29:40 -07:00
Stephanie Wang	ee08c8274b	Shard Redis. (#539) * Implement sharding in the Ray core * Single node Python modifications to do sharding * Do the sharding in redis.cc * Pipe num_redis_shards through start_ray.py and worker.py. * Use multiple redis shards in multinode tests. * first steps for sharding ray.global_state * Fix problem in multinode docker test. * fix runtest.py * fix some tests * fix redis shard startup * fix redis sharding * fix * fix bug introduced by the map-iterator being consumed * fix sharding bug * shard event table * update number of Redis clients to be 64K * Fix object table tests by flushing shards in between unit tests * Fix local scheduler tests * Documentation * Register shard locations in the primary shard * Add plasma unit tests back to build * lint * lint and fix build * Fix * Address Robert's comments * Refactor start_ray_processes to start Redis shard * lint * Fix global scheduler python tests * Fix redis module test * Fix plasma test * Fix component failure test * Fix local scheduler test * Fix runtest.py * Fix global scheduler test for python3 * Fix task_table_test_and_update bug, from actor task table submission race * Fix jenkins tests. * Retry Redis shard connections * Fix test cases * Convert database clients to DBClient struct * Fix race condition when subscribing to db client table * Remove unused lines, add APITest for sharded Ray * Fix * Fix memory leak * Suppress ReconstructionTests output * Suppress output for APITestSharded * Reissue task table add/update commands if initial command does not publish to any subscribers. * fix * Fix linting. * fix tests * fix linting * fix python test * fix linting	2017-05-18 17:40:41 -07:00
Philipp Moritz	28f0882387	Expose function table to python global control state API (#542) * expose function table to python global control state API * fix * fix linting * add test for function table	2017-05-16 20:06:13 -07:00
Robert Nishihara	ec2534422b	Remove register_class from API. (#550) * Perform ray.register_class under the hood. * Fix bug. * Release worker lock when waiting for imports to arrive in get. * Remove calls to register_class from examples and tests. * Clear serialization state between tests. * Fix bug and add test for multiple custom classes with same name. * Fix failure test. * Fix linting and cleanups to python code. * Fixes to documentation. * Implement recursion depth for recursively registering classes. * Fix linting. * Push warning to user if waiting for class for too long. * Fix typos. * Don't export FunctionToRun if pickling the function fails. * Don't broadcast class definition when pickling class.	2017-05-16 18:38:52 -07:00
Eric Liang	e2e9e4ce6f	Fix segmentation fault when calling ray.put on a dictionary with object keys (#548) * fix segfault when serializing dict key * fix style * fix test * Fix linting.	2017-05-15 01:09:13 -07:00
Robert Nishihara	c688a64235	Expose GPU IDs to remote functions. (#496) * Change local scheduler bookkeeping to use GPU IDs. * Update actor test. * Add tests for actors and tasks simultaneously using GPUs. * Add additional task GPU ID test. * Fix linting. * Make redis GPU assignment ignore GPU IDs. * Small fix.	2017-05-07 13:03:49 -07:00
Robert Nishihara	8532ba4272	Serialize lambdas, sets, and types with pickle by default. (#511) * Serialize lambdas with pickle by default. * Serialize sets with pickle by default. * Serialize types with pickle by default. * Small update to documentation. * Update tests.	2017-05-04 00:16:35 -07:00
Robert Nishihara	0ac125e9b2	Clean up when a driver disconnects. (#462) * Clean up state when drivers exit. * Remove unnecessary field in ActorMapEntry struct. * Have monitor release GPU resources in Redis when driver exits. * Enable multiple drivers in multi-node tests and test driver cleanup. * Make redis GPU allocation a redis transaction and small cleanups. * Fix multi-node test. * Small cleanups. * Make global scheduler take node_ip_address so it appears in the right place in the client table. * Cleanups. * Fix linting and cleanups in local scheduler. * Fix removed_driver_test. * Fix bug related to vector -> list. * Fix linting. * Cleanup. * Fix multi node tests. * Fix jenkins tests. * Add another multi node test with many drivers. * Fix linting. * Make the actor creation notification a flatbuffer message. * Revert "Make the actor creation notification a flatbuffer message." This reverts commit af99099c8084dbf9177fb4e34c0c9b1a12c78f39. * Add comment explaining flatbuffer problems.	2017-04-24 18:10:21 -07:00
Philipp Moritz	8ac6c59931	Remove n^2 algorithm in plasma get (#466) Remove n^2 algorithm in plasma get.	2017-04-17 23:37:33 -07:00
Richard Liaw	c3a2505ffd	Loadbalancing Test issue (#452) * Limiting number of CPUs in loadbalancing test * fixes as requested	2017-04-11 22:33:58 -07:00
Robert Nishihara	f4c1adae17	Unify function signature handling between remote functions and actor … (#441) * Unify function signature handling between remote functions and actor methods. * Fixes. * Fix tests.	2017-04-08 21:34:13 -07:00
Robert Nishihara	7af6f462fb	Add API for querying global control state. (#431) * Add API for querying global control state. * Fix linting. * Fix errors in Python 2. * Fix bug in test. * Fix bug in test.	2017-04-06 23:51:12 -07:00
Robert Nishihara	ba02fc0eb0	Run flake8 in Travis and make code PEP8 compliant. (#387)	2017-03-21 12:57:54 -07:00
Robert Nishihara	3b7788bf88	Disallow calling ray.put on an object ID. (#353)	2017-03-11 12:09:28 -08:00
Stephanie Wang	a0dd3a44c0	Dynamically grow worker pool to partially solve hanging workloads (#286) * First pass at a policy to solve deadlock * Address Robert's comments * stress test * unit test * Fix test cases * Fix test for python3 * add more logging * White space.	2017-02-17 17:08:52 -08:00
Robert Nishihara	88a5b4e77b	Simplify imports and exports and provide driver isolation for remote functions. (#288) * Remove import counter and export counter. * Provide isolation between drivers for remote functions. * Add test for driver function isolation. * Hash source code into function ID to reduce likelihood of collisions. * Fix failure test example. * Replace assertTrue with assertIn to improve failure messages in tests. * Fix failure test.	2017-02-16 11:30:35 -08:00
Philipp Moritz	12a68e84d2	Implement a first pass at actors in the API. (#242) * Implement actor field for tasks * Implement actor management in local scheduler. * initial python frontend for actors * import actors on worker * IPython code completion and tests * prepare creating actors through local schedulers * add actor id to PyTask * submit actor calls to local scheduler * starting to integrate * simple fix * Fixes from rebasing. * more work on python actors * Improve local scheduler actor handlers. * Pass actor ID to local scheduler when connecting a client. * first working version of actors * fixing actors * fix creating two copies of the same actor * fix actors * remove sleep * get rid of export synchronization * update * insert actor methods into the queue in the right order * remove print statements * make it compile again after rebase * Minor updates. * fix python actor ids * Pass actor_id to start_worker. * add test * Minor changes. * Update actor tests. * Temporary plan for import counter. * Temporarily fix import counters. * Fix some tests. * Fixes. * Make actor creation non-blocking. * Fix test? * Fix actors on Python 2. * fix rare case. * Fix python 2 test. * More tests. * Small fixes. * Linting. * Revert tensorflow version to 0.12.0 temporarily. * Small fix. * Enhance inheritance test.	2017-02-15 00:10:05 -08:00
Robert Nishihara	072eadd57f	Pipe num_cpus and num_gpus through from start_ray.py. (#275) * Pipe num_cpus and num_gpus through from start_ray.py. * Improve load balancing tests. * Fix bug. * Factor out some testing code.	2017-02-13 17:43:23 -08:00
Stephanie Wang	2b8e6485e3	Start and clean up workers from the local scheduler. (#250) * Start and clean up workers from the local scheduler Ability to kill workers in photon scheduler Test for old method of starting workers Common codepath for killing workers Common codepath for killing workers Photon test case for starting and killing workers fix build Fix component failure test Register a worker's pid as part of initial connection Address comments and revert photon_connect Set PATH during travis install Fix * Fix photon test case to accept clients on plasma manager fd	2017-02-10 12:46:23 -08:00
Robert Nishihara	249b667b0e	Raise exception in Python if wait is called with duplicate object IDs. (#262)	2017-02-09 23:32:19 -08:00
Alexey Tumanov	dfb6107b22	General attribute-based heterogeneity support with hard and soft constraints (#248) * attribute-based heterogeneity-awareness in global scheduler and photon * minor post-rebase fix * photon: enforce dynamic capacity constraint on task dispatch * globalsched: cap the number of times we try to schedule a task in round robin * propagating ability to specify resource capacity to ray.init * adding resources to remote function export and fetch/register * globalsched: remove unused functions; update cached photon resource capacity (until next photon heartbeat) * Add some integration tests. * globalsched: cleanup + factor out constraint checking * lots of style * task_spec_required_resource: global refactor * clang format * clang format + comment update in photon * clang format photon comment * valgrind * reduce verbosity for Travis * Add test for scheduler load balancing. * addressing comments * refactoring global scheduler algorithm * Minor cleanups. * Linting. * Fix array_test.py and linting. * valgrind fix for photon tests * Attempt to fix stress tests. * fix hashmap free * fix hashmap free comment * memset photon resource vectors to 0 in case they get used before the first heartbeat * More whitespace changes. * Undo whitespace error I introduced.	2017-02-09 01:34:14 -08:00
Robert Nishihara	9bb8162621	Improvements to documentation and error messages. (#221)	2017-01-19 20:27:46 -08:00
Robert Nishihara	b98a63fd3a	Change get to take a timeout and multiple object IDs. (#212) * Change plasma_get to take a timeout and an array of object IDs. * Address comments. * Bug fix related to computing object hashes. * Add test. * Fix file descriptor leak. * Fix valgrind. * Formatting. * Remove call to plasma_contains from the plasma client. Use timeout internally in ray.get. * small fixes	2017-01-19 12:21:12 -08:00
Robert Nishihara	87d8d05792	Rename reusable variables -> environment variables. (#195)	2017-01-10 20:14:33 -08:00
Robert Nishihara	be4a37bf37	Various cleanups: remove start_ray_local from ray.init, remove unused code, fix "pip install numbuf". (#193) * Remove start_ray_local from ray.init and change default number of workers to 10. * Remove alexnet example. * Move array methods to experimental. * Remove TRPO example. * Remove old files. * Compile plasma when we build numbuf. * Address comments.	2017-01-10 17:35:27 -08:00
Robert Nishihara	973716d310	Use cloudpickle 0.2.2. (#189)	2017-01-08 17:30:06 -08:00
Robert Nishihara	651aa6007a	Log profiling information from worker. (#178) * Log timing events on workers. * Have workers log to the event log through the local scheduler. * Fixes and address comments. * bug fix * styling	2017-01-05 16:47:16 -08:00
Robert Nishihara	8d90c9f432	Experimental utils for copying directories to other machines in the c… (#150) * Experimental utils for copying directories to other machines in the cluster using Ray. * Test copying directory functionality. * Small fix.	2016-12-23 00:43:16 -08:00
Robert Nishihara	86b211f5c2	Give run_function_on_all_workers to take a worker_info dictionary including a counter. (#149) * Suppress Redis warnings and remove some global scheduler logging. * Pass a counter into run_function_on_all_workers indicating how many workers have begun executing this function.	2016-12-22 22:05:58 -08:00
Robert Nishihara	79dd1815a2	Python 3 compatibility. (#121) * Make common module Python 3 compatible. * Make plasma module Python 3 compatible. * Make photon module Python 3 compatible. * Make numbuf module Python 3 compatible. * Remaining changes for Python 3 compatibility. * Test Python 3 in Travis. * Fixes.	2016-12-16 14:40:37 -08:00
Robert Nishihara	ddba1df802	Start working toward Python3 compatibility. (#117)	2016-12-11 12:25:31 -08:00
Philipp Moritz	58e8bbcb34	Fix bug in serializing arguments of tasks that are more complex objects (#72) * Give more informative error message when we do not know how to serialize a class. * Check that passing arguments to remote functions and getting them does not change their values. * fix serialization bug * fix tests for common module * Formatting. * Bug fix in init_pickle_module signature. * Use pickle with HIGHEST_PROTOCOL.	2016-11-30 23:21:53 -08:00
Robert Nishihara	d77b685a90	Global scheduler skeleton (#45) * Initial scheduler commit * global scheduler * add global scheduler * Implement global scheduler skeleton. * Formatting. * Allow local scheduler to be started without a connection to redis so that we can test it without a global scheduler. * Fail if there are no local schedulers when the global scheduler receives a task. * Initialize uninitialized value and formatting fix. * Generalize local scheduler table to db client table. * Remove code duplication in local scheduler and add flag for whether a task came from the global scheduler or not. * Queue task specs in the local scheduler instead of tasks. * Simple global scheduler tests, including valgrind. * Factor out functions for starting processes. * Fixes.	2016-11-18 19:57:51 -08:00
Robert Nishihara	336a904404	Implement repr, hash, and richcompare for ObjectIDs. (#33) * Implement repr, hash, and richcompare for ObjectIDs. * Addressing comments. * Partially fix example applications.	2016-11-11 09:18:36 -08:00
Robert Nishihara	90f88af902	Fix bug in which worker import counters were treated incorrectly. (#28) * Fix bug in which worker import counters were treated incorrectly. * Fix bug in which cached functions-to-run were double counted as exports. This also runs the functions-to-run on the driver only after ray.init is called. * Only define reusable variables locally after ray.init has been called. * Remove flaky reference counting tests. It's not clear that these tests make sense. * Make numbuf pip install verbose. * Export cached reusable variables before cached remote functions. * Fix bug causing the worker to hang sometimes. This happens when the worker is trying to run a task, but it hasn't imported enough imports to run the task, so it continually acquires and releases a lock while checking if it has enough imports. However, for some reason, the import thread is waiting to acquire the same lock and never does so (or takes a very long time to do so). By dropping the lock before sleeping, this makes it easier for other threads to acquire the lock. * Acquire locks using 'with' statements. * Fix possible test failure. * Try to start Redis multiple times with different random ports if the original attempt failed. * Fix test in which we redefine a remote function.	2016-11-06 22:24:39 -08:00
Robert Nishihara	072f442c1f	Update worker.py and services.py to use plasma and the local scheduler. (#19) * Update worker code and services code to use plasma and the local scheduler. * Cleanups. * Fix bug in which threads were started before the worker mode was set. This caused remote functions to be defined on workers before the worker knew it was in WORKER_MODE. * Fix bug in install-dependencies.sh. * Lengthen timeout in failure_test.py. * Cleanups. * Cleanup services.start_ray_local. * Clean up random name generation. * Cleanups.	2016-11-02 00:39:35 -07:00

148 commits