hiro/ray

Mirror of https://github.com/vale981/ray (synced 2025-03-12 06:06:39 -04:00)

Author	SHA1	Message	Date
Robert Nishihara	2d1c980ad7	Refactor local scheduler to remove worker indices. (#245 ) * Refactor local scheduler to remove worker indices. * Change scheduling state enum to int in all function signatures. * Bug fix, don't use pointers into a resizable array. * Remove total_num_workers. * Fix tests.	2017-02-05 14:52:28 -08:00
Philipp Moritz	ca254b8689	Fix stack overflow if many objects are fetched. (#237 ) * fix stack overflow if many objects are fetched * fix other stack allocations * add tests and fix linting * address stephanie's comments * fix linting * fix tests	2017-02-04 16:49:36 -08:00
Stephanie Wang	241b539ff8	Reconstruction for evicted objects (#181 ) * First pass at reconstruction in the worker Modify reconstruction stress testing to start Plasma service before rest of Ray cluster TODO about reconstructing ray.puts Fix ray.put error for double creates Distinguish between empty entry and no entry in object table Fix test case Fix Python test Fix tests * Only call reconstruct on objects we have not yet received * Address review comments * Fix reconstruction for Python3 * remove unused code * Address Robert's comments, stress tests are crashing * Test and update the task's scheduling state to suppress duplicate reconstruction requests. * Split result table into two lookups, one for task ID and the other as a test-and-set for the task state * Fix object table tests * Fix redis module result_table_lookup test case * Multinode reconstruction tests * Fix python3 test case * rename * Use new start_redis * Remove unused code * lint * indent * Address Robert's comments * Use start_redis from ray.services in state table tests * Remove unnecessary memset	2017-02-01 19:18:46 -08:00
Robert Nishihara	f69d4aaaa7	Change fetch requests in plasma manager to use a single timer. (#234 ) * Change fetch requests in plasma manager to use a single timer. * Fix manager tests, other cleanups.	2017-02-01 12:21:52 -08:00
Robert Nishihara	6703f7be6f	Provide functionality for local scheduler to start new workers. (#230 ) * Provide functionality for local scheduler to start new workers. * Pass full command for starting new worker in to local scheduler. * Separate out configuration state of local scheduler.	2017-01-27 01:28:48 -08:00
Stephanie Wang	a5c8f28f33	Plasma subscribe (#227 ) * Use object_info as notification, not just the object_id * Add a regression test for plasma managers connecting to store after some objects have been created * Send notifications for existing objects to new plasma subscribers * Continuously try the request to the plasma manager instead of setting a timeout in the test case * Use ray.services to start Redis in plasma test cases * fix test case	2017-01-25 22:57:15 -08:00
Robert Nishihara	ab8c3432f7	Add driver ID to task spec and add driver ID to Python error handling. (#225 ) * Add driver ID to task spec and add driver ID to Python error handling. * Make constants global variables. * Add test for error isolation.	2017-01-25 22:53:48 -08:00
Stephanie Wang	3c6686db08	Photon optimizations (#219 ) * Optimizations: - Track mapping of missing object to dependent tasks to avoid iterating over task queue - Perform all fetch requests for missing objects using the same timer * Fix bug and add regression test * Record task dependencies and active fetch requests in the same hash table * fix typo * Fix memory leak and add test cases for scheduling when dependencies are evicted * Fix python3 test case * Minor details.	2017-01-23 19:44:15 -08:00
Robert Nishihara	b98a63fd3a	Change get to take a timeout and multiple object IDs. (#212 ) * Change plasma_get to take a timeout and an array of object IDs. * Address comments. * Bug fix related to computing object hashes. * Add test. * Fix file descriptor leak. * Fix valgrind. * Formatting. * Remove call to plasma_contains from the plasma client. Use timeout internally in ray.get. * small fixes	2017-01-19 12:21:12 -08:00
Robert Nishihara	677a019cbd	Remove unnecessary bookkeepping in utlist in plasma client. (#215 )	2017-01-18 23:03:08 -08:00
Stephanie Wang	f1987cdc16	Split local scheduler task queue (#211 ) * Split local scheduler task queue into waiting and dispatch queue * Fix memory leak * Add a new task scheduling status for when a task has been queued locally * Fix global scheduler test case and add task status doc * Documentation * Address Philipp's comments * Move tasks back to the waiting queue if their dependencies become unavailable * Update existing task table entries instead of overwriting	2017-01-18 20:27:40 -08:00
Robert Nishihara	303d0fed3e	Prevent plasma store and manager from dying when a client dies. (#203 ) * Prevent plasma store and manager from dying when a worker dies. * Check errno inside of warn_if_sigpipe. Passing in errno doesn't work because the arguments to warn_if_sigpipe can be evaluated out of order.	2017-01-17 20:34:31 -08:00
Philipp Moritz	7f329db4b2	wait until kill operation was successful (#210 )	2017-01-17 20:15:48 -08:00
Philipp Moritz	a708e36225	Switch build system to use CMake completely. (#200 ) * switch to CMake completely ... * cleanup * Run C tests, update installation instructions.	2017-01-17 16:56:40 -08:00
Philipp Moritz	ab3448a9b4	Plasma Optimizations (#190 ) * bypass python when storing objects into the object store * clang-format * Bug fixes. * fix include paths * Fixes. * fix bug * clang-format * fix * fix release after disconnect	2017-01-09 20:15:54 -08:00
Robert Nishihara	973716d310	Use cloudpickle 0.2.2. (#189 )	2017-01-08 17:30:06 -08:00
Alexey Tumanov	674ec3a3cb	generate pytask from string and string from pytask (#188 ) * pytask creation from bytestring: saving work * pytask now works * documentation and tests * linting * Lint and fix test case	2017-01-08 02:16:40 -08:00
Stephanie Wang	c13d73b4c9	Suppress duplicate transfer requests (#185 )	2017-01-06 22:14:51 -08:00
Robert Nishihara	651aa6007a	Log profiling information from worker. (#178 ) * Log timing events on workers. * Have workers log to the event log through the local scheduler. * Fixes and address comments. * bug fix * styling	2017-01-05 16:47:16 -08:00
Johann Schleier-Smith	b1e76e582e	Check /dev/shm on Linux (#174 ) * check available shared memory when starting object store * exit with error if not enough shared memory available for object store * Some comments and formatting.	2017-01-03 12:33:29 -08:00
Stephanie Wang	6828d694ae	Test object notifications from Plasma store (#141 ) * Object notification test for Photon, and turn on valgrind for Photon C tests * Test object notification handler in the plasma manager * Fix hanging test case	2016-12-29 23:10:38 -08:00
Robert Nishihara	acf1703afd	Implement naive scheduling algorithm using local scheduler load. (#164 ) * Implement naive scheduling algorithm using local scheduler load. * Have the global scheduler estimate load on local schedulers better. * Fixes.	2016-12-28 22:33:20 -08:00
Robert Nishihara	baf835efcd	Throw Python exception if plasma store cannot create new object. (#162 ) * Propagate error messages through plasma create. * Use custom exception types instead of exception messages.	2016-12-28 11:56:16 -08:00
Robert Nishihara	10e067e5e5	Delay releasing a maximum number of bytes in the plasma client. (#160 ) * Send message from plasma client to get plasma store capacity. * Release objects from plasma client if they are too large. * Use doubly-linked list instead of ring buffer for plasma client release history. * Address comments. * Fix problem with slicing PlasmaBuffer objects. * Fix crash in plasma manager during transfer. * Formatting. * Make plasma client cache larger and make caching test not throw exceptions on Travis.	2016-12-27 19:51:26 -08:00
Robert Nishihara	26941e02aa	Attempt to free up to 20% of the plasma store capacity during eviction. (#159 )	2016-12-27 12:12:33 -08:00
Robert Nishihara	985c424172	Use redismodules for task table and result table. (#156 ) * Switch to using redis modules for task table. * Switch to using redis modules for the task table. * Fix some tests. * Fix naming and remove code duplication. * Remove duplication in redis modules and add more cleanups. * Address comments.	2016-12-25 23:57:05 -08:00
Philipp Moritz	d6695c867a	fix wait test (#158 )	2016-12-25 23:43:01 -08:00
Philipp Moritz	8309e3f355	Redis string formatting (#157 ) * redis string formatting * fixes * add documentation * fixes	2016-12-25 22:43:07 -08:00
Robert Nishihara	3d697c7ed2	Introduce local scheduler heartbeats which carry load information. (#155 ) * Introduce local scheduler heartbeats which carry load information.	2016-12-24 20:02:25 -08:00
Robert Nishihara	9bb9f8cb54	Fix bug in ray.wait. (#153 ) * Fix bug in wait implementation. * Add test that exposes previous bug.	2016-12-23 16:22:41 -08:00
Robert Nishihara	86b211f5c2	Give run_function_on_all_workers to take a worker_info dictionary including a counter. (#149 ) * Suppress Redis warnings and remove some global scheduler logging. * Pass a counter into run_function_on_all_workers indicating how many workers have begun executing this function.	2016-12-22 22:05:58 -08:00
Alexey Tumanov	46a887039e	Global scheduler - per-task transfer-aware policy (#145 ) * global scheduler with object transfer cost awareness -- upstream rebase * debugging global scheduler: multiple subscriptions * global scheduler: utarray push bug fix; tasks change state to SCHEDULED * change global scheduler test to be an integraton test * unit and integration tests are passing for global scheduler * improve global scheduler test: break up into several * global scheduler checkpoint: fix photon object id bug in test * test with timesync between object and task notifications; TODO: handle OoO object+task notifications in GS * fallback to base policy if no object dependencies are cached (may happen due to OoO object+task notification arrivals * clean up printfs; handle a missing LS in LS cache * Minor changes to Python test and factor out some common code. * refactoring handle task waiting * addressing comments * log_info -> log_debug * Change object ID printing. * PRId64 merge * Python 3 fix. * PRId64. * Python 3 fix. * resurrect differentiation between no args and missing object info; spacing * Valgrind fix. * Run all global scheduler tests in valgrind. * clang format * Comments and documentation changes. * Minor cleanups. * fix whitespace * Fix. * Documentation fix.	2016-12-22 03:11:46 -08:00
Robert Nishihara	6cd02d71f8	Fixes and cleanups for the multinode setting. (#143 ) * Add function for driver to get address info from Redis. * Use Redis address instead of Redis port. * Configure Redis to run in unprotected mode. * Add method for starting Ray processes on non-head node. * Pass in correct node ip address to start_plasma_manager. * Script for starting Ray processes. * Handle the case where an object already exists in the store. Maybe this should also compare the object hashes. * Have driver get info from Redis when start_ray_local=False. * Fix. * Script for killing ray processes. * Catch some errors when the main_loop in a worker throws an exception. * Allow redirecting stdout and stderr to /dev/null. * Wrap start_ray.py in a shell script. * More helpful error messages. * Fixes. * Wait for redis server to start up before configuring it. * Allow seeding of deterministic object ID generation. * Small change.	2016-12-21 18:53:12 -08:00
Robert Nishihara	c9c1b3e6af	Change db_connect to allow different arguments from different processes. (#142 ) * Allow db_connect to take a variable number of arguments. * Fix tests. * Fixes. * Formatting. * Fixes. * Simplifications. * Fix typo.	2016-12-20 20:21:35 -08:00
Philipp Moritz	0ca0864856	Use flatcc for serialization of IPC messages. (#140 ) * added Phllipp's updates * Switch to using flatbuffers for IPC. * Various changes. * convert remaining messages and cleanups * fix * fix function signatures * fix valgrind errors * clang-format * final commit * Fix valgrind test.	2016-12-20 14:46:25 -08:00
Stephanie Wang	6a73711888	Update the task table (#129 ) * Update the task table * Move updating task table out of scheduling algorithm.	2016-12-20 00:13:39 -08:00
Stephanie Wang	d729f9b7ea	Object table remove (#139 ) * Object table remove redis module * Test case for object table remove redis module * Client code for object_table_remove * Delete object notifications in plasma * Test for object deletion notifications * Fix subscribe deletion test * Address Robert's comments * free hash table entry	2016-12-19 23:18:57 -08:00
Alexey Tumanov	cb3e6cde9e	passing object info information with redis module (#138 ) * adding object broadcast channel; published on each object table add * publishing data size to the bcast channel * bug fix: objectkey * update object tests to test for data size: C + py * remove debug * clang format * Minor changes. * Fix error. * merging with Robert's comments * clang format for the object table test upgrade	2016-12-19 21:07:25 -08:00
Robert Nishihara	269f37e26f	Implement object table notification subscriptions and switch to using Redis modules for object table. (#134 ) * Implement RAY.OBJECT_TABLE_REQUEST_NOTIFICATIONS. * Call object_table_request_notifications from plasma manager. * Use Redis modules for object table. * Cleaning up code. * More checks. * Formatting. * Make object table tests pass. * Formatting. * Add prefix to the object notification channel name. * Formatting. * Fixes. * Increase time in redismodule test.	2016-12-18 18:19:02 -08:00
Robert Nishihara	c89bf4e5bc	Fix improper handling of NULL characters when opening Redis keys. (#136 ) * Fix improper handling of NULL characters when opening Redis keys. * Add test.	2016-12-18 13:06:28 -08:00
Robert Nishihara	edf8d1ee9f	Fix Python3 error in tests. (#135 )	2016-12-17 12:42:37 -08:00
Stephanie Wang	e23661c375	Task table Redis module (#125 ) * Task table redis module implementation * Publish tasks and take in individual fields as args, not task object * Scheduling state integer has width 1, error on illegal put * Unit tests for task table and more documentation * Task table subscribe, fix publish topics and address Philipp and Alexey's comments * Helper function to create prefixed strings * Factor out the table prefixes in the test cases	2016-12-16 14:40:44 -08:00
Robert Nishihara	58a873eb20	Deploy Redis module and start using custom Redis commands. (#128 ) * Add RAY.CONNECT Redis command. * Add RAY.GET_CLIENT_ADDRESS command. * Build and clean Redis in common Makefile. * Use custom Redis module in Ray and use custom CONNECT and GET_CLIENT_ADDRESS commands. * Fixes. * Remove mapping from redis client ID to ray db client ID. * Fix.	2016-12-16 14:40:44 -08:00
Stephanie Wang	b0ba54e4c0	Fix psubscribe bug in object_table_subscribe (#126 ) * Fix psubscribe * Add TODO about subscription callbacks	2016-12-16 14:40:44 -08:00
Robert Nishihara	79dd1815a2	Python 3 compatibility. (#121 ) * Make common module Python 3 compatible. * Make plasma module Python 3 compatible. * Make photon module Python 3 compatible. * Make numbuf module Python 3 compatible. * Remaining changes for Python 3 compatibility. * Test Python 3 in Travis. * Fixes.	2016-12-16 14:40:37 -08:00
Alexey Tumanov	946242929f	Plasma photon association: passing through plasma address with photon db connection (#123 ) * passing plasma ip:port association with photon through redis to global scheduler * Fix test. * sanity-checking aux_address inside db_connect_extended * clang format * fix photon tests * clang format photon tests	2016-12-13 17:21:38 -08:00
Robert Nishihara	bce7e0fc07	Add include for usleep. (#124 )	2016-12-13 14:24:59 -08:00
Philipp Moritz	2152cd9f31	Fix seed bug for generating object ids for put (#120 ) * fix seed bug for generating object ids for put * fix clang-format	2016-12-13 00:54:38 -08:00
Stephanie Wang	24d2b42d86	Fix object table subscriptions (#122 ) * First attempt at fixing psubscribe. psubscribe_success_test will fail * psubscribe test * SUBSCRIBE returns the number of subscriptions, not success * Comment out failing test.	2016-12-13 00:47:21 -08:00
Stephanie Wang	4bdb9f7224	Object reconstruction in Photon (#65 ) * Object reconstruction in Photon and C test cases for Photon * Fix hanging test case on mac * Remove unnecessary event from photon tests * make photon_disconnect not leak file descriptors * fix some of the memory errors * Fix valgrind * lint * Address Robert's comments and add test case for object reconstruction suppression * Remove OWNER	2016-12-12 23:17:22 -08:00

608 commits