hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-12 14:16:39 -04:00

Author	SHA1	Message	Date
Robert Nishihara	232601f90d	Change all table calls to use default retry behavior. (#312 ) * Change all table calls to use default retry behavior and change default retry behavior. * Add warning for table retries.	2017-02-24 12:41:32 -08:00
Philipp Moritz	12a68e84d2	Implement a first pass at actors in the API. (#242 ) * Implement actor field for tasks * Implement actor management in local scheduler. * initial python frontend for actors * import actors on worker * IPython code completion and tests * prepare creating actors through local schedulers * add actor id to PyTask * submit actor calls to local scheduler * starting to integrate * simple fix * Fixes from rebasing. * more work on python actors * Improve local scheduler actor handlers. * Pass actor ID to local scheduler when connecting a client. * first working version of actors * fixing actors * fix creating two copies of the same actor * fix actors * remove sleep * get rid of export synchronization * update * insert actor methods into the queue in the right order * remove print statements * make it compile again after rebase * Minor updates. * fix python actor ids * Pass actor_id to start_worker. * add test * Minor changes. * Update actor tests. * Temporary plan for import counter. * Temporarily fix import counters. * Fix some tests. * Fixes. * Make actor creation non-blocking. * Fix test? * Fix actors on Python 2. * fix rare case. * Fix python 2 test. * More tests. * Small fixes. * Linting. * Revert tensorflow version to 0.12.0 temporarily. * Small fix. * Enhance inheritance test.	2017-02-15 00:10:05 -08:00
Alexey Tumanov	dfb6107b22	General attribute-based heterogeneity support with hard and soft constraints (#248 ) * attribute-based heterogeneity-awareness in global scheduler and photon * minor post-rebase fix * photon: enforce dynamic capacity constraint on task dispatch * globalsched: cap the number of times we try to schedule a task in round robin * propagating ability to specify resource capacity to ray.init * adding resources to remote function export and fetch/register * globalsched: remove unused functions; update cached photon resource capacity (until next photon heartbeat) * Add some integration tests. * globalsched: cleanup + factor out constraint checking * lots of style * task_spec_required_resource: global refactor * clang format * clang format + comment update in photon * clang format photon comment * valgrind * reduce verbosity for Travis * Add test for scheduler load balancing. * addressing comments * refactoring global scheduler algorithm * Minor cleanups. * Linting. * Fix array_test.py and linting. * valgrind fix for photon tests * Attempt to fix stress tests. * fix hashmap free * fix hashmap free comment * memset photon resource vectors to 0 in case they get used before the first heartbeat * More whitespace changes. * Undo whitespace error I introduced.	2017-02-09 01:34:14 -08:00
Robert Nishihara	2d1c980ad7	Refactor local scheduler to remove worker indices. (#245 ) * Refactor local scheduler to remove worker indices. * Change scheduling state enum to int in all function signatures. * Bug fix, don't use pointers into a resizable array. * Remove total_num_workers. * Fix tests.	2017-02-05 14:52:28 -08:00
Stephanie Wang	241b539ff8	Reconstruction for evicted objects (#181 ) * First pass at reconstruction in the worker Modify reconstruction stress testing to start Plasma service before rest of Ray cluster TODO about reconstructing ray.puts Fix ray.put error for double creates Distinguish between empty entry and no entry in object table Fix test case Fix Python test Fix tests * Only call reconstruct on objects we have not yet received * Address review comments * Fix reconstruction for Python3 * remove unused code * Address Robert's comments, stress tests are crashing * Test and update the task's scheduling state to suppress duplicate reconstruction requests. * Split result table into two lookups, one for task ID and the other as a test-and-set for the task state * Fix object table tests * Fix redis module result_table_lookup test case * Multinode reconstruction tests * Fix python3 test case * rename * Use new start_redis * Remove unused code * lint * indent * Address Robert's comments * Use start_redis from ray.services in state table tests * Remove unnecessary memset	2017-02-01 19:18:46 -08:00
Robert Nishihara	acf1703afd	Implement naive scheduling algorithm using local scheduler load. (#164 ) * Implement naive scheduling algorithm using local scheduler load. * Have the global scheduler estimate load on local schedulers better. * Fixes.	2016-12-28 22:33:20 -08:00
Robert Nishihara	985c424172	Use redismodules for task table and result table. (#156 ) * Switch to using redis modules for task table. * Switch to using redis modules for the task table. * Fix some tests. * Fix naming and remove code duplication. * Remove duplication in redis modules and add more cleanups. * Address comments.	2016-12-25 23:57:05 -08:00
Robert Nishihara	3d697c7ed2	Introduce local scheduler heartbeats which carry load information. (#155 ) * Introduce local scheduler heartbeats which carry load information.	2016-12-24 20:02:25 -08:00
Alexey Tumanov	46a887039e	Global scheduler - per-task transfer-aware policy (#145 ) * global scheduler with object transfer cost awareness -- upstream rebase * debugging global scheduler: multiple subscriptions * global scheduler: utarray push bug fix; tasks change state to SCHEDULED * change global scheduler test to be an integraton test * unit and integration tests are passing for global scheduler * improve global scheduler test: break up into several * global scheduler checkpoint: fix photon object id bug in test * test with timesync between object and task notifications; TODO: handle OoO object+task notifications in GS * fallback to base policy if no object dependencies are cached (may happen due to OoO object+task notification arrivals * clean up printfs; handle a missing LS in LS cache * Minor changes to Python test and factor out some common code. * refactoring handle task waiting * addressing comments * log_info -> log_debug * Change object ID printing. * PRId64 merge * Python 3 fix. * PRId64. * Python 3 fix. * resurrect differentiation between no args and missing object info; spacing * Valgrind fix. * Run all global scheduler tests in valgrind. * clang format * Comments and documentation changes. * Minor cleanups. * fix whitespace * Fix. * Documentation fix.	2016-12-22 03:11:46 -08:00
Robert Nishihara	6cd02d71f8	Fixes and cleanups for the multinode setting. (#143 ) * Add function for driver to get address info from Redis. * Use Redis address instead of Redis port. * Configure Redis to run in unprotected mode. * Add method for starting Ray processes on non-head node. * Pass in correct node ip address to start_plasma_manager. * Script for starting Ray processes. * Handle the case where an object already exists in the store. Maybe this should also compare the object hashes. * Have driver get info from Redis when start_ray_local=False. * Fix. * Script for killing ray processes. * Catch some errors when the main_loop in a worker throws an exception. * Allow redirecting stdout and stderr to /dev/null. * Wrap start_ray.py in a shell script. * More helpful error messages. * Fixes. * Wait for redis server to start up before configuring it. * Allow seeding of deterministic object ID generation. * Small change.	2016-12-21 18:53:12 -08:00
Robert Nishihara	c9c1b3e6af	Change db_connect to allow different arguments from different processes. (#142 ) * Allow db_connect to take a variable number of arguments. * Fix tests. * Fixes. * Formatting. * Fixes. * Simplifications. * Fix typo.	2016-12-20 20:21:35 -08:00
Stephanie Wang	d729f9b7ea	Object table remove (#139 ) * Object table remove redis module * Test case for object table remove redis module * Client code for object_table_remove * Delete object notifications in plasma * Test for object deletion notifications * Fix subscribe deletion test * Address Robert's comments * free hash table entry	2016-12-19 23:18:57 -08:00
Alexey Tumanov	cb3e6cde9e	passing object info information with redis module (#138 ) * adding object broadcast channel; published on each object table add * publishing data size to the bcast channel * bug fix: objectkey * update object tests to test for data size: C + py * remove debug * clang format * Minor changes. * Fix error. * merging with Robert's comments * clang format for the object table test upgrade	2016-12-19 21:07:25 -08:00
Robert Nishihara	269f37e26f	Implement object table notification subscriptions and switch to using Redis modules for object table. (#134 ) * Implement RAY.OBJECT_TABLE_REQUEST_NOTIFICATIONS. * Call object_table_request_notifications from plasma manager. * Use Redis modules for object table. * Cleaning up code. * More checks. * Formatting. * Make object table tests pass. * Formatting. * Add prefix to the object notification channel name. * Formatting. * Fixes. * Increase time in redismodule test.	2016-12-18 18:19:02 -08:00
Robert Nishihara	58a873eb20	Deploy Redis module and start using custom Redis commands. (#128 ) * Add RAY.CONNECT Redis command. * Add RAY.GET_CLIENT_ADDRESS command. * Build and clean Redis in common Makefile. * Use custom Redis module in Ray and use custom CONNECT and GET_CLIENT_ADDRESS commands. * Fixes. * Remove mapping from redis client ID to ray db client ID. * Fix.	2016-12-16 14:40:44 -08:00
Stephanie Wang	b0ba54e4c0	Fix psubscribe bug in object_table_subscribe (#126 ) * Fix psubscribe * Add TODO about subscription callbacks	2016-12-16 14:40:44 -08:00
Alexey Tumanov	946242929f	Plasma photon association: passing through plasma address with photon db connection (#123 ) * passing plasma ip:port association with photon through redis to global scheduler * Fix test. * sanity-checking aux_address inside db_connect_extended * clang format * fix photon tests * clang format photon tests	2016-12-13 17:21:38 -08:00
Stephanie Wang	24d2b42d86	Fix object table subscriptions (#122 ) * First attempt at fixing psubscribe. psubscribe_success_test will fail * psubscribe test * SUBSCRIBE returns the number of subscriptions, not success * Comment out failing test.	2016-12-13 00:47:21 -08:00
Robert Nishihara	c740b165f4	Retry first connection to redis in db_connect. (#112 ) * Retry first connection to redis in db_connect. * Declare usleep. * Formatting.	2016-12-09 17:21:49 -08:00
Alexey Tumanov	0abbf5a113	End-to-end object size information passthrough (#105 ) * rebase Alexey's PR on top * rebase on master * fix test failure waiting for plasma manager to exit * clang format * addressing comments * Minor formatting and naming fixes.	2016-12-09 00:51:44 -08:00
Stephanie Wang	61904c4c3e	Object hashes (#104 ) * factoring out object_info for general use by several Ray components * addressing comments * Replace SHA256 task hash with MD5 Add object hash to object table (always overwrites) Support for table operations that span multiple asynchronous Redis commands Add a new object location in a transaction, using Redis's optimistic concurrency Use Redis GETSET instead of transactions and Python frontend code for object hashing Remove spurious log message Fix for object_table_add Revert "Replace SHA256 task hash with MD5" This reverts commit e599de473c8dad9189ccb0600429534b469b76a2. Revert to sha256 Test case for illegal puts Use SETNX to set object hashes Initialize digest with zeros Initialize plasma_request with zeros * Fixes * replace SHA256 with a faster hash in the object store * Fix valgrind * Address Robert's comments * Check that plasma_compute_object_hash succeeds. * Don't run test_illegal_put test with valgrind because it causes an intentional crash which causes valgrind to complain. * Debugging after rebase. * handling Robert's comments * Fix bugs after rebase. * final fixes for Stephanie's PR * fix	2016-12-08 20:57:08 -08:00
Philipp Moritz	ba53e4a43a	Change object table subscribe to also return payload (#88 ) * implement object table subscribe that also returns payload * fix * fix valgrind * fix ray test * fix clang-format * fix * fix	2016-12-05 00:26:53 -08:00
Robert Nishihara	2a3e9267f8	Non-blocking fetch implementation. (#83 ) * Non-blocking fetch implementation. * Make fetch tests more robust to timing issues. * Bug fix when ignoring transferred objects. * Fix. * Documentation fixes.	2016-12-03 19:09:05 -08:00
Wapaul1	9a513363f9	Init_table_callback now takes ownership of passed in data (#80 ) * temp commit * Stuff * Ownership is now taken by init table callback * Fixed lint errors * Fixed travis warnings * Fixed spacing * add .gitkeep * fix global scheduler * Whitespace.	2016-12-03 13:49:09 -08:00
Ion	f89be9699c	Introduce non-blocking Plasma API. (#71 ) * Implement new plasma client API. * Formatting fixes. * Make tests work again. * Make tests run. * Comment style. * Fix bugs with fetch tests. * Introduce fetch1 flag. * Remove timer only if present. * Formatting fixes. * Don't access object after free. * Formatting fixes. * Minor change. * refactoring plasma datastructures * Change plasma_request and plasma_reply to use only arrays of object requests. * some more fixes * Remove unnecessary methods. * Trivial. * fixes * use plasma_send_reply in return_from_wait1 * Lint.	2016-12-01 02:15:21 -08:00
Philipp Moritz	c7073d623b	Object table subscribe with new semantics (#62 ) * new plasma subscribe implementation * object table subscribe with test * clang-format * fix * fix test * fix tests * fix clang-format * add check * final clang-format * final fixes * fix clang-format	2016-11-27 21:26:23 -08:00
mehrdadn	7237ec4124	Windows compatibility (#57 ) * Add Python and Redis submodules, and remove old third-party modules * Update VS projects (WARNING: references files that do not exist yet) * Update code & add shims for APIs except AF_UNIX/{send,recv}msg() * Minor style changes.	2016-11-22 17:04:24 -08:00
Robert Nishihara	c8c3983195	Use sizeof(field) instead of sizeof(type) and other fixes. (#47 ) * Use sizeof(field) instead of sizeof(type) and other fixes. * Fix formatting. * Bug fix. * Zero-initialize structs. There are many more instances of these that I haven't changed yet. * Bug fix. * Revert from atexit to signaling to fix valgrind tests. * Address Philipp's comments.	2016-11-19 12:19:49 -08:00
Robert Nishihara	d77b685a90	Global scheduler skeleton (#45 ) * Initial scheduler commit * global scheduler * add global scheduler * Implement global scheduler skeleton. * Formatting. * Allow local scheduler to be started without a connection to redis so that we can test it without a global scheduler. * Fail if there are no local schedulers when the global scheduler receives a task. * Initialize uninitialized value and formatting fix. * Generalize local scheduler table to db client table. * Remove code duplication in local scheduler and add flag for whether a task came from the global scheduler or not. * Queue task specs in the local scheduler instead of tasks. * Simple global scheduler tests, including valgrind. * Factor out functions for starting processes. * Fixes.	2016-11-18 19:57:51 -08:00
Stephanie Wang	7babe0d22f	Logging level (#38 ) * Set logging levels in Makefile using -DRAY_COMMON_LOG_LEVEL=level * Lower level of some LOG_ERROR messages, log the name of the table operation on failure * Address rest of Robert's comments * Fix spurious log message	2016-11-15 20:33:29 -08:00
Stephanie Wang	9d1e750e8f	Merge task table and task log into a single table (#30 ) * Merge task table and task log * Fix test in db tests * Address Robert's comments and some better error checking * Add a LOG_FATAL that exits the program	2016-11-10 18:13:26 -08:00
Ion	ee3718c80c	Ion and Philipp's table retries (#10 ) * Ion and Philipp's table retries * Refactor the retry struct: - Rename it from retry_struct to retry_info - Retry information contains the failure callback, not the retry callback - All functions take in retry information as an arg instead of its expanded fields * Rename cb -> callback * Remove prints * Fix compiler warnings * Change some CHECKs to greatest ASSERTs * Key outstanding callbacks hash table with timer ID instead of callback data pointer * Use the new retry API for table commands * Memory cleanup in plasma unit tests * fix Robert's comments * add valgrind for common	2016-10-29 15:22:33 -07:00
Robert Nishihara	1915539c5f	Rearrange files to prepare to merge into Ray.	2016-10-25 13:59:47 -07:00

83 commits