hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-12 14:16:39 -04:00

Author	SHA1	Message	Date
Philipp Moritz	08e988aee5	Modernize plasma store (C to C++ changes). (#546 )	2017-05-15 01:19:44 -07:00
Philipp Moritz	3a6922276a	convert malloc.c to STL (#537 ) * convert malloc.c to STL * linting * cleanup and comments * address Richard's comments	2017-05-11 11:18:23 -07:00
Philipp Moritz	3a0e86395e	Convert eviction code to STL (#534 ) * temp commit * convert eviction policy to C++ * temp commit * fix plasma tests * fix * linting * fixes * fix linting	2017-05-09 21:26:22 -07:00
Stephanie Wang	e50a23b820	Fix bug with reused file descriptors (#471 ) * Fix bug with reused file descriptors * Remove client connection if write_object_chunk fails * Handle ECONNRESET on unsuccessful write * lint * Back to lowercase * fix compilation * fix linting	2017-05-02 19:45:27 -07:00
Philipp Moritz	b7ace01b5f	Convert Plasma client to STL (#486 ) * convert mmap table to STL * update * fix * convert objects_in_use * fix * convert release_history * cleanup * linting * update * fix * linting	2017-04-25 01:25:40 -07:00
Philipp Moritz	8194b71f32	Convert pending_notifications to STL (#484 ) * temp commit * converted more plasma notifications * cleanup * rename * linting * fixes * fixes	2017-04-24 14:41:34 -07:00
Philipp Moritz	892e53d69e	Convert plasma client array and object notification queue to STL (#482 ) * Convert plasma clients to STL * use a deque for object notifications in plasma store for perf * cleanup * linting * fix include order	2017-04-24 00:43:48 -07:00
Philipp Moritz	e36de2dad1	Convert object table to STL (#480 ) * convert object table to stl * temp commit * fix * comments * linting	2017-04-23 22:24:05 -07:00
Alexey Tumanov	a67a107e0e	Fix int-type compilation problem on redhat. (#472 )	2017-04-19 02:43:33 -07:00
Philipp Moritz	8ac6c59931	Remove n^2 algorithm in plasma get (#466 ) Remove n^2 algorithm in plasma get.	2017-04-17 23:37:33 -07:00
Philipp Moritz	6ffc849d23	Use Arrow Tensors for serializing numpy arrays and get rid of extra memcpy. (#436 ) * Use Arrow Tensors for serializing numpy arrays and get rid of extra memcpy * fix nondeterminism problem * mark array as immutable * make arrays contiguous * fix serialize_list and deserialize_list * fix numbuf tests * linting * add optimization flags * fixes * roll back arrow	2017-04-10 01:37:34 -07:00
Alexey Tumanov	6f9225490b	Plasma manager performance: speed up wait with a wait request object map (#427 ) * plasma manager perf: speedup wait with a wait request object map * removing duplicate == operator in plasma store * fix serialization test * code cleanup * minor cleanup * factoring out uniqueid hash and equality operators into common * plasma manager: c++ify the WaitRequest struct * plasma manager: get rid of the initial object request malloc * cleanup * linting * cleanups and fix compiler warnings * compiler warnings and linting	2017-04-07 12:32:12 -07:00
Stephanie Wang	93679df724	Stopped nodes can rejoin immediately (#428 ) * Ignore deleted clients when reading address info from Redis * Remove self from db_client table when exiting cleanly * Fix valgrind test * Do not call plasma_perform_release when disconnecting	2017-04-05 23:50:38 -07:00
Philipp Moritz	4043769ba2	Make putting large objects work. (#411 ) * putting large objects * add more checks * support large objects * fix test * fix linting * upgrade to latest arrow version * check malloc return code * print mmap file sizes * printing * revert to dlmalloc * add prints * more prints * add printing * printing * fix * update * fix * update * print * initialization * temp * fix * update * fix linting * comment out object_store_full tests * fix test * fix test * evict objects if dlmalloc fails * fix stresstests * Fix linting. * Uncomment large-memory tests. * Increase memory for docker image for jenkins tests. * Reduce large memory tests. * Further reduce large memory tests.	2017-04-05 01:04:05 -07:00
Richard Shin	227c916c25	Convert plasma/plasma_store.cc to use STL (#324 ) * Change plasma_store.c to C++ (clobbering existing FlatBuffers usage). * Convert plasma_store.cc to use STL (with a caveat) * Fix CMakeLists and mutation-while-iterating problem * Remove extra extern "C" declarations * Remove redundant -std=c++11 from plasma/CMakeLists.txt	2017-03-31 22:58:10 -07:00
Alexey Tumanov	a3d58607bf	parallelize numbuf memcpy and plasma object hash construction (#366 ) * parallelizing memcopy and object hash construction in numbuf/plasma * clang format * whitespace * refactoring compute object hash: get rid of the prefix chunk * clang format * Document performance optimization. * Remove check for 64-byte alignment, since it may not be guaranteed.	2017-03-21 16:17:35 -07:00
Robert Nishihara	ba02fc0eb0	Run flake8 in Travis and make code PEP8 compliant. (#387 )	2017-03-21 12:57:54 -07:00
Stephanie Wang	083e7a28ad	Push an error to the driver when the workload hangs on `ray.put` reconstruction (#382 ) * Fix worker blocked bug * tmp * Push an error to the driver on ray.put for non-driver tasks * Fix result table tests * Fix test, logging * Address comments * Fix suppression bug * Fix redis module test * Edit error message * Get values in chunks during reconstruction * Test case for driver ray.put errors * Error for evicting ray.put objects from the driver * Fix tests * Reduce verbosity * Documentation	2017-03-21 00:16:48 -07:00
Stephanie Wang	12c9618c0c	Plasma and worker node failure. (#373 ) * Failing test case * Local scheduler exits cleanly after plasma store dies * Tolerate one plasma store failure * Tolerate plasma store failures on all nodes except head node * Plasma manager heartbeats * Component failure tests * Don't run the helper for Python testing * Fix C test * Fix hanging plasma transfer test * Fix python3 * Consolidate ClientConnection code * Fix valgrind test * fix c test * We can restart worker nodes! * Fix flatbuffers bug * Address comments * Only register actual workers with the local scheduler * Fix bug * Fix segfaults * Add test case that tests for driver liveness, fix local scheduler bug * Clean up after tests * Allocate retry info on the stack * Send SIGKILL before waiting * Relax unit test conditions * Driver liveness test case and documentation	2017-03-17 17:03:58 -07:00
Philipp Moritz	068429ffd8	Convert local scheduler messages to flatbuffers (#340 ) * use flatbuffer messages for local scheduler * make sure constructor gets called for C++ object ObjectInfoT * fix typo * fix Robert's comments * Small change to actor test. * fix valgrind error * linting * free notification * fix * valgrind * fix valgrind * fix other bugs * valgrind fix * fixes * more fixes * Small changes to comments.	2017-03-15 16:27:52 -07:00
Stephanie Wang	da06b4db82	Warn the user when a nondeterministic task is detected. (#339 ) * WARN instead of FATAL for object hash mismatches, push error to driver * Document the callback signature for object_table_add/remove * Error table * Wait for all errors in python test * Fix doc * Fix state test	2017-03-07 00:32:15 -08:00
Philipp Moritz	0b8d279ef2	Convert task_spec to flatbuffers (#255 ) * convert Ray to C++ * convert task_spec to flatbuffers * fix * it compiles * latest * tests are passing * task2 -> task * fix * fix * fix * fix * fix * linting * fix valgrind * upgrade flatbuffers * use debug mode for valgrind * fix naming and comments * downgrade flatbuffers * fix linting * reintroduce TaskSpec_free * rename TaskSpec -> TaskInfo * refactoring * linting	2017-03-05 02:05:02 -08:00
Robert Nishihara	65a8659f3d	Some plasma manager transfer optimizations. (#334 ) * Change transfer queue to doubly-linked list to speed up append. * Maintain set of pending transfers to make deduplication easy. * Fix naming convention for structs in plasma manager.	2017-03-04 23:15:17 -08:00
Stephanie Wang	41b8675d04	Availability after local scheduler failure (#329 ) * Clean up plasma subscribers on EPIPE First pass at a monitoring script - monitor can detect local scheduler death Clean up task table upon local scheduler death in monitoring script Don't schedule to dead local schedulers in global scheduler Have global scheduler update the db clients table, monitor script cleans up state Documentation Monitor script should scan tables before beginning to read from subscription channel Fix for python3 Redirect monitor output to redis logs, fix hanging in multinode tests * Publish auxiliary addresses as part of db_client deletion notifications * Fix test case? * Small changes. * Use SCAN instead of KEYS * Address comments * Address more comments * Free redis module strings	2017-03-02 19:51:20 -08:00
Alexey Tumanov	4f9e74469e	Fix segfault induced by getting more than 200k objects (#333 ) * [RAY-567]: allocate memory on the heap for large gets * linting	2017-03-02 01:35:10 -08:00
Philipp Moritz	793a102846	Make Ray code C++ compatible (#321 ) * convert Ray to C++ * const correctness	2017-03-01 01:17:24 -08:00
Alexey Tumanov	b91d9cba45	Adding flatbuffers and migrating flatcc to flatbuffers for plasma (#325 ) * adding flatbuffers and migrating flatcc to flatbuffers for plasma * variable name changes in plasma_protocol and plasma flatbuffers schema * quick fix * cleanups and remove flatcc * more cleanup * add doc * linting * fix linting * fix mac os x build * linting * cleanup * c++ fix for plasma flatbuffers * Remove flatcc from CMakeLists.txt. * linting; trigger travis	2017-02-28 18:47:40 -08:00
Philipp Moritz	a30eed452e	Change type naming convention. (#315 ) * Rename object_id -> ObjectID. * Rename ray_logger -> RayLogger. * rename task_id -> TaskID, actor_id -> ActorID, function_id -> FunctionID * Rename plasma_store_info -> PlasmaStoreInfo. * Rename plasma_store_state -> PlasmaStoreState. * Rename plasma_object -> PlasmaObject. * Rename object_request -> ObjectRequests. * Rename eviction_state -> EvictionState. * Bug fix. * rename db_handle -> DBHandle * Rename local_scheduler_state -> LocalSchedulerState. * rename db_client_id -> DBClientID * rename task -> Task * make redis.c C++ compatible * Rename scheduling_algorithm_state -> SchedulingAlgorithmState. * Rename plasma_connection -> PlasmaConnection. * Rename client_connection -> ClientConnection. * Fixes from rebase. * Rename local_scheduler_client -> LocalSchedulerClient. * Rename object_buffer -> ObjectBuffer. * Rename client -> Client. * Rename notification_queue -> NotificationQueue. * Rename object_get_requests -> ObjectGetRequests. * Rename get_request -> GetRequest. * Rename object_info -> ObjectInfo. * Rename scheduler_object_info -> SchedulerObjectInfo. * Rename local_scheduler -> LocalScheduler and some fixes. * Rename local_scheduler_info -> LocalSchedulerInfo. * Rename global_scheduler_state -> GlobalSchedulerState. * Rename global_scheduler_policy_state -> GlobalSchedulerPolicyState. * Rename object_size_entry -> ObjectSizeEntry. * Rename aux_address_entry -> AuxAddressEntry. * Rename various ID helper methods. * Rename Task helper methods. * Rename db_client_cache_entry -> DBClientCacheEntry. * Rename local_actor_info -> LocalActorInfo. * Rename actor_info -> ActorInfo. * Rename retry_info -> RetryInfo. * Rename actor_notification_table_subscribe_data -> ActorNotificationTableSubscribeData. * Rename local_scheduler_table_send_info_data -> LocalSchedulerTableSendInfoData. * Rename table_callback_data -> TableCallbackData. * Rename object_info_subscribe_data -> ObjectInfoSubscribeData. 
* Rename local_scheduler_table_subscribe_data -> LocalSchedulerTableSubscribeData. * Rename more redis call data structures. * Rename photon_conn -> PhotonConnection. * Rename photon_mock -> PhotonMock. * Fix formatting errors.	2017-02-26 00:32:43 -08:00
Robert Nishihara	232601f90d	Change all table calls to use default retry behavior. (#312 ) * Change all table calls to use default retry behavior and change default retry behavior. * Add warning for table retries.	2017-02-24 12:41:32 -08:00
Stephanie Wang	334aed9fa9	Fetch the object after requesting reconstruction during ray.get (#301 ) * Fetch the object after requesting reconstruction during ray.get * revert * Fix documentation and memory leak * Fix hanging reconstruction bug * Fix for python3	2017-02-20 21:41:34 -08:00
Stephanie Wang	67c591c33b	Retry connections in photon connect, consolidate code in io.c (#294 )	2017-02-17 23:41:21 -08:00
Alexey Tumanov	dfb6107b22	General attribute-based heterogeneity support with hard and soft constraints (#248 ) * attribute-based heterogeneity-awareness in global scheduler and photon * minor post-rebase fix * photon: enforce dynamic capacity constraint on task dispatch * globalsched: cap the number of times we try to schedule a task in round robin * propagating ability to specify resource capacity to ray.init * adding resources to remote function export and fetch/register * globalsched: remove unused functions; update cached photon resource capacity (until next photon heartbeat) * Add some integration tests. * globalsched: cleanup + factor out constraint checking * lots of style * task_spec_required_resource: global refactor * clang format * clang format + comment update in photon * clang format photon comment * valgrind * reduce verbosity for Travis * Add test for scheduler load balancing. * addressing comments * refactoring global scheduler algorithm * Minor cleanups. * Linting. * Fix array_test.py and linting. * valgrind fix for photon tests * Attempt to fix stress tests. * fix hashmap free * fix hashmap free comment * memset photon resource vectors to 0 in case they get used before the first heartbeat * More whitespace changes. * Undo whitespace error I introduced.	2017-02-09 01:34:14 -08:00
Philipp Moritz	ca254b8689	Fix stack overflow if many objects are fetched. (#237 ) * fix stack overflow if many objects are fetched * fix other stack allocations * add tests and fix linting * address stephanie's comments * fix linting * fix tests	2017-02-04 16:49:36 -08:00
Stephanie Wang	241b539ff8	Reconstruction for evicted objects (#181 ) * First pass at reconstruction in the worker Modify reconstruction stress testing to start Plasma service before rest of Ray cluster TODO about reconstructing ray.puts Fix ray.put error for double creates Distinguish between empty entry and no entry in object table Fix test case Fix Python test Fix tests * Only call reconstruct on objects we have not yet received * Address review comments * Fix reconstruction for Python3 * remove unused code * Address Robert's comments, stress tests are crashing * Test and update the task's scheduling state to suppress duplicate reconstruction requests. * Split result table into two lookups, one for task ID and the other as a test-and-set for the task state * Fix object table tests * Fix redis module result_table_lookup test case * Multinode reconstruction tests * Fix python3 test case * rename * Use new start_redis * Remove unused code * lint * indent * Address Robert's comments * Use start_redis from ray.services in state table tests * Remove unnecessary memset	2017-02-01 19:18:46 -08:00
Robert Nishihara	f69d4aaaa7	Change fetch requests in plasma manager to use a single timer. (#234 ) * Change fetch requests in plasma manager to use a single timer. * Fix manager tests, other cleanups.	2017-02-01 12:21:52 -08:00
Stephanie Wang	a5c8f28f33	Plasma subscribe (#227 ) * Use object_info as notification, not just the object_id * Add a regression test for plasma managers connecting to store after some objects have been created * Send notifications for existing objects to new plasma subscribers * Continuously try the request to the plasma manager instead of setting a timeout in the test case * Use ray.services to start Redis in plasma test cases * fix test case	2017-01-25 22:57:15 -08:00
Robert Nishihara	b98a63fd3a	Change get to take a timeout and multiple object IDs. (#212 ) * Change plasma_get to take a timeout and an array of object IDs. * Address comments. * Bug fix related to computing object hashes. * Add test. * Fix file descriptor leak. * Fix valgrind. * Formatting. * Remove call to plasma_contains from the plasma client. Use timeout internally in ray.get. * small fixes	2017-01-19 12:21:12 -08:00
Robert Nishihara	677a019cbd	Remove unnecessary bookkeeping in utlist in plasma client. (#215 )	2017-01-18 23:03:08 -08:00
Robert Nishihara	303d0fed3e	Prevent plasma store and manager from dying when a client dies. (#203 ) * Prevent plasma store and manager from dying when a worker dies. * Check errno inside of warn_if_sigpipe. Passing in errno doesn't work because the arguments to warn_if_sigpipe can be evaluated out of order.	2017-01-17 20:34:31 -08:00
Philipp Moritz	7f329db4b2	wait until kill operation was successful (#210 )	2017-01-17 20:15:48 -08:00
Philipp Moritz	a708e36225	Switch build system to use CMake completely. (#200 ) * switch to CMake completely ... * cleanup * Run C tests, update installation instructions.	2017-01-17 16:56:40 -08:00
Philipp Moritz	ab3448a9b4	Plasma Optimizations (#190 ) * bypass python when storing objects into the object store * clang-format * Bug fixes. * fix include paths * Fixes. * fix bug * clang-format * fix * fix release after disconnect	2017-01-09 20:15:54 -08:00
Stephanie Wang	c13d73b4c9	Suppress duplicate transfer requests (#185 )	2017-01-06 22:14:51 -08:00
Johann Schleier-Smith	b1e76e582e	Check /dev/shm on Linux (#174 ) * check available shared memory when starting object store * exit with error if not enough shared memory available for object store * Some comments and formatting.	2017-01-03 12:33:29 -08:00
Stephanie Wang	6828d694ae	Test object notifications from Plasma store (#141 ) * Object notification test for Photon, and turn on valgrind for Photon C tests * Test object notification handler in the plasma manager * Fix hanging test case	2016-12-29 23:10:38 -08:00
Robert Nishihara	baf835efcd	Throw Python exception if plasma store cannot create new object. (#162 ) * Propagate error messages through plasma create. * Use custom exception types instead of exception messages.	2016-12-28 11:56:16 -08:00
Robert Nishihara	10e067e5e5	Delay releasing a maximum number of bytes in the plasma client. (#160 ) * Send message from plasma client to get plasma store capacity. * Release objects from plasma client if they are too large. * Use doubly-linked list instead of ring buffer for plasma client release history. * Address comments. * Fix problem with slicing PlasmaBuffer objects. * Fix crash in plasma manager during transfer. * Formatting. * Make plasma client cache larger and make caching test not throw exceptions on Travis.	2016-12-27 19:51:26 -08:00
Robert Nishihara	26941e02aa	Attempt to free up to 20% of the plasma store capacity during eviction. (#159 )	2016-12-27 12:12:33 -08:00
Philipp Moritz	d6695c867a	fix wait test (#158 )	2016-12-25 23:43:01 -08:00
Robert Nishihara	9bb9f8cb54	Fix bug in ray.wait. (#153 ) * Fix bug in wait implementation. * Add test that exposes previous bug.	2016-12-23 16:22:41 -08:00

116 commits