hiro/ray

Mirror of https://github.com/vale981/ray, synced 2025-03-06 10:31:39 -05:00

Author	SHA1	Message	Date
Peter Schafhalter	9a6a056609	Convert UT datastructures in tests (#1203 ) * bind_ipc_sock_retry returns std::string * snprintf -> std::snprintf * Fix formatting * Use stringstream instead of snprintf * Fix typo	2017-11-11 16:55:05 -08:00
Philipp Moritz	e798a652bc	Change TaskSpec to allow multiple object IDs per argument. (#1204 ) * Implement object ID bags * linting * fix tests * fix linting * fix comments	2017-11-10 16:33:34 -08:00
Stephanie Wang	07f0532b9b	Local scheduler filters out dead clients during reconstruction (#1182 ) * Object table lookup returns vector of DBClientID instead of address strings * Add node IP address to DBClient notification * DB client cache stores entire DB client, convert addresses to std::string * get cached db client returns the client * Expose a call to initialize the redis cache * Local scheduler filters out dead clients during reconstruction * Remove node ip address from dbclient, use aux_address for plasma managers * Get entire db client entry when not found in cache * Fix common tests * Fix address in tests * Push error to driver if driver task did the put * Address Robert's comments and cleanup * Remove unused Redis command * Fix db test	2017-11-10 11:29:24 -08:00
Robert Nishihara	d3c082d325	More checking in redis.cc. (#1057 )	2017-11-08 23:25:19 -08:00
Robert Nishihara	1c6b30b5e2	Move all config constants into single file. (#1192 ) * Initial pass at factoring out C++ configuration into a single file. * Expose config through Python. * Forward declarations. * Fixes with Python extensions * Remove old code. * Consistent naming for constants. * Fixes * Fix linting. * More linting. * Whitespace * rename config -> _config. * Move config inside a class. * update naming convention * Fix linting. * More linting * More linting. * Add in some more constants. * Fix linting	2017-11-08 11:10:38 -08:00
Peter Schafhalter	a8032b9ca1	Convert connections from UT_array to std::vector (#1190 )	2017-11-07 20:59:41 -08:00
Peter Schafhalter	7215f7d228	Remove UT String from logging (#1184 ) * Removed unnecessary utarray include * Removed ut_string from logging * Fix formatting	2017-11-05 14:05:20 -08:00
Robert Nishihara	97c6369b49	Update arrow to include custom serializer for pytorch and register default serialization handlers. (#1152 ) * Update arrow to include custom serializer for pytorch. * Call pyarrow function for registering default custom serialization handlers. * Change class ID used in serialization context for object IDs.	2017-10-21 21:24:10 -07:00
Philipp Moritz	684e62e784	upgrade arrow to include numpy bool fix (#1148 )	2017-10-20 17:25:15 -07:00
Peter Schafhalter	ad4cbd4016	Updated outstanding_callbacks to unordered_map (#1108 ) * Updated outstanding_callbacks to unordered_map * Fix bug in destroy_outstanding_callbacks and comments	2017-10-20 10:06:22 -07:00
Stephanie Wang	af47737bd5	Prototype distributed actor handles (#1137 ) * Add actor handle ID to the task spec * Local scheduler dispatches actor tasks according to a task counter per handle * Fix python test * Allow passing actor handles into tasks. Not completely working yet. Also this is very messy. * Fixes, should be roughly working now. * Refactor actor handle wrapper * Fix __init__ tests * Terminate actor when the original handle goes out of scope * TODO and a couple test cases * Make tests for unsupported cases * Fix Python mode tests * Linting. * Cache actor definitions that occur before ray.init() is called. * Fix export actor class * Deterministically compute actor handle ID * Fix __getattribute__ * Fix string encoding for python3 * doc * Add comment and assertion.	2017-10-19 23:49:59 -07:00
Robert Nishihara	1cdc2fb011	Clean up event loop and callbacks when processes exit. (#1125 ) * Clean up event loop and callbacks when processes exit. * Fix bug.	2017-10-19 17:07:03 -07:00
Philipp Moritz	4157bcb80b	Improve deserialization performance by rebasing on latest arrow (#1129 ) * improve serialization performance by rebasing on latest arrow * update * revert worker.py	2017-10-17 14:56:11 -07:00
Robert Nishihara	f3e3c7ec71	Add is_actor_checkpoint_method to TaskSpec. (#1117 ) * Add is_actor_checkpoint_method to TaskSpec. * Fix linting. * Fix rebase error. * Fix errors from rebase.	2017-10-15 16:52:10 -07:00
Robert Nishihara	d6062ef8f6	Compile with -rdynamic for better debugging symbols. (#1123 ) * Compile with -rdynamic. * Only use -rdynamic on Linux. * Add comment.	2017-10-13 21:39:11 -07:00
Stephanie Wang	15486a14a0	Refactor actor task queues (#1118 ) * Refactor add_task_to_actor_queue into queue_actor_task and insert_actor_task_queue * Refactor actor task queue to share the waiting task queue * Fix	2017-10-13 20:52:11 -07:00
Robert Nishihara	486cb64e3f	Compile with -Werror and -Wall (#1116 ) * Compile global scheduler with -Werror -Wall. * Compile plasma manager with -Werror -Wall. * Compile local scheduler with -Werror -Wall. * Compile common code with -Werror -Wall. * Signed/unsigned comparisons. * More signed/unsigned fixes. * More signed/unsigned fixes and added extern keyword. * Fix linting. * Don't check strict-aliasing because Python.h doesn't pass.	2017-10-12 21:00:23 -07:00
Stephanie Wang	3764f2f2e1	Actor checkpointing with object lineage reconstruction (#1004 ) * Worker reports error in previous task, actor task counter is incremented after task is successful * Refactor actor task execution - Return new task counter in GetTaskRequest - Update worker state for actor tasks inside of the actor method executor * Manually invoked checkpoint method * Scheduling for actor checkpoint methods * Fix python bugs in checkpointing * Return task success from worker to local scheduler instead of actor counter * Kill local schedulers halfway through actor execution instead of waiting for all tasks to execute once * Remove redundant actor tasks during dispatch, reconstruct missing dependencies for actor tasks * Make executor for temporary actor methods * doc * Set default argument for whether the previous task was a success * Refactor actor method call * Simplify checkpoint task submission * lint * fix philipp's comments * Add missing line * Make actor reconstruction tests run faster * Unimportant whitespace. * Unimportant whitespace. * Update checkpoint method signature * Documentation and handle exceptions during checkpoint save/resume * Rename get_task message field to actor_checkpoint_failed * Fix bug. * Remove debugging check, redirect test output	2017-10-12 09:53:32 -07:00
Robert Nishihara	b585001881	When a task is passed to the global scheduler, if it is not received,… (#1106 ) * When a task is passed to the global scheduler, if it is not received, then try again. * Call give_task_to_global_scheduler directly (same with local).	2017-10-12 00:04:38 -07:00
Robert Nishihara	9f1e385335	Return errno from handle_sigpipe. (#1051 )	2017-10-11 18:36:28 -07:00
Peter Schafhalter	46f6c163dc	Converted ClientConnection to C++ standard library (#1099 )	2017-10-11 11:12:15 -07:00
Stephanie Wang	1e0ab3d386	Switch to monotonic clock (#1100 )	2017-10-10 22:35:21 -07:00
Philipp Moritz	0684258d2e	Update arrow to include pandas serialization (#1102 ) * update arrow to include pandas serialization * update	2017-10-10 22:16:35 -07:00
Robert Nishihara	8f1a73f041	Allow Ray to be built without UI by setting INCLUDE_UI=0. (#1094 ) * Allow building Ray without UI by setting INCLUDE_UI=0. * Fix bash. * Fix linting.	2017-10-09 23:32:38 -07:00
Stephanie Wang	aebe9f9374	Fix actor garbage collection by breaking cyclic references (#1064 ) * Fix bug in wait_for_pid_to_exit, add test for actor deletion. * Fix actor garbage collection by breaking cyclic references * Add test for calling actor method immediately after actor creation. * Fix bug, must dispatch tasks when workers are killed. * Fix python test * Fix cyclic reference problem by creating ActorMethod objects on the fly. * Try simply increasing the time allowed for many_drivers_test.py.	2017-10-05 00:55:33 -07:00
Mitar	a0d3fb1de1	Fix Arrow's repository URL. (#1072 ) Thanks!	2017-10-03 21:40:21 -07:00
Robert Nishihara	0dcf36c91e	Switch Arrow commit. (#1068 )	2017-10-03 13:56:53 -07:00
Philipp Moritz	57bd1d6ff5	Specialize Serialization for OrderedDict (#1035 ) Specialize Serialization for OrderedDict and defaultdict	2017-10-02 17:33:10 -07:00
Robert Nishihara	1488975d1b	Add timing statement to loop that calls redis_get_cached_db_client be… (#1045 ) * Add timing statement to loop that calls redis_get_cached_db_client because it has been slow in the past. * Fix linting. * Refactoring to make manager vectors into std::vector. * Fix linting. * Fixes.	2017-10-02 10:46:21 -07:00
Robert Nishihara	a31d138f21	Don't log when a worker can't be started. (#1056 )	2017-10-02 10:32:46 -07:00
Philipp Moritz	79e013e876	upgrade to latest arrow to fix XCode 9 problem (#1042 )	2017-09-30 16:24:59 -07:00
Robert Nishihara	ce278aa06a	Fix valgrind tests. (#1037 ) * Comment out local scheduler valgrind test. * Fix free/delete error. * More free -> delete errors * One more free -> delete and also clean up callback state in plasma manager. * Add set -x to run_valgrind scripts. * Fix valgrind error in CreateLocalSchedulerInfoMessage.	2017-09-30 00:11:09 -07:00
Eric Liang	ba153adc4c	Downgrade severity of most common messages (#1039 ) * downgrade severity of most common messages * update	2017-09-30 00:01:49 -07:00
Eric Liang	b118cef49e	[webui] Allow timeline scroll-to-zoom without holding ALT (#993 ) * Allow timeline scroll-to-zoom without holding ALT * Update build_ui.sh * Update build_ui.sh * Update build_ui.sh * Update build_ui.sh * Retry when getting catapult.	2017-09-29 21:35:12 -07:00
Peter Schafhalter	10027974b1	Replaced ObjectWaitRequests with unordered map (#990 ) * Replaced ObjectWaitRequests with unordered map * Pass C++ STL object by reference * Formatting changes and typos.	2017-09-28 15:29:26 -07:00
Zongheng Yang	427dee511b	Fill out specs of the task table in ray_redis_module.cc. (#1024 ) * Fill out specs of the task table in ray_redis_module.cc. * local scheduler field in task table * linting	2017-09-27 23:45:58 -07:00
Peter Schafhalter	bb76d4ca0a	PlasmaRequestBuffer data structure updates (#1023 ) * Replaced utstring with std::string * Converted transfer_queue to a list * Converted pending_object_transfers to unordered_map * Fix free/delete bug and small modifications.	2017-09-27 19:50:37 -07:00
Robert Nishihara	116fe168b5	Download boost 1.65.1 from bintray. (#1019 ) * Download boost 1.65.1 from bintray. * Pass --no-check-certificate to wget.	2017-09-27 13:25:05 -07:00
Zongheng Yang	5a50e80b63	Make Monitor remove dead Redis entries from exiting drivers. (#994 ) * WIP: removing OL, OI, TT on client exit; no saving yet. * ray_redis_module.cc: update header comment. * Cleanup: just the removal. * Reformat via yapf: use pep8 style instead of google. * Checkpoint addressing comments (partially) * Add 'b' marker before strings (py3 compat) * Add MonitorTest. * Use `isort` to sort imports. * Remove some loggings * Fix flake8 noqa marker runtest.py * Try to separate tests out to monitor_test.py * Rework cleanup algorithm: correct logic * Extend tests to cover multi-shard cases * Add some small comments and formatting changes.	2017-09-26 00:11:38 -07:00
Peter Schafhalter	6e9657e696	Replaced utstring with std::string (#1009 )	2017-09-24 22:42:17 -07:00
Peter Schafhalter	241612709e	Data structure updates to plasma manager (#937 ) * Implemented local_available_objects as an unordered set * Implemented fetch_requests as an unordered map * Fixed bug and changed fetch_requests from pointer to object * free(PlasmaManagerState ) -> delete PlasmaManagerState * removed unnecessary newline * Make local_available_objects not a pointer. * Attempt to safely iterate over unordered_map and remove elements.	2017-09-15 20:09:29 -07:00
Robert Nishihara	413140df38	Autogenerate catapult files if they are not already present. (#978 ) * Autogenerate catapult files if they are not already present. * Fix bash syntax.	2017-09-14 12:37:33 -07:00
Stephanie Wang	74ac80631b	Local scheduler sends a null heartbeat to global scheduler (#962 ) * Local scheduler sends a null heartbeat to global scheduler to notify death * Add whitespace. * Speed up component failures test * Free local scheduler state upon plasma manager disconnection	2017-09-12 10:45:21 -07:00
Stephanie Wang	99c8b1f38c	Actor fault tolerance using object lineage reconstruction (#902 ) * Revert Python actor reconstruction * Actor reconstruction using object lineage * Add dummy arguments and return values for actor tasks * Pin dummy outputs for actor tasks * Skip checkpointing test for now * TODOs * minor edits * Generate dummy object dependencies in Python, not C * Fix linting. * Move actor counter and dummy objects inside of the actor handle * Refactor Worker._process_task, suppress exception propagation for sequential actor tasks	2017-09-10 19:29:28 -07:00
Robert Nishihara	f3c1248d98	Clone catapult and generate html files during installation. (#956 ) * Clone catapult and generate static html during setup. * Include UI files in installation. * Fix directory to clone catapult to and fix linting. * Use absolute path. * Make sure we find a sufficiently new version of python2 when building wheels. * Copy the trace_viewer_full.html file to the local directory if it is not present. * Make sure wheels fail to build if UI is not included.	2017-09-10 13:41:16 -07:00
Philipp Moritz	546ba23ceb	Upgrade to latest arrow to include set serialization speedups (#957 ) * update arrow to pull in the set serialization speedups * remove _register_class for set	2017-09-10 00:12:17 -07:00
Peter Schafhalter	8906a920f7	Implemented wait_requests as vector (#943 )	2017-09-08 13:39:54 -07:00
Philipp Moritz	7030ef366f	Rebase Ray on latest arrow (remove numbuf from Ray). (#910 ) * remove some stuff * put get roundtrip working * fixes * more fixes * cleanup * fix tests * latest arrow * fixes * fix tests * fix linting * rebase * fixes * fix bug * bring back libgcc error * fix linting * use official arrow repo * fixes	2017-09-04 22:58:49 -07:00
Robert Nishihara	d8010723d7	Attempt to wget boost up to 20 times during installation. (#927 )	2017-09-04 14:42:29 -07:00
Stephanie Wang	ae0212b399	Fix failing task table test (#924 )	2017-09-03 22:41:38 -07:00

474 commits