hiro/ray

Mirror of https://github.com/vale981/ray, synced 2025-03-12 06:06:39 -04:00

Author	SHA1	Message	Date
Robert Nishihara	ba1ce85f58	Download Redis and flatbuffers differently. (#1602 ) * Download Redis differently. * Get flatbuffers with curl	2018-02-25 20:32:33 -08:00
Alexey Tumanov	844a6afcdd	Implement simple random spillback policy. (#1493 ) * spillback policy implementation: global + local scheduler * modernize global scheduler policy state; factor out random number engine and generator * Minimal version. * Fix test. * Make load balancing test less strenuous.	2018-02-13 00:09:35 -08:00
Philipp Moritz	1ab2e63dbd	Tune transfer buffer size (#1363 ) Increase buffsize from `4096` to `80*1024`.	2018-02-09 14:56:36 -08:00
Philipp Moritz	a3f8fa426b	Start integrating new GCS APIs (#1379 ) * Start integrating new GCS calls * fixes * tests * cleanup * cleanup and valgrind fix * update tests * fix valgrind * fix more valgrind * fixes * add separate tests for GCS * fix linting * update tests * cleanup * fix python linting * more fixes * fix linting * add plasma manager callback * add some documentation * fix linting * fix linting * fixes * update * fix linting * fix * add spillback count * fixes * linting * fixes * fix linting * fix * fix * fix	2018-01-31 11:01:12 -08:00
Robert Nishihara	3195c6aa63	Fix local scheduler crash when driver creates actor and exits. (#1474 ) * Make check failures in redis.cc more informative. * Fix bug by calling task_table_add_task. * Add test.	2018-01-26 14:29:53 -08:00
Alexey Tumanov	f1303291b4	Ray scheduler spillback plumbing + mechanism (#1362 ) * spillback mechanism and plumbing : adding spillback counter + timestamp * linting fix * documentation * Fix argument name.	2018-01-23 20:18:12 -08:00
Melih Elibol	4b1c8be4fe	Fix setting log-level to debug. (#1432 )	2018-01-21 21:51:05 -08:00
Stephanie Wang	74718efa73	Nondeterministic reconstruction for actors (#1344 ) * Add failing unit test for nondeterministic reconstruction * Retry scheduling actor tasks if reassigned to local scheduler * Update execution edges asynchronously upon dispatch for nondeterministic reconstruction * Fix bug for updating checkpoint task execution dependencies * Update comments for deterministic reconstruction * cleanup * Add (and skip) failing test case for nondeterministic reconstruction * Suppress test output	2018-01-21 13:44:13 -08:00
Robert Nishihara	088f01496c	Remove unused object info table code. (#1388 )	2018-01-05 11:00:06 -08:00
Philipp Moritz	3d224c4edf	Second Part of Internal API Refactor (#1326 )	2017-12-26 16:22:04 -08:00
Stephanie Wang	12fdb3f53a	Convert actor dummy objects to task execution edges. (#1281 ) * Define execution dependencies flatbuffer and add to Redis commands * Convert TaskSpec to TaskExecutionSpec * Add execution dependencies to Python bindings * Submitting actor tasks uses execution dependency API instead of dummy argument * Fix dependency getters and some cleanup for fetching missing dependencies * C++ convention * Make TaskExecutionSpec a C++ class * Convert local scheduler to use TaskExecutionSpec class * Convert some pointers to references * Finish conversion to TaskExecutionSpec class * fix * Fix * Fix memory errors? * Cast flatbuffers GetSize to size_t * Fixes * add more retries in global scheduler unit test * fix linting and cast fbb.GetSize to size_t * Style and doc * Fix linting and simplify from_flatbuf.	2017-12-14 20:47:54 -08:00
Philipp Moritz	cac5f47600	First Part of Internal Ray API Refactor (#1173 ) * add Ray status class * add C++ util files * add ID types * more APIs * build system integration * add test infrastructure and implement some APIs * add more tests * fix bugs * add task table tests * update * add toolchain file * fix * test * link with pthread * update * fix * more fixes * fixes * always vendor gtest and gflags * linting * fixes * add constants file * comments * more fixes * fix linting	2017-12-14 14:54:09 -08:00
Stephanie Wang	bac39a134e	Define a wrapper class for callback_data.data (#1301 )	2017-12-08 11:48:21 -08:00
Robert Nishihara	c21e189371	Allow scheduling with arbitrary user-defined resource labels. (#1236 ) * Enable scheduling with custom resource labels. * Fix. * Minor fixes and ref counting fix. * Linting * Use .data() instead of .c_str(). * Fix linting. * Fix ResourcesTest.testGPUIDs test by waiting for workers to start up. * Sleep in test so that all tasks are submitted before any completes.	2017-12-01 11:41:40 -08:00
Robert Nishihara	e0a340ee7e	Allow actors to pin at most 1000 dummy objects at a time. (#1241 ) * Allow actors to pin at most 1000 dummy objects at a time. * Fix linting.	2017-11-22 13:38:01 -08:00
Eric Liang	9233e496cc	Raise exception when getting the task results of workers that died (#1224 ) * wip * with test * add timeout * also add test for f * remove on cleanup * update * wip * fix tests * mark actor removed in redis * clang-format * fix bug when no-inprogress tasks * try to set task status done * Add comment.	2017-11-20 15:18:39 -08:00
Peter Schafhalter	e0360eb429	Remove UT libraries and clean up remaining UT datastructures (#1230 ) * Remove UT string include from redis * Remove UT string include from DB tests * Modify TaskSpec_print to remove UT string * Remove UT libraries	2017-11-19 15:01:33 -08:00
Peter Schafhalter	4cbc2b1978	Clean up UT datastructures in Python extension (#1227 )	2017-11-17 01:07:12 -08:00
Peter Schafhalter	9a7b15447b	Replace UT string in redis tests (#1211 ) * Replace UT arg formatting with vsnprintf * Fix bug with va_list usage	2017-11-15 22:21:56 -08:00
Peter Schafhalter	428858c1ff	Convert UT string to std::string (#1210 )	2017-11-12 21:00:36 -08:00
Peter Schafhalter	9a6a056609	Convert UT datastructures in tests (#1203 ) * bind_ipc_sock_retry returns std::string * snprintf -> std::snprintf * Fix formatting * Use stringstream instead of snprintf * Fix typo	2017-11-11 16:55:05 -08:00
Philipp Moritz	e798a652bc	Change TaskSpec to allow multiple object IDs per argument. (#1204 ) * Implement object ID bags * linting * fix tests * fix linting * fix comments	2017-11-10 16:33:34 -08:00
Stephanie Wang	07f0532b9b	Local scheduler filters out dead clients during reconstruction (#1182 ) * Object table lookup returns vector of DBClientID instead of address strings * Add node IP address to DBClient notification * DB client cache stores entire DB client, convert addresses to std::string * get cached db client returns the client * Expose a call to initialize the redis cache * Local scheduler filters out dead clients during reconstruction * Remove node ip address from dbclient, use aux_address for plasma managers * Get entire db client entry when not found in cache * Fix common tests * Fix address in tests * Push error to driver if driver task did the put * Address Robert's comments and cleanup * Remove unused Redis command * Fix db test	2017-11-10 11:29:24 -08:00
Robert Nishihara	d3c082d325	More checking in redis.cc. (#1057 )	2017-11-08 23:25:19 -08:00
Robert Nishihara	1c6b30b5e2	Move all config constants into single file. (#1192 ) * Initial pass at factoring out C++ configuration into a single file. * Expose config through Python. * Forward declarations. * Fixes with Python extensions * Remove old code. * Consistent naming for constants. * Fixes * Fix linting. * More linting. * Whitespace * rename config -> _config. * Move config inside a class. * update naming convention * Fix linting. * More linting * More linting. * Add in some more constants. * Fix linting	2017-11-08 11:10:38 -08:00
Peter Schafhalter	a8032b9ca1	Convert connections from UT_array to std::vector (#1190 )	2017-11-07 20:59:41 -08:00
Peter Schafhalter	7215f7d228	Remove UT String from logging (#1184 ) * Removed unnecessary utarray include * Removed ut_string from logging * Fix formatting	2017-11-05 14:05:20 -08:00
Peter Schafhalter	ad4cbd4016	Updated outstanding_callbacks to unordered_map (#1108 ) * Updated outstanding_callbacks to unordered_map * Fix bug in destroy_outstanding_callbacks and comments	2017-10-20 10:06:22 -07:00
Stephanie Wang	af47737bd5	Prototype distributed actor handles (#1137 ) * Add actor handle ID to the task spec * Local scheduler dispatches actor tasks according to a task counter per handle * Fix python test * Allow passing actor handles into tasks. Not completely working yet. Also this is very messy. * Fixes, should be roughly working now. * Refactor actor handle wrapper * Fix __init__ tests * Terminate actor when the original handle goes out of scope * TODO and a couple test cases * Make tests for unsupported cases * Fix Python mode tests * Linting. * Cache actor definitions that occur before ray.init() is called. * Fix export actor class * Deterministically compute actor handle ID * Fix __getattribute__ * Fix string encoding for python3 * doc * Add comment and assertion.	2017-10-19 23:49:59 -07:00
Robert Nishihara	f3e3c7ec71	Add is_actor_checkpoint_method to TaskSpec. (#1117 ) * Add is_actor_checkpoint_method to TaskSpec. * Fix linting. * Fix rebase error. * Fix errors from rebase.	2017-10-15 16:52:10 -07:00
Robert Nishihara	d6062ef8f6	Compile with -rdynamic for better debugging symbols. (#1123 ) * Compile with -rdynamic. * Only use -rdynamic on Linux. * Add comment.	2017-10-13 21:39:11 -07:00
Stephanie Wang	15486a14a0	Refactor actor task queues (#1118 ) * Refactor add_task_to_actor_queue into queue_actor_task and insert_actor_task_queue * Refactor actor task queue to share the waiting task queue * Fix	2017-10-13 20:52:11 -07:00
Robert Nishihara	486cb64e3f	Compile with -Werror and -Wall (#1116 ) * Compile global scheduler with -Werror -Wall. * Compile plasma manager with -Werror -Wall. * Compile local scheduler with -Werror -Wall. * Compile common code with -Werror -Wall. * Signed/unsigned comparisons. * More signed/unsigned fixes. * More signed/unsigned fixes and added extern keyword. * Fix linting. * Don't check strict-aliasing because Python.h doesn't pass.	2017-10-12 21:00:23 -07:00
Stephanie Wang	3764f2f2e1	Actor checkpointing with object lineage reconstruction (#1004 ) * Worker reports error in previous task, actor task counter is incremented after task is successful * Refactor actor task execution - Return new task counter in GetTaskRequest - Update worker state for actor tasks inside of the actor method executor * Manually invoked checkpoint method * Scheduling for actor checkpoint methods * Fix python bugs in checkpointing * Return task success from worker to local scheduler instead of actor counter * Kill local schedulers halfway through actor execution instead of waiting for all tasks to execute once * Remove redundant actor tasks during dispatch, reconstruct missing dependencies for actor tasks * Make executor for temporary actor methods * doc * Set default argument for whether the previous task was a success * Refactor actor method call * Simplify checkpoint task submission * lint * fix philipp's comments * Add missing line * Make actor reconstruction tests run faster * Unimportant whitespace. * Unimportant whitespace. * Update checkpoint method signature * Documentation and handle exceptions during checkpoint save/resume * Rename get_task message field to actor_checkpoint_failed * Fix bug. * Remove debugging check, redirect test output	2017-10-12 09:53:32 -07:00
Robert Nishihara	b585001881	When a task is passed to the global scheduler, if it is not received,… (#1106 ) * When a task is passed to the global scheduler, if it is not received, then try again. * Call give_task_to_global_scheduler directly (same with local).	2017-10-12 00:04:38 -07:00
Stephanie Wang	1e0ab3d386	Switch to monotonic clock (#1100 )	2017-10-10 22:35:21 -07:00
Robert Nishihara	1488975d1b	Add timing statement to loop that calls redis_get_cached_db_client be… (#1045 ) * Add timing statement to loop that calls redis_get_cached_db_client because it has been slow in the past. * Fix linting. * Refactoring to make manager vectors into std::vector. * Fix linting. * Fixes.	2017-10-02 10:46:21 -07:00
Robert Nishihara	ce278aa06a	Fix valgrind tests. (#1037 ) * Comment out local scheduler valgrind test. * Fix free/delete error. * More free -> delete errors * One more free -> delete and also clean up callback state in plasma manager. * Add set -x to run_valgrind scripts. * Fix valgrind error in CreateLocalSchedulerInfoMessage.	2017-09-30 00:11:09 -07:00
Zongheng Yang	427dee511b	Fill out specs of the task table in ray_redis_module.cc. (#1024 ) * Fill out specs of the task table in ray_redis_module.cc. * local scheduler field in task table * linting	2017-09-27 23:45:58 -07:00
Zongheng Yang	5a50e80b63	Make Monitor remove dead Redis entries from exiting drivers. (#994 ) * WIP: removing OL, OI, TT on client exit; no saving yet. * ray_redis_module.cc: update header comment. * Cleanup: just the removal. * Reformat via yapf: use pep8 style instead of google. * Checkpoint addressing comments (partially) * Add 'b' marker before strings (py3 compat) * Add MonitorTest. * Use `isort` to sort imports. * Remove some loggings * Fix flake8 noqa marker runtest.py * Try to separate tests out to monitor_test.py * Rework cleanup algorithm: correct logic * Extend tests to cover multi-shard cases * Add some small comments and formatting changes.	2017-09-26 00:11:38 -07:00
Stephanie Wang	74ac80631b	Local scheduler sends a null heartbeat to global scheduler (#962 ) * Local scheduler sends a null heartbeat to global scheduler to notify death * Add whitespace. * Speed up component failures test * Free local scheduler state upon plasma manager disconnection	2017-09-12 10:45:21 -07:00
Stephanie Wang	ae0212b399	Fix failing task table test (#924 )	2017-09-03 22:41:38 -07:00
Peter Schafhalter	2c19ae97a3	Implemented db_client_cache as unordered_map (#921 ) * Implemented db_client_cache as unordered_map * Fix for memory leak * Fixed linting	2017-09-03 17:26:05 -07:00
Stephanie Wang	7496c98010	Fault tolerance race (#894 ) * Remove race between local scheduler disconnecting and global scheduler assigning a task * Fix number of workers started in component failures test * Fix race between global scheduler retrying a task assignment and monitor cleaning up task table. The global scheduler should only retry the task assignment if the local scheduler is still alive. * Clean up task_table_update callback if failure * Look up current local scheduler mapping when retrying actor task submission * Log warning if no subscribers received a task table update * Clean up database handle memory in local scheduler	2017-08-30 22:20:50 -07:00
Robert Nishihara	e6de744ef4	Fix potential bug in redis.cc. (#851 )	2017-08-23 20:38:25 -07:00
Robert Nishihara	be4beb19c1	Changes to build to fix creation of wheels. (#840 ) * Pass DPYTHON_EXECUTABLE into cmake for arrow and for ray. * Add cython to setup.py install_requires. * Revert custom code for finding python in cmake. * Correctly find arrow on CentOS. * In cmake, don't find PythonLibs, just find PYTHON_INCLUDE_DIRS. * Fix typo. * Do not use boost shared libraries when building arrow. * Add six to the setup.py install_requires because it is needed by pyarrow. * Don't link numbuf against boost_system and boost_filesystem. * Compile boost when we are on Linux. * Make numbuf find the correct boost libraries. * Only use find_package Boost on Linux, suppress output when building boost. * Changes to wheel building scripts, install cython in mac script. * Compile flatbuffers ourselves on Linux and pass it in when compiling Arrow. * Clean up build_flatbuffers.sh and build_boost.sh scripts a little. * Install cython when building linux wheel.	2017-08-21 17:49:35 -07:00
Alexey Tumanov	fc885bd918	Adding basic support for a user-interpretable resource label (#761 ) * adding support for the user-interpretable label(UIR) * more plumbing for num_uirs further upstream; set to infty when specified on cmd line * pass default num_uirs for actors; update GlobalStateAPI * support num_uirs in ray.init() * local scheduler resource accounting: support num_uirs; prep for vectorized resource accounting * global scheduler test updated * Fix bug introduced by rebase. * Rename UIR -> CustomResource and add test. * Small changes and use constexpr instead of macros. * Linting and some renaming. * Reorder some code. * Remove cpus_in_use and fix bug. * Add another test and make a small change. * Rephrase documentation about feature stability.	2017-08-08 02:53:59 -07:00
Philipp Moritz	054ae4180e	Fix installation instruction for ubuntu 14.04 (#805 ) * fix installation instruction for ubuntu 14.04 * upgrade cmake requirements * fix	2017-08-02 18:14:14 -07:00
Robert Nishihara	cb84972f6b	Recreate actors when local schedulers die. (#804 ) * Reconstruct actor state when local schedulers fail. * Simplify construction of arguments to pass into default_worker.py from local scheduler. * Remove deprecated ray.actor. * Simplify actor reconstruction method. * Fix linting. * Small fixes.	2017-08-02 18:02:52 -07:00
Robert Nishihara	8c8258de20	Move worker methods into Worker class and expose more TaskSpec fields to Python. (#796 ) * Move worker methods inside worker class. Move some helper methods from actor.py into utils.py and state.py. * Add more methods exposing task spec fields to Python. * Fix linting. * Fix error. * Remove unused code in default worker.	2017-08-01 17:16:57 -07:00

168 commits