hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-06 10:31:39 -05:00

Author	SHA1	Message	Date
Peter Schafhalter	2c19ae97a3	Implemented db_client_cache as unordered_map (#921 ) * Implemented db_client_cache as unordered_map * Fix for memory leak * Fixed linting	2017-09-03 17:26:05 -07:00
Stephanie Wang	7496c98010	Fault tolerance race (#894 ) * Remove race between local scheduler disconnecting and global scheduler assigning a task * Fix number of workers started in component failures test * Fix race between global scheduler retrying a task assignment and monitor cleaning up task table. The global scheduler should only retry the task assignment if the local scheduler is still alive. * Clean up task_table_update callback if failure * Look up current local scheduler mapping when retrying actor task submission * Log warning if no subscribers received a task table update * Clean up database handle memory in local scheduler	2017-08-30 22:20:50 -07:00
Robert Nishihara	e6de744ef4	Fix potential bug in redis.cc. (#851 )	2017-08-23 20:38:25 -07:00
Robert Nishihara	be4beb19c1	Changes to build to fix creation of wheels. (#840 ) * Pass DPYTHON_EXECUTABLE into cmake for arrow and for ray. * Add cython to setup.py install_requires. * Revert custom code for finding python in cmake. * Correctly find arrow on CentOS. * In cmake, don't find PythonLibs, just find PYTHON_INCLUDE_DIRS. * Fix typo. * Do not use boost shared libraries when building arrow. * Add six to the setup.py install_requires because it is needed by pyarrow. * Don't link numbuf against boost_system and boost_filesystem. * Compile boost when we are on Linux. * Make numbuf find the correct boost libraries. * Only use find_package Boost on Linux, suppress output when building boost. * Changes to wheel building scripts, install cython in mac script. * Compile flatbuffers ourselves on Linux and pass it in when compiling Arrow. * Clean up build_flatbuffers.sh and build_boost.sh scripts a little. * Install cython when building linux wheel.	2017-08-21 17:49:35 -07:00
Robert Nishihara	ea8da13938	Remove UT data structures from global scheduler. (#838 ) * Replace pending_tasks utarray with vector. * Replace local_schedulers vector with unordered_map. * Replace object info table with unordered_map. * Replace local_scheduler_plasma_map and plasma_local_scheduler_map with unordered maps. * Remove unnecessary includes. * Fix linting. * Bug fixes. * Add function for computing the amount of data for a task that wouldn't have to be shipped because it is already accessible to a local scheduler. * Small cleanups.	2017-08-16 22:28:21 -07:00
Alexey Tumanov	fc885bd918	Adding basic support for a user-interpretable resource label (#761 ) * adding support for the user-interpretable label(UIR) * more plumbing for num_uirs further upstream; set to infty when specified on cmd line * pass default num_uirs for actors; update GlobalStateAPI * support num_uirs in ray.init() * local scheduler resource accounting: support num_uirs; prep for vectorized resource accounting * global scheduler test updated * Fix bug introduced by rebase. * Rename UIR -> CustomResource and add test. * Small changes and use constexpr instead of macros. * Linting and some renaming. * Reorder some code. * Remove cpus_in_use and fix bug. * Add another test and make a small change. * Rephrase documentation about feature stability.	2017-08-08 02:53:59 -07:00
Robert Nishihara	03f2325780	Package pyarrow along with ray. (#822 ) * Rough pass at installing pyarrow along with Ray. * Remove hardcoded path and try to find correct path automatically. * Add print. * Fix linting. * Copy pyarrow files to a location that we manually add to python path in order to avoid interfering with pre-existing pyarrow installations. * Move call to build.sh back into build_ext in setup.py. * Ignore some linting errors. * Fix problem in which pyarrow files to copy were listed before they were built. * Fix tests by importing ray before pyarrow.	2017-08-07 21:17:28 -07:00
Robert Nishihara	d7b10a84b6	Fallback to custom serializer for very long python ints. (#821 ) * Fallback to custom serializer for very long python ints. * Fix linting. * Fix naming convention and add RETURN_NOT_OK.	2017-08-07 17:21:06 -07:00
Robert Nishihara	3071ba0070	Add correct Python executable to Path when building arrow. (#820 ) * Tell cmake which python to use when building arrow. * Pass different path into cmake when building arrow so that cmake finds the right python. * Add correct python executable to PATH when running cmake for ray.	2017-08-07 14:47:34 -07:00
Philipp Moritz	054ae4180e	Fix installation instruction for ubuntu 14.04 (#805 ) * fix installation instruction for ubuntu 14.04 * upgrade cmake requirements * fix	2017-08-02 18:14:14 -07:00
Robert Nishihara	cb84972f6b	Recreate actors when local schedulers die. (#804 ) * Reconstruct actor state when local schedulers fail. * Simplify construction of arguments to pass into default_worker.py from local scheduler. * Remove deprecated ray.actor. * Simplify actor reconstruction method. * Fix linting. * Small fixes.	2017-08-02 18:02:52 -07:00
Robert Nishihara	37282330c0	Allow plasma manager to gracefully handle EPROTOTYPE. (#802 ) * Allow plasma manager to gracefully handle EPROTOTYPE. * Fix linting.	2017-08-01 23:33:25 -07:00
Robert Nishihara	8c8258de20	Move worker methods into Worker class and expose more TaskSpec fields to Python. (#796 ) * Move worker methods inside worker class. Move some helper methods from actor.py into utils.py and state.py. * Add more methods exposing task spec fields to Python. * Fix linting. * Fix error. * Remove unused code in default worker.	2017-08-01 17:16:57 -07:00
Philipp Moritz	c3b39b4d86	Pull Plasma from Apache Arrow and remove Plasma store from Ray. (#692 ) * Rebase Ray on top of Plasma in Apache Arrow * add thirdparty building scripts * use rebased arrow * fix * fix build * fix python visibility * comment out C tests for now * fix multithreading * fix * reduce logging * fix plasma manager multithreading * make sure old and new object IDs can coexist peacefully * more rebasing * update * fixes * fix * install pyarrow * install cython * fix * install newer cmake * fix * rebase on top of latest arrow * getting runtest.py run locally (needed to comment out a test for that to work) * work on plasma tests * more fixes * fix local scheduler tests * fix global scheduler test * more fixes * fix python 3 bytes vs string * fix manager tests valgrind * fix documentation building * fix linting * fix c++ linting * fix linting * add tests back in * Install without sudo. * Set PKG_CONFIG_PATH in build.sh so that Ray can find plasma. * Install pkg-config * Link -lpthread, note that find_package(Threads) doesn't seem to work reliably. * Comment in testGPUIDs in runtest.py. * Set PKG_CONFIG_PATH when building pyarrow. * Pull apache/arrow and not pcmoritz/arrow. * Fix installation in docker image. * adapt to changes of the plasma api * Fix installation of pyarrow module. * Fix linting. * Use correct python executable to build pyarrow.	2017-07-31 21:04:15 -07:00
Robert Nishihara	8ad9ced99b	Fix task ID hash computation. (#774 )	2017-07-26 10:08:38 -07:00
Yeolar	31329d43dd	fixtypo: plasma_protocol (#764 ) Fix typo in plasma_protocol.	2017-07-22 17:52:27 -07:00
Robert Nishihara	e0867c8845	Switch Python indentation from 2 spaces to 4 spaces. (#726 ) * 4 space indentation for actor.py. * 4 space indentation for worker.py. * 4 space indentation for more files. * 4 space indentation for some test files. * Check indentation in Travis. * 4 space indentation for some rl files. * Fix failure test. * Fix multi_node_test. * 4 space indentation for more files. * 4 space indentation for remaining files. * Fixes.	2017-07-13 21:53:57 +00:00
alanamarzoev	8464d77c76	Change event logs to store one Redis ZSET per worker. (#705 ) * Changing to zset * Fixed bug. * Fixed another bug. * Modified task_profiles. * Removed extra file. * Modified task_profiles test. * WIP * WIP * Undid changes * Updated * WIP * Made changes according to comments. * Removed unneeded print. * Removed ujson usage. * failing test * tests passing * Fixed linting errors and modified style. * Fixed bug. * Fixed linting * Fixed according to comments. * Redis crashing? * Fixed linting * Fixed linting	2017-07-09 01:42:29 +02:00
Robert Nishihara	6c45657280	Reset the SIGCHLD handler after forking a worker to avoid influencing the worker. (#713 )	2017-07-07 14:50:37 +00:00
Robert Nishihara	1941e0f7b1	Fix compilation on CentOS. (#699 )	2017-06-26 05:54:21 +00:00
Robert Nishihara	0926550661	Remove -mtune and -march compiler flags. (#697 )	2017-06-26 05:52:45 +00:00
Robert Nishihara	ad480f8165	Don't reconstruct all objects in every fetch request in local scheduler. (#686 ) * Don't reconstruct all objects in every fetch request in local scheduler. * Separate out fetch timer and reconstruction timer. * Fix bug. * Bug fix. * Fix naming convention for global variables. * Address comments. * Make reconstruct_counter a static variable. * Fix linting. * Redo reconstruct handler using a set of objects to fetch. * Fix linting. * Replace set with vector.	2017-06-23 21:08:02 +00:00
Robert Nishihara	5ebc2f3f2e	Do resource bookkeeping for actor methods. (#682 ) * Dispatch regular and actor tasks when resources become available. * Make actor methods do resource bookkeeping and add test. * Remove unnecessary field. * Fix linting. * Fix actor test. * Maintain set of actors with pending tasks to speed up task dispatch. * Exit early from task dispatch if there are no resources available. * Fix linting. * Fix error. * Fix bug related to iterator invalidation. * When an actor is removed, remove it from the set of actors with pending tasks.	2017-06-21 05:52:45 +00:00
Robert Nishihara	3052ce25a6	Divide up large fetch requests from local scheduler, also print warni… (#683 ) * Divide up large fetch requests from local scheduler, also print warning if fetch handler is slow. * Fix linting. * Fix typo.	2017-06-19 22:57:51 +00:00
Robert Nishihara	9e4a3e4972	Replace some UT data structures in local scheduler with C++ STL. (#680 ) * Replace a local scheduler ut_array with a std::vector. * Replace vector of sizes in local scheduler with std::pair. * Remove utarray include. * Replace utarray with std::vector for reading local scheduler input messages. * Remove more UT data structures. * Remove UT includes. * Fix linting. * Include stdlib.h to find size_t. * Remove includes of stdbool.h. * Replace std::pair with TaskQueueEntry. * Fix redis tests. * Reinstate tests.	2017-06-19 21:58:42 +00:00
Robert Nishihara	f12db5f0e2	Divide large plasma requests into smaller chunks, and wait longer before reissuing large requests. (#678 ) * Divide large get requests into smaller chunks. * Divide fetches into smaller chunks. * Wait longer in worker and manager before reissuing fetch requests if there are many outstanding fetch requests. * Log warning if a handler in the local scheduler or plasma manager takes more than one second.	2017-06-18 04:42:15 +00:00
alanamarzoev	4d5ac9dad5	Include object size and hash in the table returned by the object_table function in the GlobalStateAPI. (#665 ) * added log_table function and a test * fixed log_files and added task_profiles * fixed formatting * fixed linting errors * fixes * removed file * more fixes * hopefully fixed * Small changes. * Fix linting. * Fix bug in log monitor. * Small changes. * Fix bug in travis. * Including data_size and hash in the ResultTableReply. * Included data_size and hash info in object_table. * Fixed bugs in ray_redis_module.cc. * Removing commented out code. * Fixes * Freed hash and data_size strings after using, and checked if they're null along with task_id and is_put. * Changed it so that data_size is set correctly. * Removed iostream import. * Included a check to ensure that the Redis string to long long conversion was successful. * Included separate data_size and hash null checks. * Fixed bug. * Made linting changes. * Another linting error. * Slight simplication.	2017-06-16 23:17:11 -07:00
Robert Nishihara	96962cdee0	Log fatal error if plasma manager or local scheduler heartbeats take too long. (#676 ) * Log fatal error if plasma manager or local scheduler take too long to send heartbeat. * Fix linting. * Use int64_t for milliseconds since unix epoch.	2017-06-16 19:11:01 +00:00
Philipp Moritz	c343df832e	use multiple threads for memcpy (#669 )	2017-06-14 19:14:24 -07:00
Philipp Moritz	54925996ca	Allow remote functions to specify max executions and kill worker once limit is reached. (#660 ) * implement restarting workers after certain number of task executions * Clean up python code. * Don't start new worker when an actor disconnects. * Move wait_for_pid_to_exit to test_utils.py. * Add test. * Fix linting errors. * Fix linting. * Fix typo.	2017-06-13 00:34:58 -07:00
Robert Nishihara	1916475e14	Increase socket listen backlog from 5 to 128. (#661 )	2017-06-11 06:34:16 +00:00
Eric Liang	d4d2c03ac5	Remove timeout for Redis commands. (#649 ) * update * Remove interaction between callback data identifier and event loop. * Remove tests that no longer apply.	2017-06-09 15:55:36 -07:00
Philipp Moritz	0254efa5e8	Use parallel memcopy from arrow (#633 ) * use parallel memcopy from arrow * fix linting * remove memory.h	2017-06-02 18:18:41 -07:00
Robert Nishihara	a4d8e13094	Suppress excess warning messages related to intentional actor deaths. (#627 ) * Don't submit the actor destructor tasks when the job is exiting. * Don't propagate error messages to the driver when an actor exits intentionally.	2017-06-01 20:10:40 +00:00
Robert Nishihara	dd7f866a92	Fix compilation error on CentOS. (#622 ) * Fix compilation error on CentOS. * add TODO	2017-06-01 06:51:00 +00:00
Robert Nishihara	5f193afb87	Tell local scheduler to ignore SIGCHLD so that workers don't become zombies. (#620 )	2017-06-01 06:37:28 +00:00
Robert Nishihara	4d51ed37b2	Fix bug in which plasma client file descriptors were not closed. (#618 ) * Fix bug in which plasma client file descriptors were not closed. * Add logging statement when disconnecting client from plasma store. * Fix after rebasing. * Add more checks to plasma disconnect client.	2017-06-01 05:37:29 +00:00
Philipp Moritz	b94b4a35e0	Make the Plasma store ready for Arrow integration (#579 ) * port plasma to arrow * fixes * refactor plasma client * more modernization * fix plasma manager tests * everything compiles * fix plasma client tests * update plasma serialization tests * fix plasma manager tests * fix bug * updates * fix bug * fix tests * fix rebase * address comments * fix travis valgrind build * fix linting * fix include order again * fix linting * address comments	2017-05-31 16:24:23 -07:00
Richard Shin	16050eca8d	Don't link Python extensions to libpython*.so (#598 )	2017-05-25 19:01:12 -07:00
Philipp Moritz	3885d1b286	make builds with CMake incremental (#592 )	2017-05-24 21:52:33 -07:00
Stephanie Wang	ee08c8274b	Shard Redis. (#539 ) * Implement sharding in the Ray core * Single node Python modifications to do sharding * Do the sharding in redis.cc * Pipe num_redis_shards through start_ray.py and worker.py. * Use multiple redis shards in multinode tests. * first steps for sharding ray.global_state * Fix problem in multinode docker test. * fix runtest.py * fix some tests * fix redis shard startup * fix redis sharding * fix * fix bug introduced by the map-iterator being consumed * fix sharding bug * shard event table * update number of Redis clients to be 64K * Fix object table tests by flushing shards in between unit tests * Fix local scheduler tests * Documentation * Register shard locations in the primary shard * Add plasma unit tests back to build * lint * lint and fix build * Fix * Address Robert's comments * Refactor start_ray_processes to start Redis shard * lint * Fix global scheduler python tests * Fix redis module test * Fix plasma test * Fix component failure test * Fix local scheduler test * Fix runtest.py * Fix global scheduler test for python3 * Fix task_table_test_and_update bug, from actor task table submission race * Fix jenkins tests. * Retry Redis shard connections * Fix test cases * Convert database clients to DBClient struct * Fix race condition when subscribing to db client table * Remove unused lines, add APITest for sharded Ray * Fix * Fix memory leak * Suppress ReconstructionTests output * Suppress output for APITestSharded * Reissue task table add/update commands if initial command does not publish to any subscribers. * fix * Fix linting. * fix tests * fix linting * fix python test * fix linting	2017-05-18 17:40:41 -07:00
Robert Nishihara	9018dffd7f	Fix bug in actor task dispatch. (#552 ) * Fix bug in actor task dispatch. * Return early from dispatch_actor_task if creation notification has not arrived. Also fix comment.	2017-05-15 23:47:15 -07:00
Philipp Moritz	08e988aee5	Modernize plasma store (C to C++ changes). (#546 )	2017-05-15 01:19:44 -07:00
Eric Liang	e2e9e4ce6f	Fix segmentation fault when calling ray.put on a dictionary with object keys (#548 ) * fix segfault when serializing dict key * fix style * fix test * Fix linting.	2017-05-15 01:09:13 -07:00
Philipp Moritz	3a6922276a	convert malloc.c to STL (#537 ) * convert malloc.c to STL * linting * cleanup and comments * address Richard's comments	2017-05-11 11:18:23 -07:00
Philipp Moritz	c1e9496a06	fix problem if old version of arrow is cloned (#538 )	2017-05-10 12:16:07 -07:00
Philipp Moritz	3a0e86395e	Convert eviction code to STL (#534 ) * temp commit * convert eviction policy to C++ * temp commit * fix plasma tests * fix * linting * fixes * fix linting	2017-05-09 21:26:22 -07:00
Philipp Moritz	118fac5619	Remove boost dependencies from Ray (#518 ) * remove boost regex * workaround for boost * fix * do not link against boost any more * rebased on arrow change	2017-05-09 16:17:20 -07:00
Philipp Moritz	e5e2aab5e4	upgrade arrow and fix bug (#530 ) * upgrade arrow and fix bug * fixes suggested by Wes	2017-05-09 13:58:42 -07:00
Philipp Moritz	0681107039	add serializing numpy boolean (#529 )	2017-05-08 22:24:02 -07:00

474 commits