hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-09 12:56:46 -04:00

Author	SHA1	Message	Date
Philipp Moritz	c3b39b4d86	Pull Plasma from Apache Arrow and remove Plasma store from Ray. (#692 ) * Rebase Ray on top of Plasma in Apache Arrow * add thirdparty building scripts * use rebased arrow * fix * fix build * fix python visibility * comment out C tests for now * fix multithreading * fix * reduce logging * fix plasma manager multithreading * make sure old and new object IDs can coexist peacefully * more rebasing * update * fixes * fix * install pyarrow * install cython * fix * install newer cmake * fix * rebase on top of latest arrow * getting runtest.py run locally (needed to comment out a test for that to work) * work on plasma tests * more fixes * fix local scheduler tests * fix global scheduler test * more fixes * fix python 3 bytes vs string * fix manager tests valgrind * fix documentation building * fix linting * fix c++ linting * fix linting * add tests back in * Install without sudo. * Set PKG_CONFIG_PATH in build.sh so that Ray can find plasma. * Install pkg-config * Link -lpthread, note that find_package(Threads) doesn't seem to work reliably. * Comment in testGPUIDs in runtest.py. * Set PKG_CONFIG_PATH when building pyarrow. * Pull apache/arrow and not pcmoritz/arrow. * Fix installation in docker image. * adapt to changes of the plasma api * Fix installation of pyarrow module. * Fix linting. * Use correct python executable to build pyarrow.	2017-07-31 21:04:15 -07:00
Robert Nishihara	8ad9ced99b	Fix task ID hash computation. (#774 )	2017-07-26 10:08:38 -07:00
Yeolar	31329d43dd	fixtypo: plasma_protocol (#764 ) Fix typo in plasma_protocol.	2017-07-22 17:52:27 -07:00
Robert Nishihara	e0867c8845	Switch Python indentation from 2 spaces to 4 spaces. (#726 ) * 4 space indentation for actor.py. * 4 space indentation for worker.py. * 4 space indentation for more files. * 4 space indentation for some test files. * Check indentation in Travis. * 4 space indentation for some rl files. * Fix failure test. * Fix multi_node_test. * 4 space indentation for more files. * 4 space indentation for remaining files. * Fixes.	2017-07-13 21:53:57 +00:00
alanamarzoev	8464d77c76	Change event logs to store one Redis ZSET per worker. (#705 ) * Changing to zset * Fixed bug. * Fixed another bug. * Modified task_profiles. * Removed extra file. * Modified task_profiles test. * WIP * WIP * Undid changes * Updated * WIP * Made changes according to comments. * Removed unneeded print. * Removed ujson usage. * failing test * tests passing * Fixed linting errors and modified style. * Fixed bug. * Fixed linting * Fixed according to comments. * Redis crashing? * Fixed linting * Fixed linting	2017-07-09 01:42:29 +02:00
Robert Nishihara	6c45657280	Reset the SIGCHLD handler after forking a worker to avoid influencing the worker. (#713 )	2017-07-07 14:50:37 +00:00
Robert Nishihara	1941e0f7b1	Fix compilation on CentOS. (#699 )	2017-06-26 05:54:21 +00:00
Robert Nishihara	0926550661	Remove -mtune and -march compiler flags. (#697 )	2017-06-26 05:52:45 +00:00
Robert Nishihara	ad480f8165	Don't reconstruct all objects in every fetch request in local scheduler. (#686 ) * Don't reconstruct all objects in every fetch request in local scheduler. * Separate out fetch timer and reconstruction timer. * Fix bug. * Bug fix. * Fix naming convention for global variables. * Address comments. * Make reconstruct_counter a static variable. * Fix linting. * Redo reconstruct handler using a set of objects to fetch. * Fix linting. * Replace set with vector.	2017-06-23 21:08:02 +00:00
Robert Nishihara	5ebc2f3f2e	Do resource bookkeeping for actor methods. (#682 ) * Dispatch regular and actor tasks when resources become available. * Make actor methods do resource bookkeeping and add test. * Remove unnecessary field. * Fix linting. * Fix actor test. * Maintain set of actors with pending tasks to speed up task dispatch. * Exit early from task dispatch if there are no resources available. * Fix linting. * Fix error. * Fix bug related to iterator invalidation. * When an actor is removed, remove it from the set of actors with pending tasks.	2017-06-21 05:52:45 +00:00
Robert Nishihara	3052ce25a6	Divide up large fetch requests from local scheduler, also print warni… (#683 ) * Divide up large fetch requests from local scheduler, also print warning if fetch handler is slow. * Fix linting. * Fix typo.	2017-06-19 22:57:51 +00:00
Robert Nishihara	9e4a3e4972	Replace some UT data structures in local scheduler with C++ STL. (#680 ) * Replace a local scheduler ut_array with a std::vector. * Replace vector of sizes in local scheduler with std::pair. * Remove utarray include. * Replace utarray with std::vector for reading local scheduler input messages. * Remove more UT data structures. * Remove UT includes. * Fix linting. * Include stdlib.h to find size_t. * Remove includes of stdbool.h. * Replace std::pair with TaskQueueEntry. * Fix redis tests. * Reinstate tests.	2017-06-19 21:58:42 +00:00
Robert Nishihara	f12db5f0e2	Divide large plasma requests into smaller chunks, and wait longer before reissuing large requests. (#678 ) * Divide large get requests into smaller chunks. * Divide fetches into smaller chunks. * Wait longer in worker and manager before reissuing fetch requests if there are many outstanding fetch requests. * Log warning if a handler in the local scheduler or plasma manager takes more than one second.	2017-06-18 04:42:15 +00:00
alanamarzoev	4d5ac9dad5	Include object size and hash in the table returned by the object_table function in the GlobalStateAPI. (#665 ) * added log_table function and a test * fixed log_files and added task_profiles * fixed formatting * fixed linting errors * fixes * removed file * more fixes * hopefully fixed * Small changes. * Fix linting. * Fix bug in log monitor. * Small changes. * Fix bug in travis. * Including data_size and hash in the ResultTableReply. * Included data_size and hash info in object_table. * Fixed bugs in ray_redis_module.cc. * Removing commented out code. * Fixes * Freed hash and data_size strings after using, and checked if they're null along with task_id and is_put. * Changed it so that data_size is set correctly. * Removed iostream import. * Included a check to ensure that the Redis string to long long conversion was successful. * Included separate data_size and hash null checks. * Fixed bug. * Made linting changes. * Another linting error. * Slight simplification.	2017-06-16 23:17:11 -07:00
Robert Nishihara	96962cdee0	Log fatal error if plasma manager or local scheduler heartbeats take too long. (#676 ) * Log fatal error if plasma manager or local scheduler take too long to send heartbeat. * Fix linting. * Use int64_t for milliseconds since unix epoch.	2017-06-16 19:11:01 +00:00
Philipp Moritz	c343df832e	use multiple threads for memcpy (#669 )	2017-06-14 19:14:24 -07:00
Philipp Moritz	54925996ca	Allow remote functions to specify max executions and kill worker once limit is reached. (#660 ) * implement restarting workers after certain number of task executions * Clean up python code. * Don't start new worker when an actor disconnects. * Move wait_for_pid_to_exit to test_utils.py. * Add test. * Fix linting errors. * Fix linting. * Fix typo.	2017-06-13 00:34:58 -07:00
Robert Nishihara	1916475e14	Increase socket listen backlog from 5 to 128. (#661 )	2017-06-11 06:34:16 +00:00
Eric Liang	d4d2c03ac5	Remove timeout for Redis commands. (#649 ) * update * Remove interaction between callback data identifier and event loop. * Remove tests that no longer apply.	2017-06-09 15:55:36 -07:00
Philipp Moritz	0254efa5e8	Use parallel memcopy from arrow (#633 ) * use parallel memcopy from arrow * fix linting * remove memory.h	2017-06-02 18:18:41 -07:00
Robert Nishihara	a4d8e13094	Suppress excess warning messages related to intentional actor deaths. (#627 ) * Don't submit the actor destructor tasks when the job is exiting. * Don't propagate error messages to the driver when an actor exits intentionally.	2017-06-01 20:10:40 +00:00
Robert Nishihara	dd7f866a92	Fix compilation error on CentOS. (#622 ) * Fix compilation error on CentOS. * add TODO	2017-06-01 06:51:00 +00:00
Robert Nishihara	5f193afb87	Tell local scheduler to ignore SIGCHLD so that workers don't become zombies. (#620 )	2017-06-01 06:37:28 +00:00
Robert Nishihara	4d51ed37b2	Fix bug in which plasma client file descriptors were not closed. (#618 ) * Fix bug in which plasma client file descriptors were not closed. * Add logging statement when disconnecting client from plasma store. * Fix after rebasing. * Add more checks to plasma disconnect client.	2017-06-01 05:37:29 +00:00
Philipp Moritz	b94b4a35e0	Make the Plasma store ready for Arrow integration (#579 ) * port plasma to arrow * fixes * refactor plasma client * more modernization * fix plasma manager tests * everything compiles * fix plasma client tests * update plasma serialization tests * fix plasma manager tests * fix bug * updates * fix bug * fix tests * fix rebase * address comments * fix travis valgrind build * fix linting * fix include order again * fix linting * address comments	2017-05-31 16:24:23 -07:00
Richard Shin	16050eca8d	Don't link Python extensions to libpython*.so (#598 )	2017-05-25 19:01:12 -07:00
Philipp Moritz	3885d1b286	make builds with CMake incremental (#592 )	2017-05-24 21:52:33 -07:00
Stephanie Wang	ee08c8274b	Shard Redis. (#539 ) * Implement sharding in the Ray core * Single node Python modifications to do sharding * Do the sharding in redis.cc * Pipe num_redis_shards through start_ray.py and worker.py. * Use multiple redis shards in multinode tests. * first steps for sharding ray.global_state * Fix problem in multinode docker test. * fix runtest.py * fix some tests * fix redis shard startup * fix redis sharding * fix * fix bug introduced by the map-iterator being consumed * fix sharding bug * shard event table * update number of Redis clients to be 64K * Fix object table tests by flushing shards in between unit tests * Fix local scheduler tests * Documentation * Register shard locations in the primary shard * Add plasma unit tests back to build * lint * lint and fix build * Fix * Address Robert's comments * Refactor start_ray_processes to start Redis shard * lint * Fix global scheduler python tests * Fix redis module test * Fix plasma test * Fix component failure test * Fix local scheduler test * Fix runtest.py * Fix global scheduler test for python3 * Fix task_table_test_and_update bug, from actor task table submission race * Fix jenkins tests. * Retry Redis shard connections * Fix test cases * Convert database clients to DBClient struct * Fix race condition when subscribing to db client table * Remove unused lines, add APITest for sharded Ray * Fix * Fix memory leak * Suppress ReconstructionTests output * Suppress output for APITestSharded * Reissue task table add/update commands if initial command does not publish to any subscribers. * fix * Fix linting. * fix tests * fix linting * fix python test * fix linting	2017-05-18 17:40:41 -07:00
Robert Nishihara	9018dffd7f	Fix bug in actor task dispatch. (#552 ) * Fix bug in actor task dispatch. * Return early from dispatch_actor_task if creation notification has not arrived. Also fix comment.	2017-05-15 23:47:15 -07:00
Philipp Moritz	08e988aee5	Modernize plasma store (C to C++ changes). (#546 )	2017-05-15 01:19:44 -07:00
Eric Liang	e2e9e4ce6f	Fix segmentation fault when calling ray.put on a dictionary with object keys (#548 ) * fix segfault when serializing dict key * fix style * fix test * Fix linting.	2017-05-15 01:09:13 -07:00
Philipp Moritz	3a6922276a	convert malloc.c to STL (#537 ) * convert malloc.c to STL * linting * cleanup and comments * address Richard's comments	2017-05-11 11:18:23 -07:00
Philipp Moritz	c1e9496a06	fix problem if old version of arrow is cloned (#538 )	2017-05-10 12:16:07 -07:00
Philipp Moritz	3a0e86395e	Convert eviction code to STL (#534 ) * temp commit * convert eviction policy to C++ * temp commit * fix plasma tests * fix * linting * fixes * fix linting	2017-05-09 21:26:22 -07:00
Philipp Moritz	118fac5619	Remove boost dependencies from Ray (#518 ) * remove boost regex * workaround for boost * fix * do not link against boost any more * rebased on arrow change	2017-05-09 16:17:20 -07:00
Philipp Moritz	e5e2aab5e4	upgrade arrow and fix bug (#530 ) * upgrade arrow and fix bug * fixes suggested by Wes	2017-05-09 13:58:42 -07:00
Philipp Moritz	0681107039	add serializing numpy boolean (#529 )	2017-05-08 22:24:02 -07:00
Robert Nishihara	c688a64235	Expose GPU IDs to remote functions. (#496 ) * Change local scheduler bookkeeping to use GPU IDs. * Update actor test. * Add tests for actors and tasks simultaneously using GPUs. * Add additional task GPU ID test. * Fix linting. * Make redis GPU assignment ignore GPU IDs. * Small fix.	2017-05-07 13:03:49 -07:00
Philipp Moritz	1dddd5336a	Fix actor bug arising from overwriting task specifications in the local scheduler (#513 ) * copy task specifications put into the actor task cache so it won't get overwritten when the scheduler receives the next task * cleanup * cleanup and fix * linting * fix jenkins test * fix linting	2017-05-06 17:39:35 -07:00
Stephanie Wang	e50a23b820	Fix bug with reused file descriptors (#471 ) * Fix bug with reused file descriptors * Remove client connection if write_object_chunk fails * Handle ECONNRESET on unsuccessful write * lint * Back to lowercase * fix compilation * fix linting	2017-05-02 19:45:27 -07:00
Robert Nishihara	2bbfc5da8d	Dispatch actor tasks when actor connects. (#495 )	2017-04-28 17:36:43 -07:00
Robert Nishihara	6d301d9079	Simplify resource bookkeeping in local scheduler. (#494 ) * Simplify resource bookkeeping in local scheduler. * Change ints to doubles.	2017-04-28 12:09:47 -07:00
Robert Nishihara	eea19371b7	Suppress warning about worker dying when driver exits. (#492 )	2017-04-26 23:52:13 -07:00
Robert Nishihara	1627f89945	Fix problem in which actors and workers running tasks are not killed by driver exit. (#490 ) * Augment test to verify that relevant workers and actors are killed during driver cleanup. * Fix bug in which we were only killing one worker when a driver exited. * Fix remove driver test. * Fix and augment test.	2017-04-26 15:13:39 -07:00
Philipp Moritz	b7ace01b5f	Convert Plasma client to STL (#486 ) * convert mmap table to STL * update * fix * convert objects_in_use * fix * convert release_history * cleanup * linting * update * fix * linting	2017-04-25 01:25:40 -07:00
Robert Nishihara	0ac125e9b2	Clean up when a driver disconnects. (#462 ) * Clean up state when drivers exit. * Remove unnecessary field in ActorMapEntry struct. * Have monitor release GPU resources in Redis when driver exits. * Enable multiple drivers in multi-node tests and test driver cleanup. * Make redis GPU allocation a redis transaction and small cleanups. * Fix multi-node test. * Small cleanups. * Make global scheduler take node_ip_address so it appears in the right place in the client table. * Cleanups. * Fix linting and cleanups in local scheduler. * Fix removed_driver_test. * Fix bug related to vector -> list. * Fix linting. * Cleanup. * Fix multi node tests. * Fix jenkins tests. * Add another multi node test with many drivers. * Fix linting. * Make the actor creation notification a flatbuffer message. * Revert "Make the actor creation notification a flatbuffer message." This reverts commit af99099c8084dbf9177fb4e34c0c9b1a12c78f39. * Add comment explaining flatbuffer problems.	2017-04-24 18:10:21 -07:00
Philipp Moritz	8194b71f32	Convert pending_notifications to STL (#484 ) * temp commit * converted more plasma notifications * cleanup * rename * linting * fixes * fixes	2017-04-24 14:41:34 -07:00
Philipp Moritz	892e53d69e	Convert plasma client array and object notification queue to STL (#482 ) * Convert plasma clients to STL * use a deque for object notifications in plasma store for perf * cleanup * linting * fix include order	2017-04-24 00:43:48 -07:00
Philipp Moritz	e36de2dad1	Convert object table to STL (#480 ) * convert object table to stl * temp commit * fix * comments * linting	2017-04-23 22:24:05 -07:00
Alexey Tumanov	a67a107e0e	Fix int-type compilation problem on redhat. (#472 )	2017-04-19 02:43:33 -07:00

361 commits