hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-11 13:46:40 -04:00

Author	SHA1	Message	Date
Philipp Moritz	615d5516d1	Compile valgrind tests with Bazel (#4144 )	2019-02-24 00:00:49 -08:00
Philipp Moritz	ba52caff37	Make Bazel the default build system (#3898 )	2019-02-23 11:58:59 -08:00
Philipp Moritz	9b3ce3e64b	Revert inline objects PR (#4125 ) * Revert "Inline objects (#3756)" This reverts commit `f987572795`. * fix rebase problems * more rebase fixes * add back debug statement	2019-02-22 18:21:01 -08:00
Tianming Xu	692bb336a1	Fix master branch compilation error and lint error (#4109 )	2019-02-21 11:54:30 -08:00
Yuhong Guo	3549cd8195	Add the Delete function in GCS (#4081 ) * Add the Delete function in GCS * Unify BatchDelete and Delete * Fix comment * Lint * Refine according to comments * Unify test. * Address comment * C++ lint * Update ray_redis_module.cc	2019-02-21 13:33:37 +08:00
Hao Chen	de17443dc2	Propagate backend error to worker (#4039 )	2019-02-16 11:39:15 +08:00
Stephanie Wang	3684e5bc0d	Fix memory leak in Redis by using auto memory management (#4054 ) * Table appends should always succeed * Use Redis auto memory management * Remove unneeded namespace	2019-02-14 19:51:18 -08:00
Philipp Moritz	810cc17062	Fix LRU eviction of client notification datastructure (#4021 ) * convert notification_key map to C++ datastructure * fix crash and add debug string * clean notification map up (this was a bug before) * remove checks * add jenkins test * linting * fixes * properly erase * clean up * linting * Update test_wait_hanging.py * Update run_multi_node_tests.sh * increase redis_max_memory * fix dat jenkins * update * Update run_multi_node_tests.sh	2019-02-13 22:20:27 -08:00
Stephanie Wang	fd5b58a827	Increase timeout for object manager valgrind tests (#4027 ) * Avoid second copy of data for inlined objects * Increase Wait timeout for valgrind tests * Run object manager tests with and without inlined objects * Fix test	2019-02-13 18:29:03 -08:00
Stephanie Wang	4347ab644e	Use Redis lists in the GCS instead of zset (#4023 ) * Convert zset to list * Remove object evictions map from the object directory, yay * comments * Fix tests	2019-02-13 10:32:57 -08:00
Hao Chen	f31a79f3f7	Implement actor checkpointing (#3839 ) * Implement Actor checkpointing * docs * fix * fix * fix * move restore-from-checkpoint to HandleActorStateTransition * Revert "move restore-from-checkpoint to HandleActorStateTransition" This reverts commit 9aa4447c1e3e321f42a1d895d72f17098b72de12. * resubmit waiting tasks when actor frontier restored * add doc about num_actor_checkpoints_to_keep=1 * add num_actor_checkpoints_to_keep to Cython * add checkpoint_expired api * check if actor class is abstract * change checkpoint_ids to long string * implement java * Refactor to delay actor creation publish until checkpoint is resumed * debug, lint * Erase from checkpoints to restore if task fails * fix lint * update comments * avoid duplicated actor notification log * fix unintended change * add actor_id to checkpoint_expired * small java updates * make checkpoint info per actor * lint * Remove logging * Remove old actor checkpointing Python code, move new checkpointing code to FunctionActionManager * Replace old actor checkpointing tests * Fix test and lint * address comments * consolidate kill_actor * Remove __ray_checkpoint__ * fix non-ascii char * Loosen test checks * fix java * fix sphinx-build	2019-02-13 19:39:02 +08:00
Zhijun Fu	7097ba393b	protect raylet against bad messages (#4003 ) * protect raylet against bad messages * address comments * linting and regression test	2019-02-12 00:39:38 +08:00
Yuhong Guo	5fb1efd60d	Fix CI test failures (#4007 )	2019-02-11 11:01:14 +08:00
Yuhong Guo	3a66d47a3a	Remove RAY_CHECK from JNI code (#3978 ) * Remove RAY_CHECK in JNI * Try to add mvn test to test the exception. * Refine * Address comments	2019-02-09 18:10:22 +08:00
Robert Nishihara	ef527f84ab	Stream logs to driver by default. (#3892 ) * Stream logs to driver by default. * Fix from rebase * Redirect raylet output independently of worker output. * Fix. * Create redis client with services.create_redis_client. * Suppress Redis connection error at exit. * Remove thread_safe_client from redis. * Shutdown driver threads in ray.shutdown(). * Add warning for too many log messages. * Only stop threads if worker is connected. * Only stop threads if they exist. * Remove unnecessary try/excepts. * Fix * Only add new logging handler once. * Increase timeout. * Fix tempfile test. * Fix logging in cluster_utils. * Revert "Increase timeout." This reverts commit b3846b89040bcd8e583b2e18cb513cb040e71d95. * Retry longer when connecting to plasma store from node manager and object manager. * Close pubsub channels to avoid leaking file descriptors. * Limit log monitor open files to 200. * Increase plasma connect retries. * Add comment.	2019-02-07 19:53:50 -08:00
Ion	f987572795	Inline objects (#3756 ) * added store_client_ to object_manager and node_manager * half through... * all code in, and compiling! Nothing tested though... * something is working ;-) * added a few more comments * now, add only one entry to the in GCS for inlined objects * more comments * remove a spurious todo * some comment updates * add test * added support for meta data for inline objects * avoid some copies * Initialize plasma client in tests * Better comments. Enable configuring nline_object_max_size_bytes. * Update src/ray/object_manager/object_manager.cc Co-Authored-By: istoica <istoica@cs.berkeley.edu> * Update src/ray/raylet/node_manager.cc Co-Authored-By: istoica <istoica@cs.berkeley.edu> * Update src/ray/raylet/node_manager.cc Co-Authored-By: istoica <istoica@cs.berkeley.edu> * fiexed comments * fixed various typos in comments * updated comments in object_manager.h and object_manager.cc * addressed all comments...hopefully ;-) * Only add eviction entries for objects that are not inlined * fixed a bunch of comments * Fix test * Fix object transfer dump test * lint * Comments * Fix test? * Fix test? * lint * fix build * Fix build * lint * Use const ref * Fixes, don't let object manager hang * Increase object transfer retry time for travis? * Fix test * Fix test? * Add internal config to java, fix PlasmaFreeTest	2019-02-07 10:32:39 -08:00
Stephanie Wang	49e9bec988	Fix raylet bug in driver cleanup (#3962 ) * Fix task dependency manager cleanup on driver exit * Add regression test * Better check, update header	2019-02-06 11:19:10 -08:00
Stephanie Wang	244fd473f4	Only mark tasks as forwarded if they are in the lineage cache (#3958 )	2019-02-05 23:01:38 -08:00
Eric Liang	5fb813ff39	Don't check fail on missing lineage cache entry (#3861 )	2019-02-04 17:45:41 -08:00
Kai Yang	02766adeca	Limit maximum starting workers per language (#3852 )	2019-01-29 21:43:12 -08:00
Yuhong Guo	c45b91dcca	Make redis module safe without crashing by removing RAY_CHECK (#3855 )	2019-01-29 21:06:31 -08:00
Philipp Moritz	0aadf11c10	Fix compilation on macOS by adding virtual destructors (#3878 )	2019-01-28 13:22:52 -08:00
Stephanie Wang	eddd60e14e	Improve backend debug logging, refactor scheduling queues (#3819 )	2019-01-26 16:15:48 +08:00
Philipp Moritz	20162ce159	Compile raylet cython bindings with bazel (#3842 )	2019-01-25 00:57:31 -08:00
Si-Yuan	48139cf861	Migrate Python C extension to Cython (#3541 )	2019-01-24 09:17:14 -08:00
Yuhong Guo	c1a52b1c86	Remove duplicated code in RayConfig (#3831 )	2019-01-24 17:04:10 +08:00
Hao Chen	bfcf254e52	Fix: do not treat actor task as failed if the actor will be reconstructed (#3736 )	2019-01-23 23:28:44 -08:00
Robert Nishihara	0b1608a546	Factor out code for starting new processes and test plasma store in valgrind. (#3824 ) * Factor out starting Ray processes. * Detect flags through environment variables. * Return ProcessInfo from start_ray_process. * Print valgrind errors at exit. * Test valgrind in travis. * Some valgrind fixes. * Undo raylet monitor change. * Only test plasma store in valgrind.	2019-01-22 14:59:11 -08:00
Philipp Moritz	931e6a2fc3	Fix compilation error on ARM. (#3800 )	2019-01-18 00:25:16 -08:00
Si-Yuan	16a3b99d8d	Get rid of Arrow test utils (#3734 ) * convert code to proper C++ * revert changes to "id.h" because #3765 has been merged. * revert changes to Python bindings because they will be removed in #3541 * remove dependencies of Arrow logging * revert changes to Arrow logging * lint	2019-01-17 18:35:41 -08:00
Hao Chen	d1840bc7a9	Simplify RayConfig (#3714 )	2019-01-16 16:43:26 -08:00
Tianming Xu	0b8008f41c	remove RAY_CHECK around wait_state.remaining.erase (#3745 )	2019-01-14 10:32:31 -08:00
Philipp Moritz	02bdaf221d	Update arrow to include https://github.com/apache/arrow/pull/3392 (#3765 ) * update arrow to include https://github.com/apache/arrow/pull/3392 * add appropriate includes * update	2019-01-14 19:20:26 +08:00
Wang Qing	8674606e26	Support to auto-generate Java files from flatbuffer (#3749 ) * auto gen flatbuffers for Java * Add auto_gen_tool.py * Refine * Add a comment * address comments. * Address comments. * Addressed * Refine * Address comments * Fix typo * Add exception * Address comments. * Refine * Fix lint * Fix * Fix lint and address comment. * Fix lint error	2019-01-13 11:39:23 -08:00
Yuhong Guo	d2cf8561f2	Refactor code about ray.ObjectID. (#3674 ) * Refactor code about ray.ObjectID. * remove from_random and use nil_id instead of constructor * remove id() in hash * Lint and fix * Change driver id to ObjectID * Replace binary_to_hex(ObjectID.id()) to ObjectID.hex()	2019-01-13 01:47:29 -08:00
Wang Qing	fa2bfa6d76	Fix some small code quality issues. (#3719 )	2019-01-11 15:24:49 +08:00
Hao Chen	6fc3fc4120	Cap task lease timeout (#3707 )	2019-01-09 17:19:48 -08:00
Stephanie Wang	04f31db54d	Actor dummy object garbage collection (#3593 ) * Convert UniqueID::nil() to a constructor * Cleanup actor handle pickling code * Add new actor handles to the task spec * Pass in new actor handles * Add new handles to the actor registration * Regression test for actor handle forking and GC * lint and doc * Handle pickled actor handles in the backend and some refactoring * Add regression test for dummy object GC and pickled actor handles * Check for duplicate actor tasks on submission * Regression test for forking twice, fix failed named actor leak * Fix bug for forking twice * lint * Revert "Fix bug for forking twice" This reverts commit 3da85e59d401e53606c2e37ffbebcc8653ff27ac. * Add new actor handles when task is assigned, not finished * Remove comment * remove UniqueID() * Updates * update * fix * fix java * fixes * fix	2019-01-09 10:37:11 -08:00
Wenting Shen	3027dde303	Fix some storage problems of RayLog (#3595 ) 1. Fix the problem of duplicated stored logs. 2. Save log whose level is higher than severity_threshold, not only with severity_threshold. 3. Fix a `log_dir` bug: storing logs in a wrong path.	2019-01-09 13:54:21 +08:00
Robert Nishihara	067976ad3d	Push a warning to all users when large number of workers have been started. (#3645 ) * Push a warning to all users when large number of workers have been started. * Add test. * Fix bug. * Give warning when worker starts instead of when worker registers. * Fix * Fix tests	2019-01-05 13:27:32 -08:00
Robert Nishihara	b6bcd18d65	Split profile table among many keys in the GCS. (#3676 ) * Divide profile table among many keys in GCS. * Fix, and remove --collect-profiling-data arg. * Remove reference in doc.	2019-01-02 21:33:01 -08:00
Yuhong Guo	93e9d2b82c	Improve backend log: env variable setting and format refine. (#3662 ) * Improve backend logging * Address comment * Fix Raul's comment	2019-01-01 21:45:29 -08:00
Zhijun Fu	382b138fc7	fix code issues in object manager that are reported by scanning tool (#3649 ) Fix some code issues found by code scanning tool: 1. Macro compares unsigned to 0(NO_EFFECT) CWE570: An unsigned value can never be less than 0 This greater-than-or-equal-to-zero comparison of an unsigned value is always true. "this->create_buffer_state_[object_id].num_seals_remaining >= 0UL". ~/ray/src/ray/object_manager/object_buffer_pool.cc: ray::ObjectBufferPool::SealChunk(const ray::UniqueID &, unsigned long) 2. Inferred misuse of enum(MIXED_ENUMS) CWE398: An integer expression which was inferred to have an enum type is mixed with a different enum type This case, "static_cast(ray::object_manager::protocol::MessageType::PushRequest)", implies the effective type of "message_type" is "ray::object_manager::protocol::MessageType". ~/ray/src/ray/object_manager/object_manager.cc: ray::ObjectManager::ProcessClientMessage(std::shared_ptr> &, long, const unsigned char *)	2018-12-28 14:38:59 -08:00
Zhijun Fu	3df1e1c471	Add missing lock in FreeObjects of object buffer pool (#3647 ) Object manager uses multi-threading for transferring objects between different nodes, the plasma client used in object_buffer_pool_ needs to be protected by lock. We have met crashes caused by missing lock in FreeObjects() interface, this PR fixes that issue.	2018-12-28 11:47:31 -08:00
Hao Chen	0b682d043e	Fix memory leak in PyRayletCient (#3640 ) 1) if using `PyObject_GetIter`, the caller must call `Py_DECREF` to avoid memory leak. But with `PyList_GetItem`, `Py_DECREF` isn't needed. 2) the `Py_BuildValue` call in `wait` doesn't need to increment ref count.	2018-12-27 17:39:02 -08:00
Hao Chen	f4011754d6	Fix: ServerConnection should be closed before being removed (#3626 ) Otherwise, in the event of a remote raylet crashing, the connection might be held by boost asio forever, and the pending callbacks will never get invoked. See also #3586.	2018-12-25 11:01:53 -08:00
Robert Nishihara	ddd4c842f1	Initialize some variables in constructor instead of header file. (#3617 ) * Initialize some variables in constructor instead of header file	2018-12-23 02:44:23 -08:00
Alexey Tumanov	bada42c334	object store notification mgr: fix using uninitialized variables (#3592 ) Initialize private class variables to avoid valgrind errors. They are used before initialization.	2018-12-22 19:51:22 -08:00
Philipp Moritz	e578a38116	Fix TensorFlow and PyTorch compatibility (#3574 ) * remove tensorflow workaround * update docker * add boost threads * add date_time, too * change link order * cosmetics	2018-12-22 13:25:48 -08:00
Alexey Tumanov	6b179cb8a7	change the order of allocation for io_service and gcs client in raylet main (#3597 )	2018-12-21 00:13:28 -08:00

651 commits