* Refactor code about ray.ObjectID.
* remove from_random and use nil_id instead of constructor
* remove id() in hash
* Lint and fix
* Change driver id to ObjectID
* Replace binary_to_hex(ObjectID.id()) with ObjectID.hex()
* Convert UniqueID::nil() to a constructor
* Cleanup actor handle pickling code
* Add new actor handles to the task spec
* Pass in new actor handles
* Add new handles to the actor registration
* Regression test for actor handle forking and GC
* lint and doc
* Handle pickled actor handles in the backend and some refactoring
* Add regression test for dummy object GC and pickled actor handles
* Check for duplicate actor tasks on submission
* Regression test for forking twice, fix a leak of failed named actors
* Fix bug for forking twice
* lint
* Revert "Fix bug for forking twice"
This reverts commit 3da85e59d401e53606c2e37ffbebcc8653ff27ac.
* Add new actor handles when task is assigned, not finished
* Remove comment
* remove UniqueID()
* Updates
* update
* fix
* fix java
* fixes
* fix
1. Fix the problem of logs being stored twice.
2. Save logs whose level is at or above severity_threshold, not only those exactly at severity_threshold (see the sketch below).
3. Fix a `log_dir` bug: logs were being stored in the wrong path.
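A minimal sketch of the filtering behavior intended in item 2, using a standalone logger with illustrative names (not Ray's actual logging classes): a record is kept whenever its level is at or above `severity_threshold`, rather than only when it matches exactly.

```cpp
#include <iostream>
#include <string>

// Illustrative severity levels and logger; not Ray's actual logging classes.
enum class LogLevel { DEBUG = 0, INFO = 1, WARNING = 2, ERROR = 3, FATAL = 4 };

class SimpleLogger {
 public:
  explicit SimpleLogger(LogLevel severity_threshold)
      : severity_threshold_(severity_threshold) {}

  void Log(LogLevel level, const std::string &message) {
    // Keep every record at or above the threshold, not only records whose
    // level is exactly equal to severity_threshold.
    if (static_cast<int>(level) >= static_cast<int>(severity_threshold_)) {
      std::cout << message << std::endl;
    }
  }

 private:
  LogLevel severity_threshold_;
};

int main() {
  SimpleLogger logger(LogLevel::WARNING);
  logger.Log(LogLevel::INFO, "dropped: below threshold");
  logger.Log(LogLevel::WARNING, "kept: equal to threshold");
  logger.Log(LogLevel::ERROR, "kept: above threshold");
  return 0;
}
```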
* Push a warning to all users when a large number of workers has been started (see the sketch below).
* Add test.
* Fix bug.
* Give warning when worker starts instead of when worker registers.
* Fix
* Fix tests
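The commits above push a warning when a large number of workers has been started. A minimal sketch of that check, with hypothetical names and an illustrative threshold (the real check lives in the raylet and has its own bookkeeping and limits): the warning fires once, at worker start time rather than at registration time.

```cpp
#include <cstdint>
#include <iostream>

// Hypothetical bookkeeping; the real check lives in the raylet's worker pool.
struct WorkerPoolState {
  int64_t num_workers_started = 0;
  bool warning_pushed = false;
};

// Illustrative threshold; the real limit would be configurable.
constexpr int64_t kTooManyWorkersThreshold = 1024;

// Called when a worker process is started (not when it registers).
void OnWorkerStarted(WorkerPoolState &state) {
  state.num_workers_started += 1;
  if (!state.warning_pushed &&
      state.num_workers_started > kTooManyWorkersThreshold) {
    state.warning_pushed = true;  // push the warning to users only once
    std::cerr << "WARNING: " << state.num_workers_started
              << " workers have been started; this may mean tasks or actors "
                 "are creating more workers than expected." << std::endl;
  }
}

int main() {
  WorkerPoolState state;
  for (int i = 0; i < 2000; ++i) {
    OnWorkerStarted(state);  // the warning fires once, past the threshold
  }
  return 0;
}
```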
Fix some code issues found by a code scanning tool:
**1. Macro compares unsigned to 0 (NO_EFFECT)**
CWE570: An unsigned value can never be less than 0.
This greater-than-or-equal-to-zero comparison of an unsigned value is always true: `this->create_buffer_state_[object_id].num_seals_remaining >= 0UL`.
~/ray/src/ray/object_manager/object_buffer_pool.cc: ray::ObjectBufferPool::SealChunk(const ray::UniqueID &, unsigned long)
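For reference, a self-contained illustration of the CWE570 pattern and one way to make the check meaningful. This is an assumption about the intent, not the actual fix in `object_buffer_pool.cc`: it assumes the goal was to guard the counter against underflow before decrementing it.

```cpp
#include <cassert>
#include <cstdint>

int main() {
  uint64_t num_seals_remaining = 1;

  // NO_EFFECT / CWE570: an unsigned value can never be negative, so this
  // comparison is always true and the check does nothing.
  assert(num_seals_remaining >= 0UL);

  // A meaningful check guards against underflow *before* decrementing; if the
  // counter were already 0, "num_seals_remaining - 1" would silently wrap
  // around to a huge value rather than becoming negative. (Ray's code would
  // use its own check macro, e.g. RAY_CHECK, rather than assert.)
  assert(num_seals_remaining > 0UL);
  num_seals_remaining -= 1;

  return 0;
}
```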
**2. Inferred misuse of enum (MIXED_ENUMS)**
CWE398: An integer expression which was inferred to have an enum type is mixed with a different enum type
This case, "static_cast(ray::object_manager::protocol::MessageType::PushRequest)", implies the effective type of "message_type" is "ray::object_manager::protocol::MessageType".
~/ray/src/ray/object_manager/object_manager.cc: ray::ObjectManager::ProcessClientMessage(std::shared_ptr> &, long, const unsigned char *)
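A minimal illustration of the MIXED_ENUMS pattern with stand-in enums (not the real protocol types): converting the raw integer to the intended enum type once, and comparing only within that type, keeps the two enums from being mixed.

```cpp
#include <cstdint>
#include <iostream>

// Stand-in enums; the real types are the protocol message types used by the
// node manager and the object manager.
enum class NodeMessageType : int64_t { ConnectClient = 0, Disconnect = 1 };
enum class ObjectManagerMessageType : int64_t { PushRequest = 0, PullRequest = 1 };

void ProcessClientMessage(int64_t message_type) {
  // MIXED_ENUMS / CWE398 is reported when an integer inferred to carry one
  // enum type is compared against values of another enum. Converting the raw
  // integer to the intended enum type once, and switching only on that type,
  // keeps the two enums from being mixed.
  const auto type = static_cast<ObjectManagerMessageType>(message_type);
  switch (type) {
    case ObjectManagerMessageType::PushRequest:
      std::cout << "push request" << std::endl;
      break;
    case ObjectManagerMessageType::PullRequest:
      std::cout << "pull request" << std::endl;
      break;
    default:
      std::cout << "unknown message type" << std::endl;
      break;
  }
}

int main() {
  ProcessClientMessage(static_cast<int64_t>(ObjectManagerMessageType::PushRequest));
  return 0;
}
```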
The object manager uses multiple threads to transfer objects between different nodes, so the plasma client used in object_buffer_pool_ needs to be protected by a lock. We have seen crashes caused by a missing lock in the FreeObjects() interface; this PR fixes that issue.
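A minimal sketch of the locking described above, with an illustrative client class rather than the real plasma API: every path that touches the shared client, including the free path, takes the same mutex.

```cpp
#include <mutex>
#include <string>
#include <vector>

// Illustrative stand-in for the store client; not the real plasma API.
class FakeStoreClient {
 public:
  void Seal(const std::string & /*object_id*/) {}
  void Free(const std::vector<std::string> & /*object_ids*/) {}
};

// The object manager transfers objects on several threads, so every call into
// the shared client goes through the same mutex, including FreeObjects().
class BufferPool {
 public:
  void SealObject(const std::string &object_id) {
    std::lock_guard<std::mutex> lock(pool_mutex_);
    client_.Seal(object_id);
  }

  void FreeObjects(const std::vector<std::string> &object_ids) {
    // This is the path that previously ran without the lock, allowing
    // concurrent calls into the client and causing the crashes noted above.
    std::lock_guard<std::mutex> lock(pool_mutex_);
    client_.Free(object_ids);
  }

 private:
  std::mutex pool_mutex_;
  FakeStoreClient client_;
};

int main() {
  BufferPool pool;
  pool.SealObject("object-1");
  pool.FreeObjects({"object-1"});
  return 0;
}
```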
1) When using `PyObject_GetIter`, the caller must call `Py_DECREF` to avoid a memory leak. With `PyList_GetItem`, which returns a borrowed reference, `Py_DECREF` isn't needed.
2) The `Py_BuildValue` call in `wait` doesn't need to increment the ref count.
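A small self-contained illustration of these two reference-counting rules from the CPython C API; the function and variable names here are illustrative, not Ray's actual code.

```cpp
#include <Python.h>
#include <stdio.h>

// Sums a Python list of ints, illustrating the two reference-counting rules.
static long SumList(PyObject *list) {
  long total = 0;

  // PyObject_GetIter returns a *new* reference: the caller owns it and must
  // Py_DECREF it, otherwise the iterator object leaks.
  PyObject *iter = PyObject_GetIter(list);
  if (iter == NULL) {
    return -1;
  }
  PyObject *item;
  while ((item = PyIter_Next(iter)) != NULL) {
    total += PyLong_AsLong(item);
    Py_DECREF(item);  // PyIter_Next also returns a new reference.
  }
  Py_DECREF(iter);

  // PyList_GetItem returns a *borrowed* reference: the list still owns the
  // item, so calling Py_DECREF on it here would be a bug.
  Py_ssize_t n = PyList_Size(list);
  for (Py_ssize_t i = 0; i < n; ++i) {
    PyObject *borrowed = PyList_GetItem(list, i);
    (void)borrowed;  // use it, but do not Py_DECREF it
  }
  return total;
}

int main() {
  Py_Initialize();
  // Py_BuildValue returns a new reference to the freshly built list, so the
  // caller releases it once with Py_DECREF and does not increment it further.
  PyObject *list = Py_BuildValue("[i,i,i]", 1, 2, 3);
  printf("sum = %ld\n", SumList(list));
  Py_DECREF(list);
  Py_FinalizeEx();
  return 0;
}
```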
Otherwise, in the event of a remote raylet crashing, the connection might be held by boost asio forever, and the pending callbacks will never get invoked. See also #3586.
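A minimal sketch of the shape of such a fix, assuming an asio-based connection class with illustrative names (not the raylet's real connection code): on any read error the socket is closed in the handler, so the connection is released and other pending handlers complete with `operation_aborted` rather than hanging forever.

```cpp
#include <boost/asio.hpp>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <memory>

using boost::asio::ip::tcp;

// Illustrative connection wrapper; the raylet's real connection code differs.
class Connection : public std::enable_shared_from_this<Connection> {
 public:
  explicit Connection(boost::asio::io_service &io_service) : socket_(io_service) {}

  void ReadHeader() {
    auto self = shared_from_this();
    boost::asio::async_read(
        socket_, boost::asio::buffer(&header_, sizeof(header_)),
        [this, self](const boost::system::error_code &ec, std::size_t /*bytes*/) {
          if (ec) {
            // The remote side died or the read failed: close the socket so
            // boost::asio releases the connection and any other pending
            // handlers complete with operation_aborted instead of hanging.
            std::cerr << "read failed: " << ec.message() << std::endl;
            boost::system::error_code ignored;
            socket_.close(ignored);
            return;
          }
          // ... process the message, then issue the next read ...
        });
  }

 private:
  tcp::socket socket_;
  int64_t header_ = 0;
};

int main() {
  boost::asio::io_service io_service;
  auto connection = std::make_shared<Connection>(io_service);
  connection->ReadHeader();  // on an unconnected socket this errors and closes it
  io_service.run();
  return 0;
}
```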
* Add a flag for whether an object has been created before
* Add regression test
* doc
* Share object directory between object and node managers
* Treat evicted actor tasks as failed
* minor
* Check return value
* Fix bug where object locations weren't getting updated on client death
* Fix mac build
* Use RayTaskError
ray.wait depends on callbacks from the GCS to decide when an object has appeared in the cluster. The raylet crashes if a callback is received for a wait request that has already completed, but this can actually happen, depending on the order of calls (a sketch of the fix follows the list below). More precisely:
1. Objects A and B are put in the cluster.
2. Client calls ray.wait([A, B], num_returns=1).
3. Client subscribes to locations for A and B. Locations are cached for both, so callbacks are posted for each.
4. Callback for A fires. The wait completes and the request is removed.
5. Callback for B fires. The wait request no longer exists and raylet crashes.
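A minimal sketch of the fix described here, with hypothetical names rather than the raylet's real data structures: the location callback looks up the wait request by ID and returns early if it has already completed and been erased, instead of assuming it still exists.

```cpp
#include <cstdint>
#include <iostream>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical bookkeeping for one ray.wait call; not the raylet's real types.
struct WaitRequest {
  int64_t num_required;
  std::unordered_set<std::string> found;
};

class WaitManager {
 public:
  int64_t StartWait(int64_t num_required) {
    const int64_t wait_id = next_id_++;
    wait_requests_[wait_id] = WaitRequest{num_required, {}};
    return wait_id;
  }

  // Location callback from the GCS. Because callbacks for cached locations can
  // arrive after the wait has already been satisfied and erased, the callback
  // must tolerate a missing request instead of crash-checking its existence.
  void OnObjectLocated(int64_t wait_id, const std::string &object_id) {
    auto it = wait_requests_.find(wait_id);
    if (it == wait_requests_.end()) {
      // The wait already completed (object A satisfied it before the cached
      // callback for object B fired). This is expected; ignore the callback.
      return;
    }
    it->second.found.insert(object_id);
    if (static_cast<int64_t>(it->second.found.size()) >= it->second.num_required) {
      std::cout << "wait " << wait_id << " complete" << std::endl;
      wait_requests_.erase(it);
    }
  }

 private:
  int64_t next_id_ = 0;
  std::unordered_map<int64_t, WaitRequest> wait_requests_;
};

int main() {
  WaitManager manager;
  const int64_t id = manager.StartWait(/*num_required=*/1);
  manager.OnObjectLocated(id, "A");  // completes the wait, erases the request
  manager.OnObjectLocated(id, "B");  // late callback: safely ignored, no crash
  return 0;
}
```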
* Broadcast actor death, clean up dummy objects
* Reduce logging and clean up state when failing a task
* lint
* Make actor failure test nicer, reduce node timeout
* Suppress duplicate pre-emptive object pushes.
* Add test.
* Fix linting
* Remove timer and inline recent_pushes_ into local_objects_.
* Improve test.
* Fix
* Fix linting
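The commits above suppress duplicate pre-emptive pushes and fold `recent_pushes_` into `local_objects_`. A minimal sketch of that idea, with illustrative names: each local object's record remembers which clients it was recently pushed to, and a second push to the same client is skipped.

```cpp
#include <iostream>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Per-object record for every object in the local store. The set of clients
// the object was recently pushed to lives directly on the record, mirroring
// the "inline recent_pushes_ into local_objects_" commit above.
struct LocalObjectInfo {
  std::unordered_set<std::string> recent_push_targets;
};

class PushManager {
 public:
  void MaybePush(const std::string &object_id, const std::string &client_id) {
    LocalObjectInfo &info = local_objects_[object_id];
    // insert() returns false if the client was already recorded, i.e. the
    // object was recently pushed to it, so the duplicate push is suppressed.
    if (!info.recent_push_targets.insert(client_id).second) {
      std::cout << "suppressing duplicate push of " << object_id << " to "
                << client_id << std::endl;
      return;
    }
    std::cout << "pushing " << object_id << " to " << client_id << std::endl;
    // ... start the actual transfer ...
  }

 private:
  std::unordered_map<std::string, LocalObjectInfo> local_objects_;
};

int main() {
  PushManager push_manager;
  push_manager.MaybePush("object-1", "nodeA");
  push_manager.MaybePush("object-1", "nodeA");  // suppressed
  push_manager.MaybePush("object-1", "nodeB");  // different client, still pushed
  return 0;
}
```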
* Enable retrying pull from same object manager. Randomize object manager.
* Speed up test
* Linting
* Add test.
* Minor
* Lengthen pull timeout and reissue pull every time a new object becomes available.
* Increase pull timeout in test.
* Wait for nodes to start in object manager test.
* Wait longer for nodes to start up in test.
* Small fixes.
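The commits above allow a pull to be retried against the same object manager and randomize which manager is asked. A minimal sketch of the randomized choice, with illustrative names; the real code ties this to a pull timer and the object's known locations.

```cpp
#include <cstddef>
#include <iostream>
#include <random>
#include <string>
#include <vector>

// Pick a random known location each time the pull is (re)issued, so a retry
// can go to a different object manager, or legitimately back to the same one.
// Assumes the locations list is non-empty.
std::string ChoosePullTarget(const std::vector<std::string> &locations,
                             std::mt19937 &gen) {
  std::uniform_int_distribution<std::size_t> dist(0, locations.size() - 1);
  return locations[dist(gen)];
}

int main() {
  std::mt19937 gen(std::random_device{}());
  std::vector<std::string> locations = {"nodeA", "nodeB", "nodeC"};

  // Each pull-timer expiration reissues the pull to a freshly chosen target;
  // new locations learned in the meantime simply extend the candidate list.
  for (int attempt = 0; attempt < 3; ++attempt) {
    std::cout << "pull attempt " << attempt << " -> "
              << ChoosePullTarget(locations, gen) << std::endl;
  }
  return 0;
}
```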
* _submit -> _remote
* Change assert to warning.
* Make the scheduling queues' RemoveTasks return task states as well (see the sketch after this group).
* Add test
* Don't unsubscribe for infeasible tasks when spilling over.
* Linting
* Address comments.
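A minimal sketch of the `RemoveTasks` change described above, with stand-in types rather than the raylet's real scheduling queues: removal reports the state each task was found in, so callers can, for example, avoid unsubscribing dependencies for tasks that were infeasible.

```cpp
#include <iostream>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Stand-in task states and task type; not the raylet's real scheduling queues.
enum class TaskState { PLACEABLE, WAITING, READY, RUNNING, INFEASIBLE };

struct Task {
  std::string task_id;
};

class SchedulingQueues {
 public:
  void Add(TaskState state, const Task &task) { queues_[state].push_back(task); }

  // Remove the given tasks from whichever queue each is in, and report the
  // state the task was found in, so the caller can decide what follow-up
  // (e.g. unsubscribing dependencies) is appropriate per state.
  std::vector<std::pair<Task, TaskState>> RemoveTasks(
      const std::unordered_set<std::string> &task_ids) {
    std::vector<std::pair<Task, TaskState>> removed;
    for (auto &state_and_queue : queues_) {
      auto &queue = state_and_queue.second;
      for (auto it = queue.begin(); it != queue.end();) {
        if (task_ids.count(it->task_id) > 0) {
          removed.emplace_back(*it, state_and_queue.first);
          it = queue.erase(it);
        } else {
          ++it;
        }
      }
    }
    return removed;
  }

 private:
  std::unordered_map<TaskState, std::vector<Task>> queues_;
};

int main() {
  SchedulingQueues queues;
  queues.Add(TaskState::INFEASIBLE, Task{"t1"});
  queues.Add(TaskState::READY, Task{"t2"});
  const auto removed = queues.RemoveTasks({"t1", "t2"});
  std::cout << "removed " << removed.size() << " tasks with their states"
            << std::endl;
  return 0;
}
```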
* speed up task dispatch
* minor changes
* improved comments
* improved comments
* change argument of DispatchTasks to list of tasks
* dispatch only tasks whose dependencies have been fulfilled
* some updated comments
* refactored DispatchQueue() and AssignTask() to avoid copying the ready list
* minor fixes
* some more minor fixes
* some more minor fixes
* added more comments
* better comments?
* fixed all feedback comments, minus making the argument of AssignTask() const
* AssignTask() now takes a const argument
* Do the task copy outside of the callback
* fix linting
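The commits above rework task dispatch: `DispatchTasks` takes a list of tasks, dispatches only those whose dependencies are fulfilled, avoids copying the ready list, and `AssignTask` takes a const argument with the one necessary copy made outside the callback. A minimal sketch of that shape, with hypothetical names rather than the raylet's real classes:

```cpp
#include <iostream>
#include <list>
#include <string>

// Illustrative task type; a stand-in for the raylet's Task class.
struct Task {
  std::string task_id;
  bool dependencies_fulfilled;
};

// Illustrative assignment step; takes a const reference, matching the
// "AssignTask() now takes a const argument" commit above.
void AssignTask(const Task &task) {
  std::cout << "assigning " << task.task_id << std::endl;
}

// Takes the candidate tasks by const reference instead of copying the whole
// ready list, and dispatches only tasks whose dependencies are fulfilled.
void DispatchTasks(const std::list<Task> &ready_tasks) {
  for (const Task &task : ready_tasks) {
    if (!task.dependencies_fulfilled) {
      continue;
    }
    // The one necessary copy is made here, outside any assignment callback,
    // so the callback never holds a reference into a queue that may change.
    Task assigned = task;
    AssignTask(assigned);
  }
}

int main() {
  std::list<Task> ready_tasks = {{"t1", true}, {"t2", false}};
  DispatchTasks(ready_tasks);  // only t1 is dispatched
  return 0;
}
```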