hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-08 11:31:40 -05:00

Author	SHA1	Message	Date
architkulkarni	3ce03a52bc	Revert "Revert "Revert "Unhandled exception handler based on local ref counti… (#14113 )" (#14136 ) This reverts commit `e457872fe1`.	2021-02-16 11:47:09 -08:00
Barak Michener	c43a64230e	[ray_client]: Fix mutual recursion (#14122 )	2021-02-16 10:37:58 -08:00
SangBin Cho	4ad79ca963	[Object Spilling] Remove LRU eviction (#13977 ) * done. * formatting. * done. * done.	2021-02-15 14:24:53 -08:00
Eric Liang	e457872fe1	Revert "Revert "Unhandled exception handler based on local ref counti… (#14113 ) * Revert "Revert "Unhandled exception handler based on local ref counting (#14049)" (#14099)" This reverts commit `b45ae76765`. * reomve test * fix * fix	2021-02-15 14:11:11 -08:00
SangBin Cho	b45ae76765	Revert "Unhandled exception handler based on local ref counting (#14049 )" (#14099 ) This reverts commit `9dc671ae02`.	2021-02-14 22:08:32 -08:00
Alex Wu	5636af8084	[hotfix] Fix mac build (#14075 ) * . * done? * . Co-authored-by: Alex Wu <alex@anyscale.com>	2021-02-14 14:26:51 -08:00
Eric Liang	9dc671ae02	Unhandled exception handler based on local ref counting (#14049 )	2021-02-12 22:58:38 -08:00
Clark Zinzow	c7ff69f4bf	[OBOD] Add support for ownership-based object directory object recovery. (#14066 )	2021-02-12 11:58:31 -08:00
Clark Zinzow	cd7e567a57	[Core] Ownership-based Object Directory - Added support for object spilling in the ownership-based object directory. (#13948 ) * Add support for object spilling in the ownership-based object directory. * Move owner address hashmap into pinned_objects_ and objects_pending_spill_. * Update local object manager tests. * Feedback and misc. fixes. * Move spilled unpin callback lambda to std::binded private method. * Skip test_delete_objects_multi_node test on MacOS for now.	2021-02-11 10:36:22 -08:00
Ameer Haj Ali	d87a82e891	Revert "Revert "[Autoscaler] Monitor refactor for backward compatability. (#13970 )" (#14046 )" (#14050 ) * prepare for head node * move command runner interface outside _private * remove space * Eric * flake * min_workers in multi node type * fixing edge cases * eric not idle * fix target_workers to consider min_workers of node types * idle timeout * minor * minor fix * test * lint * eric v2 * eric 3 * min_workers constraint before bin packing * Update resource_demand_scheduler.py * Revert "Update resource_demand_scheduler.py" This reverts commit 818a63a2c86d8437b3ef21c5035d701c1d1127b5. * reducing diff * make get_nodes_to_launch return a dict * merge * weird merge fix * auto fill instance types for AWS * Alex/Eric * Update doc/source/cluster/autoscaling.rst * merge autofill and input from user * logger.exception * make the yaml use the default autofill * docs Eric * remove test_autoscaler_yaml from windows tests * lets try changing the test a bit * return test * lets see * edward * Limit max launch concurrency * commenting frac TODO * move to resource demand scheduler * use STATUS UP TO DATE * Eric * make logger of gc freed refs debug instead of info * add cluster name to docker mount prefix directory * grrR * fix tests * moving docker directory to sdk * move the import to prevent circular dependency * smallf fix * ian * fix max launch concurrency bug to assume failing nodes as pending and consider only load_metric's connected nodes as running * small fix * Revert "Revert "[Autoscaler] Monitor refactor for backward compatability. (#13970)" (#14046)" This reverts commit `6f9d39fb3e`. * fake news Co-authored-by: Ameer Haj Ali <ameerhajali@ameers-mbp.lan> Co-authored-by: Alex Wu <alex@anyscale.io> Co-authored-by: Alex Wu <itswu.alex@gmail.com> Co-authored-by: Eric Liang <ekhliang@gmail.com> Co-authored-by: Ameer Haj Ali <ameerhajali@Ameers-MacBook-Pro.local>	2021-02-10 17:59:08 -08:00
Stephanie Wang	fc89984162	Subtract from num bytes in use (#13944 )	2021-02-10 12:22:08 -08:00
architkulkarni	6f9d39fb3e	Revert "[Autoscaler] Monitor refactor for backward compatability. (#13970 )" (#14046 ) This reverts commit `7a6f8054d1`.	2021-02-10 12:16:52 -08:00
fangfengbin	1754359281	[Core]Fix ray.kill doesn't cancel pending actor bug (#14025 )	2021-02-10 15:30:21 +08:00
Ameer Haj Ali	7a6f8054d1	[Autoscaler] Monitor refactor for backward compatability. (#13970 )	2021-02-09 21:41:50 -08:00
Kai Yang	e0b81796c5	Revert "Revert "[Java] fix test hang occasionally when running FailureTest (#13934 )" (#13992 )" (#14008 )	2021-02-09 12:43:26 -08:00
Simon Mo	f51c26bae6	Revert "[Core]Fix ray.kill doesn't cancel pending actor bug (#13254 )" (#14013 ) This reverts commit `2092b097ea`.	2021-02-09 11:36:38 -08:00
fangfengbin	2092b097ea	[Core]Fix ray.kill doesn't cancel pending actor bug (#13254 )	2021-02-09 10:59:14 +08:00
Simon Mo	ec94214957	Revert "[Java] fix test hang occasionally when running FailureTest (#13934 )" (#13992 ) This reverts commit `bcf9457abb`.	2021-02-08 11:30:30 -08:00
Kai Yang	bcf9457abb	[Java] fix test hang occasionally when running FailureTest (#13934 )	2021-02-08 18:21:50 +08:00
Kai Yang	4b4941435d	[Java] fix actor restart failure when multi-worker is turned on (#13793 )	2021-02-07 21:12:54 +08:00
Simon Mo	ea4154df80	[Hotfix] Master compilation error on MacOS. (#13946 )	2021-02-05 16:07:45 -08:00
fyrestone	eee624cf5f	Revert "Fix passing env on windows (#13253 )" (#13828 )	2021-02-05 13:03:16 +08:00
fangfengbin	8a5999c12a	[GCS]Fix bug that gcs client does not set last_resource_usage_ (#13856 )	2021-02-05 11:51:25 +08:00
DK.Pino	fb89f9c2c8	[Placement Group] Support named placement group (#13755 )	2021-02-05 11:04:51 +08:00
Tao Wang	44aa9c173f	Rename timeout to period with heartbeat interval (#13872 )	2021-02-04 10:37:28 +08:00
Tao Wang	e0d9c8f0a8	Always replace DEL with UNLINK (#13832 )	2021-02-04 10:30:00 +08:00
Clark Zinzow	407302f93a	[Core] Ownership-based Object Directory - Changed infinite short-poll location subscription to long-poll. (#13841 )	2021-02-03 14:16:42 -08:00
SangBin Cho	cb9fa90203	[Object Spilling] Add consumed bytes to detect thrashing. (#13853 )	2021-02-03 14:16:26 -08:00
Alex Wu	f14171ced9	[Core] Put raylet ip's in resource usage report (#13871 ) * . * done? Co-authored-by: Alex Wu <alex@anyscale.com>	2021-02-03 11:28:56 -08:00
Gabriele Oliaro	79310452e7	Enabling the cancellation of non-actor tasks in a worker's queue 2 (#13244 ) * wrote code to enable cancellation of queued non-actor tasks * minor changes * bug fixes * added comments * rev1 * linting * making ActorSchedulingQueue::CancelTaskIfFound raise a fatal error * bug fix * added two unit tests * linting * iterating through pending_normal_tasks starting from end * fixup! iterating through pending_normal_tasks starting from end * fixup! fixup! iterating through pending_normal_tasks starting from end * post merge fixes * added debugging instructions, pulled Accept() out of guarded loop * removed debugging instructions, linting * first commit * lint * lint * added hack to avoid race condition in test stress * moved hack * fix test cancel * removed hack (hopefully no longer needed) * Revert "removed hack (hopefully no longer needed)" This reverts commit 99d0e7c91539f290700f50aaaed805dcde04a5ee. * added sleep in mock_worker.cc * sleep function fixup to work on windows * sleep in test_fast both for force=true and force=false * linting Co-authored-by: Ian <ian.rodney@gmail.com>	2021-02-03 10:20:12 -08:00
fangfengbin	b4684cf37a	Fix bug that otal_commands_queued_ is not initialized (#13852 )	2021-02-03 10:00:15 +08:00
Eric Liang	fa4290090d	Add Ray client protocol version (#13846 )	2021-02-02 00:19:08 -08:00
SangBin Cho	886217c333	[Object Spilling] Skip normal ray.get path when spilling objects. (#13831 )	2021-02-01 16:03:34 -08:00
Stephanie Wang	754bee9282	[core][object spillin] Fix bugs in admission control (#13781 )	2021-02-01 10:48:21 -08:00
Tao Wang	1d2ab018b0	Use right reserve size (#13829 )	2021-02-01 15:49:34 +08:00
Lingxuan Zuo	b5f0aed974	[Log] use default stderr logger if no raylog starting (#13762 )	2021-02-01 11:13:06 +08:00
Stephanie Wang	30f82329e3	[core] Add debug information for the PullManager and LocalObjectManager (#13782 ) * Add debug info * Formatting. Co-authored-by: SangBin Cho <rkooo567@gmail.com>	2021-01-29 17:55:46 -08:00
Hao Chen	0f3a3e14aa	Only delete local object in CoreWorkerPlasmaStoreProvider:::WarmupStore (#13788 )	2021-01-29 20:24:09 +08:00
Stephanie Wang	42d501d747	[core] Pin arguments during task execution (#13737 ) * tmp * Pin task args * unit tests * update * test * Fix	2021-01-28 19:07:10 -08:00
Tao Wang	56ee6ef55f	[GCS]only update states related fields when publish actor table data (#13448 )	2021-01-28 11:12:57 +08:00
Simon Mo	4f1f558802	[Core] Hotfix Windows Compilation Error for ClusterTaskManager (#13754 ) * [Core] Hotfix Windows Compilation Error for ClusterTaskManager * fix	2021-01-27 19:01:56 -08:00
Alex Wu	c0fe816466	[Core/Autoscaler] Properly clean up resource backlog from (#13727 )	2021-01-27 15:30:58 -08:00
Eric Liang	56a9523020	Fix high CPU usage in object manager due to O(n^2) iteration over active pulls list (#13724 )	2021-01-27 14:02:22 -08:00
DK.Pino	7f6d326ad8	[Placement Group]Add detached support for placement group. (#13582 )	2021-01-27 18:51:26 +08:00
SangBin Cho	8baafacb1e	[Logging] Log rotation config (#13375 ) * In Progress. * formatting. * in progress. * linting. * Done. * Fix typo. * Fixed the issue.	2021-01-26 20:15:55 -08:00
Lingxuan Zuo	f9f2bfa778	[Metric] Fix crashed when register metric view in multithread (#13485 ) * Fix crashed when register metric view in multithread * fix comments * fix	2021-01-25 20:32:08 +08:00
SangBin Cho	edbb2937d3	[Object Spilling] Multi node file spilling V2. (#13542 ) * done. * done. * Fix a mistake. * Ready. * Fix issues. * fix. * Finished the first round of code review. * formatting. * In progress. * Formatting. * Addressed code review. * Formatting * Fix tests. * fix bugs. * Skip flaky tests for now.	2021-01-23 23:15:32 -08:00
Qing Wang	8ef835ff03	Remove idle actor from worker pool. (#13523 )	2021-01-23 13:57:30 +08:00
Kai Yang	90f1e408de	[Java] Add `fetchLocal` parameter in `Ray.wait()` (#13604 )	2021-01-22 17:55:00 +08:00
Stephanie Wang	0998d69968	[core] Admission control for pulling objects to the local node (#13514 ) * Admission control, TODO: tests, object size * Unit tests for admission control and some bug fixes * Add object size to object table, only activate pull if object size is known * Some fixes, reset timer on eviction * doc * update * Trigger OOM from the pull manager * don't spam * doc * Update src/ray/object_manager/pull_manager.cc Co-authored-by: Eric Liang <ekhliang@gmail.com> * Remove useless tests * Fix test * osx build * Skip broken test * tests * Skip failing tests Co-authored-by: Eric Liang <ekhliang@gmail.com>	2021-01-21 16:46:42 -08:00

1807 commits