hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-10 05:16:49 -04:00

Author	SHA1	Message	Date
Kai Yang	853d650e29	Revert "Revert "[Object spilling] Avoid worker crash when an object is spille… (#15964 )" (#16012 ) This reverts commit `29aa336a4d`.	2021-05-25 23:48:24 -07:00
Eric Liang	ea6bdfb9c1	Prevent object store from allocating over the specified limit even if there is memory fragmentation (#15951 )	2021-05-24 17:56:11 -07:00
Yi Cheng	7c45480542	[runtime env] Introduce OS envs to skip GC for runtime env in local node; (#15984 )	2021-05-21 12:49:22 -07:00
Eric Liang	29aa336a4d	Revert "[Object spilling] Avoid worker crash when an object is spille… (#15964 ) This reverts commit `061e3fbde3`.	2021-05-20 21:17:59 -07:00
SangBin Cho	a1375a955b	Pubsub registration / unregistration idempotency (#15896 ) * Make AddEntry idempotent. * Done.	2021-05-20 18:40:06 -07:00
Kai Yang	061e3fbde3	[Object spilling] Avoid worker crash when an object is spilled right after being restored (#15903 ) * Fix check failure when memory pressure is high * Add test * lint	2021-05-20 18:36:11 -07:00
Frank Luan	c87b76632d	[plasma] Reset OOM timer as objects are being spilled (#15431 ) * Fix deserializer in metrics.Counter * Fix restore_spilled_objects() for external object spilling * WIP reset OOM timer * Add test * Revert style change * pytest * Simplify test * Fix test * Make tests faster	2021-05-20 13:13:54 -07:00
Alex Wu	ec997c0145	[client] Client builder API namespace support (#15934 ) * add namespace to client * done? * address comments Co-authored-by: Alex <alex@anyscale.com>	2021-05-20 12:36:05 -07:00
Alex Wu	cd2fc7792f	[dashboard] Snapshot of cluster state (#15868 )	2021-05-20 08:10:32 -07:00
Yi Cheng	874558e813	[runtime env] Put runtime env into runtime context; (#15895 )	2021-05-20 08:08:45 -07:00
Ian Rodney	4825f1b2a5	[client] One Driver per RayClient Server (#15923 )	2021-05-19 15:40:49 -07:00
architkulkarni	c3d06697bb	[Core] Add dynamic conda env install in shim process (#15881 )	2021-05-19 15:46:42 -05:00
Eric Liang	836c739fe5	Revert "[client] One Driver per RayClient Server (#15875 )" (#15922 ) This reverts commit `97d1414f23`.	2021-05-19 11:58:29 -07:00
Ian Rodney	97d1414f23	[client] One Driver per RayClient Server (#15875 )	2021-05-19 09:03:09 -07:00
qicosmos	8790bb465b	[C++ worker] Remove func ptr offset (#15809 )	2021-05-19 18:03:39 +08:00
architkulkarni	194c5e3a96	[Core] Cache workers by runtime_env in worker pool (#15782 ) * pass RuntimeEnv in task spec as opaque string * lint * set correct empty value for json: "{}" not "" * add comment for field in proto * fix worker pool test by checking both "" and "{}" * add RAY_CHECK todo * make dict empty if all values null * remove unnecessary ser/de * fix * address comments * add WorkerCacheKey with hash function * clean up * add naive impl., dedicated workers never killed * put dedicated workers in idle_of_all_languages * pipe env hash from worker.py -> Worker * fully pipe through hash, basic cache test passing * use int type for runtime env hash * convert Worker env hash type from size_t to int * fix * add method to MockWorker to fix cpp tests * make compatible with java streaming test * restore old dynamic_options code to fix java test * address comments * add comment about sorting before hash * add comments for private members of WorkerCacheKey	2021-05-18 00:19:27 -07:00
Alex Wu	69f228d22d	[core] Record actor+job start/end times and metadata (#15803 )	2021-05-17 21:38:39 -07:00
Frank Luan	0dc34566fe	Refactor raylet to allocate+write+seal one return object at a time (#15757 ) * Refactor raylet to allocate+write+seal one return object at a time * Fix build * Fix C++ and Java runtime * Skip Windows testing * Fix java and cpp runtime * Fix warnings * Fix cpp and java tests * Fix cpp and java runtime Co-authored-by: Stephanie Wang <swang@cs.berkeley.edu>	2021-05-17 20:06:08 -07:00
SangBin Cho	ff461634b0	[Core] Improved bad error message. (#15663 ) * Improved bad error message. * Update src/ray/raylet/node_manager.cc Co-authored-by: Clark Zinzow <clarkzinzow@gmail.com> * lint. * Add a pid Co-authored-by: Alex Wu <itswu.alex@gmail.com> Co-authored-by: Clark Zinzow <clarkzinzow@gmail.com>	2021-05-17 19:38:05 -07:00
Alex Wu	3e94114336	Namespaces (#15774 )	2021-05-17 10:04:22 -07:00
SangBin Cho	259fcbd5bd	[Pubsub] Generalize the pubsub interface and adapt it for ref counting protocol (#15446 ) * Add mock code first * In the initial progress. * Fix the number error * In progress. * in more pgoress. * in progress. * lint. * Prototype done. * Fix compilation bug. * Now it is working with reference counting. * Remove template. * lint. * Fixed issues. * Fix reference count test. * Reference count test passes now. * Fixed the test array problem * Addressed code review. * lint. * Addressed half of code review. * Fix tests. * Addressed the most critical issue. * Make subscriber thread-safe. * Revert "Make subscriber thread-safe." This reverts commit 9a6a52197cfa8463ab60dfaae9530ad3c0ed8790. * Fixed test failures. The only failure now is the asan failure. * Reset test suites and see if it fixes the issue. * Fix a flaky test * Addressed code review.	2021-05-13 09:29:02 -07:00
architkulkarni	a0c1cfe034	[Core] Pass RuntimeEnv as opaque string in the task spec (#15658 )	2021-05-13 10:32:00 -05:00
SongGuyang	40b2face74	Fix std::atomic compiling error (#15781 )	2021-05-13 10:27:45 -05:00
Tao Wang	19462e43d6	[large scale]use proxy to track gcs server address in core worker (#15714 )	2021-05-13 19:26:01 +08:00
fcardoso75	c877da4c19	create_and_mmap_buffer() - In case CreateFileMapping() fails, GetLastError() return code is printed (#15773 ) * Enabling all test cases on test_client.py * Moving test_client.py to a large CI py_test_module_list * Disabling test_client::test_remote_functions * Divide Run CI script action into separete Build action and Test action * Reverting test_client.py to separate work for different tickes * Reverting python\ray\tests\BUILD to separate work for different tickets * create_and_mmap_buffer() - In case CreateFileMapping() fails, GetLastError() return code is printed * Addressed lint comments Co-authored-by: Philipp Moritz <pcmoritz@gmail.com>	2021-05-13 00:31:33 -07:00
Ian Rodney	cdf93930f3	Revert "[Core] Fix event loop instrumentation causing Java segfaults in tests. (#15349 )" (#15727 ) This reverts commit `edb0d1b376`.	2021-05-12 15:49:06 -07:00
mwtian	6a044f4f30	[Test] Ensure output params are initialized before calling IsPlasmaObjectPinnedOrSpilled() (#15758 )	2021-05-12 10:22:35 -07:00
fyrestone	56c309416e	[Job submission] Basic job submission structure (#15103 )	2021-05-12 15:08:20 +08:00
Clark Zinzow	c1b7d6f115	Don't consider a worker to be idle if it has in-flight object pinning RPCs. (#15686 )	2021-05-11 19:21:52 -07:00
Eric Liang	82d5b67521	Remove placement group log spam (#15747 )	2021-05-11 17:08:06 -07:00
Eric Liang	cb59d30917	Drop profiling events if the GCS becomes backlogged (#15726 )	2021-05-11 14:10:34 -07:00
Eric Liang	996a002b00	Add prepopulate plasma memory flag for debugging (#15669 ) * add prepopulate flag * fix build * warn	2021-05-07 15:17:31 -07:00
Clark Zinzow	edb0d1b376	[Core] Fix event loop instrumentation causing Java segfaults in tests. (#15349 ) * Reenable event loop instrumentation. * Take stats handle by copy in post() handler closure. * Revert "Take stats handle by copy in post() handler closure." This reverts commit e46777939bcc3bb4bb101e136e9d3348ea4ae1a1.	2021-05-07 15:01:00 -07:00
Yi Cheng	d5379ba99e	[core] RuntimeEnv GC in gcs (#14833 )	2021-05-06 11:31:33 -05:00
Alex Wu	18d85d2de9	Grpc based resource broadcast (#15466 )	2021-05-05 11:20:08 -07:00
architkulkarni	e5c5dde847	[Core] Prevent dedicated workers from being returned to general idle pool (#15545 )	2021-04-29 15:45:25 -05:00
Alex Wu	40a6ced996	[core] Handle blocked worker crashes edge case (#15083 )	2021-04-27 10:14:12 -07:00
Ian Rodney	4db696d365	[Client] Asyncio Client, Sync gRPC Server (#15488 )	2021-04-27 08:41:10 -07:00
Ian Rodney	360b053254	[client] Add support for `ray.timeline()` (#15448 )	2021-04-26 18:32:22 -07:00
architkulkarni	b08b2c5103	[Core] Add "shim process" setup_worker.py that calls "conda activate" for runtime_env (#15361 )	2021-04-23 15:29:52 -05:00
Eric Liang	93a1ecba4b	Unhandled error messages aren't printed until next interaction with shell (#15432 )	2021-04-23 11:00:34 -07:00
fangfengbin	d9780761a3	[GCS]Revert ping_gcs_rpc_server_max_retries to 600 (#14443 )	2021-04-23 10:02:38 +08:00
Jialing He	5403021430	Fix incorrect call function WorkerID::FromBinary (#15449 )	2021-04-22 15:44:49 +08:00
Yi Cheng	dbba3a456f	[core] Fixing of actor creation failure (#15411 ) * Fix * fix * format * fix * fix * fix * fix * fix * fix * fix * format * fix comments	2021-04-20 15:27:45 -07:00
Yi Cheng	9b3ea7c32b	[core] Take care of object spilling failure (#14703 ) * fix spilling failure * format * unittests added * format * format * format * fix * add comment * fix some comments * add test cases * format * format	2021-04-20 10:28:48 -07:00
fangfengbin	ade684ac03	[Test] Fix gcs flaky testcase (#15391 ) Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>	2021-04-19 10:21:39 -07:00
SangBin Cho	5f74d0e40d	[Test] Fix flaky test failure (#15326 ) * Fix trial. * unskip test. * Mock commit	2021-04-16 18:09:02 -07:00
fangfengbin	0e3bbbeba3	[Test] Try deflaking gcs server test by adding log (#15332 ) Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>	2021-04-15 21:16:09 -07:00
Stephanie Wang	6b2da7eda8	[core] Log warning on bad max task args value (#15314 )	2021-04-14 20:34:08 -07:00
Yi Cheng	0caf96be94	Take care of failed killing request (#15313 )	2021-04-14 18:07:10 -07:00

1984 commits