hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-06 10:31:39 -05:00

Author	SHA1	Message	Date
Edward Oakes	c73fdb7425	Ignore errors in ObjectID.__dealloc__ (#5997)	2019-10-24 16:48:47 -07:00
Philipp Moritz	09d05bb3fa	Reduce actor submission python overhead (#5949)	2019-10-23 00:11:32 -07:00
Edward Oakes	02931e08f3	[core worker] Python core worker task execution (#5783) Executes tasks via the event loop in the C++ core worker. Also properly handles signals (including KeyboardInterrupt), so ctrl-C in a python interactive shell works now (if connecting to an existing cluster).	2019-10-22 20:15:59 -07:00
Siyuan (Ryans) Zhuang	95241f6686	Fix the incorrect serialization behavior with pickle (#5960)	2019-10-22 18:08:36 -07:00
Richard Liaw	81dd0dfb0a	[tune] fix conditional identifier (#5971) * fix conditional identifier * fix * doc	2019-10-22 02:00:49 -07:00
Richard Liaw	252a5d13ed	[sgd/tune][minor] more tf ports (#5953)	2019-10-21 16:46:16 -07:00
Mitchell Stern	235dec8aa3	[Dashboard] Remove token authentication from dashboard (#5888)	2019-10-21 12:48:48 -07:00
Richard Liaw	26a724c5e6	[core] Support kwargs and positionals in Ray remote calls (#5606)	2019-10-20 22:40:54 -07:00
Edward Oakes	fc56872012	Send active object IDs to the raylet (#5803) * Send active object IDs to the raylet * comment * comments * dedup * signed int in config * comments * Remove object ID from monitor * Fix test * re-add check * fix cast * check if core worker * Add comment * Reservoir sampling * Fix lint * Pointer return * tmp * Fix merge * Initialize object ids properly * Fix lint	2019-10-20 22:05:28 -07:00
Simon Mo	6b36ef1138	[Serve] Ensure strict traffic splitting (#5929) * [Serve] Ensure strict traffic splitting * Fix test	2019-10-20 20:18:14 -07:00
Stephanie Wang	bc4a0de4da	Fix multiple drivers for named actors and add test (#5956)	2019-10-20 16:04:21 -07:00
Richard Liaw	74852c80cb	[docs] Improve more serialization Errors (#5658)	2019-10-20 14:06:00 -07:00
Richard Liaw	91acecc9f9	[tune][minor] gpu warning (#5948) * gpu * formaat * defaults * format_and_check * better registration * fix * fix * trial * foramt * tune	2019-10-19 17:09:48 -07:00
Philipp Moritz	d23696de17	Introduce flag to use pickle for serialization (#5805)	2019-10-18 22:29:36 -07:00
Philipp Moritz	29eee7f970	Forward multiple ports for autoscaler (#5893)	2019-10-18 16:50:46 -07:00
Richard Liaw	48ba484640	[tune] Test TF2.0, TF1.14, TF1.12 Tensorboard support (#5931)	2019-10-18 13:50:42 -07:00
Stephanie Wang	697f765efc	Refactor CoreWorker to remove TaskInterface (#5924) * Remove TaskInterface * Remove Status return value * Remove CActorHandle, some return values, TaskSubmitter * lint * doc * doc * fix build * lint * Return Status, guarded by annotation, fail tasks for RECONSTRUCTING actors * fix * move annotation * revert * Fix core worker test * nits	2019-10-18 00:03:57 -04:00
Stephanie Wang	3ac8592dcf	Remove actor handle IDs (#5889) * Remove actor handle ID from main ActorHandle constructor * Set the actor caller ID when calling submit task instead of in the actor handle * Remove ActorHandle::Fork, remove actor handle ID from protobuf * Make inner actor handle const, remove new_actor_handles * Move caller ID into the common task spec, start refactoring raylet * Some fixes for forking actor handles * Store ActorHandle state in CoreWorker, only expose actor ID to Python * Remove some unused fields * lint * doc * fix merge * Remove ActorHandleID from python/cpp * doc * Fix core worker test * Move actor table subscription to CoreWorker, reset actor handles on actor failure * lint * Remove GCS client from direct actor * fix tests * Fix * Fix tests for raylet codepath * Fix local mode * Fix multithreaded test * Fix AsyncSubscribe issue... * doc * fix serve * Revert bazel	2019-10-17 12:36:34 -04:00
Philipp Moritz	32b2907457	Update max resource label and give better error message (#5916)	2019-10-16 22:37:01 -07:00
Peter Schafhalter	6c11b534c8	[Autoscaler] Update AWS Deep Learning AMI to version 24.3 (#5932)	2019-10-16 16:50:54 -07:00
Richard Liaw	9f23620412	[tune] tf2.0 mnist example (#5898) * tfmnistexample * tfmnist * add_to_ci * format * exampledownlaod * fix	2019-10-15 22:25:01 -07:00
Eric Liang	6843a01a7f	Automatically create custom node id resource (#5882) * node id * comment * comments * fix tests	2019-10-15 21:31:11 -07:00
Richard Liaw	c52bb0621d	[tune] Support TF2.0 on Keras Callback (#5912)	2019-10-15 10:49:50 -07:00
Eric Liang	69d5c1b53a	remove evil redirects (#5919)	2019-10-14 19:41:04 -07:00
Camille Couturier	320cba313f	[tune] Explicitly set scheduler in run() (#5871) * Explicitely set scheduler in run() * Better formatting/indentation (after running format.sh) * Remove accidental paste in parameters definitions. * format	2019-10-14 15:44:59 -07:00
Philipp Moritz	8fd23c0c3f	Add back TensorFlow test (#5885)	2019-10-14 11:26:02 -07:00
Richard Liaw	20c0cdee4f	[autoscaler] Worker-Head termination + Better Scale-up message (#5909)	2019-10-14 10:37:50 -07:00
Edward Oakes	abbfe7392f	Bump dev version to 0.8.0.dev6 (#5906)	2019-10-14 11:36:13 +01:00
Richard Liaw	1650f7b174	[tune] Remove TF MNIST example + add TrialRunner hook to execut… (#5868) * remove test * add trial runner * remvoerestore * Remove other mnist examples * tunetest * revert * v1 * Revert "v1" This reverts commit c8bddaf2db7a8270c43c02021cac0e75df15ed20. * Revert "revert" This reverts commit b58f56884a0c288d3a6f997d149ab4d496ddd7a3. * errors * format	2019-10-13 20:33:56 -07:00
Richard Liaw	52e5c9b22d	[tune] CPU-Only Head Node support (#5900) * trialqueue * add tests	2019-10-13 20:31:42 -07:00
Eric Liang	2cbc67f3d5	Fix test_dying_worker_get (#5908)	2019-10-13 18:06:28 -07:00
Richard Liaw	0f24509c30	[autoscaler] uptime redirect fix (#5907) * small change * comment	2019-10-13 23:25:15 +01:00
Edward Oakes	6eaa8e31fa	[autoscaler] Revert to double-spawning updater threads (#5903) * [autoscaler] Revert to double-spawning threads * Use log prefix * add comment	2019-10-13 20:00:06 +01:00
Simon Mo	97a786cf11	[Serve] Remove handle passing in tail recursion (#5894) * Remove handle pass in tail recursion * Quick fix * Fix worker timeout issue	2019-10-12 20:13:20 -07:00
Eric Liang	0e8c3c0346	Don't wrap RayError with RayTaskError (#5870)	2019-10-11 11:00:08 -07:00
Edward Oakes	779f91523b	[autoscaler] Fix quoting (#5891)	2019-10-11 00:40:26 -07:00
Simon Mo	4b99cb429e	[Serve] Hotfix: Fix actor handle hashing in metric monitoring (#5886)	2019-10-11 00:31:42 -07:00
Robert Nishihara	523c764c25	Python 2 compatibility. (#5887)	2019-10-10 19:09:25 -07:00
Eric Liang	c3b2ae26c5	Fix str of RayTaskError (#5878) * fix key error * fix	2019-10-10 16:53:18 -07:00
Mitchell Stern	195ca43e9c	[Dashboard] Improve handling of logs and errors in dashboard backend (#5857) * Improve handling of logs and errors in dashboard backend * Update nested dict comprehension for clarity	2019-10-10 11:59:54 -07:00
Eric Liang	1a8ac3db46	Implement fair task queueing to prevent task starvation (#5851) * initial commit * lint * clarify * add feature flag * comment * add timeout to test * fix print * comment * use id for scheduling class * lint * dad warn * flake	2019-10-08 21:04:25 -07:00
Richard Liaw	1181924077	[tune][minor] formatting examples, fix travis (#5869) * formatting * formatting	2019-10-08 17:58:43 -07:00
Ujval Misra	a851d7eb87	[tune] Readable trial progress output (#5822) * Cleaner, tabulated progress output. * Minor HTML changes, trial ID instead of name * Revert basic variant changes * Cleanup, address richard's comments, add progress_reporter.py * Add tabulate dependency * Added more info to table, auto-hide columns with no data. * lint * Address comments * Replace experiment tag w/ trial ID * Fixed tests. * Fixed test * Added requirement * Fix formatting	2019-10-08 16:38:39 -07:00
Philipp Moritz	24b79fd0a6	temporarily remove tensorflow test (#5866)	2019-10-08 14:13:54 -07:00
Edward Oakes	42dd0fae96	Fix actor ID collision in local mode (#5863) * Fixed local mode actor id * Update python/ray/actor.py Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com> * Added hyphen to match comments * Added tests to test_local_mode * Helloworld * Better test naming * lint	2019-10-08 13:07:42 -07:00
Ujval Misra	375852af23	[tune] Check node liveness before result fetch (#5844) * Check if trial's node is alive before trying to fetch result * Added function for failed trials to trial_executor interface * Address comments, add test.	2019-10-08 11:41:01 -07:00
waldroje	054583ffe6	[tune] MedianStopping on result (#5402) * added class median_stopping_result to schedulers and updated __init__ * Dicts flatten and combine schedulers. MedianStoppingRule is now combined with MedianStoppingResult; I think the functionality is essentially the same so there's no need to duplicate. Dict flattening was already taken care of in a separate PR, so I've reverted that. * lint * revert * remove time sharing and simplify state * fix * fixtests * added class median_stopping_result to schedulers and updated __init__ * update property names and types to reflect suggestions by ray developers, merged get_median_result and get_best_result into a single method to eliminate duplicate steps, added resource check on PAUSE condition, modified utility function to use updated properties * updated tests for median_stopping_result in separate file * remove stray characters from previous merge conflict * reformatted and cleaned up dependencies from running code format and linting * added class median_stopping_result to schedulers and updated __init__ * Dicts flatten and combine schedulers. MedianStoppingRule is now combined with MedianStoppingResult; I think the functionality is essentially the same so there's no need to duplicate. Dict flattening was already taken care of in a separate PR, so I've reverted that. * lint * revert * remove time sharing and simplify state * fix * added class median_stopping_result to schedulers and updated __init__ * update property names and types to reflect suggestions by ray developers, merged get_median_result and get_best_result into a single method to eliminate duplicate steps, added resource check on PAUSE condition, modified utility function to use updated properties * updated tests for median_stopping_result in separate file * remove stray characters from previous merge conflict * reformatted and cleaned up dependencies from running code format and linting * update scheduler to coordinate eval interval * modify median_stopping_result to synchronize result evaluation at regular intervals, driven by least common interval * add some logging info to median_result * add new scheduler, SyncMedianStoppingResult, which evaluates and stops trials in a synchronous fashion * Cleanup median_stopping_rule - remove eval_interval - pause trials with insufficient samples if there are other waiting trials - compute score only for trials that have reached result_time * Remove extraneous classes * Fix median stopping rule tests * Added min_time_slice flag to reduce potential checkpointing cost * Only compute mean after grace * Relegate logging to debug mode	2019-10-08 11:40:41 -07:00
Philipp Moritz	785670bc18	Fix class attributes and methods for actor classes (#5802)	2019-10-07 23:56:07 -07:00
Edward Oakes	08e4e3a153	[core worker] Submit Python actor tasks through core worker (#5750) * Submit actor tasks through core worker * Fix java * add comment * Remove task builder * Check negative * Increase -> Increment * pass by reference * fix signal * Clean up c++ actor handle * more cleanup * Clean up headers * Fix unique_ptr construction * Fix java * Move profiling to c++ * dedup * fix error * comments * fix java * Fix tests * wait for actor to exit * Start after constructor * ignore java build * fix comment * always init logging * Fix logging * fix logging issue * shared_ptr for profiler * DEBUG -> WARNING * fix killed_ init * Fix flaky checkpointing tests * -v flag for tune tests * Fix checkpoint test logic * Fix exception matching * timeout exception * Fix test exception info * Fix import * fix build * Fix test * shared_ptr	2019-10-07 15:42:19 -07:00
Simon Mo	9bb3633cd9	[Serve] Implement metric interface (#5852) * Implement metric interface * Address comment: made actor_handles a dict * Fix iteration * Lint * Mark lightweight actors as num_cpus=0 to prevent resource starvation * Be more explicit about the readiness condition * Make task_runner non-blocking * Lint	2019-10-07 09:29:26 -07:00

1715 commits