hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-06 10:31:39 -05:00

Author	SHA1	Message	Date
zhu-eric	3845c97dd0	[doc] Hyperparameter Tuning Gallery Entry (#5786 ) * mod_table * Example fix for gallery * lint * nit * nit * fix * gallery * remove table for now * training, object store, tune, actors, advanced * start tf code * first cut tf * yapf * pytorch * add torch example * torch * parallel * tune * tuning * reviewsready * finetune * fix * move_code * update conf * compile * init hyperparameter * Start images * overview * extra * fix * works * update-ps-example * param_actor * fix * examples * simple * simplify_pong * flake8 and run hyperopt * add comments * add comments * add suggestion * add suggestion * suggestions * add suggestion * add suggestions * fixed in wrong area * last edit * finish changes * add line * hyperparameter	2019-10-08 14:13:17 -07:00
Matthew A. Wright	4aa06918ae	Qmix on gpu and with non-stacked-obs environment state support (#5751 )	2019-10-08 13:18:07 -07:00
Edward Oakes	42dd0fae96	Fix actor ID collision in local mode (#5863 ) * Fixed local mode actor id * Update python/ray/actor.py Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com> * Added hyphen to match comments * Added tests to test_local_mode * Helloworld * Better test naming * lint	2019-10-08 13:07:42 -07:00
Ujval Misra	375852af23	[tune] Check node liveness before result fetch (#5844 ) * Check if trial's node is alive before trying to fetch result * Added function for failed trials to trial_executor interface * Address comments, add test.	2019-10-08 11:41:01 -07:00
waldroje	054583ffe6	[tune] MedianStopping on result (#5402 ) * added class median_stopping_result to schedulers and updated __init__ * Dicts flatten and combine schedulers. MedianStoppingRule is now combined with MedianStoppingResult; I think the functionality is essentially the same so there's no need to duplicate. Dict flattening was already taken care of in a separate PR, so I've reverted that. * lint * revert * remove time sharing and simplify state * fix * fixtests * added class median_stopping_result to schedulers and updated __init__ * update property names and types to reflect suggestions by ray developers, merged get_median_result and get_best_result into a single method to eliminate duplicate steps, added resource check on PAUSE condition, modified utility function to use updated properties * updated tests for median_stopping_result in separate file * remove stray characters from previous merge conflict * reformatted and cleaned up dependencies from running code format and linting * added class median_stopping_result to schedulers and updated __init__ * Dicts flatten and combine schedulers. MedianStoppingRule is now combined with MedianStoppingResult; I think the functionality is essentially the same so there's no need to duplicate. Dict flattening was already taken care of in a separate PR, so I've reverted that. * lint * revert * remove time sharing and simplify state * fix * added class median_stopping_result to schedulers and updated __init__ * update property names and types to reflect suggestions by ray developers, merged get_median_result and get_best_result into a single method to eliminate duplicate steps, added resource check on PAUSE condition, modified utility function to use updated properties * updated tests for median_stopping_result in separate file * remove stray characters from previous merge conflict * reformatted and cleaned up dependencies from running code format and linting * update scheduler to coordinate eval interval * modify median_stopping_result to synchronize result evaluation at regular intervals, driven by least common interval * add some logging info to median_result * add new scheduler, SyncMedianStoppingResult, which evaluates and stops trials in a synchronous fashion * Cleanup median_stopping_rule - remove eval_interval - pause trials with insufficient samples if there are other waiting trials - compute score only for trials that have reached result_time * Remove extraneous classes * Fix median stopping rule tests * Added min_time_slice flag to reduce potential checkpointing cost * Only compute mean after grace * Relegate logging to debug mode	2019-10-08 11:40:41 -07:00
Edward Oakes	486abedcdf	Link to kubernetes config files in docs (#5865 )	2019-10-08 11:06:25 -07:00
Philipp Moritz	785670bc18	Fix class attributes and methods for actor classes (#5802 )	2019-10-07 23:56:07 -07:00
Edward Oakes	08e4e3a153	[core worker] Submit Python actor tasks through core worker (#5750 ) * Submit actor tasks through core worker * Fix java * add comment * Remove task builder * Check negative * Increase -> Increment * pass by reference * fix signal * Clean up c++ actor handle * more cleanup * Clean up headers * Fix unique_ptr construction * Fix java * Move profiling to c++ * dedup * fix error * comments * fix java * Fix tests * wait for actor to exit * Start after constructor * ignore java build * fix comment * always init logging * Fix logging * fix logging issue * shared_ptr for profiler * DEBUG -> WARNING * fix killed_ init * Fix flaky checkpointing tests * -v flag for tune tests * Fix checkpoint test logic * Fix exception matching * timeout exception * Fix test exception info * Fix import * fix build * Fix test * shared_ptr	2019-10-07 15:42:19 -07:00
Eric Liang	04e997fe0d	Fix TF2 / rllib test (#5846 )	2019-10-07 14:25:16 -07:00
Simon Mo	9bb3633cd9	[Serve] Implement metric interface (#5852 ) * Implement metric interface * Address comment: made actor_handles a dict * Fix iteration * Lint * Mark lightweight actors as num_cpus=0 to prevent resource starvation * Be more explicit about the readiness condition * Make task_runner non-blocking * Lint	2019-10-07 09:29:26 -07:00
Simon Mo	25dde48607	[Serve] Implement replica scaling (#5850 ) * Implement replica scaling * Lint * Fix .travis.yml so it won't skip if only serve affected	2019-10-07 01:57:31 -07:00
Erik Cederstrand	5834c56c64	Restore support for Python 3.5 (#5818 ) * Advertise that Python >= 3.6 is needed ray/tune/examples/ax_example.py contains f-strings which limits support of this package to Python 3.6 and up. * Python 3.5 does not support f-strings Rewrite by using format() * Lower required version after 9f88fe9d * Remove python_requires again by request * Fix linter warning	2019-10-07 00:10:00 -07:00
Simon Mo	e8570874b6	[Serve] Implement flask_request and named python request (#5849 ) * Implement flask_request and named python request * Forgot to include missing files * Address comment * Add flask to requirements for doc (lint failed) * Update doc requirement so lint will build * Install flask in CI * Fix typo in .travis.yml	2019-10-06 15:12:30 -07:00
Anthony Yu	b99cdf4e39	[tune] PBT + Memnn example (#5723 ) * Add example file * Move into train function * Somewhat working example of MemNN, still has some failed trials * Reorganize into a class * Small fixes * Iteration decrease and fix hyperparam_mutations * Add example file * Move into train function * Somewhat working example of MemNN, still has some failed trials * Reorganize into a class * Small fixes * Iteration decrease and fix hyperparam_mutations * Some style edits * Address PR changes without modifying learning rate * Add configs and hyperparameter mutations * Add tune test * Modify import locations * Some parameter changes for testing * Update memnn example * Add tensorboard support and address PR comment * Final changes * lint * generator	2019-10-05 09:22:37 -07:00
Eric Liang	fb33160df8	Fix obs space lo/hi (#5826 )	2019-10-04 09:28:06 -07:00
Edward Oakes	17c6835c3f	Just die on signal (#5842 )	2019-10-03 18:21:21 -07:00
Edward Oakes	8ca7fab581	Improve manual Kubernetes deployment documentation (#5582 ) * Add ray-cluster, modify submit * Add comments * Job submission working * Write docs * Add link to autoscaling * Fix wget link in job * Use namespace file * match tense * fix tab * Improve job documentation * comments * Fix link * Fix links * comments * add overview paragraph * Update imagePullPolicy * Warning if no cluster running * better check	2019-10-03 15:47:49 -07:00
Si-Yuan	3a42780cb8	Improved Pickle5 pickling (#5841 ) * object copy optimization * see if we can reuse the Arrow parallel_memcopy * remove unused function * restore the original code, since later experiments show that it has little impact on performance. * lint	2019-10-03 15:14:32 -07:00
Simon Mo	fa1214c44a	[Serve] First iteration of the serve doc (#5834 ) * Address comments * Lint * Add py3 warning	2019-10-03 15:14:09 -07:00
Philipp Moritz	0dee225ce1	Make it possible to run ray examples as projects (#5816 )	2019-10-03 14:52:37 -07:00
Edward Oakes	972dddd776	[autoscaler] Kubernetes autoscaler backend (#5492 ) * Add Kubernetes NodeProvider to autoscaler * Split off SSHCommandRunner * Add KubernetesCommandRunner * Cleanup * More config options * Check if auth present * More auth checks * Better output * Always bootstrap config * All working * Add k8s-rsync comment * Clean up manual k8s examples * Fix up submit.yaml * Automatically configure permissisons * Fix get_node_provider arg * Fix permissions * Fill in empty auth * Remove ray-cluster from this PR * No hard dep on kubernetes library * Move permissions into autoscaler config * lint * Fix indentation * namespace validation * Use cluster name tag * Remove kubernetes from setup.py * Comment in example configs * Same default autoscaling config as aws * Add Kubernetes quickstart * lint * Revert changes to submit.yaml (other PR) * Install kubernetes in travis * address comments * Improve autoscaling doc * kubectl command in setup * Force use_internal_ips * comments * backend env in docs * Change namespace config * comments * comments * Fix yaml test	2019-10-03 10:17:00 -07:00
Ujval Misra	9df6eda84f	[tune] Add error case for member functions passed as stopping c… (#5823 )	2019-10-03 09:49:03 -07:00
Si-Yuan	2fb7d7846f	Initial implementation of Cython pickle5 support (#5725 )	2019-10-03 09:20:26 -07:00
Philipp Moritz	9a71d6ce3a	Build dashboard only once in the wheel build and make sure caching is working for wheel builds (#5784 ) * build dashboard only once * update * debug * caching? * update * update	2019-10-02 16:29:11 -07:00
Edward Oakes	4e049232a8	shared_ptr (#5830 )	2019-10-02 16:29:04 -07:00
Philipp Moritz	26834bcf94	Add message about tests passing and flaky tests to PR template (#5833 )	2019-10-02 15:23:34 -07:00
Edward Oakes	ef1a61ab57	Log output in test_dead_actors.py (#5831 )	2019-10-02 14:40:55 -07:00
Stephanie Wang	dc80e6be3d	Add screen argument (#5808 )	2019-10-01 15:18:19 -07:00
Edward Oakes	963bbe8bbd	Move profiling to c++ (#5771 ) * Move profiling to c++ * comments * Fix tests * Start after constructor * fix comment * always init logging * Fix logging * fix logging issue * shared_ptr for profiler * DEBUG -> WARNING * fix killed_ init * Fix flaky checkpointing tests * Fix checkpoint test logic * Fix exception matching * timeout exception * Fix import * fix build * use boost::asio * fix double const * Properly reset async_wait * remove SIGINT * Change error message * increase timeout * small nits * Don't trap on SIGINT * -v for tune * Fix test	2019-10-01 10:06:25 -07:00
Edward Oakes	443feb75f0	Fix test (#5810 )	2019-09-30 19:39:53 -07:00
Richard Liaw	e54c487d18	[hotfix] Docker (#5809 ) * configspace * reorder	2019-09-30 16:39:00 -07:00
Wenjie Wu	ccd88c9e20	[doc] fix typo in ASHA blog url (#5801 ) this fix issue #5800	2019-09-29 17:41:18 -07:00
Eric Liang	81ee887f91	Preserve the original exception type when converting to RayTaskError (#5799 )	2019-09-28 17:03:15 -07:00
Eric Liang	493364d3bd	[autoscaler] Add unit tests for stopped node caching, fix flaky tests (#5793 )	2019-09-27 22:36:09 -07:00
Edward Oakes	86610a30c9	[flaky test] Fix flaky checkpointing tests (#5791 ) * Fix flaky checkpointing tests * Fix checkpoint test logic * Fix exception matching * timeout exception * Fix import * fix build	2019-09-27 11:03:07 -07:00
Richard Liaw	baf85c6665	[tune/sgd] Fix Jenkins (#5765 )	2019-09-27 09:59:08 -07:00
Eric Liang	b5da32df78	Bump Ray version in documentation to dev5 (#5794 )	2019-09-27 00:19:17 -07:00
Richard Liaw	5c549fd84b	[docs] Make slack more prominent (#5792 ) Co-Authored-By: Edward Oakes <ed.nmi.oakes@gmail.com>	2019-09-26 15:36:56 -07:00
Philipp Moritz	01d6362472	Serialize StringIO with pickle (#5781 )	2019-09-26 12:55:14 -07:00
Philipp Moritz	57a5871ea6	Convert long running stress tests to projects (#5641 )	2019-09-26 11:25:09 -07:00
Eric Liang	5ecb02fb80	Release 0.7.5 updates (#5727 )	2019-09-26 10:30:37 -07:00
Edward Oakes	8a33891a40	Include object size in full error (#5782 )	2019-09-25 17:04:17 -07:00
Robert Nishihara	ddfe9439c8	And sphinx-gallery requirement to readthedocs. (#5780 )	2019-09-25 14:46:56 -07:00
Robert Nishihara	18ce7bda2b	Fix flaky test_actors_and_tasks_with_gpus_version_two test. (#5756 )	2019-09-25 11:47:47 -07:00
Edward Oakes	d499601bd7	Fix flaky checkpoint tests (#5778 )	2019-09-25 10:55:17 -07:00
Eric Liang	c6919d315d	[rllib] Remove TorchPolicy locks (#5764 ) * remove torch lock * remove lock	2019-09-24 17:52:16 -07:00
Richard Liaw	10f21fa313	[docs] Convert Examples to Gallery (#5414 )	2019-09-24 15:46:56 -07:00
Zhijun Fu	ea9376c9ce	Fix flaky core worker tests because of race condition in gcs client subscription (#5735 )	2019-09-24 22:47:38 +08:00
Kai Yang	c580955840	[Java] Fix some potential bugs about `Ray.shutdown()` (#5693 )	2019-09-24 10:44:17 +08:00
Ujval Misra	a4659a8f8b	[tune] Add support for function-based stopping condition (#5754 )	2019-09-23 18:39:00 -07:00

3334 commits