Commit graph

3093 commits

Author SHA1 Message Date
Stephanie Wang
4ac9c1ed6e Fix bug in cluster mode where driver exits when there are tasks in the waiting queue (#4251) 2019-03-20 10:18:27 -07:00
Yuhong Guo
8ce7565530 Refactor pytest fixtures for ray core (#4390) 2019-03-20 11:48:32 +08:00
Eric Liang
c6f15a0057
Revert [rllib] Reserve CPUs for replay actors in apex (#4404)
* Revert "[rllib] Reserve CPUs for replay actors in apex (#4217)"

This reverts commit 2781d74680.

* comment
2019-03-19 09:58:45 -07:00
Peter Schafhalter
c93eb126ec Allow manually writing to return ObjectIDs from tasks/actor methods (#3805) 2019-03-18 19:24:57 -07:00
gehring
7c3274e65b [tune] Make the logging of the function API consistent and predictable (#4011)
## What do these changes do?

This is a re-implementation of the `FunctionRunner` which enforces some synchronicity between the thread running the training function and the thread running the Trainable which logs results. The main purpose is to make logging consistent across APIs in anticipation of a new function API which will be generator-based (through `yield` statements). Without these changes, it would be impossible for the (possibly soon to be deprecated) reporter-based API to behave the same as the generator-based API.

This new implementation provides additional guarantees to prevent results from being dropped. This makes the logging behavior more intuitive and consistent with how results are handled in custom subclasses of Trainable.

New guarantees for the tune function API:

- Every reported result, i.e., every `reporter(**kwargs)` call, is forwarded to the appropriate loggers instead of being dropped when not enough time has elapsed since the last result.
- The wrapped function only runs if the `FunctionRunner` expects a result, i.e., when `FunctionRunner._train()` has been called. This removes the possibility that a result will be generated by the function but never logged.
- The wrapped function is not called until the first `_train()` call. Currently, the wrapped function is started during the setup phase which could result in dropped results if the trial is cancelled between `_setup()` and the first `_train()` call.
- Exceptions raised by the wrapped function won't be propagated until all results are logged to prevent dropped results.
- The thread running the wrapped function is explicitly stopped when the `FunctionRunner` is stopped with `_stop()`.
- If the wrapped function terminates without reporting `done=True`, a duplicate of the last reported result with `{"done": True}` set is reported to explicitly terminate the trial. Components are notified with this duplicate, but it is not logged.

## Related issue number

Closes #3956.
#3949
#3834
2019-03-18 19:14:26 -07:00
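The reporter/trainable handshake described in the guarantees above can be sketched with standard-library threading primitives. This is an illustrative model only, not Tune's actual `FunctionRunner`; all names and the `timeout` parameter are hypothetical:

```python
import queue
import threading


class FunctionRunnerSketch:
    """Sketch: run a user function in a thread, but only let it report
    a result when _train() asks for one, so no result is ever dropped."""

    def __init__(self, train_fn, timeout=10):
        self._train_fn = train_fn
        self._timeout = timeout
        self._results = queue.Queue(maxsize=1)   # hands one result to _train()
        self._continue = threading.Semaphore(0)  # released once per _train()
        self._thread = None

    def _reporter(self, **kwargs):
        # Block until the runner expects a result, then deliver it.
        self._continue.acquire()
        self._results.put(kwargs)

    def _train(self):
        # Lazily start the wrapped function on the first _train() call,
        # so a trial cancelled before training never runs the function.
        if self._thread is None:
            self._thread = threading.Thread(
                target=self._train_fn, args=(self._reporter,), daemon=True)
            self._thread.start()
        self._continue.release()  # permit exactly one report
        try:
            return self._results.get(timeout=self._timeout)
        except queue.Empty:
            # Function finished without reporting: terminate explicitly.
            return {"done": True}
```

Because `_reporter` blocks until `_train()` releases the semaphore, the training thread can never outrun the logging thread, which is the consistency property the PR description is after.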
Yuhong Guo
edb063c3c8 Fix glog problem of no call stack (#4395) 2019-03-18 18:21:21 -07:00
Wang Qing
3b141b26cd Fix global_state not disconnected after ray.shutdown (#4354) 2019-03-18 16:44:49 -07:00
Kristian Hartikainen
2a046116ce [tune] Fix _SafeFallbackEncoder type checks (#4238)
* Fix numpy type checks for _SafeFallbackEncoder

* Format changes

* Fix usage of nan_str in _SafeFallbackEncoder
2019-03-18 16:34:56 -07:00
Eric Liang
27cd6ea401
[rllib] Flip sign of A2C, IMPALA entropy coefficient; raise DeprecationWarning if negative (#4374) 2019-03-17 18:07:37 -07:00
Richard Liaw
ea5a6f8455
[tune] Simplify API (#4234)
Uses `tune.run` to execute experiments as preferred API.

@noahgolmant

This does not break backwards compat, but will slowly internalize `Experiment`. 

In a separate PR, Tune schedulers should only support 1 running experiment at a time.
2019-03-17 13:03:32 -07:00
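The "internalize `Experiment`" idea above can be sketched as a single entry point that accepts either a raw trainable or a pre-built `Experiment` and wraps the former automatically. This is a toy model of the API shape, not Tune's implementation; the class body and `run` logic are illustrative:

```python
class Experiment:
    """Toy stand-in for Tune's Experiment: bundles a trainable + config."""

    def __init__(self, trainable, config=None):
        self.trainable = trainable
        self.config = config or {}


def run(run_or_experiment, config=None):
    """Preferred entry point: users pass trainables directly, and
    Experiment construction becomes an internal detail."""
    if isinstance(run_or_experiment, Experiment):
        exp = run_or_experiment
    else:
        exp = Experiment(run_or_experiment, config)
    return exp.trainable(exp.config)
```

Accepting both forms is what keeps backwards compatibility while steering new users toward the simpler call style.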
markgoodhead
20a155d03d [Tune] Support initial parameters for SkOpt search algorithm (#4341)
Similar to the recent change to HyperOpt (#3944), this implements both:
1. The ability to pass in initial parameter suggestion(s) to be run through Tune first, before using the Optimiser's suggestions. This is for when you already know good parameters and want the Optimiser to be aware of these when it makes future parameter suggestions.
2. The same as 1, but if you already know the reward values for those parameters, you can pass them in as well to avoid re-running those experiments. In the future it would be nice for Tune to support this directly by loading previously run Tune experiments and initialising the Optimiser with them (a kind of top-level checkpointing), but this feature lets users do it manually for now.
2019-03-16 23:11:30 -07:00
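The two behaviors described above (serve known-good points first, and pre-seed the optimizer with already-known rewards) can be sketched with a small wrapper around any ask/tell-style optimizer. The class and parameter names below are illustrative, not the actual `SkOptSearch` API:

```python
class SeededSearch:
    """Sketch: serve user-supplied suggestions before deferring to an
    ask/tell optimizer, pre-seeding it with any known rewards."""

    def __init__(self, optimizer, points_to_evaluate=None,
                 evaluated_rewards=None):
        self._opt = optimizer
        self._pending = list(points_to_evaluate or [])
        if evaluated_rewards:
            # Points with known rewards are told to the optimizer up
            # front, so they inform suggestions but are never re-run.
            known = self._pending[:len(evaluated_rewards)]
            self._pending = self._pending[len(evaluated_rewards):]
            for point, reward in zip(known, evaluated_rewards):
                self._opt.tell(point, reward)

    def suggest(self):
        # Remaining user points first, then the optimizer's own.
        if self._pending:
            return self._pending.pop(0)
        return self._opt.ask()
```

Any optimizer object exposing `ask()`/`tell(point, reward)` (as scikit-optimize's `Optimizer` does) would slot into this pattern.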
Eric Liang
b513c0f498
[autoscaler] Restore error message for setup 2019-03-16 18:00:37 -07:00
Richard Liaw
5e95abe63e
[tune] Fix performance issue and fix reuse tests (#4379)
* fix tests

* better name

* reduce warnings

* better resource tracking

* oops

* revertmessage

* fix_executor
2019-03-16 13:52:02 -07:00
Eric Liang
a45019d98c
[rllib] Add option to proceed even if some workers crashed (#4376) 2019-03-16 13:34:09 -07:00
justinwyang
db9fe6619d Run only relevant tests in Travis based on git diff. (#4271) 2019-03-15 22:23:54 -07:00
Hao Chen
a6a5b344b9 [Java] Upgrade checkstyle plugin (#4375) 2019-03-15 11:36:09 -07:00
Philipp Moritz
c5e2c9af4d Build wheels for macOS with Bazel (#4280) 2019-03-15 10:37:57 -07:00
Hao Chen
93d9867290 Fix linting error on master (#4377) 2019-03-15 10:31:09 -07:00
Hao Chen
f8d12b0418
[Java] Package native dependencies into jar (#4367) 2019-03-15 12:38:40 +08:00
Leon Sievers
6b93ec3034 Fixed calculation of num_steps_trained for multi_gpu_optimizer (#4364) 2019-03-14 19:46:02 -07:00
Eric Liang
2c1131e8b2
[tune] Add warnings if tune event loop gets clogged (#4353)
* add guards

* comments
2019-03-14 19:44:01 -07:00
Yuhong Guo
1a1027b3ab Update git-clang-format to support Python 3. (#4339) 2019-03-14 13:57:11 -07:00
Yuhong Guo
becffc6cef
Fix checkpoint crash for actor creation task. (#4327)
* Fix checkpoint crash for actor creation task.

* Lint

* Move test to test_actor.py

* Revert unused code in test_failure.py

* Refine test according to Raul's suggestion.
2019-03-14 23:42:57 +08:00
Philipp Moritz
2f37cd7e27 fix wheel building doc (#4360) 2019-03-13 23:11:30 -07:00
Philipp Moritz
b0c4e60ffb Build wheels for Linux with Bazel (#4281) 2019-03-13 15:57:33 -07:00
Ameer Haj Ali
8a6403c26e [rllib] bug fix: merging --config params with params.pkl (#4336) 2019-03-13 11:26:55 -07:00
Andrew Tan
87bfa1cf82 [tune] add output flag for Tune CLI (#4322) 2019-03-12 23:56:59 -07:00
Eric Liang
d5f4698305
[tune] Avoid scheduler blocking, add reuse_actors optimization (#4218) 2019-03-12 23:49:31 -07:00
Stefan Pantic
2202a81773 Fix multi discrete (#4338)
* Revert "Revert "[wingman -> rllib] IMPALA MultiDiscrete changes (#3967)" (#4332)"

This reverts commit 3c41cb9b60.

* Fix a bug with log rhos for vtrace

* Reformat

* lint
2019-03-12 20:32:11 -07:00
Philipp Moritz
490d896f41 Make sure the right Python interpreter is used (#4334) 2019-03-12 12:21:55 -07:00
Eric Liang
3c41cb9b60
Revert "[wingman -> rllib] IMPALA MultiDiscrete changes (#3967)" (#4332)
This reverts commit 962b17f567.
2019-03-11 22:51:26 -07:00
Kai Yang
7ff56ce826 Introduce set data structure in GCS (#4199)
* Introduce set data structure in GCS. Change object table to Set instance.

* Fix a logic bug. Update python code.

* lint

* lint again

* Remove CURRENT_VALUE mode

* Remove 'CURRENT_VALUE'

* Add more test cases

* rename has_been_created to subscribed.

* Make `changed` parameter type of `bool *`

* Rename mode to notification_mode

* fix build

* RAY.SET_REMOVE return error if entry doesn't exist

* lint

* Address comments

* lint and fix build
2019-03-11 14:42:58 -07:00
Andrew Tan
c435013b27 [tune] add-note command for Tune CLI (#4321)
Co-Authored-By: andrewztan <andrewztan12@gmail.com>
2019-03-11 14:16:44 -07:00
Luke
08a476932c On Kubernetes, set pod anti-affinity at the host level for pods of type 'ray' (#4131) 2019-03-11 12:57:04 -07:00
Stefan Pantic
36cbde651a Add action space to model (#4210) 2019-03-09 19:23:12 -08:00
justinwyang
5adb4a6941 Set _remote() function args and kwargs as optional (#4305) 2019-03-09 16:40:14 -08:00
Yuhong Guo
ba3fe04629 Fix message type to string crash (#4308)
* Fix message string crash

* Fix
2019-03-09 13:51:02 -08:00
Stephanie Wang
edc794751f Set TCP_NODELAY on all TCP connections (#4318) 2019-03-09 12:15:29 -08:00
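In practice, setting `TCP_NODELAY` disables Nagle's algorithm so small messages are sent immediately instead of being batched, which matters for latency-sensitive RPC traffic. A standard-library sketch of the same socket option (Ray's change is in its C++ networking code, not Python):

```python
import socket


def tcp_nodelay_socket():
    """Return a TCP socket with Nagle's algorithm disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Send small writes immediately rather than coalescing them.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock
```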
William Ma
f423909aec Temporary fix for many_actor_task.py (#4315) 2019-03-09 00:07:45 -08:00
Richard Liaw
6630a35353
[tune] Initial Commit for Tune CLI (#3983)
This introduces a light CLI for Tune.
2019-03-08 16:46:05 -08:00
Simon Mo
3064fad96b Add ray.experimental.serve Module (#4095) 2019-03-08 16:22:05 -08:00
Eric Liang
c7f74dbdc7
[rllib] Add async remote workers (#4253) 2019-03-08 15:39:48 -08:00
Robert Nishihara
fd2d8c2c06 Remove Jenkins backend tests and add new long running stress test. (#4288) 2019-03-08 15:29:39 -08:00
Richard Liaw
c3a3360a4a
[tune] Add custom field for serializations (#4237) 2019-03-08 11:00:25 -08:00
Kristian Hartikainen
7e4b4822cf [tune] Fix worker recovery by setting force=False when calling logger sync_now (#4302)
## What do these changes do?
Fixes a tune autoscaling problem where worker recovery causes things to stall.
2019-03-08 10:59:31 -08:00
Yuhong Guo
d5fb7b70a9
Update arrow version to fix plasma bugs (#4127)
* Update arrow

* Change to 2c511979b13b230e73a179dab1d55b03cd81ec02 which is rebased on Arrow 46f75d7

* Update to fix comment

* disable tests which use python/ray/rllib/tests/data/cartpole_small

* Fix get order of meta and data in MockObjectStore.java
2019-03-08 18:03:58 +08:00
Philipp Moritz
95254b3d71 Remove the old web UI (#4301) 2019-03-07 23:15:11 -08:00
Robert Nishihara
4c80177d6f Unpin gym in Python 2 since gym 0.12 was released. (#4291) 2019-03-07 15:59:30 -08:00
Philipp Moritz
dec7c3f8f5 [build] Add debug info to Bazel (#4278) 2019-03-07 15:21:13 -08:00
Eric Liang
437459f40a
[build] Make travis logs not as long (#4213)
* clean it up

* Update .travis.yml

* Update .travis.yml

* update

* fix example

* suppress

* timeout

* print periodic progress

* Update suppress_output

* Update run_silent.sh

* Update suppress_output

* Update suppress_output

* manually do timeout

* sleep 300

* fix test

* Update run_silent.sh

* Update suppress_output

* Update .travis.yml
2019-03-07 12:09:03 -08:00