ray/python/ray
gehring 7c3274e65b [tune] Make the logging of the function API consistent and predictable (#4011)
## What do these changes do?

This is a re-implementation of the `FunctionRunner` that enforces synchronization between the thread running the training function and the thread running the Trainable, which logs results. The main purpose is to make logging consistent across APIs in anticipation of a new generator-based function API (using `yield` statements). Without these changes, the reporter-based API (which may soon be deprecated) cannot behave the same as the generator-based API.

This new implementation provides additional guarantees to prevent results from being dropped. This makes the logging behavior more intuitive and consistent with how results are handled in custom subclasses of Trainable.

New guarantees for the tune function API:

- Every reported result, i.e., every `reporter(**kwargs)` call, is forwarded to the appropriate loggers instead of being dropped if not enough time has elapsed since the last result.
- The wrapped function only runs if the `FunctionRunner` expects a result, i.e., when `FunctionRunner._train()` has been called. This removes the possibility that a result will be generated by the function but never logged.
- The wrapped function is not called until the first `_train()` call. Currently, the wrapped function is started during the setup phase, which could result in dropped results if the trial is cancelled between `_setup()` and the first `_train()` call.
- Exceptions raised by the wrapped function are not propagated until all reported results have been logged, so that no result is dropped.
- The thread running the wrapped function is explicitly stopped when the `FunctionRunner` is stopped with `_stop()`.
- If the wrapped function terminates without reporting `done=True`, a duplicate of the last reported result with `{"done": True}` is reported to explicitly terminate the trial. Components are notified with this duplicate, but it is not logged.
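
The guarantees above amount to a lock-step handoff between two threads. The following is a minimal sketch of that idea only, not the `FunctionRunner` code from this change: the class name `LockstepRunner`, its `train()` method, and the use of a bounded queue plus a semaphore are all assumptions made for illustration.

```python
import queue
import threading


class LockstepRunner:
    """Hypothetical toy runner; illustrates the handoff, not Ray's implementation."""

    def __init__(self, func):
        self._func = func
        self._results = queue.Queue(maxsize=1)    # one result in flight at a time
        self._continue = threading.Semaphore(0)   # released once per train() call
        self._thread = None                       # created lazily in train()
        self._error = None
        self._last = {}
        self._finished = False

    def _reporter(self, **kwargs):
        # Hand the result over, then block until the next train() call.
        self._results.put(dict(kwargs))
        self._continue.acquire()

    def _run(self):
        try:
            self._func(self._reporter)
        except Exception as exc:  # kept and re-raised later from train()
            self._error = exc
        finally:
            self._results.put(None)  # sentinel: the function has ended

    def train(self):
        if self._finished:
            return dict(self._last, done=True)
        if self._thread is None:
            # The wrapped function only starts on the first train() call.
            self._thread = threading.Thread(target=self._run, daemon=True)
            self._thread.start()
        else:
            # Let the function run until its next report.
            self._continue.release()
        result = self._results.get()
        if result is None:
            self._finished = True
            if self._error is not None:
                # Every earlier result has already been returned.
                raise self._error
            # No explicit done=True was reported: duplicate the last result.
            return dict(self._last, done=True)
        self._last = result
        return result


def train_fn(reporter):
    for step in range(3):
        reporter(step=step)


runner = LockstepRunner(train_fn)
results = [runner.train() for _ in range(4)]
print(results)
```

Because the reporter blocks until the next `train()` call, the function can never get ahead of the consumer, so no result is produced without being collected. In this sketch a function that reports `done=True` itself simply stays parked on the semaphore; an explicit stop, as described for `_stop()`, is left out.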

## Related issue number

Closes #3956.
#3949
#3834
2019-03-18 19:14:26 -07:00
..
autoscaler [autoscaler] Restore error message for setup 2019-03-16 18:00:37 -07:00
cloudpickle Update cloudpickle to 0.8.0.dev0 (#3964) 2019-02-07 15:24:06 -08:00
core Make Bazel the default build system (#3898) 2019-02-23 11:58:59 -08:00
dashboard Add a web dashboard for monitoring node resource usage (#4066) 2019-02-21 00:10:04 -08:00
dataframe Dataframe deprecation (#2353) 2018-07-06 00:16:22 -07:00
experimental Fix global_state not disconnected after ray.shutdown (#4354) 2019-03-18 16:44:49 -07:00
includes Use strongly typed IDs in C++. (#4185) 2019-03-07 21:43:01 +08:00
internal Add type check in free and change Exception to TypeError (#4221) 2019-03-04 16:40:04 +08:00
pyarrow_files Package pyarrow along with ray. (#822) 2017-08-07 21:17:28 -07:00
rllib [rllib] Flip sign of A2C, IMPALA entropy coefficient; raise DeprecationWarning if negative (#4374) 2019-03-17 18:07:37 -07:00
scripts Add "ray timeline" command to auto-dump Chrome trace for the current Ray instance (#4239) 2019-03-05 16:28:00 -08:00
tests Fix global_state not disconnected after ray.shutdown (#4354) 2019-03-18 16:44:49 -07:00
tune [tune] Make the logging of the function API consistent and predictable (#4011) 2019-03-18 19:14:26 -07:00
workers Add option of load_code_from_local which is required in cross-language ray call. (#3675) 2019-02-21 12:37:17 +08:00
__init__.py Update version to 0.7.0.dev1 and update docs 0.6.3 -> 0.6.4 (#4276) 2019-03-05 22:22:29 -08:00
_raylet.pyx Use strongly typed IDs in C++. (#4185) 2019-03-07 21:43:01 +08:00
actor.py Set _remote() function args and kwargs as optional (#4305) 2019-03-09 16:40:14 -08:00
exceptions.py Propagate backend error to worker (#4039) 2019-02-16 11:39:15 +08:00
function_manager.py Fix checkpoint crash for actor creation task. (#4327) 2019-03-14 23:42:57 +08:00
gcs_utils.py Add a web dashboard for monitoring node resource usage (#4066) 2019-02-21 00:10:04 -08:00
import_thread.py API cleanups. Remove worker argument. Remove some deprecated arguments. (#4025) 2019-02-15 10:49:16 -08:00
log_monitor.py More compact format for worker logs (#4092) 2019-02-19 19:53:43 -08:00
memory_monitor.py Ray Logging Configuration (#3691) 2019-01-30 21:01:12 -08:00
monitor.py Stream logs to driver by default. (#3892) 2019-02-07 19:53:50 -08:00
node.py Remove the old web UI (#4301) 2019-03-07 23:15:11 -08:00
parameter.py Remove the old web UI (#4301) 2019-03-07 23:15:11 -08:00
profiling.py API cleanups. Remove worker argument. Remove some deprecated arguments. (#4025) 2019-02-15 10:49:16 -08:00
ray_constants.py Skip dead nodes to avoid connection timeout. (#4154) 2019-03-02 13:11:19 -08:00
remote_function.py Set _remote() function args and kwargs as optional (#4305) 2019-03-09 16:40:14 -08:00
reporter.py Add a web dashboard for monitoring node resource usage (#4066) 2019-02-21 00:10:04 -08:00
runtime_context.py Add runtime_context to get some runtime fields in worker (#4065) 2019-02-19 15:57:30 +08:00
serialization.py Expose custom serializers through the API. (#1147) 2017-10-29 00:08:55 -07:00
services.py Remove the old web UI (#4301) 2019-03-07 23:15:11 -08:00
signature.py Ray Logging Configuration (#3691) 2019-01-30 21:01:12 -08:00
utils.py Downgrade six to 1.0.0 (#4180) 2019-02-27 13:05:25 -08:00
worker.py Fix global_state not disconnected after ray.shutdown (#4354) 2019-03-18 16:44:49 -07:00