hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-09 04:46:38 -04:00

Author	SHA1	Message	Date
Eric Liang	75ef70afca	[rllib] Auto-clip atari rewards	2018-09-24 12:55:11 -07:00
Eric Liang	8331d1ebe0	[rllib] Add vf clipping param to fix pendulum example (#2921 ) * add vf clip * fix test * Update ppo.py	2018-09-23 13:11:17 -07:00
Eric Liang	3267676994	[Experimental] Add experimental distributed SGD API (#2858 ) * check in sgd api * idx * foreach_worker foreach_model * add feed_dict * update * yapf * typo * lint * plasma op change * fix plasma op * still not working * fix * fix * comments * yapf * silly flake8 * small test	2018-09-19 21:12:37 -07:00
Praveen Palanisamy	b23fd5de13	[rllib] Adds agent name & env id to default logdir prefix (#2859 ) * Added agent name & env id to default logdir prefix * Revert "Added agent name & env id to default logdir prefix" This reverts commit 07cfdf80d2537da3c67dd4f553c5f3e43671cc7d. * Added default logger creator with informative prefix to Agent * Updated import order & improved str cat * Update agent.py	2018-09-18 22:22:07 -07:00
Eric Liang	3a3782c39f	[rllib] Fix LSTM regression on truncated sequences and add regression test (#2898 ) * fix * add test * yapf * yapf * fix space * Oops that should be lstm: True * Update cartpole_lstm.py	2018-09-18 15:09:16 -07:00
Eric Liang	ab8348b1f5	[rllib] Reward clipping should default to off	2018-09-18 15:08:01 -07:00
Robert Nishihara	ea9d1cc887	Remove dependence on psutil. Add utility functions for getting system memory. (#2892 )	2018-09-18 15:03:29 +08:00
Robert Nishihara	61bf6c6123	Fix regression in directing worker output to stdout/stderr. (#2897 )	2018-09-17 16:40:45 -07:00
Richard Liaw	899e4585bc	Don't include redundant entries in global_state.client_table (#2880 )	2018-09-17 12:52:49 -07:00
Richard Liaw	f372f48bf3	[tune] Tune onto Logging Module (#2882 ) Moves Tune onto logging in Python. Ignores examples and tests.	2018-09-16 12:09:36 -07:00
Robert Nishihara	503344149f	Run jupyter UI with --ip=0.0.0.0. (#2883 )	2018-09-15 21:59:46 -07:00
Richard Liaw	e05baed336	[tune] Better Info String and Tweaks (#2874 )	2018-09-15 11:02:13 -07:00
Hanwei Jin	fbf214e408	update ray cmake build process (#2853 ) * use cmake to build ray project, no need to apply build.sh before cmake, fix some abuse of cmake, improve the build performance * support boost external project, avoid using the system or build.sh boost * keep compatible with build.sh, remove boost and arrow build from it. * bugfix: parquet bison version control, plasma_java lib install problem * bugfix: cmake, do not compile plasma java client if no need * bugfix: component failures test timeout mechanism has problem for plasma manager failed case * bugfix: arrow use lib64 in centos, travis check-git-clang-format-output.sh does not support other branches except master * revert some fix * set arrow python executable, fix format error in component_failures_test.py * make clean arrow python build directory * update cmake code style, back to support cmake minimum version 3.4	2018-09-12 11:19:33 -07:00
Daniel Ho	d9eeaaf00a	[tune] Fix bug in example where config hyperparameters were ignored (#2860 ) A fix to an example for tune (`python/ray/tune/examples/pbt_tune_cifar10_with_keras.py`) where the hyperparameters for the optimizer, learning rate and decay, were not being passed into the optimizer. This means that the current optimizer uses default values for the hyperparameters no matter the config.	2018-09-12 09:17:56 -07:00
old-bear	f3c1194be3	[tune] Add AutoML algorithm of GeneticSearcher (#2699 ) Add new search algorithm (genetic) along with the base framework of the searcher (which performs some basic jobs such as logging, recording and organizing in our project). Note that this is the initial commit. In the following days, we will add example, UT, and other refinements.	2018-09-12 09:17:04 -07:00
Eric Liang	bee743c152	Remove log suppression code When running in a screen (or any other time it is hard to scroll up), printing "Suppressing previous error message" is not helpful since the previous error is lost far above past scrollback. Better to just print it repeatedly at the end.	2018-09-11 23:28:45 -07:00
Kaahan	045861c9b0	[tune] Reset Config for Trainables (#2831 ) Adds the ability for trainables to reset their configurations during experiments. These changes in particular add the base functions to the trial_executor and trainable interfaces as well as giving the basic implementation on the PopulationBasedTraining scheduler. Related issue number: #2741	2018-09-11 08:45:04 -07:00
Peter Schafhalter	5da6e78db1	Add available resources to global state (#2501 )	2018-09-10 15:46:32 -07:00
Eric Liang	611259b2c7	Re-raise actor initialization errors on method invocation (#2843 ) If an actor constructor fails, save that error and re-raise it on any subsequent attempts to interact with the actor. Related to https://github.com/ray-project/ray/issues/282 and https://github.com/ray-project/ray/issues/1093.	2018-09-10 10:51:19 -07:00
Eric Liang	588c573d41	Ray stop needs to kill `plasma_store_server` not `plasma_store` (#2850 )	2018-09-09 19:23:09 -07:00
eugenevinitsky	9ba751c29a	Ars increase (#2844 ) * removed cv2 * remove opencv * increased number of default rollouts ARS * put cv2 back in this branch * put cv2 back in this branch * moved cv2 back where it belongs in preprocessors	2018-09-08 14:09:02 -07:00
Robert Nishihara	bd64c940e9	Push error to driver when monitor raises an exception. (#2834 )	2018-09-07 17:42:45 -07:00
Robert Nishihara	3f6ed537a4	Add ray.is_initialized() function. (#2818 ) * Add ray.is_initialized() function. * Add assert.	2018-09-06 21:20:59 -07:00
Eric Liang	e7db54bdb0	Log at INFO level by default (including in autoscaler). (#2824 ) Before this change, the autoscaler `up` and related commands don't print any info messages to the console at all. This was a regression from 0.5. @richardliaw @robertnishihara https://github.com/ray-project/ray/issues/2812	2018-09-06 13:31:19 -07:00
Eric Liang	d81605e9e7	[tune] Add a time/timesteps since last restore metric (#2819 ) * rsm * always log to avoid changing schema for csv writer * add iter since restore * update * criteria warn	2018-09-05 17:45:09 -07:00
Eric Liang	995ac24a2c	[rllib] clarify train batch size for PPO (#2793 ) It's possible to configure PPO in a way that ends up discarding most of the samples (they are treated as "stragglers"). Add a warning when this happens, and raise an exception if the waste is particularly egregious.	2018-09-05 12:06:13 -07:00
kary	4c0e2c3f58	[rllib]multi agent judge bug (#2821 ) * fix multi agent judge bug * Update policy_evaluator.py	2018-09-04 21:02:06 -07:00
Richard Liaw	72542c9016	[tune] Fix Pausing and Error Propagation (#2815 ) * add new tests * Try-catch errors from ray get * longer pbt run * Update pbt_example.py * Split trial and result and fix tests	2018-09-04 15:22:11 -07:00
Eric Liang	25ffe57a5c	[rllib] Auto-synchronize filters for all agents (#2791 ) This makes sure we always update the local filter, and adds an option to synchronize the remote filters as well. In APEX_DDPG we previously didn't do either. The first is needed for checkpoint correctness, the second might help performance.	2018-09-03 20:01:53 -07:00
Eric Liang	01b030bd57	[rllib] throw an error for continuous action spaces in IMPALA We currently don't support this since the reference vtrace.py does not, though it could be an interesting extension.	2018-09-03 11:12:55 -07:00
Eric Liang	df4788e501	[rllib/tune] Add test for fractional gpu support in xray mode; add rllib support for fractional gpu (#2768 ) * frac gpu * doc * Update rllib-training.rst * yapf * remove xray	2018-09-03 11:12:23 -07:00
Eric Liang	b37a283053	[rllib] support local mode (#2795 )	2018-09-02 23:02:19 -07:00
Robert Nishihara	0ac855e061	Push errors to all drivers when node is marked dead. (#2808 ) * Push errors to all drivers when node is marked dead. * Fix	2018-09-02 20:04:58 -07:00
Alexey Tumanov	fdc9688226	[xray] push warning to driver for infeasible tasks (#2784 ) This PR pushes a warning to the user for infeasible tasks to alert them to the fact that they can't currently be executed. Fixes #2780.	2018-09-01 13:21:27 -07:00
Robert Nishihara	eda6ebb87d	Convert some unittests to pytest. (#2779 ) * Convert multi_node_test.py to pytest. * Convert array_test.py to pytest. * Convert failure_test.py to pytest. * Convert microbenchmarks to pytest. * Convert component_failures_test.py to pytest and some minor quotes changes. * Convert tensorflow_test.py to pytest. * Convert actor_test.py to pytest. * Fix. * Fix	2018-08-31 11:24:15 -07:00
wangyiguang	3813ae34b3	[tune] Add AutoMLBoard: Monitoring UI (experimental) (#2574 )	2018-08-31 00:26:44 -07:00
Richard Liaw	0347e6418b	[tune] Add PyTorch MNIST Example + Misc. Tweaks (#2708 )	2018-08-30 16:18:56 -07:00
Robert Nishihara	224d38cbb2	Name Python threads. (#2767 )	2018-08-30 11:08:24 -07:00
Robert Nishihara	5021795190	Update documents to replace 0.5.0 with 0.5.2. (#2761 ) * Update documents to replace 0.5.0 with 0.5.1. * Update documentation from 0.5.1 -> 0.5.2.	2018-08-29 21:05:09 -07:00
Robert Nishihara	f4f3478b45	Bump version number to 0.5.2. (#2765 )	2018-08-29 13:39:25 -07:00
Praveen Palanisamy	357c0d6156	[tune] Adds option to checkpoint at end of trials (#2754 ) * Added checkpoint_at_end option. To fix #2740 * Added ability to checkpoint at the end of trials if the option is set to True * checkpoint_at_end option added; Consistent with Experience and Trial runner * checkpoint_at_end option mentioned in the tune usage guide * Moved the redundant checkpoint criteria check out of the if-elif * Added note that checkpoint_at_end is enabled only when checkpoint_freq is not 0 * Added test case for checkpoint_at_end * Made checkpoint_at_end have an effect regardless of checkpoint_freq * Removed comment from the test case * Fixed the indentation * Fixed pep8 E231 * Handled cases when trainable does not have _save implemented * Constrained test case to a particular exp using the MockAgent * Revert "Constrained test case to a particular exp using the MockAgent" This reverts commit e965a9358ec7859b99a3aabb681286d6ba3c3906. * Revert "Handled cases when trainable does not have _save implemented" This reverts commit 0f5382f996ff0cbf3d054742db866c33494d173a. * Simpler test case for checkpoint_at_end * Preserved bools from losing their actual value * Revert "Moved the redundant checkpoint criteria check out of the if-elif" This reverts commit 783005122902240b0ee177e9e206e397356af9c5. * Fix linting error.	2018-08-29 13:14:17 -07:00
Robert Nishihara	132f133214	Limit number of concurrent workers started by hardware concurrency. (#2753 ) * Limit number of concurrent workers started by hardware concurrency. * Check if std::thread::hardware_concurrency() returns 0. * Pass in max concurrency from Python. * Fix Java call to startRaylet. * Fix typo * Remove unnecessary cast. * Fix linting. * Cleanups on Java side. * Comment back in actor test. * Require maximum_startup_concurrency to be at least 1. * Fix linting and test. * Improve documentation. * Fix typo.	2018-08-29 14:53:40 +08:00
Mitar	3850e3ba64	Added extra logging related arguments to "ray start" (#2664 )	2018-08-28 23:00:37 -07:00
Eric Liang	69d1354016	[rllib] Document ARS & rainbow (#2744 ) * wip * rainbow doc too * e not used * fix ppo doc * clean list * use same title	2018-08-28 18:13:36 -07:00
Robert Nishihara	6e1de19cc2	Bump version to 0.5.1. (#2755 )	2018-08-28 16:52:17 -07:00
Robert Nishihara	b7722897b4	Deprecate 'driver_mode' argument. (#2758 ) * Deprecate 'driver_mode' argument. * Fix * Fix	2018-08-28 16:45:49 -07:00
Alexey Tumanov	de047daea7	[xray] raylet scheduling mechanism with a simple spillback policy (#2749 ) ## What do these changes do? * distribute load and resource information on a heartbeat * for each raylet, maintain total and available resource capacity as well as measure of current load * this PR introduces a new notion of load, defined as a sum of all resource demand induced by queued ready tasks on the local raylet. This provides a heterogeneity-aware measure of load that supersedes legacy Ray's task count as a proxy for load. * modify the scheduling policy to perform capacity-based, load-aware, optimistically concurrent resource allocation * perform task spillover to the heartbeating node in response to a heartbeat, implementing heterogeneity-aware late-binding/work-stealing.	2018-08-28 00:03:34 -07:00
adoda	90ae8f11df	The function get_node_ip_address will catch an exception and return '127.0.0.1' when we forbid the external network. Instead, we can get the IP address from the hostname. (#2722 ) https://github.com/ray-project/ray/issues/2721	2018-08-27 22:24:49 -07:00
Yuhong Guo	0b6e08ebee	Separate python logger module-wise (#2703 ) ## What do these changes do? 1. Separate the log related code to logger.py from services.py. 2. Allow users to modify logging formatter in `ray start`. ## Related issue number https://github.com/ray-project/ray/pull/2664	2018-08-26 13:46:14 -07:00
Richard Liaw	dbba7f2a53	[autoscaler] Cleanup Logging (#2709 ) Moves Autoscaler onto Python `logging` module.	2018-08-25 17:08:45 -07:00

786 commits