hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-08 19:41:38 -05:00

Author	SHA1	Message	Date
Eric Liang	0c0bd4d41c	[rllib] Use model.value_function() in MARWIL (#4036 ) * fix marwil * add ph * fix	2019-02-14 19:35:21 -08:00
Philipp Moritz	077ffd99bf	Bump version from 0.6.3 to 0.7.0.dev0 in docs and .yaml (#4042 )	2019-02-14 12:08:48 -08:00
Si-Yuan	2de31eb489	minor fix (#4040 )	2019-02-13 17:22:45 -08:00
Eric Liang	2dccf383dd	[rllib] Basic infrastructure for off-policy estimation (IS, WIS) (#3941 )	2019-02-13 16:25:05 -08:00
Kristian Hartikainen	729d0b2825	[autoscaler] docker run options (#3921 ) Adds support for docker options, allowing for use of nvidia-docker. Closes #2657.	2019-02-13 12:26:28 -08:00
bjg2	0e37ac6d1d	[wingman -> rllib] Remote and entangled environments (#3968 ) * added all our environment changes * fixed merge request comments and remote env * fixed remote check * moved remote_worker_envs to correct config section * lint * auto wrap impl * fix * fixed the tests	2019-02-13 10:08:26 -08:00
Hao Chen	f31a79f3f7	Implement actor checkpointing (#3839 ) * Implement Actor checkpointing * docs * fix * fix * fix * move restore-from-checkpoint to HandleActorStateTransition * Revert "move restore-from-checkpoint to HandleActorStateTransition" This reverts commit 9aa4447c1e3e321f42a1d895d72f17098b72de12. * resubmit waiting tasks when actor frontier restored * add doc about num_actor_checkpoints_to_keep=1 * add num_actor_checkpoints_to_keep to Cython * add checkpoint_expired api * check if actor class is abstract * change checkpoint_ids to long string * implement java * Refactor to delay actor creation publish until checkpoint is resumed * debug, lint * Erase from checkpoints to restore if task fails * fix lint * update comments * avoid duplicated actor notification log * fix unintended change * add actor_id to checkpoint_expired * small java updates * make checkpoint info per actor * lint * Remove logging * Remove old actor checkpointing Python code, move new checkpointing code to FunctionActionManager * Replace old actor checkpointing tests * Fix test and lint * address comments * consolidate kill_actor * Remove __ray_checkpoint__ * fix non-ascii char * Loosen test checks * fix java * fix sphinx-build	2019-02-13 19:39:02 +08:00
Andrew Tan	57dcd3033e	[tune] Trial reporter fix (#3951 ) Fixes #3949.	2019-02-13 01:03:54 -08:00
William Ma	e1a479b137	Add teardown_module to test_queue.py (#4012 )	2019-02-12 22:43:09 -08:00
Si-Yuan	21472b890a	Integrate "tempfile_service" into "ray.node.Node" (#3953 )	2019-02-12 17:34:04 -08:00
Adi Zimmerman	dac1969647	[tune] Add Nevergrad to Tune (#3985 )	2019-02-12 11:00:04 -08:00
Wang Qing	c523bc04ad	Enable redis password in Java worker (#3943 ) * Support Java redis password * Fix * Refine * Fix lint.	2019-02-12 13:11:25 +08:00
Adi Zimmerman	9797028a91	[tune] Add scikit-optimize to Tune (#3924 )	2019-02-11 17:06:02 -08:00
Eric Liang	8df772867c	[rllib] rename compute_apply to learn_on_batch	2019-02-11 15:22:15 -08:00
Eric Liang	c4182463f6	[rllib] Add helper to iterate over envs in a vectorized environment (#4001 ) * add foreach env func * fix * add test	2019-02-11 10:40:47 -08:00
Ion	3c32343c63	Ray signal (#3624 )	2019-02-11 10:14:48 -08:00
Zhijun Fu	7097ba393b	protect raylet against bad messages (#4003 ) * protect raylet against bad messages * address comments * linting and regression test	2019-02-12 00:39:38 +08:00
Philipp Moritz	ab809bd927	update ray version to 0.7.0dev (#3995 )	2019-02-10 19:56:42 -08:00
Eric Liang	8e9f2c923f	[autoscaler] Use RLock in addition to FileLock	2019-02-10 19:16:43 -08:00
Yuhong Guo	5fb1efd60d	Fix CI test failures (#4007 )	2019-02-11 11:01:14 +08:00
bjg2	e703b9f49d	[wingman -> rllib] Improved stats changes in AsyncSamplesOptimizer (#3966 ) * added stats changes to optimizer * changes timers * fix python 2 compat * improved optimizer throughput stats * Update async_samples_optimizer.py * fix python2 compat	2019-02-10 01:25:22 -08:00
Eric Liang	29322c7389	[rllib] Replay buffer for IMPALA should default to 0 slots. (#3971 ) * disable replay * make lq configurable * leak test * Update run_multi_node_tests.sh	2019-02-08 10:03:11 -08:00
Robert Nishihara	6a32b410bb	Update versions from 0.6.2 -> 0.6.3 in the documentation. (#3981 )	2019-02-07 20:57:37 -08:00
Robert Nishihara	ef527f84ab	Stream logs to driver by default. (#3892 ) * Stream logs to driver by default. * Fix from rebase * Redirect raylet output independently of worker output. * Fix. * Create redis client with services.create_redis_client. * Suppress Redis connection error at exit. * Remove thread_safe_client from redis. * Shutdown driver threads in ray.shutdown(). * Add warning for too many log messages. * Only stop threads if worker is connected. * Only stop threads if they exist. * Remove unnecessary try/excepts. * Fix * Only add new logging handler once. * Increase timeout. * Fix tempfile test. * Fix logging in cluster_utils. * Revert "Increase timeout." This reverts commit b3846b89040bcd8e583b2e18cb513cb040e71d95. * Retry longer when connecting to plasma store from node manager and object manager. * Close pubsub channels to avoid leaking file descriptors. * Limit log monitor open files to 200. * Increase plasma connect retries. * Add comment.	2019-02-07 19:53:50 -08:00
Philipp Moritz	0aa74fb1fd	Update cloudpickle to 0.8.0.dev0 (#3964 )	2019-02-07 15:24:06 -08:00
Eric Liang	ae4bc7d6e8	[revert] [rllib] Add copy() in async samples optimizer	2019-02-07 14:14:39 -08:00
markgoodhead	5ce670cb36	[tune] Add Initial Parameter Suggestion for HyperOpt (#3944 ) Allows users of the HyperOptSearch suggestion algorithm to specify initial experiment values to run (typically already known good baseline parameters within the domain specified)	2019-02-07 10:57:51 -08:00
Richard Liaw	5db1afef07	[tune] Support Custom Resources (#2979 ) Support arbitrary resource declarations in Tune. Fixes https://github.com/ray-project/ray/issues/2875	2019-02-07 00:29:19 -08:00
Stephanie Wang	d2b6db3db1	Bump version from 0.6.2 to 0.6.3 (#3972 )	2019-02-06 19:11:16 -08:00
Eric Liang	04fc145a44	[autoscaler] Autoscaler hangs forever on non-zero exit code command (#3969 )	2019-02-06 17:25:24 -08:00
Robert Nishihara	fa4eb8313d	Suppress warning for serializing different unique ID types in Python. (#3872 ) * Suppress warning for serializing different unique ID types in Python. * Add _ID_TYPES variable.	2019-02-05 11:38:33 -08:00
vfdev	b2b8417790	[tune] Improve mnist_pytorch.py example (#3894 ) ## What do these changes do? * Improved --no-cuda handling * Removed deprecated Variable usage ## Related issue number Fixes #3873	2019-02-04 17:59:54 -08:00
William Ma	f067223c4a	Allow Ray processes to be started inside of gdb and tmux. (#3847 )	2019-02-04 15:23:39 -08:00
Wang Qing	e1c68a0881	Enable including Java worker for `ray start` command (#3838 )	2019-02-04 16:23:43 +08:00
Eric Liang	7ef830bef1	[rllib] Add copy() in async samples optimizer to fix memory leak (#3938 ) Fixes #3884.	2019-02-03 18:34:37 -08:00
Andrew Tan	8323419a6d	[tune] Add SigOpt Integration (#3844 )	2019-02-03 18:23:57 -08:00
Kristian Hartikainen	85294fb503	[autoscaler] node caching changes (#3937 ) Breaks the node provider node getter into cached and non-cached versions. Fixes #3930 by updating the node label finger print before updating labels. Fixes #3935 by refreshing node cache if node ip is not found.	2019-02-03 17:48:07 -08:00
James Casbon	976f018dab	[autoscaler] GCP: only call setIamPolicy if necessary (#3782 )	2019-02-03 16:16:00 -08:00
James Casbon	b8cc176b4d	[autoscaler] Document gcp subnet config (#3783 ) Adds info to the gcp example yaml on using shared subnets.	2019-02-03 16:14:44 -08:00
Si-Yuan	9295ab8f60	Various Python code cleanups. (#3837 )	2019-02-03 10:16:24 -08:00
Michael Luo	1a015e420b	Optimal PPO Configs (10k reward in 1 hr) + PPO grad clipping implemented (#3934 )	2019-02-02 22:10:58 -08:00
Richard Liaw	eab6dd72b5	[tune] logging fixes, better warnings, better cluster support (#3906 )	2019-02-02 19:14:03 -08:00
Yuhong Guo	54cbb4396f	Prepare socket file when start ray (#3925 )	2019-02-02 12:53:36 +08:00
Eric Liang	0f81bc9a33	[rllib] on_train_result results do not get logged (#3865 )	2019-02-01 20:32:07 -08:00
Robert Nishihara	e0f82fd260	Fix building python 3.7 wheel by installing newer numpy. (#3927 )	2019-02-01 18:06:48 -08:00
Daniel Edgecumbe	315edab085	[autoscaler] Speedups (#3720 ) - NodeUpdater gets its IP in parallel now (no longer in __init__) - We use persistent connections in SSH (temp folder created only for ray; ControlMaster) - hash_runtime_conf was performing a pointless hexlify step, wasting time on large files - We use NodeUpdaterThreads and share the NodeProvider; NodeUpdaterProcess is removed - AWSNodeProvider caches nodes more aggressively - NodeProvider now has a shim batch terminate_nodes() call; AWSNodeProvider parallelises it; the autoscaler uses it - AWSNodeProvider batches EC2 update_tags calls - Logging changes throughout to provide standardised timing information for profiling - Pulled out a few unnecessary is_running calls (NodeUpdater will loop waiting for SSH anyway) ## Related issue number Issue #3599	2019-02-01 02:46:32 -08:00
Daniel Edgecumbe	ff3c6af1d6	[autoscaler]: Remove assertion in info string (#3916 ) Fixes #3903	2019-02-01 00:32:24 -08:00
Tianming Xu	1302fafc0b	[Tune] Add export_formats option to export policy graphs (#3868 ) In earlier PRs, PR#3585 and PR#3637, export_policy_model and export_policy_checkpoint were introduced for users to export TensorFlow model and checkpoint. For Ray Tune users, these APIs are not accessible through YAML configurations. In this pull request, export_formats option is provided to enable users to choose the desired export format.	2019-01-31 17:07:27 -08:00
Kristian Hartikainen	b9eed2e86c	[autoscaler] Move attach helper text under exec_cluster (#3920 ) ## What do these changes do? Moves the attach command helper from cli commands to the actual `exec_cluster` function.	2019-01-31 17:01:24 -08:00
Peter Schafhalter	62a0a7bdc7	[tune] Add BayesOpt (#3864 ) Adds BayesOpt as a Tune suggestion algorithm.	2019-01-31 16:54:17 -08:00

1109 commits