hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-08 11:31:40 -05:00

Author	SHA1	Message	Date
Si-Yuan	9295ab8f60	Various Python code cleanups. (#3837)	2019-02-03 10:16:24 -08:00
Michael Luo	1a015e420b	Optimal PPO Configs (10k reward in 1 hr) + PPO grad clipping implemented (#3934)	2019-02-02 22:10:58 -08:00
Richard Liaw	eab6dd72b5	[tune] logging fixes, better warnings, better cluster support (#3906)	2019-02-02 19:14:03 -08:00
Yuhong Guo	54cbb4396f	Prepare socket file when start ray (#3925)	2019-02-02 12:53:36 +08:00
Eric Liang	0f81bc9a33	[rllib] on_train_result results do not get logged (#3865)	2019-02-01 20:32:07 -08:00
Robert Nishihara	e0f82fd260	Fix building python 3.7 wheel by installing newer numpy. (#3927)	2019-02-01 18:06:48 -08:00
Daniel Edgecumbe	315edab085	[autoscaler] Speedups (#3720) - NodeUpdater gets its' IP in parallel now (no longer in __init__) - We use persistent connections in SSH (temp folder created only for ray; ControlMaster) - hash_runtime_conf was performing a pointless hexlify step, wasting time on large files - We use NodeUpdaterThreads and share the NodeProvider; NodeUpdaterProcess is removed - AWSNodeProvider caches nodes more aggressively - NodeProvider now has a shim batch terminate_nodes() call; AWSNodeProvider parallelises it; the autoscaler uses it - AWSNodeProvider batches EC2 update_tags calls - Logging changes throughout to provide standardised timing information for profiling - Pulled out a few unnecessary is_running calls (NodeUpdater will loop waiting for SSH anyway) ## Related issue number Issue #3599	2019-02-01 02:46:32 -08:00
Daniel Edgecumbe	ff3c6af1d6	[autoscaler]: Remove assertion in info string (#3916) Fixes #3903	2019-02-01 00:32:24 -08:00
Tianming Xu	1302fafc0b	[Tune] Add export_formats option to export policy graphs (#3868) In earlier PRs, PR#3585 and PR#3637, export_policy_model and export_policy_checkpoint were introduced for users to export TensorFlow model and checkpoint. For Ray Tune users, these APIs are not accessible through YAML configurations. In this pull request, export_formats option is provided to enable users to choose the desired export format.	2019-01-31 17:07:27 -08:00
Kristian Hartikainen	b9eed2e86c	[autoscaler] Move attach helper text under exec_cluster (#3920) ## What do these changes do? Moves the attach command helper from cli commands to the actual `exec_cluster` function.	2019-01-31 17:01:24 -08:00
Peter Schafhalter	62a0a7bdc7	[tune] Add BayesOpt (#3864) Adds BayesOpt as a Tune suggestion algorithm.	2019-01-31 16:54:17 -08:00
Jimpachnet	d3551dd8df	[tune] Added possibility to execute infinite recovery retries for a trial (#3901) Allows to let a trial try to do infinite recoveries by setting _max_failures_ to a negative number.	2019-01-31 02:21:16 -08:00
Richard Liaw	d128636bab	Ray Logging Configuration (#3691) * fix logging for autoscaler * module logging * try this for logging * yapf * fix * Initial logging setup * momery * ok * remove basicconfig * catch * remove package logging * print * fix * try_fix * fix 1 * revert rllib * logging level * flake8 * fix * fix * Remove vestigal TODO	2019-01-30 21:01:12 -08:00
Robert Nishihara	d06d9fc5d7	Fix Python linting errors. (#3905)	2019-01-30 13:43:18 -08:00
Eric Liang	152375aa8a	[rllib] Add evaluation option to DQN agent (#3835) * add eval * interval * multiagent minor fix * Update rllib.rst * Update ddpg.py * Update qmix.py	2019-01-29 21:19:53 -08:00
Eric Liang	fb73cedf70	[rllib] Add examples page, add hierarchical training example, delete SC2 examples (#3815) * wip * lint * wip * up * wip * update examples * wip * remove carla * update * improve envspec * link to custom * Update rllib-env.rst * update * fix * fn * lint * ds * ssd games * desc * fix up docs * fix	2019-01-29 21:06:09 -08:00
Bruno Morier	c9819a721d	Update tempfile_services.py (#3896) Fix an invalid reference to os.errno. errno have been removed from os in python 3.7. The fix only replaces it by the already imported errno.	2019-01-29 19:33:02 -08:00
Eric Liang	c75038b945	[autoscaler] Updating a file in file mounts causes all worker nodes to get restarted	2019-01-27 17:41:37 -08:00
Stephanie Wang	ad9f1721d1	Fix object_manager_test.py::object_transfer_retry test (#3863)	2019-01-27 13:55:38 -08:00
Yuhong Guo	066fa8abf3	Fix monitor_test.py by waiting for moniter.py to start working (#3840) * Wait for moniter.py to start working * Checkout None result in state.py	2019-01-25 18:07:15 +08:00
Philipp Moritz	20162ce159	Compile raylet cython bindings with bazel (#3842)	2019-01-25 00:57:31 -08:00
Si-Yuan	48139cf861	Migrate Python C extension to Cython (#3541)	2019-01-24 09:17:14 -08:00
Eric Liang	04ec47cbd4	[rllib] annotate public vs developer vs private APIs (#3808)	2019-01-23 21:27:26 -08:00
Wang Qing	816406ea3d	[Java] Fix `setCurrentTask()` in multi threading (#3821)	2019-01-23 20:45:30 +08:00
Robert Nishihara	0b1608a546	Factor out code for starting new processes and test plasma store in valgrind. (#3824) * Factor out starting Ray processes. * Detect flags through environment variables. * Return ProcessInfo from start_ray_process. * Print valgrind errors at exit. * Test valgrind in travis. * Some valgrind fixes. * Undo raylet monitor change. * Only test plasma store in valgrind.	2019-01-22 14:59:11 -08:00
Eric Liang	f0e6523323	[rllib] Don't call reset() unless necessary for multi-agent envs	2019-01-20 15:00:18 -08:00
Eric Liang	aad48ee5a5	[tune] Fully deprecate raw function literals in Tune (#3788) Related: https://github.com/ray-project/ray/issues/3785	2019-01-19 17:09:36 -08:00
Michael Luo	16f7ca45e4	Appo (#3779) * Deleted old fork, updated new ray and moved PPO-impala to APPO in ppo folder * Deleted unneccesary vtrace.py file * Update pong-impala.yaml * Cleaned PPO Code * Update pong-impala.yaml * Update pong-impala.yaml * wip * new ifle * refactor * add vtrace off option * revert * support any space * docs * fix comment * remove kl * Update cartpole-appo-vtrace.yaml	2019-01-18 13:40:26 -08:00
Robert Nishihara	9af5a62e05	Give better error for old-style actor classes. (#3793)	2019-01-17 19:05:04 -08:00
Richard Liaw	0537508106	Bump strings for 0.6.2 (#3801)	2019-01-17 19:03:27 -08:00
Jones Wong	319c1340cb	[rllib] Develop MARWIL (#3635) * add marvil policy graph * fix typo * add offline optimizer and enable running marwil * fix loss function * add maintaining the moving average of advantage norm * use sync replay optimizer for unifying * remove offline optimizer and use sync replay optimizer * format by yapf * add imitation learning objective * fix according to eric's review * format by yapf * revise * add test data * marwil	2019-01-16 19:00:43 -08:00
Richard Liaw	75ac016e2b	Bump version (#3787)	2019-01-16 11:40:54 -08:00
Richard Liaw	fa99fda2b4	Application Stress Tests (#3612)	2019-01-16 02:05:16 -08:00
Richard Liaw	c28e6d41f5	[tune] Avoid overwriting checkpoint file (#3781)	2019-01-16 02:03:16 -08:00
Eric Liang	401e656b95	[rllib] Sync filters at end of iteration not start; hierarchical docs (#3769)	2019-01-15 16:25:25 -08:00
Richard Liaw	3918934dfd	[tune] Cross-Node Recovery (#3725) Augments trial restore to also check if the runner is at the same location. If not, the checkpoint files are pushed onto the new location.	2019-01-15 10:37:28 -08:00
Si-Yuan	a5df8e3532	minor fix (#3770)	2019-01-14 13:52:51 -08:00
Robert Nishihara	19908c01b8	Use environment markers to only install faulthandler in Python < 3.3. (#3764)	2019-01-14 15:55:59 +08:00
Eugene Vinitsky	a5d1f03515	[rllib] fix for rollout of lstm policies (#3643) * fix for lstm policies * added call to local evaluator * Update python/ray/rllib/rollout.py Co-Authored-By: eugenevinitsky <eugenevinitsky@users.noreply.github.com> * Update rollout.py * Update rollout.py	2019-01-13 15:54:23 -08:00
Philipp Moritz	00e9f8d870	Fix pyarrow version (#3760)	2019-01-13 14:28:23 -08:00
Yuhong Guo	d2cf8561f2	Refactor code about ray.ObjectID. (#3674) * Refactor code about ray.ObjectID. * remove from_random and use nil_id instead of constructor * remove id() in hash * Lint and fix * Change driver id to ObjectID * Replace binary_to_hex(ObjectID.id()) to ObjectID.hex()	2019-01-13 01:47:29 -08:00
Eric Liang	c4b058739b	Remove redundant error message (#3761)	2019-01-12 22:22:41 -08:00
James Casbon	528bb3afd9	gcp allow manual network configuration (#3748)	2019-01-12 14:02:20 -08:00
Robert Nishihara	fbea1ece2e	Clear new actor handle list after submitting task. (#3755)	2019-01-12 23:25:40 +08:00
Robert Nishihara	8723d6b061	Define a Node class to manage Ray processes. (#3733) * Implement Node class and move most of services.py into it. * Wait for nodes as they are added to the cluster. * Fix Redis authentication bug. * Fix bug in client table ordering. * Address comments. * Kill raylet before plasma store in test. * Minor	2019-01-11 22:30:38 -08:00
Stephanie Wang	cc5ecd71c5	[autoscaler] Add kill and get IP commands to CLI for testing (#3731) ## What do these changes do? Adds 2 commands to the CLI that take in an autoscaler config: 1. Kill a random ray node in the cluster. 2. Get all the worker node IP addresses. These commands are both for testing and are not recommended for normal use. ## Related issue number Closes #3685.	2019-01-10 22:06:57 -08:00
Richard Liaw	574f0b73bc	[tune] Fix Trial Serialization (#3743)	2019-01-10 19:26:10 -08:00
Hao Chen	597abb24ea	Refine multi-threading support (#3672) * [Python] refine multi-threading support fix * [java] refine multithreading code fix java * format	2019-01-10 13:58:11 -08:00
Eric Liang	71243203a4	[rllib] Fix KeyError: 'kl' in multiagent ppo training	2019-01-09 19:33:07 -08:00
Richard Liaw	edb7aaf7c7	[tune] Better Serialization for Server (#3708) * Add cloudpickle for serialization * Fix tests	2019-01-09 11:55:32 -08:00

1070 commits