hiro/ray - Forgejo: Beyond coding. We Forge.

mirror of https://github.com/vale981/ray synced 2025-03-09 04:46:38 -04:00

Author	SHA1	Message	Date
Yuhong Guo	4b23a34c93	Fix multi-thread problem of function manager and Jenkins test (#3648)	2019-01-03 17:05:13 +08:00
Eric Liang	ca864faece	[rllib] Documentation for I/O API and multi-agent support / cleanup (#3650)	2019-01-03 15:15:36 +08:00
opherlieber	2177e2f410	[rllib] Agent: Allow unknown subkeys for custom_resources_per_worker (#3639) * RLLib Agent: Allow unknown subkeys for custom_resources_per_worker * Update agent.py	2019-01-03 14:19:59 +08:00
Eric Liang	47d36d7bd6	[rllib] Refactor pytorch custom model support (#3634)	2019-01-03 13:48:33 +08:00
Robert Nishihara	b6bcd18d65	Split profile table among many keys in the GCS. (#3676) * Divide profile table among many keys in GCS. * Fix, and remove --collect-profiling-data arg. * Remove reference in doc.	2019-01-02 21:33:01 -08:00
Si-Yuan	93d54110f8	Prevent overriding faulthandler settings (#3668) This change ensures that Ray sets up fault handlers only if faulthandler has not already been enabled by another application. Otherwise, some applications could face strange issues when using Ray, and some unit tests using XML runners fail.	2018-12-31 16:36:26 -08:00
Yuhong Guo	c9b8ecca51	Add RayParams to refactor the parameters used by ray python. (#3558)	2018-12-29 22:04:27 +08:00
Devin Petersohn	eb1e5fa2cf	Fixing Python2 compatibility issues. Adding inline docs (#3656)	2018-12-28 22:53:28 -08:00
Richard Liaw	aad3c50e2d	[tune] Cluster Fault Tolerance (#3309) This PR introduces cluster-level fault tolerance for Tune by checkpointing global state. This occurs with relatively high frequency and allows users to easily resume experiments when the cluster crashes. Note that this PR may affect automated workflows due to auto-prompting, but this is resolvable.	2018-12-29 11:42:25 +08:00
Richard Liaw	ac792d70c8	[rllib] Add starcraft multiagent env as example (#3542)	2018-12-27 10:00:32 +08:00
Tianming Xu	b4f61dfd50	[rllib] Export policy model checkpoint (#3637) * Export policy model checkpoint * update comment	2018-12-27 08:43:06 +09:00
Richard Liaw	6e2d7a9ba1	[tune] Support Configuration Merging (#3584) * merge configs * deep merge * lint * add resolve * test	2018-12-26 20:07:11 +09:00
Stan Wang	4ce3818be5	Average aggregated gradients before putting them in the plasma store (#3631)	2018-12-26 20:03:11 +09:00
Yuhong Guo	1b98fb8238	Fix Jenkins test failures and function descriptor bug. (#3569) ## What do these changes do? 1. Fix the Jenkins test failure by adding the driver ID to the actor GCS key. 2. Move `object_manager_test.py` from Jenkins to Travis.	2018-12-25 23:31:44 -08:00
Robert Nishihara	5426234cd8	Update documentation to reflect 0.6.1 release. (#3622)	2018-12-24 11:10:04 -08:00
nam-cern	3d8f56409b	Ensure numpy is at least 1.10.4 in setup.py (#2462) In the build script, numpy is specifically set at 1.10.4. We should also ensure that it is indeed the case in `setup.py`.	2018-12-24 11:01:25 -08:00
Eric Liang	9f63119a83	[rllib] Allow development without needing to compile Ray (#3623) * wip * lint * wip * wip * rename * wip * Cleaner handling of cli prompt	2018-12-24 18:08:23 +09:00
Devin Petersohn	c13b2685f5	[modin] Append to path to avoid namespace collision on development branches (#3621)	2018-12-23 23:58:56 -08:00
Alexey Tumanov	9b8d7573fe	bump version from 0.6.0 to 0.6.1 (#3610)	2018-12-23 17:03:42 -08:00
Robert Nishihara	bb7ca3bae7	Upgrade flatbuffers version to 1.10.0. (#3559) * Upgrade flatbuffers version to 1.10.0. * Temporarily change ray.utils.decode for backwards compatibility.	2018-12-23 14:56:34 -08:00
Tianming Xu	deb26b954e	[rllib] Export tensorflow model of policy graph (#3585) * Export tensorflow model of policy graph * Add tests, examples, pydocs and infer extra signatures from existing methods * Add example usage in export_policy_model comment * Fix lint error * Fix lint error * Fix lint error	2018-12-22 17:35:25 +09:00
Eric Liang	ddc97864df	[rllib] Add requested clarifications to test requirement of contrib docs (#3589)	2018-12-21 11:02:02 -08:00
Richard Liaw	e046a5c767	[tune] resources_per_trial from trial_resources (#3580) Renaming variable due to user errors.	2018-12-20 19:00:47 -08:00
Devin Petersohn	a174a46e02	Allowing multiple users to access the /tmp/ray file at the same time (#3591) * Allowing multiple users to access the /tmp/ray file at the same time. Previous sequence that caused this issue: * User A starts ray with `ray.init` when /tmp/ray does not exist. * User B starts ray with `ray.init` and /tmp/ray now exists. User B will get a permissions error; checking the permissions, /tmp/ray is 700. I have identified a race condition in `try_to_create_directory`: * Multiple processes try to create /tmp/ray at the same time. * chmod is either silently erroring or working properly within the race condition. Resolution: Move chmod outside of the check for whether the directory exists or not. * Adding try except for users who do not own the directory	2018-12-20 18:46:54 -08:00
Stephanie Wang	34bab6291c	Cleanup actor handle pickling code (#3560) * Cleanup actor handle pickling code * remove unused * fix * lint	2018-12-20 16:37:21 -08:00
Eric Liang	6bb1103930	[rllib] Avoid sample wastage with bad PPO configurations (#3552) ## What do these changes do? Previously we logged a warning if the PPO configuration would waste many samples. However, this didn't apply in the case of long episodes in `complete_episodes` batch mode, and also the amount of waste is up to 2x in common cases. This PR: - Estimates the number of sampling tasks needed to avoid over-sampling. - Collects all sample results and never discards any. In principle this can degrade performance at large scale if certain machines are slower. Add a config flag to enable this legacy behavior. ## Related issue number Closes: https://github.com/ray-project/ray/issues/3549	2018-12-20 10:50:44 -08:00
Richard Liaw	ac48a58e4e	[tune] Reduce scope of variant generator (#3583) This PR provides a better error message when the generate_variants code breaks. Also removes a comment about nesting dependencies. This comes mainly as a hotfix solution for #3466. We should leave that issue open for future contribution 🙂	2018-12-20 10:48:28 -08:00
Eric Liang	303883a3b6	[rllib] [rfc] add contrib module and guideline for merging (#3565) This adds guidelines for merging code into `rllib/contrib` vs `rllib/agents`. Also, clean up the agent import code to make registration easier.	2018-12-20 10:44:34 -08:00
adoda	cf0c4745f4	[rllib] support running older version tensorflow (version < 1.5.0) (#3571)	2018-12-19 20:27:24 -08:00
Robert Nishihara	a5309bec7c	Make README render properly on PyPI. (#3578) * Make README render properly in pypi. * Add small logo * temporary fix * smaller image * Remove image size. * Add author and email to setup.py.	2018-12-19 18:41:09 -08:00
Eric Liang	ffa6ee3ec8	[rllib] streaming minibatching for IMPALA (#3402) * mb impala * fix * paropt * update * cpu warn * on cpu * fix mb * doc * docs * comment * larger num * early release * remove grad clip * only check loader count in multi gpu mode * revert bad multigpu changes * num sgd iter * comment * reuse optimizer * add test * par load test * loosen test * Update run_multi_node_tests.sh * fix local mode * Update agent.py	2018-12-19 02:23:29 -08:00
Alexey Tumanov	c4cba98c75	Remove deprecation warnings when running actor tests (#3563) * remove deprecation warnings when running actor tests * replacing logger.warn with logger.warning * Update worker.py * Update policy_client.py * Update compression.py	2018-12-18 17:04:51 -08:00
Yuhong Guo	fb33fa9097	Enable function_descriptor in backend to replace the function_id (#3028)	2018-12-18 18:53:59 -05:00
Yuhong Guo	75ddf7cca4	Fix 2 small bugs (#3573)	2018-12-18 14:52:21 -05:00
Eric Liang	db0dee573e	[rllib] Q-Mix implementation (Q-Mix, VDN, IQN, and Ape-X variants) (#3548)	2018-12-18 10:40:01 -08:00
opherlieber	854b06854f	remove auto-concat of rollouts in AsyncSampler (#3556) * remove auto-concat of rollouts in AsyncSampler * remove auto-concat test * remove unused reference	2018-12-17 13:54:52 -08:00
Robert Nishihara	417c7f2d6f	Update arrow and remove plasma_manager references. (#3545)	2018-12-15 23:36:02 -08:00
Philipp Moritz	b3bf608608	Update arrow to reduce plasma IPCs. (#3497)	2018-12-14 23:49:37 -05:00
Richard Liaw	de3fdeb5b5	[autoscaler] Fix Error Handling for botocore (#3534) Unfortunately Boto generates error classes dynamically, so this catches the expected error and raises the error if it is the wrong class. Closes #3533.	2018-12-14 00:20:49 -08:00
Hao Chen	e7b51cbd1b	[xray] Implement Actor Reconstruction (#3332) * Implement Actor Reconstruction * fix * fix actor handle __del__ * fix lint * add comment * Remove actorCreationDummyObjectId * address comments * fix * address comments * avoid copy * change log to debug * fix error name	2018-12-13 21:28:58 -08:00
Si-Yuan	84fae57ab5	Convert the raylet client (the code in local_scheduler_client.cc) to proper C++. (#3511) * refactoring * fix bugs * create client class * create client class for java; bug fix * remove legacy code * improve code by using std::string, std::unique_ptr rename private fields and removing legacy code * rename class * improve naming * fix * rename files * fix names * change name * change return types * make a mutex private field * fix comments * fix bugs * lint * bug fix * bug fix * move too short functions into the header file * Loosen crash conditions for some APIs. * Apply suggestions from code review Co-Authored-By: suquark <suquark@gmail.com> * format * update * rename python APIs * fix java * more fixes * change types of cpython interface * more fixes * improve error processing * improve error processing for java wrapper * lint * fix java * make fields const * use pointers for [out] parameters * fix java & error msg * fix resource leak, etc.	2018-12-13 13:39:10 -08:00
Chunyang Wen	5dcc333199	[sgd] Modify: add interface for model (#3458) * Modify: add interface for model * Modify: remove single quota and build; add metrics * Modify: flatten into list of dict * Update distributed_sgd.rst * Modify: update format with scripts/format.sh * Update sgd_worker.py	2018-12-12 21:23:25 -08:00
Eric Liang	0e00533ed4	Different approach to removing RayGetError (#3471)	2018-12-12 20:30:51 -08:00
Eric Liang	32473cf22e	[rllib] Basic Offline Data IO API (#3473)	2018-12-12 13:57:48 -08:00
Richard Liaw	cc8f7db246	[docs] Improve cluster/docker docs (#3517) - Surfaces local cluster usage - Increases visibility of these instructions - Removes some docker docs (that are really out of scope for Ray documentation IMO) Closes #3517.	2018-12-12 10:40:54 -08:00
Eric Liang	5f4a9cc713	[rllib] Rollout should preprocess observations; some cleanups (#3512) ## What do these changes do? From https://groups.google.com/forum/#!topic/ray-dev/u-gybKK6-Ns	2018-12-11 20:16:38 -08:00
Eric Liang	59f4743f20	[rllib] Run simple regression tests for all algs in jenkins (#3498)	2018-12-11 17:21:53 -08:00
Richard Liaw	e0fbb68e47	[tune] Custom Logging, Trial Name (#3465) Adds support for custom loggers, custom trial strings, and custom sync commands. Closes #3034, #2985, and #3390.	2018-12-11 13:41:59 -08:00
Eric Liang	52df4dfc6f	[rllib] Fix multiagent_two_trainer test (#3509) * update * fix * dict order * fix * fix	2018-12-11 00:16:39 -08:00
Richard Liaw	1f4a01cff6	[tune] Fix PyTorch example after PyTorch v1 (#3500) * [tune] * fix * lint * fix	2018-12-10 12:00:53 -08:00


1000 commits