hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-06 10:31:39 -05:00

Author	SHA1	Message	Date
Eric Liang	1251abf0d1	[rllib] Modularize Torch and TF policy graphs (#2294 ) * wip * cls * re * wip * wip * a3c working * torch support * pg works * lint * rm v2 * consumer id * clean up pg * clean up more * fix python 2.7 * tf session management * docs * dqn wip * fix compile * dqn * apex runs * up * impotrs * ddpg * quotes * fix tests * fix last r * fix tests * lint * pass checkpoint restore * kwar * nits * policy graph * fix yapf * com * class * pyt * vectorization * update * test cpe * unit test * fix ddpg2 * changes * wip * args * faster test * common * fix * add alg option * batch mode and policy serving * multi serving test * todo * wip * serving test * doc async env * num envs * comments * thread * remove init hook * update * fix ppo * comments1 * fix * updates * add jenkins tests * fix * fix pytorch * fix * fixes * fix a3c policy * fix squeeze * fix trunc on apex * fix squeezing for real * update * remove horizon test for now * multiagent wip * update * fix race condition * fix ma * t * doc * st * wip * example * wip * working * cartpole * wip * batch wip * fix bug * make other_batches None default * working * debug * nit * warn * comments * fix ppo * fix obs filter * update * wip * tf * update * fix * cleanup * cleanup * spacing * model * fix * dqn * fix ddpg * doc * keep names * update * fix * com * docs * clarify model outputs * Update torch_policy_graph.py * fix obs filter * pass thru worker index * fix * rename * vlad torch comments * fix log action * debug name * fix lstm * remove unused ddpg net * remove conv net * revert lstm * cast * clean up * fix lstm check * move to end * fix sphinx * fix cmd * remove bad doc * clarify * copy * async sa * fix	2018-06-26 13:17:15 -07:00
Eric Liang	a9a26b7560	[rllib] Part 2 of multiagent support (#2286 ) * wip * cls * re * wip * wip * a3c working * torch support * pg works * lint * rm v2 * consumer id * clean up pg * clean up more * fix python 2.7 * tf session management * docs * dqn wip * fix compile * dqn * apex runs * up * impotrs * ddpg * quotes * fix tests * fix last r * fix tests * lint * pass checkpoint restore * kwar * nits * policy graph * fix yapf * com * class * pyt * vectorization * update * test cpe * unit test * fix ddpg2 * changes * wip * args * faster test * common * fix * add alg option * batch mode and policy serving * multi serving test * todo * wip * serving test * doc async env * num envs * comments * thread * remove init hook * update * fix ppo * comments1 * fix * updates * add jenkins tests * fix * fix pytorch * fix * fixes * fix a3c policy * fix squeeze * fix trunc on apex * fix squeezing for real * update * remove horizon test for now * multiagent wip * update * fix race condition * fix ma * t * doc * st * wip * example * wip * working * cartpole * wip * batch wip * fix bug * make other_batches None default * working * debug * nit * warn * comments * fix ppo * fix obs filter * update * fix obs filter * pass thru worker index * fix * fix log action * debug name * fix sphinx	2018-06-25 22:33:57 -07:00
Robert Nishihara	800f7cc77d	Make actor handles work in Python mode. (#2283 ) * Make actor handles work in local mode. * Add test for actor handles in local mode.	2018-06-20 23:02:41 -07:00
Robert Nishihara	ff2217251f	[xray] Add error table and push error messages to driver through node manager. (#2256 ) * Fix documentation indentation. * Add error table to GCS and push error messages through node manager. * Add type to error data. * Linting * Fix failure_test bug. * Linting. * Enable one more test. * Attempt to fix doc building. * Restructuring * Fixes * More fixes. * Move current_time_ms function into util.h.	2018-06-20 21:29:28 -07:00
Robert Nishihara	18ee044f03	Re-enable some actor tests. (#2276 )	2018-06-20 14:42:35 -07:00
Zongheng Yang	8190ff1fd0	Experimental: enable automatic GCS flushing with configurable policy. (#2266 ) * build_credis.sh: use an up-to-date credis commit. * build_credis.sh: leveldb is updated, so update build cmds for it * WIP: make monitor.py issue flush; switch gcs client to use credis * Experimental: enable automatic GCS flushing with configurable policy. * Fix linux compilation error * Fix leveldb build * Use optimized build for credis * Address comments * Attempt to fix tests	2018-06-20 14:40:57 -07:00
Eric Liang	e5724a9cfe	[rllib] Add a simple REST policy server and client example (#2232 ) * wip * cls * re * wip * wip * a3c working * torch support * pg works * lint * rm v2 * consumer id * clean up pg * clean up more * fix python 2.7 * tf session management * docs * dqn wip * fix compile * dqn * apex runs * up * impotrs * ddpg * quotes * fix tests * fix last r * fix tests * lint * pass checkpoint restore * kwar * nits * policy graph * fix yapf * com * class * pyt * vectorization * update * test cpe * unit test * fix ddpg2 * changes * wip * args * faster test * common * fix * add alg option * batch mode and policy serving * multi serving test * todo * wip * serving test * doc async env * num envs * comments * thread * remove init hook * update * policy serve * spaces * checkpoint * no train * fix ppo * comments1 * fix * updates * add jenkins tests * fix * fix pytorch * fix * fixes * fix a3c policy * fix squeeze * fix trunc on apex * fix squeezing for real * update * remove horizon test for now * fix race condition * update * com * updat * add test * Update run_multi_node_tests.sh * use curl * curl * kill * Update run_multi_node_tests.sh * Update run_multi_node_tests.sh * fix import * update	2018-06-20 13:22:39 -07:00
Richard Liaw	418cd6804a	[asv] Pushing to s3 (#2246 )	2018-06-20 10:43:44 -07:00
Eric Liang	7dee2c6735	[rllib] Envs for vectorized execution, async execution, and policy serving (#2170 )	2018-06-18 11:55:32 -07:00

## What do these changes do?

Vectorized envs: Users can either implement `VectorEnv`, or alternatively set `num_envs=N` to auto-vectorize gym envs (this vectorizes just the action computation part).

```
# CartPole-v0 on single core with 64x64 MLP:
# vector_width=1: Actions per second 2720.1284458322966
# vector_width=8: Actions per second 13773.035334888269
# vector_width=64: Actions per second 37903.20472563333
```

Async envs: The more general form of `VectorEnv` is `AsyncVectorEnv`, which allows agents to execute out of lockstep. We use this as an adapter to support `ServingEnv`. Since we can convert any other form of env to `AsyncVectorEnv`, utils.sampler has been rewritten to run against this interface.

Policy serving: This provides an env which is not stepped. Rather, the env executes in its own thread, querying the policy for actions via `self.get_action(obs)`, and reporting results via `self.log_returns(rewards)`. We also support logging of off-policy actions via `self.log_action(obs, action)`. This is a more convenient API for some use cases, and also provides parallelizable support for policy serving (for example, if you start a HTTP server in the env) and ingest of offline logs (if the env reads from serving logs).

Any of these types of envs can be passed to RLlib agents. RLlib handles conversions internally in CommonPolicyEvaluator, for example:

```
gym.Env => rllib.VectorEnv => rllib.AsyncVectorEnv
rllib.ServingEnv => rllib.AsyncVectorEnv
```
Robert Nishihara	61139e1509	Enable fractional resources and resource IDs for xray. (#2187 ) * Implement GPU IDs and fractional resources. * Add documentation and python exceptions. * Fix signed/unsigned comparison. * Fix linting. * Fixes from rebase. * Re-enable tests that use ray.wait. * Don't kill the raylet if an infeasible task is submitted. * Ignore tests that require better load balancing. * Linting * Ignore array test. * Ignore stress test reconstructions tests. * Don't kill node manager if remote node manager disconnects. * Ignore more stress tests. * Naming changes * Remove outdated todo * Small fix * Re-enable test. * Linting * Fix resource bookkeeping for blocked tasks. * Fix linting * Fix Java client. * Ignore test * Ignore put error tests	2018-06-10 15:31:43 -07:00
Philipp Moritz	4ec5bea03b	[xray] Implement fetch (#2195 )	2018-06-09 23:36:27 -07:00
Robert Nishihara	125fe1c09c	Print warning when defining very large remote function or actor. (#2179 ) * Print warning when defining very large remote function or actor. * Add weak test. * Check that warnings appear in test. * Make wait_for_errors actually fail in failure_test.py. * Use constants for error types. * Fix	2018-06-09 19:59:15 -07:00
Eric Liang	71eb558eb0	[rllib] Refactor rllib to have a common sample collection pathway (#2149 )	2018-06-09 00:21:35 -07:00
Melih Elibol	7246ff80a4	[xray] Implements ray.wait (#2162 ) Implements ray.wait for xray. Fixes #1128.	2018-06-06 16:56:44 -07:00
Adam Gleave	6ef3b255ea	Launch nodes in separate threads (#2183 ) Modifies the autoscaler to run launch_new_nodes in a separate thread, keeping track of the number of pending requests.	2018-06-05 20:19:31 -07:00
Richard Liaw	13d4e0db95	Add Docker Support for ASV (#2184 ) * added new instructions and script * initialize ray only once * use ray-project/asv master	2018-06-05 15:55:35 -07:00
Binglin Chang	19d6ca0670	Support constructing TensorFlowVariables from multiple tf operations (#2182 )	2018-06-02 18:13:52 -07:00
Kunal Gosar	317d0da7d8	Add experimental API for ray.get and ray.wait with additional argument types (#2071 )	2018-06-01 16:42:27 -07:00
Kristian Hartikainen	74dc14d1fc	[autoscaler] GCP node provider (#2061 ) * Google Cloud Platform scaffolding * Add minimal gcp config example * Add googleapiclient discoveries, update gcp.config constants * Rename and update gcp.config key pair name function * Implement gcp.config._configure_project * Fix the create project get project flow * Implement gcp.config._configure_iam_role * Implement service account iam binding * Implement gcp.config._configure_key_pair * Implement rsa key pair generation * Implement gcp.config._configure_subnet * Save work-in-progress gcp.config._configure_firewall_rules. These are likely to be not needed at all. Saving them if we happen to need them later. * Remove unnecessary firewall configuration * Update example-minimal.yaml configuration * Add new wait_for_compute_operation, rename old wait_for_operation * Temporarily rename autoscaler tags due to gcp incompatibility * Implement initial gcp.node_provider.nodes * Still missing filter support * Implement initial gcp.node_provider.create_node * Implement another compute wait operation (wait_For_compute_zone_operation). TODO: figure out if we can remove the function. * Implement initial gcp.node_provider._node and node status functions * Implement initial gcp.node_provider.terminate_node * Implement node tagging and ip getter methods for nodes * Temporarily rename tags due to gcp incompatibility * Tiny tweaks for autoscaler.updater * Remove unused config from gcp node_provider * Add new example-full example to gcp, update load_gcp_example_config * Implement label filtering for gcp.node_provider.nodes * Revert unnecessary change in ssh command * Revert "Temporarily rename tags due to gcp incompatibility" This reverts commit e2fe634c5d11d705c0f5d3e76c80c37394bb23fb. * Revert "Temporarily rename autoscaler tags due to gcp incompatibility" This reverts commit c938ee435f4b75854a14e78242ad7f1d1ed8ad4b. 
* Refactor autoscaler tagging to support multiple tag specs * Remove missing cryptography imports * Update quote function import * Fix threading issue in gcp.config with the compute discovery object * Add gcs support for log_sync * Fix the labels/tags naming discrepancy * Add expanduser to file_mounts hashing * Fix gcp.node_provider.internal_ip * Add uuid to node name * Remove 'set -i' from updater ssh command * Also add TODO with the context and reason for the change. * Update ssh key creation in autoscaler.gcp.config * Fix wait_for_compute_zone_operation's threading issue Google discovery api's compute object is not thread safe, and thus needs to be recreated for each thread. This moves the `wait_for_compute_zone_operation` under `autoscaler.gcp.config`, and adds compute as its argument. * Address pr feedback from @ericl * Expand local file mount paths in NodeUpdater * Add ssh_user name to key names * Update updater ssh to attempt 'set -i' and fall back if that fails * Update gcp/example-full.yaml * Fix wait crm operation in gcp.config * Update gcp/example-minimal.yaml to match aws/example-minimal.yaml * Fix gcp/example-full.yaml comment indentation * Add gcp/example-full.yaml to setup files * Update example-full.yaml command * Revert "Refactor autoscaler tagging to support multiple tag specs" This reverts commit 9cf48409ca2e5b66f800153853072c706fa502f6. * Update tag spec to only use characters [0-9a-z_-] * Change the tag values to conform gcp spec * Add project_id in the ssh key name * Replace '_' with '-' in autoscaler tag names * Revert "Update updater ssh to attempt 'set -i' and fall back if that fails" This reverts commit 23a0066c5254449e49746bd5e43b94b66f32bfb4. * Revert "Remove 'set -i' from updater ssh command" This reverts commit 5fa034cdf79fa7f8903691518c0d75699c630172. 
* Add fallback to `set -i` in force_interactive command * Update autoscaler tests to match current implementation * Update GCPNodeProvider.create_node to include hash in instance name * Add support for creating multiple instance on one create_node call * Clean TODOs * Update styles * Replace single quotes with double quotes * Some minor indentation fixes etc. * Remove unnecessary comment. Fix indentation. * Yapfify files that fail flake8 test * Yapfify more files * Update project_id handling in gcp node provider * temporary yapf mod * Revert "temporary yapf mod" This reverts commit b6744e4e15d4d936d1a14f4bf155ed1d3bb14126. * Fix autoscaler/updater.py lint error, remove unused variable	2018-05-31 09:00:03 -07:00
Alok Singh	fd234e3171	[rllib] Fix A3C PyTorch implementation (#2036 ) * Use F.softmax instead of a pointless network layer Stateless functions should not be network layers. * Use correct pytorch functions * Rename argument name to out_size Matches in_size and makes more sense. * Fix shapes of tensors Advantages and rewards both should be scalars, and therefore a list of them should be 1D. * Fmt * replace deprecated function * rm unnecessary Variable wrapper * rm all use of torch Variables Torch does this for us now. * Ensure that values are flat list * Fix shape error in conv nets * fmt * Fix shape errors Reshaping the action before stepping in the env fixes a few errors. * Add TODO * Use correct filter size Works when `self.config['model']['channel_major'] = True`. * Add missing channel major * Revert reshape of action This should be handled by the agent or at least in a cleaner way that doesn't break existing envs. * Squeeze action * Squeeze actions along first dimension This should deal with some cases such as cartpole where actions are scalars while leaving alone cases where actions are arrays (some robotics tasks). * try adding pytorch tests * typo * fixup docker messages * Fix A3C for some envs Pendulum doesn't work since it's an edge case (expects singleton arrays, which `.squeeze()` collapses to scalars). * fmt * nit flake * small lint	2018-05-30 10:48:11 -07:00
Robert Nishihara	6172f94c04	Implement Python global state API for xray. (#2125 ) * Implement global state API for xray. * Fix object table. * Fixes for log structure. * Implement cluster_resources. * Add driver task to task table. * Remove python flatbuffers code * Get some global state API tests running. * Python linting. * Fix linting. * Fix mock modules for doc * Copy over flatbuffer bindings. * Fix for tests. * Linting * Fix monitor crash.	2018-05-29 16:25:54 -07:00
Eric Liang	bc2a83e698	Fix support for actor classmethods (#2146 )	2018-05-28 17:43:23 -07:00
Zongheng Yang	fa97acbc89	Integrate credis with Ray & route task table entries into credis. (#1841 )	2018-05-24 23:35:25 -07:00
Yucong He	3509a33cf3	Prototype named actors. (#2129 )	2018-05-24 00:32:12 -07:00
Alok Singh	f795173b51	Use flake8-comprehensions (#1976 ) * Add flake8 to Travis * Add flake8-comprehensions [flake8 plugin](https://github.com/adamchainz/flake8-comprehensions) that checks for useless constructions. * Use generators instead of lists where appropriate A lot of the builtins can take in generators instead of lists. This commit applies `flake8-comprehensions` to find them. * Fix lint error * Fix some string formatting The rest can be fixed in another PR * Fix compound literals syntax This should probably be merged after #1963. * dict() -> {} * Use dict literal syntax dict(...) -> {...} * Rewrite nested dicts * Fix hanging indent * Add missing import * Add missing quote * fmt * Add missing whitespace * rm duplicate pip install This is already installed in another file. * Fix indent * move `merge_dicts` into utils * Bring up to date with `master` * Add automatic syntax upgrade * rm pyupgrade In case users want to still use it on their own, the upgrade-syn.sh script was left in the `.travis` dir.	2018-05-20 16:15:06 -07:00
Robert Nishihara	99ae74e1d2	Improve error message printing and suppression. (#2104 )	2018-05-20 12:13:14 -07:00
Alok Singh	9a8f29e571	YAPF, take 3 (#2098 ) * Use pep8 style The original style file is actually just pep8 style, but with everything spelled out. It's easier to use the `based_on_style` feature. Any overrides are clearer that way. * Improve yapf script 1. Do formatting in parallel 2. Lint RLlib 3. Use .style.yapf file * Pull out expressions into variables * Don't format rllib * Don't allow splits in dicts * Apply yapf * Disallow single line if-statements * Use arithmetic comparison * Simplify checking for changed files * Pull out expr into var	2018-05-19 16:07:28 -07:00
Robert Nishihara	78e4b021ab	Functions for flushing done tasks and evicted objects. (#2033 )	2018-05-18 01:59:58 -07:00
Adam Gleave	470887c2ad	Support calling positional arguments by keyword (fix #998 ) (#2081 )	2018-05-17 16:10:26 -07:00
Melih Elibol	bea97b425b	Fix python linting (#2076 )	2018-05-16 15:04:31 -07:00
Robert Nishihara	570c3153cd	Some tests for _submit API. (#2062 )	2018-05-16 00:26:25 -07:00
Eric Liang	3f1dd29eab	[autoscaler] Remove faulty assert that breaks during downscaling, pull configs from env (#2006 ) * fixes * coment out test * Update ray_constants.py * Update autoscaler_test.py * Update ray_constants.py * lint * lint	2018-05-15 12:47:11 -07:00
Robert Nishihara	8fbb88485b	Create RemoteFunction class, remove FunctionProperties, simplify worker Python code. (#2052 ) * Cleaning up worker and actor code. Create remote function class. Remove FunctionProperties object. * Remove register_actor_signatures function. * Small cleanups. * Fix linting. * Support @ray.method syntax for actor methods. * Fix pickling bug. * Fix linting. * Shorten testBlockingTasks. * Small fixes. * Call get_global_worker().	2018-05-14 14:35:23 -07:00
Robert Nishihara	52b0f3734a	[xray] Add Travis build for testing xray on Linux. (#2047 ) * Run xray tests in travis. * Comment out TaskTests.testSubmittingManyTasks. * Comment out failing tests. * Comment out hanging test. * Linting * Comment out failing test. * Comment out failing test. * Ignore test_dataframe.py for now. * Comment out testDriverExitingQuickly.	2018-05-13 21:22:01 -07:00
Robert Nishihara	18071d95a7	Use more CPUs for testMultipleWaitsAndGets. (#2051 )	2018-05-13 15:35:02 -07:00
eric-jj	71997a481b	Improve shared_ptr usage (#2030 ) [xray] Improve shared_ptr usage	2018-05-11 20:05:04 -07:00
Robert Nishihara	77c8aa7627	Make ActorHandles pickleable, also make proper ActorHandle and ActorC… (#2007 ) * Make ActorHandles pickleable, also make proper ActorHandle and ActorClass classes. * Fix bug. * Fix actor test bug. * Update __ray_terminate__ usage. * Fix most linting, add documentation, and small cleanups. * Handle forking and pickling differently for actor handles. Fix linting. * Fixes for named actors via pickling. * Generate actor handle IDs deterministically in the pickling case.	2018-05-08 19:19:07 -07:00
Alok Singh	cdf94c18a4	Clean up syntax for supported Python versions. (#1963 ) * Use set/dict literal syntax Ran code through [pyupgrade](https://github.com/asottile/pyupgrade). This is supported in every Python version 2.7+. * Drop unnecessary string format specification No need to specify 0,1.. if paramters are passed in order. * Revert "Drop unnecessary string format specification" This reverts commit efa5ec85d30ff69f34e5ed93e31343fea7647bcb. * Undo changes to cloudpickle Drop use of set literal until cloudpickle uses it. * Reformat code with YAPF We need to set up a git pre-push hook to automatically run this stuff.	2018-05-03 07:45:11 -07:00
Eric Liang	7ab890f4a1	[tune] [rllib] Automatically determine RLlib resources and add queueing mechanism for autoscaling (#1848 )	2018-04-16 16:58:15 -07:00
Robert Nishihara	7792032ee3	Fix UI issue for non-json-serializable task arguments. (#1892 ) * Fix UI issue for non-json-serializable task arguments. * Simplify approach.	2018-04-15 13:54:42 -07:00
alvkao58	15a668dd12	[RLLib] DDPG (#1685 )	2018-04-11 15:08:39 -07:00
Philipp Moritz	74162d1492	Lint Python files with Yapf (#1872 )	2018-04-11 10:11:35 -07:00
Robert Nishihara	256389dc59	Use new task spec for computing IDs in raylet code path. (#1830 ) * Use new task spec for computing IDs in raylet code path. * Fix linting. * Fixes * Fix test.	2018-04-08 13:31:55 -07:00
Stephanie Wang	bf194db4bc	[xray] Basic actor support (#1835 )	2018-04-06 00:17:14 -07:00
Robert Nishihara	5bde5e75e7	Implement unsafe method for flushing entire object table and task table. (#1824 ) * Implement unsafe method for flushing entire object table and task table. * Add test. * Fix test.	2018-04-04 18:29:24 -07:00
Richard Liaw	888e70f1be	[tune] HyperOpt Support (v2) (#1763 )	2018-04-04 11:08:26 -07:00
Robert Nishihara	fbfbb1c079	[xray] Integrate worker.py with raylet. (#1810 ) * Integrate worker with raylet. * Begin allowing worker to attach to cluster. * Fix linting and documentation. * Fix linting. * Comment tests back in. * Fix type of worker command. * Remove xray python files and tests. * Fix from rebase. * Add test. * Copy over raylet executable. * Small cleanup.	2018-04-03 02:38:56 -07:00
Robert Nishihara	0fc989c6c1	Don't use 127.0.0.1 for local ip address. (#1596 ) * Don't use 127.0.0.1 for ip address. * Update test	2018-04-02 00:34:20 -07:00
Robert Nishihara	0c835a379f	Fix resource bookkeeping for blocked actor methods. (#1766 )	2018-03-21 20:48:04 -07:00
Robert Nishihara	c6ad71fc9d	Fix bug when connecting another driver in local case. (#1760 ) * Allow connecting another driver when using ip address 127.0.0.1. * Add test.	2018-03-21 11:49:53 -07:00
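
Commit 7dee2c6735 above describes auto-vectorizing gym envs so that the action computation is batched across `N` env copies. The following is a minimal, dependency-free sketch of that idea, not RLlib's implementation: `CountingEnv`, `SimpleVectorEnv`, `batched_policy`, and `rollout` are hypothetical names invented for illustration, and the method names `vector_reset` and `vector_step` are assumptions rather than a confirmed API.

```python
class CountingEnv:
    """Toy stand-in for a gym-style env (hypothetical, not part of RLlib)."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        # Reward 1.0 when the action matches the parity of the new timestep.
        reward = 1.0 if action == self.t % 2 else 0.0
        done = self.t >= self.horizon
        return self.t, reward, done, {}


class SimpleVectorEnv:
    """Steps N independent env copies in lockstep (illustrative only)."""

    def __init__(self, make_env, num_envs):
        self.envs = [make_env() for _ in range(num_envs)]

    def vector_reset(self):
        return [env.reset() for env in self.envs]

    def vector_step(self, actions):
        obs, rewards, dones = [], [], []
        for env, action in zip(self.envs, actions):
            ob, reward, done, _ = env.step(action)
            if done:
                # Restart finished copies so the batch always stays full.
                ob = env.reset()
            obs.append(ob)
            rewards.append(reward)
            dones.append(done)
        return obs, rewards, dones


def batched_policy(obs_batch):
    """A single call computes actions for every env in the batch."""
    return [(ob + 1) % 2 for ob in obs_batch]


def rollout(num_envs, num_steps):
    """Run the batched policy against num_envs copies; return total reward."""
    venv = SimpleVectorEnv(CountingEnv, num_envs)
    obs = venv.vector_reset()
    total = 0.0
    for _ in range(num_steps):
        actions = batched_policy(obs)  # one policy call per step, not per env
        obs, rewards, _ = venv.vector_step(actions)
        total += sum(rewards)
    return total
```

In this sketch the envs themselves are still stepped one by one; only the policy call is batched, which matches the commit's note that `num_envs=N` "vectorizes just the action computation part". With a real neural-network policy, that single batched call is where a throughput gain would be expected to come from.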

391 commits