hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-06 10:31:39 -05:00

Author	SHA1	Message	Date
Qing Wang	1465a30ea9	Fix releasing CPUs incorrectly when actor creation task blocked. (#5271) * Fix * Remove useless log * Address * Fix typo * sleep	2019-07-28 15:46:17 +08:00
Richard Liaw	5ea859dc73	[sgd] hotfix example failure (#5297) * hotfix * Update train_example.py	2019-07-27 18:13:22 -07:00
Eric Liang	6f2c5b2819	Revert "[autoscaler] Clean up error messages on setup failure (#5210)" (#5299) This reverts commit `7fc15dbf7f`.	2019-07-27 16:53:47 -07:00
lanlin	341dbf6c45	[tune] support nested dictionaries for CSVLogger (#5295)	2019-07-27 14:44:34 -07:00
Richard Liaw	b4823d63c6	[autoscaler] Local YAML readability (#5290)	2019-07-27 12:51:50 -07:00
Eric Liang	a62c5f40f6	[rllib] Document ModelV2 and clean up the models/ directory (#5277)	2019-07-27 02:08:16 -07:00
Richard Liaw	9c00616cdc	Retry and exception for hang on memory store full (#5143)	2019-07-27 01:20:13 -07:00
Richard Liaw	5e15b36d6e	[tune] experiment_analysis split to Analysis (#5115)	2019-07-27 01:10:52 -07:00
Richard Liaw	7e715520e5	[sgd] Example for Training (#5292)	2019-07-27 01:10:25 -07:00
Daniel Edgecumbe	06fec63c87	[autoscaler] Add a 'request_cores' function for manual autoscaling (#4754)	2019-07-26 17:14:45 -07:00
lanlin	d9e81da3b8	[tune] configurable maximum length of trial identifier (#5287)	2019-07-26 17:09:54 -07:00
Antoine Galataud	827618254a	[rllib] Configure learner queue timeout (#5270) * configure learner queue timeout * lint * use config * fix method args order, add unit test * fix wrong param name	2019-07-25 21:18:05 -07:00
Stephanie Wang	3321555975	Increase timeout for `ray.wait` test (#5273) * Increase test timeout for ray.wait * make sure the actor is scheduled	2019-07-25 14:23:46 -07:00
Eric Liang	bf9199ad77	[rllib] ModelV2 support for pytorch (#5249)	2019-07-25 11:02:53 -07:00
Joey Jiang	40395acadf	[gRPC] Migrate raylet client implementation to grpc (#5120)	2019-07-25 14:48:56 +08:00
Eric Liang	60f59639c1	[rllib] Port DDPG to the build_tf_policy pattern (#5242)	2019-07-24 13:55:55 -07:00
Eric Liang	690b374581	[rllib] Add Keras LSTM example with ModelV2 (#5258)	2019-07-24 13:09:41 -07:00
Eric Liang	5b76238bce	Fix two types of eviction hangs (#5225)	2019-07-23 21:20:17 -07:00
Eric Liang	97c43284a6	[rllib] Fix trainer state restore (#5257)	2019-07-23 21:18:58 -07:00
Stephanie Wang	9c651f47bb	Add regression test for actor load balancing (#5224) * Add regression test for actor load balancing * Increase timeout * Reduce number of nodes?	2019-07-23 15:11:55 -07:00
Stephanie Wang	15959b0f0d	Leave `ray.wait` calls open until the task or actor exits (#5234) * Regression test * Split TaskDependencyManager::SubscribeDependencies into ray.get and ray.wait dependencies - Some initial implementation * unit test * Improve unit tests for TaskDependencyManager * Implement SubscribeWaitDependencies and UnsubscribeWaitDependencies, unit tests passing * Add ray.wait python test for drivers that exit early * Add WorkerID to Worker * Update test to use two nodes * Regression test for ray.wait passes * Extend regression test to include ray.wait from an actor * Fix ClientID and WorkerIDs * lint * lint * Remove unnecessary ray_get argument * fix build	2019-07-23 11:55:28 -07:00
Peter Schafhalter	fc589050c9	[sgd] Deprecate old distributed SGD implementation (#5160) * Deprecate old distributed SGD implementation * Update README	2019-07-22 15:47:10 -07:00
Richard Liaw	7fc15dbf7f	[autoscaler] Clean up error messages on setup failure (#5210)	2019-07-22 11:27:51 -07:00
Richard Liaw	53fb876a5f	Improved KeyboardInterrupt Exception Handling (#5237)	2019-07-22 02:29:56 -07:00
Eric Liang	f9043cc49a	[rllib] Remove experimental eager support	2019-07-21 12:27:17 -07:00
Richard Liaw	b0c0de49a2	[tune] Fixup exception messages (#5238)	2019-07-20 22:36:27 -07:00
Eric Liang	d58b986858	[rllib] MultiCategorical shouldn't return array for kl or entropy (#5215) * wip * fix	2019-07-19 12:12:04 -07:00
Jones Wong	da7676c925	Removed the implicit sync barrier at the end of each training iteration (#5217) * removed sync barrier at the end of each training iteration * formatted * modify the comment according to current semantics * lint check * Update trainer.py	2019-07-18 22:59:52 -07:00
Eric Liang	28e5c5555d	[rllib] Move some inline defs to avoid deserialization errors (#5228) * fix bug * move metrics too	2019-07-18 21:01:16 -07:00
Jones Wong	0af07bd493	Enable seeding actors for reproducible experiments (#5197) * enable graph-level worker-specific seed * lint checked * revised according to eric's suggestions * revised accordingly and added a test case * formated * Update test_reproducibility.py * Update trainer.py * Update rollout_worker.py * Update run_rllib_tests.sh * Update worker_set.py	2019-07-17 23:31:34 -07:00
Qingqing Mao	63f49f95dd	Improve memory check (#5216) * Improve MemoryMonitor - Add an env var to control the threshold. - Use cgroup memory limit and usage for container environment. * linting * white space * add comment	2019-07-17 23:30:02 -07:00
Jones Wong	81d297f87e	Remove redundant scaler of l2 reg (#5172) * remove redundant scaler of l2 reg * lint formatted * Update ddpg_policy.py	2019-07-17 15:11:27 -07:00
Jones Wong	ae03c42dd6	Fixed inconsistent action placeholder (#5213)	2019-07-17 10:55:14 -07:00
Sam Toyer	214f09d969	[rllib] Make RLLib handle zero-length observation arrays (#5208) * [rllib] Make _summarize handle zero-len arrays Fixes #5207 * [rllib] Make aligned_array() handle empty arrays * [rllib] Conform with old yapf	2019-07-16 22:37:57 -07:00
Richard Liaw	3e0ad11ae0	Add heartbeat test + Fix monitor.py (#5191)	2019-07-16 21:59:48 -07:00
Eric Liang	4fa2a6006c	[rllib] Remove nested import (#5204) * remove nested import * Update metrics.py	2019-07-16 10:52:56 -07:00
Eric Liang	047f4ccd61	[rllib] Fix rollout.py with tuple action space (#5201) * fix it * update doc too * fix rollout	2019-07-16 10:52:35 -07:00
Edward Oakes	e5be5fd46d	Remove dependencies from TaskExecutionSpecification (#5166)	2019-07-15 18:15:21 -07:00
Hao Chen	ea6aa6409a	Reconstruct failed actors without sending tasks. (#5161) * fast reconstruct dead actors * add test * fix typos * remove debug print * small fix * fix typos * Update test_actor.py	2019-07-15 10:25:09 -07:00
Jones Wong	5b13a7eb90	Keep parameter space noise consistent with action space noise (Fix 5173) (#5193) * make parameter space noise consistent with action space noise * modified according to lint check * indent	2019-07-14 12:20:35 -07:00
Philipp Moritz	322b5166ad	Update arrow to include user defined status for plasma (#5156)	2019-07-12 22:51:14 -07:00
Richard Liaw	b6509f46b0	Update wheels to 0.8.0dev2 (#5186)	2019-07-12 17:27:03 -07:00
Richard Liaw	1530389822	[tune] Fast Node Recovery (#5053)	2019-07-12 13:47:30 -07:00
Kristian Hartikainen	3456afdea7	[autoscaler] Fix missing body argument in GCP `getIamPolicy` #5169	2019-07-11 13:03:51 -07:00
Hao Chen	fd835d107e	Move task to common module and add checks in getter methods (#5147)	2019-07-11 17:07:04 +08:00
Qing Wang	f2293243cc	[ID Refactor] Shorten the length of JobID to 4 bytes (#5110) * WIP * Fix * Add jobid test * Fix * Add python part * Fix * Fix tes * Remove TODOs * Fix C++ tests * Lint * Fix * Fix exporting functions in multiple ray.init * Fix java test * Fix lint * Fix linting * Address comments. * FIx * Address and fix linting * Refine and fix * Fix * address * Address comments. * Fix linting * Fix * Address * Address comments. * Address * Address * Fix * Fix * Fix * Fix lint * Fix * Fix linting * Address comments. * Fix linting * Address comments. * Fix linting * address comments. * Fix	2019-07-11 14:25:16 +08:00
Kai Yang	43b6513d19	[GCS] Move node resource info from client table to resource table (#5050)	2019-07-11 13:17:19 +08:00
Richard Liaw	691c9733f9	[tune] Document trainable attributes and enable user-checkpoint… (#4868)	2019-07-10 18:51:11 -07:00
Richard Liaw	0b540ab492	[tune] Test example checkpointing (#4728)	2019-07-10 01:58:26 -07:00
Eric Liang	5ab5017c67	[rllib] Fix impala stress test (#5101) * add copy * upgrade to tf 1.14 * update * reduce count to workaround https://github.com/ray-project/ray/issues/5125 * Update impala.py * placeholder * comments * update	2019-07-09 20:22:30 -07:00

1510 commits