hiro/ray

Mirror of https://github.com/vale981/ray, synced 2025-03-06 10:31:39 -05:00

Author	SHA1	Message	Date
Robert Nishihara	20b8b1d891	Add script for running stress tests. (#3378) * Add script for running stress tests. * Add an actor tree test where actors die with some probability * Improve test. * Small fix * Update tests. * Minor change	2018-11-27 04:28:02 -08:00
Eric Liang	e3c088fa1e	[rllib] PPO doesn't work with fractional num gpus (#3396) * frac ppo * gpu test	2018-11-27 01:14:10 -08:00
Eric Liang	aa94d3dd50	[autoscaler] Allow more than 5s from node creation to first heartbeat (#3385)	2018-11-26 17:25:05 -08:00
Robert Nishihara	0f0099fb90	UI changes, fix the task timeline and add the object transfer timeline to UI. (#3397) * Saving * Fix cmake and remove object/task search boxes. * Add comment	2018-11-25 10:16:49 -08:00
Eric Liang	b85e7b43f3	[rllib] Refactor the sampler (#3387) * refactor * fix test * add perf test * Update sampler.py	2018-11-24 18:16:54 -08:00
Robert Nishihara	3856533065	Fix incompatibility with most recent version of Redis. (#3379) * Fix incompatibility with most recent version of Redis. * Fix * Fixes.	2018-11-24 16:36:38 -08:00
Eric Liang	18a8dbfcfb	[rllib] Clip DDPG ou-noise to avoid exceeding action bounds (#3386) Closes #2965	2018-11-24 00:56:50 -08:00
Eric Liang	55fca828ce	[rllib] Fix use_lstm option when using custom model with dict space (#3368) ## What do these changes do? This passes in the right obs space to the lstm model wrapper, so that it doesn't attempt to un-flatten the already processed dict observation. ## Related issue number Closes https://github.com/ray-project/ray/issues/3367	2018-11-23 22:51:08 -08:00
Eric Liang	8b76bab25c	[rllib] docs for td3 (#3381) * td3 doc * Update rllib-env.rst	2018-11-22 13:36:47 -08:00
Eric Liang	41b6b50d09	fix py3 (#3382)	2018-11-22 11:43:52 -08:00
GiliR4t1qbit	b9ae5edf74	When getting a role/profile, catch only exception that indicates the role/profile already exists, allow others to be raised (#3383)	2018-11-22 09:42:58 -08:00
Jones Wong	24bfe8ab76	Enable Twin Delayed DDPG for RLlib DDPG agent (#3353)	2018-11-21 20:03:20 -08:00
Stephanie Wang	6b3236349c	Fix memory leak in lineage cache (#3366) * Move children_ map inside Lineage * Update lineage_cache.cc * Test and fixes * Remove unused	2018-11-21 16:18:39 -08:00
Richard Liaw	784a6399b0	[tune] Node Fault Tolerance (#3238) This PR introduces single-node fault tolerance for Tune. ## Previous behavior: - Actors will be restarted without checking if resources are available. This can lead to problems if we lose resources. ## New behavior: - RUNNING trials will be resumed on another node on a best effort basis (meaning they will run if resources available). - If the cluster is saturated, RUNNING trials on that failed node will become PENDING and queued. - During recovery, TrialSchedulers and SearchAlgorithms should receive notification of this (via `trial_runner.stop_trial`) so that they don’t wait/block for a trial that isn’t running. Remaining questions: - Should `last_result` be consistent during restore? Yes; but not for earlier trials (trials that are yet to be checkpointed). - Waiting for some PRs to merge first (#3239) Closes #2851.	2018-11-21 12:38:16 -08:00
Stephanie Wang	3e33f6f71b	Fix failure handling for actor death (#3359) * Broadcast actor death, clean up dummy objects * Reduce logging and clean up state when failing a task * lint * Make actor failure test nicer, reduce node timeout	2018-11-21 12:26:22 -08:00
Philipp Moritz	1a926c9b7c	Fix $MACOSX_DEPLOYMENT_TARGET (#3337)	2018-11-21 10:56:17 -08:00
Eric Liang	686cf20951	Remove uses of std::list::size (#3358) * worker pool and client conn * Fix linting * unordered set * move	2018-11-20 14:47:55 -08:00
Richard Liaw	c24d87b4d1	[autoscaler] Submit command (#3312)	2018-11-20 14:03:34 -08:00
Philipp Moritz	d3697ce4e1	Ready queue refactor to make Dispatching tasks more efficient (#3324) * put queues outside * working version, still needs to be optimized * implement round robin * proper round robin * fix spillback * update * fix * cleanup * more cleanups * fix * fix * add documentation * explanation for hash combiner * speed it up * cleanup and linting * linting * comments * Update scheduling_queue.h * temp commit * fixes * update * fix * cleanup * cleanup * lint * more prints * more prints * increase sleep * documentation * sleep * fix * fix * sleep longer * update * fix * fix * fix * Add ordered_set container. * Fix * Linting * Constructors * Remove O(n) call to list.size(). * fixes * use ordered set * Fix. * Add documentation. * Add iterators to ordered_set container implementation. * iterator_type -> iterator * Make typedefs private * Add const_iterator * fix * fix test * linting * lint * update * add documentation * linting	2018-11-20 13:14:12 -08:00
Ujval Misra	b0bfd104f2	Batch heartbeats from node manager together in the monitor. (#3011)	2018-11-20 09:52:27 -08:00
Eric Liang	abdc3b592e	[rllib] Update multi-gpu impala numbers (#3327)	2018-11-19 20:55:27 -08:00
Eric Liang	5972c29d28	[rllib] Set ape-x local exploration to 0, also load explorations before training steps (#3349) ## What do these changes do? This should fix high explorations being used after restore / for rollouts. ## Related issue number (dev list issue)	2018-11-19 20:36:25 -08:00
Eric Liang	afc48d7b77	Don't setpgid() on actors (#3347)	2018-11-19 17:35:26 -08:00
Robert Nishihara	f2b5500642	Add ordered_set container. (#3352) * Add ordered_set container. * Fix * Linting * Constructors * Remove O(n) call to list.size(). * Fix. * Add documentation. * Add iterators to ordered_set container implementation. * iterator_type -> iterator * Make typedefs private * Add const_iterator	2018-11-19 17:01:18 -08:00
Eric Liang	d4dbd27e0d	Don't retry IPC connect an absurd number of times (#3355)	2018-11-19 16:23:59 -08:00
Eric Liang	e4bb5d8d16	Fix logging when ray cluster utils is used	2018-11-18 21:49:27 -08:00
Eric Liang	61e3bbbfee	Update stale example links	2018-11-17 15:40:38 -08:00
Robert Nishihara	5cbc597494	Suppress duplicate pre-emptive object pushes. (#3276) * Suppress duplicate pre-emptive object pushes. * Add test. * Fix linting * Remove timer and inline recent_pushes_ into local_objects_. * Improve test. * Fix * Fix linting * Enable retrying pull from same object manager. Randomize object manager. * Speed up test * Linting * Add test. * Minor * Lengthen pull timeout and reissue pull every time a new object becomes available. * Increase pull timeout in test. * Wait for nodes to start in object manager test. * Wait longer for nodes to start up in test. * Small fixes. * _submit -> _remote * Change assert to warning.	2018-11-16 23:02:45 -08:00
Wenting Shen	ab1e0f5c2f	support home path and relative path for temp-dir (#3329)	2018-11-16 17:41:10 -08:00
Robert Nishihara	60b22d9a72	Don't unsubscribe dependencies for infeasible tasks. (#3338) * Make scheduling queues RemoveTasks return task states as well. * Add test * Don't unsubscribe for infeasible tasks when spilling over. * Linting * Address comments.	2018-11-16 11:33:00 -08:00
Eric Liang	e0bf9d7305	Add debug string to raylet (#3317) * initial debug string * format * wip debug string * fix compile * fix * update * finished * to file * logs dir * use temp root * fix * override	2018-11-15 21:47:50 -08:00
Robert Nishihara	d10cb570ab	Rename _submit -> _remote. (#3321)	2018-11-15 15:30:18 -08:00
Robert Nishihara	98edf752a9	Note requirement cython==0.27.3 in installation instructions. (#3322)	2018-11-15 15:27:19 -08:00
Philipp Moritz	1be1455d86	Fix redis crash when duplicate messages are appended to log. (#3316)	2018-11-15 15:09:39 -08:00
Eric Liang	5723291db6	Raise exception if the node is nearly out of memory (#3323) * wip * add * comment * escape hatch * update * object store too * .2	2018-11-15 12:55:25 -08:00
Philipp Moritz	b6a12d1f97	Fix socket retry message (#3325)	2018-11-15 12:14:19 -08:00
Lewis Belcher	5319fd044c	Update redis version in setup.py (#3333) * `redis` has released a new version (https://github.com/andymccurdy/redis-py/releases/tag/3.0.0) * `ray` is not compatible with this version * This PR adds the "compatible release" operator for `redis` version 2.10.6.	2018-11-15 10:40:08 -08:00
Eric Liang	706dc1d473	[rllib] Add test for multi-agent support and fix IMPALA multi-agent (#3289) IMPALA support for multiagent was broken since IMPALA has a requirement that batch sizes be of a certain length. However multi-agent envs can create variable-length batches. Fix this by adding zero-padding as needed (similar to the RNN case).	2018-11-14 14:14:07 -08:00
andrewztan	57c7b4238e	KL Divergence Metrics (#3300) * added KL divergence metrics * fix	2018-11-13 23:12:35 -08:00
Eric Liang	1660c9d627	Kill actor child processes on shutdown (#3297) * example * add env * test pg * change to test * add atexit test * Update rllib-env.rst * comment * revert unnecessary file * fix title when actor is idle * Update python/ray/actor.py Co-Authored-By: ericl <ekhliang@gmail.com>	2018-11-13 19:16:42 -08:00
Stephanie Wang	577c1dda74	Release sender connections as soon as WriteMessageAsync completes (#3313)	2018-11-13 21:32:24 -05:00
Wang Qing	9d4847ad2d	[hot-fix] Fix error when calling Ray.init() twice. (#3314)	2018-11-13 21:21:54 -05:00
Eric Liang	65c27c70cf	[rllib] Clean up agent resource configurations (#3296) Closes #3284	2018-11-13 18:00:03 -08:00
Philipp Moritz	d4fad222e1	Update profiling instructions for raylet (#3311)	2018-11-13 17:48:33 -05:00
Richard Liaw	97f423781b	Clean up Ray processes after cluster util exits (#3278)	2018-11-13 13:18:12 -08:00
Richard Liaw	c3a2c7ebed	[tune] Doc: Autofilled, StatusReporter (#3294) * autofill and revise doc page for things * lint * comments	2018-11-13 13:15:56 -08:00
Eric Liang	6ee7a3b571	[rllib] Raise worker TF intra_op threads to 2, lower driver intra_op threads to 8 (#3299)	2018-11-13 11:41:58 -08:00
Richard Liaw	c0423db05c	[core] Add Global State Test for multi-node setting (#3239) * add test for adding node * multinode test fixes * First pass at allowing updatable values * Fix compilation issues * Add config file parsing * Full initialization * Wrote a good test * configuration parsing and stuff * docs * write some tests, make it good * fixed init * Add all config options and bring back stress tests. * Update python/ray/worker.py * Update python/ray/worker.py * Fix internalization * some last changes * Linting and Java fix * add docstring * Fix test, add assertions * pytest ext * lint * lint	2018-11-13 10:35:24 -08:00
Eric Liang	d90f365394	[rllib] Add self-supervised loss to model (#3291) # What do these changes do? Allow self-supervised losses to be easily defined in custom models. Add this to the reference policy graphs.	2018-11-12 18:55:24 -08:00
Philipp Moritz	ce6e01b988	enable incremental builds (#3292)	2018-11-12 21:49:09 -05:00

2322 commits