hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-06 10:31:39 -05:00

Author	SHA1	Message	Date
Eric Liang	8b76bab25c	[rllib] docs for td3 (#3381) * td3 doc * Update rllib-env.rst	2018-11-22 13:36:47 -08:00
Eric Liang	41b6b50d09	fix py3 (#3382)	2018-11-22 11:43:52 -08:00
GiliR4t1qbit	b9ae5edf74	When getting a role/profile, catch only exception that indicates the role/profile already exists, allow others to be raised (#3383)	2018-11-22 09:42:58 -08:00
Jones Wong	24bfe8ab76	Enable Twin Delayed DDPG for RLlib DDPG agent (#3353)	2018-11-21 20:03:20 -08:00
Stephanie Wang	6b3236349c	Fix memory leak in lineage cache (#3366) * Move children_ map inside Lineage * Update lineage_cache.cc * Test and fixes * Remove unused	2018-11-21 16:18:39 -08:00
Richard Liaw	784a6399b0	[tune] Node Fault Tolerance (#3238) This PR introduces single-node fault tolerance for Tune. ## Previous behavior: - Actors will be restarted without checking if resources are available. This can lead to problems if we lose resources. ## New behavior: - RUNNING trials will be resumed on another node on a best effort basis (meaning they will run if resources available). - If the cluster is saturated, RUNNING trials on that failed node will become PENDING and queued. - During recovery, TrialSchedulers and SearchAlgorithms should receive notification of this (via `trial_runner.stop_trial`) so that they don’t wait/block for a trial that isn’t running. Remaining questions: - Should `last_result` be consistent during restore? Yes; but not for earlier trials (trials that are yet to be checkpointed). - Waiting for some PRs to merge first (#3239) Closes #2851.	2018-11-21 12:38:16 -08:00
Stephanie Wang	3e33f6f71b	Fix failure handling for actor death (#3359) * Broadcast actor death, clean up dummy objects * Reduce logging and clean up state when failing a task * lint * Make actor failure test nicer, reduce node timeout	2018-11-21 12:26:22 -08:00
Philipp Moritz	1a926c9b7c	Fix $MACOSX_DEPLOYMENT_TARGET (#3337)	2018-11-21 10:56:17 -08:00
Eric Liang	686cf20951	Remove uses of std::list::size (#3358) * worker pool and client conn * Fix linting * unordered set * move	2018-11-20 14:47:55 -08:00
Richard Liaw	c24d87b4d1	[autoscaler] Submit command (#3312)	2018-11-20 14:03:34 -08:00
Philipp Moritz	d3697ce4e1	Ready queue refactor to make Dispatching tasks more efficient (#3324) * put queues outside * working version, still needs to be optimized * implement round robin * proper round robin * fix spillback * update * fix * cleanup * more cleanups * fix * fix * add documentation * explanation for hash combiner * speed it up * cleanup and linting * linting * comments * Update scheduling_queue.h * temp commit * fixes * update * fix * cleanup * cleanup * lint * more prints * more prints * increase sleep * documentation * sleep * fix * fix * sleep longer * update * fix * fix * fix * Add ordered_set container. * Fix * Linting * Constructors * Remove O(n) call to list.size(). * fixes * use ordered set * Fix. * Add documentation. * Add iterators to ordered_set container implementation. * iterator_type -> iterator * Make typedefs private * Add const_iterator * fix * fix test * linting * lint * update * add documentation * linting	2018-11-20 13:14:12 -08:00
Ujval Misra	b0bfd104f2	Batch heartbeats from node manager together in the monitor. (#3011)	2018-11-20 09:52:27 -08:00
Eric Liang	abdc3b592e	[rllib] Update multi-gpu impala numbers (#3327)	2018-11-19 20:55:27 -08:00
Eric Liang	5972c29d28	[rllib] Set ape-x local exploration to 0, also load explorations before training steps (#3349) ## What do these changes do? This should fix high explorations being used after restore / for rollouts. ## Related issue number (dev list issue)	2018-11-19 20:36:25 -08:00
Eric Liang	afc48d7b77	Don't setpgid() on actors (#3347)	2018-11-19 17:35:26 -08:00
Robert Nishihara	f2b5500642	Add ordered_set container. (#3352) * Add ordered_set container. * Fix * Linting * Constructors * Remove O(n) call to list.size(). * Fix. * Add documentation. * Add iterators to ordered_set container implementation. * iterator_type -> iterator * Make typedefs private * Add const_iterator	2018-11-19 17:01:18 -08:00
Eric Liang	d4dbd27e0d	Don't retry IPC connect an absurd number of times (#3355)	2018-11-19 16:23:59 -08:00
Eric Liang	e4bb5d8d16	Fix logging when ray cluster utils is used	2018-11-18 21:49:27 -08:00
Eric Liang	61e3bbbfee	Update stale example links	2018-11-17 15:40:38 -08:00
Robert Nishihara	5cbc597494	Suppress duplicate pre-emptive object pushes. (#3276) * Suppress duplicate pre-emptive object pushes. * Add test. * Fix linting * Remove timer and inline recent_pushes_ into local_objects_. * Improve test. * Fix * Fix linting * Enable retrying pull from same object manager. Randomize object manager. * Speed up test * Linting * Add test. * Minor * Lengthen pull timeout and reissue pull every time a new object becomes available. * Increase pull timeout in test. * Wait for nodes to start in object manager test. * Wait longer for nodes to start up in test. * Small fixes. * _submit -> _remote * Change assert to warning.	2018-11-16 23:02:45 -08:00
Wenting Shen	ab1e0f5c2f	support home path and relative path for temp-dir (#3329)	2018-11-16 17:41:10 -08:00
Robert Nishihara	60b22d9a72	Don't unsubscribe dependencies for infeasible tasks. (#3338) * Make scheduling queues RemoveTasks return task states as well. * Add test * Don't unsubscribe for infeasible tasks when spilling over. * Linting * Address comments.	2018-11-16 11:33:00 -08:00
Eric Liang	e0bf9d7305	Add debug string to raylet (#3317) * initial debug string * format * wip debug string * fix compile * fix * update * finished * to file * logs dir * use temp root * fix * override	2018-11-15 21:47:50 -08:00
Robert Nishihara	d10cb570ab	Rename _submit -> _remote. (#3321)	2018-11-15 15:30:18 -08:00
Robert Nishihara	98edf752a9	Note requirement cython==0.27.3 in installation instructions. (#3322)	2018-11-15 15:27:19 -08:00
Philipp Moritz	1be1455d86	Fix redis crash when duplicate messages are appended to log. (#3316)	2018-11-15 15:09:39 -08:00
Eric Liang	5723291db6	Raise exception if the node is nearly out of memory (#3323) * wip * add * comment * escape hatch * update * object store too * .2	2018-11-15 12:55:25 -08:00
Philipp Moritz	b6a12d1f97	Fix socket retry message (#3325)	2018-11-15 12:14:19 -08:00
Lewis Belcher	5319fd044c	Update redis version in setup.py (#3333) * `redis` has released a new version (https://github.com/andymccurdy/redis-py/releases/tag/3.0.0) * `ray` is not compatible with this version * This PR adds the "compatible release" operator for `redis` version 2.10.6.	2018-11-15 10:40:08 -08:00
Eric Liang	706dc1d473	[rllib] Add test for multi-agent support and fix IMPALA multi-agent (#3289) IMPALA support for multiagent was broken since IMPALA has a requirement that batch sizes be of a certain length. However multi-agent envs can create variable-length batches. Fix this by adding zero-padding as needed (similar to the RNN case).	2018-11-14 14:14:07 -08:00
andrewztan	57c7b4238e	KL Divergence Metrics (#3300) * added KL divergence metrics * fix	2018-11-13 23:12:35 -08:00
Eric Liang	1660c9d627	Kill actor child processes on shutdown (#3297) * example * add env * test pg * change to test * add atexit test * Update rllib-env.rst * comment * revert unnecessary file * fix title when actor is idle * Update python/ray/actor.py Co-Authored-By: ericl <ekhliang@gmail.com>	2018-11-13 19:16:42 -08:00
Stephanie Wang	577c1dda74	Release sender connections as soon as WriteMessageAsync completes (#3313)	2018-11-13 21:32:24 -05:00
Wang Qing	9d4847ad2d	[hot-fix] Fix error when calling Ray.init() twice. (#3314)	2018-11-13 21:21:54 -05:00
Eric Liang	65c27c70cf	[rllib] Clean up agent resource configurations (#3296) Closes #3284	2018-11-13 18:00:03 -08:00
Philipp Moritz	d4fad222e1	Update profiling instructions for raylet (#3311)	2018-11-13 17:48:33 -05:00
Richard Liaw	97f423781b	Clean up Ray processes after cluster util exits (#3278)	2018-11-13 13:18:12 -08:00
Richard Liaw	c3a2c7ebed	[tune] Doc: Autofilled, StatusReporter (#3294) * autofill and revise doc page for things * lint * comments	2018-11-13 13:15:56 -08:00
Eric Liang	6ee7a3b571	[rllib] Raise worker TF intra_op threads to 2, lower driver intra_op threads to 8 (#3299)	2018-11-13 11:41:58 -08:00
Richard Liaw	c0423db05c	[core] Add Global State Test for multi-node setting (#3239) * add test for adding node * multinode test fixes * First pass at allowing updatable values * Fix compilation issues * Add config file parsing * Full initialization * Wrote a good test * configuration parsing and stuff * docs * write some tests, make it good * fixed init * Add all config options and bring back stress tests. * Update python/ray/worker.py * Update python/ray/worker.py * Fix internalization * some last changes * Linting and Java fix * add docstring * Fix test, add assertions * pytest ext * lint * lint	2018-11-13 10:35:24 -08:00
Eric Liang	d90f365394	[rllib] Add self-supervised loss to model (#3291) # What do these changes do? Allow self-supervised losses to be easily defined in custom models. Add this to the reference policy graphs.	2018-11-12 18:55:24 -08:00
Philipp Moritz	ce6e01b988	enable incremental builds (#3292)	2018-11-12 21:49:09 -05:00
Eric Liang	bd0dbde149	[rllib] Rename ServingEnv => ExternalEnv (#3302)	2018-11-12 16:31:27 -08:00
Richard Liaw	e37891d79d	[tune] Fix default handling for timesteps (#3293) This PR fixes an issue where previously if timesteps_this_iter = 0, then it would render as "None". Closes #3057.	2018-11-12 15:52:17 -08:00
Eric Liang	49e2085d78	[rllib] Don't reset envs when possible (#3290) * laz * better errors	2018-11-11 01:45:37 -08:00
Eric Liang	463511f8a6	[tune] Track and warn on low memory (#3298)	2018-11-11 00:29:45 -08:00
Eric Liang	53489d2f85	[sgd] Document and add simple MNIST example (#3236)	2018-11-10 21:52:20 -08:00
Ion	d681893b0f	Speed up task dispatch. (#3234) * speed up task dispatch * minor changes * improved comments * improved comments * change argument of DispatchTasks to list of tasks * dispatch only tasks whose dependencies have been fulfilled * some updated comments * refactored DispatchQueue() and Assigntask() to avoid the copy of the ready list * minor fixes * some more minor fixes * some more minor fixes * added more comments * better comments? * fixed all feedback comments, minus making the argument of AssignTask() const * Assigntask() now takes a const argument * Do the task copy outside of the callback * fix linting	2018-11-10 09:55:12 -08:00
Richard Liaw	29c182d449	[tune] Support "None" for upload_dir	2018-11-09 22:02:08 -08:00
Eric Liang	a51d618d88	[autoscaler] missing example-full.yaml file in the latest wheel for provider type "local"	2018-11-09 21:25:15 -08:00

2214 commits