hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-06 18:41:40 -05:00

Author	SHA1	Message	Date
Eric Liang	aad48ee5a5	[tune] Fully deprecate raw function literals in Tune (#3788 ) Related: https://github.com/ray-project/ray/issues/3785	2019-01-19 17:09:36 -08:00
Michael Luo	16f7ca45e4	Appo (#3779 ) * Deleted old fork, updated new ray and moved PPO-impala to APPO in ppo folder * Deleted unneccesary vtrace.py file * Update pong-impala.yaml * Cleaned PPO Code * Update pong-impala.yaml * Update pong-impala.yaml * wip * new ifle * refactor * add vtrace off option * revert * support any space * docs * fix comment * remove kl * Update cartpole-appo-vtrace.yaml	2019-01-18 13:40:26 -08:00
Robert Nishihara	9af5a62e05	Give better error for old-style actor classes. (#3793 )	2019-01-17 19:05:04 -08:00
Richard Liaw	0537508106	Bump strings for 0.6.2 (#3801 )	2019-01-17 19:03:27 -08:00
Jones Wong	319c1340cb	[rllib] Develop MARWIL (#3635 ) * add marvil policy graph * fix typo * add offline optimizer and enable running marwil * fix loss function * add maintaining the moving average of advantage norm * use sync replay optimizer for unifying * remove offline optimizer and use sync replay optimizer * format by yapf * add imitation learning objective * fix according to eric's review * format by yapf * revise * add test data * marwil	2019-01-16 19:00:43 -08:00
Richard Liaw	75ac016e2b	Bump version (#3787 )	2019-01-16 11:40:54 -08:00
Richard Liaw	fa99fda2b4	Application Stress Tests (#3612 )	2019-01-16 02:05:16 -08:00
Richard Liaw	c28e6d41f5	[tune] Avoid overwriting checkpoint file (#3781 )	2019-01-16 02:03:16 -08:00
Eric Liang	401e656b95	[rllib] Sync filters at end of iteration not start; hierarchical docs (#3769 )	2019-01-15 16:25:25 -08:00
Richard Liaw	3918934dfd	[tune] Cross-Node Recovery (#3725 ) Augments trial restore to also check if the runner is at the same location. If not, the checkpoint files are pushed onto the new location.	2019-01-15 10:37:28 -08:00
Si-Yuan	a5df8e3532	minor fix (#3770 )	2019-01-14 13:52:51 -08:00
Robert Nishihara	19908c01b8	Use environment markers to only install faulthandler in Python < 3.3. (#3764 )	2019-01-14 15:55:59 +08:00
Eugene Vinitsky	a5d1f03515	[rllib] fix for rollout of lstm policies (#3643 ) * fix for lstm policies * added call to local evaluator * Update python/ray/rllib/rollout.py Co-Authored-By: eugenevinitsky <eugenevinitsky@users.noreply.github.com> * Update rollout.py * Update rollout.py	2019-01-13 15:54:23 -08:00
Philipp Moritz	00e9f8d870	Fix pyarrow version (#3760 )	2019-01-13 14:28:23 -08:00
Yuhong Guo	d2cf8561f2	Refactor code about ray.ObjectID. (#3674 ) * Refactor code about ray.ObjectID. * remove from_random and use nil_id instead of constructor * remove id() in hash * Lint and fix * Change driver id to ObjectID * Replace binary_to_hex(ObjectID.id()) to ObjectID.hex()	2019-01-13 01:47:29 -08:00
Eric Liang	c4b058739b	Remove redundant error message (#3761 )	2019-01-12 22:22:41 -08:00
James Casbon	528bb3afd9	gcp allow manual network configuration (#3748 )	2019-01-12 14:02:20 -08:00
Robert Nishihara	fbea1ece2e	Clear new actor handle list after submitting task. (#3755 )	2019-01-12 23:25:40 +08:00
Robert Nishihara	8723d6b061	Define a Node class to manage Ray processes. (#3733 ) * Implement Node class and move most of services.py into it. * Wait for nodes as they are added to the cluster. * Fix Redis authentication bug. * Fix bug in client table ordering. * Address comments. * Kill raylet before plasma store in test. * Minor	2019-01-11 22:30:38 -08:00
Stephanie Wang	cc5ecd71c5	[autoscaler] Add kill and get IP commands to CLI for testing (#3731 ) ## What do these changes do? Adds 2 commands to the CLI that take in an autoscaler config: 1. Kill a random ray node in the cluster. 2. Get all the worker node IP addresses. These commands are both for testing and are not recommended for normal use. ## Related issue number Closes #3685.	2019-01-10 22:06:57 -08:00
Richard Liaw	574f0b73bc	[tune] Fix Trial Serialization (#3743 )	2019-01-10 19:26:10 -08:00
Hao Chen	597abb24ea	Refine multi-threading support (#3672 ) * [Python] refine multi-threading support fix * [java] refine multithreading code fix java * format	2019-01-10 13:58:11 -08:00
Eric Liang	71243203a4	[rllib] Fix KeyError: 'kl' in multiagent ppo training	2019-01-09 19:33:07 -08:00
Richard Liaw	edb7aaf7c7	[tune] Better Serialization for Server (#3708 ) * Add cloudpickle for serialization * Fix tests	2019-01-09 11:55:32 -08:00
Stephanie Wang	04f31db54d	Actor dummy object garbage collection (#3593 ) * Convert UniqueID::nil() to a constructor * Cleanup actor handle pickling code * Add new actor handles to the task spec * Pass in new actor handles * Add new handles to the actor registration * Regression test for actor handle forking and GC * lint and doc * Handle pickled actor handles in the backend and some refactoring * Add regression test for dummy object GC and pickled actor handles * Check for duplicate actor tasks on submission * Regression test for forking twice, fix failed named actor leak * Fix bug for forking twice * lint * Revert "Fix bug for forking twice" This reverts commit 3da85e59d401e53606c2e37ffbebcc8653ff27ac. * Add new actor handles when task is assigned, not finished * Remove comment * remove UniqueID() * Updates * update * fix * fix java * fixes * fix	2019-01-09 10:37:11 -08:00
Wenting Shen	3027dde303	Fix some storage problems of RayLog (#3595 ) 1. Fix the problem of duplicated stored logs. 2. Save log whose level is higher than severity_threshold, not only with severity_threshold. 3. Fix a `log_dir` bug: storing logs in a wrong path.	2019-01-09 13:54:21 +08:00
Robert Nishihara	d1e21b702e	Change timeout from milliseconds to seconds in ray.wait. (#3706 ) * Change timeout from milliseconds to seconds in ray.wait. * Suppress warning. * Suppress warning. * Add prominent warning in API documentation.	2019-01-08 21:32:08 -08:00
Si-Yuan	59d861281e	Bug fixing: Redis password should be used when reporting errors. (#3724 )	2019-01-08 21:23:55 -08:00
Robert Nishihara	6bbc667f93	Remove unused code path in services.py. (#3722 )	2019-01-08 19:57:16 -08:00
Peter Schafhalter	5945b92fd3	[sgd] Add checkpointing (#3638 )	2019-01-08 15:29:30 -08:00
Robert Nishihara	5e76d52868	Improve cluster.wait_for_nodes() API. (#3712 ) * Separate out functionality for querying client table and improve cluster.wait_for_nodes() API. * Linting * Add back logging statements. * info -> debug	2019-01-07 21:26:58 -08:00
Richard Liaw	33319502b6	[tune] Add a callable check for converting to trainable (#3711 )	2019-01-07 16:18:29 -08:00
Robert Nishihara	5dadac148c	Remove unused file. (#3695 )	2019-01-07 12:45:48 -08:00
Robert Nishihara	c9d70f0dda	Remove num_local_schedulers argument from ray.worker._init. (#3704 ) * Remove num_local_schedulers argument from ray.worker._init. * Fix * Fix tests.	2019-01-07 12:44:49 -08:00
Eric Liang	e78562b2e8	[rllib] Misc fixes: set lr for PG, better error message for LSTM/PPO, fix multi-agent/APEX (#3697 ) * fix * update test * better error * compute * eps fix * add get_policy() api * Update agent.py * better err msg * fix * pass in rew	2019-01-06 19:37:35 -08:00
Richard Liaw	8934e37a78	[tune] Change log handling for Tune (#3661 ) Also provides a small retry mechanism for a transient error as reported by #3340. Closes #3653.	2019-01-06 13:20:10 -08:00
mattearllongshot	681e8cd3fd	[autoscaler] Add an initial_workers option (#3530 ) ## What do these changes do? This option goes along with `min_workers`, and `max_workers`. When the cluster is first brought up (or when it is refreshed with a subsequent `ray up`) this number of nodes will be started. It's a workaround for issues of scaling (see related issues) where it can take a long time (or forever in the case where the head node has `--num-cpus 0`) to scale up a cluster in response to increasing demand. ## Related issue number Workaround for https://github.com/ray-project/ray/issues/3339 and https://github.com/ray-project/ray/issues/2106	2019-01-05 17:58:42 -08:00
Robert Nishihara	067976ad3d	Push a warning to all users when large number of workers have been started. (#3645 ) * Push a warning to all users when large number of workers have been started. * Add test. * Fix bug. * Give warning when worker starts instead of when worker registers. * Fix * Fix tests	2019-01-05 13:27:32 -08:00
Eric Liang	03fe760616	[rllib] Model self loss isn't included in all algorithms (#3679 )	2019-01-04 22:30:35 -08:00
Richard Liaw	960a943503	[tune] Fault Tolerance: handle lost checkpoints by restart (#3657 ) Checks that node failure with lost checkpoints does not crash. Also adds test.	2019-01-04 22:05:27 -08:00
Eric Liang	7db1f3be2a	[tune] resume=False by default but print a tip to set resume="prompt" + jenkins fix (#3681 )	2019-01-04 17:23:19 -08:00
Kristian Hartikainen	747b117929	[tune] Tweak/allow nested pbt mutations (#3455 ) * Fix warning text in pbt logger * Allow nested mutations in pbt by recursing explore function * Add test for nested pbt mutation * Update pbt explore to only call custom explore on top level * fix test	2019-01-04 13:51:11 -08:00
Robert Nishihara	cd80891ddb	Try to figure out the memory limit in a docker container. (#3605 ) * Try to figure out the memory limit in a docker container. * Update comment * Fix * Fix	2019-01-03 23:07:24 -08:00
Robert Nishihara	586a5c9ffa	Limit default redis max memory to 10GB. (#3630 ) * Limit Redis max memory to 10GB/shard by default. * Update stress tests. * Reorganize * Update * Add minimum cap size for object store and redis. * Small test update.	2019-01-03 13:23:54 -08:00
Yuhong Guo	4b23a34c93	Fix multi-thread problem of function manager and Jenkins test (#3648 )	2019-01-03 17:05:13 +08:00
Eric Liang	ca864faece	[rllib] Documentation for I/O API and multi-agent support / cleanup (#3650 )	2019-01-03 15:15:36 +08:00
opherlieber	2177e2f410	[rllib] Agent: Allow unknown subkeys for custom_resources_per_worker (#3639 ) * RLLib Agent: Allow unknown subkeys for custom_resources_per_worker * Update agent.py	2019-01-03 14:19:59 +08:00
Eric Liang	47d36d7bd6	[rllib] Refactor pytorch custom model support (#3634 )	2019-01-03 13:48:33 +08:00
Robert Nishihara	b6bcd18d65	Split profile table among many keys in the GCS. (#3676 ) * Divide profile table among many keys in GCS. * Fix, and remove --collect-profiling-data arg. * Remove reference in doc.	2019-01-02 21:33:01 -08:00
Si-Yuan	93d54110f8	Prevent overriding faulthandler settings (#3668 ) This change ensures that Ray set up fault handlers only if it has not been enabled by other applications. Otherwise some applications could face strange issues when using Ray, and some unittests using xml runners will fail.	2018-12-31 16:36:26 -08:00

1294 commits