
hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-09 12:56:46 -04:00

Author	SHA1	Message	Date
Robert Nishihara	9868af4c7c	Use /tmp instead of /dev/shm for object store on Linux if /dev/shm is too small. (#3149) * Use /tmp instead of /dev/shm for object store on Linux if /dev/shm is too small. * Add logging statement and address comments. * Fix	2018-10-28 20:09:06 -07:00
Robert Nishihara	08fc9e5bcd	Add more description to setup.py. (#3153)	2018-10-28 19:49:52 -07:00
Robert Nishihara	fd854ff090	Allow the node manager port and object manager port to be set through… (#3130) * Allow the node manager port and object manager port to be set through ray start. * Linting * Fix Java test * Address comments.	2018-10-28 17:28:41 -07:00
Eric Liang	a404401dc6	Update agent.py to fix lint error	2018-10-28 15:28:08 -07:00
Jones Wong	d6bf890648	Solve hang caused by ray.get in collect_metrics (#3096)	2018-10-28 11:52:18 -07:00
Eric Liang	af0c1174cd	[sgd] Merge sharded param server based SGD implementation (#3033) This includes most of the TF code used for the OSDI experiment. Perf sanity check on p3.16xl instances: Overall scaling looks ok, with the multi-node results within 5% of OSDI final numbers. This seems reasonable given that hugepages are not enabled here, and the param server shards are placed randomly. $ RAY_USE_XRAY=1 ./test_sgd.py --gpu --batch-size=64 --num-workers=N \ --devices-per-worker=M --strategy=<simple|ps> \ --warmup --object-store-memory=10000000000 Images per second total gpus total | simple | ps ======================================== 1 | 218 2 (1 worker) | 388 4 (1 worker) | 759 4 (2 workers) | 176 | 623 8 (1 worker) | 985 8 (2 workers) | 349 | 1031 16 (2 nodes, 2 workers) | 600 | 1661 16 (2 nodes, 4 workers) | 468 | 1712 <--- OSDI perf was 1817	2018-10-27 21:25:02 -07:00
Eric Liang	6531eed2d0	[rllib] Better error message when action space dim too high (#3119)	2018-10-26 16:55:00 -07:00
Robert Nishihara	658c14282c	Remove legacy Ray code. (#3121) * Remove legacy Ray code. * Fix cmake and simplify monitor. * Fix linting * Updates * Fix * Implement some methods. * Remove more plasma manager references. * Fix * Linting * Fix * Fix * Make sure class IDs are strings. * Some path fixes * Fix * Path fixes and update arrow * Fixes. * linting * Fixes * Java fixes * Some java fixes * TaskLanguage -> Language * Minor * Fix python test and remove unused method signature. * Fix java tests * Fix jenkins tests * Remove commented out code.	2018-10-26 13:36:58 -07:00
Eric Liang	055daf17a0	[autoscaler] better message if there are more than 10 key pairs	2018-10-26 12:42:11 -07:00
Philipp Moritz	d3148cc3ab	[SGD] Provide better error message if model graphs have different numbers of variables (#3139)	2018-10-25 22:18:10 -07:00
Robert Nishihara	5aa29613db	Fix linting errors. (#3127)	2018-10-24 16:30:00 -07:00
Eric Liang	55d161b49f	[autoscaler] Also grant roles to worker nodes	2018-10-24 13:57:36 -07:00
Robert Nishihara	9c1826ed69	Use XRay backend by default. (#3020) * Use XRay backend by default. * Remove irrelevant valgrind tests. * Fix * Move tests around. * Fix * Fix test * Fix test. * String/unicode fix. * Fix test * Fix unicode issue. * Minor changes * Fix bug in test_global_state.py. * Fix test. * Linting * Try arrow change and other object manager changes. * Use newer plasma client API * Small updates * Revert plasma client api change. * Update * Update arrow and allow SendObjectHeaders to fail. * Update arrow * Update python/ray/experimental/state.py Co-Authored-By: robertnishihara <robertnishihara@gmail.com> * Address comments.	2018-10-23 12:46:39 -07:00
Robert Nishihara	9d2e864caf	Fix Python linting error. (#3113)	2018-10-22 23:41:42 -07:00
Eric Liang	73a092e08c	update (#3112)	2018-10-22 22:55:43 -07:00
Richard Liaw	eff7cb4458	[tune] Fix SearchAlg finishing early (#3081) * Fix trial search alg finishing early * Fix lint * fix lint * nit fix	2018-10-22 12:17:13 -07:00
Eric Liang	221d1663c1	[rllib] switch to python logger (#3098) * logg * set rllib logger * comment * info * rlib * comment * add format * fix lint * add file info * update * add ts * lint * better docs * fix value error * soft log level	2018-10-21 23:43:57 -07:00
Richard Liaw	40c4148d4f	Cluster Utilities for Fault Tolerance Tests (#3008)	2018-10-20 22:56:29 -07:00
Eric Liang	59901a88a0	[rllib] Native support for Dict and Tuple spaces; fix Tuple action spaces; add prev a, r to LSTM (#3051)	2018-10-20 15:21:22 -07:00
Peter Schafhalter	fa469783d8	Fix bug when connecting to password-secured cluster (#3083)	2018-10-18 21:43:03 -07:00
Devin Petersohn	8fcdafc6ea	Adding Python3.7 wheels support (#2546) * Adding Python3.7 wheels support * Adding Mac wheels update * fix * numpy version * choose different numpy versions depending on python version * fix	2018-10-18 17:58:39 -07:00
Peter Schafhalter	b82fd157a7	Remove Redis protected mode (#3073) Follow-up to #2925 and #2952. Removes the Redis protected mode implementation from Ray which was replaced by Redis port authentication.	2018-10-17 22:48:14 -07:00
Philipp Moritz	2c52d9dfa0	Fix actor handle id creation when actor handle was pickled (#3074)	2018-10-17 18:00:52 -07:00
Richard Liu	3c0803e7e9	[rllib] use `ray.wait` to get next worker result in async sample optimizer (#2993)	2018-10-17 17:44:51 -07:00
Peter Schafhalter	a41bbc10ef	Add password authentication to Redis ports (#2952) * Implement Redis authentication * Throw exception for legacy Ray * Add test * Formatting * Fix bugs in CLI * Fix bugs in Raylet * Move default password to constants.h * Use pytest.fixture * Fix bug * Authenticate using formatted strings * Add missing passwords * Add test * Improve authentication of async contexts * Disable Redis authentication for credis * Update test for credis * Fix rebase artifacts * Fix formatting * Add workaround for issue #3045 * Increase timeout for test * Improve C++ readability * Fixes for CLI * Add security docs * Address comments * Address comments * Adress comments * Use ray.get * Fix lint	2018-10-16 22:48:30 -07:00
Eric Liang	a9e454f6fd	[rllib] Include config dicts in the sphinx docs (#3064)	2018-10-16 15:55:11 -07:00
Praveen Palanisamy	4d8cfc0bf5	[tune] Fix (some more) misleading comments in tune/results.py (#3068) ## What do these changes do? Fix the misleading comments in code for: - `EPISODES_THIS_ITER` - `EPISODES_TOTAL` Had noted it before and planned to fix it along with some other changes but seemed very relevant to stay next to #3058 so sending this now.	2018-10-16 11:07:53 -07:00
Eric Liang	6240ccbc6e	[rllib] Add more warnings when multi-agent envs might not be set up right (#3061)	2018-10-15 13:42:56 -07:00
Eric Liang	3c891c6ece	[rllib] Parallel-data loading and multi-gpu support for IMPALA (#2766)	2018-10-15 11:02:50 -07:00
Marlon	4dc78b735b	[tune] Fix misleading comment (#3058)	2018-10-14 22:25:39 -07:00
Eric Liang	866c7a574c	[rllib] Don't crash printing out error message (#3054) * fix er * update	2018-10-13 19:50:23 -07:00
Eric Liang	473ee4eb3f	[rllib] Add unit test and some better error messages for custom policy states (#3032)	2018-10-13 00:03:52 -07:00
Richard Liaw	f9b58d7b02	[tune] Tweaks to Trainable and Verbosity (#2889)	2018-10-11 23:42:13 -07:00
Kristian Hartikainen	2d35a97a76	Bug/log syncer fails with parentheses (#2653) * Update rsync command * Escape rsync locations * Fix the accidental variable move * Update rsync to use -s flag	2018-10-06 00:34:53 -07:00
Richard Liaw	ecd8f39580	[core] Improve logging message when plasma store is started. (#3029) Improve logging message when plasma store is started.	2018-10-05 15:24:24 -07:00
Richard Liaw	0651d3b629	[tune/core] Use Global State API for resources (#3004)	2018-10-04 17:23:17 -07:00
Robert Nishihara	faa31ae018	Introduce concept of resources required for placing a task. (#2837) * Introduce concept of resources required for placement. * Add placement resources to task spec * Update java worker * Update taskinfo.java	2018-10-04 10:35:39 -07:00
Si-Yuan	f2dbd3096c	Minor improvements and fixes in Python code. (#3022) This commit fix some small defects. 1. Remove a comment that should have been removed in #3003 2. Remove `redis_protected_mode` that is never used in `ray.init()` 3. Fix `object_id_seed` that is forgotten to be passed into `ray._init()` 4. Remove several redundant brackets.	2018-10-03 21:08:20 -07:00
Yuhong Guo	9948e8c11b	Move function/actor exporting & loading code to function_manager.py (#3003) Move function/actor exporting & loading code to function_manager.py to prepare the code change for function descriptor for python.	2018-10-03 16:21:04 -07:00
Robert Nishihara	d73ee36e60	Update links to use latest 0.5.3 wheels instead of 0.5.2. (#3018)	2018-10-03 13:43:40 -07:00
Si-Yuan	cc7e2ecdd5	Change logfile names and also allow plasma store socket to be passed in. (#2862)	2018-10-03 10:03:53 -07:00
Robert Nishihara	3ce8eb2d4c	Test dying_worker_get and dying_worker_wait for xray. (#2997) This tests the case in which a worker is blocked in a call to ray.get or ray.wait, and then the worker dies. Then later, the object that the worker was waiting for becomes available. We need to make sure not to try to send a message to the dead worker and then die. Related to #2790.	2018-10-02 00:08:47 -07:00
Eric Liang	2019b4122b	[rllib] Remove legacy multiagent support (#2975) * remove legacy * remove reshaper	2018-10-01 13:07:11 -07:00
Eric Liang	b45bed4bce	[rllib] Propagate model options correctly in ARS / ES, to action dist of PPO (#2974) * fix * fix * fix it * propagate conf to action dist * move carla example too * rr * Update policies.py * wip * lint	2018-10-01 12:49:39 -07:00
Eric Liang	e4bea8d10e	[rllib] Default to truncate_episodes and add some more config validators (#2967) * update * link it * warn about truncation * fix * Update rllib-training.rst * deprecate tests failing	2018-09-30 18:37:55 -07:00
Eric Liang	814c35b7d7	[rllib] Simplify sample batch size and num envs config, n_step adjustment (#2995) * simplify vec batch requirements * Update rllib-training.rst * Update rllib-training.rst * Update rllib-training.rst * Update rllib-training.rst * Update rllib-training.rst * Update rllib-models.rst	2018-09-30 18:36:22 -07:00
old-bear	8aa736572b	[tune] Fix hyperband edge case for None entries (#2964)	2018-09-30 09:57:43 -07:00
Eric Liang	65dcafdc3f	[rllib] Refactor save() / restore() code of agents and avoid O(n_workers) save size (#2982)	2018-09-30 01:15:13 -07:00
Eric Liang	747253e0f6	[rllib] Don't shuffle samples in PPO when using lstm	2018-09-30 01:13:56 -07:00
Eric Liang	b06c604a51	[rllib] Add some more tuned atari results to documentation (#2991) * dqn results ++ * add scale * hour * fix * small dqn table * update * steps * upd * apex * up * add apex results * tip	2018-09-29 23:13:36 -07:00


995 commits