* Policy that flushes the lineage stash immediately
* Fix bug where remote tasks in uncommitted lineage weren't getting subscribed to; add regression test
* test
* Fix bug where waiting task was getting subscribed
* Cleanup
* Update src/ray/raylet/lineage_cache.cc
Co-Authored-By: stephanie-wang <swang@cs.berkeley.edu>
* Update src/ray/raylet/lineage_cache.cc
Co-Authored-By: stephanie-wang <swang@cs.berkeley.edu>
* cleanup
* cleanup
* Add another test for task with many parents
* Fix: unsubscribe from new waiting tasks
* Unsubscribe as soon as the commit notification is handled (see the sketch after this list)
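The actual implementation lives in C++ (`src/ray/raylet/lineage_cache.cc`); the following is only a minimal Java sketch, with hypothetical names, of the subscribe/unsubscribe flow these commits describe.

```java
import java.util.HashSet;
import java.util.Set;

// Minimal sketch (hypothetical names) of the flow in these commits:
// subscribe to remote tasks in the uncommitted lineage, never to locally
// waiting tasks, and unsubscribe as soon as the commit notification arrives.
class LineageCacheSketch {
  private final Set<String> subscribedTasks = new HashSet<>();

  // Called when a task with uncommitted lineage enters the cache.
  void addUncommittedTask(String taskId, boolean executedRemotely) {
    if (executedRemotely) {
      // Remote tasks will be committed by another node, so we must
      // subscribe to hear about the commit.
      subscribedTasks.add(taskId);
    }
    // Locally waiting tasks are flushed by this node itself, so no
    // subscription is needed (the bug fixed above subscribed to them).
  }

  // Called when the GCS notifies us that a task was committed.
  void handleCommitNotification(String taskId) {
    // Unsubscribe immediately, then evict the committed entry.
    subscribedTasks.remove(taskId);
    evict(taskId);
  }

  private void evict(String taskId) {
    // Remove the committed task (and any committed ancestors) from the cache.
  }
}
```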
This includes most of the TF code used for the OSDI experiment. Perf sanity check on p3.16xl instances: overall scaling looks OK, with the multi-node results within 5% of the final OSDI numbers. This seems reasonable given that hugepages are not enabled here and the parameter server shards are placed randomly.
```
$ RAY_USE_XRAY=1 ./test_sgd.py --gpu --batch-size=64 --num-workers=N \
    --devices-per-worker=M --strategy=<simple|ps> \
    --warmup --object-store-memory=10000000000
```
Images per second (total):

| gpus total              | simple | ps   |
|-------------------------|--------|------|
| 1                       | 218    |      |
| 2 (1 worker)            | 388    |      |
| 4 (1 worker)            | 759    |      |
| 4 (2 workers)           | 176    | 623  |
| 8 (1 worker)            | 985    |      |
| 8 (2 workers)           | 349    | 1031 |
| 16 (2 nodes, 2 workers) | 600    | 1661 |
| 16 (2 nodes, 4 workers) | 468    | 1712 (OSDI perf was 1817) |
We found that a large number of pub-sub keys were left with no content in them (the problem is worse when a wait ID is used in the key name).
The logic for deleting empty pub-sub keys from the GCS existed in legacy Ray but not in raylet.
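Not part of the original PR text: a minimal Java sketch of the cleanup rule, assuming a Jedis client and a hypothetical key name; the raylet change itself is in C++.

```java
import redis.clients.jedis.Jedis;

// Sketch of the cleanup rule: once a pub-sub key's contents have been
// consumed, delete the key rather than leaving an empty entry behind.
// The Jedis client and key naming are assumptions for illustration.
class PubsubKeyCleanup {
  static void deleteIfEmpty(Jedis redis, String pubsubKey) {
    // An empty list means every notification was consumed; drop the key
    // so empty keys (e.g. ones named by a wait ID) don't accumulate.
    if (redis.llen(pubsubKey) == 0) {
      redis.del(pubsubKey);
    }
  }
}
```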
## What do these changes do?
Support starting the Python `default_worker.py` from the raylet that the Java `RunManager` launches.
This way, when locally testing Java-calls-Python tasks, Python workers are started automatically.
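A rough Java sketch (not from the PR) of what auto-starting a Python worker could look like from the run manager; the script path and flag names are assumptions for illustration.

```java
import java.io.IOException;

// Illustrative sketch: spawn default_worker.py so Java-to-Python calls
// have a worker to land on. Script path and flags are assumptions.
class PythonWorkerLauncher {
  static Process start(String redisAddress, String storeSocket, String rayletSocket)
      throws IOException {
    ProcessBuilder pb = new ProcessBuilder(
        "python", "ray/workers/default_worker.py",
        "--redis-address=" + redisAddress,
        "--object-store-name=" + storeSocket,
        "--raylet-name=" + rayletSocket);
    pb.inheritIO();  // forward the worker's output to our stdout/stderr
    return pb.start();
  }
}
```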
## Related issue number
This fixes a problem that @devin-petersohn observed on the Windows Subsystem for Linux.
In theory, Redis should already be up by the time the async connect happens, so no retries should be needed for it. However, on the Windows Subsystem for Linux the async connect was failing even though the synchronous one worked; Windows may have different semantics here than Linux.
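The actual fix is in Ray's C++ Redis client; the Java sketch below just illustrates the retry-the-connect pattern, with all names and constants chosen for illustration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Illustrative retry loop: attempt the connect several times with a short
// sleep between attempts instead of assuming the first attempt succeeds.
class ConnectWithRetry {
  static Socket connect(String host, int port, int attempts) throws IOException {
    IOException last = null;
    for (int i = 0; i < attempts; i++) {
      try {
        Socket socket = new Socket();
        socket.connect(new InetSocketAddress(host, port), /*timeoutMillis=*/1000);
        return socket;
      } catch (IOException e) {
        last = e;  // remember the failure and retry after a short pause
        try {
          Thread.sleep(100);
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          throw new IOException("interrupted while retrying connect", ie);
        }
      }
    }
    throw last != null ? last : new IOException("no connect attempts made");
  }
}
```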
## What do these changes do?
Remove `TaskExecutionException` and use `RayException` instead.
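A hedged sketch of what a call site looks like after this change; `Echo` is a hypothetical actor class, and the import paths reflect a best guess at the Java API of the time rather than confirmed package names.

```java
import org.ray.api.Ray;
import org.ray.api.RayActor;
import org.ray.api.RayObject;
import org.ray.api.exception.RayException;

// Sketch only: callers now catch the single RayException type instead of
// TaskExecutionException. Echo and the import paths are assumptions.
class ExceptionHandlingSketch {
  static void callAndHandle(RayActor<Echo> echo) {
    try {
      RayObject<Integer> result = Ray.call(Echo::echo, echo, 100);
      result.get();
    } catch (RayException e) {
      // Any failure during remote execution now surfaces here.
    }
  }
}
```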
## Related issue number
## What do these changes do?
Before this PR, if we wanted to specify resources, we had to write code like the following:
```java
@RayRemote(resources = {@ResourceItem(name = "CPU", value = 10)})
public static void f1() {
  // do something
}

@RayRemote(resources = {@ResourceItem(name = "CPU", value = 10)})
class Demo {
  // ...
}
```
Unfortunately, there was no way to create another actor or task with different resource requirements.
After this PR, it works like this:
```java
ActorCreationOptions option = new ActorCreationOptions();
option.resources.put("CPU", 4.0);
RayActor<Echo> echo1 = Ray.createActor(Echo::new, option);
option.resources.put("Res-A", 4.0);
RayActor<Echo> echo2 = Ray.createActor(Echo::new, option);
// If we don't specify resources, they will be {"cpu": 0.0} by default.
Ray.call(Echo::echo, echo2, 100);
```
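For completeness (not in the original PR), a minimal definition that the snippet above assumes for `Echo`; the exact annotation and types are illustrative.

```java
@RayRemote
public class Echo {
  public Integer echo(Integer value) {
    return value;
  }
}
```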
## Related issue number
N/A
## What do these changes do?
Fix the misleading code comments for:
- `EPISODES_THIS_ITER`
- `EPISODES_TOTAL`
I had noted this before and planned to fix it along with some other changes, but it seemed very relevant to #3058, so I'm sending it now.