hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-06 18:41:40 -05:00

Author	SHA1	Message	Date
eugenevinitsky	1943ae44da	[rllib] Use SGD optimizer for ARS (#2916 )	2018-09-26 22:32:26 -07:00
Wang Qing	1d9652abf1	[java] fix wrong links in Java readme file.	2018-09-27 11:23:10 +08:00
Wang Qing	8e8e123777	[Java] Simplify Java worker configuration (#2938 ) ## What do these changes do? Previously, Java worker configuration was complicated, because it required setting environment variables as well as command-line arguments. This PR aims to simplify the Java worker's configuration. 1) Configuration management is now migrated to [lightbend config](https://github.com/lightbend/config), and thus no longer requires setting environment variables. 2) Many unused config items are removed. 3) A simple `example.conf` file is provided, so users can get started quickly. 4) All possible options and their default values are declared and documented in the `ray.default.conf` file. This PR also simplifies and refines the following code: 1) The process of `Ray.init()`. 2) `RunManager`. 3) `WorkerContext`. ### How to use this configuration? 1. Copy `example.conf` into your classpath and rename it to `ray.conf`. 2. Modify/add your configuration items. All items are declared in `ray.default.conf`. 3. You can also set the items in Java system properties. Note: configuration is read in this priority: System properties > `ray.conf` > `ray.default.conf` ## Related issue number N/A	2018-09-26 20:14:22 +08:00
Wang Qing	0e552fbb22	[Java] Update maven version to 0.1-SNAPSHOT Update the version in maven from 0.1 to 0.1-SNAPSHOT, because SNAPSHOT is the conventional version name in dev process. Non-snapshot versions are only used for release.	2018-09-26 18:08:46 +08:00
Peter Schafhalter	fcdca6de18	Fix test for available resources (#2914 )	2018-09-25 23:07:23 -07:00
Hao Chen	971df5ea8a	[java] put function meta in task spec and load functions with function meta (#2881 ) This PR adds a `function_desc` field into task spec. A function descriptor is a list of strings that can uniquely describe a function. - For a Python function, it should be: [module_name, class_name, function_name] - For a Java function, it should be: [class_name, method_name, type_descriptor] There are a couple of reasons for adding this field: In this PR: - Java worker needs to know the function's class name to load it. Previously, since task spec didn't have such a field to hold this info, we did a hack by appending the class name to the argument list. With this change, we fixed that hack and significantly simplified function management in Java. Will be done in subsequent PRs: - Support cross-language invocation (#2576): currently Python worker manages functions by saving them in GCS and passing the function id in task spec. However, if we want to call a Python function from Java, we cannot save it in GCS and get the function id. But instead, we can pass the function descriptor (module name, class name, function name) in task spec and use it to load the function. - Support deployment: one major problem of Python worker's current function management mechanism is #2327. In prod env, we should have a mechanism to deploy code and dependencies to the cluster. And when code is already deployed, we don't need to save functions to GCS any more and can use `function_desc` to manage functions.	2018-09-25 23:05:05 -07:00
Hao Chen	3cccb49191	[Java] Implement missing methods in MockRayletClient (#2954 ) Previous changes broke single-process mode in raylet. This PR makes the hello-world example work in single-process mode. Follow-up diffs will completely fix single-process mode and add tests.	2018-09-25 09:57:32 -07:00
Robert Nishihara	39b4a89fde	Bump version 0.5.2 to 0.5.3. (#2936 )	2018-09-25 09:49:58 -07:00
Eric Liang	3cde5957b3	[rllib] Better document APIs to access policy state (#2932 ) * fix * doc * example * up	2018-09-24 19:08:32 -07:00
Eric Liang	75ef70afca	[rllib] Auto-clip atari rewards	2018-09-24 12:55:11 -07:00
Eric Liang	8331d1ebe0	[rllib] Add vf clipping param to fix pendulum example (#2921 ) * add vf clip * fix test * Update ppo.py	2018-09-23 13:11:17 -07:00
Hanwei Jin	9f9e49e4a1	[cmake] enable using thirdparty env variable to find installed dependency (#2912 ) * enable using thirdparty env variable to find installed dependency, to speed up the build process * fix target dependency in cmake. :-) too chaos in each CMakeLists * check env variable defined directory exists	2018-09-23 07:52:33 -07:00
Yuhong Guo	b29839a0a3	Fix node manager failure when ClientTable has a disconnected entry. (#2905 ) When a new raylet starts, `ClientAdded` will be called with the disconnected client data. However, since the client was closed, the connection will fail.	2018-09-21 22:45:06 -07:00
Yuhong Guo	93ded5a3d5	Update arrow using Plasma with glog (#2913 ) * Update Arrow to Plasma with glog and update the building process * Remove ParquetExternalProject.cmake * Fix Mac building error in CI * Use find_package(BISON) instead of hard code * Revert BISON binary to hard code. * Remove build_parquet.sh * Update setup.sh	2018-09-20 13:37:44 -07:00
Eric Liang	3267676994	[Experimental] Add experimental distributed SGD API (#2858 ) * check in sgd api * idx * foreach_worker foreach_model * add feed_dict * update * yapf * typo * lint * plasma op change * fix plasma op * still not working * fix * fix * comments * yapf * silly flake8 * small test	2018-09-19 21:12:37 -07:00
Praveen Palanisamy	b23fd5de13	[rllib] Adds agent name & env id to default logdir prefix (#2859 ) * Added agent name & env id to default logdir prefix * Revert "Added agent name & env id to default logdir prefix" This reverts commit 07cfdf80d2537da3c67dd4f553c5f3e43671cc7d. * Added default logger creator with informative prefix to Agent * Updated import order & improved str cat * Update agent.py	2018-09-18 22:22:07 -07:00
Eric Liang	3a3782c39f	[rllib] Fix LSTM regression on truncated sequences and add regression test (#2898 ) * fix * add test * yapf * yapf * fix space * Oops that should be lstm: True * Update cartpole_lstm.py	2018-09-18 15:09:16 -07:00
Eric Liang	ab8348b1f5	[rllib] Reward clipping should default to off	2018-09-18 15:08:01 -07:00
Hao Chen	715ec1bca5	Modularize NodeManager::ProcessClientMessage (#2895 ) Split NodeManager::ProcessClientMessage into a couple of smaller functions, each of which handles one type of message.	2018-09-18 14:18:34 -07:00
Robert Nishihara	ea9d1cc887	Remove dependence on psutil. Add utility functions for getting system memory. (#2892 )	2018-09-18 15:03:29 +08:00
Robert Nishihara	61bf6c6123	Fix regression in directing worker output to stdout/stderr. (#2897 )	2018-09-17 16:40:45 -07:00
Richard Liaw	899e4585bc	Don't include redundant entries in global_state.client_table (#2880 )	2018-09-17 12:52:49 -07:00
Hanwei Jin	dc76e51a60	bugfix: cmake copy plasma java lib from lib64 directory in centos (#2885 )	2018-09-16 22:32:09 -07:00
Richard Liaw	f372f48bf3	[tune] Tune onto Logging Module (#2882 ) Moves Tune onto logging in Python. Ignores examples and tests.	2018-09-16 12:09:36 -07:00
Yuhong Guo	a8248e8628	Fix ObjectManager Crash (#2833 ) Fixes issue where object manager sometimes crashes within the `Wait` method: The issue stems from inconsistent behavior of the boost deadline timer's `cancel` method, which is invoked within `WaitComplete` to enforce exactly one `WaitComplete` invocation for each `Wait` request. The `cancel` method sometimes fails to actually prevent the timer's invocation of the provided handler with non-zero error code.	2018-09-16 02:14:13 -04:00
Philipp Moritz	47d2f82c6c	Fix common cmake dependencies (#2876 )	2018-09-15 22:11:12 -07:00
Robert Nishihara	503344149f	Run jupyter UI with --ip=0.0.0.0. (#2883 )	2018-09-15 21:59:46 -07:00
Richard Liaw	e05baed336	[tune] Better Info String and Tweaks (#2874 )	2018-09-15 11:02:13 -07:00
Hao Chen	e96817d074	fix a syntax error of initializing unordered_map (#2871 ) The previous way is incompatible with older version of gcc.	2018-09-14 12:07:08 -07:00
Philipp Moritz	2c9a4f6b41	Evaluate debug logging only in debug mode (#2869 ) This PR makes it so debugging logs are only evaluated during debugging. We found that for the current code, functions called in debug logging code are evaluated even in release mode (even though nothing is printed).	2018-09-14 11:40:44 -07:00
Robert Nishihara	f16d33593b	Mark worker as blocked and trigger reconstruction in ray.wait. (#2864 ) * Trigger reconstruction in ray.wait and mark worker as blocked. * Add test. * Linting. * Don't run new test with legacy Ray. * Only call HandleClientUnblocked if it actually blocked in ray.wait. * Reduce time to ray.wait in the test.	2018-09-13 15:28:17 -07:00
Joerg Schad	a1b8e79c30	Fixed Typo. (#2865 )	2018-09-13 13:32:56 +08:00
Hanwei Jin	fbf214e408	update ray cmake build process (#2853 ) * use cmake to build ray project, no need to apply build.sh before cmake, fix some abuse of cmake, improve the build performance * support boost external project, avoid using the system or build.sh boost * keep compatible with build.sh, remove boost and arrow build from it. * bugfix: parquet bison version control, plasma_java lib install problem * bugfix: cmake, do not compile plasma java client if no need * bugfix: component failures test timeout mechanism has problem for plasma manager failed case * bugfix: arrow use lib64 in centos, travis check-git-clang-format-output.sh does not support other branches except master * revert some fix * set arrow python executable, fix format error in component_failures_test.py * make clean arrow python build directory * update cmake code style, back to support cmake minimum version 3.4	2018-09-12 11:19:33 -07:00
Daniel Ho	d9eeaaf00a	[tune] Fix bug in example where config hyperparameters were ignored (#2860 ) A fix to an example for tune (`python/ray/tune/examples/pbt_tune_cifar10_with_keras.py`) where the hyperparameters for the optimizer, learning rate and decay, were not being passed into the optimizer. This means that the current optimizer uses default values for the hyperparameters no matter the config.	2018-09-12 09:17:56 -07:00
old-bear	f3c1194be3	[tune] Add AutoML algorithm of GeneticSearcher (#2699 ) Add new search algorithm (genetic) along with the base framework of the searcher (which performs some basic jobs such as logging, recording and organizing in our project). Note that this is the initial commit. In the following days, we will add example, UT, and other refinements.	2018-09-12 09:17:04 -07:00
Eric Liang	bee743c152	Remove log suppression code When running in a screen (or any other time it is hard to scroll up), printing "Suppressing previous error message" is not helpful since the previous error is lost far above past scrollback. Better to just print it repeatedly at the end.	2018-09-11 23:28:45 -07:00
Kaahan	045861c9b0	[tune] Reset Config for Trainables (#2831 ) Adds the ability for trainables to reset their configurations during experiments. These changes in particular add the base functions to the trial_executor and trainable interfaces as well as giving the basic implementation on the PopulationBasedTraining scheduler. Related issue number: #2741	2018-09-11 08:45:04 -07:00
Peter Schafhalter	5da6e78db1	Add available resources to global state (#2501 )	2018-09-10 15:46:32 -07:00
Eric Liang	611259b2c7	Re-raise actor initialization errors on method invocation (#2843 ) If an actor constructor fails, save that error and re-raise it on any subsequent attempts to interact with the actor. Related to https://github.com/ray-project/ray/issues/282 and https://github.com/ray-project/ray/issues/1093.	2018-09-10 10:51:19 -07:00
Hao Chen	8414e413a2	[java] refine and simplify java worker code structure (#2838 )	2018-09-10 10:48:17 -07:00
Eric Liang	588c573d41	Ray stop needs to kill `plasma_store_server` not `plasma_store` (#2850 )	2018-09-09 19:23:09 -07:00
Richard Liaw	af1fdc826e	Pin YAPF in Travis lint build (#2848 ) Avoid needing to reformat everything all the time.	2018-09-09 15:54:46 -07:00
eugenevinitsky	9ba751c29a	Ars increase (#2844 ) * removed cv2 * remove opencv * increased number of default rollouts ARS * put cv2 back in this branch * put cv2 back in this branch * moved cv2 back where it belongs in preprocessors	2018-09-08 14:09:02 -07:00
Robert Nishihara	bd64c940e9	Push error to driver when monitor raises an exception. (#2834 )	2018-09-07 17:42:45 -07:00
Zhijun Fu	753ba76141	[Issue 2809][xray] Cleanup on driver detach (#2826 ) This change addresses issue #2809. Test #2797 has been enabled for raylet and can pass. The following should happen when a driver exits (either gracefully or ungracefully). #2797 should be enabled and pass. Any actors created by the driver that are still running should be killed. Any workers running tasks for the driver should be killed. Any tasks for the driver in any node_manager queues should be removed. Any future tasks received by a node manager for the driver should be ignored. The driver death notification should only be received once.	2018-09-07 16:11:32 +08:00
Robert Nishihara	3f6ed537a4	Add ray.is_initialized() function. (#2818 ) * Add ray.is_initialized() function. * Add assert.	2018-09-06 21:20:59 -07:00
Eric Liang	e7db54bdb0	Log at INFO level by default (including in autoscaler). (#2824 ) Before this change, the autoscaler `up` and related commands don't print any info messages to the console at all. This was a regression from 0.5. @richardliaw @robertnishihara https://github.com/ray-project/ray/issues/2812	2018-09-06 13:31:19 -07:00
Wang Qing	7e13e1fd49	[Java] Remove non-raylet code in Java. (#2828 )	2018-09-06 14:54:13 +08:00
Eric Liang	d81605e9e7	[tune] Add a time/timesteps since last restore metric (#2819 ) * rsm * always log to avoid changing schema for csv writer * add iter since restore * update * criteria warn	2018-09-05 17:45:09 -07:00
Eric Liang	995ac24a2c	[rllib] clarify train batch size for PPO (#2793 ) It's possible to configure PPO in a way that ends up discarding most of the samples (they are treated as "stragglers"). Add a warning when this happens, and raise an exception if the waste is particularly egregious.	2018-09-05 12:06:13 -07:00
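
The lookup priority described in commit 8e8e123777 (System properties > `ray.conf` > `ray.default.conf`) can be sketched roughly as a layered lookup. This is not the Java worker's implementation, which uses lightbend config. It is a minimal Python illustration using `collections.ChainMap`, and the function name `resolve_config` and every key and value below are hypothetical.

```python
from collections import ChainMap


def resolve_config(system_properties, ray_conf, ray_default_conf):
    """Layer three config sources; earlier arguments win over later ones.

    Hypothetical helper for illustration only, not part of Ray.
    """
    return ChainMap(system_properties, ray_conf, ray_default_conf)


# Hypothetical keys and values standing in for real configuration items.
ray_default_conf = {
    "example.mode": "default",
    "example.workers": 1,
    "example.log-dir": "/tmp/example",
}
ray_conf = {"example.mode": "from-ray-conf", "example.workers": 4}
system_properties = {"example.mode": "from-system-property"}

config = resolve_config(system_properties, ray_conf, ray_default_conf)

print(config["example.mode"])     # set in all three layers: system property wins
print(config["example.workers"])  # set in ray.conf and defaults: ray.conf wins
print(config["example.log-dir"])  # only in defaults: default value is used
```

A key missing from all three layers raises `KeyError` in this sketch, which loosely mirrors the commit's statement that every available item is declared in `ray.default.conf`.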

2283 commits