Commit graph

2053 commits

Author SHA1 Message Date
Wang Qing
fcef4edd46 [Java] Fix the required-resources issue of actor member function in Java worker. (#3002)
This fixes a bug in which Java actor methods inherit the resource requirements of the actor creation task.
2018-10-01 12:56:36 -07:00
Eric Liang
b45bed4bce
[rllib] Propagate model options correctly in ARS / ES, to action dist of PPO (#2974)
* fix

* fix

* fix it

* propagate conf to action dist

* move carla example too

* rr

* Update policies.py

* wip

* lint
2018-10-01 12:49:39 -07:00
Eric Liang
e4bea8d10e
[rllib] Default to truncate_episodes and add some more config validators (#2967)
* update

* link it

* warn about truncation

* fix

* Update rllib-training.rst

* deprecate tests failing
2018-09-30 18:37:55 -07:00
Eric Liang
814c35b7d7
[rllib] Simplify sample batch size and num envs config, n_step adjustment (#2995)
* simplify vec batch requirements

* Update rllib-training.rst

* Update rllib-training.rst

* Update rllib-training.rst

* Update rllib-training.rst

* Update rllib-training.rst

* Update rllib-models.rst
2018-09-30 18:36:22 -07:00
old-bear
8aa736572b [tune] Fix hyperband edge case for None entries (#2964) 2018-09-30 09:57:43 -07:00
Robert Nishihara
ed6289771a Convert runtest.py to use pytest. (#2966)
* Convert runtest.py to use pytest.

* Linting.

* Fix

* Fix

* Fix

* Fix
2018-09-30 07:59:44 -07:00
Eric Liang
65dcafdc3f
[rllib] Refactor save() / restore() code of agents and avoid O(n_workers) save size (#2982) 2018-09-30 01:15:13 -07:00
Eric Liang
747253e0f6
[rllib] Don't shuffle samples in PPO when using lstm 2018-09-30 01:13:56 -07:00
Eric Liang
b06c604a51
[rllib] Add some more tuned atari results to documentation (#2991)
* dqn results ++

* add scale

* hour

* fix

* small dqn table

* update

* steps

* upd

* apex

* up

* add apex results

* tip
2018-09-29 23:13:36 -07:00
Eric Liang
cf9cd5da9d
[ray] Add --new flag for ray attach (#2973)
* new flag

* yapf
2018-09-29 23:04:13 -07:00
Eric Liang
cb56f39070 [rllib] Entropy calculation for diag gaussian missing 0.5 term (#2968)
See: https://en.wikipedia.org/wiki/Multivariate_normal_distribution#Entropy
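
For a diagonal covariance, the linked formula reduces to a per-dimension sum of `log(sigma_i) + 0.5 * log(2 * pi * e)`. A minimal Python sketch of that formula follows, assuming the distribution is parameterized by per-dimension log standard deviations. The function name is hypothetical and this is not RLlib's actual code.

```python
import math


def diag_gaussian_entropy(log_std):
    """Entropy of a Gaussian with diagonal covariance.

    `log_std` holds log(sigma_i) for each dimension. This is an assumed
    parameterization, not necessarily the one used in RLlib.
    """
    # Each dimension contributes log(sigma_i) + 0.5 * log(2 * pi * e).
    per_dim_constant = 0.5 * math.log(2.0 * math.pi * math.e)
    return sum(ls + per_dim_constant for ls in log_std)
```

Dropping the `0.5` factor would overstate the entropy by a constant per dimension, which is the kind of error the commit title describes.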
2018-09-29 22:57:47 -07:00
old-bear
b3f0dcf20b [tune] Add a raise_on_failed_trial flag in run_experiments (#2961)
Adds a flag to control raising TuneError if some trial fails in `run_experiments`.
2018-09-29 11:29:46 -07:00
Wang Qing
a879302355 Improve log message when failing to fork worker process (#2990)
## What do these changes do?
```c++
  // Try to execute the worker command.
  int rv = execvp(worker_command_args[0],
                  const_cast<char *const *>(worker_command_args.data()));
  // The worker failed to start. This is a fatal error.
  RAY_LOG(FATAL) << "Failed to start worker with return value " << rv;
```
When starting the process fails, the return value `rv` is always set to -1, so logging it is not useful.
The log message should include meaningful information instead.

For example, if Java is not installed, the message should be:
```shell
 Failed to start worker: No such file or directory.
```
This helps us locate the issue quickly.
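
The same idea can be sketched in Python, purely as an illustration (the actual change is in the C++ raylet code, and `try_exec_worker` is a hypothetical name): report the error description derived from `errno` rather than the return value.

```python
import os


def try_exec_worker(command_args):
    """Try to exec the worker command; return a log message on failure.

    On success os.execvp replaces the current process and never returns.
    """
    try:
        os.execvp(command_args[0], command_args)
    except OSError as exc:
        # exc.errno carries the reason; os.strerror turns it into text.
        return "Failed to start worker: {}.".format(os.strerror(exc.errno))
```

Calling it with a path that does not exist returns a message that names the cause instead of a bare -1.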

## Related issue number
N/A
2018-09-29 22:10:57 +08:00
Hao Chen
c5b8840193 [Java] fix java/cleanup.sh (#2989)
Remove legacy-ray-related stuff from this script, and update temp file locations.
2018-09-28 21:31:47 -05:00
Hao Chen
18173dde26 [Java] update api doc (#2988)
The API doc is somewhat outdated because of some recent code changes. Update it and add some simple examples.
2018-09-28 19:05:42 -05:00
Eric Liang
f1c55497ce
[rllib] Fix edge case in n-step calculation and non-apex replay prioritization (#2929)
* fix

* lint
2018-09-28 15:22:33 -07:00
Hao Chen
4ffe1e3556 [Java] Fix: task spec's resource map should contain CPU (#2987) 2018-09-28 14:23:38 -05:00
Wang Qing
68cf194e90 [fix] Fix ray.home configuration item. (#2977)
If the `ray.home` configuration item is set to `""`, the current `RayConfig` sets it to the current working directory, such as `/User/My/Ray`.
However, some other configuration items (such as `redisServerExecutablePath`) are then mistakenly set to `/User/My/Ray//build/src/common/thirdparty/redis/src/redis-server`.
Note: there are 2 `/` characters between the current working directory and `build/src/common....`

This PR fixes this issue.
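
The commit message does not say how the doubled separator arises in `RayConfig`. One common cause is plain string concatenation when the base directory already ends with `/`. A small Python sketch of that failure mode and a join that avoids it (both function names are hypothetical):

```python
import posixpath


def naive_join(home, relative):
    # Plain concatenation: doubles the separator when `home` ends with "/".
    return home + "/" + relative


def safe_join(home, relative):
    # posixpath.join inserts exactly one separator between the two parts.
    return posixpath.join(home, relative)
```

With `home = "/User/My/Ray/"`, `naive_join` yields `/User/My/Ray//build/...`, while `safe_join` yields a single `/` whether or not `home` has a trailing separator.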
2018-09-28 00:06:14 -05:00
Marlon
5eaf429c53 Fix typo in autoscaler yaml (#2981) 2018-09-27 09:48:18 -07:00
Richard Liaw
1c9617bc1c
[autoscaler] Add tmux support for attach and exec (#2907)
Adds a tmux flag that can be used to support background execution of experiments. Cannot be used together with screen. This seems to be a useful feature that has come up with several users.
2018-09-26 23:22:45 -07:00
eugenevinitsky
1943ae44da [rllib] Use SGD optimizer for ARS (#2916) 2018-09-26 22:32:26 -07:00
Wang Qing
1d9652abf1 [java] fix wrong links in Java readme file. 2018-09-27 11:23:10 +08:00
Wang Qing
8e8e123777 [Java] Simplify Java worker configuration (#2938)
## What do these changes do?
Previously, Java worker configuration was complicated, because it required setting environment variables as well as command-line arguments.

This PR aims to simplify the Java worker's configuration.
1) Configuration management has been migrated to [lightbend config](https://github.com/lightbend/config), so it no longer requires setting environment variables.
2) Many unused config items are removed.
3) Provide a simple `example.conf` file, so users can get started quickly.
4) All possible options and their default values are declared and documented in `ray.default.conf` file.

This PR also simplifies and refines the following code:
1) The process of `Ray.init()`.
2) `RunManager`.
3) `WorkerContext`. 

### How to use this configuration?
1. Copy `example.conf` into your classpath and rename it to `ray.conf`.
2. Modify or add your configuration items. All items are declared in `ray.default.conf`.
3. You can also set the items in Java system properties.

Note: configuration is read in this priority:
System properties > `ray.conf` > `ray.default.conf`
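
The lookup order can be illustrated with a short Python sketch. This only models the priority rule, it does not use lightbend config itself, and `resolve_config` is a hypothetical helper.

```python
from collections import ChainMap


def resolve_config(system_properties, ray_conf, ray_default_conf):
    """Combine the three sources so that earlier ones take priority."""
    # ChainMap looks keys up left to right, so the first mapping that
    # contains a key wins: system properties, then ray.conf, then defaults.
    return ChainMap(system_properties, ray_conf, ray_default_conf)
```

A key set in system properties shadows the same key in `ray.conf`, and anything missing from both falls back to `ray.default.conf`.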

## Related issue number
N/A
2018-09-26 20:14:22 +08:00
Wang Qing
0e552fbb22 [Java] Update maven version to 0.1-SNAPSHOT
Update the version in maven from 0.1 to 0.1-SNAPSHOT, because SNAPSHOT is the conventional version name in dev process. Non-snapshot versions are only used for release.
2018-09-26 18:08:46 +08:00
Peter Schafhalter
fcdca6de18 Fix test for available resources (#2914) 2018-09-25 23:07:23 -07:00
Hao Chen
971df5ea8a [java] put function meta in task spec and load functions with function meta (#2881)
This PR adds a `function_desc` field to the task spec. A function descriptor is a list of strings that uniquely describes a function.
- For a Python function, it should be: [module_name, class_name, function_name]
- For a Java function, it should be: [class_name, method_name, type_descriptor]

There are a couple of reasons for adding this field:

In this PR:
- The Java worker needs to know a function's class name to load it. Previously, since the task spec didn't have a field to hold this info, we did a hack by appending the class name to the argument list. With this change, we fixed that hack and significantly simplified function management in Java.

Will be done in subsequent PRs:
- Support cross-language invocation (#2576): currently the Python worker manages functions by saving them in GCS and passing the function id in the task spec. However, if we want to call a Python function from Java, we cannot save it in GCS and get the function id. Instead, we can pass the function descriptor (module name, class name, function name) in the task spec and use it to load the function.
- Support deployment: one major problem of the Python worker's current function management mechanism is #2327. In a production environment, we should have a mechanism to deploy code and dependencies to the cluster. When code is already deployed, we don't need to save functions to GCS any more and can use `function_desc` to manage functions.
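
As a rough illustration of loading a Python function from a descriptor of the form [module_name, class_name, function_name], here is a minimal sketch. The helper name `load_function` is hypothetical, and treating an empty class name as "module-level function" is an assumption, not something the PR specifies.

```python
import importlib


def load_function(descriptor):
    """Resolve [module_name, class_name, function_name] to a callable."""
    module_name, class_name, function_name = descriptor
    module = importlib.import_module(module_name)
    # Assumption: an empty class name means a module-level function.
    owner = getattr(module, class_name) if class_name else module
    return getattr(owner, function_name)
```

For example, `load_function(["os.path", "", "basename"])` returns `os.path.basename` without any lookup in GCS, which is the property the descriptor-based approach relies on.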
2018-09-25 23:05:05 -07:00
Hao Chen
3cccb49191 [Java] Implement missing methods in MockRayletClient (#2954)
Previous changes broke single-process mode in raylet. This PR makes the hello-world example work in single-process mode. Follow-up diffs will completely fix single-process mode and add tests.
2018-09-25 09:57:32 -07:00
Robert Nishihara
39b4a89fde Bump version 0.5.2 to 0.5.3. (#2936) 2018-09-25 09:49:58 -07:00
Eric Liang
3cde5957b3
[rllib] Better document APIs to access policy state (#2932)
* fix

* doc

* example

* up
2018-09-24 19:08:32 -07:00
Eric Liang
75ef70afca
[rllib] Auto-clip atari rewards 2018-09-24 12:55:11 -07:00
Eric Liang
8331d1ebe0
[rllib] Add vf clipping param to fix pendulum example (#2921)
* add vf clip

* fix test

* Update ppo.py
2018-09-23 13:11:17 -07:00
Hanwei Jin
9f9e49e4a1 [cmake] enable using thirdparty env variable to find installed dependency (#2912)
* enable using thirdparty env variable to find installed dependency, to speed up the build process

* fix target dependency in cmake. :-) too much chaos in each CMakeLists

* check env variable defined directory exists
2018-09-23 07:52:33 -07:00
Yuhong Guo
b29839a0a3 Fix node manager failure when ClientTable has a disconnected entry. (#2905)
When a new raylet starts, `ClientAdded` will be called with the disconnected client data. However, since the client was closed, the connection will fail.
2018-09-21 22:45:06 -07:00
Yuhong Guo
93ded5a3d5 Update arrow using Plasma with glog (#2913)
* Update Arrow to Plasma with glog and update the building process

* Remove ParquetExternalProject.cmake

* Fix Mac building error in CI

* Use find_package(BISON) instead of hard code

* Revert BISON binary to hard code.

* Remove build_parquet.sh

* Update setup.sh
2018-09-20 13:37:44 -07:00
Eric Liang
3267676994 [Experimental] Add experimental distributed SGD API (#2858)
* check in sgd api

* idx

* foreach_worker foreach_model

* add feed_dict

* update

* yapf

* typo

* lint

* plasma op change

* fix plasma op

* still not working

* fix

* fix

* comments

* yapf

* silly flake8

* small test
2018-09-19 21:12:37 -07:00
Praveen Palanisamy
b23fd5de13 [rllib] Adds agent name & env id to default logdir prefix (#2859)
* Added agent name & env id to default logdir prefix

* Revert "Added agent name & env id to default logdir prefix"

This reverts commit 07cfdf80d2537da3c67dd4f553c5f3e43671cc7d.

* Added default logger creator with informative prefix to Agent

* Updated import order & improved str cat

* Update agent.py
2018-09-18 22:22:07 -07:00
Eric Liang
3a3782c39f
[rllib] Fix LSTM regression on truncated sequences and add regression test (#2898)
* fix

* add test

* yapf

* yapf

* fix space

* Oops that should be lstm: True

* Update cartpole_lstm.py
2018-09-18 15:09:16 -07:00
Eric Liang
ab8348b1f5
[rllib] Reward clipping should default to off 2018-09-18 15:08:01 -07:00
Hao Chen
715ec1bca5 Modularize NodeManager::ProcessClientMessage (#2895)
Split NodeManager::ProcessClientMessage into a couple of smaller functions, each of which handles one type of message.
2018-09-18 14:18:34 -07:00
Robert Nishihara
ea9d1cc887 Remove dependence on psutil. Add utility functions for getting system memory. (#2892) 2018-09-18 15:03:29 +08:00
Robert Nishihara
61bf6c6123 Fix regression in directing worker output to stdout/stderr. (#2897) 2018-09-17 16:40:45 -07:00
Richard Liaw
899e4585bc Don't include redundant entries in global_state.client_table (#2880) 2018-09-17 12:52:49 -07:00
Hanwei Jin
dc76e51a60 bugfix: cmake copy plasma java lib from lib64 directory in centos (#2885) 2018-09-16 22:32:09 -07:00
Richard Liaw
f372f48bf3
[tune] Tune onto Logging Module (#2882)
Moves Tune onto logging in Python. Ignores examples and tests.
2018-09-16 12:09:36 -07:00
Yuhong Guo
a8248e8628 Fix ObjectManager Crash (#2833)
Fixes an issue where the object manager sometimes crashes within the `Wait` method. The issue stems from inconsistent behavior of the boost deadline timer's `cancel` method, which is invoked within `WaitComplete` to enforce exactly one `WaitComplete` invocation for each `Wait` request. The `cancel` method sometimes fails to actually prevent the timer's invocation of the provided handler with a non-zero error code.
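
The commit message does not describe how the fix is implemented. One common defensive pattern for this class of bug is to make completion idempotent, so that a late timer callback becomes a no-op. A Python sketch of that pattern, with hypothetical names, which is not the actual C++ change:

```python
import threading


class WaitState:
    """Tracks one Wait request and guarantees a single completion."""

    def __init__(self, on_complete):
        self._on_complete = on_complete
        self._lock = threading.Lock()
        self._completed = False

    def complete(self):
        """Run the callback the first time only; return whether it ran."""
        with self._lock:
            if self._completed:
                return False
            self._completed = True
        # Invoke outside the lock so the callback cannot deadlock on it.
        self._on_complete()
        return True
```

With such a guard, it no longer matters whether cancelling the timer succeeds: a second call to `complete()` returns `False` and does nothing.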
2018-09-16 02:14:13 -04:00
Philipp Moritz
47d2f82c6c Fix common cmake dependencies (#2876) 2018-09-15 22:11:12 -07:00
Robert Nishihara
503344149f Run jupyter UI with --ip=0.0.0.0. (#2883) 2018-09-15 21:59:46 -07:00
Richard Liaw
e05baed336
[tune] Better Info String and Tweaks (#2874) 2018-09-15 11:02:13 -07:00
Hao Chen
e96817d074 fix a syntax error of initializing unordered_map (#2871)
The previous syntax is incompatible with older versions of gcc.
2018-09-14 12:07:08 -07:00
Philipp Moritz
2c9a4f6b41 Evaluate debug logging only in debug mode (#2869)
This PR makes it so debugging logs are only evaluated during debugging. We found that for the current code, functions called in debug logging code are evaluated even in release mode (even though nothing is printed).
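
The same pitfall exists in other languages: arguments to a logging call are evaluated before the call decides whether to print. A Python sketch of guarding the evaluation (the helper name is hypothetical, and the PR itself changes Ray's C++ logging):

```python
import logging


def log_debug_state(logger, describe_state):
    """Log debug output, calling `describe_state` only when DEBUG is on."""
    if logger.isEnabledFor(logging.DEBUG):
        # The potentially expensive call happens only inside the guard.
        logger.debug("state: %s", describe_state())
        return True
    return False
```

Without the `isEnabledFor` check, `describe_state()` would run on every call even when the message is discarded.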
2018-09-14 11:40:44 -07:00