Commit graph

2322 commits

Author SHA1 Message Date
nam-cern
3d8f56409b Ensure numpy is at least 1.10.4 in setup.py (#2462)
In the build script, numpy is specifically set at 1.10.4. We should also ensure that it is indeed the case in `setup.py`.
2018-12-24 11:01:25 -08:00
Eric Liang
9f63119a83
[rllib] Allow development without needing to compile Ray (#3623)
* wip

* lint

* wip

* wip

* rename

* wip

* Cleaner handling of cli prompt
2018-12-24 18:08:23 +09:00
Devin Petersohn
c13b2685f5 [modin] Append to path to avoid namespace collision on development branches (#3621) 2018-12-23 23:58:56 -08:00
Si-Yuan
a1995ff3b0 Resize logo in README. (#3619) 2018-12-23 22:59:23 -08:00
Alexey Tumanov
9b8d7573fe bump version from 0.6.0 to 0.6.1 (#3610) 2018-12-23 17:03:42 -08:00
Robert Nishihara
bb7ca3bae7 Upgrade flatbuffers version to 1.10.0. (#3559)
* Upgrade flatbuffers version to 1.10.0.

* Temporarily change ray.utils.decode for backwards compatibility.
2018-12-23 14:56:34 -08:00
Robert Nishihara
ddd4c842f1 Initialize some variables in constructor instead of header file. (#3617)
* Initialize some variables in constructor instead of header file
2018-12-23 02:44:23 -08:00
Alexey Tumanov
bada42c334 object store notification mgr: fix using uninitialized variables (#3592)
Initialize private class variables to avoid valgrind errors. They are used before initialization.
2018-12-22 19:51:22 -08:00
Philipp Moritz
e578a38116 Fix TensorFlow and PyTorch compatibility (#3574)
* remove tensorflow workaround
* update docker
* add boost threads
* add date_time, too
* change link order
* cosmetics
2018-12-22 13:25:48 -08:00
Tianming Xu
deb26b954e [rllib] Export tensorflow model of policy graph (#3585)
* Export tensorflow model of policy graph

* Add tests,examples,pydocs and infer extra signatures from existing methods

* Add example usage in export_policy_model comment

* Fix lint error

* Fix lint error

* Fix lint error
2018-12-22 17:35:25 +09:00
Wang Qing
8393df2516 Use BaseTest instead of TestListener. (#3577) 2018-12-21 16:29:16 -08:00
Eric Liang
ddc97864df [rllib] Add requested clarifications to test requirement of contrib docs (#3589) 2018-12-21 11:02:02 -08:00
Alexey Tumanov
6b179cb8a7 change the order of allocation for io_service and gcs client in raylet main (#3597) 2018-12-21 00:13:28 -08:00
bibabolynn
e65b8f18f4 [java] change RayLog.core to org.slf4j.Logger (#3579) 2018-12-21 15:58:32 +08:00
Richard Liaw
e046a5c767
[tune] resources_per_trial from trial_resources (#3580)
Renaming variable due to user errors.
2018-12-20 19:00:47 -08:00
Devin Petersohn
a174a46e02 Allowing multiple users to access the /tmp/ray file at the same time (#3591)
* Allowing multiple users to access the /tmp/ray file at the same time

Previous sequence that caused this issue:
* User A starts ray with `ray.init` when /tmp/ray does not exist
* User B starts ray with `ray.init` and /tmp/ray now exists

User B will get a permissions error
Checking the permissions, /tmp/ray is 700

I have identified a race condition in `try_to_create_directory`
* Multiple processes try to create /tmp/ray at the same time
* chmod is either silently erroring or working properly within the race condition

Resolution: Move chmod outside of the check for whether the directory exists or not.

* Adding try except for users who do not own the directory
2018-12-20 18:46:54 -08:00
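The resolution described in the commit above (run chmod regardless of whether the directory already existed, and tolerate failure for non-owners) can be sketched as follows. This is a minimal illustration, not the code from the commit: the function name is taken from the message, but the body, the `0o777` mode, and the set of tolerated error codes are assumptions.

```python
import errno
import os
import stat
import tempfile


def try_to_create_directory(directory_path):
    """Create the directory if needed, then always try to widen its mode."""
    try:
        os.makedirs(directory_path)
    except OSError as e:
        # Another process may have won the creation race; that is fine.
        if e.errno != errno.EEXIST:
            raise
    # chmod runs outside the "does it exist?" branch, so a process that
    # lost the race still attempts to fix the permissions.
    try:
        os.chmod(directory_path, 0o777)  # assumed mode, for illustration
    except OSError as e:
        # Users who do not own the directory are not allowed to chmod it.
        if e.errno not in (errno.EPERM, errno.EACCES):
            raise


# Usage: calling it twice on the same path is safe.
demo_path = os.path.join(tempfile.mkdtemp(), "ray")
try_to_create_directory(demo_path)
try_to_create_directory(demo_path)
demo_mode = stat.S_IMODE(os.stat(demo_path).st_mode)
```

Keeping the chmod unconditional means the outcome no longer depends on which process created the directory first.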
Stephanie Wang
34bab6291c
Cleanup actor handle pickling code (#3560)
* Cleanup actor handle pickling code

* remove unused

* fix

* lint
2018-12-20 16:37:21 -08:00
Eric Liang
6bb1103930 [rllib] Avoid sample wastage with bad PPO configurations (#3552)
## What do these changes do?

Previously we logged a warning if the PPO configuration would waste many samples. However, this didn't apply in the case of long episodes in `complete_episodes` batch mode, and also the amount of waste is up to 2x in common cases.

This PR:
- Estimates the number of sampling tasks needed to avoid over-sampling.
- Collects all sample results and never discards any. In principle this can degrade performance at large scale if certain machines are slower, so it also adds a config flag to re-enable the legacy behavior of discarding samples.

## Related issue number

Closes: https://github.com/ray-project/ray/issues/3549
2018-12-20 10:50:44 -08:00
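One way to picture the estimate mentioned in the commit above is a ceiling division of the training batch size by the per-task sample size. The sketch below is an assumption made for illustration only: the function names and parameters (`train_batch_size`, `sample_batch_size`) are hypothetical here and the commit message does not state the actual formula.

```python
import math


def estimate_sampling_tasks(train_batch_size, sample_batch_size):
    """Smallest number of sampling tasks whose output covers one train batch."""
    return max(1, math.ceil(train_batch_size / sample_batch_size))


def wasted_fraction(train_batch_size, sample_batch_size, num_tasks):
    """Fraction of collected samples that exceed the train batch size."""
    collected = num_tasks * sample_batch_size
    return max(0, collected - train_batch_size) / collected


# Usage: launching exactly the estimated number of tasks keeps the excess small.
tasks = estimate_sampling_tasks(4000, 300)
waste = wasted_fraction(4000, 300, tasks)
```

Launching more tasks than the estimate is what produces the larger over-sampling the commit message warns about.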
Richard Liaw
ac48a58e4e
[tune] Reduce scope of variant generator (#3583)
This PR provides a better error message when the generate_variants code
breaks. Also removes a comment about nesting dependencies.

This comes mainly as a hotfix solution for #3466. We should leave that issue open for future contribution 🙂
2018-12-20 10:48:28 -08:00
Eric Liang
303883a3b6 [rllib] [rfc] add contrib module and guideline for merging (#3565)
This adds guidelines for merging code into `rllib/contrib` vs `rllib/agents`. Also, clean up the agent import code to make registration easier.
2018-12-20 10:44:34 -08:00
adoda
cf0c4745f4 [rllib] support running older version tensorflow(version < 1.5.0) (#3571) 2018-12-19 20:27:24 -08:00
Robert Nishihara
a5309bec7c Make README render properly on PyPI. (#3578)
* Make README render properly in pypi.

* Add small logo

* temporary fix

* smaller image

* Remove image size.

* Add author and email to setup.py.
2018-12-19 18:41:09 -08:00
Hao Chen
132a23354e Fix pending callback not called when ServerConnection destructs (#3572) 2018-12-19 17:29:36 -08:00
Eric Liang
ffa6ee3ec8
[rllib] streaming minibatching for IMPALA (#3402)
* mb impala

* fix

* paropt

* update

* cpu warn

* on cpu

* fix mb

* doc

* docs

* comment

* larger num

* early release

* remove grad clip

* only check loader count in multi gpu mode

* revert bad multigpu changes

* num sgd iter

* comment

* reuse optimizer

* add test

* par load test

* loosen test

* Update run_multi_node_tests.sh

* fix local mode

* Update agent.py
2018-12-19 02:23:29 -08:00
Alexey Tumanov
c4cba98c75 Remove deprecation warnings when running actor tests (#3563)
* remove deprecation warnings when running actor tests

* replacing logger.warn with logger.warning

* Update worker.py

* Update policy_client.py

* Update compression.py
2018-12-18 17:04:51 -08:00
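The `logger.warn` to `logger.warning` replacement in the commit above follows the standard library convention: `Logger.warn` is a deprecated alias and `Logger.warning` is the supported spelling. A small sketch, with a hypothetical logger name and message:

```python
import io
import logging

# Capture output in memory so the example is self-contained.
stream = io.StringIO()
logger = logging.getLogger("example")  # hypothetical logger name
logger.addHandler(logging.StreamHandler(stream))
logger.setLevel(logging.WARNING)
logger.propagate = False

# Supported spelling; the deprecated alias would be logger.warn(...).
logger.warning("object store is %d%% full", 90)
```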
Yuhong Guo
fb33fa9097 Enable function_descriptor in backend to replace the function_id (#3028) 2018-12-18 18:53:59 -05:00
Alexey Tumanov
3822b20319 [doc] update testing and dev instructions (#3562)
* [doc] update python testing command

* update installation/dev instructions
2018-12-18 14:45:24 -08:00
Stephanie Wang
26ca40817e Convert UniqueID::nil() to a constructor (#3564)
* Initialize UniqueID to nil

* Return reference to static const variable
2018-12-18 11:59:02 -08:00
Yuhong Guo
75ddf7cca4 Fix 2 small bugs (#3573) 2018-12-18 14:52:21 -05:00
Eric Liang
db0dee573e
[rllib] Q-Mix implementation (Q-Mix, VDN, IQN, and Ape-X variants) (#3548) 2018-12-18 10:40:01 -08:00
YifengHuang
bc4aa85ea3 fix link in doc (#3567) 2018-12-18 00:10:55 -08:00
opherlieber
854b06854f remove auto-concat of rollouts in AsyncSampler (#3556)
* remove auto-concat of rollouts in AsyncSampler

* remove auto-concat test

* remove unused reference
2018-12-17 13:54:52 -08:00
Devin Petersohn
3833ba4e4b Bump modin version to 0.2.5 (#3553) 2018-12-17 14:36:47 -05:00
Tianming Xu
7767aba637 Note requirement cython==0.29.0 in installation instructions (#3555) 2018-12-17 20:43:47 +08:00
Robert Nishihara
417c7f2d6f Update arrow and remove plasma_manager references. (#3545) 2018-12-15 23:36:02 -08:00
Philipp Moritz
b3bf608608 Update arrow to reduce plasma IPCs. (#3497) 2018-12-14 23:49:37 -05:00
Stephanie Wang
fcc37021b2
Throw exception for ray.get of an evicted actor object (#3490)
* Add a flag for whether an object has been created before

* Add regression test

* doc

* Share object directory between object and node managers

* Treat evicted actor tasks as failed

* minor

* Check return value

* Fix bug where object locations weren't getting updated on client death

* Fix mac build

* Use RayTaskError
2018-12-14 11:41:27 -08:00
bibabolynn
7fd24e384b [java] Pass large args by reference (#3504) 2018-12-14 23:32:35 +08:00
Richard Liaw
de3fdeb5b5
[autoscaler] Fix Error Handling for botocore (#3534)
Unfortunately Boto generates error classes dynamically, so this catches
the expected error and raises the error if it is the wrong class.

Closes #3533.
2018-12-14 00:20:49 -08:00
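The pattern described in the commit above (catch the broad error class, inspect it, and re-raise anything unexpected) can be sketched without depending on botocore. The `ClientError` class below is a stand-in written for this example; it only mimics the `response["Error"]["Code"]` layout that botocore's `ClientError` exposes, and its constructor is not botocore's. The helper name and the error codes used are likewise illustrative.

```python
class ClientError(Exception):
    """Stand-in exception carrying a botocore-style `response` dict."""

    def __init__(self, code, message=""):
        super().__init__(message)
        self.response = {"Error": {"Code": code, "Message": message}}


def call_ignoring(fn, expected_code):
    """Run fn(); swallow only the expected error code, re-raise the rest."""
    try:
        return fn()
    except ClientError as exc:
        if exc.response["Error"]["Code"] != expected_code:
            raise  # wrong error: propagate unchanged
        return None


def missing_entity():
    raise ClientError("NoSuchEntity", "role not found")


def access_denied():
    raise ClientError("AccessDenied", "not allowed")


# Usage: the expected error is tolerated, any other code still surfaces.
result = call_ignoring(missing_entity, "NoSuchEntity")
```

Checking the error code at runtime sidesteps the need to import an exception class that only exists once the client has generated it.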
Yuhong Guo
2a4685a08b Add a script to collect built thirdparty libs to avoid downloading and building them again. (#3521) 2018-12-13 23:56:40 -08:00
Yuhong Guo
a4abe6c0fe Add test to test raylet client connection when raylet crashes. (#3518) 2018-12-13 23:40:50 -08:00
Hao Chen
e7b51cbd1b [xray] Implement Actor Reconstruction (#3332)
* Implement Actor Reconstruction

* fix

* fix actor handle __del__

* fix lint

* add comment

* Remove actorCreationDummyObjectId

* address comments

* fix

* address comments

* avoid copy

* change log to debug

* fix error name
2018-12-13 21:28:58 -08:00
Alexey Tumanov
2455de78ce save initial config instead of initial resource config (#3532) 2018-12-13 20:39:42 -08:00
Si-Yuan
84fae57ab5 Convert the raylet client (the code in local_scheduler_client.cc) to proper C++. (#3511)
* refactoring

* fix bugs

* create client class

* create client class for java; bug fix

* remove legacy code

* improve code by using std::string and std::unique_ptr, renaming private fields, and removing legacy code

* rename class

* improve naming

* fix

* rename files

* fix names

* change name

* change return types

* make a mutex private field

* fix comments

* fix bugs

* lint

* bug fix

* bug fix

* move too short functions into the header file

* Loosen crash conditions for some APIs.

* Apply suggestions from code review

Co-Authored-By: suquark <suquark@gmail.com>

* format

* update

* rename python APIs

* fix java

* more fixes

* change types of cpython interface

* more fixes

* improve error processing

* improve error processing for java wrapper

* lint

* fix java

* make fields const

* use pointers for [out] parameters

* fix java & error msg

* fix resource leak, etc.
2018-12-13 13:39:10 -08:00
Chunyang Wen
5dcc333199 [sgd] Modify: add interface for model (#3458)
* Modify: add interface for model

* Modify: remove single quota and build; add metrics

* Modify: flatten into list of dict

* Update distributed_sgd.rst

* Modify: update format with scripts/format.sh

* Update sgd_worker.py
2018-12-12 21:23:25 -08:00
Eric Liang
0e00533ed4
Different approach to removing RayGetError (#3471) 2018-12-12 20:30:51 -08:00
Eric Liang
20c7fad4f4
Move actor table to primary redis context 2018-12-12 16:51:29 -08:00
Eric Liang
32473cf22e
[rllib] Basic Offline Data IO API (#3473) 2018-12-12 13:57:48 -08:00
Richard Liaw
cc8f7db246
[docs] Improve cluster/docker docs (#3517)
- Surfaces local cluster usage
- Increases visibility of these instructions
- Removes some Docker docs (that are really out of scope for Ray
  documentation IMO)

Closes #3517.
2018-12-12 10:40:54 -08:00
Eric Liang
5f4a9cc713 [rllib] Rollout should preprocess observations; some cleanups (#3512)
## What do these changes do?

From https://groups.google.com/forum/#!topic/ray-dev/u-gybKK6-Ns
2018-12-11 20:16:38 -08:00