Eric Liang
ddc97864df
[rllib] Add requested clarifications to test requirement of contrib docs ( #3589 )
2018-12-21 11:02:02 -08:00
Alexey Tumanov
6b179cb8a7
change the order of allocation for io_service and gcs client in raylet main ( #3597 )
2018-12-21 00:13:28 -08:00
bibabolynn
e65b8f18f4
[java] change RayLog.core to org.slf4j.Logger ( #3579 )
2018-12-21 15:58:32 +08:00
Richard Liaw
e046a5c767
[tune] resources_per_trial from trial_resources ( #3580 )
...
Renaming variable due to user errors.
2018-12-20 19:00:47 -08:00
Devin Petersohn
a174a46e02
Allowing multiple users to access the /tmp/ray file at the same time ( #3591 )
...
* Allowing multiple users to access the /tmp/ray file at the same time
Previous sequence that caused this issue:
* User A starts ray with `ray.init` when /tmp/ray does not exist
* User B starts ray with `ray.init` and /tmp/ray now exists
User B will get a permissions error
Checking the permissions, /tmp/ray is 700
I have identified a race condition in `try_to_create_directory`
* Multiple processes try to create /tmp/ray at the same time
* chmod is either silently erroring or working properly within the race condition
Resolution: Move chmod outside of the check for whether the directory exists or not.
* Adding try except for users who do not own the directory
2018-12-20 18:46:54 -08:00
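The resolution described in the `/tmp/ray` commit above (run chmod regardless of whether the directory already existed, and tolerate users who do not own it) can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the `mode` parameter, the use of `exist_ok=True`, and the exact set of ignored errno values are assumptions.

```python
import errno
import os


def try_to_create_directory(directory_path, mode=0o777):
    """Create directory_path if needed, then always try to relax its permissions.

    Illustrative sketch only; the signature and defaults are assumptions.
    """
    # exist_ok=True makes it harmless for several processes to race on creation.
    os.makedirs(directory_path, exist_ok=True)

    # chmod sits outside any "does the directory exist?" check, so it runs
    # whether this process created the directory or found it already there.
    try:
        os.chmod(directory_path, mode)
    except OSError as e:
        # A user who does not own the directory cannot chmod it.
        # Ignore only that case and re-raise anything else.
        if e.errno not in (errno.EPERM, errno.EACCES):
            raise
```

Calling the function twice on the same path is safe: the second call skips creation and simply reapplies the permissions.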
Stephanie Wang
34bab6291c
Cleanup actor handle pickling code ( #3560 )
...
* Cleanup actor handle pickling code
* remove unused
* fix
* lint
2018-12-20 16:37:21 -08:00
Eric Liang
6bb1103930
[rllib] Avoid sample wastage with bad PPO configurations ( #3552 )
...
## What do these changes do?
Previously we logged a warning if the PPO configuration would waste many samples. However, the warning did not cover long episodes in `complete_episodes` batch mode, and the waste can be up to 2x in common cases.
This PR:
- Estimates the number of sampling tasks needed to avoid over-sampling.
- Collects all sample results and never discards any. In principle this can degrade performance at large scale if certain machines are slower, so a config flag is added to re-enable the legacy behavior of discarding samples.
## Related issue number
Closes: https://github.com/ray-project/ray/issues/3549
2018-12-20 10:50:44 -08:00
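The task estimate mentioned in the PPO commit above can be illustrated with a ceiling division. This is a sketch under stated assumptions, not RLlib's implementation: the function names and the `train_batch_size` / `samples_per_task` parameters are hypothetical, and each sampling task is assumed to return a fixed number of samples.

```python
def estimate_num_sampling_tasks(train_batch_size, samples_per_task):
    """Smallest number of tasks whose combined output covers one train batch."""
    if train_batch_size <= 0 or samples_per_task <= 0:
        raise ValueError("both arguments must be positive")
    # Integer ceiling division avoids floating-point rounding.
    return -(-train_batch_size // samples_per_task)


def oversampling_ratio(train_batch_size, samples_per_task):
    """Samples collected divided by samples needed (1.0 means no waste)."""
    tasks = estimate_num_sampling_tasks(train_batch_size, samples_per_task)
    return tasks * samples_per_task / train_batch_size
```

When a single task returns slightly less than a full train batch (for example 3999 samples against a batch of 4000), two tasks are needed and the ratio approaches 2, which is consistent with the "up to 2x" figure in the commit message.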
Richard Liaw
ac48a58e4e
[tune] Reduce scope of variant generator ( #3583 )
...
This PR provides a better error message when the generate_variants code
breaks. Also removes a comment about nesting dependencies.
This comes mainly as a hotfix solution for #3466 . We should leave that issue open for future contribution 🙂
2018-12-20 10:48:28 -08:00
Eric Liang
303883a3b6
[rllib] [rfc] add contrib module and guideline for merging ( #3565 )
...
This adds guidelines for merging code into `rllib/contrib` vs `rllib/agents`. Also, clean up the agent import code to make registration easier.
2018-12-20 10:44:34 -08:00
adoda
cf0c4745f4
[rllib] support running older versions of TensorFlow (version < 1.5.0) ( #3571 )
2018-12-19 20:27:24 -08:00
Robert Nishihara
a5309bec7c
Make README render properly on PyPI. ( #3578 )
...
* Make README render properly in pypi.
* Add small logo
* temporary fix
* smaller image
* Remove image size.
* Add author and email to setup.py.
2018-12-19 18:41:09 -08:00
Hao Chen
132a23354e
Fix pending callback not called when ServerConnection destructs ( #3572 )
2018-12-19 17:29:36 -08:00
Eric Liang
ffa6ee3ec8
[rllib] streaming minibatching for IMPALA ( #3402 )
...
* mb impala
* fix
* paropt
* update
* cpu warn
* on cpu
* fix mb
* doc
* docs
* comment
* larger num
* early release
* remove grad clip
* only check loader count in multi gpu mode
* revert bad multigpu changes
* num sgd iter
* comment
* reuse optimizer
* add test
* par load test
* loosen test
* Update run_multi_node_tests.sh
* fix local mode
* Update agent.py
2018-12-19 02:23:29 -08:00
Alexey Tumanov
c4cba98c75
Remove deprecation warnings when running actor tests ( #3563 )
...
* remove deprecation warnings when running actor tests
* replacing logger.warn with logger.warning
* Update worker.py
* Update policy_client.py
* Update compression.py
2018-12-18 17:04:51 -08:00
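For context on the `logger.warn` replacement in the commit above: in Python's standard `logging` module, `Logger.warn` is a deprecated alias of `Logger.warning`, and calling it emits a `DeprecationWarning`. The sketch below shows the preferred call; the logger name, the `ListHandler` helper, and the message are illustrative and not taken from the Ray code base.

```python
import logging


class ListHandler(logging.Handler):
    """Collects (level name, message) pairs so the output can be inspected."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append((record.levelname, record.getMessage()))


logger = logging.getLogger("example")
logger.setLevel(logging.WARNING)
handler = ListHandler()
logger.addHandler(handler)

# Preferred spelling: warning(), not the deprecated warn() alias.
logger.warning("actor %s is slow", "a1")
```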
Yuhong Guo
fb33fa9097
Enable function_descriptor in backend to replace the function_id ( #3028 )
2018-12-18 18:53:59 -05:00
Alexey Tumanov
3822b20319
[doc] update testing and dev instructions ( #3562 )
...
* [doc] update python testing command
* update installation/dev instructions
2018-12-18 14:45:24 -08:00
Stephanie Wang
26ca40817e
Convert UniqueID::nil() to a constructor ( #3564 )
...
* Initialize UniqueID to nil
* Return reference to static const variable
2018-12-18 11:59:02 -08:00
Yuhong Guo
75ddf7cca4
Fix 2 small bugs ( #3573 )
2018-12-18 14:52:21 -05:00
Eric Liang
db0dee573e
[rllib] Q-Mix implementation (Q-Mix, VDN, IQN, and Ape-X variants) ( #3548 )
2018-12-18 10:40:01 -08:00
YifengHuang
bc4aa85ea3
fix link in doc ( #3567 )
2018-12-18 00:10:55 -08:00
opherlieber
854b06854f
remove auto-concat of rollouts in AsyncSampler ( #3556 )
...
* remove auto-concat of rollouts in AsyncSampler
* remove auto-concat test
* remove unused reference
2018-12-17 13:54:52 -08:00
Devin Petersohn
3833ba4e4b
Bump modin version to 0.2.5 ( #3553 )
2018-12-17 14:36:47 -05:00
Tianming Xu
7767aba637
Note requirement cython==0.29.0 in installation instructions ( #3555 )
2018-12-17 20:43:47 +08:00
Robert Nishihara
417c7f2d6f
Update arrow and remove plasma_manager references. ( #3545 )
2018-12-15 23:36:02 -08:00
Philipp Moritz
b3bf608608
Update arrow to reduce plasma IPCs. ( #3497 )
2018-12-14 23:49:37 -05:00
Stephanie Wang
fcc37021b2
Throw exception for ray.get of an evicted actor object ( #3490 )
...
* Add a flag for whether an object has been created before
* Add regression test
* doc
* Share object directory between object and node managers
* Treat evicted actor tasks as failed
* minor
* Check return value
* Fix bug where object locations weren't getting updated on client death
* Fix mac build
* Use RayTaskError
2018-12-14 11:41:27 -08:00
bibabolynn
7fd24e384b
[java] Pass large args by reference ( #3504 )
2018-12-14 23:32:35 +08:00
Richard Liaw
de3fdeb5b5
[autoscaler] Fix Error Handling for botocore ( #3534 )
...
Unfortunately Boto generates error classes dynamically, so this catches
the expected error and raises the error if it is the wrong class.
Closes #3533 .
2018-12-14 00:20:49 -08:00
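Because service-specific Boto error classes are generated at runtime, one common pattern is to catch a shared base error and inspect its error code, re-raising anything unexpected. The sketch below illustrates that pattern and may differ from what the commit above actually does. `FakeClientError` is a hypothetical stand-in that mimics the `response["Error"]["Code"]` layout of `botocore.exceptions.ClientError`, so the example runs without botocore installed; `load_or_none` is likewise an illustrative name.

```python
class FakeClientError(Exception):
    """Hypothetical stand-in for botocore.exceptions.ClientError."""

    def __init__(self, code, message=""):
        super().__init__(message)
        self.response = {"Error": {"Code": code, "Message": message}}


def load_or_none(loader, expected_code, error_type=FakeClientError):
    """Call loader(); return None for the expected error code, re-raise others."""
    try:
        return loader()
    except error_type as exc:
        code = exc.response.get("Error", {}).get("Code")
        if code == expected_code:
            return None
        # Any other error code is unexpected, so let it propagate.
        raise
```

With real botocore, `error_type` would be `botocore.exceptions.ClientError`, and `expected_code` a service error code such as `"NoSuchEntity"`.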
Yuhong Guo
2a4685a08b
Add a script to collect built thirdparty libs to avoid downloading and building them again. ( #3521 )
2018-12-13 23:56:40 -08:00
Yuhong Guo
a4abe6c0fe
Add test to test raylet client connection when raylet crashes. ( #3518 )
2018-12-13 23:40:50 -08:00
Hao Chen
e7b51cbd1b
[xray] Implement Actor Reconstruction ( #3332 )
...
* Implement Actor Reconstruction
* fix
* fix actor handle __del__
* fix lint
* add comment
* Remove actorCreationDummyObjectId
* address comments
* fix
* address comments
* avoid copy
* change log to debug
* fix error name
2018-12-13 21:28:58 -08:00
Alexey Tumanov
2455de78ce
save initial config instead of initial resource config ( #3532 )
2018-12-13 20:39:42 -08:00
Si-Yuan
84fae57ab5
Convert the raylet client (the code in local_scheduler_client.cc) to proper C++. ( #3511 )
...
* refactoring
* fix bugs
* create client class
* create client class for java; bug fix
* remove legacy code
* improve code by using std::string and std::unique_ptr, renaming private fields, and removing legacy code
* rename class
* improve naming
* fix
* rename files
* fix names
* change name
* change return types
* make a mutex private field
* fix comments
* fix bugs
* lint
* bug fix
* bug fix
* move too short functions into the header file
* Loosen crash conditions for some APIs.
* Apply suggestions from code review
Co-Authored-By: suquark <suquark@gmail.com>
* format
* update
* rename python APIs
* fix java
* more fixes
* change types of cpython interface
* more fixes
* improve error processing
* improve error processing for java wrapper
* lint
* fix java
* make fields const
* use pointers for [out] parameters
* fix java & error msg
* fix resource leak, etc.
2018-12-13 13:39:10 -08:00
Chunyang Wen
5dcc333199
[sgd] Modify: add interface for model ( #3458 )
...
* Modify: add interface for model
* Modify: remove single quote and build; add metrics
* Modify: flatten into list of dict
* Update distributed_sgd.rst
* Modify: update format with scripts/format.sh
* Update sgd_worker.py
2018-12-12 21:23:25 -08:00
Eric Liang
0e00533ed4
Different approach to removing RayGetError ( #3471 )
2018-12-12 20:30:51 -08:00
Eric Liang
20c7fad4f4
Move actor table to primary redis context
2018-12-12 16:51:29 -08:00
Eric Liang
32473cf22e
[rllib] Basic Offline Data IO API ( #3473 )
2018-12-12 13:57:48 -08:00
Richard Liaw
cc8f7db246
[docs] Improve cluster/docker docs ( #3517 )
...
- Surfaces local cluster usage
- Increases visibility of these instructions
- Removes some docker docs (that are really out of scope for Ray documentation IMO)
Closes #3517 .
2018-12-12 10:40:54 -08:00
Eric Liang
5f4a9cc713
[rllib] Rollout should preprocess observations; some cleanups ( #3512 )
...
## What do these changes do?
From https://groups.google.com/forum/#!topic/ray-dev/u-gybKK6-Ns
2018-12-11 20:16:38 -08:00
Eric Liang
59f4743f20
[rllib] Run simple regressions tests for all algs in jenkins ( #3498 )
2018-12-11 17:21:53 -08:00
Richard Liaw
e0fbb68e47
[tune] Custom Logging, Trial Name ( #3465 )
...
Adds support for custom loggers, custom trial strings, and custom sync commands. Closes #3034 , #2985 , and #3390 .
2018-12-11 13:41:59 -08:00
Robert Nishihara
74c3370bd5
Show slowest tests in travis. ( #3507 )
2018-12-11 11:25:04 -08:00
Eric Liang
52df4dfc6f
[rllib] Fix multiagent_two_trainer test ( #3509 )
...
* update
* fix
* dict order
* fix
* fix
2018-12-11 00:16:39 -08:00
Richard Liaw
1f4a01cff6
[tune] Fix PyTorch example after PyTorch v1 ( #3500 )
...
* [tune]
* fix
* lint
* fix
2018-12-10 12:00:53 -08:00
Eric Liang
962f18756b
[autoscaler] Use fixed timestamp to check against health timeouts ( #3503 )
2018-12-10 14:58:27 -05:00
Yuhong Guo
abd781d607
Make stress test time shorter. ( #3506 )
2018-12-10 14:46:40 -05:00
Eric Liang
ce388a45cf
[rllib] Learner should not see clipped actions ( #3496 )
2018-12-09 21:57:11 -08:00
Philipp Moritz
87c0d24579
[sgd] Add file lock to protect compilation of sgd op ( #3486 )
...
* add file lock to protect compilation of sgd op
* lint
* update
* fix
* fix
* lint
* update
* rebase on arrow
* Update sgd_worker.py
2018-12-09 13:52:40 -08:00
Eric Liang
cffe8f9806
Add option to evict keys LRU from the sharded redis tables ( #3499 )
...
* wip
* wip
* format
* wip
* note
* lint
* fix
* flag
* typo
* raise timeout
* fix
* optional get
* fix flag
* increase timeout in test
* update docs
* format
2018-12-09 05:48:52 -08:00
Yuhong Guo
0136af5aac
Add return value for reconstruction RPC. ( #3493 )
...
* Add return value for reconstruct RPC.
* Fix comment function name
2018-12-09 00:08:44 -08:00