Commit graph

2301 commits

Author SHA1 Message Date
Robert Nishihara
a5309bec7c Make README render properly on PyPI. (#3578)
* Make README render properly in pypi.

* Add small logo

* temporary fix

* smaller image

* Remove image size.

* Add author and email to setup.py.
2018-12-19 18:41:09 -08:00
Hao Chen
132a23354e Fix pending callback not called when ServerConnection destructs (#3572) 2018-12-19 17:29:36 -08:00
Eric Liang
ffa6ee3ec8
[rllib] streaming minibatching for IMPALA (#3402)
* mb impala

* fix

* paropt

* update

* cpu warn

* on cpu

* fix mb

* doc

* docs

* comment

* larger num

* early release

* remove grad clip

* only check loader count in multi gpu mode

* revert bad multigpu changes

* num sgd iter

* comment

* reuse optimizer

* add test

* par load test

* loosen test

* Update run_multi_node_tests.sh

* fix local mode

* Update agent.py
2018-12-19 02:23:29 -08:00
Alexey Tumanov
c4cba98c75 Remove deprecation warnings when running actor tests (#3563)
* remove deprecation warnings when running actor tests

* replacing logger.warn with logger.warning

* Update worker.py

* Update policy_client.py

* Update compression.py
2018-12-18 17:04:51 -08:00
Yuhong Guo
fb33fa9097 Enable function_descriptor in backend to replace the function_id (#3028) 2018-12-18 18:53:59 -05:00
Alexey Tumanov
3822b20319 [doc] update testing and dev instructions (#3562)
* [doc] update python testing command

* update installation/dev instructions
2018-12-18 14:45:24 -08:00
Stephanie Wang
26ca40817e Convert UniqueID::nil() to a constructor (#3564)
* Initialize UniqueID to nil

* Return reference to static const variable
2018-12-18 11:59:02 -08:00
Yuhong Guo
75ddf7cca4 Fix 2 small bugs (#3573) 2018-12-18 14:52:21 -05:00
Eric Liang
db0dee573e
[rllib] Q-Mix implementation (Q-Mix, VDN, IQN, and Ape-X variants) (#3548) 2018-12-18 10:40:01 -08:00
YifengHuang
bc4aa85ea3 fix link in doc (#3567) 2018-12-18 00:10:55 -08:00
opherlieber
854b06854f remove auto-concat of rollouts in AsyncSampler (#3556)
* remove auto-concat of rollouts in AsyncSampler

* remote auto-concat test

* remove unused reference
2018-12-17 13:54:52 -08:00
Devin Petersohn
3833ba4e4b Bump modin version to 0.2.5 (#3553) 2018-12-17 14:36:47 -05:00
Tianming Xu
7767aba637 Note requirement cython==0.29.0 in installation instructions (#3555) 2018-12-17 20:43:47 +08:00
Robert Nishihara
417c7f2d6f Update arrow and remove plasma_manager references. (#3545) 2018-12-15 23:36:02 -08:00
Philipp Moritz
b3bf608608 Update arrow to reduce plasma IPCs. (#3497) 2018-12-14 23:49:37 -05:00
Stephanie Wang
fcc37021b2
Throw exception for ray.get of an evicted actor object (#3490)
* Add a flag for whether an object has been created before

* Add regression test

* doc

* Share object directory between object and node managers

* Treat evicted actor tasks as failed

* minor

* Check return value

* Fix bug where object locations weren't getting updated on client death

* Fix mac build

* Use RayTaskError
2018-12-14 11:41:27 -08:00
bibabolynn
7fd24e384b [java] Pass large args by reference (#3504) 2018-12-14 23:32:35 +08:00
Richard Liaw
de3fdeb5b5
[autoscaler] Fix Error Handling for botocore (#3534)
Unfortunately Boto generates error classes dynamically, so this catches
the expected error and raises the error if it is the wrong class.

Closes #3533.
2018-12-14 00:20:49 -08:00
Yuhong Guo
2a4685a08b Add a script to collect built thirdparty libs to avoid download and building again. (#3521) 2018-12-13 23:56:40 -08:00
Yuhong Guo
a4abe6c0fe Add test to test raylet client connection when raylet crashes. (#3518) 2018-12-13 23:40:50 -08:00
Hao Chen
e7b51cbd1b [xray] Implement Actor Reconstruction (#3332)
* Implement Actor Reconstruction

* fix

* fix actor handle __del__

* fix lint

* add comment

* Remove actorCreationDummyObjectId

* address comments

* fix

* address comments

* avoid copy

* change log to debug

* fix error name
2018-12-13 21:28:58 -08:00
Alexey Tumanov
2455de78ce save initial config instead of initial resource config (#3532) 2018-12-13 20:39:42 -08:00
Si-Yuan
84fae57ab5 Convert the raylet client (the code in local_scheduler_client.cc) to proper C++. (#3511)
* refactoring

* fix bugs

* create client class

* create client class for java; bug fix

* remove legacy code

* improve code by using std::string, std::unique_ptr rename private fields and removing legacy code

* rename class

* improve naming

* fix

* rename files

* fix names

* change name

* change return types

* make a mutex private field

* fix comments

* fix bugs

* lint

* bug fix

* bug fix

* move too short functions into the header file

* Loose crash conditions for some APIs.

* Apply suggestions from code review

Co-Authored-By: suquark <suquark@gmail.com>

* format

* update

* rename python APIs

* fix java

* more fixes

* change types of cpython interface

* more fixes

* improve error processing

* improve error processing for java wrapper

* lint

* fix java

* make fields const

* use pointers for [out] parameters

* fix java & error msg

* fix resource leak, etc.
2018-12-13 13:39:10 -08:00
Chunyang Wen
5dcc333199 [sgd] Modify: add interface for model (#3458)
* Modify: add interface for model

* Modify: remove single quota and build; add metrics

* Modify: flatten into list of dict

* Update distributed_sgd.rst

* Modify: update format with scripts/format.sh

* Update sgd_worker.py
2018-12-12 21:23:25 -08:00
Eric Liang
0e00533ed4
Different approach to removing RayGetError (#3471) 2018-12-12 20:30:51 -08:00
Eric Liang
20c7fad4f4
Move actor table to primary redis context 2018-12-12 16:51:29 -08:00
Eric Liang
32473cf22e
[rllib] Basic Offline Data IO API (#3473) 2018-12-12 13:57:48 -08:00
Richard Liaw
cc8f7db246
[docs] Improve cluster/docker docs (#3517)
- Surfaces local cluster usage
 - Increases visability of these instructions
 - Removes some docker docs (that are really out of scope for Ray
 documentation IMO)

Closes #3517.
2018-12-12 10:40:54 -08:00
Eric Liang
5f4a9cc713 [rllib] Rollout should preprocess observations; some cleanups (#3512)
<!--
Thank you for your contribution!

Please review https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before opening a pull request.
-->

## What do these changes do?

From https://groups.google.com/forum/#!topic/ray-dev/u-gybKK6-Ns
2018-12-11 20:16:38 -08:00
Eric Liang
59f4743f20
[rllib] Run simple regressions tests for all algs in jenkins (#3498) 2018-12-11 17:21:53 -08:00
Richard Liaw
e0fbb68e47
[tune] Custom Logging, Trial Name (#3465)
Adds support for custom loggers, custom trial strings, and custom sync commands. Closes #3034, #2985, and #3390.
2018-12-11 13:41:59 -08:00
Robert Nishihara
74c3370bd5 Show slowest tests in travis. (#3507) 2018-12-11 11:25:04 -08:00
Eric Liang
52df4dfc6f
[rllib] Fix multiagent_two_trainer test (#3509)
* update

* fix

* dict ordre

* fix

* fix
2018-12-11 00:16:39 -08:00
Richard Liaw
1f4a01cff6 [tune] Fix PyTorch example after PyTorch v1 (#3500)
* [tune]

* fix

* lint

* fix
2018-12-10 12:00:53 -08:00
Eric Liang
962f18756b [autoscaler] Use fixed timestamp to check against health timeouts (#3503) 2018-12-10 14:58:27 -05:00
Yuhong Guo
abd781d607 Make stress test time shorter. (#3506) 2018-12-10 14:46:40 -05:00
Eric Liang
ce388a45cf
[rllib] Learner should not see clipped actions (#3496) 2018-12-09 21:57:11 -08:00
Philipp Moritz
87c0d24579
[sgd] Add file lock to protect compilation of sgd op (#3486)
* add file lock to protect compilation of sgd op

* lint

* update

* fix

* fix

* lint

* update

* rebase on arrow

* Update sgd_worker.py
2018-12-09 13:52:40 -08:00
Eric Liang
cffe8f9806 Add option to evict keys LRU from the sharded redis tables (#3499)
* wip

* wip

* format

* wip

* note

* lint

* fix

* flag

* typo

* raise timeout

* fix

* optional get

* fix flag

* increase timeout in test

* update docs

* format
2018-12-09 05:48:52 -08:00
Yuhong Guo
0136af5aac Add return value for recontruction RPC. (#3493)
* Add return value for recontruct RPC.

* Fix comment function name
2018-12-09 00:08:44 -08:00
Eric Liang
7aec357501
[rllib] Multi-GPU support for Multi-Agent PPO (#3479)
* wip

* fix

* remove check

* fix null

* revert

* lint and kl

* also fix rollout
2018-12-08 18:02:33 -08:00
Eric Liang
8b5827b9da
[rllib] Better document which methods are abstract and which ones are overrides (#3480) 2018-12-08 16:28:58 -08:00
Eric Liang
462e6ef066
[rllib] Use smoothed version of collect metrics for DQN (#3491)
* fix

* lint
2018-12-07 18:36:23 -08:00
Tianming Xu
f6490f9bef Resolve no handlers could be found for logger 'ray.worker' when importing ray (#3483) 2018-12-06 20:46:53 -08:00
Eric Liang
8395523f81
[rllib] Copy data before passing to Ape-X learner thread (fixes transient plasma crashes) (#3484) 2018-12-06 18:01:11 -08:00
Si-Yuan
c2c501bbe6 Experimental asyncio support (#2015)
* Init commit for async plasma client

* Create an eventloop model for ray/plasma

* Implement a poll-like selector base on `ray.wait`. Huge improvements.

* Allow choosing workers & selectors

* remove original design

* initial implementation of epoll-like selector for plasma

* Add a param for `worker` used in `PlasmaSelectorEventLoop`

* Allow accepting a `Future` which returns object_id

* Do not need `io.py` anymore

* Create a basic testing model

* fix: `ray.wait` returns tuple of lists

* fix a few bugs

* improving performance & bug fixing

* add test

* several improvements & fixing

* fix relative import

* [async] change code format, remove old files

* [async] Create context wrapper for the eventloop

* [async] fix: context should return a value

* [async] Implement futures grouping

* [async] Fix bugs & replace old functions

* [async] Fix bugs found in tests

* [async] Implement `PlasmaEpoll`

* [async] Make test faster, add tests for epoll

* [async] Fix code format

* [async] Add comments for main code.

* [async] Fix import path.

* [async] Fix test.

* [async] Compatibility.

* [async] less verbose to not annoy the CI.

* [async] Add test for new API

* [async] Allow showing debug info in some of the test.

* [async] Fix test.

* [async] Proper shutdown.

* [async] Lint~

* [async] Move files to experimental and create API

* [async] Use async/await syntax

* [async] Fix names & styles

* [async] comments

* [async] bug fixing & use pytest

* [async] bug fixing & change tests

* [async] use logger

* [async] add tests

* [async] lint

* [async] type checking

* [async] add more tests

* [async] fix bugs on waiting a future while timeout. Add more docs.

* [async] Formal docs.

* [async] Add typing info since these codes are compatible with py3.5+.

* [async] Documents.

* [async] Lint.

* [async] Fix deprecated call.

* [async] Fix deprecated call.

* [async] Implement a more reasonable way for dealing with pending inputs.

* [async] Fix docs

* [async] Lint

* [async] Fix bug: Type for time

* [async] Set our eventloop as the default eventloop so that we can get it through `asyncio.get_event_loop()`.

* [async] Update test & docs.

* [async] Lint.

* [async] Temporarily print more debug info.

* [async] Use `Poll` as a default option.

* [async] Limit resources.

* new async implementation for Ray

* implement linked list

* bug fix

* update

* support seamless async operations

* update

* update API

* fix tests

* lint

* bug fix

* refactor names

* improve doc

* properly shutdown async_api

* doc

* Change the table on the index page.

* Adjust table size.

* Only keeps `as_future`.

* change how we init connection

* init connection in `ray.worker.connect`

* doc

* fix

* Move initialization code into the module.

* Fix docs & code

* Update pyarrow version.

* lint

* Restore index.rst

* Add known issues.

* Apply suggestions from code review

Co-Authored-By: suquark <suquark@gmail.com>

* rename

* Update async_api.rst

* Update async_api.py

* Update async_api.rst

* Update async_api.py

* Update worker.py

* Update async_api.rst

* fix tests

* lint

* lint

* replace the magic number
2018-12-06 17:39:05 -08:00
Devin Petersohn
970babf31a Removing the check about the size re: ray-project/ray#3450 (#3464)
* Removing the check about the size re: ray-project/ray#3450

* Addressing comments

* Update services.py
2018-12-06 16:59:24 -08:00
Eugene Vinitsky
7a7c6e53c8 [tune/rllib] Use cloudpickle to dump config (#3462)
<!--
Thank you for your contribution!

Please review https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before opening a pull request.
-->

## What do these changes do?

JSON Logger now uses cloudpickle to dump the configs as welll, which pkls the functions needed for multi-agent replay.

## Related issue number

<!-- Are there any issues opened that will be resolved by merging this change? -->
2018-12-06 15:52:44 -08:00
Yuhong Guo
b9e1977fae Fix failure of test_free_objects_multi_node (#3481)
It is possible that `test_free_objects_multi_node` would fail sometimes. If we run this test 20 times, we may found at least one failure.

The cause is that the test is based on function tasks. One raylet may create more than one worker to execute the tasks. So flush operations may be separated to several workers and not clean all the worker objects held by the plasma client.

In this PR, I change function task to actor tasks, which guarantee all the tasks are executed in one worker of a raylet.
2018-12-06 15:55:49 -05:00
Eric Liang
412aaa5195
[tune] Deprecate ambiguous function values (use tune.function / tune.sample_from instead) (#3457)
* wip

* exclude
2018-12-06 11:35:20 -08:00