* Modify: add interface for model
* Modify: remove single quota and build; add metrics
* Modify: flatten into list of dict
* Update distributed_sgd.rst
* Modify: update format with scripts/format.sh
* Update sgd_worker.py
- Surfaces local cluster usage
- Increases visibility of these instructions
- Removes some docker docs (that are really out of scope for Ray
documentation IMO)
Closes #3517.
* Init commit for async plasma client
* Create an eventloop model for ray/plasma
* Implement a poll-like selector based on `ray.wait`. Huge improvements.
* Allow choosing workers & selectors
* remove original design
* initial implementation of epoll-like selector for plasma
* Add a param for `worker` used in `PlasmaSelectorEventLoop`
* Allow accepting a `Future` which returns object_id
* Do not need `io.py` anymore
* Create a basic testing model
* fix: `ray.wait` returns tuple of lists
* fix a few bugs
* improving performance & bug fixing
* add test
* several improvements & fixes
* fix relative import
* [async] change code format, remove old files
* [async] Create context wrapper for the eventloop
* [async] fix: context should return a value
* [async] Implement futures grouping
* [async] Fix bugs & replace old functions
* [async] Fix bugs found in tests
* [async] Implement `PlasmaEpoll`
* [async] Make test faster, add tests for epoll
* [async] Fix code format
* [async] Add comments for main code.
* [async] Fix import path.
* [async] Fix test.
* [async] Compatibility.
* [async] less verbose to not annoy the CI.
* [async] Add test for new API
* [async] Allow showing debug info in some of the tests.
* [async] Fix test.
* [async] Proper shutdown.
* [async] Lint~
* [async] Move files to experimental and create API
* [async] Use async/await syntax
* [async] Fix names & styles
* [async] comments
* [async] bug fixing & use pytest
* [async] bug fixing & change tests
* [async] use logger
* [async] add tests
* [async] lint
* [async] type checking
* [async] add more tests
* [async] Fix bugs when waiting on a future that times out. Add more docs.
* [async] Formal docs.
* [async] Add typing info since this code is compatible with py3.5+.
* [async] Documents.
* [async] Lint.
* [async] Fix deprecated call.
* [async] Fix deprecated call.
* [async] Implement a more reasonable way for dealing with pending inputs.
* [async] Fix docs
* [async] Lint
* [async] Fix bug: Type for time
* [async] Set our eventloop as the default eventloop so that we can get it through `asyncio.get_event_loop()`.
* [async] Update test & docs.
* [async] Lint.
* [async] Temporarily print more debug info.
* [async] Use `Poll` as a default option.
* [async] Limit resources.
* new async implementation for Ray
* implement linked list
* bug fix
* update
* support seamless async operations
* update
* update API
* fix tests
* lint
* bug fix
* refactor names
* improve doc
* properly shutdown async_api
* doc
* Change the table on the index page.
* Adjust table size.
* Only keeps `as_future`.
* change how we init connection
* init connection in `ray.worker.connect`
* doc
* fix
* Move initialization code into the module.
* Fix docs & code
* Update pyarrow version.
* lint
* Restore index.rst
* Add known issues.
* Apply suggestions from code review
Co-Authored-By: suquark <suquark@gmail.com>
* rename
* Update async_api.rst
* Update async_api.py
* Update async_api.rst
* Update async_api.py
* Update worker.py
* Update async_api.rst
* fix tests
* lint
* lint
* replace the magic number
## What do these changes do?
JSON Logger now uses cloudpickle to dump the configs as well, which pickles the functions needed for multi-agent replay.
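The motivation can be sketched in plain Python (hypothetical config keys; only stdlib is used here, with the cloudpickle call shown as a comment): multi-agent configs contain callables such as a policy mapping function, which neither JSON nor stdlib pickle can serialize.

```python
import json
import pickle

# Hypothetical multi-agent config: plain values plus a callable,
# as in configs where a policy mapping function is a lambda.
config = {"lr": 1e-3, "policy_mapping_fn": lambda agent_id: "shared_policy"}

# json cannot serialize the function at all:
try:
    json.dumps(config)
    json_ok = True
except TypeError:
    json_ok = False

# stdlib pickle serializes functions by reference (qualified name),
# so a lambda fails too:
try:
    pickle.dumps(config)
    pickle_ok = True
except Exception:
    pickle_ok = False

# cloudpickle serializes functions by value, so (if installed) this works:
#   import cloudpickle
#   with open("params.pkl", "wb") as f:
#       cloudpickle.dump(config, f)
```

Both `json_ok` and `pickle_ok` end up `False`, which is why dumping the full config requires cloudpickle.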
It is possible that `test_free_objects_multi_node` fails sometimes. If we run this test 20 times, we may find at least one failure.
The cause is that the test is based on function tasks. One raylet may create more than one worker to execute the tasks, so flush operations may be spread across several workers and fail to clean up all the objects held by the plasma client.
In this PR, I change the function tasks to actor tasks, which guarantees all the tasks are executed in a single worker of a raylet.
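The scheduling difference behind the flakiness can be illustrated with a toy model (this is not Ray's real scheduler; `Worker`, `run_task`, and `flush` are hypothetical names): stateless function tasks may be spread over several workers, so a flush reaching only one worker leaves objects behind, while every call on one actor runs in the same worker.

```python
import itertools

class Worker:
    """Toy stand-in for a raylet worker holding plasma objects."""
    def __init__(self, wid):
        self.wid = wid
        self.held_objects = []

    def run_task(self):
        self.held_objects.append(object())

    def flush(self):
        self.held_objects.clear()

# Function tasks: the raylet is free to dispatch to any idle worker.
workers = [Worker(0), Worker(1)]
rr = itertools.cycle(workers)
for _ in range(4):
    next(rr).run_task()

# A flush issued as another function task only reaches one worker,
# so the other worker still holds objects -- the flaky case.
workers[0].flush()
leftover = sum(len(w.held_objects) for w in workers)

# Actor tasks: every call is pinned to the actor's single worker,
# so a flush there is guaranteed to see all the objects.
actor_worker = Worker(2)
for _ in range(4):
    actor_worker.run_task()
actor_worker.flush()
```

Here `leftover` is nonzero for the function-task case, while the actor's worker holds nothing after its flush.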
Auto-wrap multi-agent dict and tuple spaces by keeping a policy -> preprocessor map in the sampler
add some Q-learning debug stats
report min, max of custom metrics
better errors
ray.wait depends on callbacks from the GCS to decide when an object has appeared in the cluster. The raylet crashes if a callback is received for a wait request that has already completed, but this can actually happen, depending on the order of calls. More precisely:
1. Objects A and B are put in the cluster.
2. Client calls ray.wait([A, B], num_returns=1).
3. Client subscribes to locations for A and B. Locations are cached for both, so callbacks are posted for each.
4. Callback for A fires. The wait completes and the request is removed.
5. Callback for B fires. The wait request no longer exists, and the raylet crashes.
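The five steps above, and the fix implied by them, can be sketched as a small Python model (hypothetical names, not the raylet's actual C++ code): a late location callback for an already-removed wait request should be a no-op rather than a crash.

```python
class WaitRequestManager:
    """Toy model of the raylet's bookkeeping for ray.wait requests."""

    def __init__(self):
        self.pending = {}  # request_id -> wait-request state

    def add_wait(self, request_id, object_ids, num_returns):
        self.pending[request_id] = {
            "remaining": set(object_ids),
            "num_returns": num_returns,
            "ready": set(),
        }

    def on_location_callback(self, request_id, object_id):
        req = self.pending.get(request_id)
        if req is None:
            # Request already completed (step 5 above): ignore the
            # late callback instead of crashing on a missing entry.
            return
        req["ready"].add(object_id)
        if len(req["ready"]) >= req["num_returns"]:
            # Enough objects are ready: the wait completes and
            # the request is removed (step 4 above).
            del self.pending[request_id]

mgr = WaitRequestManager()
mgr.add_wait("req1", ["A", "B"], num_returns=1)   # steps 1-3
mgr.on_location_callback("req1", "A")             # step 4: wait completes
mgr.on_location_callback("req1", "B")             # step 5: safely ignored
```

The second callback finds no matching request and simply returns, which is the behavior the crash scenario calls for.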