* add noisy network
* distributional q-learning in dev
* add distributional q-learning
* validated rainbow module
* add some comments
* supply some comments
* remove redundant argument to pass CI test
* async replay optimizer does NOT need annealing beta
* ignore rainbow specific arguments for DDPG and Apex
* formatted by yapf
* Update dqn_policy_graph.py
* Update dqn_policy_graph.py
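A minimal sketch of what turning on these Rainbow components might look like in a DQN config; the key names (`noisy`, `num_atoms`, `v_min`, `v_max`) are inferred from the commits above and are assumptions, not a confirmed API:

```python
# Hypothetical DQN config fragment illustrating the Rainbow additions above.
# The exact key names are assumptions and may differ from the real config.
rainbow_overrides = {
    "noisy": True,     # noisy-network exploration instead of epsilon-greedy
    "num_atoms": 51,   # distributional Q-learning (C51-style support size)
    "v_min": -10.0,    # lower bound of the value distribution support
    "v_max": 10.0,     # upper bound of the value distribution support
}
```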
* added ars
* functioning ars with regression test
* added regression tests for ARS
* fixed default config for ARS
* ARS code runs, now time to test
* ARS working and tested, changed std deviation of meanstd filter to initialize to 1
* pep8 fixes
* removed unused linear model
* address comments
* more fixing comments
* post yapf
* fixed support failure
* Update LICENSE
* Update policies.py
* Update test_supported_spaces.py
* Update policies.py
* Update LICENSE
* Update test_supported_spaces.py
* Update policies.py
* Update policies.py
* Update filter.py
## What do these changes do?
#2362 left a bug where it assumed that the driver task ID was nil. This fixes the bug by checking the `SchedulingQueue` for any driver task IDs instead.
* [WIP] Support different backend log lib
* Refine code, unify level, address comment
* Address comment and change formatter
* Fix linux building failure.
* Fix lint
* Remove log4cplus.
* Add log init to raylet main and add test to travis.
* Address comment and refine.
* Update logging_test.cc
* + Compatibility fix under py2 on ray.tune
* + Revert changes on master branch
* + Use default JsonEncoder in ray.tune.logger
* + Add UT for infinity support
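For context on the infinity fix: Python's standard `json` module already serializes IEEE infinities with its default encoder, which is presumably why falling back to the default `JSONEncoder` in `ray.tune.logger` is enough. This is plain stdlib behavior, not Tune code:

```python
import json

# The default encoder emits the non-strict JSON token "Infinity" rather than
# raising, so results containing infinite metrics still serialize.
print(json.dumps({"episode_reward_max": float("inf")}))
# {"episode_reward_max": Infinity}
```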
A bunch of minor rllib fixes:
* pull in latest baselines atari wrapper changes (and use deepmind wrapper by default)
* move reward clipping to policy evaluator
* add A2C variant of A3C
* reduce vision network fc layer size to 256 units
* switch to 84x84 images
* doc tweaks
* print timesteps in tune status
This PR makes it so that when Ray is started via `ray.init()` (as opposed to via `ray start`), the Redis servers are started in "protected mode", which means that clients can only connect via localhost.
In practice, we actually connect Redis clients by passing in the node IP address (not localhost), so I need to create a Redis config file on the fly that allows both localhost and the node's actual IP address (it would have been nice to do this from the Python Redis client, but I couldn't find a way).
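A minimal sketch of the config-file approach, assuming a hypothetical helper name; the actual file contents and option handling in Ray may differ:

```python
import tempfile

def build_redis_config(node_ip_address):
    # Hypothetical helper: write a throwaway Redis config that binds to both
    # loopback and the node's own IP so clients on this machine can connect
    # either way, then return the path to hand to `redis-server`.
    conf = "bind 127.0.0.1 {}\n".format(node_ip_address)
    f = tempfile.NamedTemporaryFile(mode="w", suffix=".conf", delete=False)
    f.write(conf)
    f.close()
    return f.name
```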
This adds some experimental (undocumented) support for launching Ray on existing nodes. You have to provide the head IP and the list of worker IPs.
There are also a couple of additional utils for rsyncing files and port forwarding.
This PR introduces the following changes:
* Ray Tune -> Tune
* [breaking] Creation of `schedulers/`, moving PBT and HyperBand into a submodule
* [breaking] Search Algorithms must now take in experiment configurations via `add_configurations` rather than through initialization
* Support `"run": (function | class | str)` with automatic registration of the trainable (see the sketch after this list)
* Documentation Changes
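A minimal usage sketch of passing a function directly as `"run"`; the trainable body and the reported metrics here are made up for illustration:

```python
import ray
from ray import tune

def my_trainable(config, reporter):
    # Illustrative trainable: report a fake metric for a few steps.
    for step in range(10):
        reporter(timesteps_total=step, mean_loss=1.0 / (step + config["lr"]))

ray.init()
tune.run_experiments({
    "my_experiment": {
        "run": my_trainable,  # a function (or class) no longer needs manual registration
        "config": {"lr": 0.01},
    },
})
```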
Currently, the log directory in Java is a relative path. This PR changes it to `/tmp/raylogs` (using the same file name format as Python, e.g., `local_scheduler-2018-51-17_17-8-6-05164.err`). It also cleans up some related code.
The goal of this PR is to allow custom policies to perform model-based rollouts. In the multi-agent setting, this requires access not only to the policies of other agents, but also to their current observations.
You might also want to return the model-based trajectories as part of the rollout for efficiency.
* `compute_actions()` now takes a new keyword arg `episodes` (see the sketch after this list)
* pull out the internal episode class into a top-level file
* add a function to return extra trajectories from an episode that will be appended to the sample batch
* documentation
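A rough sketch of how a custom policy might use the new `episodes` argument; the class shape and the episode accessor names are assumptions for illustration, not the exact RLlib API:

```python
class ModelBasedPolicy(object):
    """Illustrative custom policy; in practice, inherit from the real
    RLlib policy graph class."""

    def compute_actions(self, obs_batch, state_batches, episodes=None):
        actions = []
        for i, obs in enumerate(obs_batch):
            episode = episodes[i] if episodes else None
            if episode is not None:
                # Assumed accessors: look up another agent's policy and its
                # latest observation to drive a model-based rollout.
                other_obs = episode.last_observation_for("other_agent")
                other_policy = episode.policy_for("other_agent")
            actions.append(0)  # placeholder action
        return actions, [], {}  # actions, RNN state outs, extra info
```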
ray exec CLUSTER CMD [--screen] [--start] [--stop]
ray attach CLUSTER [--start]
Example:
ray exec sgd.yaml 'source activate tensorflow_p27 && cd ~/ray/python/ray/rllib && ./train.py --run=PPO --env=CartPole-v0' --screen --start --stop
This will, in one command, create a cluster and run the given command on it in a screen session. The screen can later be attached to via `ray attach`. After the command finishes, the cluster workers will be terminated and the head node stopped.
* Cache a Task's object dependencies
* Cache the parent task IDs for lineage cache entries
* Cache the parent task IDs in lineage cache entries
* revert
* Fix test
* remove unused line
* Fix test
* Support building the Java and Python versions at the same time.
* Remove duplicated definition.
* Refine the building process of local_scheduler
* Refine
* Add comment for languages
* Modify instructions and add Python and Java builds to CI.
* change according to comment
* Move all ObjectManager members to bottom of class def
* Better Pull requests
- suppress duplicate Pulls
- retry the Pull at the next client after a timeout
- cancel a Pull if the object no longer appears on any clients
* increase object manager Pull timeout
* Make the component failure test harder.
* note
* Notify SubscribeObjectLocations caller of empty list
* Address melih's comments
* Fix wait...
* Make component failure test easier for legacy ray
* lint