mika
64c95aea85
[rllib] Update README.md for qmix ( #4101 )
...
## What do these changes do?
Fixed PyMARL repository path.
## Related issue number
N/A
2019-02-20 10:21:08 -08:00
Robert Nishihara
e7651b1117
Fix excessive buffering of worker stdout/stderr. ( #4094 )
...
* Start workers with 'python -u' to prevent buffering of prints.
* Set sys.stdout and sys.stderr.
* Add comment.
2019-02-19 20:20:47 -08:00
Eric Liang
e9ee38ace2
More compact format for worker logs ( #4092 )
2019-02-19 19:53:43 -08:00
Robert Nishihara
c92a867c8b
Fix log monitor CPU utilization. ( #4091 )
2019-02-19 12:19:21 -08:00
Wang Qing
794a093249
Add runtime_context to get some runtime fields in worker ( #4065 )
2019-02-19 15:57:30 +08:00
Wang Qing
7574757391
Fix crash for Java task's task.argument() in state. ( #4063 )
2019-02-19 12:46:07 +08:00
Philipp Moritz
cfc7e2c5a9
Fix modin test ( #4069 )
2019-02-18 12:17:36 -08:00
Eric Liang
6e46d75554
[tune] Remove slow gzip of checkpoints; ignore jupyter stop errors ( #4076 )
...
* fix gzip
* ignore jupyter
2019-02-18 01:30:13 -08:00
Eric Liang
f8bef004da
[rllib] Improve error message for bad envs, add remote env docs ( #4044 )
...
* commit
* fix up rew
2019-02-18 01:28:19 -08:00
Philipp Moritz
f51969964d
Fix linting on master ( #4077 )
2019-02-17 13:55:40 -08:00
Megan Kawakami
346885068c
[rllib] add torch pg ( #3857 )
...
* add torch pg
* add torch imports
* added torch pg
* working torch pg implementation
* add pg pytorch
* Update a3c.py
* Update a3c.py
* Update torch_policy_graph.py
* Update torch_policy_graph.py
2019-02-16 19:54:14 -08:00
Zekun Shi
a708ab66f5
Add simplex action space and dirichlet action distribution ( #4070 )
...
* add simplex action space and dirichlet action distribution
* Update and rename spaces.py to extra_spaces.py
* Update __init__.py
* Update catalog.py
* Fix python 2
* Update extra_spaces.py
* change Simplex.contains() to return False
2019-02-16 12:44:59 -08:00
Kristian Hartikainen
0cc5c88075
[tune] Add number of trials to the trial runner logger ( #4068 )
2019-02-16 01:12:59 -08:00
Yu Kobayashi
d2d66c576e
Support non ascii characters in the source code ( #4047 )
2019-02-16 11:45:44 +08:00
Hao Chen
de17443dc2
Propagate backend error to worker ( #4039 )
2019-02-16 11:39:15 +08:00
Robert Nishihara
2d07df7f3f
Replace '__main__' with "__main__". ( #4055 )
2019-02-15 13:32:43 -08:00
Robert Nishihara
5f71751891
API cleanups. Remove worker argument. Remove some deprecated arguments. ( #4025 )
...
* Remove worker argument from API methods.
* Remove deprecated arguments and deprecate redirect_output and redirect_worker_output.
* Fix
2019-02-15 10:49:16 -08:00
Hao Chen
042ad84573
Simplify Cython ID types and fix bug of ActorCheckpointID ( #4045 )
2019-02-15 20:15:16 +08:00
Richard Liaw
bb7c4ce9c4
[tune] Improve error message when Ray crashes ( #3795 )
2019-02-15 01:04:17 -08:00
Richard Liaw
7cf62a10cd
[tune] Fix TF checkpointing example ( #4043 )
...
Closes #3912, closes #3963.
2019-02-15 00:30:27 -08:00
Eric Liang
0c0bd4d41c
[rllib] Use model.value_function() in MARWIL ( #4036 )
...
* fix marwil
* add ph
* fix
2019-02-14 19:35:21 -08:00
Philipp Moritz
077ffd99bf
Bump version from 0.6.3 to 0.7.0.dev0 in docs and .yaml ( #4042 )
2019-02-14 12:08:48 -08:00
Si-Yuan
2de31eb489
minor fix ( #4040 )
2019-02-13 17:22:45 -08:00
Eric Liang
2dccf383dd
[rllib] Basic infrastructure for off-policy estimation (IS, WIS) ( #3941 )
2019-02-13 16:25:05 -08:00
Kristian Hartikainen
729d0b2825
[autoscaler] docker run options ( #3921 )
...
Adds support for docker options, allowing for use of nvidia-docker.
Closes #2657.
2019-02-13 12:26:28 -08:00
bjg2
0e37ac6d1d
[wingman -> rllib] Remote and entangled environments ( #3968 )
...
* added all our environment changes
* fixed merge request comments and remote env
* fixed remote check
* moved remote_worker_envs to correct config section
* lint
* auto wrap impl
* fix
* fixed the tests
2019-02-13 10:08:26 -08:00
Hao Chen
f31a79f3f7
Implement actor checkpointing ( #3839 )
...
* Implement Actor checkpointing
* docs
* fix
* fix
* fix
* move restore-from-checkpoint to HandleActorStateTransition
* Revert "move restore-from-checkpoint to HandleActorStateTransition"
This reverts commit 9aa4447c1e3e321f42a1d895d72f17098b72de12.
* resubmit waiting tasks when actor frontier restored
* add doc about num_actor_checkpoints_to_keep=1
* add num_actor_checkpoints_to_keep to Cython
* add checkpoint_expired api
* check if actor class is abstract
* change checkpoint_ids to long string
* implement java
* Refactor to delay actor creation publish until checkpoint is resumed
* debug, lint
* Erase from checkpoints to restore if task fails
* fix lint
* update comments
* avoid duplicated actor notification log
* fix unintended change
* add actor_id to checkpoint_expired
* small java updates
* make checkpoint info per actor
* lint
* Remove logging
* Remove old actor checkpointing Python code, move new checkpointing code to FunctionActionManager
* Replace old actor checkpointing tests
* Fix test and lint
* address comments
* consolidate kill_actor
* Remove __ray_checkpoint__
* fix non-ascii char
* Loosen test checks
* fix java
* fix sphinx-build
2019-02-13 19:39:02 +08:00
Andrew Tan
57dcd3033e
[tune] Trial reporter fix ( #3951 )
...
Fixes #3949.
2019-02-13 01:03:54 -08:00
William Ma
e1a479b137
Add teardown_module to test_queue.py ( #4012 )
2019-02-12 22:43:09 -08:00
Si-Yuan
21472b890a
Integrate "tempfile_service" into "ray.node.Node" ( #3953 )
2019-02-12 17:34:04 -08:00
Adi Zimmerman
dac1969647
[tune] Add Nevergrad to Tune ( #3985 )
2019-02-12 11:00:04 -08:00
Wang Qing
c523bc04ad
Enable redis password in Java worker ( #3943 )
...
* Support Java redis password
* Fix
* Refine
* Fix lint.
2019-02-12 13:11:25 +08:00
Adi Zimmerman
9797028a91
[tune] Add scikit-optimize to Tune ( #3924 )
2019-02-11 17:06:02 -08:00
Eric Liang
8df772867c
[rllib] rename compute_apply to learn_on_batch
2019-02-11 15:22:15 -08:00
Eric Liang
c4182463f6
[rllib] Add helper to iterate over envs in a vectorized environment ( #4001 )
...
* add foreach env func
* fix
* add test
2019-02-11 10:40:47 -08:00
Ion
3c32343c63
Ray signal ( #3624 )
2019-02-11 10:14:48 -08:00
Zhijun Fu
7097ba393b
protect raylet against bad messages ( #4003 )
...
* protect raylet against bad messages
* address comments
* linting and regression test
2019-02-12 00:39:38 +08:00
Philipp Moritz
ab809bd927
update ray version to 0.7.0dev ( #3995 )
2019-02-10 19:56:42 -08:00
Eric Liang
8e9f2c923f
[autoscaler] Use RLock in addition to FileLock
2019-02-10 19:16:43 -08:00
Yuhong Guo
5fb1efd60d
Fix CI test failures ( #4007 )
2019-02-11 11:01:14 +08:00
bjg2
e703b9f49d
[wingman -> rllib] Improved stats changes in AsyncSamplesOptimizer ( #3966 )
...
* added stats changes to optimizer
* changes timers
* fix python 2 compat
* improved optimizer throughput stats
* Update async_samples_optimizer.py
* fix python2 compat
2019-02-10 01:25:22 -08:00
Eric Liang
29322c7389
[rllib] Replay buffer for IMPALA should default to 0 slots. ( #3971 )
...
* disable replay
* make lq configurable
* leak test
* Update run_multi_node_tests.sh
2019-02-08 10:03:11 -08:00
Robert Nishihara
6a32b410bb
Update versions from 0.6.2 -> 0.6.3 in the documentation. ( #3981 )
2019-02-07 20:57:37 -08:00
Robert Nishihara
ef527f84ab
Stream logs to driver by default. ( #3892 )
...
* Stream logs to driver by default.
* Fix from rebase
* Redirect raylet output independently of worker output.
* Fix.
* Create redis client with services.create_redis_client.
* Suppress Redis connection error at exit.
* Remove thread_safe_client from redis.
* Shutdown driver threads in ray.shutdown().
* Add warning for too many log messages.
* Only stop threads if worker is connected.
* Only stop threads if they exist.
* Remove unnecessary try/excepts.
* Fix
* Only add new logging handler once.
* Increase timeout.
* Fix tempfile test.
* Fix logging in cluster_utils.
* Revert "Increase timeout."
This reverts commit b3846b89040bcd8e583b2e18cb513cb040e71d95.
* Retry longer when connecting to plasma store from node manager and object manager.
* Close pubsub channels to avoid leaking file descriptors.
* Limit log monitor open files to 200.
* Increase plasma connect retries.
* Add comment.
2019-02-07 19:53:50 -08:00
Philipp Moritz
0aa74fb1fd
Update cloudpickle to 0.8.0.dev0 ( #3964 )
2019-02-07 15:24:06 -08:00
Eric Liang
ae4bc7d6e8
[revert] [rllib] Add copy() in async samples optimizer
2019-02-07 14:14:39 -08:00
markgoodhead
5ce670cb36
[tune] Add Initial Parameter Suggestion for HyperOpt ( #3944 )
...
Allows users of the HyperOptSearch suggestion algorithm to specify initial experiment values to run (typically known-good baseline parameters within the specified domain).
2019-02-07 10:57:51 -08:00
Richard Liaw
5db1afef07
[tune] Support Custom Resources ( #2979 )
...
Support arbitrary resource declarations in Tune.
Fixes https://github.com/ray-project/ray/issues/2875
2019-02-07 00:29:19 -08:00
Stephanie Wang
d2b6db3db1
Bump version from 0.6.2 to 0.6.3 ( #3972 )
2019-02-06 19:11:16 -08:00
Eric Liang
04fc145a44
[autoscaler] Autoscaler hangs forever on non-zero exit code command ( #3969 )
2019-02-06 17:25:24 -08:00