Commit graph

1765 commits

Author SHA1 Message Date
Eric Liang
ae4e1dd396
[tune] [rllib] Allow checkpointing to object store instead of local disk (#1212)
* wip

* use normal pickle

* fix checkpoint test

* comment

* Comment

* fix test

* fix lint

* fix py 3.5

* Update agent.py

* fix lint
2017-11-19 00:36:43 -08:00
Peter Schafhalter
d986294c2b Replace UT strings in local scheduler (#1213)
* Convert to string using std::string

* Fix linting issue

* Fix linting

* Construct db_connect_args using vector

* Use vector size() instead of num_args

* Hopefully fix linting now
2017-11-17 16:14:46 -08:00
Robert Nishihara
94423c0542 Upgrade Arrow with fixes to Plasma eviction policy. (#1228)
* Upgrade Arrow with fixes to Plasma eviction policy.

* Upgrade arrow to have -f flag for plasma store.
2017-11-17 14:41:22 -08:00
Peter Schafhalter
4cbc2b1978 Clean up UT datastructures in Python extension (#1227) 2017-11-17 01:07:12 -08:00
Robert Nishihara
9a2e37a63e Don't record event log on driver. (#1217) 2017-11-16 23:17:59 -08:00
Robert Nishihara
0eae917766 [rllib] Clean up evolution strategies example. (#1225)
* Remove ES observation statistics.

* Consolidate policy classes.

* Remove random stream.

* Move rollout function out of policy.

* Consolidate policy initialization.

* Replace act implementation with sess.run.

* Remove tf_utils.

* Remove variable scope.

* Remove unused imports.

* Use regular TF session.

* Use MeanStdFilter.

* Minor.

* Clarify naming.

* Update documentation.

* eps -> episodes

* Report noiseless evaluation runs.

* Clean up naming.

* Update documentation.

* Fix some bugs.

* Make it run on atari.

* Don't add action noise during evaluation runs.

* Add ES to checkpoint/restore test.

* Small cleanups and remove redundant calls to get_weights.

* Remove outdated comment.
2017-11-16 21:58:30 -08:00
Richard Liaw
eadb998643
[tune] Make HyperBand Usable (#1215) 2017-11-16 10:31:42 -08:00
Richard Liaw
3a0206a1f4
[tune] Parallel Coordinate Visualization Notebook (#1218) 2017-11-16 00:42:28 -08:00
Stephanie Wang
c70430f322 Fix bugs in plasma manager transfer (#1188)
* Plasma client test for plasma abort

* Use ray-project/arrow:abort-objects branch

* Set plasma manager connection cursor to -1 when not in use

* Handle transfer errors between plasma managers, abort unsealed objects

* Add TODO for local scheduler exiting on plasma manager death

* Revert "Plasma client test for plasma abort"

This reverts commit e00fbd58dc4a632f58383549b19fb9057b305a14.

* Upgrade arrow to version with PlasmaClient::Abort

* Fix plasma manager test

* Fix plasma test

* Temporarily use arrow fork for testing

* fix and set arrow commit

* Fix plasma test

* Fix plasma manager test and make write_object_chunk consistent with read_object_chunk

* style

* upgrade arrow
2017-11-15 22:32:38 -08:00
Peter Schafhalter
9a7b15447b Replace UT string in redis tests (#1211)
* Replace UT arg formatting with vsnprintf

* Fix bug with va_list usage
2017-11-15 22:21:56 -08:00
Eric Liang
009f59defc
[tune] [rllib] Centralized driver logging (#1208)
* logger v2

* add logger

* lint

* todo

* viskit works now

* doc

* remove none check

* fix timeout

* Missing Numpy for Sigmoid data
2017-11-15 22:11:47 -08:00
Melih Elibol
e066bcf633 Synchronous parameter server example. (#1220)
* Synchronous parameter server example.

* Added sync parameter server example to documentation index.

* Consolidate documentation and minor simplifications.

* Fix linting.
2017-11-15 17:49:31 -08:00
Peter Schafhalter
428858c1ff Convert UT string to std::string (#1210) 2017-11-12 21:00:36 -08:00
Richard Liaw
71f8cd2403
[tune] Fixing up Hyperband (#1207)
* Fixing up Hyperband

* nit

* cleanup

* Timing test Added

* added_exception_back

* fixup_tests

* reverse placement

* fixes_and_tests

* fix

* fix

* fixlint

* cleanup_timing

* lint

* Update hyperband.py
2017-11-12 12:05:32 -08:00
Eric Liang
7c38f964b7 [tune] Add command line support for choosing early stopping schedulers (#1209)
* command line support

* add checkpoint freq

* fix other flags

* fix

* docs

* doc
2017-11-12 12:05:18 -08:00
Richard Liaw
afdc87323f
[rllib] PyTorch Models for A3C (#1187)
* fixing policy

* Compute Action is singular, fixed weird issue with arrays

* remove vestige

* extraneous ipdb

* Can Drop in Pytorch Model

* lint

* introducing models

* fix base policy

* Missed this from last time

* lint

* removedolds

* getting vision working

* LINT

* trying to fix test dependencies

* requiremnets

* try

* tryconda

* yes

* shutup

* flake_passes

* changes

* removing weight initializer for lstm for now

* unused

* adam

* clip

* zero

* properscaling

* weight

* try

* fix up pytorch visionnet

* bias correction

* fix model

* same visionnet

* matching_bad_things

* test

* try locking

* fixing_linear

* naming

* lint

* FORJENKINS

* clouds

* lint

* Lint + removed dependencies

* removed dependencies

* format
2017-11-12 00:20:33 -08:00
Peter Schafhalter
9a6a056609 Convert UT datastructures in tests (#1203)
* bind_ipc_sock_retry returns std::string

* snprintf -> std::snprintf

* Fix formatting

* Use stringstream instead of snprintf

* Fix typo
2017-11-11 16:55:05 -08:00
Philipp Moritz
e798a652bc Change TaskSpec to allow multiple object IDs per argument. (#1204)
* Implement object ID bags

* linting

* fix tests

* fix linting

* fix comments
2017-11-10 16:33:34 -08:00
Stephanie Wang
07f0532b9b Local scheduler filters out dead clients during reconstruction (#1182)
* Object table lookup returns vector of DBClientID instead of address strings

* Add node IP address to DBClient notification

* DB client cache stores entire DB client, convert addresses to std::string

* get cached db client returns the client

* Expose a call to initialize the redis cache

* Local scheduler filters out dead clients during reconstruction

* Remove node ip address from dbclient, use aux_address for plasma managers

* Get entire db client entry when not found in cache

* Fix common tests

* Fix address in tests

* Push error to driver if driver task did the put

* Address Robert's comments and cleanup

* Remove unused Redis command

* Fix db test
2017-11-10 11:29:24 -08:00
Christian Barra
d36595cb92 Add docs for contributors. (#1191)
* WIP: add docs for contributors.

* Remove changelog part.

* Simplify issue template.

* Simplify pull request template

* Simplify contributing doc.
2017-11-10 00:40:19 -08:00
Daniel Suo
4f0da6f81c Add basic functionality for Cython functions and actors (#1193)
* Add basic functionality for Cython functions and actors

* Fix up per @pcmoritz comments

* Fixes per @richardliaw comments

* Fixes per @robertnishihara comments

* Forgot double quotes when updating masked_log

* Remove import typing for Python 2 compatibility
2017-11-09 17:49:06 -08:00
Robert Nishihara
11f8f8bd8c Document --num-workers better. (#1201) 2017-11-09 17:02:18 -08:00
Richard Liaw
6197b260b8 Fix Jenkins issue introduced by Variant Generator (#1194)
* try fix

* shorten

* added a flag

* finish

* Fix linting.
2017-11-09 00:56:20 -08:00
Robert Nishihara
1bf276cc08 Basic parameter server example. (#1198)
* Basic parameter server example.

* Consolidate files.

* Whitespace.

* Add documentation.
2017-11-08 23:40:51 -08:00
Robert Nishihara
d3c082d325 More checking in redis.cc. (#1057) 2017-11-08 23:25:19 -08:00
Robert Nishihara
3a37d1cf7d Pin cloudpickle to 0.4.1. (#1200) 2017-11-08 21:14:09 -08:00
Robert Nishihara
1c6b30b5e2 Move all config constants into single file. (#1192)
* Initial pass at factoring out C++ configuration into a single file.

* Expose config through Python.

* Forward declarations.

* Fixes with Python extensions

* Remove old code.

* Consistent naming for constants.

* Fixes

* Fix linting.

* More linting.

* Whitespace

* rename config -> _config.

* Move config inside a class.

* update naming convention

* Fix linting.

* More linting

* More linting.

* Add in some more constants.

* Fix linting
2017-11-08 11:10:38 -08:00
Peter Schafhalter
a8032b9ca1 Convert connections from UT_array to std::vector (#1190) 2017-11-07 20:59:41 -08:00
Eric Liang
52888e4c6f [tune] Improve the tune Python API and variant generation (#1154)
* new variant gen

* wip

* Sat Oct 21 18:21:34 PDT 2017

* update

* comment

* fix

* update

* update readme

* fix

* Update README.rst

* Update README.rst

* fix repeat

* update

* note on restore
2017-11-06 23:41:17 -08:00
Richard Liaw
6222ec3bd7
[tune] hyperband (#1156)
* trial scheduler interface

* remove

* wip median stopping

* remove

* median stopping rule

* update

* docs

* update

* Revrt

* update

* hyperband untested

* small changes before moving on

* added endpoints

* good changes

* init tests

* smore tests

* unfinished tests

* testing

* testing code

* morbugs

* fixes

* end

* tests and typo

* nit

* try this

* tests

* testing

* lint

* lint

* lint

* comments and docs

* almost screwed up

* lint
2017-11-06 22:30:25 -08:00
Peter Schafhalter
7215f7d228 Remove UT String from logging (#1184)
* Removed unnecessary utarray include

* Removed ut_string from logging

* Fix formatting
2017-11-05 14:05:20 -08:00
Eric Liang
d06beacd84 [tune] Implement median stopping rule (#1170)
* trial scheduler interface

* remove

* wip median stopping

* remove

* median stopping rule

* update

* docs

* update

* Revrt

* update

* comments

* fix tesT
2017-11-03 11:25:02 -07:00
Philipp Moritz
fdf069bd1d update version to 0.2.2 (#1178) 2017-11-01 20:41:24 -07:00
Robert Nishihara
3317d38278 Replace hostnames with numerical IP addresses in redis address. (#1177)
* Replace hostnames with numerical IP addresses in redis address.

* Also do conversion for node_ip_address. Add test.

* Simplifications.
2017-11-01 17:13:22 -07:00
Eric Liang
202e7bf19a fix (#1174) 2017-11-01 13:45:39 -07:00
Richard Liaw
dc66a2d7d5
[rllib] A3C Refactoring (#1166)
* fixing policy

* Compute Action is singular, fixed weird issue with arrays

* remove vestige

* extraneous ipdb

* Can Drop in Pytorch Model

* lint

* naming

* finish comments
2017-10-29 11:12:17 -07:00
Eric Liang
4cace0976d [rllib] Fix DQN inefficiency, and cleanup for different modes of parallelism (#1151)
* initial checkin

* flake

* dqn

* docs

* add tuned pong

* remove

* upd

* add both

* better gamma

* update

* Last nit
2017-10-29 10:52:30 -07:00
Richard Liaw
304c3cade4
[tune] 10 second timeout for stopping (#1169)
* 10 second timeout for stopping

* prints for travis

* lint

* try better returning mechanism

* lint
2017-10-29 00:49:29 -07:00
Robert Nishihara
6852e8839e Expose custom serializers through the API. (#1147)
* Expose custom serializers through the API.

* minor renaming

* Add test.

* Remove comment.

* Clean up assertions.
2017-10-29 00:08:55 -07:00
Eric Liang
3b157ab933 [tune] Allow resources to not all be assigned to the driver (#1150)
* dgpu

* update

* update

* update

* also support cmdline

* limit

* Update README.rst

* documentation

* typo

* small coverage for driver_gpu_limit

* lint

* fix lint
2017-10-28 22:16:05 -07:00
Robert Nishihara
f59867850e Upgrade to cloudpickle 0.4.1. (#1164)
* Upgrade to cloudpickle 0.4.1.

* Catch more general exceptions thrown by cloudpickle.
2017-10-28 01:35:35 -07:00
Eric Liang
2b6c7af8ad [tune] Trial scheduler interface (#1160)
* trial scheduler interface

* remove

* update
2017-10-27 13:29:15 -07:00
Richard Liaw
797f4fcbf3 Fixing Lint after flake upgrade (#1162)
* Fixing Lint after flake upgrade

* more lint fixes
2017-10-26 21:02:07 -05:00
Abishek Bhat
6da7761d5d Fix overlooked typo. (#1158)
Without this the example script would crash with an UnboundLocalError.
2017-10-25 07:40:52 -07:00
Eric Liang
cd9dc398ff [rllib] Support discrete observation spaces such as FrozenLake-v0 (#1140)
* add

* remove transform_shape

* fix test

* fix
2017-10-23 23:16:52 -07:00
Richard Liaw
0c9817fa76 [tune] Tune Pausing (#1136)
* fix yaml bug

* add ext agent

* gpus

* update

* tuning

* docs

* Sun Oct 15 21:09:25 PDT 2017

* lint

* update

* Sun Oct 15 22:39:55 PDT 2017

* Sun Oct 15 22:40:17 PDT 2017

* Sun Oct 15 22:43:06 PDT 2017

* Sun Oct 15 22:46:06 PDT 2017

* Sun Oct 15 22:46:21 PDT 2017

* Sun Oct 15 22:48:11 PDT 2017

* Sun Oct 15 22:48:44 PDT 2017

* Sun Oct 15 22:49:23 PDT 2017

* Sun Oct 15 22:50:21 PDT 2017

* Sun Oct 15 22:53:00 PDT 2017

* Sun Oct 15 22:53:34 PDT 2017

* Sun Oct 15 22:54:33 PDT 2017

* Sun Oct 15 22:54:50 PDT 2017

* Sun Oct 15 22:55:20 PDT 2017

* Sun Oct 15 22:56:56 PDT 2017

* Sun Oct 15 22:59:03 PDT 2017

* fix

* Update tune_mnist_ray.py

* remove script trial

* fix

* reorder

* fix ex

* py2 support

* upd

* comments

* comments

* cleanup readme

* fix trial

* annotate

* Update rllib.rst

* init pausing

* Docs, Lint

* fix danglings and restore endpoint moved to trialrunner

* renaming

* nit

* start always starts from checkpoint

* smalls

* nits

* lint

* last change
2017-10-22 23:04:15 -07:00
Eric Liang
81ca27dc08 [rllib] [minor] Rename agent_id to experiment_tag (#1143)
* tagstr

* doc

* rename

* fix test
2017-10-22 18:44:18 -07:00
Robert Nishihara
97c6369b49 Update arrow to include custom serializer for pytorch and register default serialization handlers. (#1152)
* Update arrow to include custom serializer for pytorch.

* Call pyarrow function for registering default custom serialization handlers.

* Change class ID used in serialization context for object IDs.
2017-10-21 21:24:10 -07:00
Philipp Moritz
684e62e784 upgrade arrow to include numpy bool fix (#1148) 2017-10-20 17:25:15 -07:00
Peter Schafhalter
ad4cbd4016 Updated outstanding_callbacks to unordered_map (#1108)
* Updated outstanding_callbacks to unordered_map

* Fix bug in destroy_outstanding_callbacks and comments
2017-10-20 10:06:22 -07:00