Commit graph

1099 commits

Author SHA1 Message Date
Peter Schafhalter
6e9657e696 Replaced utstring with std::string (#1009) 2017-09-24 22:42:17 -07:00
Wapaul1
c26c7553bc Resnet Example Uses tf.Datasets now (#960)
Change Resnet example to use tf.Datasets instead of queues.
2017-09-20 14:14:04 -07:00
Eric Liang
5c70faf76b Update common.py (#996) 2017-09-19 10:10:56 -07:00
gycn
a432285e77 Disable parallelization for Actors and ray.wait for debugging (#961)
Support actors and ray.wait in PYTHON_MODE.
2017-09-17 00:12:50 -07:00
Philipp Moritz
73f40bd844 [rllib] user defined preprocessor (#985)
* add register_preprocessor to ModelCatalog

* add pytest

* make staticmethod a classmethod

* update

* install gym on travis

* fix linting

* fix
2017-09-16 15:53:19 -07:00
Wapaul1
29ac95d87a Web UI Documentation (#983)
* Initial Draft of Documentation

* Cleanup

* Fix line lengths and modify some text.
2017-09-16 15:41:52 -07:00
Eric Liang
98142ef51f fix checkpoint (#988) 2017-09-16 15:29:36 -07:00
Peter Schafhalter
241612709e Data structure updates to plasma manager (#937)
* Implemented local_available_objects as an unordered set

* Implemented fetch_requests as an unordered map

* Fixed bug and changed fetch_requests from pointer to object

* free(PlasmaManagerState *) -> delete PlasmaManagerState *

* removed unnecessary newline

* Make local_available_objects not a pointer.

* Attempt to safely iterate over unordered_map and remove elements.
2017-09-15 20:09:29 -07:00
Philipp Moritz
6601bb5f9e [rllib] Make observation filter optional (#940)
* make observation filter optional

* fix linting
2017-09-14 17:37:19 -07:00
Robert Nishihara
413140df38 Autogenerate catapult files if they are not already present. (#978)
* Autogenerate catapult files if they are not already present.

* Fix bash syntax.
2017-09-14 12:37:33 -07:00
Richard Liaw
d516d9440e Fixing local directory (#977)
* Fixing local directory

Enables the ability to set a custom local directory; code may be messy.

* Create all intermediate parent directories
2017-09-14 10:33:52 -07:00
Philipp Moritz
1eb8c83314 [rllib] Initial RLLib documentation (#969)
* initial documentation for RLLib

* more RL documentation

* fix linting

* fix comments

* update

* fix
2017-09-12 23:38:21 -07:00
ustcfriend
9ec3608eca Fix resnet crash by setting config.gpu_options.allow_growth = True. (#971) 2017-09-12 22:36:06 -07:00
Eric Liang
9f42ef6a4f [rllib] Make sure to always record stats like time elapsed, timesteps (#965)
* always record training stats

* fix

* comments

* revert assert

* nan

* fix
2017-09-12 14:28:16 -07:00
Stephanie Wang
74ac80631b Local scheduler sends a null heartbeat to global scheduler (#962)
* Local scheduler sends a null heartbeat to global scheduler to notify death

* Add whitespace.

* Speed up component failures test

* Free local scheduler state upon plasma manager disconnection
2017-09-12 10:45:21 -07:00
Philipp Moritz
dd4e99b481 Fix ray website (#963)
* update instructions

* update blog

* fix images

* Remove outdated documentation.
2017-09-11 23:11:15 -07:00
Eric Liang
e17412a72b fix free log std param (#964) 2017-09-11 18:52:48 -07:00
Stephanie Wang
99c8b1f38c Actor fault tolerance using object lineage reconstruction (#902)
* Revert Python actor reconstruction

* Actor reconstruction using object lineage

* Add dummy arguments and return values for actor tasks

* Pin dummy outputs for actor tasks

* Skip checkpointing test for now

* TODOs

* minor edits

* Generate dummy object dependencies in Python, not C

* Fix linting.

* Move actor counter and dummy objects inside of the actor handle

* Refactor Worker._process_task, suppress exception propagation for
sequential actor tasks
2017-09-10 19:29:28 -07:00
Eric Liang
d8aa826e63 [webui] Scalability fixes for the task timeline and visualizations (#935)
* fixes

* comments

* fix test

* Update ui.py

* upd

* Fix linting.
2017-09-10 15:47:44 -07:00
Robert Nishihara
f3c1248d98 Clone catapult and generate html files during installation. (#956)
* Clone catapult and generate static html during setup.

* Include UI files in installation.

* Fix directory to clone catapult to and fix linting.

* Use absolute path.

* Make sure we find a sufficiently new version of python2 when building wheels.

* Copy the trace_viewer_full.html file to the local directory if it is not present.

* Make sure wheels fail to build if UI is not included.
2017-09-10 13:41:16 -07:00
Philipp Moritz
546ba23ceb Upgrade to latest arrow to include set serialization speedups (#957)
* update arrow to pull in the set serialization speedups

* remove _register_class for set
2017-09-10 00:12:17 -07:00
Robert Nishihara
d6612a93a2 Add mailing list to README and documentation. (#950)
* Add mailing list to documentation.

* Add contact page to documentation.
2017-09-09 10:21:51 -07:00
Peter Schafhalter
8906a920f7 Implemented wait_requests as vector (#943) 2017-09-08 13:39:54 -07:00
Eric Liang
953878364e [webui] Print out timeline link for full-screen trace viewing (#936)
* up

* update
2017-09-06 01:41:21 -07:00
Wapaul1
e19e2c6284 Print jupyter notebook token when starting web UI. (#887)
* User now only needs to copy url to get to notebook

* Fixed duplicate code

* Added function to print url

* Added exception for calling function on worker

* Stored webui url in Redis

* Fix linting and simplify code.

* Now uses 24 bytes hex token

* Fixed python 3 compatibility

* Fix linting and python 3 compat

* Added comment explaining generating the token.

* Removed newline

* Small fixes.

* Fixed jenkins failure

* Rebased and changed formatting

* Revert "changed formatting"

This reverts commit 226510cf0cdcaab9cf42ad30bd9588a963683592.
2017-09-05 23:31:44 -07:00
Robert Nishihara
853969225b Sleep longer when starting plasma manager in valgrind case to catch errors where port bind fails. (#934) 2017-09-05 20:58:12 -07:00
Philipp Moritz
7030ef366f Rebase Ray on latest arrow (remove numbuf from Ray). (#910)
* remove some stuff

* put get roundtrip working

* fixes

* more fixes

* cleanup

* fix tests

* latest arrow

* fixes

* fix tests

* fix linting

* rebase

* fixes

* fix bug

* bring back libgcc error

* fix linting

* use official arrow repo

* fixes
2017-09-04 22:58:49 -07:00
Eric Liang
a2814567e1 [webui] Quick fix to timeline on task failure (#930)
* foo

* update

* Move _add_missing_timestamps to task_profiles function.
2017-09-04 22:58:19 -07:00
Eric Liang
63d8d11714 [webui] Checkboxes should go to the left of their labels (#932) 2017-09-04 17:05:13 -07:00
Robert Nishihara
d8010723d7 Attempt to wget boost up to 20 times during installation. (#927) 2017-09-04 14:42:29 -07:00
Robert Nishihara
d5eec0c2cd Pin opencv-python version to 3.2.0.8 in dockerfile. (#926) 2017-09-03 23:51:59 -07:00
Robert Nishihara
8ed03b1cf0 Make task timeline work with ipywidgets==7.0.0, change slider default values. (#925)
* Make task timeline work with ipywidgets==7.0.0.

* Change initial UI slider values from 70-100 to 0-100.
2017-09-03 23:15:46 -07:00
Stephanie Wang
ae0212b399 Fix failing task table test (#924) 2017-09-03 22:41:38 -07:00
Peter Schafhalter
2c19ae97a3 Implemented db_client_cache as unordered_map (#921)
* Implemented db_client_cache as unordered_map

* Fix for memory leak

* Fixed linting
2017-09-03 17:26:05 -07:00
Eric Liang
246be812f0 upd (#917) 2017-09-02 23:55:10 -07:00
Eric Liang
1ebfe9608f [rllib] Add downscale and frameskip options for Montezuma's (#908)
* up

* update

* fix

* update

* update

* update

* api break

* Update run_multi_node_tests.sh

* fix
2017-09-02 17:20:56 -07:00
Zongheng Yang
7a36430399 doc: mention cython in installation note. (#896)
* doc: mention cython in installation note.

* Add to ubuntu note as well
2017-08-31 00:57:41 -07:00
Philipp Moritz
4e4a4e4e06 Add plasma in-memory object store blog post (#895)
* add plasma in-memory object store blog post

* modifications for Ray blog

* add arrow blog reference

* update

* rename

* Improve formatting.
2017-08-30 23:40:46 -07:00
Stephanie Wang
7496c98010 Fault tolerance race (#894)
* Remove race between local scheduler disconnecting and global scheduler
assigning a task

* Fix number of workers started in component failures test

* Fix race between global scheduler retrying a task assignment and monitor
cleaning up task table. The global scheduler should only retry the task
assignment if the local scheduler is still alive.

* Clean up task_table_update callback if failure

* Look up current local scheduler mapping when retrying actor task submission

* Log warning if no subscribers received a task table update

* Clean up database handle memory in local scheduler
2017-08-30 22:20:50 -07:00
Robert Nishihara
deca29a7eb Bump version to 0.2.0. (#877) 2017-08-29 21:38:35 -07:00
Robert Nishihara
4b76335157 Allow ResNet example to run on multiple machines. (#891)
* Allow a redis address to be passed into the ResNet example.

* Update documentation.
2017-08-29 21:37:53 -07:00
Philipp Moritz
164a8f368e [rllib] Rename algorithms (#890)
* rename algorithms

* fix

* fix jenkins test

* fix documentation

* fix
2017-08-29 16:56:42 -07:00
Robert Nishihara
e1831792f8 For PPO, rename num_agents -> num_workers. (#882) 2017-08-28 23:11:06 -07:00
Robert Nishihara
1afc487baf In setup.py, move cython to setup_requires. (#878)
* In setup.py, move cython to setup_requires and move setuptools_scm to setup_requires.

* Add back pip install of cython when building mac wheels.

* Revert changes to setuptools_scm.

* Check that the correct number of Linux wheels are produced.

* Add back pip install cython when building linux wheels.
2017-08-28 23:07:33 -07:00
Robert Nishihara
60d4d01d06 Use observation filter in compute_action for PPO. (#884) 2017-08-28 23:01:29 -07:00
Richard Liaw
5d72818ddc Generic shared_model class (#880)
Changing `shared_model` class back to `get_model` rather than `ConvolutionalNetwork`
2017-08-28 22:48:07 -07:00
Si-Yuan
8099cdeb9d Fix: 'hyperopt_adaptive' example keeps fake 'best_hyperparameters' (#883) 2017-08-28 22:47:16 -07:00
Wapaul1
4db45c9c54 Improved layout of controls for Web UI (#876)
* Improved layout of controls

* Added explicit labels and some comments

* Fix linting errors
2017-08-28 14:43:34 -07:00
Richard Liaw
bc082e9a9e [rllib] Additional support for Shared Models in A3C (#866)
* Code for Supporting Shared Models

Running (with vnet modification) - needs to be tested for performance

Summaries

Small refactoring + generalized to more domains

Small fix for jenkins

Linting

linting

Addressing changes

Addressing changes

Update envs.py

Addressing changes

convnet

Merge - new model

final touches

final linting

Changing iterations back

removed extra change

changes for fast experimentation

changes to enable a2c

TEMP FOR DEBUGGING

ContinuousActions - Still doesn't work

InvertedPendulum trains with 8 workers - k=200

huber loss

Maxes for InvertedPendulum-v1 - 16w,200steps

temp: working with a2c

Back to shared model

more fixes

small

nit

LSTM to shared models

need to fix last_features

tuning pong

Best record for hitting 0 - with k=16,n=20

nit

a2cremoval

remove A2c reference and nits

nit

removed a2c vestiges

removing a2c

removing example.py

Linting

nit

* Linting + Removing vestigial code

* Final Touches

* nits

* rerun travis
2017-08-28 12:23:14 -07:00
Robert Nishihara
b251f0b6b9 Add pip install instructions to README. (#869) 2017-08-27 19:55:39 -07:00