Commit graph

3575 commits

Author SHA1 Message Date
Philipp Moritz
ade6d80820 [rllib] use ray.wait to speed up parallel simulations for policy gradients (#754)
* use ray.wait to speed up parallel simulations for policy gradients

* linting
2017-07-19 16:09:15 -07:00
alanamarzoev
2b3190ad13 Chrome trace timeline with sliders. (#731)
* Trace timeline with sliders.

* Trace.

* Switched ujson to json.

* Fixed tests.

* linting fixes

* Fixed bug.

* Cleaned up code.

* Fixes according to comments.

* removed checkpoints.

* Undid accidental delete.

* Fixed linting error.

* Added documentation to notebook.

* Undid accidental deletes.

* Add comments and small formatting fixes.

* Small fix.
2017-07-17 19:59:49 -07:00
Eric Liang
420013774c [rllib] Pull out shared models for evolution strategies and policy gradient. (#719)
* wip

* works with cartpole

* lint

* fix pg

* comment

* action dist rename

* preprocessor

* fix test

* typo

* fix the action[0] nonsense

* revert

* satisfy the lint

* wip

* works with cartpole

* lint

* fix pg

* comment

* action dist rename

* preprocessor

* fix test

* typo

* fix the action[0] nonsense

* revert

* satisfy the lint

* Minor indentation changes.

* fix merge

* add humanoid

* fix linting

* more 4 space

* fix

* fix linT

* oops

* es parity
2017-07-17 08:58:54 +00:00
Eric Liang
86a7909149 make es worker count independent (#740) 2017-07-16 16:23:56 -07:00
Robert Nishihara
80e8426b5e Test example applications and rllib in jenkins tests. (#707)
* Test example applications in Jenkins.

* Fix default upload_dir argument for Algorithm class.

* Fix evolution strategies.

* Comment out policy gradient example which doesn't seem to work.

* Set --env-name for evolution strategies.
2017-07-16 18:51:33 +00:00
Robert Nishihara
e0867c8845 Switch Python indentation from 2 spaces to 4 spaces. (#726)
* 4 space indentation for actor.py.

* 4 space indentation for worker.py.

* 4 space indentation for more files.

* 4 space indentation for some test files.

* Check indentation in Travis.

* 4 space indentation for some rl files.

* Fix failure test.

* Fix multi_node_test.

* 4 space indentation for more files.

* 4 space indentation for remaining files.

* Fixes.
2017-07-13 21:53:57 +00:00
Philipp Moritz
c24c07613c [rllib] unify writing performance metrics and make it queryable (#708)
* write config to s3

* add train file

* write performance to S3

* writing needs to be fixed, replacing result.json at the moment

* update

* add experiment_id

* more logging and example queries

* update

* add info

* fill in other algorithms

* fix linting

* convert readme to rst

* fixes

* simplejson -> json

* make files executable

* edit README.rst

* unify storing logs in S3 and on local filesystem

* use 'info' entry in TrainingResult for algorithm specific info

* don't install smart_open with ray

* fixes

* linting fixes
2017-07-11 01:36:14 +02:00
alanamarzoev
8464d77c76 Change event logs to store one Redis ZSET per worker. (#705)
* Changing to zset

* Fixed bug.

* Fixed another bug.

* Modified task_profiles.

* Removed extra file.

* Modified task_profiles test.

* WIP

* WIP

* Undid changes

* Updated

* WIP

* Made changes according to comments.

* Removed unneeded print.

* Removed ujson usage.

* failing test

* tests passing

* Fixed linting errors and modified style.

* Fixed bug.

* Fixed linting

* Fixed according to comments.

* Redis crashing?

* Fixed linting

* Fixed linting
2017-07-09 01:42:29 +02:00
Eric Liang
cd12ea7e09 [rllib] Pull out the GPU-parallel optimizer from policy gradients to common (#711)
* refactor

* docs

* cleanup

* clean up more

* Update parallel.py

* add imports from future
2017-07-07 22:20:02 +00:00
Robert Nishihara
5b3d0c00f2 Create /tmp/ray directory in services.py. (#715) 2017-07-07 18:41:56 +00:00
Eric Liang
f012e597c2 [rllib] Basic port of baselines/deepq to rllib (#709)
* rllib v0

* fix imports

* lint

* comments

* update docs

* a3c wip

* a3c wip

* report stats

* update doc

* add common logdir attr

* name is too long

* fix small bug

* propagate exception on error

* fetch metrics

* initial port

* fix lint

* add right license

* port to common alg format

* fix lint

* rename dqn

* add imports from future

* fix lint
2017-07-07 18:37:00 +00:00
Eric Liang
66734847bb [rllib] Standardize writing output logs and other files to /tmp/ray (#706)
* rllib v0

* fix imports

* lint

* comments

* update docs

* a3c wip

* a3c wip

* report stats

* update doc

* add common logdir attr

* name is too long

* fix small bug

* propagate exception on error

* fetch metrics

* fix small nits
2017-07-03 16:01:47 +00:00
alanamarzoev
2b11a7bca2 Add task ID and object ID search boxes to web UI. (#704)
* Task search box.

* Cleaned up.

* Small reformatting.

* Add object table search box.
2017-07-01 17:48:23 -04:00
alanamarzoev
716469160e Enable dumping profiling information to timeline format viewable by chrome tracing. (#703)
* Chrome tracing timeline.

* Modified decode statement.

* Some cleanups and add test.

* Remove example.

* Fix test.
2017-06-30 12:14:11 -04:00
Eric Liang
2d81edfcdc [rllib] Move a3c implementation from examples/ to python/ray/rllib/ (#698)
* rllib v0

* fix imports

* lint

* comments

* update docs

* a3c wip

* a3c wip

* report stats

* update doc

* name is too long

* fix small bug

* propagate exception on error

* fetch metrics

* fix lint
2017-06-29 15:49:56 +00:00
Robert Nishihara
efce49cfbc Bump version to 0.1.2 in preparation for uploading wheels to PyPI. (#700) 2017-06-27 04:35:42 +00:00
Eric Liang
a674ec958c [rllib] Move policy gradient and evolution strategies algorithms from examples/ to ray/rllib/ (#694)
* rllib v0

* fix imports

* lint

* comments

* update docs
2017-06-25 22:13:03 +00:00
Robert Nishihara
8bc9c275fa Increase the number of log file names and handle errors better in log monitor. (#693) 2017-06-25 05:20:50 +00:00
alanamarzoev
e16df6da9a Updated task_profiles function to avoid future repetitive parsing. (#691)
* Updated task_profiles function to avoid future repetitive parsing.

* Fix indentation.

* Fixed according to comments.

* Included updated test for task_profiles function.

* Simplify test.

* Fix indentation.

* Fix.
2017-06-22 19:21:18 -07:00
Robert Nishihara
2d636d9278 Kill jupyter in ray stop. (#689)
* Kill jupyter in ray stop.

* Terminate jupyter notebook in ray stop.

* Fix linting.
2017-06-21 05:58:34 +00:00
alanamarzoev
ed9380d73d Automatically start web UI in ray.init(). (#687)
* Start up webui on ray.init

* Removed .ipynb checkpoint folders.

* Removed print statements in cleanup function.

* Fixed

* Removed extra file.

* Cleaned up ui.

* Don't start browser automatically in ray.init(), also copy the notebook every time so that changes don't persist.

* Update setup.py and installation instructions to install jupyter.

* Don't automatically install jupyter, don't start the UI if jupyter is not installed.

* Improve error message when failing to start UI.
2017-06-20 10:32:55 -07:00
Robert Nishihara
f12db5f0e2 Divide large plasma requests into smaller chunks, and wait longer before reissuing large requests. (#678)
* Divide large get requests into smaller chunks.

* Divide fetches into smaller chunks.

* Wait longer in worker and manager before reissuing fetch requests if there are many outstanding fetch requests.

* Log warning if a handler in the local scheduler or plasma manager takes more than one second.
2017-06-18 04:42:15 +00:00
alanamarzoev
4d5ac9dad5 Include object size and hash in the table returned by the object_table function in the GlobalStateAPI. (#665)
* added log_table function and a test

* fixed log_files and added task_profiles

* fixed formatting

* fixed linting errors

* fixes

* removed file

* more fixes

* hopefully fixed

* Small changes.

* Fix linting.

* Fix bug in log monitor.

* Small changes.

* Fix bug in travis.

* Including data_size and hash in the ResultTableReply.

* Included data_size and hash info in object_table.

* Fixed bugs in ray_redis_module.cc.

* Removing commented out code.

* Fixes

* Freed hash and data_size strings after using, and checked if they're null along with task_id and is_put.

* Changed it so that data_size is set correctly.

* Removed iostream import.

* Included a check to ensure that the Redis string to long long conversion was successful.

* Included separate data_size and hash null checks.

* Fixed bug.

* Made linting changes.

* Another linting error.

* Slight simplication.
2017-06-16 23:17:11 -07:00
Robert Nishihara
019ba07e9c Correct actor class name and module. (#675)
* Correct actor class name and module.

* Add test.

* Fix linting.
2017-06-17 05:44:42 +00:00
Alexey Tumanov
8317025987 reducing the size of objects created for the global scheduler test (#674) 2017-06-15 10:02:46 -07:00
alanamarzoev
cc4990b543 Task profiles function and test (#647)
Expose some task profiling information through global state API.
2017-06-13 17:53:34 -07:00
alanamarzoev
43bae46e47 Included worker_id in task event logs. (#668) 2017-06-13 17:30:43 -07:00
Philipp Moritz
54925996ca Allow remote functions to specify max executions and kill worker once limit is reached. (#660)
* implement restarting workers after certain number of task executions

* Clean up python code.

* Don't start new worker when an actor disconnects.

* Move wait_for_pid_to_exit to test_utils.py.

* Add test.

* Fix linting errors.

* Fix linting.

* Fix typo.
2017-06-13 00:34:58 -07:00
Richard Liaw
8d350f628a Fixing Redis Key Consistencies for Actor, FunctionTable, FunctionsToRun, and RemoteFunction (#659)
* consistencies for Actor, FunctionTable, and FunctionsToRun

* NOT WORKING: changing remote fn keys
2017-06-10 23:45:22 +00:00
alanamarzoev
ee1d4e5ea2 Redirect worker stdout/stderr to log files. (#646)
* local scheduler

* redirect output files to be associated with workers rather than the local scheduler

* fixed formatting

* fixes

* Moved output redirection logic to worker.py.

* Changed write mode.

* Fixed formatting.

* Added comment.

* Reuse log file creation in services.py.

* Fix linting.

* Fix problem in which multiple processes attempt to create /tmp/raylogs at the same time.
2017-06-08 18:30:48 -07:00
alanamarzoev
f0339f3386 Expose log files through global state API. (#641)
* added log_table function and a test

* fixed log_files and added task_profiles

* fixed formatting

* fixed linting errors

* fixes

* removed file

* more fixes

* hopefully fixed

* Small changes.

* Fix linting.

* Fix bug in log monitor.

* Small changes.

* Fix bug in travis.
2017-06-08 00:08:10 -07:00
Robert Nishihara
301e0b0db8 Bump version to 0.1.1 in preparation for uploading wheels to PyPI. (#630) 2017-06-03 02:17:39 +00:00
Robert Nishihara
2694337c0f Fix large memory tests. (#632)
* Log the driver ID in hex instead of binary.

* Fix large memory test and add more tests to it.

* Remove tests that are too stressful.
2017-06-03 01:12:56 +00:00
Robert Nishihara
23b0c80967 Rename linux wheels so they can be uploaded to PyPI. (#629) 2017-06-02 20:20:34 +00:00
Robert Nishihara
1a682e2807 Enable starting and stopping ray with "ray start" and "ray stop". (#628)
* Install start_ray and stop_ray scripts in setup.py.

* Update documentation.

* Fix docker tests.

* Implement stop_ray script in python.

* Fix linting.
2017-06-02 20:17:48 +00:00
Robert Nishihara
a4d8e13094 Suppress excess warning messages related to intentional actor deaths. (#627)
* Don't submit the actor destructor tasks when the job is exiting.

* Don't propagate error messages to the driver when an actor exits intentionally.
2017-06-01 20:10:40 +00:00
Robert Nishihara
d0bfc0a849 Clean up actor workers when actor handle goes out of scope. (#617) 2017-06-01 07:02:43 +00:00
Robert Nishihara
bcaab78908 Add script for building MacOS wheels. (#601)
* Add script for building MacOS wheels.

* Small cleanups to script.

* Fix setting of PATH before building wheel.

* Create symbolic link to correct Python executable so Ray installation finds the right Python.

* Address comments.

* Rename readme.
2017-06-01 00:30:46 +00:00
Richard Shin
609b5c1a4c Add script to build manylinux1 .whl files (#600)
* Add manylinux setup

* Switch to cp27mu

* python/MANIFEST.in

* Fix MANIFEST.in

* Add build-wheel-manylinux1.sh

* Update readme

* Install correct version of numpy

* Fix typo in README-manylinux1.md

* Don't install cmake

* Remove commented line from setup.py

* Delete unused manylinux1.sh

* Run setup.py bdist_wheel twice

* Don't use package_data and MANIFEST.in.

* Small aesthetic change.

* Trigger build_ext in setup.py.

* Remove nonexistent file from MANIFEST.in.

* Manually copy files in MANIFEST.in to where Python expects them in order to prevent setup.py from having to be run twice.

* Only run setup.py once when building wheels.

* Aesthetic change to readme.

* Copy generated flatbuffer Python files in build_ext.

* Fix permission denied error by making sure to preserve executableness when copying files.

* Remove unnecessary argument to setup.py.

* Remove MANIFEST.in and move files to include into list in setup.py.

* Fix numpy version when building wheels and replace rm with git clean.
2017-05-27 21:35:48 -07:00
Chelsea Finn
f97d0393cc Fix to json decoding bug (#597)
* fix json decoding bug

* Fix linting error.
2017-05-25 18:48:39 -07:00
Robert Nishihara
997aa35721 Remove cloudpickle customization and just use plain cloudpickle. (#588)
* Remove augmentations of cloudpickle.

* Entirely remove cloudpickle modifications. Just use plain cloudpickle.
2017-05-24 20:22:28 -07:00
Robert Nishihara
c5bc76193f Remove Ray environment variables from codebase. (#590) 2017-05-24 18:29:40 -07:00
Robert Nishihara
c647dd5f6c Make it possible to use actor definitions within remote functions and other actors. (#587)
* Enable remote function and actor definitions to close over actor definitions.

* Give better error message if actor objects are pickled.

* Add tests for closing over actor definitions.

* Fix linting.
2017-05-24 15:43:32 -07:00
Robert Nishihara
c440010cbd Bump version to 0.1.0. (#581) 2017-05-20 23:25:01 -07:00
Stephanie Wang
ee08c8274b Shard Redis. (#539)
* Implement sharding in the Ray core

* Single node Python modifications to do sharding

* Do the sharding in redis.cc

* Pipe num_redis_shards through start_ray.py and worker.py.

* Use multiple redis shards in multinode tests.

* first steps for sharding ray.global_state

* Fix problem in multinode docker test.

* fix runtest.py

* fix some tests

* fix redis shard startup

* fix redis sharding

* fix

* fix bug introduced by the map-iterator being consumed

* fix sharding bug

* shard event table

* update number of Redis clients to be 64K

* Fix object table tests by flushing shards in between unit tests

* Fix local scheduler tests

* Documentation

* Register shard locations in the primary shard

* Add plasma unit tests back to build

* lint

* lint and fix build

* Fix

* Address Robert's comments

* Refactor start_ray_processes to start Redis shard

* lint

* Fix global scheduler python tests

* Fix redis module test

* Fix plasma test

* Fix component failure test

* Fix local scheduler test

* Fix runtest.py

* Fix global scheduler test for python3

* Fix task_table_test_and_update bug, from actor task table submission race

* Fix jenkins tests.

* Retry Redis shard connections

* Fix test cases

* Convert database clients to DBClient struct

* Fix race condition when subscribing to db client table

* Remove unused lines, add APITest for sharded Ray

* Fix

* Fix memory leak

* Suppress ReconstructionTests output

* Suppress output for APITestSharded

* Reissue task table add/update commands if initial command does not publish to any subscribers.

* fix

* Fix linting.

* fix tests

* fix linting

* fix python test

* fix linting
2017-05-18 17:40:41 -07:00
Philipp Moritz
28f0882387 Expose function table to python global control state API (#542)
* expose function table to python global control state API

* fix

* fix linting

* add test for function table
2017-05-16 20:06:13 -07:00
Robert Nishihara
5572561704 Do not start web UI by default, and remove web UI from documentation. (#554) 2017-05-16 19:29:07 -07:00
Robert Nishihara
ec2534422b Remove register_class from API. (#550)
* Perform ray.register_class under the hood.

* Fix bug.

* Release worker lock when waiting for imports to arrive in get.

* Remove calls to register_class from examples and tests.

* Clear serialization state between tests.

* Fix bug and add test for multiple custom classes with same name.

* Fix failure test.

* Fix linting and cleanups to python code.

* Fixes to documentation.

* Implement recursion depth for recursively registering classes.

* Fix linting.

* Push warning to user if waiting for class for too long.

* Fix typos.

* Don't export FunctionToRun if pickling the function fails.

* Don't broadcast class definition when pickling class.
2017-05-16 18:38:52 -07:00
Philipp Moritz
08e988aee5 Modernize plasma store (C to C++ changes). (#546) 2017-05-15 01:19:44 -07:00
Robert Nishihara
9f91eb8c91 Change API for remote function declaration, actor instantiation, and actor method invocation. (#541)
* Direction substitution of @ray.remote -> @ray.task.

* Changes to make '@ray.task' work.

* Instantiate actors with Class.remote() instead of Class().

* Convert actor instantiation in tests and examples from Class() to Class.remote().

* Change actor method invocation from object.method() to object.method.remote().

* Update tests and examples to invoke actor methods with .remote().

* Fix bugs in jenkins tests.

* Fix example applications.

* Change @ray.task back to @ray.remote.

* Changes to make @ray.actor -> @ray.remote work.

* Direct substitution of @ray.actor -> @ray.remote.

* Fixes.

* Raise exception if @ray.actor decorator is used.

* Simplify ActorMethod class.
2017-05-14 00:01:20 -07:00