Commit graph

3894 commits

Author SHA1 Message Date
Robert Nishihara
301e0b0db8 Bump version to 0.1.1 in preparation for uploading wheels to PyPI. (#630) 2017-06-03 02:17:39 +00:00
Robert Nishihara
2694337c0f Fix large memory tests. (#632)
* Log the driver ID in hex instead of binary.

* Fix large memory test and add more tests to it.

* Remove tests that are too stressful.
2017-06-03 01:12:56 +00:00
Robert Nishihara
23b0c80967 Rename linux wheels so they can be uploaded to PyPI. (#629) 2017-06-02 20:20:34 +00:00
Robert Nishihara
1a682e2807 Enable starting and stopping ray with "ray start" and "ray stop". (#628)
* Install start_ray and stop_ray scripts in setup.py.

* Update documentation.

* Fix docker tests.

* Implement stop_ray script in python.

* Fix linting.
2017-06-02 20:17:48 +00:00
Robert Nishihara
a4d8e13094 Suppress excess warning messages related to intentional actor deaths. (#627)
* Don't submit the actor destructor tasks when the job is exiting.

* Don't propagate error messages to the driver when an actor exits intentionally.
2017-06-01 20:10:40 +00:00
Robert Nishihara
d0bfc0a849 Clean up actor workers when actor handle goes out of scope. (#617) 2017-06-01 07:02:43 +00:00
Robert Nishihara
bcaab78908 Add script for building MacOS wheels. (#601)
* Add script for building MacOS wheels.

* Small cleanups to script.

* Fix setting of PATH before building wheel.

* Create symbolic link to correct Python executable so Ray installation finds the right Python.

* Address comments.

* Rename readme.
2017-06-01 00:30:46 +00:00
Richard Shin
609b5c1a4c Add script to build manylinux1 .whl files (#600)
* Add manylinux setup

* Switch to cp27mu

* python/MANIFEST.in

* Fix MANIFEST.in

* Add build-wheel-manylinux1.sh

* Update readme

* Install correct version of numpy

* Fix typo in README-manylinux1.md

* Don't install cmake

* Remove commented line from setup.py

* Delete unused manylinux1.sh

* Run setup.py bdist_wheel twice

* Don't use package_data and MANIFEST.in.

* Small aesthetic change.

* Trigger build_ext in setup.py.

* Remove nonexistent file from MANIFEST.in.

* Manually copy files in MANIFEST.in to where Python expects them in order to prevent setup.py from having to be run twice.

* Only run setup.py once when building wheels.

* Aesthetic change to readme.

* Copy generated flatbuffer Python files in build_ext.

* Fix permission denied error by making sure to preserve executableness when copying files.

* Remove unnecessary argument to setup.py.

* Remove MANIFEST.in and move files to include into list in setup.py.

* Fix numpy version when building wheels and replace rm with git clean.
2017-05-27 21:35:48 -07:00
Chelsea Finn
f97d0393cc Fix to json decoding bug (#597)
* fix json decoding bug

* Fix linting error.
2017-05-25 18:48:39 -07:00
Robert Nishihara
997aa35721 Remove cloudpickle customization and just use plain cloudpickle. (#588)
* Remove augmentations of cloudpickle.

* Entirely remove cloudpickle modifications. Just use plain cloudpickle.
2017-05-24 20:22:28 -07:00
Robert Nishihara
c5bc76193f Remove Ray environment variables from codebase. (#590) 2017-05-24 18:29:40 -07:00
Robert Nishihara
c647dd5f6c Make it possible to use actor definitions within remote functions and other actors. (#587)
* Enable remote function and actor definitions to close over actor definitions.

* Give better error message if actor objects are pickled.

* Add tests for closing over actor definitions.

* Fix linting.
2017-05-24 15:43:32 -07:00
Robert Nishihara
c440010cbd Bump version to 0.1.0. (#581) 2017-05-20 23:25:01 -07:00
Stephanie Wang
ee08c8274b Shard Redis. (#539)
* Implement sharding in the Ray core

* Single node Python modifications to do sharding

* Do the sharding in redis.cc

* Pipe num_redis_shards through start_ray.py and worker.py.

* Use multiple redis shards in multinode tests.

* first steps for sharding ray.global_state

* Fix problem in multinode docker test.

* fix runtest.py

* fix some tests

* fix redis shard startup

* fix redis sharding

* fix

* fix bug introduced by the map-iterator being consumed

* fix sharding bug

* shard event table

* update number of Redis clients to be 64K

* Fix object table tests by flushing shards in between unit tests

* Fix local scheduler tests

* Documentation

* Register shard locations in the primary shard

* Add plasma unit tests back to build

* lint

* lint and fix build

* Fix

* Address Robert's comments

* Refactor start_ray_processes to start Redis shard

* lint

* Fix global scheduler python tests

* Fix redis module test

* Fix plasma test

* Fix component failure test

* Fix local scheduler test

* Fix runtest.py

* Fix global scheduler test for python3

* Fix task_table_test_and_update bug, from actor task table submission race

* Fix jenkins tests.

* Retry Redis shard connections

* Fix test cases

* Convert database clients to DBClient struct

* Fix race condition when subscribing to db client table

* Remove unused lines, add APITest for sharded Ray

* Fix

* Fix memory leak

* Suppress ReconstructionTests output

* Suppress output for APITestSharded

* Reissue task table add/update commands if initial command does not publish to any subscribers.

* fix

* Fix linting.

* fix tests

* fix linting

* fix python test

* fix linting
2017-05-18 17:40:41 -07:00
Philipp Moritz
28f0882387 Expose function table to python global control state API (#542)
* expose function table to python global control state API

* fix

* fix linting

* add test for function table
2017-05-16 20:06:13 -07:00
Robert Nishihara
5572561704 Do not start web UI by default, and remove web UI from documentation. (#554) 2017-05-16 19:29:07 -07:00
Robert Nishihara
ec2534422b Remove register_class from API. (#550)
* Perform ray.register_class under the hood.

* Fix bug.

* Release worker lock when waiting for imports to arrive in get.

* Remove calls to register_class from examples and tests.

* Clear serialization state between tests.

* Fix bug and add test for multiple custom classes with same name.

* Fix failure test.

* Fix linting and cleanups to python code.

* Fixes to documentation.

* Implement recursion depth for recursively registering classes.

* Fix linting.

* Push warning to user if waiting for class for too long.

* Fix typos.

* Don't export FunctionToRun if pickling the function fails.

* Don't broadcast class definition when pickling class.
2017-05-16 18:38:52 -07:00
Philipp Moritz
08e988aee5 Modernize plasma store (C to C++ changes). (#546) 2017-05-15 01:19:44 -07:00
Robert Nishihara
9f91eb8c91 Change API for remote function declaration, actor instantiation, and actor method invocation. (#541)
* Direction substitution of @ray.remote -> @ray.task.

* Changes to make '@ray.task' work.

* Instantiate actors with Class.remote() instead of Class().

* Convert actor instantiation in tests and examples from Class() to Class.remote().

* Change actor method invocation from object.method() to object.method.remote().

* Update tests and examples to invoke actor methods with .remote().

* Fix bugs in jenkins tests.

* Fix example applications.

* Change @ray.task back to @ray.remote.

* Changes to make @ray.actor -> @ray.remote work.

* Direct substitution of @ray.actor -> @ray.remote.

* Fixes.

* Raise exception if @ray.actor decorator is used.

* Simplify ActorMethod class.
2017-05-14 00:01:20 -07:00
Robert Nishihara
22c6a22f28 Add flatbuffers dependency to setup.py. (#540) 2017-05-11 23:39:34 -07:00
Robert Nishihara
b4788ae518 Only export actor classes once. (#510)
* Only export actor classes once.

* Fix linting.

* Fixes after rebase.
2017-05-09 19:49:23 -07:00
Robert Nishihara
1f991b6389 Change /tmp/raylogs permissions so multiple users can log there. (#532) 2017-05-09 12:15:31 -07:00
Robert Nishihara
f32368bcbe Prevent actors from being placed on removed nodes or nodes with no CPUs. (#527)
* Make note about bug in which actor creation notification message is not received.

* Prevent actors from being created on removed nodes.

* Prevent actors from being created on nodes with no CPUs.

* Fix linting.

* Add test for scheduling actors on local schedulers with no CPUs.

* Improve error message when actors created before ray.init called.
2017-05-08 20:39:43 -07:00
Robert Nishihara
c688a64235 Expose GPU IDs to remote functions. (#496)
* Change local scheduler bookkeeping to use GPU IDs.

* Update actor test.

* Add tests for actors and tasks simultaneously using GPUs.

* Add additional task GPU ID test.

* Fix linting.

* Make redis GPU assignment ignore GPU IDs.

* Small fix.
2017-05-07 13:03:49 -07:00
Robert Nishihara
35dbdcc4f5 Make all export IDs unique. (#522)
* Make all export IDs unique.

* Work around test failure.
2017-05-06 21:17:25 -07:00
Philipp Moritz
1dddd5336a Fix actor bug arising from overwriting task specifications in the local scheduler (#513)
* copy task specifications put into the actor task cache so it won't get overwritten when the scheduler receives the next task

* cleanup

* cleanup and fix

* linting

* fix jenkins test

* fix linting
2017-05-06 17:39:35 -07:00
Robert Nishihara
8532ba4272 Serialize lambdas, sets, and types with pickle by default. (#511)
* Serialize lambdas with pickle by default.

* Serialize sets with pickle by default.

* Serialize types with pickle by default.

* Small update to documentation.

* Update tests.
2017-05-04 00:16:35 -07:00
Robert Nishihara
245c8ab888 Make sure user seeding does not affect actor ID generation. (#506)
* Make sure user seeding does not affect actor ID generation.

* Fix linting.

* Add test.
2017-05-03 16:29:55 -07:00
Robert Nishihara
1627f89945 Fix problem in which actors and workers running tasks are not killed by driver exit. (#490)
* Augment test to verify that relevant workers and actors are killed during driver cleanup.

* Fix bug in which we were only killing one worker when a driver exited.

* Fix remove driver test.

* Fix and augment test.
2017-04-26 15:13:39 -07:00
Robert Nishihara
0ac125e9b2 Clean up when a driver disconnects. (#462)
* Clean up state when drivers exit.

* Remove unnecessary field in ActorMapEntry struct.

* Have monitor release GPU resources in Redis when driver exits.

* Enable multiple drivers in multi-node tests and test driver cleanup.

* Make redis GPU allocation a redis transaction and small cleanups.

* Fix multi-node test.

* Small cleanups.

* Make global scheduler take node_ip_address so it appears in the right place in the client table.

* Cleanups.

* Fix linting and cleanups in local scheduler.

* Fix removed_driver_test.

* Fix bug related to vector -> list.

* Fix linting.

* Cleanup.

* Fix multi node tests.

* Fix jenkins tests.

* Add another multi node test with many drivers.

* Fix linting.

* Make the actor creation notification a flatbuffer message.

* Revert "Make the actor creation notification a flatbuffer message."

This reverts commit af99099c8084dbf9177fb4e34c0c9b1a12c78f39.

* Add comment explaining flatbuffer problems.
2017-04-24 18:10:21 -07:00
Robert Nishihara
3a2eb1467b Fix failure to propagate error message. (#479) 2017-04-23 16:12:25 -07:00
Robert Nishihara
c802e51d36 Re-enable recursive remote functions in a limited form. (#453)
* Re-enable recursive remote functions in a limited form.

* Fix linting.
2017-04-13 01:47:33 -07:00
Robert Nishihara
f4c1adae17 Unify function signature handling between remote functions and actor … (#441)
* Unify function signature handling between remote functions and actor methods.

* Fixes.

* Fix tests.
2017-04-08 21:34:13 -07:00
Alexey Tumanov
b6c4ae82c0 Increase redis client pubsub buffer size. (#442) 2017-04-08 15:24:07 -07:00
Robert Nishihara
7cd00741b1 Suppress irrelevant Redis connection errors. (#434)
* Suppress error messages in worker import thread when Redis terminates.

* Suppress some warnings from one of the tests.
2017-04-07 23:19:24 -07:00
Robert Nishihara
05fd4c2c37 Changes to local scheduler client protocol. (#435)
* Make local scheduler clients receive reply upon registration.

* Fix tests and linting.
2017-04-07 23:03:37 -07:00
Robert Nishihara
7af6f462fb Add API for querying global control state. (#431)
* Add API for querying global control state.

* Fix linting.

* Fix errors in Python 2.

* Fix bug in test.

* Fix bug in test.
2017-04-06 23:51:12 -07:00
Robert Nishihara
320109a5bd By default, start a number of workers equal to the number of CPUs. (#430)
* By default, start a number of workers equal to the number of CPUs.

* Fix stress tests.
2017-04-06 00:02:58 -07:00
Stephanie Wang
93679df724 Stopped nodes can rejoin immediately (#428)
* Ignore deleted clients when reading address info from Redis

* Remove self from db_client table when exiting cleanly

* Fix valgrind test

* Do not call plasma_perform_release when disconnecting
2017-04-05 23:50:38 -07:00
Philipp Moritz
4043769ba2 Make putting large objects work. (#411)
* putting large objects

* add more checks

* support large objects

* fix test

* fix linting

* upgrade to latest arrow version

* check malloc return code

* print mmap file sizes

* printing

* revert to dlmalloc

* add prints

* more prints

* add printing

* printing

* fix

* update

* fix

* update

* print

* initialization

* temp

* fix

* update

* fix linting

* comment out object_store_full tests

* fix test

* fix test

* evict objects if dlmalloc fails

* fix stresstests

* Fix linting.

* Uncomment large-memory tests.

* Increase memory for docker image for jenkins tests.

* Reduce large memory tests.

* Further reduce large memory tests.
2017-04-05 01:04:05 -07:00
Robert Nishihara
0925e11c48 Exclude function source from function ID hash in Python interpreter. (#395)
* Exclude function source code from function ID hash in Python interpreter.

* Remove try except block.
2017-03-25 11:31:21 -07:00
Robert Nishihara
ba02fc0eb0 Run flake8 in Travis and make code PEP8 compliant. (#387) 2017-03-21 12:57:54 -07:00
Stephanie Wang
083e7a28ad Push an error to the driver when the workload hangs on ray.put reconstruction (#382)
* Fix worker blocked bug

* tmp

* Push an error to the driver on ray.put for non-driver tasks

* Fix result table tests

* Fix test, logging

* Address comments

* Fix suppression bug

* Fix redis module test

* Edit error message

* Get values in chunks during reconstruction

* Test case for driver ray.put errors

* Error for evicting ray.put objects from the driver

* Fix tests

* Reduce verbosity

* Documentation
2017-03-21 00:16:48 -07:00
Stephanie Wang
12c9618c0c Plasma and worker node failure. (#373)
* Failing test case

* Local scheduler exits cleanly after plasma store dies

* Tolerate one plasma store failure

* Tolerate plasma store failures on all nodes except head node

* Plasma manager heartbeats

* Component failure tests

* Don't run the helper for Python testing

* Fix C test

* Fix hanging plasma transfer test

* Fix python3

* Consolidate ClientConnection code

* Fix valgrind test

* fix c test

* We can restart worker nodes!

* Fix flatbuffers bug

* Address comments

* Only register actual workers with the local scheduler

* Fix bug

* Fix segfaults

* Add test case that tests for driver liveness, fix local scheduler bug

* Clean up after tests

* Allocate retry info on the stack

* Send SIGKILL before waiting

* Relax unit test conditions

* Driver liveness test case and documentation
2017-03-17 17:03:58 -07:00
Robert Nishihara
f1d4dda8cb Put all log files in redis and visualize them in UI. (#350)
* Start process for monitoring log files and push changes to redis.

* Display log files in UI.

* Bug fix for recent tasks.

* Use flatbuffers to parse local scheduler heartbeats.
2017-03-16 15:27:00 -07:00
Robert Nishihara
3333e1d6b9 Fix bug in parsing of tasks in monitor. (#372) 2017-03-15 20:32:23 -07:00
Robert Nishihara
3b7788bf88 Disallow calling ray.put on an object ID. (#353) 2017-03-11 12:09:28 -08:00
Robert Nishihara
53dffe0bf2 Use flatbuffers for some messages from Redis. (#341)
* Compile the Ray redis module with C++.

* Redo parsing of object table notifications with flatbuffers.

* Update redis module python tests.

* Redo parsing of task table notifications with flatbuffers.

* Fix linting.

* Redo parsing of db client notifications with flatbuffers.

* Redo publishing of local scheduler heartbeats with flatbuffers.

* Fix linting.

* Remove usage of fixed-width formatting of scheduling state in channel name.

* Reply with flatbuffer object to task table queries, also simplify redis string to flatbuffer string conversion.

* Fix linting and tests.

* fix

* cleanup

* simplify logic in ReplyWithTask
2017-03-10 18:35:25 -08:00
Wapaul1
c66178bcd7 Resnet Adapted to Ray (#229)
* Initial conversion

* Further changes

* fixes

* some changes

* Fixes

* Added data pipeline

* Added updates to cifar

* Currently borken need sep pr

* Added test for retriving variables from an optimizer

* Removed FlAG ref in environment variables

* Added comments to test

* Addressed comments

* Added updates

* Made further changes for tfutils

* Fixed finalized bug

* Removed ipython

* Added accuracy printing

* Temp commit

* added fixes

* changes

* Added writing to file

* Fixes for gpus

* Cleaned up code

* Temp commit

* Gpu support fully implemented

* Updated to use num_gpus for actors

* Finished testing gpus implementation

* Changed to be more in line with origin implementation

* Updated test to use actors

* Added support for cpu only systems

* Now works with no cpus

* Minor changes and some documentation.
2017-03-07 01:07:32 -08:00
Stephanie Wang
da06b4db82 Warn the user when a nondeterministic task is detected. (#339)
* WARN instead of FATAL for object hash mismatches, push error to driver

* Document the callback signature for object_table_add/remove

* Error table

* Wait for all errors in python test

* Fix doc

* Fix state test
2017-03-07 00:32:15 -08:00