Commit graph

14 commits

Author SHA1 Message Date
Philipp Moritz
54925996ca Allow remote functions to specify max executions and kill worker once limit is reached. (#660)
* implement restarting workers after certain number of task executions

* Clean up python code.

* Don't start new worker when an actor disconnects.

* Move wait_for_pid_to_exit to test_utils.py.

* Add test.

* Fix linting errors.

* Fix linting.

* Fix typo.
2017-06-13 00:34:58 -07:00
Philipp Moritz
6adf39959c put back large python object tests (commented out) (#636) 2017-06-02 20:36:10 -07:00
Robert Nishihara
2694337c0f Fix large memory tests. (#632)
* Log the driver ID in hex instead of binary.

* Fix large memory test and add more tests to it.

* Remove tests that are too stressful.
2017-06-03 01:12:56 +00:00
Robert Nishihara
1a682e2807 Enable starting and stopping ray with "ray start" and "ray stop". (#628)
* Install start_ray and stop_ray scripts in setup.py.

* Update documentation.

* Fix docker tests.

* Implement stop_ray script in python.

* Fix linting.
2017-06-02 20:17:48 +00:00
Robert Nishihara
bcaab78908 Add script for building MacOS wheels. (#601)
* Add script for building MacOS wheels.

* Small cleanups to script.

* Fix setting of PATH before building wheel.

* Create symbolic link to correct Python executable so Ray installation finds the right Python.

* Address comments.

* Rename readme.
2017-06-01 00:30:46 +00:00
Robert Nishihara
07b21e057c Print the driver stdout/stderr if we fail to decode it in jenkins. (#567)
* Print the driver stdout/stderr if we fail to decode it in jenkins.

* Fix whitespace.

* Add explanation.
2017-05-20 23:11:19 -07:00
Stephanie Wang
ee08c8274b Shard Redis. (#539)
* Implement sharding in the Ray core

* Single node Python modifications to do sharding

* Do the sharding in redis.cc

* Pipe num_redis_shards through start_ray.py and worker.py.

* Use multiple redis shards in multinode tests.

* first steps for sharding ray.global_state

* Fix problem in multinode docker test.

* fix runtest.py

* fix some tests

* fix redis shard startup

* fix redis sharding

* fix

* fix bug introduced by the map-iterator being consumed

* fix sharding bug

* shard event table

* update number of Redis clients to be 64K

* Fix object table tests by flushing shards in between unit tests

* Fix local scheduler tests

* Documentation

* Register shard locations in the primary shard

* Add plasma unit tests back to build

* lint

* lint and fix build

* Fix

* Address Robert's comments

* Refactor start_ray_processes to start Redis shard

* lint

* Fix global scheduler python tests

* Fix redis module test

* Fix plasma test

* Fix component failure test

* Fix local scheduler test

* Fix runtest.py

* Fix global scheduler test for python3

* Fix task_table_test_and_update bug, from actor task table submission race

* Fix jenkins tests.

* Retry Redis shard connections

* Fix test cases

* Convert database clients to DBClient struct

* Fix race condition when subscribing to db client table

* Remove unused lines, add APITest for sharded Ray

* Fix

* Fix memory leak

* Suppress ReconstructionTests output

* Suppress output for APITestSharded

* Reissue task table add/update commands if initial command does not publish to any subscribers.

* fix

* Fix linting.

* fix tests

* fix linting

* fix python test

* fix linting
2017-05-18 17:40:41 -07:00
shane
0a4304725f adding -x for clearer output in build console log (#565) 2017-05-18 17:04:56 -07:00
Robert Nishihara
9f91eb8c91 Change API for remote function declaration, actor instantiation, and actor method invocation. (#541)
* Direction substitution of @ray.remote -> @ray.task.

* Changes to make '@ray.task' work.

* Instantiate actors with Class.remote() instead of Class().

* Convert actor instantiation in tests and examples from Class() to Class.remote().

* Change actor method invocation from object.method() to object.method.remote().

* Update tests and examples to invoke actor methods with .remote().

* Fix bugs in jenkins tests.

* Fix example applications.

* Change @ray.task back to @ray.remote.

* Changes to make @ray.actor -> @ray.remote work.

* Direct substitution of @ray.actor -> @ray.remote.

* Fixes.

* Raise exception if @ray.actor decorator is used.

* Simplify ActorMethod class.
2017-05-14 00:01:20 -07:00
Robert Nishihara
1627f89945 Fix problem in which actors and workers running tasks are not killed by driver exit. (#490)
* Augment test to verify that relevant workers and actors are killed during driver cleanup.

* Fix bug in which we were only killing one worker when a driver exited.

* Fix remove driver test.

* Fix and augment test.
2017-04-26 15:13:39 -07:00
Robert Nishihara
0ac125e9b2 Clean up when a driver disconnects. (#462)
* Clean up state when drivers exit.

* Remove unnecessary field in ActorMapEntry struct.

* Have monitor release GPU resources in Redis when driver exits.

* Enable multiple drivers in multi-node tests and test driver cleanup.

* Make redis GPU allocation a redis transaction and small cleanups.

* Fix multi-node test.

* Small cleanups.

* Make global scheduler take node_ip_address so it appears in the right place in the client table.

* Cleanups.

* Fix linting and cleanups in local scheduler.

* Fix removed_driver_test.

* Fix bug related to vector -> list.

* Fix linting.

* Cleanup.

* Fix multi node tests.

* Fix jenkins tests.

* Add another multi node test with many drivers.

* Fix linting.

* Make the actor creation notification a flatbuffer message.

* Revert "Make the actor creation notification a flatbuffer message."

This reverts commit af99099c8084dbf9177fb4e34c0c9b1a12c78f39.

* Add comment explaining flatbuffer problems.
2017-04-24 18:10:21 -07:00
Philipp Moritz
4043769ba2 Make putting large objects work. (#411)
* putting large objects

* add more checks

* support large objects

* fix test

* fix linting

* upgrade to latest arrow version

* check malloc return code

* print mmap file sizes

* printing

* revert to dlmalloc

* add prints

* more prints

* add printing

* printing

* fix

* update

* fix

* update

* print

* initialization

* temp

* fix

* update

* fix linting

* comment out object_store_full tests

* fix test

* fix test

* evict objects if dlmalloc fails

* fix stresstests

* Fix linting.

* Uncomment large-memory tests.

* Increase memory for docker image for jenkins tests.

* Reduce large memory tests.

* Further reduce large memory tests.
2017-04-05 01:04:05 -07:00
Robert Nishihara
ba02fc0eb0 Run flake8 in Travis and make code PEP8 compliant. (#387) 2017-03-21 12:57:54 -07:00
Johann Schleier-Smith
29c8471fd4 Add multinode tests by simulating multiple nodes using Docker. (#378)
* run test workloads for a Docker cluster

* better manage docker image versions

* Changes to make multinode docker tests work with Python 3.

* option to mount local test directory on head node to speed development

* Attempt to simplify multinode test setup.

* Small change.

* Add in development-mode to run multinode docker tests more easily during development.

* add jenkins test script that links to Docker hash

* Read docker SHA from build_docker.sh and add test that should fail.

* Consolidate implementations and remove duplicate files.

* Allow test to retry if it fails to schedule on all nodes.

* Remove sleep when in docker multinode tests.
2017-03-18 23:44:54 -07:00