Commit graph

4 commits

Author SHA1 Message Date
Robert Nishihara
1627f89945 Fix problem in which actors and workers running tasks are not killed by driver exit. (#490)
* Augment test to verify that relevant workers and actors are killed during driver cleanup.

* Fix bug in which we were only killing one worker when a driver exited.

* Fix remove driver test.

* Fix and augment test.
2017-04-26 15:13:39 -07:00
Robert Nishihara
0ac125e9b2 Clean up when a driver disconnects. (#462)
* Clean up state when drivers exit.

* Remove unnecessary field in ActorMapEntry struct.

* Have monitor release GPU resources in Redis when driver exits.

* Enable multiple drivers in multi-node tests and test driver cleanup.

* Make Redis GPU allocation a Redis transaction and make small cleanups.

* Fix multi-node test.

* Small cleanups.

* Make global scheduler take node_ip_address so it appears in the right place in the client table.

* Cleanups.

* Fix linting and cleanups in local scheduler.

* Fix removed_driver_test.

* Fix bug related to vector -> list.

* Fix linting.

* Cleanup.

* Fix multi node tests.

* Fix Jenkins tests.

* Add another multi node test with many drivers.

* Fix linting.

* Make the actor creation notification a flatbuffer message.

* Revert "Make the actor creation notification a flatbuffer message."

This reverts commit af99099c8084dbf9177fb4e34c0c9b1a12c78f39.

* Add comment explaining flatbuffer problems.
2017-04-24 18:10:21 -07:00
Robert Nishihara
ba02fc0eb0 Run flake8 in Travis and make code PEP8 compliant. (#387)
2017-03-21 12:57:54 -07:00
Johann Schleier-Smith
29c8471fd4 Add multinode tests by simulating multiple nodes using Docker. (#378)
* run test workloads for a Docker cluster

* better manage docker image versions

* Changes to make multinode docker tests work with Python 3.

* option to mount local test directory on head node to speed development

* Attempt to simplify multinode test setup.

* Small change.

* Add a development mode to run multinode Docker tests more easily during development.

* add jenkins test script that links to Docker hash

* Read docker SHA from build_docker.sh and add test that should fail.

* Consolidate implementations and remove duplicate files.

* Allow test to retry if it fails to schedule on all nodes.

* Remove sleep in Docker multinode tests.
2017-03-18 23:44:54 -07:00