Commit graph

224 commits

Author SHA1 Message Date
Robert Nishihara
c9d70f0dda Remove num_local_schedulers argument from ray.worker._init. (#3704)
* Remove num_local_schedulers argument from ray.worker._init.

* Fix

* Fix tests.
2019-01-07 12:44:49 -08:00
Yuhong Guo
c9b8ecca51 Add RayParams to refactor the parameters used by ray python. (#3558) 2018-12-29 22:04:27 +08:00
Yuhong Guo
b9e1977fae Fix failure of test_free_objects_multi_node (#3481)
It is possible that `test_free_objects_multi_node` would fail sometimes. If we run this test 20 times, we may found at least one failure.

The cause is that the test is based on function tasks. One raylet may create more than one worker to execute the tasks. So flush operations may be separated to several workers and not clean all the worker objects held by the plasma client.

In this PR, I change function task to actor tasks, which guarantee all the tasks are executed in one worker of a raylet.
2018-12-06 15:55:49 -05:00
Si-Yuan
2e6f9bedf2 Add the extra fallback for serialization (#3468)
* Add the extra fallback for serialization.

* Better comments & warnings. quotes.

* Update test/runtest.py

Co-Authored-By: suquark <suquark@gmail.com>

* Update test/runtest.py

Co-Authored-By: suquark <suquark@gmail.com>

* linting

* Don't hijack too much errors.

* simplify the test

* Update runtest.py

* simplify
2018-12-05 13:09:08 -08:00
Robert Nishihara
3856533065 Fix incompatibility with most recent version of Redis. (#3379)
* Fix incompatibility with most recent version of Redis.

* Fix

* Fixes.
2018-11-24 16:36:38 -08:00
Robert Nishihara
5cbc597494 Suppress duplicate pre-emptive object pushes. (#3276)
* Suppress duplicate pre-emptive object pushes.

* Add test.

* Fix linting

* Remove timer and inline recent_pushes_ into local_objects_.

* Improve test.

* Fix

* Fix linting

* Enable retrying pull from same object manager. Randomize object manager.

* Speed up test

* Linting

* Add test.

* Minor

* Lengthen pull timeout and reissue pull every time a new object becomes available.

* Increase pull timeout in test.

* Wait for nodes to start in object manager test.

* Wait longer for nodes to start up in test.

* Small fixes.

* _submit -> _remote

* Change assert to warning.
2018-11-16 23:02:45 -08:00
Robert Nishihara
d10cb570ab Rename _submit -> _remote. (#3321) 2018-11-15 15:30:18 -08:00
Philipp Moritz
1be1455d86 Fix redis crash when duplicate messages are appended to log. (#3316) 2018-11-15 15:09:39 -08:00
Stephanie Wang
d950e92f63
Allow multiple threads to call ray.get and ray.wait (#3244)
* Handle multiple threads calling ray.get

* Multithreaded ray.wait

* Pass in current task ID in java backend

* Add multithreaded actor to tests, add warning messages to worker for multithreaded ray.get

* Fix test

* Some cleanups

* Improve error message

* Add assertion

* Cleanup, throw error in HandleTaskUnblocked if task not actually blocked

* lint

* Fix python worker reset

* Fix references to reconstruct_objects

* Linting

* java lint

* Fix java

* Fix iterator
2018-11-07 22:39:28 -08:00
Robert Nishihara
1dd5d92789 Enable timeline visualizations of object transfers. (#3255)
* Plot object transfers.

* Linting
2018-11-07 12:45:59 -08:00
Eric Liang
725df3a485 Set the process title in workers and actors (#3219) 2018-11-06 14:59:22 -08:00
Eric Liang
9a0f0db070 Add ray stack tool for debugging (#3213) 2018-11-03 13:13:02 -07:00
Wang Qing
ca7d4c2cf5 Enable to specify driver id by user. (#3084) 2018-11-02 19:01:50 -07:00
Robert Nishihara
5822aa2388 Rename get_task -> worker_idle in timeline. (#3179)
* Rename get_task -> worker_idle in timeline.

* Fix test.
2018-11-02 12:08:46 -07:00
Robert Nishihara
1f29a960f4 Update task_table and object_table API. (#3161)
* Update task_table and object_table API.

* Fix
2018-10-31 12:52:50 -07:00
Robert Nishihara
32f0d6b77e Deprecate num_workers argument to ray.init and ray start. (#3114)
* Remove num_workers argument.

* Fix

* Fix
2018-10-28 20:12:49 -07:00
Robert Nishihara
658c14282c Remove legacy Ray code. (#3121)
* Remove legacy Ray code.

* Fix cmake and simplify monitor.

* Fix linting

* Updates

* Fix

* Implement some methods.

* Remove more plasma manager references.

* Fix

* Linting

* Fix

* Fix

* Make sure class IDs are strings.

* Some path fixes

* Fix

* Path fixes and update arrow

* Fixes.

* linting

* Fixes

* Java fixes

* Some java fixes

* TaskLanguage -> Language

* Minor

* Fix python test and remove unused method signature.

* Fix java tests

* Fix jenkins tests

* Remove commented out code.
2018-10-26 13:36:58 -07:00
Robert Nishihara
9c1826ed69 Use XRay backend by default. (#3020)
* Use XRay backend by default.

* Remove irrelevant valgrind tests.

* Fix

* Move tests around.

* Fix

* Fix test

* Fix test.

* String/unicode fix.

* Fix test

* Fix unicode issue.

* Minor changes

* Fix bug in test_global_state.py.

* Fix test.

* Linting

* Try arrow change and other object manager changes.

* Use newer plasma client API

* Small updates

* Revert plasma client api change.

* Update

* Update arrow and allow SendObjectHeaders to fail.

* Update arrow

* Update python/ray/experimental/state.py

Co-Authored-By: robertnishihara <robertnishihara@gmail.com>

* Address comments.
2018-10-23 12:46:39 -07:00
Robert Nishihara
22dd7e0428 Add test for wait reconstruction. (#3110) 2018-10-22 23:16:54 -07:00
Robert Nishihara
ed6289771a Convert runtest.py to use pytest. (#2966)
* Convert runtest.py to use pytest.

* Linting.

* Fix

* Fix

* Fix

* Fix
2018-09-30 07:59:44 -07:00
Hanwei Jin
dc76e51a60 bugfix: cmake copy plasma java lib from lib64 directory in centos (#2885) 2018-09-16 22:32:09 -07:00
Robert Nishihara
f16d33593b Mark worker as blocked and trigger reconstruction in ray.wait. (#2864)
* Trigger reconstruction in ray.wait and mark worker as blocked.

* Add test.

* Linting.

* Don't run new test with legacy Ray.

* Only call HandleClientUnblocked if it actually blocked in ray.wait.

* Reduce time to ray.wait in the test.
2018-09-13 15:28:17 -07:00
Robert Nishihara
3f6ed537a4 Add ray.is_initialized() function. (#2818)
* Add ray.is_initialized() function.

* Add assert.
2018-09-06 21:20:59 -07:00
Yucong He
5b45f0bdff [xray] Implementing Gcs sharding (#2409)
Basically a re-implementation of #2281, with modifications of #2298 (A fix of #2334, for rebasing issues.).
[+] Implement sharding for gcs tables.
[+] Keep ClientTable and ErrorTable managed by the primary_shard. TaskTable is managed by the primary_shard for now, until a good hashing for tasks is implemented.
[+] Move AsyncGcsClient's initialization into Connect function.
[-] Move GetRedisShard and bool sharding from RedisContext's connect into AsyncGcsClient. This may make the interface cleaner.
2018-08-31 15:54:30 -07:00
Robert Nishihara
eda6ebb87d Convert some unittests to pytest. (#2779)
* Convert multi_node_test.py to pytest.

* Convert array_test.py to pytest.

* Convert failure_test.py to pytest.

* Convert microbenchmarks to pytest.

* Convert component_failures_test.py to pytest and some minor quotes changes.

* Convert tensorflow_test.py to pytest.

* Convert actor_test.py to pytest.

* Fix.

* Fix
2018-08-31 11:24:15 -07:00
Robert Nishihara
32f7d6fcf5 Add back some tests for xray. (#2772) 2018-08-30 11:07:23 -07:00
Robert Nishihara
b7722897b4 Deprecate 'driver_mode' argument. (#2758)
* Deprecate 'driver_mode' argument.

* Fix

* Fix
2018-08-28 16:45:49 -07:00
Alexey Tumanov
de047daea7 [xray] raylet scheduling mechanism with a simple spillback policy (#2749)
## What do these changes do?
* distribute load and resource information on a heartbeat
* for each raylet, maintain total and available resource capacity as well as measure of current load
* this PR introduces a new notion of load, defined as a sum of all resource demand induced by queued ready tasks on the local raylet. This provides a heterogeneity-aware measure of load that supersedes legacy Ray's task count as a proxy for load.
* modify the scheduling policy to perform *capacity-based*, *load-aware*, *optimistically concurrent* resource allocation
* perform task spillover to the heartbeating node in response to a heartbeat, implementing  heterogeneity-aware late-binding/work-stealing.
2018-08-28 00:03:34 -07:00
Yuhong Guo
eeb15771ba Add ray.internal.free (#2542) 2018-08-14 22:01:23 -07:00
Philipp Moritz
d8ba667175 Convert asserts in unittest to pytest (#2529) 2018-08-01 22:32:10 -07:00
Robert Nishihara
909d7172b1 Introduce constant for ID_SIZE in python code. (#2517) 2018-07-31 12:40:53 -07:00
Philipp Moritz
696a229ece Fix text verbosity in python 2.7 by running tests with pytest (#2470) 2018-07-30 11:04:06 -07:00
Hao Chen
05f485e274 Allow Ray API to be used from multiple threads (#2422) 2018-07-20 15:39:01 -07:00
Robert Nishihara
515da7721a Change ray.worker.cleanup -> ray.shutdown and improve API documentation. (#2374)
* Change ray.worker.cleanup -> ray.shutdown and improve API documentation.

* Deprecate ray.worker.cleanup() gracefully.

* Fix linting
2018-07-12 12:00:00 -07:00
Robert Nishihara
b90e551b41 [xray] Implement timeline and profiling API. (#2306)
* Add profile table and store profiling information there.

* Code for dumping timeline.

* Improve color scheme.

* Push timeline events on driver only for raylet.

* Improvements to profiling and timeline visualization

* Some linting

* Small fix.

* Linting

* Propagate node IP address through profiling events.

* Fix test.

* object_id.hex() should return byte string in python 2.

* Include gcs.fbs in node_manager.fbs.

* Remove flatbuffer definition duplication.

* Decode to unicode in Python 3 and bytes in Python 2.

* Minor

* Submit profile events in a batch. Revert some CMake changes.

* Fix

* Workaround test failure.

* Fix linting

* Linting

* Don't return anything from chrome_tracing_dump when filename is provided.

* Remove some redundancy from profile table.

* Linting

* Move TODOs out of docstring.

* Minor
2018-07-04 23:23:48 -07:00
Robert Nishihara
800f7cc77d Make actor handles work in Python mode. (#2283)
* Make actor handles work in local mode.

* Add test for actor handles in local mode.
2018-06-20 23:02:41 -07:00
Robert Nishihara
ff2217251f [xray] Add error table and push error messages to driver through node manager. (#2256)
* Fix documentation indentation.

* Add error table to GCS and push error messages through node manager.

* Add type to error data.

* Linting

* Fix failure_test bug.

* Linting.

* Enable one more test.

* Attempt to fix doc building.

* Restructuring

* Fixes

* More fixes.

* Move current_time_ms function into util.h.
2018-06-20 21:29:28 -07:00
Robert Nishihara
61139e1509 Enable fractional resources and resource IDs for xray. (#2187)
* Implement GPU IDs and fractional resources.

* Add documentation and python exceptions.

* Fix signed/unsigned comparison.

* Fix linting.

* Fixes from rebase.

* Re-enable tests that use ray.wait.

* Don't kill the raylet if an infeasible task is submitted.

* Ignore tests that require better load balancing.

* Linting

* Ignore array test.

* Ignore stress test reconstructions tests.

* Don't kill node manager if remote node manager disconnects.

* Ignore more stress tests.

* Naming changes

* Remove outdated todo

* Small fix

* Re-enable test.

* Linting

* Fix resource bookkeeping for blocked tasks.

* Fix linting

* Fix Java client.

* Ignore test

* Ignore put error tests
2018-06-10 15:31:43 -07:00
Melih Elibol
7246ff80a4
[xray] Implements ray.wait (#2162)
Implements ray.wait for xray. Fixes #1128.
2018-06-06 16:56:44 -07:00
Kunal Gosar
317d0da7d8 Add experimental API for ray.get and ray.wait with additional argument types (#2071) 2018-06-01 16:42:27 -07:00
Robert Nishihara
6172f94c04 Implement Python global state API for xray. (#2125)
* Implement global state API for xray.

* Fix object table.

* Fixes for log structure.

* Implement cluster_resources.

* Add driver task to task table.

* Remove python flatbuffers code

* Get some global state API tests running.

* Python linting.

* Fix linting.

* Fix mock modules for doc

* Copy over flatbuffer bindings.

* Fix for tests.

* Linting

* Fix monitor crash.
2018-05-29 16:25:54 -07:00
Zongheng Yang
fa97acbc89 Integrate credis with Ray & route task table entries into credis. (#1841) 2018-05-24 23:35:25 -07:00
Alok Singh
f795173b51 Use flake8-comprehensions (#1976)
* Add flake8 to Travis

* Add flake8-comprehensions

[flake8 plugin](https://github.com/adamchainz/flake8-comprehensions) that
checks for useless constructions.

* Use generators instead of lists where appropriate

A lot of the builtins can take in generators instead of lists.

This commit applies `flake8-comprehensions` to find them.

* Fix lint error

* Fix some string formatting

The rest can be fixed in another PR

* Fix compound literals syntax

This should probably be merged after #1963.

* dict() -> {}

* Use dict literal syntax

dict(...) -> {...}

* Rewrite nested dicts

* Fix hanging indent

* Add missing import

* Add missing quote

* fmt

* Add missing whitespace

* rm duplicate pip install

This is already installed in another file.

* Fix indent

* move `merge_dicts` into utils

* Bring up to date with `master`

* Add automatic syntax upgrade

* rm pyupgrade

In case users want to still use it on their own, the upgrade-syn.sh script was
left in the `.travis` dir.
2018-05-20 16:15:06 -07:00
Alok Singh
9a8f29e571 YAPF, take 3 (#2098)
* Use pep8 style

The original style file is actually just pep8 style, but with everything
spelled out. It's easier to use the `based_on_style` feature. Any overrides are
clearer that way.

* Improve yapf script

1. Do formatting in parallel
2. Lint RLlib
3. Use .style.yapf file

* Pull out expressions into variables

* Don't format rllib

* Don't allow splits in dicts

* Apply yapf

* Disallow single line if-statements

* Use arithmetic comparison

* Simplify checking for changed files

* Pull out expr into var
2018-05-19 16:07:28 -07:00
Robert Nishihara
78e4b021ab Functions for flushing done tasks and evicted objects. (#2033) 2018-05-18 01:59:58 -07:00
Adam Gleave
470887c2ad Support calling positional arguments by keyword (fix #998) (#2081) 2018-05-17 16:10:26 -07:00
Melih Elibol
bea97b425b Fix python linting (#2076) 2018-05-16 15:04:31 -07:00
Robert Nishihara
570c3153cd Some tests for _submit API. (#2062) 2018-05-16 00:26:25 -07:00
Robert Nishihara
8fbb88485b Create RemoteFunction class, remove FunctionProperties, simplify worker Python code. (#2052)
* Cleaning up worker and actor code. Create remote function class. Remove FunctionProperties object.

* Remove register_actor_signatures function.

* Small cleanups.

* Fix linting.

* Support @ray.method syntax for actor methods.

* Fix pickling bug.

* Fix linting.

* Shorten testBlockingTasks.

* Small fixes.

* Call get_global_worker().
2018-05-14 14:35:23 -07:00
Robert Nishihara
52b0f3734a [xray] Add Travis build for testing xray on Linux. (#2047)
* Run xray tests in travis.

* Comment out TaskTests.testSubmittingManyTasks.

* Comment out failing tests.

* Comment out hanging test.

* Linting

* Comment out failing test.

* Comment out failing test.

* Ignore test_dataframe.py for now.

* Comment out testDriverExitingQuickly.
2018-05-13 21:22:01 -07:00