Commit graph

3783 commits

Author SHA1 Message Date
Eric Liang
1a3b83abf8 [direct call] Fix hang when caller id changes for actor task submission (#6338) 2019-12-04 12:01:35 -08:00
Simon Mo
31113aeded Use rayproject repo (#6353) 2019-12-03 22:36:40 -08:00
Stephanie Wang
a82fb5585d [direct task] Remove timeout for resolving futures that were deserialized (#6337)
* Reply GetObjectStatus once the task completes

* Remove timeout-based future resolution

* fix

* Update core_worker.h
2019-12-03 12:04:59 -08:00
Stephanie Wang
d5720779b3 Set the actor ID as the assigned task ID for direct actor workers (#6335)
* Fix

* rename
2019-12-03 10:54:26 -08:00
Kai Yang
d51583dbd6 Add test listener to show the test progress of java UT (#6341) 2019-12-03 16:34:07 +08:00
Eric Liang
bc5e259264 [rllib] Add a doc section on computing actions (#6326)
* options doc

* add note

* hint shr

* doc update
2019-12-03 00:10:50 -08:00
Shital Shah
670cb6374e Doc enhancement: use build.sh for ray, clarification on how rllib selects VisionNetwork, note on setup-dev.py for rllib. (#6092) 2019-12-02 22:19:01 -08:00
Ujval Misra
fa5d62e8ba [tune] Retry restore on timeout (#6284)
* Retry recovery on timeout

* fix bug, revert some code

* Add test for restore time outs.

* Fix lint

* Address comments

* Don't timeout restores.
2019-12-02 20:01:47 -08:00
Richard Liaw
0b3d5d989b [docs] Add public materials (#6331)
* startup

* update tune readme

* usingrah
2019-12-02 19:59:23 -08:00
Simon Mo
216ef8e41a Remove the encrypted docker password. Use web UI. (#6333) 2019-12-02 17:22:59 -08:00
Edward Oakes
d2c66ba795 Don't add assigned tasks to SWAP queue (#6325) 2019-12-02 16:39:02 -08:00
Edward Oakes
dff6017272 Fix "failed to create head node" issue (#6304)
* Fix failed to create head node issue

* comments
2019-12-02 15:22:00 -08:00
Ion
2a3adf2d70 New scheduler integration (#6321) 2019-12-02 14:42:16 -08:00
Mitchell Stern
43d20fff62 Refactor dashboard codebase to improve modularity (#6330)
* Refactor dashboard codebase to improve modularity

* Simplify feature interface

* Use arrow notation in makeFeature argument types

* Use separate components for node and worker features rather than a single conditionally-rendered component

* Add comments about Ray worker process titles

* Add comments to non-obvious fields in node info API response
2019-12-02 11:05:40 -08:00
Stephanie Wang
69dd5c9319 [direct task] Fix bug that starts duplicate connections from the worker to the local raylet (#6307)
* Fix bug and add unit test

* rename
2019-12-02 10:25:05 -08:00
Stephanie Wang
da41180dc0 [direct task] Retry tasks on failure and turn on RAY_FORCE_DIRECT for test_multinode_failures.py (#6306)
* multinode failures direct

* Add number of retries allowed for tasks

* Retry tasks

* Add failing test for object reconstruction

* Handle return status and debug

* update

* Retry task unit test

* update

* update

* todo

* Fix max_retries decorator, fix test

* Fix test that flaked

* lint

* comments
2019-12-02 10:20:57 -08:00
Eric Liang
0b0a16982a [doc] Use .options() (#6323)
* options doc

* add note

* hint shr
2019-12-01 17:24:00 -08:00
mehrdadn
75cc994e0a Update various build options relating to Windows (#6315)
* Update .bazelrc for Windows compatibility

* Block inclusion of (legacy) WinSock.h to avoid errors

* Suppress warnings for Windows code

* Include boost::asio in includes so that it is passed as -isystem to avoid warnings

* Link with -lpthread only on non-Windows

* Undefine BOOST_FALLTHROUGH, which is unnecessary and causes macro redefinition warnings

* Define RAY_STATIC and ARROW_STATIC to compile for Windows

* Add WinSock import library for Arrow
2019-12-01 15:05:50 -08:00
Philipp Moritz
22fa9b564b fix linting (#6322) 2019-12-01 14:06:35 -08:00
mehrdadn
10d49a3f6f Use Boost's socket_holder instead of manually managing the socket (#6314)
* Use Boost's socket_holder instead of manually managing sockets.

Socket types are not ints on Windows, and we need to use wrapper for proper lifetime management regardless.
2019-12-01 13:27:52 -08:00
fangfengbin
7275556365 Reconstruct local dead actors immediately instead of waiting for initial_reconstruction_timeout_ms (#6243) 2019-11-30 18:03:48 +08:00
Simon Mo
4033d65e4f Fix redis-server stoping in linux (#6296)
* Cleanup test_calling_start_ray_head

* Kill redis-server with args instead of comm

In linux, ps -o pid,comm output just redis-server instead of the
full executable path
2019-11-29 22:50:05 -08:00
mehrdadn
e28e464158 Convert io_service_ from reference to smart pointer (#6285) 2019-11-29 16:09:46 -08:00
mehrdadn
b8cfdba752 Bazelify hiredis (#6203) 2019-11-29 15:32:45 -08:00
Yuhao Yang
ffa043d4b7 [tune] replace self.config (#6313) 2019-11-29 11:09:30 -08:00
Stephanie Wang
724a5e3909 Turn on direct calls for test_failure.py (#6291) 2019-11-28 12:28:30 -08:00
Eric Liang
b7b655c851 Also use NotifyDirectCallTaskBlock/Unblocked for plasma store accesses (#6249)
* wip

* fix it

* lint

* wip

* fix

* unblock

* flaky

* use fetch only flag

* Revert "use fetch only flag"

This reverts commit 56e938a0ee2024f5c99c9ab2d55fd35558fb15e1.

* restore error resolution

* use worker task id

* proto comments

* fix if
2019-11-27 22:46:15 -08:00
Eric Liang
e5863d7914 Force tune tests to run in direct call mode (#6301)
* force tune direct mode

* force tune

* fix

* Update run_multi_node_tests.sh
2019-11-27 19:58:33 -08:00
Simon Mo
dd80c6e6d4 Hotfix make docker images building optional (#6309)
* Make docker build optional

* Fix syntax error
2019-11-27 20:52:21 -06:00
Stephanie Wang
31a0b11e16 Revert SubmitTask over grpc, use RayletConnection instead (#6305)
* Revert SubmitTask over grpc

* comment
2019-11-27 19:28:12 -06:00
Simon Mo
22b305223a Build Docker Containers for Linux Wheels (#6233) 2019-11-27 17:05:36 -08:00
Stephanie Wang
2797c11b69 [direct task] For serialized object IDs, check with owner before declaring object unreconstructable (#6286)
* Track borrowed vs owned objects

* Serialize owner address with object ID

* serialize owner task id

* Deserialize object IDs

* Pass direct task ID instead of plasma ID

* it works

* Fix ref count test

* Add unit test

* update warning

* we own ray.put objects

* missing file

* doc

* Fix unit test

* comments

* Fix py2

* lint

* update
2019-11-27 15:31:44 -08:00
Eric Liang
77b5098e7d [rllib] Warn about dict action spaces 2019-11-27 12:57:38 -08:00
Simon Mo
df453c2a2f Remove valgrind block (#6297) 2019-11-26 20:20:01 -08:00
Edward Oakes
e4f9b3b7d9 Use process reaper for cleanup (#6253) 2019-11-26 22:00:08 -06:00
Edward Oakes
8622559e0c Use one queue per resource shape in direct task transport (#6277) 2019-11-26 20:56:05 -06:00
Eric Liang
ddc8855f41 Fix wrap (#6293) 2019-11-26 17:47:47 -08:00
Eric Liang
30b2fc1d81 Fix actor creation hang due to race in SWAP queue (#6280) 2019-11-26 15:21:03 -08:00
mehrdadn
cafdaa346f Update glog (#6287) 2019-11-26 14:54:36 -08:00
mehrdadn
5340e5280a Patch prometheus-cpp internally (#6281) 2019-11-26 14:49:24 -08:00
mehrdadn
82d17888e0 Patch grpc for Windows (#6282) 2019-11-26 14:45:21 -08:00
Simon Mo
1ca8c427e3 Consistent Name for Process Title (#6276)
* Consistent naming for setprotitle

* Address comments

* Add debug/verbose mode

* Fix test
2019-11-26 11:56:28 -08:00
Edward Oakes
141d667cee Fix bash syntax error in test-wheels.sh (#6290) 2019-11-26 13:15:54 -06:00
Robert Nishihara
ffb9c0ecae Fix bug in which remote function redefinition doesn't happen. (#6175) 2019-11-26 11:19:19 -06:00
Edward Oakes
7f8de61441 [hotfix] Remove python/ray/tests/__init__.py (#6279)
* Remove python/ray/tests/__init__.py for bazel

* Comment out checks
2019-11-25 17:04:20 -08:00
Stephanie Wang
f6a0408173 Track pending tasks with TaskManager (#6259)
* TaskStateManager to track and complete pending tasks

* Convert actor transport to use task state manager

* Refactor direct actor transport to use TaskStateManager

* rename

* Unit test

* doc

* IsTaskPending

* Fix?

* Shared ptr

* HUH?

* Update src/ray/core_worker/task_manager.cc

Co-Authored-By: Zhijun Fu <37800433+zhijunfu@users.noreply.github.com>

* Revert "HUH?"

This reverts commit f80f0ba204ff4da5e0b03191fa0d5a4d9f552434.

* Fix memory issue

* oops
2019-11-25 16:37:26 -08:00
mehrdadn
ed5154d7fe Modify RayLogLevel to avoid conflicts with DEBUG macro and ERROR macros that are defined externally (#6204)
* Prevent name collision of ERROR macro from Windows with RayLogLevel::ERROR
2019-11-25 17:02:26 -07:00
mehrdadn
ca08a8f479 Update grpc to version that fixes typo in third_party/py/python_configure.bzl (#6235)
See https://github.com/grpc/grpc/pull/20774
2019-11-25 15:20:33 -07:00
Eric Liang
64a3a7239e Set RAY_FORCE_DIRECT=1 for run_rllib_tests, test_basic (#6171) 2019-11-25 14:12:11 -08:00
Edward Oakes
c9314098b9 Implement direct task worker lease timeouts (#6188) 2019-11-25 14:48:19 -07:00