Commit graph

3547 commits

Author SHA1 Message Date
Eric Liang
ddc8855f41
Fix wrap (#6293) 2019-11-26 17:47:47 -08:00
Eric Liang
30b2fc1d81
Fix actor creation hang due to race in SWAP queue (#6280) 2019-11-26 15:21:03 -08:00
mehrdadn
cafdaa346f Update glog (#6287) 2019-11-26 14:54:36 -08:00
mehrdadn
5340e5280a Patch prometheus-cpp internally (#6281) 2019-11-26 14:49:24 -08:00
mehrdadn
82d17888e0 Patch grpc for Windows (#6282) 2019-11-26 14:45:21 -08:00
Simon Mo
1ca8c427e3 Consistent Name for Process Title (#6276)
* Consistent naming for setproctitle

* Address comments

* Add debug/verbose mode

* Fix test
2019-11-26 11:56:28 -08:00
Edward Oakes
141d667cee
Fix bash syntax error in test-wheels.sh (#6290) 2019-11-26 13:15:54 -06:00
Robert Nishihara
ffb9c0ecae Fix bug in which remote function redefinition doesn't happen. (#6175) 2019-11-26 11:19:19 -06:00
Edward Oakes
7f8de61441 [hotfix] Remove python/ray/tests/__init__.py (#6279)
* Remove python/ray/tests/__init__.py for bazel

* Comment out checks
2019-11-25 17:04:20 -08:00
Stephanie Wang
f6a0408173
Track pending tasks with TaskManager (#6259)
* TaskStateManager to track and complete pending tasks

* Convert actor transport to use task state manager

* Refactor direct actor transport to use TaskStateManager

* rename

* Unit test

* doc

* IsTaskPending

* Fix?

* Shared ptr

* HUH?

* Update src/ray/core_worker/task_manager.cc

Co-Authored-By: Zhijun Fu <37800433+zhijunfu@users.noreply.github.com>

* Revert "HUH?"

This reverts commit f80f0ba204ff4da5e0b03191fa0d5a4d9f552434.

* Fix memory issue

* oops
2019-11-25 16:37:26 -08:00
mehrdadn
ed5154d7fe Modify RayLogLevel to avoid conflicts with DEBUG macro and ERROR macros that are defined externally (#6204)
* Prevent name collision of ERROR macro from Windows with RayLogLevel::ERROR
2019-11-25 17:02:26 -07:00
mehrdadn
ca08a8f479 Update grpc to version that fixes typo in third_party/py/python_configure.bzl (#6235)
See https://github.com/grpc/grpc/pull/20774
2019-11-25 15:20:33 -07:00
Eric Liang
64a3a7239e
Set RAY_FORCE_DIRECT=1 for run_rllib_tests, test_basic (#6171) 2019-11-25 14:12:11 -08:00
Edward Oakes
c9314098b9
Implement direct task worker lease timeouts (#6188) 2019-11-25 14:48:19 -07:00
Edward Oakes
e72aef2ba6
[hotfix] Fix building linux wheels 2019-11-25 12:45:31 -07:00
Ameer Haj Ali
71316fa8d0 wrap models with DistributionalQModel when running DQN (#6258)
* wrap models with DistributionalQModel when running DQN

* wrap only for tensorflow models

* Update custom_keras_model.py
2019-11-25 00:11:24 -08:00
Eric Liang
7917bbef78
Set progress report interval for bazel explicitly (#6262)
* set progress interval

* add keep alive

* add keepalive

* remove cat

* smaller time

* squash error

* reduce log spam
2019-11-24 22:37:59 -08:00
Simon Mo
c8b69727cd
ray stop only kills process with ray keyword (#6257)
* Use psutil to kill processes

* Psutil as core requirement

* Revert "Psutil as core requirement"

This reverts commit d3235ce3d994d2bb7db39e3ad4a46049703898bb.

* Revert "Use psutil to kill processes"

This reverts commit de0ed874fed673f5e98715950688f418bbcc415c.

* Revert back to subproc

* Add comments, grep for ray as well

* SIGTERM
2019-11-24 16:32:07 -08:00
Eric Liang
e5b5c98558
Fix python PATH for build (#6260) 2019-11-24 15:32:06 -08:00
Eric Liang
53641f1f74
Move more unit tests to bazel (#6250)
* move more unit tests to bazel

* move to avoid conflict

* fix lint

* fix deps

* separate

* fix failing tests

* show tests

* ignore mismatch

* try combining bazel runs

* build lint

* remove tests from install

* fix test utils

* better config

* split up

* exclusive

* fix verbosity

* fix tests class

* cleanup

* remove flaky

* fix metrics test

* Update .travis.yml

* no retry flaky

* split up actor

* split basic test

* split up trial runner test

* split stress

* fix basic test

* fix tests

* switch to pytest runner for main

* make microbench not fail

* move load code to py3

* test is no longer package

* bazel to end
2019-11-24 11:43:34 -08:00
Simon Mo
aa8d5d2f6c
Rate limit asyncio actor (#6242) 2019-11-24 11:39:28 -08:00
Simon Mo
9f0d005ce6
Use jobs 50 (#6255) 2019-11-24 00:32:38 -08:00
Yuhao Yang
f6a5baf844 [tune] minor doc fix (#6248) 2019-11-23 21:54:41 -08:00
Stephanie Wang
d2662fecea
Miscellaneous bug fixes to throw unreconstructable errors for direct calls (#6245)
* Test cases

* Fix InPlasmaError

* raylet fixes to force errors for direct calls

* Disable lineage logging and task pending checks for direct calls

* move todo

* Clean up tests

* Fix bugs in object store for Contains and Delete

* Use direct call in tests

* Fixes, separate actor creation direct call from normal direct call spec
2019-11-23 15:05:49 -08:00
Stephanie Wang
c4fa3b3afb
fix (#6251) 2019-11-23 15:04:48 -08:00
Eric Liang
ea270495a1
Remove stray change (#6247) 2019-11-23 00:07:45 -08:00
mehrdadn
94d37eee28 Update Boost via our own rule instead of managing our own fork (#6238) 2019-11-22 16:10:47 -08:00
Edward Oakes
ae5abc48a9
Fix race condition in redis_async_context.cc (#6231)
* dispatch callback to backend thread

* tmp: test in loop

* compiling

* Works using shared_ptrs

* Revert "tmp: test in loop"

This reverts commit faf1f8f74b34a99396906f56827d2691472ae7d4.

* Copy into CallbackReply

* fix comment

* warning

* add nil case
2019-11-22 15:51:40 -08:00
Simon Mo
f53f576120
Quiet Wget (#6244) 2019-11-22 14:32:14 -08:00
Eric Liang
b052bcf1fc
Bazelify tune tests in travis (#6219) 2019-11-22 13:58:50 -08:00
Ion
68ac08332b Initial commit of new cluster resource scheduler (#6178) 2019-11-22 11:14:46 -08:00
mehrdadn
05ce789e5b Reorganize ray_deps_setup.bzl to make all the GitHub rules uniform and download ZIP files for everything (#6193)
* Reorganize ray_deps_setup.bzl to make all the GitHub rules uniform

* Rewrite github_repository with explicit keyword-only arguments

Requires Bazel >= 0.29.0: https://github.com/bazelbuild/buildtools/pull/677
2019-11-22 09:59:32 -08:00
Simon Mo
eb6a93c0f0
[hotfix] fix lint (#6236) 2019-11-21 18:30:57 -08:00
Eric Liang
7559fdb141 [rllib/tune] Cache get_preprocessor() calls, default max_failur… (#6211) 2019-11-21 15:55:56 -08:00
Stephanie Wang
d3227f2f2d
Fix bug in direct task calls for objects that were evicted (#6216)
* Fix bug and add some checks

* rename
2019-11-21 15:38:31 -08:00
Stephanie Wang
eb7b73d731
Disconnect direct task workers that died (#6213)
* Disconnect workers that died so that we push the worker died error to redis

* Push error if actor is non nil

* fix test
2019-11-21 15:37:15 -08:00
mehrdadn
ba86c75c21 Patch Cython in grpc to use our COPTS (#6223) 2019-11-21 15:32:48 -08:00
Simon Mo
57e101e648
[CI] Pass cloud cache secrets to linux wheel (#6232) 2019-11-21 14:41:13 -08:00
Simon Mo
29ba6bfc64
Basic Async Actor Call (#6183)
* Start trying to figure out where to put fibers

* Pass is_async flag from python to context

* Just running things in fiber works

* Yield implemented, need some debugging to make it work

* It worked!

* Remove debug prints

* Lint

* Revert the clang-format

* Remove unnecessary log

* Remove unnecessary import

* Add attribution

* Address comment

* Add test

* Missed a merge conflict

* Make test pass and compile

* Address comment

* Rename async -> asyncio

* Move async test to py3 only

* Fix ignore path
2019-11-21 11:56:46 -08:00
Simon Mo
c4132b501b [CI] Add Remote Caching (#6210) 2019-11-21 11:36:36 -08:00
Eric Liang
7f52d019ca
Inline memory_store_provider into memory_store (#6217) 2019-11-21 10:13:53 -08:00
Philipp Moritz
a4437813eb
[Projects] Unify hyphen vs underscore handling for arguments (#6208) 2019-11-20 23:52:41 -08:00
Eric Liang
1f9ab74293
Fix hang on Ray shutdown (#6201) 2019-11-20 23:30:35 -08:00
Eric Liang
425edb5cd9
Support NotifyBlocked/UnBlocked for direct call tasks (#6177) 2019-11-20 22:07:12 -08:00
Stephanie Wang
db77595298
Fix segfault for task arguments passed by value (#6214)
* Fix null data

* rename
2019-11-20 22:02:18 -08:00
mehrdadn
95bf977839 Rename UpdateResource due to conflict with Windows (#6205)
* Rename UpdateResource due to conflict with Windows

* Rename UpdateResource_ to UpdateResourceCapacity
2019-11-20 20:44:13 -08:00
Stephanie Wang
c0be9e6738
Resolve dependencies locally before submitting direct actor tasks (#6191)
* Priority queue in direct actor transport by task number

* Move LocalDependencyResolver out to separate file, share with direct actor transport

* works

* Test case for ordering

* Cleanups

* Remove priority queue

* comment

* Share ClientFactoryFn with direct actor transport

* Unit test

* fix
2019-11-20 16:45:19 -08:00
Philipp Moritz
33c768ebe4
Fix worker signal.SIGTERM handler being installed from outside the main thread (#6176) 2019-11-20 11:14:28 -08:00
Ujval Misra
0010382cc7 [tune] Report failures in a separate table (#6160)
* Report errors in a separate table.

* Single error file.
2019-11-20 10:53:47 -08:00
micafan
e7dbafa000 fix gcs::RedisAsioClient non-thread safe (#5946) 2019-11-20 10:18:35 -08:00