fangfengbin
ff34563539
[PlacementGroup]Fix bug that kill workers mistakenly when gcs restarts ( #12568 )
2020-12-03 17:50:48 +08:00
Richard Liaw
7c58a85fed
[tune] fix Tensorboard file descriptor leak ( #12425 )
2020-12-03 00:06:54 -08:00
Eric Liang
62fbe63f34
Disable flaky test test_delete_objects_multi_node ( #12584 )
...
* update
* fix
* update
2020-12-02 19:19:12 -08:00
Edward Oakes
8058c1eb54
[serve] Add option to not start HTTP servers ( #11627 )
2020-12-02 16:49:34 -06:00
Max Fitton
a5c846c83b
[Dashboard][Bugfix] Filter dead nodes from Machine View (fixes duplicate node issue) ( #12579 )
2020-12-02 14:08:14 -08:00
Keqiu Hu
2ec7b7367e
[doc] update contributing doc ( #12564 )
...
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-12-02 12:08:30 -08:00
Kaushik B
7422abddb4
[tune] trim kwargs in shim instantiation functions ( #12544 )
2020-12-02 12:07:00 -08:00
Richard Liaw
da42bf29d0
[tune] horovod release test ( #12495 )
2020-12-02 12:04:54 -08:00
Stephanie Wang
443339ab19
[core] Move out-of-memory handling into the plasma store and support async object creation ( #12186 )
...
* Refactor to extract creation request queue
* timer on oom
* move timer out
* Move evict_if_full and on_store_full into plasma store
* Remove client-side code
* revert
* Distinguish between transient and permanent OOM delays
* update
* Move out create request queue, unit test
* unit test
* Fix max retries
* test
* Do not pin restored objects
* First pass to add polling requests, unit test passes
* worker plasma client retries plasma requests
* cleanup
* Clean up after disconnected clients, check memory leaks
* Support immediate requests in request queue
* Option to try creating immediately
* lint
* Fix build, address comments
* doc
* fixes
* debug travis
* debug
* debug
* debug
* debug
* Revert "debug"
This reverts commit 6bf2f6ee5640e71630c4aecdb7ebf54911ea32db.
Revert "debug"
This reverts commit 73017099c9b06cdaae1217bf0e0f4d23ed68a9e5.
Revert "debug"
This reverts commit 5a155529e28cee9461a598b0cdf7b6a3cc194c93.
Revert "debug"
This reverts commit b50c2101afd45d4cf663daae857bfe1b40387703.
Revert "debug travis"
This reverts commit 012b8721dedf9bca46294ae75eee2815b160368b.
* Skip if new scheduler enabled
* error message
* merge
2020-12-02 13:25:54 -05:00
Ian Rodney
786f839ff3
[Windows] Fix windows build ( #12555 )
...
* fix remote watch
* remove const
* unfix remote-watch
* format
2020-12-02 09:37:40 -08:00
Kai Fricke
0a12eba603
Revert "Fix race condition between failure detection and references going out of scope ( #12548 )" ( #12570 )
...
This reverts commit 8801e87a
2020-12-02 10:20:17 -05:00
Richard Liaw
a21523c709
[tune/core] serialization debugging utility ( #12142 )
...
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Co-authored-by: Kai Fricke <kai@anyscale.com>
2020-12-02 00:52:17 -08:00
Kai Fricke
63b85df828
[xgb] update docs ( #12549 )
2020-12-01 23:17:23 -08:00
Simon Mo
e428134137
[Hotfix] Pin llvmlite for windows build ( #12559 )
2020-12-01 19:43:08 -08:00
Siyuan (Ryans) Zhuang
615f974313
Add context for "test_buffer_alignment" ( #12519 )
2020-12-01 19:27:14 -08:00
Stephanie Wang
8801e87afd
Fix race condition between failure detection and references going out of scope ( #12548 )
...
* fix
* lint
2020-12-01 20:52:30 -05:00
Sven Mika
19c8033df2
[RLlib] Fix most remaining RLlib algos for running with trajectory view API. ( #12366 )
...
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* LINT and fixes.
MB-MPO and MAML not working yet.
* wip
* update
* update
* remove
* remove dep
* higher
* Update requirements_rllib.txt
* Update requirements_rllib.txt
* relpos
* no mbmpo
Co-authored-by: Eric Liang <ekhliang@gmail.com>
2020-12-01 17:41:10 -08:00
Richard Liaw
4dc16730a7
[tune] with-params fix ( #12522 )
2020-12-01 16:47:03 -08:00
Simon Mo
7022278ce9
Deflake Serve tests ( #12542 )
2020-12-01 13:42:21 -08:00
Ameer Haj Ali
4288b5b9ff
[placement group] -1 option for placement group index ( #12532 )
...
* -1 option for placement group index
Add documentation of passing -1 as a placement group index to specify any available bundle.
* Update
2020-12-01 13:16:18 -08:00
SangBin Cho
981df65b91
[Doc] Improve the placement group document ( #12507 )
...
* Improve the placement group document.
* Fix grammar.
* Addressed code review.
2020-12-01 13:15:30 -08:00
Barak Michener
6412dfaf38
[ray_client] actors v0 ( #12388 )
2020-12-01 13:12:08 -08:00
SangBin Cho
0e892908f7
[Object Spilling] Delete spilled objects when references are gone out of scope. ( #12341 )
2020-12-01 13:10:39 -08:00
Simon Mo
ef1b0c13c3
Async Future Throws RayError as well ( #12419 )
2020-12-01 13:07:43 -08:00
Richard Liaw
bdf8ad3b5a
fix ( #12528 )
...
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
2020-12-01 09:58:12 -08:00
Simon Mo
f596113fc7
[Core] Actor Retries Out of Order Tasks on Restart ( #12338 )
2020-12-01 09:35:54 -08:00
SangBin Cho
f6f3cc9af1
[Core]Remove checkpoint table ( #12235 )
...
* Delete an actor entry from node manager.
* Remove checkpoint table
* remote checkpoint interface
* remove checkpoint interface
* fix ExitActorTest
Co-authored-by: chaokunyang <shawn.ck.yang@gmail.com>
2020-12-01 08:58:36 -08:00
Sven Mika
9021f15b2a
[RLlib] Fix setup-dev.py error when creating a softlink for new_dashboard. ( #12442 )
2020-12-01 11:46:59 +01:00
Sven Mika
3ad9365e1d
[RLlib] Attention Net prep PR #2 : Smaller cleanups. ( #12449 )
2020-12-01 08:21:45 +01:00
Edward Oakes
e72147de38
Fix Serve typo ( #12524 )
2020-11-30 23:15:42 -08:00
Eric Liang
fd8ae0697b
[autoscaler] Fix test heartbeats single test ( #12513 )
...
* update
* update
* update
2020-11-30 21:24:45 -08:00
Amog Kamsetty
16ca748454
[CI] Use legacy resolver for some pip imports ( #12517 )
2020-11-30 21:18:21 -08:00
Amog Kamsetty
f9a99f20dd
Revert "Re-Revert "[Core] zero-copy serializer for pytorch ( #12344 )" ( #12478 )" ( #12515 )
...
This reverts commit 3f22448834.
2020-11-30 19:05:55 -08:00
SangBin Cho
8223a33bff
[Logging] Log rotation on all components ( #12101 )
...
* In Progress.
* Done.
* Fix the issue.
* Add wait for condition because logs are not written right away now.
* debug string.
* lint.
* Fix flaky test.
* Fix issues.
* Fix test.
* lint.
2020-11-30 19:03:55 -08:00
Max Fitton
2708b3abbc
[Dashboard][Bug] Fix duplicate node total rows in dashboard ( #12410 )
...
* Fix duplicate node total rows in dashboard by changing the react key of the NodeTotalRow component from the node IP to the node ID (node IP can be duplicated in the case of docker).
* simplify a piece of test code and fix a flaky time out
* lint
2020-11-30 18:43:09 -08:00
Ian Rodney
e422ace053
[serve] Create CurrentState & GoalState ( #12369 )
2020-11-30 17:34:30 -08:00
Eric Liang
234df9091e
[autoscaler] Try to improve the request_resources() documentation ( #12465 )
2020-11-30 16:03:30 -08:00
Richard Liaw
9ce7ad17fd
[tune] remove some bottlenecks in trialrunner ( #12476 )
2020-11-30 14:54:25 -08:00
Ian Rodney
f5fe3794c8
[Docker] Uninstall Typing ( #12500 )
2020-11-30 14:12:57 -08:00
Siyuan (Ryans) Zhuang
3f22448834
Re-Revert "[Core] zero-copy serializer for pytorch ( #12344 )" ( #12478 )
...
* [Core] zero-copy serializer for pytorch (#12344 )
* zero-copy serializer for pytorch
* address possible bottleneck
* add tests & device support
(cherry picked from commit 0a505ca83d)
* add environmental variables
* update doc
2020-11-30 11:43:03 -08:00
Sven Mika
bb03e2499b
[RLlib] PyBullet Env native support via env str-specifier (if installed). ( #12209 )
2020-11-30 12:41:24 +01:00
Tao Wang
b85c6abc3e
Rename fields/variables from client id to node id ( #12457 )
2020-11-30 14:33:36 +08:00
SangBin Cho
3964defbe1
[Logging] Fix tensorflow logging issue. ( #12225 )
...
* in progress.
* ip
* In Progress
* done.
* fix lint.
* Addressed code review
* Addressed code review.
2020-11-29 22:16:52 -08:00
SangBin Cho
91d54ef621
[Core] Remove actor arg from executor to allow users to specify actor… ( #12239 )
...
* [Core] Remove actor arg from executor to allow users to specify actor arg in their Actor.remote.
* Addressed code review.
2020-11-29 22:15:48 -08:00
chaokunyang
17a6b9bbe7
Fix not cp jars ( #12456 )
2020-11-30 13:53:09 +08:00
Philipp Moritz
cf73ccddae
Allow more fields for object metadata ( #12484 )
2020-11-29 21:50:18 -08:00
Alex Wu
f1cc33a6a6
Actor resource backlog hotfix ( #12471 )
...
* prepare implemented
* works?
* deflek
* git
* deflek round 2
* .
* improve the test
Co-authored-by: Alex <alex@anyscale.com>
Co-authored-by: Eric Liang <ekhliang@gmail.com>
2020-11-29 20:55:50 -08:00
Sven Mika
fb318addcb
[RLlib] Curiosity exploration module: tf/tf2.x/tf-eager support. ( #11945 )
2020-11-29 12:31:24 +01:00
Micah Yong
a537b852e6
[docs][core] Documentation improvement in master/walkthrough.html ( #12473 )
2020-11-28 20:36:01 -08:00
Amog Kamsetty
8a406e1f9a
[SGD] Add PTL Docs ( #12440 )
...
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-11-28 10:09:38 -08:00