5137 commits

Author SHA1 Message Date
Siyuan (Ryans) Zhuang
d61d92afc7
Cleanup Plasma Store (hash utilities) (#9524) 2020-07-16 14:52:14 -07:00
Michael Luo
94fcd43593
[rllib] MAML Transform (#9463)
* MAML Transform

* Moved Inner Adapt to Method in Execution Plan
2020-07-16 11:11:33 -07:00
Stephanie Wang
baf4be245d
Fix flaky test_actor_failures::test_actor_restart (#9509)
* Fix flaky test

* os exit
2020-07-16 10:48:33 -07:00
Ameer Haj Ali
1e46d4e29f
[Autoscaler] Making bootstrap config part of the node provider interface (#9443)
* supporting custom bootstrap config for external node providers

* bootstrap config

* renamed config to cluster_config

* lint

* remove 2 args from importer

* complete move of bootstrap to node_provider

* renamed provider_cls

* move imports outside functions

* lint

* Update python/ray/autoscaler/node_provider.py

Co-authored-by: Eric Liang <ekhliang@gmail.com>

* final fixes

* keeping lines to reduce diff

* lint

* lamba config

* filling in -> adding for lint

Co-authored-by: Ameer Haj Ali <ameerhajali@Ameers-MacBook-Pro.local>
Co-authored-by: Eric Liang <ekhliang@gmail.com>
2020-07-16 09:54:20 -07:00
SangBin Cho
63e052a5f3
Fix. (#9464) 2020-07-16 11:51:32 -05:00
SangBin Cho
41ad5de1c4
Fix broken test_raylet_info_endpoint (#9511) 2020-07-16 11:51:06 -05:00
mehrdadn
ac39e23145
Get rid of build shell scripts and move them to Python (#6082) 2020-07-16 11:26:47 -05:00
Sven Mika
935d8308fb
[RLlib] Issue #9437 (PyTorch converts to CPU tensor, even if on GPU). (#9497) 2020-07-16 14:55:50 +02:00
SangBin Cho
2f674728a6
[GCS Actor Management] Gcs actor management broken detached actor (#9473) 2020-07-16 15:41:18 +08:00
mehrdadn
06ed2313e2
Fix clang-cl build (#9494)
Co-authored-by: Mehrdad <noreply@github.com>
2020-07-15 22:17:11 -07:00
chaokunyang
9318e76b81
[Java] Named java actor (#9037) 2020-07-16 11:31:18 +08:00
kisuke95
5e2571e214
release gil in global state accessor (#9357) 2020-07-16 11:21:10 +08:00
Stephanie Wang
4e81804cba
[core] Replace task resubmission in raylet with ownership protocol (#9394)
* Add intended worker ID to GetObjectStatus, tests

* Remove TaskID owner_id

* lint

* Add owner address to task args

* Make TaskArg a virtual class, remove multi args

* Set owner address for task args

* merge

* Fix tests

* Add ObjectRefs to task dependency manager, pass from task spec args

* tmp

* tmp

* Fix

* Add ownership info for task arguments

* Convert WaitForDirectActorCallArgs

* lint

* build

* update

* build

* java

* Move code

* build

* Revert "Fix Google log directory again (#9063)"

This reverts commit 275da2e400.

* Fix free

* Regression tests - shorten timeouts in reconstruction unit tests

* Remove timeout for non-actor tasks

* Modify tests using ray.internal.free

* Clean up future resolution code

* Raylet polls the owner

* todo

* comment

* Update src/ray/core_worker/core_worker.cc

Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>

* Drop stale actor table notifications

* Fix bug where actor restart hangs

* Revert buggy code for duplicate tasks

* build

* Fix errors for lru_evict and internal.free

* Revert "Drop stale actor table notifications"

This reverts commit 193c5d20e5577befd43f166e16c972e2f9247c91.

* Revert "build"

This reverts commit 5644edbac906ff6ef98feb40b6f62c9e63698c29.

* Fix free test

* Fixes for freed objects

Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
2020-07-15 14:55:51 -07:00
krfricke
5a40299d42
[tune] extend PTL template (GPU, typing fixes, tensorboard) (#9451)
Co-authored-by: Kai Fricke <kai@anyscale.com>
2020-07-15 10:30:20 -07:00
mehrdadn
aa8928fac2
Make more tests compatible with Windows (#9303) 2020-07-15 11:34:33 -05:00
mehrdadn
ad83337f46
Make pip install verbose (#9496)
Co-authored-by: Mehrdad <noreply@github.com>
2020-07-15 18:02:42 +02:00
Kai Yang
005ea1e125
Add job configs to gcs (#9374) 2020-07-15 15:18:48 +08:00
mehrdadn
33e400998c
Fix name clash on Windows (#9412)
Co-authored-by: Mehrdad <noreply@github.com>
2020-07-14 23:14:53 -07:00
Stephanie Wang
6d99aa34a5
[core] Handle out-of-order actor table notifications (#9449)
* Drop stale actor table notifications

* build

* Add num_restarts to disconnect handler

* Unit test and increment num_restarts on ALIVE, not RESTARTING

* Wait for pid to exit
2020-07-14 22:55:04 -07:00
chaokunyang
ccc1133a7a
[Java] fix redis-server binary path (#9398) 2020-07-15 10:47:16 +08:00
Edward Oakes
7eafe646a9
Fix flaky test_object_manager.py (#9472) 2020-07-14 18:44:48 -05:00
mehrdadn
ca4f3b79db
Speedups for GitHub Actions (#9343)
Co-authored-by: Mehrdad <noreply@github.com>
2020-07-14 14:51:51 -07:00
Zhuohan Li
003518619f
[Core] remove create_and_seal and create_and_seal_batch (#9457) 2020-07-14 14:37:13 -07:00
Ian Rodney
ed6157c257
[docker] Include base-deps image in rayproject Docker Hub (#9458) 2020-07-14 14:22:29 -07:00
Michael Mui
e93cde8c66
[tune] Issue 8821: ExperimentAnalysis doesn't expand user (#9461) 2020-07-14 13:53:37 -07:00
Siyuan (Ryans) Zhuang
1c992661a8
Add scripts symlink back (#9219) (#9475)
(cherry picked from commit 77933c922d)

Co-authored-by: Simon Mo <xmo@berkeley.edu>
2020-07-14 12:31:49 -07:00
SangBin Cho
539c51a003
[Core] Support GCS server port assignment. (#8962) 2020-07-14 11:49:56 -05:00
SangBin Cho
f6eb47fc1f
[Stats] metrics agent exporter (#9361) 2020-07-14 11:49:16 -05:00
Siyuan (Ryans) Zhuang
5b192842b5
Fix ObjectRef and ActorHandle serialization (#9462) 2020-07-14 09:42:32 -07:00
Siyuan (Ryans) Zhuang
d57ff5e2af
Remove legacy C++ code (#9459) 2020-07-14 00:57:42 -07:00
krfricke
deba082cb4
[tune] PyTorch CIFAR10 example (#9338)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
Co-authored-by: Kai Fricke <kai@anyscale.com>
2020-07-13 23:16:05 -07:00
kisuke95
276fe109c5
change error code name of boost timer (#9417) 2020-07-14 11:50:58 +08:00
fangfengbin
3c90f960fb
Fix gcs_pubsub_test bug(#9438)
Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-07-14 11:34:50 +08:00
Sven Mika
617eb8f279
[RLlib] Issue 9402 MARWIL producing nan rewards. (#9429) 2020-07-14 05:07:16 +02:00
Sven Mika
03ab86567f
[RLlib] Layout of Trajectory View API (new class: Trajectory; not used yet). (#9269) 2020-07-14 04:27:49 +02:00
Max Fitton
222635b63f
Machine View Sorting / Grouping (#9214)
* Convert NodeInfo.tsx to a functional component

* Update NodeRowGroup to be a functional component

* lint

* Convert TotalRow to functional component.

* lint

* move node info over to using the sortable table head component. spacing is still a little wonky.

* Factor a NoewWorkerRow class out of NodeRowGroup that will be usable when grouping / ungrouping

* Compilation checkpoint, I factored the worker filtering logic out of node info into the reducer

* Add sort accessors for CPU

* Add sort accessors for Disk

* Add sort accessors for RAM

* add a table sort util for function based accessors (rather than flat attribute-based accessor)

* wip refactor node info features

* wip

* Rendering Checkpoint. I've refactored the features and how they are called to add sorting support. Also reworks the way error counts and log counts are passed to the front-end to remove some ugly logic

* wip

* wip

* wip

* Finish adding sorting and grouping of machine view

* lint

* fix bug in filtration of logs and errors by worker from recent refactor.

* Add export of Cluster Disk feature

* fix some merge issues

Co-authored-by: Max Fitton <max@semprehealth.com>
2020-07-13 20:45:17 -05:00
SangBin Cho
22b2e51152
Fix test-multi-node (#9453) 2020-07-13 20:44:27 -05:00
Richard Liaw
a567f7977c
[tune] Put examples under proper version control (#9427)
Co-authored-by: krfricke <krfricke@users.noreply.github.com>
2020-07-13 18:01:10 -07:00
Richard Liaw
7abf7a0109
[docs] Render ActorPool documentation, etc (#9433) 2020-07-13 17:59:22 -07:00
Vasily Litvinov
6ad13e0da8
Add ability to specify SOCKS proxy for SSH connections (#8833) 2020-07-13 16:10:07 -07:00
Richard Liaw
dfe3ebe4a2
[tune] sklearn comment out (#9454) 2020-07-13 16:06:44 -07:00
Siyuan (Ryans) Zhuang
4da97a7c99
[Core] Build raylet client as an independent component (#9434) 2020-07-13 16:00:32 -07:00
mehrdadn
3d65682e62
Bazel selects compiler flags based on compiler (#9313)
Co-authored-by: Mehrdad <noreply@github.com>
2020-07-13 15:31:46 -07:00
Henk Tillman
c7714ca575
GCP authentication using oauth tokens (#9279) 2020-07-13 14:36:40 -07:00
Ian Rodney
0085cf75d0
Allow --lru-evict to be passed into ray start (#8959) 2020-07-13 14:09:39 -07:00
Amog Kamsetty
4454d05bcf
[Tune] Trainable documentation fix (#9448) 2020-07-13 13:15:01 -07:00
Hao Chen
e6225bdfa1
[GCS] Fix the bug about raylet receiving duplicate actor creation tasks (#9422) 2020-07-13 11:34:02 -07:00
mehrdadn
5291bf235b
TRAVIS_PULL_REQUEST is false for non-PRs, not empty (#9439)
Co-authored-by: Mehrdad <noreply@github.com>
2020-07-13 14:52:40 +02:00
Nicolaus93
b5a6c57295
[tune] handling nan values (#9381) 2020-07-12 17:08:36 -07:00
Tanay Wakhare
15aa08a3d1
[RLLib] WindowStat bug fix (#9213)
* WindowStat error catching, which processes NaNs properly instead of erroring. This ought to resolve issue #7910.
https://github.com/ray-project/ray/issues/7910
2020-07-12 23:01:32 +02:00