5537 commits

Author SHA1 Message Date
Max Fitton
832f5cdccb
[Dashboard] Memory View Group by Stack Trace and UI Overhaul (#10227) 2020-08-24 14:54:42 -05:00
raoul-khour-ts
c8c4832794
Prevent Local Worker creation from blocking remote worker creation by creating remote workers before local worker (#10245)
* create remote workers before local worker

* reformatted
2020-08-24 12:29:55 -07:00
PidgeyBE
a82124d304
Update memory_monitor.py (#9212) 2020-08-24 10:29:01 -07:00
Eric Liang
4761eacc3e
[autoscaler] Also account for head node resources in multi node type autoscaling (#10230) 2020-08-24 10:26:22 -07:00
Ian Rodney
f051c2852e
[docker] docker cp correctly into container (#10253) 2020-08-24 09:18:34 -07:00
Kai Yang
07f6cb17e4
[Core] Multi-tenancy: Refine worker env variable passing (#10191)
* Resolve issues with environment variable handling

* fix

* fix warning

* lint

Co-authored-by: Mehrdad <noreply@github.com>
2020-08-24 09:04:22 -07:00
SangBin Cho
1f54acd274
[Tech Debt] Use f-string for python/ray/*.py (#10268)
* In progress.

* Done with critical path.

* Modified cluster_utils.py and log_monitor.py

* Addressed code review.
2020-08-23 22:01:31 -07:00
fangfengbin
b61a79efd7
[Placement Group]Fix SigSegv bug (#10262)
* fix SigSegv bug

* fix review comments

* fix ut bug

Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-08-23 11:33:40 -07:00
Richard Liaw
73c4246332
[Core] fix-bad-stack (#10266) 2020-08-23 10:33:29 -07:00
Michael Luo
48a39d7cb9
[RLlib] Deepmind Control Suite Examples (#9751) 2020-08-23 12:53:08 +02:00
Yu Shan
5264f888e4
fix iterable dataset (issue 9899) (#9952) 2020-08-22 19:40:38 -07:00
Maksim Smolin
245c0a9e43
[cli] Tests (#10057)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-08-22 13:29:10 -07:00
fangfengbin
8362029dcf
[Placement Group]Fix CrossLanguageInvocationTest failure (#10257)
* add part code

* rebase master

* add part code

* rebase master

Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-08-22 12:12:00 -07:00
Richard Liaw
6bd5458bef
[tune] cleanup error messaging/diagnose_serialization helper (#10210) 2020-08-22 11:50:49 -07:00
Richard Liaw
24ee496b89
[tune] support rerunning failed trials (#10060) 2020-08-22 09:59:05 -07:00
Ian Rodney
32ed1a18b7
[hotfix] Fix lint in master (#10254) 2020-08-21 20:53:05 -07:00
krfricke
c31876002d
[tune/rllib] made wandb compatible with rllib trainables (#10252) 2020-08-21 17:25:52 -07:00
Richard Liaw
f87669372d
[cli] enable log-new-style by default (#10213) 2020-08-21 15:21:43 -07:00
Alex Wu
136c8ff19e
[NewScheduler] Pass test_basic.py (#10059)
* .

* .

* Cleanup

* .

* whoops

* Update src/ray/raylet/scheduling/cluster_task_manager.h

Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>

* Update src/ray/raylet/scheduling/cluster_task_manager.h

Co-authored-by: Stephanie Wang <swang@cs.berkeley.edu>

* CR

* .

* .

* done

* .

* Unit tests

Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
Co-authored-by: Stephanie Wang <swang@cs.berkeley.edu>
2020-08-21 15:00:08 -07:00
fangfengbin
36c6c4b298
[Placement group] Check if placement group bundle index is valid (#10194)
* add part code

* rebase master

* add java testcase

* fix review comments

* fix lint error

* rebase master

* fix lint error

Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-08-21 11:04:56 -07:00
Max Fitton
17f801dc69
Make get_py_stack return more stack frames (#9512) 2020-08-21 13:02:12 -05:00
Barak Michener
f03caa4532
rpc: Follow-up by sharing the core worker client pool within the core worker. (#10206)
* Share CoreWorkerClientPool

* Format
2020-08-21 11:01:22 -07:00
Sven Mika
e968b52cb7
[RLlib] Trajectory view API - 03 Fast LSTM + prev actions/rewards (#9950) 2020-08-21 12:35:16 +02:00
SangBin Cho
92664249e8
Partially Use f string (#10218)
* flynt. trial 1.

* Trial 1.

* Addressed code review.
2020-08-20 18:21:16 -07:00
architkulkarni
07cd815e5a
[Serve] Type hints for API (#10205) 2020-08-20 15:33:04 -07:00
Stephanie Wang
85e57a7a98
[Object spilling] Look up the location of the primary raylet from the owner's metadata (#10197)
* Get the primary copy from the owner, python test, some node manager fixes

* fixes and todo

* update

* lint

* fix build
2020-08-20 14:46:59 -07:00
Eric Liang
0baf992a4f
[hotfix] [autoscaler] Address remaining comments on renaming instance => node (#10229)
* more renaming

* fix import
2020-08-20 14:37:41 -07:00
Eric Liang
85a6876119
[autoscaler] Rename instance_type => node_type, TAG_RAY_INSTANCE_TYPE => TAG_RAY_USER_NODE_TYPE (#10207) 2020-08-20 12:27:11 -07:00
Amog Kamsetty
8d466749ee
[Tune] PBT hyperparam_mutations fix (#10217) 2020-08-20 12:02:29 -07:00
Simon Mo
6b93ad11d0
[Doc] Add Architecture Doc for Ray Serve (#10204) 2020-08-20 11:40:47 -07:00
fangfengbin
a462ae2747
[Placement Group]Add strict spread strategy (#10174)
* support STRICT_SPREAD strategy

* fix review comments

* rebase master

* fix lint error

* fix lint error

Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-08-20 10:18:58 -07:00
SangBin Cho
224933b5e4
[Placement Group] Remove API part 2 (#10215)
* Initial progress done.

* Fix mistake.

* Addressed code review.

* Fix cpp build issue.

* Addressed code review.
2020-08-20 09:50:13 -07:00
Sven Mika
d14b501692
[RLlib] First attempt at cleaning up algo code in RLlib: PG. (#10115) 2020-08-20 17:05:57 +02:00
Eric Liang
538cb802d5
[autoscaler] Refactor multi node type autoscaler config (#10190) 2020-08-19 20:46:00 -07:00
Richard Liaw
2fd59de05d
[autoscaler] hotfix - swallowed error for missing yaml (#10212) 2020-08-19 20:02:56 -07:00
Amog Kamsetty
9ff687c093
[SGD][Docs] docs for training/ validation results (#10181) 2020-08-19 17:22:28 -07:00
Simon Mo
a785106b47
[Doc] Remove experimental marker for asyncio API (#10202) 2020-08-19 16:52:50 -07:00
Amog Kamsetty
44e254788a
[Tune] PBT hyperparam_mutations improvements (#10170) 2020-08-19 16:50:19 -07:00
Eric Liang
5d265e9bd1
remove osx and linux actions (#10209) 2020-08-19 15:43:03 -07:00
architkulkarni
a3a9421787
added single quotes in pip install 'ray[rllib]' 2020-08-19 15:34:49 -07:00
Raphael Avalos
8b704eb419
Small fix for Cuda Torch DQN. (#10177) 2020-08-19 13:28:05 -07:00
Alex Wu
b70dce0d02
[autoscaler] Hotfix bad None check (#10196) 2020-08-19 13:27:20 -07:00
fangfengbin
9734dbca3e
[Placement Group]Reschedule bundles when the node of bundles is dead (#10021) 2020-08-19 13:24:42 -07:00
Edward Oakes
888f0a2c60
[serve] Use ray.experimental.metrics (#10185) 2020-08-19 13:03:22 -05:00
architkulkarni
de46464aa3
[Experimental] Queue: replace polling with async actor (#10120) 2020-08-19 11:55:42 -05:00
Sven Mika
2cbe29a7fa
[RLlib] Curiosity minor fixes, do-overs, and testing. (#10143) 2020-08-19 17:49:50 +02:00
Max Fitton
9c5e5a9757
[Dashboard] Fix and Recommit Reverted Group by Actor Class PR (#10186)
* Revert "Revert "[Dashboard] Group by Actor Class (#10147)" (#10180)"

This reverts commit e4d2ca620a.

* Fix metrics test to agree with the new logical view API

* lint2

Co-authored-by: Max Fitton <max@semprehealth.com>
2020-08-18 20:55:58 -07:00
Edward Oakes
ba0f531da0
[serve] Remove SLO code and blist dependency (#10075) 2020-08-18 17:52:36 -05:00
SangBin Cho
263df6163c
[Placement Group] Placement group remove api part 1 (#10063)
* Added basic rpc calls.

* fix issues.

* Fix the gcs server not getting request issue.

* In Progress.

* Basic logic done. Tests are required.

* In progress.

* In progress in refactoring context.

* Revert "In progress in refactoring context."

This reverts commit 38236256cf1306c60dd203e75d45ceb4509c8106.

* Working now.

* Python test works.

* Lint.

* Addressed code review.

* Addressed code review.

* Lint.

* Added unit tests.

* Done, but one of unit tests fail

* Addressed code review.

* Addressed the last code review.

* Fix the wrong test case.
2020-08-18 12:44:00 -07:00
Lixin Wei
d188becec2
[Python Worker] Add pid to log file name (#10149)
Co-authored-by: Alex Wu <alex@anyscale.io>
2020-08-18 11:48:48 -07:00