Max Fitton
c8f8a8e510
Remove suppress output to see why wheel is not building
2020-12-23 18:34:04 -08:00
Max Fitton
72fc27550c
Update to new form of brew cask install command
2020-12-23 10:39:37 -08:00
Max Fitton
bfc8d1be43
Bump version keys to 1.1.0
2020-12-21 11:38:22 -08:00
Max Fitton
51b4fd77ea
Update rllib regression_tests to install their dependencies via pip
2020-12-21 11:20:44 -08:00
Max Fitton
5f54fa02ec
merge commit bumping latest wheels version.
2020-12-21 10:45:46 -08:00
Max Fitton
f591f6c1c8
[Dashboard][Bugfix] Fix GPU List Bug ( #12666 )
...
* Fix bug where None was passed as the empty value for ActorInfo.gpu_stats instead of an empty list
* lint
* dashboard/modules/logical_view
* fix test
* trigger build
2020-12-20 17:06:49 -08:00
Richard Liaw
d9fc24a7aa
[core] recover startup logs ( #12876 )
2020-12-20 17:02:05 -08:00
architkulkarni
dcef909312
Fix for RLIMIT patch ( #12882 )
...
Implement new soft limit introduced by https://github.com/ray-project/ray/pull/12853 .
2020-12-20 16:58:47 -08:00
Eric Liang
c21d3feccf
Clip RLIMIT_NOFILE increase to avoid redis failing to start on Big Sur
2020-12-20 16:58:10 -08:00
Edward Oakes
40f77101d5
Check for raylet PID as ppid in dashboard agent fate-sharing ( #12867 )
2020-12-20 16:44:20 -08:00
SangBin Cho
ffd7d121ad
[Logging] Use file handle temporarily ( #12839 )
2020-12-20 16:43:11 -08:00
Max Fitton
496e449a8b
Switch long running test cluster yaml to ml image
2020-12-11 15:43:51 -08:00
Max Fitton
9b3863b81b
update long running release tests
2020-12-11 10:22:10 -08:00
Max Fitton
6532e30402
hard code link to release candidate wheel in release tests
2020-12-10 16:17:49 -08:00
Ameer Haj Ali
c7239d7b73
[hotfix][autoscaler] Request resources refactor2 ( #12661 )
...
* prepare for head node
* move command runner interface outside _private
* remove space
* Eric
* flake
* min_workers in multi node type
* fixing edge cases
* eric not idle
* fix target_workers to consider min_workers of node types
* idle timeout
* minor
* minor fix
* test
* lint
* eric v2
* eric 3
* min_workers constraint before bin packing
* Update resource_demand_scheduler.py
* Revert "Update resource_demand_scheduler.py"
This reverts commit 818a63a2c86d8437b3ef21c5035d701c1d1127b5.
* reducing diff
* make get_nodes_to_launch return a dict
* merge
* weird merge fix
* auto fill instance types for AWS
* Alex/Eric
* Update doc/source/cluster/autoscaling.rst
* merge autofill and input from user
* logger.exception
* make the yaml use the default autofill
* docs Eric
* remove test_autoscaler_yaml from windows tests
* lets try changing the test a bit
* return test
* lets see
* edward
* Limit max launch concurrency
* commenting frac TODO
* move to resource demand scheduler
* use STATUS UP TO DATE
* Eric
* make logger of gc freed refs debug instead of info
* add cluster name to docker mount prefix directory
* grrR
* fix tests
* moving docker directory to sdk
* move the import to prevent circular dependency
* small fix
* ian
* fix max launch concurrency bug to assume failing nodes as pending and consider only load_metric's connected nodes as running
* small fix
* request_resources -> min workers
* test fixes
* add race condition tests
* Eric
* fixes
* semi final
* semi final
* lint
* lint
Co-authored-by: Ameer Haj Ali <ameerhajali@ameers-mbp.lan>
Co-authored-by: Alex Wu <alex@anyscale.io>
Co-authored-by: Alex Wu <itswu.alex@gmail.com>
Co-authored-by: Eric Liang <ekhliang@gmail.com>
Co-authored-by: Ameer Haj Ali <ameerhajali@Ameers-MacBook-Pro.local>
2020-12-10 13:50:33 -08:00
Richard Liaw
ee2cdc0906
oops ( #12728 )
...
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
2020-12-10 13:48:51 -08:00
fangfengbin
1305f5d4e5
[GCS]GCS based Actor Scheduling support actor colocation ( #12707 )
...
* [GCS]GCS based Actor Scheduling support actor colocation
* fix review comment
Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>
2020-12-10 13:46:43 -08:00
Philipp Moritz
ec81eca6b0
[TEST] Fix Ray windows build for debugger ( #12671 )
...
* Fix Ray windows build for debugger
* update
2020-12-10 13:44:12 -08:00
Sumanth Ratna
16f7abfb8f
[dashboard] Resolve npm vulnerabilities ( #12620 )
...
* npm audit fix
* npm dedupe
2020-12-10 13:36:41 -08:00
Kai Fricke
986446d15c
[Release] release tests yamls for Tune & GPU ( #12496 )
2020-12-10 13:36:30 -08:00
Ian Rodney
38249ae035
[docker] Use legacy resolver ( #12741 )
2020-12-10 11:42:16 -08:00
Keqiu Hu
f27ceecbf6
[doc] update lint script location ( #12670 )
2020-12-07 22:26:42 -08:00
SangBin Cho
162f361dab
[Logging] Fix log monitor issue ( #12588 )
...
* Try fixing issues.
* Verification.
2020-12-07 22:01:18 -08:00
Max Fitton
cc2f43c826
[Dashboard][Bugfix] Fix bug in display of worker logs and errors in Dashboard ( #12660 )
...
* Fix bug with worker logs/errors not displaying in the dashboard
* Add error endpoint test.
* lint
2020-12-07 21:41:13 -08:00
Max Fitton
34b9c7449b
[Dashboard] Fix object store memory display. ( #12664 )
2020-12-07 21:40:49 -08:00
fangfengbin
93c0eb249c
[PlacementGroup]Support acquire and return bundle resource from gcs resource manager ( #12349 )
2020-12-08 10:29:57 +08:00
SangBin Cho
b1f2b142d5
[Core] Ensure global state is connected when exception hook is called from the driver. ( #12655 )
2020-12-07 18:28:32 -08:00
SangBin Cho
040cf2c13b
[Doc] Placement group doc small update ( #12594 )
...
* Modify doc that wasn't supposed to be merged.
* Addressed code review.
2020-12-07 13:58:27 -08:00
SangBin Cho
3ee4612696
[Release] Fix cluster.yaml ( #12589 )
...
* Fix cluster.yaml
* Updated to use manylinux2014
2020-12-07 13:52:30 -08:00
Sven Mika
340b1e99fc
[RLlib] Fix JAX import bug. ( #12621 )
2020-12-07 11:05:08 -08:00
fangfengbin
7e1422e925
[PlacementGroup]Fix placement group strict spread bug when node dead ( #12647 )
...
* [PlacementGroup]Fix strict spread bug when node dead
* fix ut bug
Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>
2020-12-07 21:50:28 +08:00
Sven Mika
99c81c6795
[RLlib] Attention Net prep PR #3 . ( #12450 )
2020-12-07 13:08:17 +01:00
fangfengbin
401d342602
[PlacementGroup]Add PlacementGroup wait python api ( #12601 )
2020-12-07 13:53:49 +08:00
Philipp Moritz
73a1a232b9
Ray debugger stepping between tasks ( #12075 )
2020-12-06 21:50:18 -08:00
fangfengbin
260b07cf0c
[PlacementGroup]Add PlacementGroup wait java api ( #12499 )
...
* add part code
* add part code
* add part code
* add part code
* fix review comments
* fix compile bug
* fix compile bug
* fix review comments
* fix review comments
* fix code style
* add part code
* fix review comments
* fix review comments
* fix code style
* rebase master
* fix bug
* fix lint error
* fix compile bug
* fix newline issue
Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>
2020-12-05 16:40:04 +08:00
Kai Fricke
1c0d10f67e
[tune] Add xgboost_ray integration ( #12572 )
2020-12-04 13:59:20 -08:00
Kai Fricke
219c445648
[tune] verbosity refactor second attempt ( #12571 )
...
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-12-04 13:56:26 -08:00
Xianyang Liu
7cad648370
[SGD] Fixes TorchTrainer scales up ( #12563 )
2020-12-04 13:55:15 -08:00
Marci
f965537ae9
[tune] Callable accepted for register_env ( #12618 )
2020-12-04 12:21:25 -08:00
SangBin Cho
0138c2dbb4
[Metrics] Remove redundant unit specification. ( #12595 )
2020-12-04 00:06:21 -08:00
Kai Yang
21fcee28f9
[Java] Simplify Ray.init() by invoking ray start internally ( #10762 )
2020-12-04 14:33:45 +08:00
Eric Liang
8cebe1e79c
[autoscaler] Fix worker capping fifo test in new scheduler ( #12512 )
2020-12-03 17:21:35 -08:00
Richard Liaw
515f67034a
[tune] debug py37 build ( #12597 )
2020-12-03 13:47:54 -08:00
Richard Liaw
1ce5e0e99f
[tune] Fix file descriptor leak by syncer ( #12590 )
2020-12-03 13:39:04 -08:00
Eric Liang
36e46ed923
Revert "[autoscaler/k8s] Use ray node's HOME in Kubernetes command runner. ( #12417 )" ( #12607 )
...
This reverts commit f669830de6.
2020-12-03 12:57:59 -08:00
Simon Mo
1f7a4806ff
[Serve] Fix Flask Request self reference ( #12560 )
...
* [Serve] Fix Flask Request self reference
* Working flag
* Fix
2020-12-03 10:45:04 -06:00
Gekho457
f669830de6
[autoscaler/k8s] Use ray node's HOME in Kubernetes command runner. ( #12417 )
2020-12-03 10:43:16 -06:00
Sven Mika
3f4bc16276
[RLlib] Add a minimal JAX ModelV2 (FCNet) to RLlib. ( #12502 )
2020-12-03 15:51:30 +01:00
fangfengbin
ff34563539
[PlacementGroup]Fix bug that kill workers mistakenly when gcs restarts ( #12568 )
2020-12-03 17:50:48 +08:00
Richard Liaw
7c58a85fed
[tune] fix Tensorboard file descriptor leak ( #12425 )
2020-12-03 00:06:54 -08:00