Commit graph

7847 commits

Author SHA1 Message Date
Stephanie Wang
a86a7a6a98
[core] Cap total memory used by executing tasks' arguments (#15027)
* Task dependency map

* Pinned args threshold

* Unit test and fix

* no leaks

* update

* update

* remove assertion
2021-03-31 15:38:40 -07:00
Edward Oakes
126b9a6c14
[serve] Add basic upscaling test using cluster_utils (#15044) 2021-03-31 17:18:02 -05:00
Alex Wu
70f45af541
Deflake test_failure (#15026) 2021-03-31 14:59:38 -07:00
Edward Oakes
4061b72f2e
[serve] Add serve.get_deployment() API (#14953) 2021-03-31 14:57:39 -05:00
Simon Mo
57256b456a
[serve] Make sure test_imported_backend is run (#15043) 2021-03-31 14:32:45 -05:00
SangBin Cho
79a6aa97b7
[Core] Optimize get core worker Stats (#15008)
* in progress.

* Optimize get core worker stats.

* Fix a segfault.

* Addressed code review.

* Update comments.

* Addressed code review.
2021-03-31 12:21:53 -07:00
Yi Cheng
4480132229
[core] Integration runtime_env with ray client (#14881)
* server side ready

* client side

* py

* fix

* up

* format

* add files

* add pyx

* up

* up

* up

* add keys

* format

* update

* format

* add unittests

* add files

* up

* up

* fix

* up

* fix thread issue

* format

* fix

* update proto

* Fix

* format

* fix

* more

* fix conflict

* fix

* fix order

* format

* add

* up

* compiling fix

* lint

* fix

* format

* fix some

* some fix

* fix comment

* test cases

* add test

* comments

* fix name

* format

* fix

* revert gcs-kv

* fix comments

* fix failure

* fix test

* format

* fix timeout

* fix

* fix

* fix

* format

* format

* fix flaky test

Co-authored-by: Yi Cheng <singye888@gmail.com>
2021-03-31 11:39:34 -07:00
Clark Zinzow
91cf272c2e
[Core] Exit autoscaler with a non-zero exit code upon handling SIGINT/SIGTERM (#14518) 2021-03-31 10:08:02 -07:00
Ian Rodney
32e50b8c67
[Docker] Run docker stop in parallel (#14901)
* first pass at parallel docker stop

* real impl

* use env var variable

* lint fix
2021-03-31 08:41:52 -07:00
Edward Oakes
107effb370
[serve] Add tests for reconnecting to cluster with ray client (#15029) 2021-03-31 10:08:12 -05:00
Edward Oakes
12f5e5ab62
[serve] Small cleanup in HTTP proxy (#15028) 2021-03-31 09:18:11 -05:00
Kai Yang
b0ea947fa3
[Java] Support getCurrentActorId in local mode (#14890) 2021-03-31 21:39:39 +08:00
Kai Yang
6278df8604
[Java] refine generation of jvm options (#14931) 2021-03-31 21:04:52 +08:00
Ian Rodney
73fb5d6022
[Autoscaler][Docker] Make disable_shm_size_detection more usable (#14913) 2021-03-30 18:10:09 -07:00
Siyuan (Ryans) Zhuang
3aa39142db
[Core] Remove code paths that run plasma store as a process (#14924)
* enable plasma store as thread by default

remove unused code path that runs plasma store as a process
2021-03-30 16:19:03 -07:00
Sven Mika
1bb70e4907
[RLlib] Issue 14523: Torch + py3.8 leads to GPU device error. (#15014) 2021-03-30 21:43:11 +02:00
Adam Lee
b643f4fc6d
fix paper link for ICM docs (#14973)
fix broken arXiv https link for ICM (intrinsic curiosity module)
2021-03-30 12:27:34 -07:00
Sven Mika
95686a8fdd
[RLlib] Issue 14533: Tf-eager properly use tree.map_struct on value of type Repeated (RLlib-specific space) (#15015) 2021-03-30 19:28:45 +02:00
Sven Mika
c8ca4d03ad
[RLlib] Issue with agent-id -> pol-id mapping not required to be fixed across different episodes. (#15020) 2021-03-30 19:25:52 +02:00
Raphael CHEN
93d4244d9c
[RLlib] Correctly get bytes size of SampleBatch (#14801) 2021-03-30 19:24:58 +02:00
Michael Luo
b84575c092
[RLlib] 2 RLlib Flaky Tests (#14930) 2021-03-30 19:21:13 +02:00
Eric Liang
b90cc51c27
[RLlib] Attempt splitting rollout test to avoid initial timeout (#14999) 2021-03-30 19:20:02 +02:00
Clark Zinzow
ccb0cdaa35
Revert "skip on windows (#14988)" (#15017)
This reverts commit fe39c88a57.
2021-03-30 11:47:39 -05:00
Edward Oakes
c5e7ed5671
Revert "Add support for Python 3.9 (#12613)" (#15003)
This reverts commit 208cde8d9b.
2021-03-30 08:38:54 -05:00
Travis Addair
e5caaa7d1f
Fixed Dask on Ray for dask>=2021.3.1 which dropped Python 3.6 (#14991)
* Fixed Dask on Ray compatibility with dask==2021.3.1 which drops Python 3.6 support

* Lint
2021-03-29 23:21:58 -07:00
SangBin Cho
4edcaa8870
[Stats] Basic implementation for the periodic asio stats printing support. (#14982)
* Basic implementation for the periodic asio stats printing support.

* hacky way to count grpc stats.

* lint

* Fix an issue.

* Revert the request/reply.
2021-03-29 21:51:16 -07:00
Edward Oakes
3591c0ea2d
Revert "[minor] improve warning message for Ray. #14949" (#15004)
This reverts commit c84073f3f4.
2021-03-29 21:15:22 -07:00
Simon Mo
6b49714c04
[Serve] Add tests for more FastAPI features (#14961) 2021-03-29 17:38:51 -07:00
SangBin Cho
eaf159795b
[Test] Fix memory scheduling flaky test (#14980) 2021-03-29 15:44:26 -07:00
architkulkarni
7e8ae50c53
Treat doc warnings as errors in Makefile to mirror CI linter (#14917) 2021-03-29 15:18:22 -07:00
Richard Liaw
c84073f3f4
[minor] improve warning message for Ray. #14949
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
2021-03-29 15:17:32 -07:00
Akash Patel
208cde8d9b
Add support for Python 3.9 (#12613) 2021-03-29 11:57:06 -07:00
Alex Wu
1f4d4dfeb0
Gcs pull resource reports (#14336) 2021-03-29 11:36:30 -07:00
Sven Mika
4f66309e19
[RLlib] Redo issue 14533 tf enable eager exec (#14984) 2021-03-29 20:07:44 +02:00
Edward Oakes
fe39c88a57
skip on windows (#14988) 2021-03-29 10:06:25 -07:00
Edward Oakes
e79d4cf6f5
[serve] Support setting deployment options via kwargs (#14935) 2021-03-29 11:14:27 -05:00
Sven Mika
e98808ce11
[RLlib] Fix 2 flakey test cases. (#14892) 2021-03-29 17:20:29 +02:00
Amog Kamsetty
95ff342558
[Tune] Wandb API Key File Compatibility with Ray Client (#14942) 2021-03-29 15:39:54 +02:00
dependabot[bot]
68c82b6503
[tune](deps): Bump wandb from 0.10.19 to 0.10.23 in /python/requirements (#14964)
Bumps [wandb](https://github.com/wandb/client) from 0.10.19 to 0.10.23.

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-03-29 15:37:56 +02:00
qicosmos
4c53c6ed1a
[C++ Worker] Improve normal task remote interface (#14978) 2021-03-29 19:55:16 +08:00
Siyuan (Ryans) Zhuang
87c79553e9
[Core] Remove code paths that contain plasma store executable (#14950)
* remove plasma store executable & never used tests

* set default behavior

* fix tests
2021-03-28 21:22:14 -07:00
Micah Yong
b3089b31f2
[RFC] Ray memory improvements: format and summary (#14520)
* Better formatting when terminal size doesn't support tabular

* Summary now displays size of reference types

* Add unit conversion support (e.g. b, kb, mb, gb)

* Format and test

* Add ability to specify the number of sorted entries

* Linting

* Clean up group summary, move import defaultdict, comment num entries counter, n

* Format and lint
2021-03-28 21:03:06 -07:00
DK.Pino
374d166f6d
[JAVA] [Doc] Improve java doc for PG (#14671) 2021-03-29 11:21:20 +08:00
Dmitri Gekhtman
dcf41d868c
[autoscaler][Kubernetes] Fix non_terminated_nodes consistency (#14976)
* Verify pod termination

* deletion-timestamp

* get rid of extra constant
2021-03-28 14:52:12 -07:00
Frank Luan
cdbaf930ab
[metrics] Fix deserialization warnings for metrics.Counter (#14969) 2021-03-28 09:44:30 -05:00
qicosmos
de7ee75d27
[C++ worker] Ray normal task for RAY_REMOTE (#14599) 2021-03-27 09:56:40 +08:00
Edward Oakes
fd4ed3acfe
[serve] Skip failing test_deploy tests on windows (#14957) 2021-03-26 13:51:54 -05:00
Eric Liang
af8a93f2a4
Deflake some RLlib tests (#14947)
* fix

* update

* 100

* flake
2021-03-26 11:45:17 -07:00
SangBin Cho
839cd1e0a2
[Core] Remove unnecessary redis connection (#14511)
* remove unnecessary stuff.

* test in progress.

* Fix tests.

* lint

* fix.

* Remove tests that were not working properly before.
2021-03-26 10:29:12 -07:00
Eric Liang
2157021fd3
Refactor object restoration path (#14821) 2021-03-25 22:46:50 -07:00