Commit graph

10898 commits

Author SHA1 Message Date
Qing Wang
8d2f53e25b
[Java] Add dependency reduced pom file to gitignore. (#21282) 2021-12-29 21:49:06 +08:00
mwtian
5377832383
[GCS][Bootstrap 1/n] Support bootstrapping with GCS in node.py (#21267) 2021-12-28 08:14:38 -07:00
Qing Wang
663e14b232
[Java] Fix namespace test case. (#21280)
Since Java now supports actor lifetime, the test should set DETACHED for its detached actors.
2021-12-28 22:31:51 +08:00
WanXing Wang
e5920dee8e
[Core]Refine StealTasks rpc. (#21258)
The `StealTasks` RPC seems no different from other common RPC methods, so it should be implemented with the `VOID_RPC_CLIENT_METHOD` macro. We found this while merging code into our internal codebase.
2021-12-28 14:17:25 +08:00
Philipp Moritz
4b9e865fd7
Unskip remaining tests in test_basic.py on Windows (#21273) 2021-12-27 21:20:45 -08:00
SangBin Cho
b5b11b2d06
[Nightly Test] Add a team column to each test config. (#21198)
Please review **e2e.py and test_suite belonging to your team**! 

This is the first part of https://docs.google.com/document/d/16IrwerYi2oJugnRf5hvzukgpJ6FAVEpB6stH_CiNMjY/edit#

This PR adds a team name to each test suite.

If the name is not specified, it will be reported as unspecified. 

If you are running a local test, and if the new test suite doesn't have a team name specified, it will raise an exception (in this way, we can avoid missing team names in the future).

Note that we will aggregate all test configs into a single file, nightly_test.yaml.
2021-12-27 14:42:41 -08:00
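The team-name rule described in #21198 above (report `unspecified` when the name is missing, but raise when running locally) can be sketched as follows. This is a minimal illustration, not the code from e2e.py: the helper `resolve_team`, the `team` config key, and the suite names are all hypothetical.

```python
from typing import Any, Dict

# Label used when no team is given; the word comes from the commit message.
UNSPECIFIED_TEAM = "unspecified"


def resolve_team(test_suite: Dict[str, Any], local: bool) -> str:
    """Return the team for a test suite config (hypothetical helper)."""
    team = test_suite.get("team")
    if team:
        return team
    if local:
        # Fail fast in local runs so new suites cannot omit the team name.
        raise ValueError("Test suite has no team name; please add one.")
    return UNSPECIFIED_TEAM


# Hypothetical suite configs, for illustration only.
print(resolve_team({"name": "suite_a", "team": "core"}, local=True))
print(resolve_team({"name": "suite_b"}, local=False))
```

Raising only in local runs keeps existing reporting working while stopping new suites from being added without an owner.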
Matti Picus
3de18d2ada
WINDOWS: enable passing/skipping tests (#21136) 2021-12-27 11:59:00 -08:00
Israël Hallé
59209d695b
Includes .pyi files in package data. (#21247) 2021-12-27 11:50:02 -08:00
Philipp Moritz
583744ab57
Graduate Ray on Windows from experimental to beta (#21268) 2021-12-27 00:19:48 -08:00
Matti Picus
fcb952e1bc
WINDOWS: unskip passing runtime_env tests (#21252) 2021-12-26 20:49:02 -08:00
Akash Patel
cbcd03b779
Upgrade cython to 0.29.26 for py310 (#21244) 2021-12-26 20:26:08 -08:00
xwjiang2010
0b9cdb1eae
[tune] Have one canonical way of stopping trial. (#21021)
This PR is introducing a canonical impl for stopping trials by collecting scattered logic from process_trial_result back into stop_trial. This way, we know what is expected (e.g. what callbacks are invoked and when they are invoked).
This PR also corrects the current incorrect behavior in which the on_trial_complete callback is invoked before on_trial_checkpoint, which is the source of Syncer cleanup issues.
2021-12-25 10:13:30 +01:00
Gagandeep Singh
c5c5fec22b
Unskip test_standalone from ci.sh (#21235) 2021-12-25 00:21:58 -08:00
Yi Cheng
0d537c5d70
[5/gcs] Bootstrap default worker and update pubsub unit test (#21211)
This PR passes the GCS address to the worker and also updates the pubsub unit test.

Co-authored-by: mwtian <81660174+mwtian@users.noreply.github.com>
Co-authored-by: Mingwei Tian <mwtian@anyscale.com>
2021-12-23 07:57:14 -07:00
Qing Wang
2df27a5f87
[Java] Support ActorLifetime (#21074)
We add an enum class ActorLifetime to indicate the lifetime of an actor. This PR also adds the API needed to create an actor with a specified lifetime.
Currently, it has 2 values: detached and default.
2021-12-23 19:48:56 +08:00
Qing Wang
e653d47533
[Java] Shade some widely used dependencies in bazel_jar_jar rule. (#21237)
These dependencies are widely used:
- com.google.common
- com.google.protobuf
- com.google.thirdparty

So we need to shade them to avoid conflicts with jars introduced by users.

In this PR, we introduce a `bazel_jar_jar` rule to do this and also shade them in the Maven pom files.
2021-12-23 16:54:31 +08:00
Jiajun Yao
60388b2834
Round robin during spread scheduling (#19968) 2021-12-22 20:27:34 -08:00
SangBin Cho
99693096d6
[gRPC] Improve blocking call Placement group (#21130)
Use Sync methods with timeout for placement group RPCs
2021-12-22 17:21:56 -08:00
Yi Cheng
11ab412db1
[4/gcs] Bootstrap global accessor from gcs (#21195)
This is part of Redis removal. This PR enables the global accessor to start from GCS.

Co-authored-by: mwtian <81660174+mwtian@users.noreply.github.com>
Co-authored-by: Mingwei Tian <mwtian@anyscale.com>
2021-12-22 01:27:25 -08:00
Gagandeep Singh
92bf609a08
Unskip tests in `test_basic_3.py` (#20433) 2021-12-22 00:09:32 -08:00
Yi Cheng
0c786b1109
[3/gcs] Bootstrap log monitor and monitor from gcs (#21194)
This is part of Redis removal. This PR enables the log monitor and monitor to bootstrap from GCS.

Co-authored-by: mwtian <81660174+mwtian@users.noreply.github.com>
Co-authored-by: Mingwei Tian <mwtian@anyscale.com>
2021-12-21 23:15:55 -08:00
Simon Mo
cfe0897d05
[CI] Migrate Windows tests to Buildkite (#21227) 2021-12-21 20:16:34 -08:00
Sidhartha Parhi
5d6409fe2e
[Train] Remove run_dir param from BackendExecutor (#21231)
The run_dir argument in ray.train.backend.BackendExecutor.start_training isn't used but causes the following error: if your host computer and job cluster use different operating systems, you get a pathlib error because, for example, you can't instantiate a pathlib.WindowsPath on a Linux system.

Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com>
2021-12-21 19:54:43 -08:00
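The pathlib failure described in #21231 above can be reproduced without Ray. The sketch below relies only on the standard library: concrete path classes (`pathlib.WindowsPath`, `pathlib.PosixPath`) refuse to instantiate on a foreign OS, while the pure classes work anywhere. The helper name `can_instantiate_foreign_path` and the example paths are made up for illustration.

```python
import os
import pathlib

# Pick the concrete path class that does NOT match the current OS.
foreign_cls = pathlib.PosixPath if os.name == "nt" else pathlib.WindowsPath


def can_instantiate_foreign_path(raw: str) -> bool:
    """Return True if the other OS's concrete path class can be created."""
    try:
        foreign_cls(raw)
    except NotImplementedError:
        # Raised by pathlib when the class does not match the host OS.
        return False
    return True


print(can_instantiate_foreign_path("some_dir"))

# Pure paths do no I/O and can represent another OS's path on any host.
pure = pathlib.PureWindowsPath("C:/runs/exp1")
print(pure.name)
```

When a path has to cross machines with different operating systems, sending a plain string or a pure path avoids this class of error.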
Amog Kamsetty
57db4640ca
[Train] [Tune] Refactor MLflow (#20802)
Pulls out Tune's MLflow logging logic to a shared MLflow util.
Adds an MLflow logger callback to Ray Train

Closes #20642
2021-12-21 17:17:52 -08:00
Yi Cheng
09421a4ca6
[2/gcs] Bootstrap dashboard for gcs ha (#21179)
This is part of the GCS HA project. This PR tries to bootstrap the dashboard with the GCS address instead of Redis.

Co-authored-by: mwtian <81660174+mwtian@users.noreply.github.com>
2021-12-21 16:58:03 -08:00
Eric Liang
1db03862a7
Isolate function exports by job in separate queues (#20882) 2021-12-21 16:19:00 -08:00
Jiajun Yao
7d861a2c58
[Test] Add ray wheel sanity check (#21223) 2021-12-21 14:24:02 -08:00
Gagandeep Singh
5dc0f90ada
[Windows] Unskipped tests in test_standalone.py (#21213) 2021-12-21 11:37:23 -08:00
Yi Cheng
f62faca04c
[1/gcs] gcs ha bootstrap for raylet (#21174)
This is part of #21129

This PR tries to cover the cpp/ray part of the bootstrap, with some updates there:

- remove the unused function/tests
- some API updates

Co-authored-by: mwtian <81660174+mwtian@users.noreply.github.com>
2021-12-21 08:50:42 -08:00
SangBin Cho
5d3042ed9d
[Internal Observability] Record Raylet Gauge (#21049)
* Revert "[Please revert] Remove new metrics temporarily"

This reverts commit baf7846daa3d1dad50dbedac19b7afbae3e197fc.

* Addressed code review.

* [Please revert] Revert plasma stats for the next PR

* improve grammar

* Addressed code review v1.

* Addressed code review.

* Add code owner.

* Fix tests.

* Add code owner to metric_defs.cc
2021-12-21 00:34:48 -08:00
Sven Mika
62dbf26394
[RLlib] POC: Run PGTrainer w/o the distr. exec API (Trainer's new training_iteration method). (#20984) 2021-12-21 08:39:05 +01:00
Dmitri Gekhtman
c9cf912a15
[autoscaler] Pass on provider.internal_ip() exceptions during scale down (#21204)
Treats failures of provider.internal_ip during node drain as non-fatal.
For example, if a node is deleted by a third party between the time it's scheduled for termination and drained, there will now be no error on GCP.

Closes #21151
2021-12-20 22:23:17 -08:00
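The behavior described in #21204 above, treating a failed `internal_ip` lookup during drain as non-fatal, can be sketched as follows. This is a simplified illustration under stated assumptions, not the autoscaler's code: `FakeProvider` and `ips_to_drain` are hypothetical, and only the method name `internal_ip` comes from the commit message.

```python
from typing import Dict, List


class FakeProvider:
    """Hypothetical stand-in for a node provider."""

    def __init__(self, ips: Dict[str, str]) -> None:
        self._ips = ips

    def internal_ip(self, node_id: str) -> str:
        if node_id not in self._ips:
            # Simulates a node deleted by a third party before the drain.
            raise RuntimeError(f"node {node_id} not found")
        return self._ips[node_id]


def ips_to_drain(provider: FakeProvider, node_ids: List[str]) -> List[str]:
    """Collect IPs of nodes to drain, skipping nodes whose lookup fails."""
    ips = []
    for node_id in node_ids:
        try:
            ips.append(provider.internal_ip(node_id))
        except Exception as exc:
            # Non-fatal: log and continue with the remaining nodes.
            print(f"skipping {node_id}: {exc}")
    return ips


provider = FakeProvider({"node-1": "10.0.0.1"})
print(ips_to_drain(provider, ["node-1", "node-2"]))
```

Catching the exception per node means one vanished node does not stop the remaining nodes from being drained.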
qicosmos
d1a27487a3
[C++ Worker] fix uninit ray runtime instance (#21125)
With some compilers, the static Ray runtime in the Ray runtime holder may be a new uninitialized instance in a dynamic library,
so we need to initialize the Ray runtime holder in the dynamic library to make sure the new instance is valid.
2021-12-21 12:07:59 +08:00
Qing Wang
94251fbcc4
[Core] Fix invalid to specify concurrency group at runtime. (#21191)
We fix the issue where the concurrency group name of an actor task could not be specified at runtime with the following usage:
```python
a.f2.options(concurrency_group="compute").remote()
```
2021-12-21 10:47:47 +08:00
Linsong Chu
61bbecdb7d
[Workflow]add doc for metadata (#20156)
This PR adds documentation for Workflow Metadata, for which we recently added support in https://github.com/ray-project/ray/pull/19372.

Co-authored-by: Yi Cheng <74173148+iycheng@users.noreply.github.com>
2021-12-20 17:24:07 -08:00
Hankpipi
ae5bb34f60
[Serve Autoscaler] Raise warning if max_concurrent_queries < target_num_ongoing_requests (#21184) 2021-12-20 16:07:19 -08:00
iasoon
1c93beb490
[serve] use true nulls in snapshot (#21062) 2021-12-20 16:07:09 -08:00
SangBin Cho
44320aba3b
[Nightly Test] Fix broken scalability test #21201
I added a memory monitor to the scalability tests. This broke the tests because creating a memory monitor requires node resources (so that it is scheduled on the head node), and that broke the "resource leak" check. Ideally, this resource leak check should be more robust, but I fixed the issue in an easier way for now. In the near future, the memory monitor will become a fixture, and at that point we should fix the resource leak function code.
2021-12-20 14:58:39 -08:00
architkulkarni
5cc1308c66
[runtime env] [doc] [test] Add docs and tests for RAY_runtime_env_skip_local_gc environment variable (#21163) 2021-12-20 10:34:59 -08:00
SangBin Cho
5959669a70
[Core] Remove task table. (#21188)
Remove task table that's not used anymore.
2021-12-20 06:22:01 -08:00
architkulkarni
5b6bf534a0
[Java] Fix typo projetct->project in XML file (#21060) 2021-12-20 20:21:35 +08:00
Qing Wang
bd502e8bd5
[Java] Remove out of date comment. (#21073)
The semantics of the `setName` API have changed, but the comment is out of date. This PR fixes it.
2021-12-20 20:07:59 +08:00
DK.Pino
33a45e55df
Revert "Revert "[Placement Group] Make placement group prepare resource rpc r… (#21144)" (#21152)
* Revert "Revert "[Placement Group] Make placement group prepare resource rpc r… (#21144)"

This reverts commit 02465a6792.

* fix flaky unit test
2021-12-20 00:32:42 -08:00
mwtian
06ec07057c
Revert "[Core] Unrevert #21115, fix auto address env (#21158)" (#21189)
This reverts commit 968f08607b.

It is breaking e2e tests where worker nodes cannot start. e.g.

```
Traceback (most recent call last):
  File "/home/ray/anaconda3/bin/ray", line 8, in <module>
    sys.exit(main())
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/scripts/scripts.py", line 1961, in main
    return cli()
  File "/home/ray/anaconda3/lib/python3.7/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/ray/anaconda3/lib/python3.7/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/autoscaler/_private/cli_logger.py", line 808, in wrapper
    return f(*args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/scripts/scripts.py", line 733, in start
    address_ip, password=redis_password)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/_private/services.py", line 593, in create_redis_client
    _, redis_ip_address, redis_port = validate_bootstrap_address(redis_address)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/_private/services.py", line 494, in validate_bootstrap_address
    raise ValueError("Malformed address. Expected '<host>:<port>'.")
ValueError: Malformed address. Expected '<host>:<port>'.
```
2021-12-20 00:22:12 -08:00
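The `ValueError` at the end of the traceback in #21189 above comes from a check that the address has the form `<host>:<port>`. The sketch below shows that kind of check in isolation. It is not Ray's `validate_bootstrap_address`: the function `parse_host_port` is hypothetical and only the error message is copied from the traceback.

```python
from typing import Tuple


def parse_host_port(address: str) -> Tuple[str, int]:
    """Split '<host>:<port>' into its parts (hypothetical helper)."""
    # rpartition splits on the last colon; sep is empty when there is none.
    host, sep, port = address.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError("Malformed address. Expected '<host>:<port>'.")
    return host, int(port)


print(parse_host_port("127.0.0.1:6379"))

try:
    # A bare host name has no port, so it is rejected.
    parse_host_port("localhost")
except ValueError as exc:
    print(exc)
```

Any value that is not a literal host and port pair fails this kind of check, so such values need to be resolved to a concrete address before a parser like this sees them.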
Guyang Song
2a9d9726d6
[doc] add doc for container runtime env (#21131) 2021-12-20 14:13:05 +08:00
architkulkarni
774163f9c9
[Java] Bump log4j 2.16.0 -> 2.17.0 (#21176)
Resolves [CVE-2021-45105](https://github.com/advisories/GHSA-p6xc-xr62-6r2g).
2021-12-20 10:27:24 +08:00
Oliver Mannion
8d9e0fca61
fix: data not exported (#20887)
* fix: data not exported

* empty commit
2021-12-18 22:33:34 -08:00
architkulkarni
2489b17634
[release] Uninstall old ray in all release test app configs to fix commit mismatch error (#21175)
* uninstall old ray in all release test app configs

* add instruction to e2e.py docstring
2021-12-18 16:58:49 -08:00
Clark Zinzow
968f08607b
[Core] Unrevert #21115, fix auto address env (#21158)
This PR unreverts #21115, fixing the handling of an `"auto"` address in the `RAY_ADDRESS` environment variable.

Co-authored-by: Mingwei Tian <mwtian@anyscale.com>
2021-12-18 07:45:00 -08:00
Chen Shen
c9c3f0745a
[Dataset][nighlytest] use latest ray for running test #21148
We are actually using the Ray that comes with the image, which is a very old version of Ray. (Surprised this actually works.)
2021-12-17 23:48:44 -08:00