2880 commits

Author SHA1 Message Date
Jiajun Yao
c5c5c24e8f
Remove unused ObjectDirectory::LookupLocations() (#23647)
Remove dead code.
2022-04-01 10:03:37 -07:00
Hao Chen
75f1861625
Remove predefined resources vector in ResourceRequest (#23584)
"ResourceRequest" currently uses 2 containers: a vector for predefined resources, and a map for custom resources.
This was intended to be a perf optimization. However, in practice, this makes the code more complex, and, moreover, prevents optimizations for some methods (e.g., "ResourceIds", "Size").

This PR removes the vector and makes ResourceRequest use only one map for all resources. Also, "ResourceIds" now returns a "boost::range" to allow iterating resource IDs without having to construct temporary sets.
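
As a rough illustration of the single-container design, here is a minimal Python sketch. The real `ResourceRequest` is C++; the class below, its method names, and the string resource IDs are stand-ins invented for this example, and the plain iterator only loosely mirrors the idea of returning a range instead of building a temporary set.

```python
from typing import Dict, Iterator


class ResourceRequest:
    """Toy model: one map holds predefined and custom resources alike."""

    def __init__(self, resources: Dict[str, float]) -> None:
        # Drop zero entries so size() counts only resources actually requested.
        self._resources = {rid: qty for rid, qty in resources.items() if qty != 0}

    def get(self, resource_id: str) -> float:
        # Same lookup path for "CPU" and for a custom resource name.
        return self._resources.get(resource_id, 0.0)

    def resource_ids(self) -> Iterator[str]:
        # Iterate over the map's keys directly; no temporary set is built.
        return iter(self._resources)

    def size(self) -> int:
        return len(self._resources)


request = ResourceRequest({"CPU": 2, "GPU": 0, "my_custom_resource": 1})
print(sorted(request.resource_ids()))
print(request.size())
```

With a single map, call sites no longer need to know whether a resource is predefined or custom.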

A microbenchmark shows a slight performance improvement.
Last nightly: `placement group create/removal per second 837.76 +- 16.68`.
This PR: `placement group create/removal per second 895.76 +- 16.99`.
2022-03-31 17:16:11 -07:00
Yi Cheng
5a2ab76af8
[flaky] Release gcs client in test (#23644)
To deflake gcs_client_test, this PR tries to release the client object.
2022-03-31 16:57:50 -07:00
Yi Cheng
8d7f71601d
deflaky ray syncer test (#23641) 2022-03-31 13:42:30 -07:00
Yi Cheng
87dc57df26
[3][cleanup][gcs] Remove redis based pubsub. (#23520)
This PR removes redis based pubsub.
2022-03-31 00:13:55 -07:00
Yi Cheng
df3e761b18
[gcs] Change old syncer to gcs_syncer namespace (#23623)
Before the newly introduced ray_syncer is integrated, there is a naming conflict. This PR moves the old ray syncer to another namespace.
2022-03-30 21:54:02 -07:00
ZhuSenlin
e79a63db64
[GCS] [3 / n] Refactor gcs_resource_scheduler to cluster_resource_scheduler #23371
As we (@scv119 @iycheng @raulchen @Chong-Li @WangTaoTheTonic ) discussed offline, the GcsResourceScheduler on the GCS side should be unified to ClusterResourceScheduler.

There is already a big PR (#23268) to do this, but to make review easier, I will split it into two or more small PRs.
This is [3/n]:

- Move the implementation of all policies from gcs_resource_scheduler to bundle_scheduling_policy
- Delete gcs_resource_scheduler
- Refactor gcs_resource_scheduler_test to cluster_resource_scheduler_2_test

BTW: The interface inside ISchedulingPolicy should be refactored in another PR, see the discussion #23323 (comment)

To be clear:

- Scorer-related code is moved out of gcs_resource_scheduler to scorer.h/.cc, with no logic changes.
- Policy-related code is moved out of gcs_resource_scheduler to bundle_scheduling_policy.h/.cc, and a small part of the logic in "GcsResourceScheduler::Schedule" is distributed into each policy.
- Some code inside gcs_placement_group_scheduler.h/.cc is changed to adapt to the new data structures (SchedulingResult and SchedulingContext).
2022-03-30 17:39:46 -07:00
Yi Cheng
d01f947ff1
[gcs] Make core worker test compilable. (#23608)
It seems the core worker test is not being run, and it breaks the build. This PR fixes it.
2022-03-30 17:26:38 -07:00
jon-chuang
54ddcedd1a
[Core] Chore: move test to right dir #23096 2022-03-30 09:29:38 -07:00
Yi Cheng
781c46ae44
[scheduling][5] Refactor resource syncer. (#23270)
## Why are these changes needed?

This PR refactors the resource syncer to decouple it from GCS and raylet. GCS and raylet will use the same module to sync data. The integration will happen in the next PR.

There are several newly introduced components:

* RaySyncer: the place where remote and local information sits. It's a coordinator layer.
* NodeState: keeps track of the local status, similar to NodeSyncConnection.
* NodeSyncConnection: keeps track of the information sent and received, and makes sure not to send information the remote node already knows.

The core protocol is that each node will send {what it has} - {what the target has} to the target.
For example, consider nodes A <-> B. A will send B everything A has except what B already has.
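
The "{what it has} - {what the target has}" rule can be sketched in a few lines of Python. This is only an illustration: modelling each side's knowledge as a map from node ID to a version number is an assumption made here, and `delta_for_target` is a hypothetical helper, not a function from the PR.

```python
from typing import Dict


def delta_for_target(local_view: Dict[str, int], target_view: Dict[str, int]) -> Dict[str, int]:
    """Return the entries the target is missing or holds at an older version."""
    return {
        node_id: version
        for node_id, version in local_view.items()
        if target_view.get(node_id, -1) < version
    }


# What A knows, and what A believes B already knows (node id -> version).
a_view = {"A": 3, "C": 1, "D": 2}
b_known = {"C": 1, "D": 1}
print(delta_for_target(a_view, b_known))
```

Only the entry for A (unknown to B) and the newer entry for D are selected; the entry for C is skipped because B already has that version.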

Whenever there is new information (from NodeState or NodeSyncConnection), it is passed to RaySyncer, which broadcasts it.

NodeSyncConnection is for the communication layer. It has two implementations, Client and Server:

* Server => Client: the client sends a long-polling request, and the server responds every 100ms if there is data to be sent.
* Client => Server: the client checks every 100ms whether there is new data to be sent. If there is, it uses an RPC call to send the data.

Here is one example:

```mermaid
flowchart LR;
    A-->B;
    B-->C;
    B-->D;
```

It means A initializes the connection to B, and B initializes the connections to C and D.

Now C generates a message M:

1. [C] RaySyncer checks whether a new message has been generated in C and gets M.
2. [C] RaySyncer pushes M to the NodeSyncConnection in the local component (B).
3. [C] ServerSyncConnection waits until B sends a long-polling request, then sends the data to B.
4. [B] B receives the message from C and pushes it to its local sync connections (C, A, D).
5. [B] ClientSyncConnection of C does not push it to its local queue, since it was received through this channel.
6. [B] ClientSyncConnection of D sends this message to D.
7. [B] ServerSyncConnection of A is used to send this message to A (long-polling here).
8. [B] B updates NodeState (local component) with this message M.
9. [D] D's pipeline is similar to 5) (with ServerSyncConnection) and 8).
10. [A] A's pipeline is similar to 5) and 8).
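
The walk-through above can be approximated with a small in-process simulation. This is a simplified sketch, not the PR's implementation: the `Node` class is hypothetical, delivery is a direct method call rather than long-polling or RPC, local state is updated before forwarding, and a duplicate check on the message ID is added as an extra safeguard.

```python
from typing import Dict, List, Optional


class Node:
    """Toy node: forwards a message to every peer except the one it came from."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.peers: List["Node"] = []        # directly connected nodes
        self.state: Dict[str, str] = {}      # stands in for NodeState
        self.received_from: List[str] = []   # log of who delivered each message

    def connect(self, other: "Node") -> None:
        # Connections carry messages in both directions.
        self.peers.append(other)
        other.peers.append(self)

    def generate(self, msg_id: str, payload: str) -> None:
        self.state[msg_id] = payload
        self._forward(msg_id, payload, source=None)

    def receive(self, msg_id: str, payload: str, source: "Node") -> None:
        if msg_id in self.state:
            return  # already known, nothing to do
        self.state[msg_id] = payload
        self.received_from.append(source.name)
        self._forward(msg_id, payload, source)

    def _forward(self, msg_id: str, payload: str, source: Optional["Node"]) -> None:
        for peer in self.peers:
            if peer is source:
                continue  # do not echo the message back over the channel it came from
            peer.receive(msg_id, payload, source=self)


a, b, c, d = Node("A"), Node("B"), Node("C"), Node("D")
a.connect(b)
b.connect(c)
b.connect(d)
c.generate("M", "resource update")
print({n.name: n.received_from for n in (a, b, c, d)})
```

In this topology B receives M from C and relays it to A and D, while C never gets its own message back.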
2022-03-29 23:52:39 -07:00
Stephanie Wang
da7901f3fc
[core] Filter out self node from the list of object locations during pull (#23539)
Running Datasets shuffle with 1TB data and 2k partitions sometimes times out due to a failed object fetch. This happens because the object directory notifies the PullManager that the object is already on the local node, even though it isn't. This seems to be a bug in the object directory.

To work around this on the PullManager side, this PR filters out the current node from the list of locations provided by the object directory. @jjyao confirmed that this fixes the issue for Datasets shuffle.
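
The workaround amounts to a simple filter. Below is a minimal Python sketch under stated assumptions: the real code is C++ inside the PullManager, and `remote_locations` and the string node IDs are hypothetical names used only for illustration.

```python
from typing import Iterable, List


def remote_locations(locations: Iterable[str], self_node_id: str) -> List[str]:
    """Drop the local node from a list of reported object locations."""
    return [node_id for node_id in locations if node_id != self_node_id]


reported = ["node-1", "node-2", "node-3"]
print(remote_locations(reported, self_node_id="node-2"))
```

If the filtered list is empty, the caller knows there is no remote copy to pull from, rather than wrongly assuming the object is already local.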
2022-03-29 15:18:14 -07:00
Yi Cheng
61c9186b59
[2][cleanup][gcs] Cleanup GCS client options. (#23519)
This PR cleans up GCS client options.
2022-03-29 12:01:58 -07:00
Hao Chen
b7d32df8b0
Refactor scheduler data structures (#22854)
This is the first PR to refactor scheduler data structures (See #22850).

Major changes:
- Hid the implementation details in the `ResourceRequest` and `TaskResourceInstances` classes, which expose public methods such as algebra operators and comparison operators.
- Hid the differences between "predefined" and "custom" resources inside these 2 classes. Call sites can simply use the resource ID to access the resource, whether it is predefined or custom.
- The predefined_resources vector now always has the full length. So no more "resize"s are needed. 
- Removed the `ResourceCapacity` class. Now "total" and "available" resources are stored in separate fields in "NodeResources". 
- Moved helper functions for FixedPoint vectors from "cluster_resource_data.h" to "fixed_point.h"
- "ResourceID" now has static methods to get the resource ids of predefined resources, e.g. "ResourceID::CPU()". 
- Encapsulated unit-instance resource logic to "ResourceID"

Other planned changes that are not included in this PR:
- Rename ResourceRequest to ResourceSet, and move it to its own file.
- Remove the predefined vectors and always use maps.

Co-authored-by: Chong-Li <lc300133@antgroup.com>
2022-03-29 19:44:59 +08:00
Matti Picus
77c4c1e48e
WINDOWS: enable and fix failures in test_runtime_env_complicated (#22449) 2022-03-29 00:56:42 -07:00
Yi Cheng
7de751dbab
[1][core][cleanup] remove enable gcs bootstrap in cpp. (#23518)
This PR removes the enable_gcs_bootstrap flag in cpp.
2022-03-28 21:37:24 -07:00
Chen Shen
51bdefc2c8
[scheduler][monitoring] dump detailed spilling metrics (#23321)
Dump the detailed spilling metrics in scheduler.
2022-03-28 10:49:04 -07:00
Qing Wang
ef5b9b87d3
[Java] Add set runtime env api for normal task. (#23412)
This PR adds the API `setRuntimeEnv` for submitting a normal task, for the usage:
```java
RuntimeEnv runtimeEnv =
    new RuntimeEnv.Builder()
        .addEnvVar("KEY1", "A")
        .build();

/// Return `A`
Ray.task(RuntimeEnvTest::getEnvVar, "KEY1").setRuntimeEnv(runtimeEnv).remote().get();
```
2022-03-24 15:57:24 +08:00
mwtian
26f1a7ef7d
[Core] Account for spilled objects when reporting object store memory usage (#23425) 2022-03-23 22:25:22 -07:00
Eric Liang
38925f60d2
Add a get_if_exists option for simpler creation of named actors (#23344)
Getting or creating a named actor is a common pattern; however, how to achieve it is somewhat esoteric. Add a utility function and test that it doesn't cause any scary error messages.

Actor.options(name="my_singleton", get_if_exists=True).remote(args)
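
The get-or-create pattern itself can be shown with a toy, in-process registry. The sketch below makes no Ray calls; `get_or_create` and `_named_objects` are hypothetical names, and the dictionary merely stands in for whatever keeps track of named actors.

```python
from typing import Any, Callable, Dict

# Hypothetical registry mapping a name to the object created under it.
_named_objects: Dict[str, Any] = {}


def get_or_create(name: str, factory: Callable[[], Any]) -> Any:
    """Return the object registered under `name`, creating it on first use."""
    if name not in _named_objects:
        _named_objects[name] = factory()
    return _named_objects[name]


first = get_or_create("my_singleton", dict)
second = get_or_create("my_singleton", dict)
print(first is second)
```

The caller does not need to distinguish between the "already exists" and "must create" cases.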
2022-03-23 22:02:58 -07:00
Chong-Li
6e0e46ea56
[GCS] Make gcs scheduler accommodate cluster/local task managers (#22942)
* Accommodate cluster and local task managers

* Fix warning

* Fix bug

* Format

* Format

* fix torch

* Fix comments

* lint

Co-authored-by: Chong-Li <lc300133@antgroup.com>
2022-03-23 15:58:59 -07:00
Jiajun Yao
dfebf7ffae
Fix metric type for NumSpilledTasks to gauge (#23391)
The metric type for NumSpilledTasks should be gauge since the sum already happens in SchedulerStats.
2022-03-22 16:17:00 -07:00
Guyang Song
69af9764b2
[runtime env] URI reference refactor (#22828)
- Move the URI reference logic from raylet to agent.
- Redefine the runtime env agent RPC to `CreateRuntimeEnvOrGet` and `DeleteRuntimeEnvIfPossible`
- More details https://github.com/ray-project/ray/issues/21695#issuecomment-1032161528

Future works
- We don't remove the `RuntimeEnvUris` from `RuntimeEnv` protobuf in current PR because gcs also uses those URIs to do GC by runtime_env_manager. We should also clear this.
- Ray client server shouldn't interact with agent directly. Or Ray client server should also decrease the reference count.
- Currently, `WorkerPool::HandleJobStarted` will be called multiple times for one job. So we should make sure this function is idempotent. Can we change this logic and make this function be called only once?
2022-03-21 11:21:15 -05:00
Larry
81dcf9ff35
[Placement Group] Make PlacementGroupID generate from JobID (#23175) 2022-03-21 17:09:16 +08:00
ZhuSenlin
871f749baf
[GCS] [2 / n] Refactor gcs_resource_scheduler to cluster_resource_scheduler (#23323)
* Add new interface to policy for batch scheduling and unify the scheduling result and context

* Remove the dependence of GcsClient on ClusterResourceScheduler

* fix compile error

* fix lint error

Co-authored-by: 黑驰 <senlin.zsl@antgroup.com>
2022-03-20 15:03:14 -07:00
mwtian
909cdea3cd
[Python Worker] add feature flag to support forking from workers (#23260)
Make sure Python dependencies can be imported on demand, without the background importer thread. Use cases are:

- If the pubsub notification for a new export is lost, importing can still be done.
- Allow not running the background importer thread, without affecting Ray's functionalities.

Add a feature flag to support forking from Python workers, by:

- Enabling fork support in gRPC.
- Disabling the importer thread and leaving only the main thread in the Python worker. The importer thread will not run after forking anyway.
2022-03-18 14:47:18 -07:00
Jialing He
4a83bc3dc2
[runtime env] Support set timeout for runtime env setup (#23082)
Interface example:
```python
@ray.remote(runtime_env=RuntimeEnv(..., config=RuntimeEnvConfig(setup_timeout_s=10)))
def f(): pass

@ray.remote(runtime_env={..., "config": {"setup_timeout_s": 10}})
def f(): pass
```

Supports setting a timeout, in seconds, for runtime environment creation.

Co-authored-by: 捕牛 <hejialing.hjl@antgroup.com>
2022-03-18 12:52:59 -05:00
ZhuSenlin
d3f92cca33
rename gcs_resource_scheduler to cluster_resource_scheduler (#23274) 2022-03-18 13:19:33 +08:00
Tao Wang
b4bc8809dc
[Core][Tiny]Shorter thread name (#23222)
On Linux, a thread name cannot be longer than 15 characters.
When we use a command like top, similar thread names such as `resource_report_poller` and `resource_report_broadcaster` are easy to confuse because both are shown as `resource_report`.

This PR abbreviates the thread names to make them shorter.
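
The collision is easy to reproduce by truncating names to 15 characters, as the description above implies. In this Python sketch, `visible_thread_name` is a hypothetical helper, and the abbreviated names are made up for illustration; they are not necessarily the names chosen in the PR.

```python
MAX_THREAD_NAME_LEN = 15  # limit stated in the description above


def visible_thread_name(name: str) -> str:
    """Return the part of a thread name that survives truncation."""
    return name[:MAX_THREAD_NAME_LEN]


long_names = ["resource_report_poller", "resource_report_broadcaster"]
# Hypothetical abbreviations that fit within the limit.
short_names = ["rsrc_rpt_poller", "rsrc_rpt_bcast"]

for name in long_names + short_names:
    print(f"{name!r} -> {visible_thread_name(name)!r}")
```

Names that already fit within the limit pass through unchanged, so they remain distinguishable in tools such as top.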
2022-03-18 09:58:32 +08:00
Chris K. W
6416c65505
Revert "Revert "[Client] chunked get requests (#22455)"" (#23261)
* revert revertchunkedgets

* exit early if all chunks received, tighter exception handler for stream in proxy
2022-03-17 16:24:30 -07:00
ZhuSenlin
125ef0e5a6
[GCS] integrate cluster_resource_manager into gcs_resource_manager and gcs_resource_scheduler (#23105)
* refactor gcs_resource_manager

* fix lint error

* fix lint error

* fix compile error

* fix test

* fix test

* fix test

* add unit test

* refactor UpdateNodeNormalTaskResources

* fix comment

Co-authored-by: 黑驰 <senlin.zsl@antgroup.com>
2022-03-16 16:27:14 -07:00
Tao Wang
4614536572
Migrating to flat hash map [core worker&object manager] (#23126)
Next move of #22932. This PR replaces unordered_map with flat_hash_map in the core worker and object manager modules.
Some interfaces, like GetAllReferenceCounts, which are exposed to users in Java/Python, are excluded because they are a little more complicated. We will deal with them together with pg.

The follow-up PRs would be migrating in reference counting, placement group and others.
2022-03-15 22:16:28 -07:00
Qing Wang
149d06442b
[Core][Java][Remove JVM FullGC 3/N] Disable every 10min FullGC. (#21443)
In this PR, we disable the FullGC that the Java worker runs every 10 minutes when it is not triggered by a global GC event. Specifically, we added a `triggered_by_global_gc` flag to indicate whether the GC event was triggered by a global GC event. If it was, we still need to do a FullGC.

Co-authored-by: Qing Wang <jovany.wq@antgroup.com>
2022-03-16 11:18:12 +08:00
qicosmos
d8de5a445a
[C++ Worker]Python call cpp actor (#23061)
The [last PR](https://github.com/ray-project/ray/pull/22820) added support for calling C++ normal tasks from Python; this PR adds support for calling C++ actor tasks from Python.
2022-03-15 19:54:10 -07:00
Qing Wang
f51cb09e02
[Core][Java][Remove JVM FullGC 2/N] Make JVM be aware of in-memory store pressure. (#21441) 2022-03-15 19:25:27 +08:00
Guyang Song
f65971756d
[dashboard agent] Catch agent port conflict (#23024) 2022-03-15 16:09:15 +08:00
Chen Shen
5a2ebc281c
[Scheduler] separate scheduler code to its own build target (#23124)
* wip

* comments

* fix build

* fix-test

* fix format
2022-03-14 23:23:58 -07:00
Kai Yang
35c7275bfc
[Object Spilling] Handle IO worker failures correctly (#20752)
Currently, when a spill/restore worker fails while its state in the worker pool is idle, the worker pool does not clean up the worker's metadata. Subsequent spill/restore requests reuse this dead worker, and their RPC requests cannot succeed. This results in broken object spilling functionality.

This PR addresses the issue by removing disconnected IO workers from `registered_io_workers` and `idle_io_workers`.
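
The fix can be pictured with a toy worker pool in Python. This is a sketch under assumptions: the real worker pool is C++, and while the two container names follow the description above, the `IOWorkerPool` class and its methods are hypothetical.

```python
from typing import List, Optional, Set


class IOWorkerPool:
    """Toy pool that forgets an IO worker as soon as it disconnects."""

    def __init__(self) -> None:
        self.registered_io_workers: Set[str] = set()
        self.idle_io_workers: List[str] = []

    def register(self, worker_id: str) -> None:
        self.registered_io_workers.add(worker_id)
        self.idle_io_workers.append(worker_id)

    def pop_idle(self) -> Optional[str]:
        # Hand out an idle worker for a spill/restore request, if any.
        return self.idle_io_workers.pop() if self.idle_io_workers else None

    def on_disconnect(self, worker_id: str) -> None:
        # Remove the worker from both containers so it can never be reused.
        self.registered_io_workers.discard(worker_id)
        self.idle_io_workers = [w for w in self.idle_io_workers if w != worker_id]


pool = IOWorkerPool()
pool.register("io-worker-1")
pool.register("io-worker-2")
pool.on_disconnect("io-worker-2")
print(pool.pop_idle())
```

Without the cleanup in `on_disconnect`, `pop_idle` could hand out the dead worker, which is the failure mode described above.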
2022-03-15 12:14:14 +08:00
Jialing He
39a6c054d3
[runtime env][feature] introduce pip_check_enable and pip_version (#22826) 2022-03-14 23:41:19 +08:00
Kai Yang
e9755d87a6
[Lint] One parameter/argument per line for C++ code (#22725)
It's really annoying to deal with parameter/argument conflicts. This is even more frustrating when we merge code from the community to Ant's internal code base with hundreds of conflicts caused by parameters/arguments.

In this PR, I updated the clang-format style to make parameters/arguments stay on different lines if they can't fit into a single line.

There are several benefits:

* Conflict resolving is easier.
* Fewer potential human mistakes when resolving conflicts.
* Git history and Git blame are more straightforward.
* Better readability.
* Align with the new Python format style.
2022-03-13 17:05:44 +08:00
Chong-Li
f7e1343d39
[GCS] Fix the normal task resources at GCS (#22857)
* Fix the normal task resources at GCS

* Fix comments

* Leave a TODO

* Bring back a UT

* consider object memory

* Fix

Co-authored-by: Chong-Li <lc300133@antgroup.com>
2022-03-11 21:54:03 -08:00
jon-chuang
0b54d9c780
[GCS] Non-STRICT_PACK PGs should be sorted by resource priority, size (#22762)
Previously, placement groups had suboptimal bin-packing, resulting in unexpected placement group stalls for users.

The root cause is that pg bundles were not sorted by resource priority and size.

This PR implements a naive priority mechanism for bundles in the GCS resource scheduler that can be improved upon (and even made configurable by users in the future).

The behaviour is to schedule: "GPU" first, custom resources in int64_t order next, and finally, memory and then "CPU" last.
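
One way to picture this ordering is as a sort key. The Python sketch below is an illustration only: representing custom resources by integer IDs, ranking each bundle by its highest-priority resource, and the function names are all assumptions made here, and tie-breaking by bundle size is left out.

```python
from typing import Dict, Tuple, Union

ResourceKey = Union[str, int]  # predefined name, or integer ID of a custom resource

# Rank of each predefined resource; custom resources sit between GPU and memory.
_PREDEFINED_RANK = {"GPU": 0, "memory": 2, "CPU": 3}


def resource_priority(resource: ResourceKey) -> Tuple[int, int]:
    """Smaller tuples are scheduled first."""
    if isinstance(resource, int):
        return (1, resource)  # custom resources, ordered by integer ID
    return (_PREDEFINED_RANK[resource], 0)


def bundle_priority(bundle: Dict[ResourceKey, float]) -> Tuple[int, int]:
    # A bundle is ranked by the highest-priority resource it requests.
    return min(resource_priority(r) for r in bundle)


bundles = [{"CPU": 1}, {5: 2}, {"GPU": 1, "CPU": 1}, {"memory": 100}]
print(sorted(bundles, key=bundle_priority))
```

Placing the scarcest resources first is the usual motivation for such an ordering: bundles that need GPUs are harder to fit than bundles that need only CPUs.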
2022-03-11 21:47:07 -08:00
Jialing He
0cbbb8c1d0
[runtime env][core] Use Proto message RuntimeEnvInfo between user code and core_worker (#22856) 2022-03-11 22:14:18 +08:00
Tao Wang
10c03cb126
Migrating to flat hash map [GCS&util&common] (#22932)
Next move of #19220. This PR replaces unordered_map with flat_hash_map in most GCS code and some util & common modules.
The placement group part, which exposes user interfaces in Java/Python, is excluded because it's a little more complicated.

The follow-up PRs would be migrating in core worker, placement group and others.
2022-03-11 18:35:06 +09:00
Yi Cheng
ec88eb7d1d
[4][resource reporting] Remove ray syncer from gcs_resource_manager (#22832)
This PR is part of resource reporting refactoring. In this PR ray syncer is moved from gcs_resource_manager to gcs_placement_group_scheduler. With this one, gcs_resource_manager is totally decoupled from resource broadcasting.
2022-03-11 01:15:25 -08:00
Chen Shen
3ebc4ae289
fix comments and typo (#23008)
Fix comments and typos for scheduler code.
2022-03-10 11:40:31 -08:00
Yi Cheng
9f275c9bb8
[3][resource reporting] Use GCS to report the placement group creation information instead of reporting by raylet (#22597) 2022-03-10 11:08:21 -08:00
qicosmos
e4a9517739
[C++ Worker]Python call cpp worker (#22820) 2022-03-10 11:06:14 -08:00
ZhuSenlin
a15890be58
[GCS] refactor the resource related data structures on the GCS (#22924)
* refactor resource data structure in gcs

* fix comment

* fix lint error

* fix

* DISABLED_TestRejectedRequestWorkerLeaseReply as it depends on the update of normal task

Co-authored-by: 黑驰 <senlin.zsl@antgroup.com>
2022-03-09 08:22:02 -08:00
Chen Shen
bc3f7a7684
[scheduling policy 3/n][rfc] Refactor SchedulingPolicy into interface and implementations (#22907)
* scheduling policy

* update

Co-authored-by: Gagandeep Singh <gdp.1807@gmail.com>
2022-03-08 18:47:56 -08:00
Chen Shen
cd0354e06d
[scheduling-policy 2/n] refactor scheduling policy API (#22885)
* add scheduling-options

* address comments
2022-03-08 09:29:00 -08:00