Commit d14ac35

[hyperactor] skeleton for mesh controller behavior
Pull Request resolved: #1729

# Basic idea: meshes are just resources like any other

The idea is that we treat meshes like any other resource. We define a common behavior for mesh controllers. We stand up a controller actor, and ActorMesh::allocate, ProcMesh::spawn, ActorMesh::spawn, etc. -- all they do is issue a CreateOrUpdate. (This makes them nonblocking -- which is our eventual goal -- but we have a few more things to do before we're fully ready for this.)

We can now query states, etc., by just calling GetState on the *mesh* resource. This is managed by the controller, and returns immediately (nonblocking). The controller is responsible for polling/pushing/keepaliving the underlying resources.

# Supervision events

The mesh controller can "raise" supervision events in the following way: when we create a resource, we include a "supervisor". The controller synthesizes and then pushes supervision events to this port. In effect, any time a rank in a mesh enters non-running status, we generate a supervision event. The controller keeps track of which events have been raised so as to not duplicate them.

# Rationale

Having a common resource model is very powerful. For one, it establishes a clean separation of concerns (controller responsible for managing a mesh vs. the owner of the mesh), and it allows us to build common tooling for managing all mesh types centered around a common interface (the mesh resource behavior).

It also means that 1) we can have common tooling for *all* resources (a mesh is now named just by its controller actor+Name, and we can query it like any other resource, e.g., with `hyper`); 2) it integrates cleanly with observability / "Monarchy".
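The deduplication described under "Supervision events" can be sketched roughly as follows. This is a minimal illustration, not code from this commit: `Status`, `SupervisionEvent`, and `Supervisor` are hypothetical stand-ins, and where the real controller would push events to the supervisor port, this sketch simply returns them.

```rust
use std::collections::HashSet;

/// Hypothetical per-rank status; the real `Status` lives in `crate::resource`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    Running,
    Stopped,
    Failed,
}

/// Hypothetical supervision event: which rank left the running state, and how.
#[derive(Debug, Clone, PartialEq, Eq)]
struct SupervisionEvent {
    rank: usize,
    status: Status,
}

/// Tracks which ranks have already had an event raised (assumed bookkeeping).
#[derive(Default)]
struct Supervisor {
    raised: HashSet<usize>,
}

impl Supervisor {
    /// Given the latest per-rank statuses, return one event for each rank
    /// that is non-running and has not been reported before.
    fn observe(&mut self, statuses: &[Status]) -> Vec<SupervisionEvent> {
        let mut events = Vec::new();
        for (rank, &status) in statuses.iter().enumerate() {
            // `HashSet::insert` returns false if the rank was already present,
            // which is what suppresses duplicate events.
            if status != Status::Running && self.raised.insert(rank) {
                events.push(SupervisionEvent { rank, status });
            }
        }
        events
    }
}
```

Calling `observe` repeatedly with the same statuses yields an event only the first time a rank is seen in a non-running status. Keying the set by rank alone is a simplification; a real controller might also want to re-raise when a rank moves between different non-running statuses.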
ghstack-source-id: 321276801
@exported-using-ghexport

Differential Revision: [D85982104](https://our.internmc.facebook.com/intern/diff/D85982104/)

**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D85982104/)!
1 parent 43b0dcd commit d14ac35

2 files changed, +55 -0 lines changed

hyperactor_mesh/src/resource.rs

Lines changed: 2 additions & 0 deletions
@@ -9,6 +9,8 @@
 //! This modules defines a set of common message types used for managing resources
 //! in hyperactor meshes.
 
+pub mod mesh;
+
 use core::slice::GetDisjointMutIndex as _;
 use std::collections::HashMap;
 use std::fmt;
Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
+/*
+ * Copyright (c) Meta Platforms, Inc. and affiliates.
+ * All rights reserved.
+ *
+ * This source code is licensed under the BSD-style license found in the
+ * LICENSE file in the root directory of this source tree.
+ */
+
+#![allow(dead_code)]
+
+use hyperactor::Named;
+/// This module defines common types for mesh resources. Meshes are managed as
+/// resources, usually by a controller actor implementing the [`crate::resource`]
+/// behavior.
+///
+/// The mesh controller manages all aspects of the mesh lifecycle, and the owning
+/// actor uses the resource behavior directly to query the state of the mesh.
+use ndslice::Extent;
+use serde::Deserialize;
+use serde::Serialize;
+
+use crate::resource::CreateOrUpdate;
+use crate::resource::GetState;
+use crate::resource::Status;
+use crate::resource::Stop;
+use crate::v1::ValueMesh;
+
+/// Mesh specs
+#[derive(Debug, Named, Serialize, Deserialize)]
+pub struct Spec<S> {
+    /// All meshes have an extent
+    extent: Extent,
+    // supervisor: PortHandle<SupervisionEvent(?)>
+    /// The mesh-specific spec.
+    spec: S,
+}
+
+/// Mesh states
+#[derive(Debug, Named, Serialize, Deserialize)]
+pub struct State<S> {
+    /// The current status for each rank in the mesh.
+    statuses: ValueMesh<Status>,
+    /// Mesh-specific state.
+    state: S,
+}
+
+// The behavior of a mesh controller.
+// hyperactor::behavior!(
+//     Controller<Sp, St>,
+//     CreateOrUpdate<Spec<Sp>>,
+//     GetState<State<St>>,
+//     Stop,
+// );
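
The commented-out behavior above names three messages: CreateOrUpdate, GetState, and Stop. As a rough, std-only sketch of how a controller might serve them without blocking, consider the following. Everything here is an assumption for illustration: the types mirror `Spec` and `State` but replace `Extent` with a plain rank count and `ValueMesh<Status>` with a `Vec`, the `Status` variants are hypothetical, the returned state just echoes the mesh-specific spec, and `poll` stands in for whatever reconciliation the real controller performs.

```rust
use std::collections::HashMap;

/// Hypothetical status variants; the real `Status` lives in `crate::resource`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    Initializing,
    Running,
    Stopped,
}

/// Simplified stand-in for the mesh spec: `ranks` replaces `Extent`.
#[derive(Debug, Clone, PartialEq)]
struct Spec<S> {
    ranks: usize,
    spec: S,
}

/// Simplified stand-in for the mesh state: a `Vec` replaces `ValueMesh<Status>`.
#[derive(Debug, Clone, PartialEq)]
struct State<S> {
    statuses: Vec<Status>,
    state: S,
}

/// Hypothetical controller keeping a cached view of every mesh it manages.
struct Controller<Sp> {
    meshes: HashMap<String, (Spec<Sp>, Vec<Status>)>,
}

impl<Sp: Clone> Controller<Sp> {
    fn new() -> Self {
        Controller { meshes: HashMap::new() }
    }

    /// CreateOrUpdate: record the desired spec and return immediately.
    fn create_or_update(&mut self, name: &str, spec: Spec<Sp>) {
        let statuses = vec![Status::Initializing; spec.ranks];
        self.meshes.insert(name.to_string(), (spec, statuses));
    }

    /// GetState: answer from the cached view, never waiting on the mesh.
    fn get_state(&self, name: &str) -> Option<State<Sp>> {
        self.meshes.get(name).map(|(spec, statuses)| State {
            statuses: statuses.clone(),
            state: spec.spec.clone(),
        })
    }

    /// Stop: mark every rank of the named mesh as stopped.
    fn stop(&mut self, name: &str) {
        if let Some((_, statuses)) = self.meshes.get_mut(name) {
            for s in statuses.iter_mut() {
                *s = Status::Stopped;
            }
        }
    }

    /// Stand-in for background reconciliation: initializing ranks become running.
    fn poll(&mut self) {
        for (_, statuses) in self.meshes.values_mut() {
            for s in statuses.iter_mut() {
                if *s == Status::Initializing {
                    *s = Status::Running;
                }
            }
        }
    }
}
```

In this sketch the owner only ever calls `create_or_update`, `get_state`, and `stop`, while the controller alone drives status changes, which is the separation of concerns the commit message argues for.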
