From 1d0e83912ff7d3cbc963b156ef3ebe19d86c5c4e Mon Sep 17 00:00:00 2001
From: Marius Eriksen
Date: Fri, 31 Oct 2025 13:26:37 -0700
Subject: [PATCH] [hyperactor] skeleton for mesh controller behavior

# Basic idea: meshes are just resources like any other

The idea is that we treat meshes like any other resource. We define a common behavior for mesh controllers. We stand up a controller actor, and ActorMesh::allocate, ProcMesh::spawn, ActorMesh::spawn, etc. do nothing more than issue a CreateOrUpdate to it. (This makes them nonblocking, which is our eventual goal, but we have a few more things to do before we're fully ready for this.)

We can now query states, etc., by just calling GetState on the *mesh* resource. The mesh resource is managed by the controller, and GetState returns immediately (nonblocking). The controller is responsible for polling/pushing/keepaliving the underlying resources.

# Supervision events

The mesh controller can "raise" supervision events in the following way: when we create a resource, we include a "supervisor" port. The controller synthesizes supervision events and pushes them to this port. In effect, any time a rank in a mesh enters a non-running status, we generate a supervision event. The controller keeps track of which events have already been raised so that it does not duplicate them.

# Rationale

Having a common resource model is very powerful. For one, it establishes a clean separation of concerns (the controller is responsible for managing a mesh; the owner of the mesh only talks to the controller), and it allows us to build common tooling for managing all mesh types, centered around a common interface (the mesh resource behavior). It also means that 1) we can have common tooling for *all* resources (a mesh is now named just by its controller actor+Name, and we can query it like any other resource, e.g., with `hyper`); 2) it integrates cleanly with observability / "Monarchy".
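As an illustration of the deduplication described under "Supervision events", here is a minimal, std-only sketch. It is not the controller code from this patch: `RankStatus`, `SupervisionEvent`, and `SupervisionTracker` are hypothetical names, and the status variants are assumptions standing in for `resource::Status`.

```rust
use std::collections::HashSet;

/// Hypothetical per-rank status (an assumption; the real `resource::Status`
/// is not shown in this patch).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum RankStatus {
    Running,
    Stopped,
    Failed,
}

/// Hypothetical supervision event: which rank left the running state, and how.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct SupervisionEvent {
    pub rank: usize,
    pub status: RankStatus,
}

/// Remembers which (rank, status) pairs have already been reported.
#[derive(Default)]
pub struct SupervisionTracker {
    raised: HashSet<(usize, RankStatus)>,
}

impl SupervisionTracker {
    /// Given the latest per-rank statuses, return only the events that have
    /// not been raised before. A real controller would push these to the
    /// supervisor port rather than return them.
    pub fn observe(&mut self, statuses: &[RankStatus]) -> Vec<SupervisionEvent> {
        let mut events = Vec::new();
        for (rank, &status) in statuses.iter().enumerate() {
            if status == RankStatus::Running {
                continue;
            }
            // `insert` returns true only the first time this pair is seen.
            if self.raised.insert((rank, status)) {
                events.push(SupervisionEvent { rank, status });
            }
        }
        events
    }
}
```

Calling `observe` twice with the same statuses yields events only on the first call, which is the "do not duplicate" property the commit message asks for. Keying on `(rank, status)` is one possible choice; keying on rank alone would instead report each rank at most once.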
Differential Revision: [D85982104](https://our.internmc.facebook.com/intern/diff/D85982104/)

**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D85982104/)!

[ghstack-poisoned]
---
 hyperactor_mesh/src/resource.rs      |  2 ++
 hyperactor_mesh/src/resource/mesh.rs | 53 ++++++++++++++++++++++++++++
 2 files changed, 55 insertions(+)
 create mode 100644 hyperactor_mesh/src/resource/mesh.rs

diff --git a/hyperactor_mesh/src/resource.rs b/hyperactor_mesh/src/resource.rs
index 9cc23e2c6..d4edb73b3 100644
--- a/hyperactor_mesh/src/resource.rs
+++ b/hyperactor_mesh/src/resource.rs
@@ -9,6 +9,8 @@
 //! This modules defines a set of common message types used for managing resources
 //! in hyperactor meshes.
 
+pub mod mesh;
+
 use core::slice::GetDisjointMutIndex as _;
 use std::collections::HashMap;
 use std::fmt;
diff --git a/hyperactor_mesh/src/resource/mesh.rs b/hyperactor_mesh/src/resource/mesh.rs
new file mode 100644
index 000000000..bcc4ce5bd
--- /dev/null
+++ b/hyperactor_mesh/src/resource/mesh.rs
@@ -0,0 +1,53 @@
+/*
+ * Copyright (c) Meta Platforms, Inc. and affiliates.
+ * All rights reserved.
+ *
+ * This source code is licensed under the BSD-style license found in the
+ * LICENSE file in the root directory of this source tree.
+ */
+
+#![allow(dead_code)]
+
+use hyperactor::Named;
+/// This module defines common types for mesh resources. Meshes are managed as
+/// resources, usually by a controller actor implementing the [`crate::resource`]
+/// behavior.
+///
+/// The mesh controller manages all aspects of the mesh lifecycle, and the owning
+/// actor uses the resource behavior directly to query the state of the mesh.
+use ndslice::Extent;
+use serde::Deserialize;
+use serde::Serialize;
+
+use crate::resource::CreateOrUpdate;
+use crate::resource::GetState;
+use crate::resource::Status;
+use crate::resource::Stop;
+use crate::v1::ValueMesh;
+
+/// Mesh specs
+#[derive(Debug, Named, Serialize, Deserialize)]
+pub struct Spec<S> {
+    /// All meshes have an extent
+    extent: Extent,
+    // supervisor: PortHandle
+    /// The mesh-specific spec.
+    spec: S,
+}
+
+/// Mesh states
+#[derive(Debug, Named, Serialize, Deserialize)]
+pub struct State<S> {
+    /// The current status for each rank in the mesh.
+    statuses: ValueMesh<Status>,
+    /// Mesh-specific state.
+    state: S,
+}
+
+// The behavior of a mesh controller.
+// hyperactor::behavior!(
+//     Controller,
+//     CreateOrUpdate<Spec<S>>,
+//     GetState<State<S>>,
+//     Stop,
+// );
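To make the intended control flow concrete, here is a toy, std-only model of the behavior sketched in the commented-out `hyperactor::behavior!` block. It is an assumption-laden illustration, not the hyperactor API. The `Controller` type and its methods are hypothetical, `extent` is simplified to a rank count, and a `Vec<Status>` stands in for `ValueMesh<Status>`. The `Status` variants are assumed as well.

```rust
use std::collections::HashMap;

/// Stand-in for `resource::Status`; these variants are assumptions.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Status {
    Running,
    Stopped,
}

/// Mirrors the patch's `Spec`, with `extent` reduced to a rank count.
#[derive(Clone, Debug, PartialEq)]
pub struct Spec<S> {
    pub extent: usize,
    pub spec: S,
}

/// Mirrors the patch's `State`: per-rank statuses plus mesh-specific state.
#[derive(Clone, Debug, PartialEq)]
pub struct State<S> {
    pub statuses: Vec<Status>,
    pub state: S,
}

/// Hypothetical controller holding the last known state of each named mesh.
pub struct Controller<S> {
    meshes: HashMap<String, (Spec<S>, Vec<Status>)>,
}

impl<S: Clone> Controller<S> {
    pub fn new() -> Self {
        Self { meshes: HashMap::new() }
    }

    /// CreateOrUpdate: record the desired spec and return immediately.
    /// Simplification: every rank is marked Running right away, whereas a
    /// real controller would update statuses as the resources come up.
    pub fn create_or_update(&mut self, name: &str, spec: Spec<S>) {
        let statuses = vec![Status::Running; spec.extent];
        self.meshes.insert(name.to_string(), (spec, statuses));
    }

    /// GetState: answer from the controller's own bookkeeping, without
    /// contacting the underlying resources.
    pub fn get_state(&self, name: &str) -> Option<State<S>> {
        self.meshes.get(name).map(|(spec, statuses)| State {
            statuses: statuses.clone(),
            state: spec.spec.clone(),
        })
    }

    /// Stop: mark every rank stopped; returns false for an unknown mesh.
    pub fn stop(&mut self, name: &str) -> bool {
        match self.meshes.get_mut(name) {
            Some((_, statuses)) => {
                statuses.iter_mut().for_each(|s| *s = Status::Stopped);
                true
            }
            None => false,
        }
    }
}
```

In this model the owner only ever talks to the controller, which matches the separation of concerns argued for in the commit message. All polling and keepalive logic would live behind `create_or_update`, and `get_state` stays a cheap lookup.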