moq-dev · kixelated · Nov 12, 2025 · Nov 12, 2025
@@ -57,14 +57,14 @@ The first requirement for a real-time conferencing application is to discover ot
 Hang does this using moq-lite's ANNOUNCE capabilities.
 
 A room consists of a path.
-Any participants within the room MUST publish a broadcast with the room path as a prefix and it SHOULD end with the `.hang` suffix.
+Any participants within the room MUST publish a broadcast with the room path as a prefix which SHOULD end with the `.hang` suffix.
 
 For example:
 
 ~~~
-/room/alice.hang
-/room/bob.hang
-/other/zoe.hang
+/room123/alice.hang
+/room123/bob.hang
+/room456/zoe.hang
 ~~~
 
 A participant issues an ANNOUNCE_PLEASE message to discover any other participants in the same room.
@@ -94,113 +94,128 @@ A participant MAY forgo publishing a catalog if it does not wish to publish any
 
 The catalog track consists of multiple groups, one for each update.
 Each group contains a single frame with UTF-8 JSON.
-A publisher MUST NOT write multiple frames to a group until a future specification includes a delta-encoding mechanism.
+
+A publisher MUST NOT write multiple frames to a group until a future specification includes a delta-encoding mechanism (via JSON Patch most likely).
 
 ## Root
 The root of the catalog is a JSON document with the following schema:
 
 ~~~
 type Catalog = {
-	"audio": AudioTrack[],
-	"video": VideoTrack[],
+	"audio": AudioSchema | undefined,
+	"video": VideoSchema | undefined,
+	// ... any custom fields ...
 }
 ~~~
 
-When there are multiple audio or video tracks, they SHOULD describe the same content.
-For example, different resolutions, codecs, bitrates, etc.
-If a participant wants to publish unrelated content, for example sharing the screen in addition to a webcam, it SHOULD publish a separate broadcast (and catalog).
-
 Additional fields MAY be added based on the application.
 The catalog SHOULD be mostly static, delegating any dynamic content to other tracks.
-Additionally, a catalog SHOULD describe optional content, allowing the client to decide if it wants to subscribe.
 
-For example, a `"chat"` field should include the name of a chat track, not individual chat messages.
+For example, a `"chat"` section should include the name of a chat track, not individual chat messages.
 This way catalog updates are rare and a client MAY choose to not subscribe.
 
+This specification currently only defines audio and video tracks.
 
 ## Video
 A video track contains the necessary information to decode a video stream.
 
-Hang uses the [VideoDecoderConfig](https://www.w3.org/TR/webcodecs/#video-decoder-config).
-Any Uint8Array fields are hex-encoded into a string.
-
-The `track` field includes the name and priority of the track within the broadcast.
 
 ~~~
-type VideoTrack = {
-	"track": {
-		"name": string,
-		"priority": number,
-	},
-	"config": VideoDecoderConfig,
+type VideoSchema = {
+	"renditions": Map<TrackName, VideoDecoderConfig>,
+	"priority": u8,
+	"display": {
+		"width": number,
+		"height": number,
+	} | undefined,
+	"rotation": number | undefined,
+	"flip": boolean | undefined,
 }
 ~~~
 
+The `renditions` field contains a map of track names to video decoder configurations.
+See the [WebCodecs specification](https://www.w3.org/TR/webcodecs/#video-decoder-config) for specifics and registered codecs.
+Any Uint8Array fields are hex-encoded as a string.
+
 For example:
 
 ~~~
 {
-	"track": {
-		"name": "video",
-		"priority": 2
+	"renditions": {
+		"720p": {
+			"codec": "avc1.64001f",
+			"codedWidth": 1280,
+			"codedHeight": 720,
+			"bitrate": 6000000,
+			"framerate": 30.0
+		},
+		"480p": {
+			"codec": "avc1.64001e",
+			"codedWidth": 848,
+			"codedHeight": 480,
+			"bitrate": 2000000,
+			"framerate": 30.0
+		}
+	},
+	"priority": 2,
+	"display": {
+		"width": 1280,
+		"height": 720
 	},
-	"config": {
-		"codec": "avc1.64001f",
-		"codedWidth": 1280,
-		"codedHeight": 720,
-		"bitrate": 6000000,
-		"framerate": 30.0
-	}
+	"rotation": 0,
+	"flip": false,
 }
 ~~~
 
 
 ## Audio
 An audio track contains the necessary information to decode an audio stream.
 
-The `track` field includes the name and priority of the track within the broadcast.
-
-The `config` field contains an [AudioDecoderConfig](https://www.w3.org/TR/webcodecs/#audio-decoder-config).
-Any Uint8Array fields are hex-encoded into a string.
-
 ~~~
-type AudioTrack = {
-	"track": {
-		"name": string,
-		"priority": number,
-	},
-	"config": AudioDecoderConfig,
+type AudioSchema = {
+	"renditions": Map<TrackName, AudioDecoderConfig>,
+	"priority": u8,
 }
 ~~~
 
+The `renditions` field contains a map of track names to audio decoder configurations.
+See the [WebCodecs specification](https://www.w3.org/TR/webcodecs/#audio-decoder-config) for specifics and registered codecs.
+Any Uint8Array fields are hex-encoded as a string.
+
 For example:
 
 ~~~
 {
-	"track": {
-		"name": "audio",
-		"priority": 1
+	"renditions": {
+		"stereo": {
+			"codec": "opus",
+			"sampleRate": 48000,
+			"numberOfChannels": 2,
+			"bitrate": 128000
+		},
+		"mono": {
+			"codec": "opus",
+			"sampleRate": 48000,
+			"numberOfChannels": 1,
+			"bitrate": 64000
+		}
 	},
-	"config": {
-		"codec": "opus",
-		"sampleRate": 48000,
-		"numberOfChannels": 2,
-		"bitrate": 128000
-	}
+	"priority": 1,
 }
 ~~~
 
-# Media
-Media tracks are split into groups and further into frames.
+# Container
+Audio and video tracks use a lightweight container to encapsulate the media payload.
+
+Each moq-lite group MUST start with a keyframe.
+If codec does not support delta frames (ex. audio), then a group MAY consist of multiple keyframes.
+Otherwise, a group MUST consist of a single keyframe followed by zero or more delta frames.
+
+Each frame starts with a timestamp, a QUIC variable-length integer (62-bit max) encoded in microseconds.
+The remainder of the payload is codec specific; see the WebCodecs specification for specifics.
 
-A group consists of one or more frames in decode order.
-Each group MUST start with a keyframe.
-If a codec supports delta frames (video), then all subsequent frames MUST be delta frames.
-Otherwise, a group MAY consist of multiple keyframes (audio).
+For example, h.264 with no `description` field would be annex.b encoded, while h.264 with a `description` field would be AVCC encoded.
 
-Each "frame" consists of a tiny "container" containing the timestamp and codec specific payload.
-The timestamp is the presentation timestamp in microseconds encoded as a QUIC variable-length integer (62-bit max).
-The remainder of the frame payload is codec specific.
 
 # Security Considerations
 TODO Security