Merged
139 changes: 77 additions & 62 deletions draft-lcurley-moq-hang.md
@@ -57,14 +57,14 @@ The first requirement for a real-time conferencing application is to discover ot
Hang does this using moq-lite's ANNOUNCE capabilities.

A room consists of a path.
Any participants within the room MUST publish a broadcast with the room path as a prefix and it SHOULD end with the `.hang` suffix.
Any participants within the room MUST publish a broadcast with the room path as a prefix which SHOULD end with the `.hang` suffix.

For example:

~~~
/room/alice.hang
/room/bob.hang
/other/zoe.hang
/room123/alice.hang
/room123/bob.hang
/room456/zoe.hang
~~~

A participant issues an ANNOUNCE_PLEASE message to discover any other participants in the same room.
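As a rough illustration of the naming rule above, the following sketch filters a list of announced broadcast paths down to the participants of one room. The helper `participantsInRoom` is hypothetical and not part of hang or moq-lite. It assumes announcements arrive as full paths, and it treats the `.hang` suffix as required even though the text only says SHOULD.

```typescript
// Hypothetical helper, not part of hang or moq-lite.
// Assumes `announced` holds full broadcast paths such as "/room123/alice.hang".
function participantsInRoom(room: string, announced: string[]): string[] {
  // Normalize the room path so "/room123" does not also match "/room1234/...".
  const prefix = room.endsWith("/") ? room : room + "/";
  const suffix = ".hang";
  return announced
    .filter((path) => path.startsWith(prefix) && path.endsWith(suffix))
    .map((path) => path.slice(prefix.length, -suffix.length));
}

const announced = ["/room123/alice.hang", "/room123/bob.hang", "/room456/zoe.hang"];
const names = participantsInRoom("/room123", announced);
```

A stricter or looser policy (for example, accepting broadcasts without the suffix) would be equally consistent with the SHOULD in the text.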
@@ -94,113 +94,128 @@ A participant MAY forgo publishing a catalog if it does not wish to publish any

The catalog track consists of multiple groups, one for each update.
Each group contains a single frame with UTF-8 JSON.
A publisher MUST NOT write multiple frames to a group until a future specification includes a delta-encoding mechanism.

A publisher MUST NOT write multiple frames to a group until a future specification includes a delta-encoding mechanism (via JSON Patch most likely).

## Root
The root of the catalog is a JSON document with the following schema:

~~~
type Catalog = {
"audio": AudioTrack[],
"video": VideoTrack[],
"audio": AudioSchema | undefined,
"video": VideoSchema | undefined,
// ... any custom fields ...
}
~~~
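A minimal sketch of how a subscriber might decode one catalog frame, assuming the frame payload is already available as a `Uint8Array`. The `Catalog` interface and `decodeCatalog` function are illustrative names; the audio and video sections are left untyped here.

```typescript
// Loosely typed view of the catalog root. Both sections are optional,
// and custom application fields are allowed alongside them.
interface Catalog {
  audio?: unknown;
  video?: unknown;
  [custom: string]: unknown;
}

// Hypothetical helper: decode a single catalog frame (UTF-8 JSON).
function decodeCatalog(frame: Uint8Array): Catalog {
  // `fatal: true` makes invalid UTF-8 throw instead of being silently replaced.
  const text = new TextDecoder("utf-8", { fatal: true }).decode(frame);
  const value: unknown = JSON.parse(text);
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    throw new Error("catalog root must be a JSON object");
  }
  return value as Catalog;
}

const catalogFrame = new TextEncoder().encode('{"audio":{"renditions":{},"priority":1}}');
const catalog = decodeCatalog(catalogFrame);
```

Since each group carries exactly one frame, a subscriber would presumably replace its previous catalog with each newly decoded one.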

When there are multiple audio or video tracks, they SHOULD describe the same content.
For example, different resolutions, codecs, bitrates, etc.
If a participant wants to publish unrelated content, for example sharing the screen in addition to a webcam, it SHOULD publish a separate broadcast (and catalog).

Additional fields MAY be added based on the application.
The catalog SHOULD be mostly static, delegating any dynamic content to other tracks.
Additionally, a catalog SHOULD describe optional content, allowing the client to decide if it wants to subscribe.

For example, a `"chat"` field should include the name of a chat track, not individual chat messages.
For example, a `"chat"` section should include the name of a chat track, not individual chat messages.
This way catalog updates are rare and a client MAY choose to not subscribe.

This specification currently only defines audio and video tracks.

## Video
A video track contains the necessary information to decode a video stream.

Hang uses the [VideoDecoderConfig](https://www.w3.org/TR/webcodecs/#video-decoder-config).
Any Uint8Array fields are hex-encoded into a string.

The `track` field includes the name and priority of the track within the broadcast.

~~~
type VideoTrack = {
"track": {
"name": string,
"priority": number,
},
"config": VideoDecoderConfig,
type VideoSchema = {
"renditions": Map<TrackName, VideoDecoderConfig>,
"priority": u8,
"display": {
"width": number,
"height": number,
} | undefined,
"rotation": number | undefined,
"flip": boolean | undefined,
}
~~~

The `renditions` field contains a map of track names to video decoder configurations.
See the [WebCodecs specification](https://www.w3.org/TR/webcodecs/#video-decoder-config) for specifics and registered codecs.
Any Uint8Array fields are hex-encoded as a string.

For example:

~~~
{
"track": {
"name": "video",
"priority": 2
"renditions": {
"720p": {
"codec": "avc1.64001f",
"codedWidth": 1280,
"codedHeight": 720,
"bitrate": 6000000,
"framerate": 30.0
},
"480p": {
"codec": "avc1.64001e",
"codedWidth": 848,
"codedHeight": 480,
"bitrate": 2000000,
"framerate": 30.0
}
},
"priority": 2,
"display": {
"width": 1280,
"height": 720
},
"config": {
"codec": "avc1.64001f",
"codedWidth": 1280,
"codedHeight": 720,
"bitrate": 6000000,
"framerate": 30.0
}
"rotation": 0,
"flip": false
}
~~~
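Because `renditions` maps track names to configurations, a subscriber can choose which track to request. The sketch below shows one possible policy: pick the highest-bitrate rendition that fits a budget, falling back to the lowest one. The policy, the `pickRendition` name, and the trimmed `RenditionConfig` type are assumptions for illustration; the specification does not prescribe a selection algorithm.

```typescript
// Only the fields used by this sketch, as they appear in the example above.
interface RenditionConfig {
  codec: string;
  bitrate?: number;
}

// Hypothetical policy: highest bitrate within the budget, else the lowest available.
function pickRendition(
  renditions: Record<string, RenditionConfig>,
  maxBitrate: number,
): string | undefined {
  // Sort ascending by bitrate; a missing bitrate is treated as 0.
  const entries = Object.entries(renditions).sort(
    ([, a], [, b]) => (a.bitrate ?? 0) - (b.bitrate ?? 0),
  );
  let choice = entries[0]?.[0];
  for (const [name, config] of entries) {
    if ((config.bitrate ?? 0) <= maxBitrate) choice = name;
  }
  return choice;
}

const videoRenditions = {
  "720p": { codec: "avc1.64001f", bitrate: 6000000 },
  "480p": { codec: "avc1.64001e", bitrate: 2000000 },
};
const selected = pickRendition(videoRenditions, 3000000);
```

With a 3 Mbps budget this selects the `480p` track from the example catalog; a real client would likely also consider codec support and display size.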


## Audio
An audio track contains the necessary information to decode an audio stream.

The `track` field includes the name and priority of the track within the broadcast.

The `config` field contains an [AudioDecoderConfig](https://www.w3.org/TR/webcodecs/#audio-decoder-config).
Any Uint8Array fields are hex-encoded into a string.

~~~
type AudioTrack = {
"track": {
"name": string,
"priority": number,
},
"config": AudioDecoderConfig,
type AudioSchema = {
"renditions": Map<TrackName, AudioDecoderConfig>,
"priority": u8,
}
~~~

The `renditions` field contains a map of track names to audio decoder configurations.
See the [WebCodecs specification](https://www.w3.org/TR/webcodecs/#audio-decoder-config) for specifics and registered codecs.
Any Uint8Array fields are hex-encoded as a string.

For example:

~~~
{
"track": {
"name": "audio",
"priority": 1
"renditions": {
"stereo": {
"codec": "opus",
"sampleRate": 48000,
"numberOfChannels": 2,
"bitrate": 128000
},
"mono": {
"codec": "opus",
"sampleRate": 48000,
"numberOfChannels": 1,
"bitrate": 64000
}
},
"config": {
"codec": "opus",
"sampleRate": 48000,
"numberOfChannels": 2,
"bitrate": 128000
}
"priority": 1
}
~~~
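The hex encoding of Uint8Array fields (such as a decoder config's `description`) can be reversed before handing the config to a decoder. The helpers below are a sketch with hypothetical names. Accepting both upper- and lowercase digits is an assumption, since the text does not specify case, and the sample value is made up for illustration.

```typescript
// Hypothetical helper: convert a hex string from the catalog back into bytes.
function hexToBytes(hex: string): Uint8Array {
  if (hex.length % 2 !== 0 || !/^[0-9a-fA-F]*$/.test(hex)) {
    throw new Error("invalid hex string");
  }
  const bytes = new Uint8Array(hex.length / 2);
  for (let i = 0; i < bytes.length; i++) {
    bytes[i] = parseInt(hex.slice(i * 2, i * 2 + 2), 16);
  }
  return bytes;
}

// Hypothetical helper: the inverse, as a publisher might use when writing the catalog.
function bytesToHex(bytes: Uint8Array): string {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
}

// Made-up sample value, not taken from a real stream.
const description = hexToBytes("0164001f");
const roundTrip = bytesToHex(description);
```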

# Media
Media tracks are split into groups and further into frames.
# Container
Audio and video tracks use a lightweight container to encapsulate the media payload.

Each moq-lite group MUST start with a keyframe.
If codec does not support delta frames (ex. audio), then a group MAY consist of multiple keyframes.
Otherwise, a group MUST consist of a single keyframe followed by zero or more delta frames.

Each frame starts with a timestamp, a QUIC variable-length integer (62-bit max) encoded in microseconds.
The remainder of the payload is codec specific; see the WebCodecs specification for specifics.

A group consists of one or more frames in decode order.
Each group MUST start with a keyframe.
If a codec supports delta frames (video), then all subsequent frames MUST be delta frames.
Otherwise, a group MAY consist of multiple keyframes (audio).
For example, H.264 with no `description` field would be Annex B encoded, while H.264 with a `description` field would be AVCC encoded.

Each "frame" consists of a tiny "container" containing the timestamp and codec specific payload.
The timestamp is the presentation timestamp in microseconds encoded as a QUIC variable-length integer (62-bit max).
The remainder of the frame payload is codec specific.
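The frame layout described above can be sketched as a small decoder. It reads the timestamp as a QUIC variable-length integer (the top two bits of the first byte select a length of 1, 2, 4, or 8 bytes) and returns the rest as the codec payload. The `Frame` type and `decodeFrame` name are illustrative. As a simplification, the timestamp is held in a JavaScript `number`, so values above `Number.MAX_SAFE_INTEGER` are rejected rather than supporting the full 62-bit range.

```typescript
interface Frame {
  timestamp: number; // presentation timestamp in microseconds
  payload: Uint8Array; // codec-specific bytes
}

// Hypothetical helper: split one frame into its timestamp and payload.
function decodeFrame(data: Uint8Array): Frame {
  if (data.length === 0) throw new Error("empty frame");
  // Two most significant bits of the first byte: 00 -> 1, 01 -> 2, 10 -> 4, 11 -> 8 bytes.
  const size = 1 << (data[0] >> 6);
  if (data.length < size) throw new Error("truncated timestamp");
  // The remaining six bits of the first byte are the high bits of the value.
  let timestamp = data[0] & 0x3f;
  for (let i = 1; i < size; i++) {
    // Multiply instead of shifting: bitwise operators truncate to 32 bits.
    timestamp = timestamp * 256 + data[i];
  }
  if (timestamp > Number.MAX_SAFE_INTEGER) {
    throw new Error("timestamp exceeds the safe integer range");
  }
  return { timestamp, payload: data.subarray(size) };
}

// A four-byte varint (0x9d7f3e7d) followed by two payload bytes.
const frame = decodeFrame(new Uint8Array([0x9d, 0x7f, 0x3e, 0x7d, 0xaa, 0xbb]));
```

An implementation that needs the full 62-bit range could use `bigint` instead; `subarray` returns a view over the same buffer, so the payload is not copied.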

# Security Considerations
TODO Security