

Collaborator

@rfejgin rfejgin commented Dec 23, 2025

What does this PR do?

Adds the Frechet Codec Distance (FCD) metric and integrates it into the MagpieTTS inference scripts. Also fixes a few minor MagpieTTS inference bugs.

Collection: TTS

Changelog

The Frechet Distance (FD) is commonly used to evaluate generative models (e.g. Frechet Inception Distance, Frechet Audio Distance). In this PR we implement FD in the embedding space of a neural codec. The resulting metric measures how closely the distributions of real and generated codec frames match, at the level of individual frames.
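For reference, FD fits a Gaussian to each set of embeddings and computes FD = ||mu_r - mu_g||^2 + Tr(Sigma_r + Sigma_g - 2 (Sigma_r Sigma_g)^(1/2)), where mu and Sigma are the mean and covariance of the real (r) and generated (g) embeddings. The sketch below computes this with NumPy/SciPy, treating each codec frame as one sample. It is an illustration only, not the PR's frechet_codec_distance.py (which builds on TorchMetrics); the helper names and the assumed (batch, dim, frames) embedding layout are hypothetical.

```python
import numpy as np
from scipy import linalg


def frames_to_samples(embeddings: np.ndarray) -> np.ndarray:
    """Flatten (batch, dim, frames) embeddings into (batch * frames, dim).

    The (batch, dim, frames) layout is an assumption made for this sketch.
    """
    batch, dim, frames = embeddings.shape
    return embeddings.transpose(0, 2, 1).reshape(batch * frames, dim)


def frechet_distance(real: np.ndarray, fake: np.ndarray) -> float:
    """Frechet distance between Gaussians fitted to two (num_samples, dim) arrays."""
    mu_r, mu_f = real.mean(axis=0), fake.mean(axis=0)
    sigma_r = np.atleast_2d(np.cov(real, rowvar=False))
    sigma_f = np.atleast_2d(np.cov(fake, rowvar=False))
    # Matrix square root of the covariance product; drop any tiny imaginary
    # parts that numerical error can introduce.
    covmean = np.real(linalg.sqrtm(sigma_r @ sigma_f))
    diff = mu_r - mu_f
    return float(diff @ diff + np.trace(sigma_r + sigma_f - 2.0 * covmean))
```

Because frames are pooled into one set of samples, temporal ordering is ignored, which is what comparing distributions "at the level of individual frames" implies. Shifting every embedding by a constant leaves the covariance term at zero, so the distance reduces to the squared mean difference.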

Changes:

  • frechet_codec_distance.py: An implementation of FD in codec embedding space. It builds on TorchMetrics' FID implementation, supplying the audio codec as a custom feature extractor.
  • test_frechet_codec_distance.py: Unit test
  • Integration of FCD into the MagpieTTS inference scripts. FCD calculation can be disabled by passing the --disable_fcd command-line argument to magpietts_inference.py.
  • Inference bugfixes
    • Fix a logging statement that was reporting errors due to incorrect formatting syntax
    • Suppress the thousands of log messages emitted while loading the titanet_small speaker representation model. This suppression was present in earlier versions of the inference scripts and appears to have been accidentally lost in recent refactorings
    • Fix an issue where filewise metrics were not being filtered to a specified subset as intended
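One generic way to silence a noisy model load is to raise the logging threshold only for the duration of the call. The sketch below wraps Python's standard logging.disable in a context manager; it is an assumption about technique, not a description of how the inference scripts actually suppress the titanet_small messages, and suppress_logging is a hypothetical name.

```python
import contextlib
import logging
from typing import Iterator


@contextlib.contextmanager
def suppress_logging(level: int = logging.WARNING) -> Iterator[None]:
    """Temporarily drop all log records at or below `level` (process-wide)."""
    previous = logging.root.manager.disable
    logging.disable(level)
    try:
        yield
    finally:
        # Restore whatever global disable level was active before.
        logging.disable(previous)
```

Usage would look like wrapping the speaker-model load in "with suppress_logging():". Since logging.disable is global, records from other threads are also dropped while the block runs, which is usually acceptable for a one-off model load.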

PR Type:

  • New Feature
  • Bugfix
  • Documentation

@github-actions github-actions bot added the TTS label Dec 23, 2025
@rfejgin rfejgin marked this pull request as ready for review December 23, 2025 06:58
@rfejgin rfejgin marked this pull request as draft December 23, 2025 07:11
Signed-off-by: Fejgin, Roy <rfejgin@nvidia.com>
Instead of taking a codec instance, accept a codec name: either a local path or an HF/NGC model name.

This simplifies the metric's integration in calling code.
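A minimal sketch of the name-based interface, assuming a simple rule for telling a local checkpoint apart from a remote model name. resolve_codec_source and its ".nemo" suffix rule are hypothetical and not part of the PR; in NeMo the two branches would typically correspond to restore_from (local file) and from_pretrained (HF/NGC name).

```python
import os
from typing import Tuple


def resolve_codec_source(codec_name: str) -> Tuple[str, str]:
    """Classify `codec_name` as a local checkpoint or a pretrained model name.

    Hypothetical rule: an existing file, or any name ending in ".nemo", is a
    local checkpoint; anything else is treated as an HF/NGC model name.
    """
    path = os.path.expanduser(codec_name)
    if os.path.isfile(path) or path.endswith(".nemo"):
        return "local", path
    return "pretrained", codec_name
```

Treating a missing "*.nemo" path as local is deliberate in this sketch: a mistyped path then fails with a file-not-found error rather than a confusing remote lookup.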

Signed-off-by: Fejgin, Roy <rfejgin@nvidia.com>
Signed-off-by: Fejgin, Roy <rfejgin@nvidia.com>
Signed-off-by: Fejgin, Roy <rfejgin@nvidia.com>
* address some CI linting issues
* include a file that was missed in last commit

Signed-off-by: Fejgin, Roy <rfejgin@nvidia.com>
Signed-off-by: Fejgin, Roy <rfejgin@nvidia.com>
@rfejgin rfejgin marked this pull request as ready for review December 23, 2025 18:27
blisc previously approved these changes Jan 6, 2026
@blisc blisc enabled auto-merge (squash) January 6, 2026 16:12
@rfejgin rfejgin marked this pull request as draft January 7, 2026 00:31
auto-merge was automatically disabled January 7, 2026 00:31

Pull request was converted to draft

* Add (optional) saving of generated codes and FCD calculation to the longform version of inference
* Clean up how disabling FCD is done: make it an explicit part of EvaluationConfig

Signed-off-by: Fejgin, Roy <rfejgin@nvidia.com>
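A minimal sketch of how an opt-out CLI flag can feed an explicit config field. This EvaluationConfig is an illustrative stand-in with a single field named after the with_fcd variable visible in the review diff; build_eval_config is hypothetical, and the real class and its wiring in magpietts_inference.py may differ.

```python
import argparse
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EvaluationConfig:
    """Illustrative stand-in; the real EvaluationConfig has other fields."""

    with_fcd: bool = True


def build_eval_config(argv: Optional[List[str]] = None) -> EvaluationConfig:
    parser = argparse.ArgumentParser()
    # Opt-out flag: FCD is computed unless --disable_fcd is passed.
    parser.add_argument("--disable_fcd", action="store_true")
    args = parser.parse_args(argv)
    return EvaluationConfig(with_fcd=not args.disable_fcd)
```

Keeping the decision in the config object means downstream code (including longform inference) only checks with_fcd and never needs to infer intent from whether a codec path happens to be set.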
Collaborator Author

rfejgin commented Jan 7, 2026

@subhankar-ghosh Could you please review just the latest commit in this PR, titled "Integrate FCD in longform inference and rework --disable_fcd"? It touches EvaluationConfig and longform inference. Thanks!

@rfejgin rfejgin marked this pull request as ready for review January 7, 2026 02:10

-    if codec_model_path is not None:
+    if with_fcd:
         fcd_metric = FrechetCodecDistance(codec_name=codec_model_path).to(device)
Collaborator


I am not sure whether FrechetInceptionDistance (from torchmetrics.image.fid import FrechetInceptionDistance) is installed in the NeMo container by default. If it is not, we might want to check for its availability and log a warning when it is missing.
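A minimal sketch of the suggested guard, assuming a module-level optional import. HAVE_TORCHMETRICS_FID and fcd_enabled are hypothetical names, not existing NeMo helpers.

```python
import logging

logger = logging.getLogger(__name__)

try:
    # Optional dependency: FCD builds on TorchMetrics' FID implementation.
    from torchmetrics.image.fid import FrechetInceptionDistance  # noqa: F401

    HAVE_TORCHMETRICS_FID = True
except ImportError:
    HAVE_TORCHMETRICS_FID = False


def fcd_enabled(requested: bool) -> bool:
    """Return True only if FCD was requested and its dependency is importable."""
    if requested and not HAVE_TORCHMETRICS_FID:
        logger.warning("torchmetrics FID is unavailable; skipping FCD computation.")
        return False
    return requested
```

Degrading to "FCD disabled" with a warning keeps the rest of the evaluation running in containers that lack the dependency.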

         fcd_metric.reset()
     else:
-        fcd = 0.0
+        fcd = float('nan')
Collaborator


Be careful about the value used for missing metrics (e.g. None); check how it is formatted in compute_mean_with_confidence_interval. We should use a uniform default value for the "if <any_metric> is None:" case.
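One way to get a uniform default is to treat None and NaN identically when aggregating. The sketch below is a hypothetical mean_with_ci helper, not NeMo's compute_mean_with_confidence_interval; it uses a normal-approximation interval (z = 1.96 for roughly 95%) and returns NaN whenever the value is undefined.

```python
import math
from typing import Optional, Sequence, Tuple

import numpy as np


def mean_with_ci(values: Sequence[Optional[float]], z: float = 1.96) -> Tuple[float, float]:
    """Mean and CI half-width, skipping None and NaN entries alike.

    Returns (nan, nan) when no valid values remain, so a missing metric such
    as a disabled FCD is reported the same way everywhere.
    """
    valid = np.array([v for v in values if v is not None and not math.isnan(v)], dtype=float)
    if valid.size == 0:
        return float("nan"), float("nan")
    if valid.size == 1:
        # A single sample gives no spread estimate, so the interval is undefined.
        return float(valid[0]), float("nan")
    half_width = z * valid.std(ddof=1) / math.sqrt(valid.size)
    return float(valid.mean()), float(half_width)
```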

Collaborator

@subhankar-ghosh subhankar-ghosh left a comment


Left a few comments at the points where I thought things might break, just for you to double-check. Otherwise LGTM. Make sure the tests pass.
