[TTS] MagpieTTS: Implement Frechet Codec Distance metric + some minor inference bugfixes #15223
nemo/collections/tts/modules/magpietts_inference/evaluate_generated_audio.py
Signed-off-by: Fejgin, Roy <rfejgin@nvidia.com>
Instead of taking a codec instance, accept a codec name: local path or HF/NGC name. This simplifies the metric's integration in calling code.
Signed-off-by: Fejgin, Roy <rfejgin@nvidia.com>
Signed-off-by: Fejgin, Roy <rfejgin@nvidia.com>
Signed-off-by: Fejgin, Roy <rfejgin@nvidia.com>
* address some CI linting issues
* include a file that was missed in last commit
Signed-off-by: Fejgin, Roy <rfejgin@nvidia.com>
8d997ac to 3fc5f37
Signed-off-by: Fejgin, Roy <rfejgin@nvidia.com>
* Add (optional) saving of generated codes and FCD calculation to longform version of inference
* Clean up how disabling FCD is done: make it an explicit part of EvaluationConfig
Signed-off-by: Fejgin, Roy <rfejgin@nvidia.com>
@subhankar-ghosh Could you please review just the latest commit in this PR? That part touches EvaluationConfig and longform inference. I mean the commit titled |
if codec_model_path is not None:
if with_fcd:
    fcd_metric = FrechetCodecDistance(codec_name=codec_model_path).to(device)
I am not sure whether torchmetrics is installed in the NeMo container by default; the metric relies on `from torchmetrics.image.fid import FrechetInceptionDistance`. If it is not installed, we might want to check for its availability and log a warning when it is missing.
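One possible shape for such a check, as a minimal sketch rather than the PR's actual code: the helper names `module_available` and `resolve_with_fcd` are hypothetical, and only the module path `torchmetrics.image.fid` comes from the comment above.

```python
import importlib.util
import logging


def module_available(name: str) -> bool:
    """Return True if `name` can be located without fully importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:
        # find_spec imports parent packages of a dotted name and raises
        # ModuleNotFoundError (a subclass of ImportError) if one is missing.
        return False


def resolve_with_fcd(requested: bool, module_name: str = "torchmetrics.image.fid") -> bool:
    """Hypothetical helper: disable FCD with a warning if its dependency is absent."""
    if requested and not module_available(module_name):
        logging.warning("%s is not installed; skipping FCD computation.", module_name)
        return False
    return requested
```

The calling code could then pass `with_fcd = resolve_with_fcd(with_fcd)` before constructing the metric, so a missing dependency degrades to a warning instead of an import error.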
fcd_metric.reset()
else:
fcd = 0.0
fcd = float('nan')
Be careful about using None as a metric value; check how it is formatted in `compute_mean_with_confidence_interval`. We should have one uniform default value for the `if <any_metric> is None:` case.
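To illustrate the concern, here is a small sketch of an aggregation that treats None and NaN uniformly. It is not the real `compute_mean_with_confidence_interval`; the names `mean_with_ci` and `format_metric`, the normal-approximation factor 1.96, and the "n/a" placeholder are all assumptions made for the example.

```python
import math
import statistics


def mean_with_ci(values, z=1.96):
    """Return (mean, half_width), ignoring entries that are None or NaN."""
    clean = [v for v in values if v is not None and not math.isnan(v)]
    if not clean:
        # No usable samples: report NaN for both, never None.
        return float("nan"), float("nan")
    mean = statistics.fmean(clean)
    if len(clean) < 2:
        # A single sample has no spread estimate.
        return mean, float("nan")
    half_width = z * statistics.stdev(clean) / math.sqrt(len(clean))
    return mean, half_width


def format_metric(mean, half_width):
    """Format a metric so that a disabled metric always prints the same way."""
    if math.isnan(mean):
        return "n/a"
    if math.isnan(half_width):
        return f"{mean:.4f}"
    return f"{mean:.4f} +/- {half_width:.4f}"
```

With a single sentinel (NaN here), a disabled FCD and a missing value from any other metric would both flow through the same formatting path.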
subhankar-ghosh
left a comment
Left a few comments. They mark the points where I thought things might break, just for you to double check. Otherwise LGTM. Make sure the tests pass.
What does this PR do?
Adds the Frechet Codec Distance metric and integrates it in MagpieTTS inference scripts. Also fixes some minor MagpieTTS inference bugs.
Collection: TTS
Changelog
The Frechet Distance (FD) is commonly used to evaluate generative models (e.g. Frechet Inception Distance, Frechet Audio Distance). In this PR we implement FD in the embedding space of a neural codec. The resulting metric measures how closely the distributions of real and generated codec frames match at the single-frame level.
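For intuition, the Frechet distance between two Gaussians fitted to the real and generated embeddings is the squared distance between the means plus a trace term, Tr(S_r + S_g - 2 (S_r S_g)^(1/2)), over the covariances S_r and S_g. The sketch below is a simplified, standard-library-only illustration that assumes diagonal covariances, so the trace term reduces to a per-dimension sum. It is not the code in this PR, which builds on TorchMetrics' FID implementation; the function name `diagonal_frechet_distance` is hypothetical.

```python
import math


def _mean_and_variance(frames):
    """Per-dimension mean and unbiased variance of a list of equal-length vectors."""
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    var = [sum((f[d] - mean[d]) ** 2 for f in frames) / (n - 1) for d in range(dim)]
    return mean, var


def diagonal_frechet_distance(real_frames, generated_frames):
    """Frechet distance between Gaussians fitted to two frame sets.

    Simplifying assumption: covariances are diagonal. Each set needs at
    least two frames so that the variance is defined.
    """
    mu_r, var_r = _mean_and_variance(real_frames)
    mu_g, var_g = _mean_and_variance(generated_frames)
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_r, mu_g))
    cov_term = sum(vr + vg - 2.0 * math.sqrt(vr * vg) for vr, vg in zip(var_r, var_g))
    return mean_term + cov_term


real = [[0.0, 0.0], [2.0, 2.0]]
generated = [[1.0, 1.0], [3.0, 3.0]]
distance = diagonal_frechet_distance(real, generated)  # same spread, means shifted by (1, 1)
```

In this toy example the two sets have identical variances, so only the mean term contributes and the distance equals 2.0; identical sets give 0.0.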
Changes:
* frechet_codec_distance.py: An implementation of FD in codec embedding space. Builds on TorchMetrics' FID implementation. We provide the audio codec as a custom feature extractor.
* test_frechet_coec_distance.py: Unit test
* --disable_fcd command line argument to magpietts_inference.py
* Support for the titanet_small speaker representation model. This was present in earlier versions of the inference scripts and appears to have been accidentally lost in recent refactorings
PR Type: