Conversation

@kausv (Contributor) commented Aug 18, 2025

Summary:
Test link: https://www.internalfb.com/intern/test/281475203207916

The test is flaky because the KVZCH kernel only [guarantees an accuracy of 1e-2](https://www.internalfb.com/code/fbsource/[35a43c0e43e5]/fbcode/deeplearning/fbgemm/fbgemm_gpu/test/tbe/ssd/ssd_split_tbe_training_test.py?lines=1399-1402) for FP16.
I changed test_model_parallel_base to accept a custom tolerance that overrides the default atol/rtol, and added that tolerance to this test to resolve the flakiness.

Differential Revision: D80457783
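The tolerance override described above can be sketched as follows. This is a minimal, self-contained illustration, not the actual TorchRec change: the helper name `assert_close`, the default tolerance values, and the `custom_tolerance` parameter are all hypothetical stand-ins for the comparison used in test_model_parallel_base.

```python
# Minimal sketch of a comparison helper that accepts a custom tolerance
# overriding its default atol/rtol (hypothetical names and defaults; the
# real test compares tensors, not plain floats).

DEFAULT_ATOL = 1e-8
DEFAULT_RTOL = 1e-5


def assert_close(actual, expected, custom_tolerance=None):
    # custom_tolerance is an optional (atol, rtol) pair; when given, it
    # replaces the defaults -- e.g. (1e-2, 1e-2) to match a kernel that
    # only guarantees 1e-2 accuracy for FP16.
    atol, rtol = custom_tolerance or (DEFAULT_ATOL, DEFAULT_RTOL)
    for a, e in zip(actual, expected):
        assert abs(a - e) <= atol + rtol * abs(e), (
            f"{a} != {e} within atol={atol}, rtol={rtol}"
        )


# Fails under the tight defaults, passes with the looser FP16 tolerance:
assert_close([1.004], [1.0], custom_tolerance=(1e-2, 1e-2))
```

A test that wraps a low-precision kernel would pass `custom_tolerance=(1e-2, 1e-2)` at its call site, leaving every other caller on the strict defaults.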
@meta-cla bot added the CLA Signed label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) on Aug 18, 2025
@facebook-github-bot (Contributor) commented

This pull request was exported from Phabricator. Differential Revision: D80457783


Labels: CLA Signed, fb-exported