MATLAB Tensor Core models

Open in MATLAB Online

Overview

This repository provides accurate tensor core models written in MATLAB. It also includes part of the model validation data that was used to refine the models, as shown in [1].

The models directory contains MATLAB models of the tensor cores in several NVIDIA GPUs, all of which are built on the parameterised model in Generic_BFMA_TC.m. For example, B200TC.m models the General Matrix Multiply (GEMM) based on an accurate model of a tensor core in the NVIDIA Blackwell B200 GPUs. In the current version of the toolbox, the models take matrices and the input and output floating-point formats as inputs, and multiply the matrices using a recursive summation algorithm to accumulate the results of several tensor core invocations.
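
As a rough illustration of this accumulation scheme (a minimal sketch only: the block size and the plain double-precision arithmetic are assumptions, not the toolbox's format-aware implementation), a long inner product can be evaluated by repeated k-term block-FMA invocations:

% Sketch: accumulate a 64-term inner product in blocks of k terms,
% mimicking repeated tensor core invocations (no rounding modelled).
k = 16;
x = rand(1, 64);  y = rand(64, 1);
s = 0;                               % running sum, updated once per block
for j = 1:k:numel(x)
    blk = j:min(j+k-1, numel(x));
    s = s + x(blk) * y(blk);         % one block-FMA invocation
end

In the models themselves, each such step is additionally subject to the rounding behaviour of the simulated hardware and the chosen output format.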

The initial analysis of the behaviour of GPU tensor cores is performed with the code available at IEEE_HPEC2025_block_FMA_tests. It is based on the generalised testing methodology [2], which determines the following features of hardware that computes mixed-precision inner products (a simplified probe for one of these features is sketched after the list):

  • Support for subnormal numbers
  • Presence of extra bits for significand alignment in multi-term addition
  • Availability of extra carry bits
  • Normalization patterns in multi-term floating-point addition
  • Supported rounding modes
  • Effective FMA size (i.e., number of terms accumulated before a single normalization)
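
For a flavour of how such features can be probed (a simplified sketch in the spirit of [2], not the actual test code from that repository), support for subnormal inputs can be checked by passing the model the smallest positive binary16 subnormal and testing whether it survives multiplication by one:

% Sketch: probe support for subnormal binary16 inputs (illustrative).
x = 2^(-24);                  % smallest positive subnormal in binary16
A = zeros(4);  B = zeros(4);
A(1,1) = x;    B(1,1) = 1;
D = B200TC(1, A, B, 1, 0, 'binary16', 'binary32');
if D(1,1) == x
    disp('Subnormal inputs supported.');
else
    disp('Subnormal inputs flushed to zero.');
end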

The model_validation directory contains part of the model validation data that was used in [1] to refine the models and verify their bit-accurate behaviour against the corresponding GPUs. The full-sized experiments and data are not stored in this repository but are available on request.

The experiments directory contains various experiments with some of the models, which were performed to produce the plots in [1]. These can serve as examples of how to utilise the models.

Dependencies and installation

  1. Set up the custom precision floating-point format simulator CPFloat.
  2. Add models/ to the MATLAB search path.
  3. Add models/tools to the MATLAB search path (see the commands after this list).
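
Assuming the repository was cloned into a folder called MATLAB-tensor-core and that CPFloat is already set up (both paths below are illustrative and should be adjusted to the actual locations), steps 2 and 3 amount to:

% Illustrative paths; adjust to the actual clone location.
addpath('MATLAB-tensor-core/models');
addpath('MATLAB-tensor-core/models/tools');
savepath;   % optional: keep the paths across MATLAB sessions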

Example: Using the built-in models

The following example rounds two random matrices to fp16 and multiplies them using the model of the B200 tensor core. Note that B200TC computes a GEMM; here the alpha and beta scale factors are set to 1 and the C matrix to 0.

>> inopts.format = 'binary16';
>> outopts.format = 'binary32';
>> A = cpfloat(rand(4,4), inopts);
>> B = cpfloat(rand(4,4), inopts);
>> B200TC(1, A, B, 1, 0, inopts.format, outopts.format)

ans =

   0.995566666126251   1.208170533180237   1.368334889411926   1.017799258232117
   0.991239666938782   1.084852933883667   1.350871562957764   1.328557014465332
   1.190854787826538   1.693876862525940   1.763551592826843   1.278026223182678
   0.901759386062622   1.838499188423157   1.608222723007202   1.265371918678284

The following example instead uses an 8-bit floating-point format for the inputs to the B200 tensor core model (the binary32 output format is reused from the previous example).

>> inopts.format = 'fp8-e4m3';
>> A = cpfloat(rand(4,4), inopts);
>> B = cpfloat(rand(4,4), inopts);
>> B200TC(1, A, B, 1, 0, inopts.format, outopts.format)

ans =

   0.390136718750000   0.589843750000000   0.625976562500000   0.748046875000000
   1.180175781250000   1.117187500000000   1.220703125000000   1.935546875000000
   1.267822265625000   0.752929687500000   0.867187500000000   1.813476562500000
   1.007812500000000   1.242187500000000   1.395996093750000   1.740234375000000

Example: Setting up the NVIDIA B200 model

While the B200 tensor core model comes with this toolbox, below is a minimal example of setting it up. The input matrices are assumed to be rounded to the appropriate formats with CPFloat. The model in B200.m provides a more detailed setup that changes the parameters of the generalised model based on all possible input/output format combinations.

% Inputs, rounded to the appropriate formats with CPFloat
informat = 'binary16';  outformat = 'binary32';  inopts.format = informat;
A = cpfloat(rand(4,4), inopts);
B = cpfloat(rand(4,4), inopts);
alpha = 1;  beta = 1;  C = zeros(4);

% Default parameters assuming fp16 input and fp32 output
def_params.fma    = 16;       % fused multiply-add (FMA) size
def_params.neab   = 2;        % TC extra alignment bits
def_params.frmode = 'rz';     % TC final rounding mode
def_params.inter_pattern = 1; % interleave two 16-element vectors

D = GEMM(alpha, A, B, beta, C, informat, outformat, def_params);
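
The other GPU models in the models directory are built on the same parameterised Generic_BFMA_TC.m core, so they differ mainly in the values assigned to these parameters.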

References

[1] F. A. Khattak and M. Mikaitis, Accurate Models of NVIDIA Tensor Cores. arXiv:2512.07004 [cs.MS]. Dec. 2025.
[2] F. A. Khattak and M. Mikaitis, Generalized Methodology for Determining Numerical Features of Hardware Floating-Point Matrix Multipliers: Part I. 2025 IEEE High Performance Extreme Computing Conference (HPEC). Sep. 2025.