Add predefined datatypes for bfloat16 data #5402

jhendersonHDF · 2025-03-21T23:05:29Z

Adds predefined datatypes for little- and big-endian bfloat16 data

Does not add support for any native bfloat16 types; datatype conversions are performed in software

Also adds missing float16 predefined types to fortran

Important

Add predefined datatypes for bfloat16 in little- and big-endian formats, including tests and updates across multiple files.

Behavior:
- Adds predefined datatypes FLOAT_BFLOAT16BE and FLOAT_BFLOAT16LE in H5PredType.cpp and H5PredType.h.
- Updates H5T.c to initialize bfloat16 types.
- Adds bfloat16 support in h5import.c and h5util.c for data conversion.
- Adds bfloat16 tests in test_lite.c, dt_arith.c, and dtypes.c.
Fortran:
- Adds missing float16 and new bfloat16 types in H5_f.c and H5_ff.F90.
Java:
- Adds bfloat16 constants in HDF5Constants.java.
Testing:
- Adds bfloat16 test files tbfloat16.h5 and tbfloat16_be.h5.
- Updates CMakeTests.cmake to include bfloat16 tests.
Misc:
- Updates H5LTanalyze.l and H5LTparse.c for bfloat16 token handling.
- Updates H5trace.c to handle bfloat16 in trace arguments.

^{This description was created by}^{for d4bbe19. You can customize this summary. It will automatically update as commits are pushed.}

jhendersonHDF · 2025-03-21T23:06:00Z

Essentially finished, pending CI results and some potential testing on a big-endian system

jhendersonHDF · 2025-03-21T23:06:41Z

doxygen/dox/DDLBNF200.dox

            H5T_IEEE_F32BE     | H5T_IEEE_F32LE     |
            H5T_IEEE_F64BE     | H5T_IEEE_F64LE     |
-            H5T_NATIVE_FLOAT16 | H5T_NATIVE_FLOAT   | 
+            H5T_FLT_BF16BE     | H5T_FLT_BF16LE     |


Open to any alternative names that seem more fitting, but this type is distinct from the IEEE standard, so I basically created a new category for alternative floating-point formats.

Perhaps "FLOAT" instead of "FLT"?

Seems reasonable to me

Agree - I like "FLOAT" better also

Actually, maybe "NONSTD", since it's a contrast to the "IEEE" label?

So... H5T_NONSTD_BF16 or H5T_NONSTD_BFLOAT16? I like "FLOAT" explicitly.

The issue I see with that is that it's not unlikely that these types get adopted into some standard in the future, whether it's IEEE or not.

jhendersonHDF · 2025-03-21T23:09:41Z

src/H5Tnative.c

+            if (NULL == (bf16_be_dt = H5I_object(H5T_FLT_BF16BE)))
+                HGOTO_ERROR(H5E_ARGS, H5E_BADTYPE, NULL, "not a data type");
+
+            /* Promote bfloat16 to float instead of float16, as it


bfloat16 types should be promoted to float instead of float16, as the type is the same size as float16, but a different format. Converting between bfloat16 and float is also very simple (by design).

jhendersonHDF · 2025-03-21T23:12:11Z

tools/lib/h5tools_type.c

-            if (size == 2)
-                p_type = H5Tcopy(H5T_IEEE_F16LE);
+            if (size == 2) {
+                if (true == H5Tequal(tid, H5T_IEEE_F16LE) || true == H5Tequal(tid, H5T_IEEE_F16BE))


Had to make sure the correct type between float16 and bfloat16 is picked so that the data comes out correctly.

jhendersonHDF · 2025-03-21T23:14:12Z

tools/test/h5dump/h5dumpgentest.c

+    hsize_t dims[2], adims[1];
+
+    /*
+     * bfloat16 keeps approximately the same range as the IEEE 32-bit


Eventually better tests can be written, but for now this PR just adds support for predefined types and doesn't add support for a native type. GCC and Clang both have support for a __bf16 type at this point though, so it should be doable in the future.

qkoziol · 2025-03-22T02:12:57Z

I'll try to review tomorrow and get you some feedback

epourmal · 2025-03-24T17:18:58Z

It was my undesrtanding that middle part "_IEEE_" or similar represents architecture or a standard. Should we follow the same rule here?

…

On Mon, Mar 24, 2025 at 12:11 PM Quincey Koziol ***@***.***> wrote: ***@***.**** commented on this pull request. ------------------------------ In doxygen/dox/DDLBNF200.dox <#5402 (comment)>: > H5T_IEEE_F32BE | H5T_IEEE_F32LE | H5T_IEEE_F64BE | H5T_IEEE_F64LE | - H5T_NATIVE_FLOAT16 | H5T_NATIVE_FLOAT | + H5T_FLT_BF16BE | H5T_FLT_BF16LE | Agree - I like "FLOAT" better also — Reply to this email directly, view it on GitHub <#5402 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ADLFT3POWBKSASYVAQ4VSTD2WA4CRAVCNFSM6AAAAABZRCFMH2VHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDOMJRGEZTANZUHA> . You are receiving this because you are subscribed to this thread.Message ID: ***@***.***>

qkoziol · 2025-03-24T17:20:58Z

It was my undesrtanding that middle part "IEEE" or similar represents architecture or a standard. Should we follow the same rule here?
…
On Mon, Mar 24, 2025 at 12:11 PM Quincey Koziol @.> wrote: @.* commented on this pull request. ------------------------------ In doxygen/dox/DDLBNF200.dox <#5402 (comment)>: > H5T_IEEE_F32BE | H5T_IEEE_F32LE | H5T_IEEE_F64BE | H5T_IEEE_F64LE | - H5T_NATIVE_FLOAT16 | H5T_NATIVE_FLOAT | + H5T_FLT_BF16BE | H5T_FLT_BF16LE | Agree - I like "FLOAT" better also — Reply to this email directly, view it on GitHub <#5402 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADLFT3POWBKSASYVAQ4VSTD2WA4CRAVCNFSM6AAAAABZRCFMH2VHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDOMJRGEZTANZUHA . You are receiving this because you are subscribed to this thread.Message ID: @.***>

Right - that was why I thought "NONSTD" might be better

qkoziol · 2025-03-24T17:38:23Z

Maybe "NONSTD_BFLOAT16" ?

jhendersonHDF · 2025-03-24T17:40:07Z

It was my undesrtanding that middle part "IEEE" or similar represents architecture or a standard. Should we follow the same rule here?

I'm not sure that this is necessarily the case and I'd argue we should consider abandoning that. For example, H5T_STD_B32BE doesn't really mean anything, because AFAIK there's really no standard type for this. C23 does appear to introduce those, but we don't target C23 yet.

jhendersonHDF · 2025-03-24T17:42:02Z

Maybe "NONSTD_BFLOAT16" ?

That name is a bit weird to me because bfloat16 is the standard. There's no other standard for the type, it's just that the standard isn't an IEEE standard.

jhendersonHDF · 2025-03-24T17:43:54Z

Also consider I plan to add support for FP8, FP6 and FP4 following this PR, and in each case the formats essentially are the standard until they have wider adoption. https://arxiv.org/abs/2209.05433 for example.

qkoziol · 2025-03-24T17:44:15Z

Since Google created it, maybe "GOOGLE_BFLOAT16" ?

ajelenak · 2025-03-24T17:45:35Z

I was thinking the same...

epourmal · 2025-03-24T17:53:09Z

Also consider I plan to add support for FP8, FP6 and FP4 following this PR, and in each case the formats essentially are the standard until they have wider adoption. https://arxiv.org/abs/2209.05433 for example.

Frankly speaking, I would drop middle part for these types and document their implementation. I would vote against introducing GOOGLE.

jhendersonHDF · 2025-03-24T17:55:41Z

Since Google created it, maybe "GOOGLE_BFLOAT16" ?

Perhaps. The reason I used BF16 on the end part was just to match our existing conventions like H5T_IEEE_F32LE, especially because there will be types like F8E4M3 (FP8 with 4 exponent bits and 3 mantissa bits). That said, I also like FLOAT being spelled out explicitly, but I still think it's much more natural to just use H5T_FLOAT_ so that we don't tie the names to any particular standard, architecture or company unless it makes sense to. I see this as "one datatype of class H5T_FLOAT is called H5T_FLOAT_BF16". I'd definitely like to avoid the STD vs. NONSTD naming if possible, because I'm fairly certain these types will be adopted into some standard in the future.

ajelenak · 2025-03-24T18:26:34Z

I was ok with H5T_FLOAT_BF16 to begin with but enjoy discussing alternative naming options.

jhendersonHDF · 2025-03-25T18:44:41Z

For the time being, I'm going to proceed on the other datatypes using H5T_FLOAT_ names until we come to a resolution.

I propose that we use a convention of H5T_<TYPE_CLASS>_<OPTIONAL(?)_ARCH_OR_STANDARD>_<SPECIFICS> going forward and welcome any feedback if others think differently. This leads to names like H5T_FLOAT_INTEL_F32, H5T_FLOAT_IEEE_F32LE, H5T_INTEGER_I32BE, etc. The question mark on optional is because I'm thinking it might be worth it to always mention a standard or architecture to be consistent, leading to, for example, H5T_INTEGER_C_I32BE for int32_t. Note that this is already the convention I used for predefined complex datatypes (H5T_COMPLEX_IEEE_F32LE, for example). I believe this form does a better job of actually distinguishing what a particular datatype is. For example, H5T_IEEE_F32LE tells you that the datatype comes from or relates to an IEEE standard, but you rely on the "F32" part to tell you that it's a floating-point type. If additional non-floating-point types are added to IEEE standards, this may get confusing. For example, if we wanted to add support for the decimal floating-point types, I imagine we'd currently use something like H5T_IEEE_D32LE (maybe H5T_IEEE_DF32LE). However, that "D" would be somewhat in conflict with the current H5T_UNIX_D32LE and related types, where "D" apparently represents time data. If an IEEE standard included time types (for whatever reason), this would just complicate things more. A different letter than "D" could of course be picked to represent decimal floating-point datatypes, but I feel adding the type class gives context to anything in the "specifics" part of the name and helps deal with any conflict between lettering. It may also be worth revisiting what exactly should be in the "specifics" part of the name.

epourmal · 2025-03-25T20:03:30Z

Makes sense, but I would leave old types as they are and use new standard only for the new types.

jhendersonHDF · 2025-03-25T20:43:57Z

Makes sense, but I would leave old types as they are and use new standard only for the new types.

I certainly wouldn't want to go changing old type names as it would be nothing but a source of annoyance (though we can always introduce new macros that just point to the old names if we wanted).

Copilot

Pull Request Overview

This PR adds support for predefined datatypes for bfloat16 data (both little- and big-endian) and includes missing float16 predefined types for Fortran. Key changes include updates to native type retrieval in src/H5Tnative.c, new datatype definitions in Java JNI and C++ bindings, and corresponding updates in documentation and tests.

Reviewed Changes

Copilot reviewed 48 out of 48 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
src/H5Tnative.c	Updated H5T__get_native_float signature and incorporated bfloat16 support logic.
src/H5Tmodule.h & src/H5T.c	Defined new macros and initialized new datatype IDs for bfloat16.
java/src/jni/h5util.c	Added conditional branches for bfloat16 in datatype resolution.
java/src/jni/h5Constants.c, HDF5Constants.java	Added JNI constant definitions for the new bfloat16 datatypes.
hl/, fortran/, doxygen/, c++/src/	Updated tests, parsers, documentation, and language bindings to reflect the new datatypes.

java/src/jni/h5util.c

Adds predefined datatypes for little- and big-endian bfloat16 data Does not add support for any native bfloat16 types; datatype conversions are performed in software

brtnfld · 2025-07-07T17:22:51Z

@ellipsis-dev, review PR.

ellipsis-dev

Caution

Changes requested ❌

Reviewed everything up to d4bbe19 in 4 minutes and 6 seconds. Click for details.

Reviewed 5276 lines of code in 48 files
Skipped 0 files when reviewing.
Skipped posting 14 draft comments. View those below.
Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.

1. tools/test/h5dump/h5dumpgentest.h:128

Draft comment:
New bfloat16 test function declarations (gent_bfloat16 and gent_bfloat16_be) have been added. Ensure their implementations exist and are correctly linked.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50% None

2. tools/test/h5gentest.c:266

Draft comment:
Calls to gent_bfloat16() and gent_bfloat16_be() have been inserted into the test generator. Verify that these functions produce the expected bfloat16 outputs.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50% None

3. tools/test/h5ls/CMakeTests.cmake:363

Draft comment:
New test macros for bfloat16 types (tbfloat16 and tbfloat16_be) are added. Confirm that the expected output files match the new bfloat16 formatting.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50% None

4. hl/src/H5LTanalyze.c:983

Draft comment:
The #line directive still includes a double slash in the file path ("hl/src//H5LTanalyze.c"). This may be a typographical error and should be corrected to a single slash if not intentional.
Reason this comment was not posted:
Comment was on unchanged code.

5. hl/src/H5LTparse.c:1423

Draft comment:
Typo: The #line directive's file path contains a double slash ("hl/src//H5LTparse.y"). Consider correcting it to a single slash for clarity.
Reason this comment was not posted:
Comment was on unchanged code.

6. hl/src/H5LTparse.c:1753

Draft comment:
Typo in comment: "pushd onto the stack" might be intended as "pushed onto the stack".
Reason this comment was not posted:
Comment was on unchanged code.

7. hl/src/H5LTparse.c:2057

Draft comment:
There appears to be an extra slash in the file path "hl/src//H5LTparse.c". Consider using a single slash "hl/src/H5LTparse.c" to avoid potential confusion.
Reason this comment was not posted:
Comment was on unchanged code.

8. hl/src/H5LTparse.c:2061

Draft comment:
There appears to be an extra slash in the file path "hl/src//H5LTparse.c" here as well. Please update to "hl/src/H5LTparse.c".
Reason this comment was not posted:
Comment was on unchanged code.

9. test/dt_arith.c:2065

Draft comment:
Typo: Consider using "endianness" instead of "endian-ness" for consistency.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 0% vs. threshold = 50% While "endianness" is more commonly used today, "endian-ness" is also an acceptable spelling. The change is purely cosmetic and doesn't improve code quality or fix any actual issues. The comment violates the rule about not making purely informative comments that don't require code changes. The spelling "endian-ness" is technically valid even if less common. The error message is still perfectly clear with either spelling. Even though both spellings are valid, we should err on the side of not making purely stylistic suggestions that don't materially improve the code. The comment should be removed as it suggests a purely stylistic change that doesn't meaningfully improve the code.

10. test/dtypes.c:6971

Draft comment:
Typo: The error message for the bfloat16 bit field check for H5T_FLOAT_BFLOAT16BE is missing a closing parenthesis. Please add a ')' after listing the expected bit field values.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 0% vs. threshold = 50% While the comment is technically correct about the missing parenthesis, this is in a test file's error message that would only be seen if the test fails. The missing parenthesis doesn't affect functionality. Error message formatting in test files is a very minor issue that doesn't warrant a PR comment. The missing parenthesis could make the error message harder to read if the test fails, potentially making debugging more difficult. Even with the missing parenthesis, the error message is still understandable and the values are clearly listed. This is too minor of an issue to block or delay a PR. Delete this comment as it points out an extremely minor formatting issue in a test file's error message that doesn't impact functionality.

11. tools/src/h5import/h5import.c:2441

Draft comment:
Typographical: In the error string for OUTPUT-SIZE, there is an extra space before the period. Please remove the space after the closing parenthesis, i.e., change "(32, 64) ." to "(32, 64).".
Reason this comment was not posted:
Comment was on unchanged code.

12. tools/test/h5dump/CMakeTestsXML.cmake:311

Draft comment:
There's trailing whitespace at the end of this line. Please remove the extra spaces.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% While the comment is technically correct about the trailing whitespace, this is a very minor style issue that doesn't affect functionality. Most IDEs automatically strip trailing whitespace, and this isn't worth interrupting the PR review process over. The rules state not to make comments that are obvious or unimportant. The trailing whitespace could be considered a style violation that should be fixed for consistency. Some projects have strict style requirements. While style consistency is good, this is too minor of an issue to warrant a PR comment. It can be fixed automatically by tools and doesn't impact readability or functionality. Delete this comment as it points out an unimportant whitespace issue that doesn't meaningfully impact the code.

13. tools/test/h5dump/h5dumpgentest.c:12872

Draft comment:
Typo: Consider changing "data for these testing files isn't particular interesting or precise" to "data for these testing files isn't particularly interesting or precise".
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% While the comment is technically correct and points out a real typo, it's a very minor documentation issue that doesn't affect code functionality. The rules state we should not make purely informative comments or comments that are obvious/unimportant. Documentation typos generally fall into this category unless they make the meaning unclear. The typo could potentially make the meaning slightly less clear. Documentation quality is important for maintainability. While documentation quality matters, this particular typo is extremely minor and the meaning is still perfectly clear from context. The comment adds more noise than value. Delete this comment as it points out a trivial documentation issue that doesn't meaningfully impact code quality or clarity.

14. tools/test/h5dump/h5dumpgentest.c:12948

Draft comment:
Typo: Consider changing "data for these testing files isn't particular interesting or precise" to "data for these testing files isn't particularly interesting or precise".
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% While the comment is technically correct about the grammar, it's a very minor issue in documentation. The rules state not to make purely informative comments or comments that are obvious/unimportant. A small grammatical error in a comment doesn't impact code functionality or readability significantly. The typo could be seen as affecting code quality and professionalism. Documentation is part of the codebase and should be held to high standards. While documentation quality matters, this is an extremely minor issue that doesn't impact understanding. The rules explicitly say not to make unimportant comments. Delete this comment as it's too minor and doesn't affect code functionality or understanding.

Workflow ID: wflow_AYKNFjhGHAyv8xgg

^{You can customize}^{by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.}

test/dtypes.c

tools/src/h5import/h5import.c

jhendersonHDF commented Mar 21, 2025

View reviewed changes

nbagha1 added this to HDF5 - TRIAGE & TRACK May 24, 2025

github-project-automation bot moved this to To be triaged in HDF5 - TRIAGE & TRACK May 24, 2025

nbagha1 moved this from To be triaged to In progress in HDF5 - TRIAGE & TRACK May 24, 2025

nbagha1 added this to the Release 2.0.0 milestone May 27, 2025

jhendersonHDF force-pushed the feature/bfloat16_predefined branch 2 times, most recently from adc8a52 to 8563514 Compare June 1, 2025 23:52

jhendersonHDF marked this pull request as ready for review June 2, 2025 00:23

jhendersonHDF requested review from byrnHDF, derobins, brtnfld, epourmal, lrknox, fortnern, vchoi-hdfgroup, bmribler, glennsong09 and mattjala as code owners June 2, 2025 00:23

gheber requested a review from Copilot June 11, 2025 22:28

Copilot AI reviewed Jun 11, 2025

View reviewed changes

java/src/jni/h5util.c Outdated Show resolved Hide resolved

java/src/jni/h5util.c Outdated Show resolved Hide resolved

jhendersonHDF force-pushed the feature/bfloat16_predefined branch 2 times, most recently from da4a32c to 408c6bc Compare June 16, 2025 20:42

brtnfld requested a review from qkoziol June 19, 2025 18:09

jhendersonHDF force-pushed the feature/bfloat16_predefined branch from 408c6bc to 10d9991 Compare June 19, 2025 18:55

Add predefined datatypes for bfloat16 data

d4bbe19

Adds predefined datatypes for little- and big-endian bfloat16 data Does not add support for any native bfloat16 types; datatype conversions are performed in software

jhendersonHDF force-pushed the feature/bfloat16_predefined branch from 10d9991 to d4bbe19 Compare June 19, 2025 19:31

ellipsis-dev bot reviewed Jul 7, 2025

View reviewed changes

test/dtypes.c Show resolved Hide resolved

tools/src/h5import/h5import.c Show resolved Hide resolved

brtnfld removed the Priority - 1. High label Jul 10, 2025

nbagha1 assigned fortnern and unassigned jhendersonHDF Jul 22, 2025

Uh oh!

Add predefined datatypes for bfloat16 data #5402

Are you sure you want to change the base?

Add predefined datatypes for bfloat16 data #5402

Uh oh!

Conversation

jhendersonHDF commented Mar 21, 2025 • edited by ellipsis-dev bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jhendersonHDF commented Mar 21, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

qkoziol commented Mar 22, 2025

Uh oh!

epourmal commented Mar 24, 2025 via email

Uh oh!

qkoziol commented Mar 24, 2025

Uh oh!

qkoziol commented Mar 24, 2025

Uh oh!

jhendersonHDF commented Mar 24, 2025

Uh oh!

jhendersonHDF commented Mar 24, 2025

Uh oh!

jhendersonHDF commented Mar 24, 2025

Uh oh!

qkoziol commented Mar 24, 2025

Uh oh!

ajelenak commented Mar 24, 2025

Uh oh!

epourmal commented Mar 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jhendersonHDF commented Mar 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ajelenak commented Mar 24, 2025

Uh oh!

jhendersonHDF commented Mar 25, 2025

Uh oh!

epourmal commented Mar 25, 2025

Uh oh!

jhendersonHDF commented Mar 25, 2025

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Uh oh!

Uh oh!

brtnfld commented Jul 7, 2025

Uh oh!

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

jhendersonHDF commented Mar 21, 2025 •

edited by ellipsis-dev bot

Loading

epourmal commented Mar 24, 2025 •

edited

Loading

jhendersonHDF commented Mar 24, 2025 •

edited

Loading