Skip to content

Conversation

@voidbar
Copy link
Contributor

@voidbar voidbar commented Nov 10, 2025

Fixes #378

According to the docs:

hugetlb.<hugepagesize>.events
A read-only flat-keyed file which exists on non-root cgroups.

max
   The number of allocation failure due to HugeTLB limit

@akhilerm
Copy link
Member

@voidbar You can refer https://github.com/containerd/project/blob/main/CONTRIBUTING.md#sign-your-work to add the DCO to fix the Project Checks failure

@voidbar voidbar force-pushed the hugetlb-fix branch 2 times, most recently from 574c655 to 7417fb9 Compare November 11, 2025 09:33
@voidbar
Copy link
Contributor Author

voidbar commented Nov 11, 2025

@akhilerm done

@akhilerm
Copy link
Member

/cc @dcantah

@akhilerm akhilerm requested a review from dcantah November 11, 2025 09:53
cgroup2/utils.go Outdated
Comment on lines 422 to 426
events := make(map[string]uint64)
err := readKVStatsFile(path, "hugetlb."+pagesize+".events", events)
if err != nil {
log.L.WithError(err).Errorf("failed to read hugetlb events for %s", filepath.Join(path, "hugetlb."+pagesize+".events"))
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this fail hard if the file doesn't exist, or ignore non-existing (os.IsNotExist(err)) ? (Not sure if it's guaranteed to exist?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the docs it can be absent
A read-only flat-keyed file which **exists on non-root** cgroups.
we can wrap this logic in existence check and fail hard in case we failed to parse it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it looks like getStatFileContentUint64 (used for non-KV-files) is fairly relaxed on errors, and at most logs an error;

cgroups/cgroup2/utils.go

Lines 266 to 293 in f1e92d8

// Gets uint64 parsed content of single value cgroup stat file
func getStatFileContentUint64(filePath string) uint64 {
f, err := os.Open(filePath)
if err != nil {
return 0
}
defer f.Close()
// We expect an unsigned 64 bit integer, or a "max" string
// in some cases.
buf := make([]byte, 32)
n, err := f.Read(buf)
if err != nil {
return 0
}
trimmed := strings.TrimSpace(string(buf[:n]))
if trimmed == "max" {
return math.MaxUint64
}
res, err := parseUint(trimmed, 10, 64)
if err != nil {
log.L.Errorf("unable to parse %q as a uint from Cgroup file %q", trimmed, filePath)
return res
}
return res

To keep aligned with that behavior, perhaps a getKVStatsFileContentUint64 utility that handles things similarly would make sense; it could get the name of the property looked for ("max") passed as an argument, which would allow returning early (instead of reading all properties potentialy present in the file);

// Gets uint64 parsed content of key-value cgroup stat file
func getKVStatsFileContentUint64(filePath string, propertyName string) uint64 {
	f, err := os.Open(filepath.Join(path, file))
	if err != nil {
		return 0
	}
	defer f.Close()

	s := bufio.NewScanner(f)
	for s.Scan() {
		name, value, err := parseKV(s.Text())
		if name == propertyName {
			if err != nil {
				log.L.Errorf("unable to parse %q as a uint from Cgroup file %q: %w", propertyName, filePath, err)
				return 0
			}
			return value
		}
	}
	return s.Err()
}

Perhaps a log for failing to open the file for reasons other than the file not being found would be good (the above was based on the getStatFileContentUint64 code, which doesn't do so, but ... maybe should?).

Do we know if these KV-files could also have magic values (like max)? I see the other utility has some special handling for that before attempting to parse it as an uint;

cgroups/cgroup2/utils.go

Lines 282 to 288 in f1e92d8

trimmed := strings.TrimSpace(string(buf[:n]))
if trimmed == "max" {
return math.MaxUint64
}
res, err := parseUint(trimmed, 10, 64)
if err != nil {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hugetlb..max
Set/show the hard limit of “hugepagesize” hugetlb usage. The default value is “max”. It exists for all the cgroup except root.
the max value is used for hugetlb..max file @thaJeztah

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I implemented a simple getKVStatsFileContentUint64. I think it strikes the balance between being too strict and too lenient. It follows the same assumptions as getStatsFileContentUint64 and provides only the value required.

@voidbar
Copy link
Contributor Author

voidbar commented Nov 17, 2025

@akhilerm @thaJeztah can we take this to main?

Copy link
Member

@AkihiroSuda AkihiroSuda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

Signed-off-by: Tomer Lev <me@tomerlev.com>
@AkihiroSuda AkihiroSuda merged commit 4fc9769 into containerd:main Nov 19, 2025
8 checks passed
@Charlesm54
Copy link

Why do I have still this issue on my Ubuntu server.
Running this on zimaboard with runtipi dashboard...
My Proxmox servers are not affected...only my server on zimaboard.

Client: Docker
Version:
Engine - Community
API version:
29.1.2
Go version:
1.52
Git commit:
g01.25.5
890dcca
Tue Dec
Built:
2 21:55:19
2025
OS/Arch:
linux/amd64
Context:
default
Engine:
Server: Docker Engine - Community
Version:
API version:
Go version:
Git commit:
29.1.2
1.52 (minimum
go1 .25.5
de45c2a
Tue Dec 2 21:55:19
linux/amd64
false
version
1.24)
Built:
OS/Arch:
2025
Experimental:
containerd:
Version:
V2.2.0
GitCommit:
1c4457e00facac03ce1d75f7b6777a7a851e5runc:
Version:
GitCommit:
1.3.4
docker-init:
V1.3.4-0-gd6d73eb8
Version:
0.19.0
de40ad0
GitCommit:

@thaJeztah
Copy link
Member

@Charlesm54 containerd v2.2.0 does not yet have this fix; it's back ported to be included in a v2.2.1 patch release, but it's not released yet.

@dmcgowan dmcgowan changed the title hugetlb: correctly parse hugetlb.<size>.events files Fix parsing of hugetlb.<size>.events files Dec 18, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Unable to parse "max 0" as a uint from Cgroup file "/sys/fs/cgroup/.../hugetlb.<size>.events"

7 participants