Skip to content

Conversation

wtripp180901
Copy link
Contributor

CEEMS currently only supports the standard systemd cgroup layout, but certain OpenStack configurations don't use systemd. Adds a collector.cgroups.no-systemd-mode flag to support this config. Currently only supports libvirt collector but support for other collectors could be added if there is interest in this PR.

@mahendrapaipuri
Copy link
Collaborator

Thanks a lot @wtripp180901 for this PR. Really appreciate your time and effort.

I think we are taking both systemd and cgroup drivers into account for k8s and AFAIK SLURM does not really support non-systemd in production. So, I think adding the support to Libvirt should be "complete".

Do you have access to any production systems that use non-systemd layout? If so, could you please zip the contents in /sys/fs/cgroup so that I can add e2e tests? Cheers!

@mahendrapaipuri mahendrapaipuri added the enhancement New feature or request label Jul 8, 2025
@wtripp180901
Copy link
Contributor Author

Great, yes I should be able to get that data. Where in the repo would that live? Or shall I get it to you directly somehow?

@mahendrapaipuri
Copy link
Collaborator

Cheers!! That would be super helpful if you can extract some data. Actually that data lives in sys.ttar file.

It is my bad that I have not made a CONTRIBUTING.md file. You can attach the file here in the issues and I will add them to test resources.

Comment on lines 148 to 152

noSystemdMode = CEEMSExporterApp.Flag(
"collector.cgroups.no-systemd-mode",
"Set if running on a non-systemd host",
).Default("false").Bool()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of using a CLI flag (which users have to configure), we should be able to detect the cgroup layout, right?! This is what we are doing for k8s to detect if we are using cgroupfs or systemd drivers. We can use a similar logic here as well. We can add a field to cgroupManager something like nonSystemdLayout bool and set it to true when machine is found. Something like:

// Discover all cgroup layout
for _, slice := range []string{"machine", "machine.slice"} {
	if _, err := os.Stat(filepath.Join(*cgroupfsPath, activeSubsystem, slice)); err == nil {
		manager.slices = append(manager.slices, slice)
                if slice == "machine" {
                   manager.nonSystemdLayout = true
                }
                break // This should be fine as there will atmost one of machine or machine.slice exist at any given time
	}
}

Eventually we can use the manager.nonSysytemdLayout when parsing number of vCPUs. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah that sounds sensible

var (
libvirtCgroupPathRegex = regexp.MustCompile("^.*/(?:.+?)-qemu-(?:[0-9]+)-(?P<id>instance-[0-9a-f]+)(?:.*$)")
libvirtCgroupPathRegex = regexp.MustCompile("^.*/(?:.+?)-qemu-(?:[0-9]+)-(?P<id>instance-[0-9a-f]+)(?:.*$)")
libvirtCgroupNoSystemdPathRegex = regexp.MustCompile("^.*/(?:.+?)qemu-(?:[0-9]+)-(?P<id>instance-[0-9a-f]+)(?:.*$)")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a regex regexp.MustCompile("^.*/(?:.+?)qemu-(?:[0-9]+)-(?P<id>instance-[0-9a-f]+)(?:.*$)") must work for both cases. So, I think there is no need to define two?!

@wtripp180901
Copy link
Contributor Author

Hi @mahendrapaipuri , production cgroup info might be hard to get hold of because of concerns over leaking info, but I can provide data from a virtual dev openstack hypervisor. It's nested virt so there could be some data missing, but the layout of the qemu scopes looks similar enough to prod. Let me know if this is good enough
cgroup_dump.zip

@mahendrapaipuri
Copy link
Collaborator

@wtripp180901 This is great, what you have provided is more than enough!! Thanks a lot! Really appreciate it!!

Copy link
Collaborator

@mahendrapaipuri mahendrapaipuri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thanks @wtripp180901 for this PR. Appreciate your time and effort.

@mahendrapaipuri mahendrapaipuri merged commit e12bc17 into ceems-dev:main Jul 16, 2025
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants