diff --git a/assets/contributors.csv b/assets/contributors.csv
index a9e1572658..52f3d1587b 100644
--- a/assets/contributors.csv
+++ b/assets/contributors.csv
@@ -104,4 +104,5 @@ Alejandro Martinez Vicente,Arm,,,,
Mohamad Najem,Arm,,,,
Ruifeng Wang,Arm,,,,
Zenon Zhilong Xiu,Arm,,zenon-zhilong-xiu-491bb398,,
-Zbynek Roubalik,Kedify,,,,
\ No newline at end of file
+Zbynek Roubalik,Kedify,,,,
+Yahya Abouelseoud,Arm,,,,
\ No newline at end of file
diff --git a/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/1_Overview.md b/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/1_Overview.md
new file mode 100644
index 0000000000..c26a391eaa
--- /dev/null
+++ b/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/1_Overview.md
@@ -0,0 +1,19 @@
---
title: Overview
weight: 2

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Linux kernel profiling with Arm Streamline

Performance tuning is not limited to user-space applications—kernel modules can also benefit from careful analysis. [Arm Streamline](https://developer.arm.com/Tools%20and%20Software/Streamline%20Performance%20Analyzer) is a powerful software profiling tool that helps developers understand performance bottlenecks, hotspots, and memory usage, even inside the Linux kernel. This learning path explains how to use Arm Streamline to profile a simple kernel module.

### Why profile a kernel module?

Kernel modules often operate in performance-critical paths, such as device drivers or networking subsystems. Even a small inefficiency in a module can affect overall system performance. Profiling enables you to:

- Identify hotspots (the functions consuming the most CPU cycles)
- Measure cache and memory behavior
- Understand call stacks for debugging performance issues
diff --git a/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/2_build_kernel_image.md b/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/2_build_kernel_image.md
new file mode 100644
index 0000000000..03860d4453
--- /dev/null
+++ b/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/2_build_kernel_image.md
@@ -0,0 +1,71 @@
---
title: Build Linux image
weight: 3

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Build a debuggable kernel image

In this learning path we will use [Buildroot](https://github.com/buildroot/buildroot) to build a Linux image for the Raspberry Pi 3B+ with a debuggable Linux kernel. We will profile both a Linux kernel module built out of tree and a Linux device driver built inside the kernel source tree.

1. Clone the Buildroot repository, load the default configuration for the Raspberry Pi 3, and open the configuration menu.

    ```bash
    git clone https://github.com/buildroot/buildroot.git
    cd buildroot
    make raspberrypi3_64_defconfig
    make menuconfig
    ```

2. Change the Buildroot configuration to enable debugging symbols and SSH access.
+ + ```plaintext + Build options ---> + [*] build packages with debugging symbols + gcc debug level (debug level 3) + [*] build packages with runtime debugging info + gcc optimization level (optimize for debugging) ---> + + System configuration ---> + [*] Enable root login with password + (****) Root password # Choose root password here + + Kernel ---> + Linux Kernel Tools ---> + [*] perf + + Target packages ---> + Networking applications ---> + [*] openssh + [*] server + [*] key utilities + ``` + + You might also need to change your default `sshd_config` file according to your network settings. To do that, you need to modify System configuration→ Root filesystem overlay directories to add a directory that contains your modified `sshd_config` file. + +3. By default the Linux kernel images are stripped so we will need to make the image debuggable as we'll be using it later. + + ```bash + make linux-menuconfig + ``` + + ```plaintext + Kernel hacking ---> + -*- Kernel debugging + Compile-time checks and compiler options ---> + Debug information (Rely on the toolchain's implicit default DWARF version) + [ ] Reduce debugging information #un-check + ``` + +4. Now we can build the Linux image and flash it to the the SD card to run it on the Raspberry Pi. + + ```bash + make -j$(nproc) + ``` + +It will take some time to build the Linux image. When it completes, the output will be in `/output/images/sdcard.img` +For details on flashing the SD card image, see [this helpful article](https://www.ev3dev.org/docs/tutorials/writing-sd-card-image-ubuntu-disk-image-writer/). +Now that we have a target running Linux with a debuggable kernel image, we can start writing our kernel module that we want to profile. diff --git a/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/3_OOT_module.md b/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/3_OOT_module.md new file mode 100644 index 0000000000..420bb00662 --- /dev/null +++ b/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/3_OOT_module.md @@ -0,0 +1,252 @@ +--- +title: Build out-of-tree kernel module +weight: 4 + +### FIXED, DO NOT MODIFY +layout: learningpathall +--- + +## Creating the Linux Kernel Module + +We will now learn how to create an example Linux kernel module (Character device) that demonstrates a cache miss issue caused by traversing a 2D array in column-major order. This access pattern is not cache-friendly, as it skips over most of the neighboring elements in memory during each iteration. + +To build the Linux kernel module, start by creating a new directory—We will call it **example_module**—in any location of your choice. Inside this directory, add two files: `mychardrv.c` and `Makefile`. + +**Makefile** + +```makefile +obj-m += mychardrv.o +BUILDROOT_OUT := /opt/rpi-linux/buildroot/output # Change this to your buildroot output directory +KDIR := $(BUILDROOT_OUT)/build/linux-custom +CROSS_COMPILE := $(BUILDROOT_OUT)/host/bin/aarch64-buildroot-linux-gnu- +ARCH := arm64 + +all: + $(MAKE) -C $(KDIR) M=$(PWD) ARCH=$(ARCH) CROSS_COMPILE=$(CROSS_COMPILE) modules + +clean: + $(MAKE) -C $(KDIR) M=$(PWD) clean +``` + +{{% notice Note %}} +Change **BUILDROOT_OUT** to the correct buildroot output directory on your host machine +{{% /notice %}} + +**mychardrv.c** + +```c +// SPDX-License-Identifier: GPL-2.0 +#include "linux/printk.h" +#include +#include +#include +#include + +// Using fixed major and minor numbers just for demonstration purposes. 

#define MAJOR_VERSION_NUM 42
#define MINOR_VERSION_NUM 0
#define MODULE_NAME "mychardrv"
#define MAX_INPUT_LEN 64

static struct cdev my_char_dev;

/**
 * char_dev_cache_traverse() - Traverse a 2D matrix and sum its elements.
 * @size: The size of the matrix (number of rows and columns).
 *
 * Allocates a 2D matrix of integers, initializes it with the sum of its
 * indices, and then calculates the sum of its elements by accessing them in a
 * cache-unfriendly column-major order.
 *
 * Return: 0 on success, or -ENOMEM if memory allocation fails.
 */
static int char_dev_cache_traverse(long size) {
  int i, j;
  long sum = 0;
  int **matrix;

  // Allocate rows
  matrix = kmalloc_array(size, sizeof(int *), GFP_KERNEL);
  if (!matrix)
    return -ENOMEM;

  // Allocate columns and initialize matrix
  for (i = 0; i < size; i++) {
    matrix[i] = kmalloc_array(size, sizeof(int), GFP_KERNEL);
    if (!matrix[i]) {
      for (int n = 0; n < i; n++) {
        kfree(matrix[n]);
      }
      kfree(matrix);
      return -ENOMEM;
    }

    for (j = 0; j < size; j++)
      matrix[i][j] = i + j;
  }

  // Access in cache-UNFRIENDLY column-major order
  for (j = 0; j < size; j++) {
    for (i = 0; i < size; i++) {
      sum += matrix[i][j];
    }
  }

  pr_info("Sum: %ld\n", sum);

  // Free memory
  for (i = 0; i < size; i++)
    kfree(matrix[i]);
  kfree(matrix);

  return 0;
}

/**
 * char_dev_write() - Get the size of the matrix to be created from user space.
 */
static ssize_t char_dev_write(struct file *file, const char __user *buff,
                              size_t length, loff_t *offset) {
  ssize_t ret = 0;
  char *kbuf;
  long size_value;

  (void)file;
  (void)offset;

  // Reject input that would not fit in the kernel buffer
  // (including the terminating NUL appended below)
  if (length >= MAX_INPUT_LEN)
    return -EINVAL;

  // Allocate kernel buffer
  kbuf = kmalloc(MAX_INPUT_LEN, GFP_KERNEL);
  if (!kbuf)
    return -ENOMEM;

  // Copy data from user space to kernel space
  if (copy_from_user(kbuf, buff, length)) {
    ret = -EFAULT;
    goto out;
  }
  kbuf[length] = '\0';

  // Convert string to long (base 10)
  ret = kstrtol(kbuf, 10, &size_value);
  if (ret)
    goto out;

  // Call cache traversal function
  ret = char_dev_cache_traverse(size_value);
  if (ret)
    goto out;

  ret = length;

out:
  kfree(kbuf);
  return ret;
}

static int char_dev_open(struct inode *node, struct file *file) {
  (void)file;
  pr_info("%s is open - Major(%d) Minor(%d)\n", MODULE_NAME,
          MAJOR(node->i_rdev), MINOR(node->i_rdev));
  return 0;
}

static int char_dev_release(struct inode *node, struct file *file) {
  (void)file;
  pr_info("%s is released - Major(%d) Minor(%d)\n", MODULE_NAME,
          MAJOR(node->i_rdev), MINOR(node->i_rdev));
  return 0;
}

// File operations structure
static const struct file_operations dev_fops = {
    .owner = THIS_MODULE,
    .open = char_dev_open,
    .release = char_dev_release,
    .write = char_dev_write,
};

static int __init char_dev_init(void) {
  int ret;

  // Reserve the device number region (major 42, minor 0)
  ret = register_chrdev_region(MKDEV(MAJOR_VERSION_NUM, MINOR_VERSION_NUM), 1,
                               MODULE_NAME);
  if (ret < 0)
    return ret;

  // Initialize the cdev structure and add it to the kernel
  cdev_init(&my_char_dev, &dev_fops);
  ret = cdev_add(&my_char_dev, MKDEV(MAJOR_VERSION_NUM, MINOR_VERSION_NUM), 1);
  if (ret < 0) {
    unregister_chrdev_region(MKDEV(MAJOR_VERSION_NUM, MINOR_VERSION_NUM), 1);
    return ret;
  }

  return ret;
}

static void __exit char_dev_exit(void) {
  cdev_del(&my_char_dev);
  unregister_chrdev_region(MKDEV(MAJOR_VERSION_NUM, MINOR_VERSION_NUM), 1);
}

module_init(char_dev_init);
module_exit(char_dev_exit);

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Yahya Abouelseoud");
MODULE_DESCRIPTION("A simple char driver with a cache-miss issue");
```

The module above receives the size of a 2D array as a string through the `char_dev_write()` function, converts it to an integer, and passes it to the `char_dev_cache_traverse()` function. This function then creates the 2D array, initializes it with simple data, traverses it in column-major (cache-unfriendly) order, computes the sum of its elements, and prints the result to the kernel log.
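
For comparison, a cache-friendly version of the same traversal visits the elements in row-major order, so consecutive accesses fall within the same cache lines. The sketch below shows what the eventual fix to `char_dev_cache_traverse()` would look like once profiling has confirmed the problem:

```c
// Cache-friendly row-major traversal: the inner loop walks consecutive
// elements of a row, so most accesses hit data already in the cache.
for (i = 0; i < size; i++) {
  for (j = 0; j < size; j++) {
    sum += matrix[i][j];
  }
}
```

We deliberately keep the column-major version in the module so that Streamline has a clear bottleneck to reveal in the next sections.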
+} + +module_init(char_dev_init); +module_exit(char_dev_exit); + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Yahya Abouelseoud"); +MODULE_DESCRIPTION("A simple char driver with cache misses issue"); +``` + +The module above receives the size of a 2D array as a string through the `char_dev_write()` function, converts it to an integer, and passes it to the `char_dev_cache_traverse()` function. This function then creates the 2D array, initializes it with simple data, traverses it in a column-major (cache-unfriendly) order, computes the sum of its elements, and prints the result to the kernel log. + +## Building and Running the Kernel Module + +1. To compile the kernel module, run make inside the example_module directory. This will generate the output file `mychardrv.ko`. + +2. Transfer the .ko file to the target using scp command and then insert it using insmod command. After inserting the module, we create a character device node using mknod command. Finally, we can test the module by writing a size value (e.g., 10000) to the device file and measuring the time taken for the operation using the `time` command. + + ```bash + scp mychardrv.ko root@:/root/ + ``` + + {{% notice Note %}} + Replace \ with your own target IP address + {{% /notice %}} + +3. To run the module on the target, we need to run the following commands on the target: + + ```bash + ssh root@ + + #The following commands should be running on target device + + insmod /root/mychardrv.ko + mknod /dev/mychardrv c 42 0 + ``` + + {{% notice Note %}} + 42 and 0 are the major and minor number we chose in our module code above + {{% /notice %}} + +4. Now if you run dmesg you should see something like: + + ```log + [12381.654983] mychardrv is open - Major(42) Minor(0) + ``` + +5. To make sure it's working as expected you can use the following command: + + ```bash { output_lines = "2-4" } + time echo '10000' > /dev/mychardrv + # real 0m 38.04s + # user 0m 0.00s + # sys 0m 38.03s + ``` + + The command above passes 10000 to the module, which specifies the size of the 2D array to be created and traversed. The **echo** command takes a long time to complete (around 38 seconds) due to the cache-unfriendly traversal implemented in the `char_dev_cache_traverse()` function. + +With the kernel module built, the next step is to profile it using Arm Streamline. We will use it to capture runtime behavior, highlight performance bottlenecks, and help identifying issues such as the cache-unfriendly traversal in our module. diff --git a/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/4_sl_profile_OOT.md b/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/4_sl_profile_OOT.md new file mode 100644 index 0000000000..a5950cd2ac --- /dev/null +++ b/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/4_sl_profile_OOT.md @@ -0,0 +1,93 @@ +--- +title: Profile out-of-tree kernel module +weight: 5 + +### FIXED, DO NOT MODIFY +layout: learningpathall +--- + +## Use Streamline to profile an out-of-tree kernel module + +Arm Streamline is a tool that uses sampling to measure system performance. Instead of recording every single event (like instrumentation does, which can slow things down), it takes snapshots of hardware counters and system registers at regular intervals. This gives a statistical view of how the system runs, while keeping the overhead small. + +Streamline tracks many performance metrics such as CPU usage, execution cycles, memory access, cache hits and misses, and GPU activity. 

With the kernel module built, the next step is to profile it using Arm Streamline. We will use it to capture runtime behavior, highlight performance bottlenecks, and help identify issues such as the cache-unfriendly traversal in our module.
diff --git a/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/4_sl_profile_OOT.md b/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/4_sl_profile_OOT.md
new file mode 100644
index 0000000000..a5950cd2ac
--- /dev/null
+++ b/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/4_sl_profile_OOT.md
@@ -0,0 +1,93 @@
---
title: Profile out-of-tree kernel module
weight: 5

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Use Streamline to profile an out-of-tree kernel module

Arm Streamline is a tool that uses sampling to measure system performance. Instead of recording every single event (like instrumentation does, which can slow things down), it takes snapshots of hardware counters and system registers at regular intervals. This gives a statistical view of how the system runs, while keeping the overhead small.

Streamline tracks many performance metrics such as CPU usage, execution cycles, memory accesses, cache hits and misses, and GPU activity. By putting this information together, it helps developers see how their code is using the hardware. Captured data is presented on a timeline, so you can see how performance changes as your program runs. This makes it easier to notice patterns, find bottlenecks, and link performance issues to specific parts of your application.

For more details about Streamline and its features, refer to the [Streamline user guide](https://developer.arm.com/documentation/101816/latest/Getting-started-with-Streamline/Introduction-to-Streamline).

Streamline is included with Arm Performance Studio, which you can download and use for free from [Arm Performance Studio downloads](https://developer.arm.com/Tools%20and%20Software/Arm%20Performance%20Studio#Downloads).

For step-by-step guidance on setting up Streamline on your host machine, follow the installation instructions in the [Streamline installation guide](https://developer.arm.com/documentation/101816/latest/Getting-started-with-Streamline/Install-Streamline).

### Pushing Gator to the Target and Making a Capture

Once Streamline is installed on the host machine, you can capture trace data for our Linux kernel module.

1. To communicate with the target, Streamline requires a daemon, called **gatord**, to be installed and running on the target. gatord must be running before you can capture trace data. There are two pre-built gatord binaries available in Streamline's install directory, one for *Armv7 (AArch32)* and one for *Armv8 or later (AArch64)*. Push **gatord** to the target device using **scp**.

    ```bash
    scp <streamline_install_dir>/streamline/bin/linux/arm64/gatord root@<target_ip>:/root/gatord
    # use arm instead of arm64 if you are using an AArch32 target
    ```

2. Run gatord on the target to start system-wide capture mode.

    ```bash
    /root/gatord -S yes -a
    ```

    ![Gator command#center](./images/img01_gator_cmd.png)

3. Open Streamline and choose *TCP mode*.

4. Enter your target hostname or IP address.
![Streamline TCP settings#center](./images/img02_streamline_tcp.png)

5. Click *Select counters* to open the counter configuration dialog. To learn more about counters and how to configure them, refer to the [counter configuration guide](https://developer.arm.com/documentation/101816/latest/Capture-a-Streamline-profile/Counter-Configuration).

6. Add `L1 Data Cache: Refill` and `L1 Data Cache: Access` and enable Event-Based Sampling (EBS) for both of them, as shown in the screenshot, then click *Save*.

    {{% notice %}}
    To learn more about EBS, refer to the [Streamline user guide](https://developer.arm.com/documentation/101816/9-7/Capture-a-Streamline-profile/Counter-Configuration/Setting-up-event-based-sampling)
    {{% /notice %}}

    ![Counter configuration#center](./images/img03_counter_config.png)

7. In the Command section, add the same shell command we used earlier to test our Linux module.

    ```bash
    sh -c "echo 10000 > /dev/mychardrv"
    ```

    ![Streamline command#center](./images/img04_streamline_cmd.png)

8. In the Capture settings dialog, select Add image, add your kernel module file `mychardrv.ko`, and click Save.
![Capture settings#center](./images/img05_capture_settings.png)

9. Start the capture and enter a name and location for the capture file. Streamline will start collecting data, and the charts will show activity being captured from the target.
![Streamline timeline#center](./images/img06_streamline_timeline.png)

### Analyze the capture and inspect the code

Once the capture is stopped, Streamline automatically analyzes the collected data and provides insights to help identify performance issues and bottlenecks. This section describes how to view these insights, starting with locating the functions related to our kernel module and narrowing down to the exact lines of code that may be responsible for the performance problems.

1. Open the *Functions tab*. In the counters list, select one of the counters we configured earlier in the counter configuration dialog, as shown:

![Counter selection#center](./images/img07_select_datasource.png)

2. In the Functions tab, observe that the function `char_dev_cache_traverse()` has the highest L1 cache refill rate, as we expected.
   Also notice the Image name on the right, which is our module file name `mychardrv.ko`:

![Functions tab#center](./images/img08_Functions_Tab.png)

3. To view the call path of this function, right-click the function name and choose *Select in Call Paths*.

4. You can now see the exact function that called `char_dev_cache_traverse()`. In the Locations column, notice that the function calls started in user space (the echo command) and terminated in the kernel-space module `mychardrv.ko`:
![Call paths tab#center](./images/img09_callpaths_tab.png)

5. Since we compiled our kernel module with debug info, we can see the exact code lines that are causing these cache misses.
   To do so, double-click the function name and the *Code tab* opens. This view shows how much each code line contributed to the cache misses, and in the bottom half of the code view you can also see the disassembly of these lines with the counter values for each assembly instruction:
![Code tab#center](./images/img10_code_tab.png)

{{% notice Note %}}
You may need to configure path prefix substitution in the Code tab to view the source code correctly. For details on how to set this up and for more information about code analysis, refer to the [Streamline user guide](https://developer.arm.com/documentation/101816/latest/Analyze-your-capture/Analyze-your-code?lang=en)
{{% /notice %}}
\ No newline at end of file
diff --git a/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/5_inTree_kernel_driver.md b/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/5_inTree_kernel_driver.md
new file mode 100644
index 0000000000..cfa99ef04d
--- /dev/null
+++ b/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/5_inTree_kernel_driver.md
@@ -0,0 +1,60 @@
---
title: Build in-tree kernel driver
weight: 6

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Build an in-tree Linux kernel driver

Now that we have learned how to build and profile an out-of-tree kernel module, we will move on to building a driver statically into the Linux kernel. We will then profile it by adding the kernel's vmlinux file as an image in Streamline's capture settings. This allows us to view function calls and call paths as before, and also to inspect specific sections of the kernel code that may be contributing to performance issues.

### Creating an in-tree simple character device driver

We will use the same example character driver we created earlier, `mychardrv`, except that this time we will statically link it into the kernel.

1. Go to your kernel source directory. In our case, it is located in Buildroot's output directory at `output/build/linux-custom`.

2. Copy the `mychardrv.c` file created earlier into the `drivers/char` directory.

    ```bash
    cd drivers/char
    cp <path_to_example_module>/mychardrv.c .
    ```

3. Add the following configuration to the `Kconfig` file in this directory, just above the final `endmenu` line, to make the kernel configuration system aware of the new driver we just added.

    ```plaintext
    config MYCHAR_DRIVER
        tristate "My Character Driver"
        default y
        help
          A simple character device driver for testing.
    ```

4. We also need to modify the `Makefile` in the current directory so that it builds the object file for `mychardrv.c`, so we'll add the following line to it.

    ```Makefile
    obj-$(CONFIG_MYCHAR_DRIVER) += mychardrv.o
    ```

### Rebuild and Run the Linux Image

You can rebuild the Linux image by running the **make** command in your Buildroot directory. This rebuilds the Linux kernel, including our new device driver, and produces a debuggable `vmlinux` ELF file.

```bash
cd <buildroot_directory>
make -j$(nproc)
```

To verify that our driver was compiled into the kernel, you can run the following command:

```bash
find . -iname "mychardrv.o"
```

This should return the full path of the object file produced from compiling our character device driver.

Now you can flash the new `sdcard.img` file to your target's SD card; to learn how, see [this helpful article](https://www.ev3dev.org/docs/tutorials/writing-sd-card-image-ubuntu-disk-image-writer/). This time, our driver is loaded automatically when Linux boots.
diff --git a/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/6_sl_profile_inTree.md b/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/6_sl_profile_inTree.md
new file mode 100644
index 0000000000..18a729bf8c
--- /dev/null
+++ b/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/6_sl_profile_inTree.md
@@ -0,0 +1,28 @@
---
title: Profile in-tree kernel driver
weight: 7

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Use Streamline to profile an in-tree kernel driver

Profiling an in-tree driver follows almost the same process as profiling an out-of-tree kernel module. The steps include:

1. Transferring gatord to the target device using `scp`.

2. Launching Streamline, selecting TCP view, and entering the target's IP address or hostname.

3. Setting up counters and enabling Event-Based Sampling (EBS).

The main difference is that, instead of adding the kernel module's object file as the capture image in Capture settings, we now use the Linux ELF file (`vmlinux`) generated by Buildroot.

![Vmlinux capture settings#center](./images/img11_vmlinux_capture_settings.png)

After clicking Save in the Capture settings dialog, you can start the capture and analyze it as we did before.
![Vmlinux function tab#center](./images/img12_vmlinux_function_tab.png)

Since we used the vmlinux image, we can view our driver functions as well as all other kernel functions that were sampled during our capture.
You can also view the full call path of any sampled function within the kernel.
![Vmlinux call paths tab#center](./images/img13_vmlinux_callpaths_tab.png)
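
As before, the driver needs something to do while the capture runs, and because it is now built into the kernel there is no module to insert; only the device node and a write are required. You can reuse the same command in Streamline's Command section, or run the equivalent manually on the target. A minimal example, assuming the same major and minor numbers as earlier:

```bash
# Create the device node for the built-in driver (major 42, minor 0)
mknod /dev/mychardrv c 42 0

# Trigger the cache-unfriendly traversal while the capture is running
echo 10000 > /dev/mychardrv
```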

diff --git a/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/7_sl_SPE.md b/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/7_sl_SPE.md
new file mode 100644
index 0000000000..abb5729d4e
--- /dev/null
+++ b/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/7_sl_SPE.md
@@ -0,0 +1,28 @@
---
title: Using Streamline with Statistical Profiling Extension
weight: 8

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Using the Statistical Profiling Extension (SPE) for better analysis

With periodic sampling, Streamline collects CPU performance data using hardware counters and software interrupts. Hardware counters only give totals, so you can't see which exact instructions caused the events. At best, you can link the counts to a broad section of code, which makes it harder to pinpoint problems. Sampling the Program Counter (PC) or call stack is also limited, since software timers handle both sampling and unwinding.

The Statistical Profiling Extension (SPE) removes these limits. It samples the PC in hardware, directly inside the CPU pipeline. This adds almost no overhead, so the sampling rate can be much higher. SPE also records extra details about each sampled instruction, giving a much clearer view of how the code runs. For more details on SPE and how it works in Streamline, see [this blog post](https://community.arm.com/arm-community-blogs/b/tools-software-ides-blog/posts/introduction-to-statistical-profiling-support-in-streamline).

To find out whether your target supports SPE, see the [Streamline user guide](https://developer.arm.com/documentation/101816/9-7/Capture-a-Streamline-profile/Counter-Configuration/Configure-SPE-counters).

### Profiling a Kernel Module Using SPE

To profile both in-tree and out-of-tree kernel modules, we can use the same setup steps as before. The only change is to add "Arm Statistical Profiling Extension" to the Events to Collect list in the Counter Configuration dialog.
![SPE counter selection#center](./images/img14_spe_select_counters.png)

After saving the counter configuration, click Start capture to begin data collection, then wait for Streamline to analyze the results.

To view SPE counter values, select SPE in the data source drop-down in the Call paths, Functions, or Code view.

As shown in the image, SPE provides much more data about the profiled code than Event-Based Sampling (EBS), giving us deep insight into CPU performance bottlenecks with very low overhead. It is also possible to show or hide columns in the Call paths or Functions table by right-clicking the table header and choosing from the list of columns.

![SPE function tab#center](./images/img15_spe_function_tab.gif)
diff --git a/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/8_summary.md b/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/8_summary.md
new file mode 100644
index 0000000000..ef91418e51
--- /dev/null
+++ b/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/8_summary.md
@@ -0,0 +1,12 @@
---
title: Summary
weight: 9

### FIXED, DO NOT MODIFY
layout: learningpathall
---
## Summary

In this learning path, we learned how to build and profile Linux kernel modules step by step.
We started with an out-of-tree character driver that had a cache performance issue and then used Arm Streamline to spot where the problem was. Later, we tried the same idea with an in-tree driver and saw how profiling works with the full kernel. Although the example problem was simple, the same methods apply to complex, real-world drivers and scenarios. + +The key takeaway is that profiling isn’t just about making code faster—it’s about understanding how your code talks to the hardware. Streamline gives us a clear picture of what’s happening inside the CPU so we can write better, more efficient drivers. By learning to identify bottlenecks, you will be more confident in fixing them and avoiding common mistakes in kernel programming. diff --git a/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/_index.md b/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/_index.md new file mode 100644 index 0000000000..56f917249c --- /dev/null +++ b/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/_index.md @@ -0,0 +1,63 @@ +--- +title: Profiling the Linux kernel with Arm Streamline + +draft: true +cascade: + draft: true + +minutes_to_complete: 60 + +who_is_this_for: Software developers and performance engineers interested in profiling Linux kernel performance. + +learning_objectives: + - Understand the importance of profiling Linux kernel modules. + - Learn how to set up and use Arm Streamline for kernel profiling. + - Gain hands-on experience in profiling both out-of-tree and in-tree kernel modules. + - Learn to interpret profiling data to identify performance bottlenecks. + - Understand the benefits of using the Statistical Profiling Extension (SPE) for enhanced profiling. + +prerequisites: + - Basic understanding of Linux kernel development and module programming + - Arm-based Linux target device (such as a Raspberry Pi, BeagleBone, or similar board) with SSH access + - Host machine that meets [Buildroot system requirements](https://buildroot.org/downloads/manual/manual.html#requirement) + +author: Yahya Abouelseoud + +### Tags +skilllevels: Advanced +subjects: Performance and Architecture +armips: + - Cortex-A + - Neoverse +tools_software_languages: + - Arm Streamline + - Arm Performance Studio + - Linux kernel + - Performance analysis +operatingsystems: + - Linux + + + +further_reading: + - resource: + title: Streamline user guide + link: https://developer.arm.com/documentation/101816/latest/Capture-a-Streamline-profile/ + type: documentation + - resource: + title: Arm Performance Studio Downloads + link: https://developer.arm.com/Tools%20and%20Software/Streamline%20Performance%20Analyzer#Downloads + type: website + - resource: + title: Streamline video tutorial + link: https://developer.arm.com/Additional%20Resources/Video%20Tutorials/Arm%20Mali%20GPU%20Training%20-%20EP3-3 + type: website + + + +### FIXED, DO NOT MODIFY +# ================================================================================ +weight: 1 # _index.md always has weight of 1 to order correctly +layout: "learningpathall" # All files under learning paths have this same wrapper +learning_path_main_page: "yes" # This should be surfaced when looking for related content. Only set for _index.md of learning path content. 
+--- diff --git a/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/_next-steps.md b/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/_next-steps.md new file mode 100644 index 0000000000..c3db0de5a2 --- /dev/null +++ b/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/_next-steps.md @@ -0,0 +1,8 @@ +--- +# ================================================================================ +# FIXED, DO NOT MODIFY THIS FILE +# ================================================================================ +weight: 21 # Set to always be larger than the content in this path to be at the end of the navigation. +title: "Next Steps" # Always the same, html page title. +layout: "learningpathall" # All files under learning paths have this same wrapper for Hugo processing. +--- diff --git a/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/images/img01_gator_cmd.png b/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/images/img01_gator_cmd.png new file mode 100644 index 0000000000..98b1042236 Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/images/img01_gator_cmd.png differ diff --git a/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/images/img02_streamline_tcp.png b/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/images/img02_streamline_tcp.png new file mode 100644 index 0000000000..25c334860e Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/images/img02_streamline_tcp.png differ diff --git a/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/images/img03_counter_config.png b/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/images/img03_counter_config.png new file mode 100644 index 0000000000..c61ef7b6c4 Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/images/img03_counter_config.png differ diff --git a/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/images/img04_streamline_cmd.png b/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/images/img04_streamline_cmd.png new file mode 100644 index 0000000000..595e7a71fc Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/images/img04_streamline_cmd.png differ diff --git a/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/images/img05_capture_settings.png b/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/images/img05_capture_settings.png new file mode 100644 index 0000000000..28788e96a7 Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/images/img05_capture_settings.png differ diff --git a/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/images/img06_streamline_timeline.png b/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/images/img06_streamline_timeline.png new file mode 100644 index 0000000000..a411bb1d5d Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/images/img06_streamline_timeline.png differ diff --git a/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/images/img07_select_datasource.png 
b/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/images/img07_select_datasource.png new file mode 100644 index 0000000000..4c6231e82e Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/images/img07_select_datasource.png differ diff --git a/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/images/img08_Functions_Tab.png b/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/images/img08_Functions_Tab.png new file mode 100644 index 0000000000..cd23986177 Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/images/img08_Functions_Tab.png differ diff --git a/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/images/img09_callpaths_tab.png b/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/images/img09_callpaths_tab.png new file mode 100644 index 0000000000..69d6eff093 Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/images/img09_callpaths_tab.png differ diff --git a/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/images/img10_code_tab.png b/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/images/img10_code_tab.png new file mode 100644 index 0000000000..78192a3cc5 Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/images/img10_code_tab.png differ diff --git a/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/images/img11_vmlinux_capture_settings.png b/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/images/img11_vmlinux_capture_settings.png new file mode 100644 index 0000000000..bb84649231 Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/images/img11_vmlinux_capture_settings.png differ diff --git a/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/images/img12_vmlinux_function_tab.png b/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/images/img12_vmlinux_function_tab.png new file mode 100644 index 0000000000..899502db42 Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/images/img12_vmlinux_function_tab.png differ diff --git a/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/images/img13_vmlinux_callpaths_tab.png b/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/images/img13_vmlinux_callpaths_tab.png new file mode 100644 index 0000000000..231e7eaa5e Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/images/img13_vmlinux_callpaths_tab.png differ diff --git a/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/images/img14_spe_select_counters.png b/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/images/img14_spe_select_counters.png new file mode 100644 index 0000000000..e7dbc5d6b2 Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/images/img14_spe_select_counters.png differ diff --git a/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/images/img15_spe_function_tab.gif 
b/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/images/img15_spe_function_tab.gif new file mode 100644 index 0000000000..d5e54d08a7 Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/streamline-kernel-module/images/img15_spe_function_tab.gif differ