[kernel] added half2 specialization for layernorm kernel #139

dongxianzhe · 2024-04-22T06:14:10Z

Optimize the layernorm kernel using the half2 type.
Add a test for the layernorm kernel.

guocuimi · 2024-04-22T18:09:09Z

src/kernels/layernorm_kernels.cu

+}
+
+template <>
+void invoke_layernorm_kernel<half2>(half2* out,


It sounds like these template specializations are optional, since the general template already covers them. No?
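
Whether the specialization is redundant depends on what the launcher does per type. The sketch below is plain host-side C++, not the PR's code: `Half2Like`, `LaunchPlan`, and `plan_layernorm_launch` are hypothetical names, and it assumes the packed path has to halve the column count, as the `int half_n = n / 2;` line in this diff suggests. Under that assumption the primary template would not produce the same launch for the packed type.

```cpp
#include <cassert>

// Hypothetical stand-in for CUDA's half2: two 16-bit values packed together.
struct Half2Like {
  unsigned short lo;
  unsigned short hi;
};

// Hypothetical launch description: row count and per-row element count,
// the latter measured in units of T.
struct LaunchPlan {
  int rows;
  int cols_in_units_of_t;
};

// Primary template: one T per column, so the column count is n unchanged.
template <typename T>
LaunchPlan plan_layernorm_launch(int m, int n) {
  assert(m > 0 && n > 0);
  return LaunchPlan{m, n};
}

// Explicit specialization: each Half2Like covers two columns, so the kernel
// would iterate over n / 2 packed elements. Assumes n is even.
template <>
LaunchPlan plan_layernorm_launch<Half2Like>(int m, int n) {
  assert(m > 0 && n > 0 && n % 2 == 0);
  return LaunchPlan{m, n / 2};
}
```

If the real launcher does nothing type-specific, the specialization can be dropped. If it adjusts the element count as above, it cannot.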

guocuimi · 2024-04-22T18:10:10Z

src/kernels/layernorm_kernels.cu

+                                   const float epsilon,
+                                   int m,
+                                   int n) {
+  int half_n = n / 2;


What if n % 2 != 0?

It sounds like this case isn't covered in the unit test.
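
One common way to handle an odd `n` is to take the packed path only when the row length is even and the buffer is aligned for 4-byte loads, and to fall back to the scalar kernel otherwise. The helper below is a minimal host-side sketch of that guard. `can_use_half2_path` is a hypothetical name, not a function from this PR.

```cpp
#include <cassert>
#include <cstdint>

// Returns true when a row-major [m, n] buffer of 16-bit values can be
// reinterpreted as packed pairs (4 bytes each) without splitting a pair
// across rows or issuing a misaligned load.
// Hypothetical helper, for illustration only.
inline bool can_use_half2_path(const void* data, std::int64_t n) {
  assert(n > 0);
  // An even row length keeps every row start on a pair boundary.
  const bool even_length = (n % 2 == 0);
  // Packed 16-bit pairs are loaded as 4-byte words.
  const bool aligned = (reinterpret_cast<std::uintptr_t>(data) % 4 == 0);
  return even_length && aligned;
}
```

A dispatcher could call this once per launch and choose the scalar kernel whenever it returns false, which also gives the unit test a clear odd-`n` case to exercise.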

guocuimi · 2024-04-22T18:12:14Z

src/kernels/layernrom_kernels_test.cu

+  float* dinput;
+  float* dweight;
+  float* dbias;
+  cudaMalloc((void**)&dout, sizeof(float) * m * n);


Use a torch tensor (torch::Tensor) to allocate the memory instead of raw cudaMalloc.

guocuimi · 2024-04-22T18:13:13Z

src/kernels/layernrom_kernels_test.cu

+      torch::nn::functional::LayerNormFuncOptions({n}).weight(weight).bias(
+          bias));
+
+  half* hout = (half*)malloc(m * n * sizeof(half));


guocuimi · 2024-04-22T18:14:16Z

src/kernels/layernrom_kernels_test.cu

+  cudaMemcpy(dweight, hweight, sizeof(half) * n, cudaMemcpyHostToDevice);
+  cudaMemcpy(dbias, hbias, sizeof(half) * n, cudaMemcpyHostToDevice);
+
+  llm::kernel::invoke_layernorm_kernel<half>(


Just test llm::kernel::layer_norm instead, but pass in inputs of different lengths to trigger the different kernels.
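
The idea of sweeping input lengths can be sketched on the CPU. The code below is a stand-in, not the repository's test: `layer_norm_scalar`, `layer_norm_paired`, `layer_norm_dispatch`, and `max_abs_diff_for_length` are hypothetical names, and the paired loop only mimics a half2-style kernel in float. It shows one entry point being checked against a scalar reference for both even and odd lengths.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Row = std::vector<float>;

// Scalar reference: normalizes one row, then applies weight and bias.
inline Row layer_norm_scalar(const Row& x, const Row& w, const Row& b, float eps) {
  const std::size_t n = x.size();
  assert(n > 0 && w.size() == n && b.size() == n);
  float mean = 0.f;
  for (float v : x) mean += v;
  mean /= static_cast<float>(n);
  float var = 0.f;
  for (float v : x) var += (v - mean) * (v - mean);
  var /= static_cast<float>(n);
  const float inv_std = 1.f / std::sqrt(var + eps);
  Row out(n);
  for (std::size_t i = 0; i < n; ++i) out[i] = (x[i] - mean) * inv_std * w[i] + b[i];
  return out;
}

// Paired path: walks the row two elements at a time, mimicking a packed
// (half2-style) loop. Requires an even length.
inline Row layer_norm_paired(const Row& x, const Row& w, const Row& b, float eps) {
  const std::size_t n = x.size();
  assert(n > 0 && n % 2 == 0 && w.size() == n && b.size() == n);
  float sum_lo = 0.f, sum_hi = 0.f;
  for (std::size_t i = 0; i < n; i += 2) { sum_lo += x[i]; sum_hi += x[i + 1]; }
  const float mean = (sum_lo + sum_hi) / static_cast<float>(n);
  float sq_lo = 0.f, sq_hi = 0.f;
  for (std::size_t i = 0; i < n; i += 2) {
    const float d0 = x[i] - mean, d1 = x[i + 1] - mean;
    sq_lo += d0 * d0;
    sq_hi += d1 * d1;
  }
  const float inv_std = 1.f / std::sqrt((sq_lo + sq_hi) / static_cast<float>(n) + eps);
  Row out(n);
  for (std::size_t i = 0; i < n; i += 2) {
    out[i] = (x[i] - mean) * inv_std * w[i] + b[i];
    out[i + 1] = (x[i + 1] - mean) * inv_std * w[i + 1] + b[i + 1];
  }
  return out;
}

// Single entry point: even lengths take the paired path, odd lengths fall back.
inline Row layer_norm_dispatch(const Row& x, const Row& w, const Row& b, float eps) {
  return (x.size() % 2 == 0) ? layer_norm_paired(x, w, b, eps)
                             : layer_norm_scalar(x, w, b, eps);
}

// Builds deterministic inputs of length n and returns the largest absolute
// difference between the dispatched result and the scalar reference.
inline float max_abs_diff_for_length(std::size_t n) {
  Row x(n), w(n), b(n);
  for (std::size_t i = 0; i < n; ++i) {
    x[i] = std::sin(0.1f * static_cast<float>(i));
    w[i] = 1.f + 0.01f * static_cast<float>(i % 7);
    b[i] = 0.5f - 0.01f * static_cast<float>(i % 5);
  }
  const Row got = layer_norm_dispatch(x, w, b, 1e-5f);
  const Row want = layer_norm_scalar(x, w, b, 1e-5f);
  float diff = 0.f;
  for (std::size_t i = 0; i < n; ++i) diff = std::max(diff, std::fabs(got[i] - want[i]));
  return diff;
}
```

In the real test the same sweep would go through llm::kernel::layer_norm and compare against the torch reference. Including lengths such as 7 and 127 alongside 8 and 128 is what forces both kernels to run.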

guocuimi

Thanks for adding the optimization. Could you also add a benchmark to show the improvement? Thanks.

guocuimi reviewed Apr 22, 2024

guocuimi changed the title ~~[op] layernorm kernel~~ [kernel] added half2 specialization for layernorm kernel Apr 22, 2024

Xianzhe Dong added 4 commits April 27, 2024 02:11

[op] optimize layernorm kernel for half2 type

bd5b91f

[ut] add layernorm kernel unitest

14a99f7

use gtest library rewrite layernorm kernel unitest

6ff1738

added layernorm kernel half2 unit test using gtest library

bc9f7e2

dongxianzhe force-pushed the op/layernorm_kernel branch from e18e337 to bc9f7e2 Compare April 27, 2024 06:22

[refactor] use torch::tensor to allocate memory in layernorm kernel unitest and just test llm::kernel::layer_norm

faaec07

guocuimi force-pushed the main branch from a158018 to e5f18d9 Compare May 27, 2025 02:45

guocuimi force-pushed the main branch from 19f5798 to 378a1f1 Compare June 18, 2025 21:44
