Conversation

@andrew-k-park
Contributor

Details:

  • When an FP16 dynamic convolution has few input channels (≤4) and many output channels (e.g., 1024), the current format selection logic chooses bfyx → fsv16, which triggers the oneDNN reference kernel instead of an optimized JIT kernel and causes significant performance degradation.
  • Override the output format to planar (bfyx) when input channels are small (≤ 16) and output channels are large (≥ 32).

Current behavior:

  • Input: 3 channels → Converted to bfyx
  • Output: 1024 channels → Remains fsv16 (the format is only changed when output channels ≤ 4)
  • Result: bfyx → fsv16 combination uses reference kernel (slow)
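
The rule described above can be sketched as a small decision function. This is only an illustration of the thresholds quoted in this description, not the plugin's actual code: the `Format` enum, the constant names, and both function names are hypothetical, and the "before" function assumes that the existing ≤ 4 rule switches the output to bfyx.

```cpp
#include <cstdint>

// Hypothetical stand-in for the plugin's layout formats (not the real OpenVINO type).
enum class Format { bfyx, fsv16 };

// Thresholds quoted in the PR description; the constant names are illustrative.
constexpr std::int64_t kTinyOutputChannels = 4;    // existing rule: output <= 4
constexpr std::int64_t kSmallInputChannels = 16;   // new rule: input <= 16
constexpr std::int64_t kLargeOutputChannels = 32;  // new rule: output >= 32

// Behavior before the change, as described (assumed to pick bfyx for tiny outputs):
// the output stays fsv16 unless it has at most 4 channels.
inline Format select_output_format_before(std::int64_t out_channels) {
    return out_channels <= kTinyOutputChannels ? Format::bfyx : Format::fsv16;
}

// Behavior after the change: additionally force a planar output for
// channel-expansion cases (small input, large output).
inline Format select_output_format_after(std::int64_t in_channels,
                                         std::int64_t out_channels) {
    if (in_channels <= kSmallInputChannels && out_channels >= kLargeOutputChannels) {
        return Format::bfyx;
    }
    return select_output_format_before(out_channels);
}
```

With these definitions, the 3 → 1024 channel case from the description yields fsv16 under the old rule and bfyx under the new one, while a 64 → 1024 convolution is unaffected and keeps fsv16.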

Root Cause

The fsv16 blocked format is optimized for reading many channels but introduces overhead when used for writing outputs in channel-expansion scenarios (small input → large output). oneDNN's reference kernel is selected because:

  1. Inefficient write pattern: fsv16 output requires interleaved writes every 16 elements (non-contiguous)
  2. No optimized implementation: oneDNN doesn't provide a JIT-optimized kernel for generating fsv16 output from small input channels
  3. Scatter write overhead: Writing 1024 channels in fsv16 format requires complex block-strided access
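
To make the write pattern concrete, the sketch below compares element offsets in a planar layout with offsets in a 16-channel blocked layout. It assumes that "fsv16" here refers to the usual blocked layout in which channels are grouped in blocks of 16 and the 16 channels of a block are innermost; the `Shape` struct and function names are hypothetical and only serve the illustration.

```cpp
#include <cstddef>

// Hypothetical tensor shape: batch, features (channels), height, width.
struct Shape {
    std::size_t b, f, y, x;
};

// Planar bfyx: x is innermost, so one channel's row is contiguous in memory.
inline std::size_t offset_bfyx(const Shape& s, std::size_t b, std::size_t f,
                               std::size_t y, std::size_t x) {
    return ((b * s.f + f) * s.y + y) * s.x + x;
}

// Blocked fsv16 (assumed definition): channels are split into blocks of 16,
// and the 16 channels of a block are stored innermost for each (y, x).
inline std::size_t offset_fsv16(const Shape& s, std::size_t b, std::size_t f,
                                std::size_t y, std::size_t x) {
    constexpr std::size_t kBlock = 16;
    const std::size_t f_blocks = (s.f + kBlock - 1) / kBlock;  // round up
    return (((b * f_blocks + f / kBlock) * s.y + y) * s.x + x) * kBlock + f % kBlock;
}
```

For a 1×1024×8×8 output, neighboring x positions of the same channel are 1 element apart in bfyx but 16 elements apart in the blocked layout, which matches the "every 16 elements" stride mentioned in item 1.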

Tickets:

@andrew-k-park andrew-k-park requested review from a team as code owners December 5, 2025 07:08
@github-actions github-actions bot added the category: GPU OpenVINO GPU plugin label Dec 5, 2025
@andrew-k-park andrew-k-park force-pushed the fix_fp16_conv_format_selection branch 2 times, most recently from dfa6239 to 1c2d830 Compare December 9, 2025 12:18
@andrew-k-park
Contributor Author

no perf regression

@e-ddykim e-ddykim added this pull request to the merge queue Dec 10, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Dec 10, 2025
@andrew-k-park andrew-k-park added this pull request to the merge queue Dec 10, 2025
github-merge-queue bot pushed a commit that referenced this pull request Dec 10, 2025
…ion operations (#33131)

### Tickets:
 - [CVS-177671](https://jira.devtools.intel.com/browse/CVS-177671)

Signed-off-by: Andrew Park <andrew.park@intel.com>
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Dec 10, 2025
@andrew-k-park andrew-k-park force-pushed the fix_fp16_conv_format_selection branch from 1c2d830 to 47d734e Compare December 10, 2025 06:45
@p-durandin p-durandin added this to the 2026.0 milestone Dec 10, 2025
@andrew-k-park andrew-k-park added this pull request to the merge queue Dec 10, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Dec 10, 2025
@andrew-k-park andrew-k-park added this pull request to the merge queue Dec 10, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Dec 10, 2025
@andrew-k-park andrew-k-park force-pushed the fix_fp16_conv_format_selection branch from 47d734e to 3380c20 Compare December 10, 2025 12:21
…ge channel expansion

Signed-off-by: Andrew Park <andrew.park@intel.com>