FFT : cufft backend #2756
Yeah, that would be neat. Does the timing you did include the transfer time to and from the GPU? Normally you want to organize the code to hide those transfer times.
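For reference, here is a minimal sketch of one way to hide those transfers: split a large batch of independent 2D FFTs across two CUDA streams so the host/device copies for one chunk overlap the cuFFT execution of the other. The helper name, the chunk size, and the assumption of pinned host memory and a batch size divisible by the chunk are illustrative choices, not anything from dlib, and error checking is omitted.

```cpp
#include <cufft.h>
#include <cuda_runtime.h>

// Hypothetical helper: runs nbatch independent nx-by-ny C2C FFTs in place on
// h_pinned (pinned host memory), overlapping transfers with compute.
void batched_fft_overlapped(cufftComplex* h_pinned, int nx, int ny, int nbatch)
{
    const int chunk = 4;                        // FFTs queued per stream at a time (tunable)
    const size_t fft_elems = size_t(nx) * ny;
    int n[2] = { nx, ny };

    cudaStream_t streams[2];
    cufftHandle  plans[2];
    cufftComplex* d_buf[2];

    for (int s = 0; s < 2; ++s)
    {
        cudaStreamCreate(&streams[s]);
        cudaMalloc(&d_buf[s], chunk * fft_elems * sizeof(cufftComplex));
        cufftPlanMany(&plans[s], 2, n, nullptr, 1, 0, nullptr, 1, 0, CUFFT_C2C, chunk);
        cufftSetStream(plans[s], streams[s]);   // make each plan execute on its own stream
    }

    for (int i = 0; i < nbatch; i += chunk)
    {
        const int s = (i / chunk) % 2;          // alternate between the two streams
        cufftComplex* src = h_pinned + size_t(i) * fft_elems;

        // Copy in, transform in place, copy out -- all queued asynchronously on
        // stream s, so the other stream's work can proceed concurrently.
        cudaMemcpyAsync(d_buf[s], src, chunk * fft_elems * sizeof(cufftComplex),
                        cudaMemcpyHostToDevice, streams[s]);
        cufftExecC2C(plans[s], d_buf[s], d_buf[s], CUFFT_FORWARD);
        cudaMemcpyAsync(src, d_buf[s], chunk * fft_elems * sizeof(cufftComplex),
                        cudaMemcpyDeviceToHost, streams[s]);
    }

    for (int s = 0; s < 2; ++s)
    {
        cudaStreamSynchronize(streams[s]);
        cufftDestroy(plans[s]);
        cudaFree(d_buf[s]);
        cudaStreamDestroy(streams[s]);
    }
}
```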
Do you plan on adding this to dlib? That would be awesome if we ever planned to revamp the DNN stuff into something maybe a bit more PyTorch-like.
I suggest adding a cuFFT backend implementation of dlib::fft. Maybe we give it another name like dlib::cu::fft so that applications can use both the CPU and GPU versions. This won't be useful for small FFTs, but for sizes >= 1024x1024 it will definitely help. I did a quick test with an FFT of size 32x1024x1024: with MKL it took around 400 ms (single threaded), while with cuFFT it took around 3 ms. So this is a win.
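For concreteness, here is a rough sketch of what a cuFFT-backed call for that 32x1024x1024 test could boil down to: one batched 2D C2C plan executed in place on device memory. dlib::cu::fft is only the name proposed above; the cu_fft_batched_2d helper below is hypothetical, not an existing dlib API, and error checking is omitted.

```cpp
#include <cufft.h>
#include <cuda_runtime.h>

// Hypothetical backend routine: nbatch forward C2C FFTs of size nx-by-ny,
// reading from h_in and writing to h_out (both host buffers).
void cu_fft_batched_2d(const cufftComplex* h_in, cufftComplex* h_out,
                       int nx, int ny, int nbatch)
{
    const size_t total = size_t(nbatch) * nx * ny;

    cufftComplex* d_data = nullptr;
    cudaMalloc(&d_data, total * sizeof(cufftComplex));
    cudaMemcpy(d_data, h_in, total * sizeof(cufftComplex), cudaMemcpyHostToDevice);

    // One plan covering all nbatch transforms (e.g. 32 transforms of 1024x1024).
    cufftHandle plan;
    int n[2] = { nx, ny };
    cufftPlanMany(&plan, 2, n, nullptr, 1, 0, nullptr, 1, 0, CUFFT_C2C, nbatch);

    cufftExecC2C(plan, d_data, d_data, CUFFT_FORWARD);   // in-place forward FFT
    cudaDeviceSynchronize();

    cudaMemcpy(h_out, d_data, total * sizeof(cufftComplex), cudaMemcpyDeviceToHost);

    cufftDestroy(plan);
    cudaFree(d_data);
}
```

Note that this round-trips the data through host memory on every call, so the headline 3 ms number only holds if the transfers are amortized or hidden, as discussed above.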