FFT : cufft backend #2756
Yeah, that would be neat. Does the timing you did include the transfer time to and from the GPU? Normally you want to organize the code to hide those transfer times.
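For reference, here is a minimal sketch of one way to hide those transfers: split a large batch of independent 2D FFTs across two CUDA streams so the host/device copies for one chunk overlap the cuFFT execution of the other. The helper name, the chunk size, and the assumption of pinned host memory and a batch size divisible by the chunk are illustrative choices, not anything from dlib, and error checking is omitted.

```cpp
#include <cufft.h>
#include <cuda_runtime.h>

// Hypothetical helper: runs nbatch independent nx-by-ny C2C FFTs in place on
// h_pinned (pinned host memory), overlapping transfers with compute.
void batched_fft_overlapped(cufftComplex* h_pinned, int nx, int ny, int nbatch)
{
    const int chunk = 4;                        // FFTs queued per stream at a time (tunable)
    const size_t fft_elems = size_t(nx) * ny;
    int n[2] = { nx, ny };

    cudaStream_t streams[2];
    cufftHandle  plans[2];
    cufftComplex* d_buf[2];

    for (int s = 0; s < 2; ++s)
    {
        cudaStreamCreate(&streams[s]);
        cudaMalloc(&d_buf[s], chunk * fft_elems * sizeof(cufftComplex));
        cufftPlanMany(&plans[s], 2, n, nullptr, 1, 0, nullptr, 1, 0, CUFFT_C2C, chunk);
        cufftSetStream(plans[s], streams[s]);   // make each plan execute on its own stream
    }

    for (int i = 0; i < nbatch; i += chunk)
    {
        const int s = (i / chunk) % 2;          // alternate between the two streams
        cufftComplex* src = h_pinned + size_t(i) * fft_elems;

        // Copy in, transform in place, copy out -- all queued asynchronously on
        // stream s, so the other stream's work can proceed concurrently.
        cudaMemcpyAsync(d_buf[s], src, chunk * fft_elems * sizeof(cufftComplex),
                        cudaMemcpyHostToDevice, streams[s]);
        cufftExecC2C(plans[s], d_buf[s], d_buf[s], CUFFT_FORWARD);
        cudaMemcpyAsync(src, d_buf[s], chunk * fft_elems * sizeof(cufftComplex),
                        cudaMemcpyDeviceToHost, streams[s]);
    }

    for (int s = 0; s < 2; ++s)
    {
        cudaStreamSynchronize(streams[s]);
        cufftDestroy(plans[s]);
        cudaFree(d_buf[s]);
        cudaStreamDestroy(streams[s]);
    }
}
```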
Do you plan on adding this to dlib? That would be awesome if we ever planned to revamp the DNN stuff into something maybe a bit more PyTorch-like.
I suggest adding a cuFFT backend implementation of dlib::fft. Maybe we give it another name like dlib::cu::fft so that applications can use both the CPU and GPU versions. This won't be useful for small FFTs, but for sizes >= 1024x1024 it will definitely help. I did a quick test with an FFT of size 32x1024x1024: with MKL it took around 400 ms (single threaded), while with cuFFT it took around 3 ms. So this is a win.
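For concreteness, here is a rough sketch of what a cuFFT-backed call for that 32x1024x1024 test could boil down to: one batched 2D C2C plan executed in place on device memory. dlib::cu::fft is only the name proposed above; the cu_fft_batched_2d helper below is hypothetical, not an existing dlib API, and error checking is omitted.

```cpp
#include <cufft.h>
#include <cuda_runtime.h>

// Hypothetical backend routine: nbatch forward C2C FFTs of size nx-by-ny,
// reading from h_in and writing to h_out (both host buffers).
void cu_fft_batched_2d(const cufftComplex* h_in, cufftComplex* h_out,
                       int nx, int ny, int nbatch)
{
    const size_t total = size_t(nbatch) * nx * ny;

    cufftComplex* d_data = nullptr;
    cudaMalloc(&d_data, total * sizeof(cufftComplex));
    cudaMemcpy(d_data, h_in, total * sizeof(cufftComplex), cudaMemcpyHostToDevice);

    // One plan covering all nbatch transforms (e.g. 32 transforms of 1024x1024).
    cufftHandle plan;
    int n[2] = { nx, ny };
    cufftPlanMany(&plan, 2, n, nullptr, 1, 0, nullptr, 1, 0, CUFFT_C2C, nbatch);

    cufftExecC2C(plan, d_data, d_data, CUFFT_FORWARD);   // in-place forward FFT
    cudaDeviceSynchronize();

    cudaMemcpy(h_out, d_data, total * sizeof(cufftComplex), cudaMemcpyDeviceToHost);

    cufftDestroy(plan);
    cudaFree(d_data);
}
```

Note that this round-trips the data through host memory on every call, so the headline 3 ms number only holds if the transfers are amortized or hidden, as discussed above.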