Description
🚀 The feature
Add a Gammatone Filterbank and a Gammatone Spectrogram to torchaudio as another method of audio feature extraction.
The Gammatone Filterbank uses the Equivalent Rectangular Bandwidth (ERB) scale to space the frequency range, which closely follows the human auditory filters found within the cochlea. Each band uses a gammatone filter, a non-uniform band-pass filter that has been suggested as an improvement over filters such as the roex (rounded-exponential) filter used in ISO 532-2. Ultimately, this feature aims to improve on the Mel scale and its filterbank, which uses triangular filters.
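To make the ERB spacing concrete, here is a minimal sketch using the commonly cited Glasberg and Moore approximation of the ERB scale. The function names are hypothetical and are not part of torchaudio or of the proposed implementation, which may use different constants.

```python
import numpy as np

# Glasberg & Moore approximation; all function names here are illustrative only.

def erb_bandwidth(freq_hz):
    """Equivalent rectangular bandwidth (Hz) of the auditory filter at freq_hz."""
    return 24.7 * (4.37 * np.asarray(freq_hz, dtype=float) / 1000.0 + 1.0)

def hz_to_erb_rate(freq_hz):
    """Map frequency in Hz to the ERB-rate scale."""
    return 21.4 * np.log10(1.0 + 0.00437 * np.asarray(freq_hz, dtype=float))

def erb_rate_to_hz(erb_rate):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (np.asarray(erb_rate, dtype=float) / 21.4) - 1.0) / 0.00437

def erb_center_frequencies(f_min, f_max, n_bands):
    """Center frequencies (Hz) spaced uniformly on the ERB-rate scale."""
    points = np.linspace(hz_to_erb_rate(f_min), hz_to_erb_rate(f_max), n_bands)
    return erb_rate_to_hz(points)
```

Center frequencies produced this way are dense at low frequencies and increasingly sparse at high frequencies, similar in spirit to Mel spacing but derived from measured auditory filter bandwidths.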
The implementation will use SciPy to obtain the required gammatone filters and their frequency responses. The user will be able to obtain both the Gammatone Filterbank and the Gammatone Spectrogram through a single transform, for mono and stereo sources (or any number of channels).
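As a rough illustration of the idea, the sketch below builds a filterbank matrix from `scipy.signal.gammatone` and `scipy.signal.freqz`, then applies it to a power spectrogram. It uses NumPy and SciPy only, and it is not the proposed torchaudio API: the function names, the default `f_min`, the choice of `f_max` just below Nyquist, and the use of squared magnitude responses as band weights are all assumptions made for this example.

```python
import numpy as np
from scipy import signal

def gammatone_filterbank(n_bands, n_fft, sample_rate, f_min=50.0, f_max=None):
    """Return (center_freqs, fb), where fb has shape (n_bands, n_fft // 2 + 1)."""
    # Assumption: keep f_max below Nyquist, as scipy.signal.gammatone requires.
    if f_max is None:
        f_max = 0.9 * sample_rate / 2.0
    # Center frequencies spaced uniformly on the ERB-rate scale (Glasberg & Moore).
    lo = 21.4 * np.log10(1.0 + 0.00437 * f_min)
    hi = 21.4 * np.log10(1.0 + 0.00437 * f_max)
    centers = (10.0 ** (np.linspace(lo, hi, n_bands) / 21.4) - 1.0) / 0.00437
    # Evaluate each filter's response exactly on the one-sided FFT bin frequencies.
    bin_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    fb = np.empty((n_bands, bin_freqs.size))
    for i, fc in enumerate(centers):
        b, a = signal.gammatone(fc, "iir", fs=sample_rate)  # 4th-order IIR by default
        _, h = signal.freqz(b, a, worN=bin_freqs, fs=sample_rate)
        fb[i] = np.abs(h)
    return centers, fb

def gammatone_spectrogram(waveform, sample_rate, n_fft, win_len, hop_len, n_bands):
    """waveform: (..., time) array. Returns (..., n_bands, n_frames)."""
    _, fb = gammatone_filterbank(n_bands, n_fft, sample_rate)
    _, _, stft = signal.stft(
        waveform, fs=sample_rate, nperseg=win_len,
        noverlap=win_len - hop_len, nfft=n_fft,
    )
    power = np.abs(stft) ** 2          # (..., n_freqs, n_frames)
    # Weight the power spectrum by each band's squared magnitude response.
    return (fb ** 2) @ power
```

Because the STFT runs over the last axis and the matrix product broadcasts over leading axes, the same call handles mono `(time,)` input and multi-channel `(channels, time)` input.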
Motivation, pitch
The motivation behind this development is to provide an audio feature extraction method that more closely reflects the current understanding of the human auditory filters found within the cochlea. It uses the ERB scale and gammatone filters to create a filterbank and a spectrogram. The Gammatone Filterbank and Gammatone Spectrogram will be useful features for audio classification.
Alternatives
The only direct alternative is found in MATLAB, which implements a Gammatone Filterbank.
Additional context
The features have already been tested on a real-world environmental sound classification problem, with improved results. Furthermore, the Gammatone Spectrogram produces a cleaner spectrogram; a comparison has been attached. Note that the inputs to each spectrogram function (i.e. n_fft, win_len, hop_len, n_bands) are the same. The audio sample is of a person speaking at a distance from the microphone.
