The whisperX library supports speaker diarization. It's not perfect, but we should still offer it as an optional setting.
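One way the optional setting could look is sketched below. This is only an illustration of the opt-in shape, not an actual integration: `TranscriptionSettings`, `postprocess`, and `fake_diarizer` are hypothetical names, and no whisperX calls are made. The diarizer is passed in as a plain callable so that the real whisperX diarization step can be wired in separately.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical types for illustration; these are not part of whisperX.
Segment = Dict[str, object]  # e.g. {"start": 0.0, "end": 1.5, "text": "hello"}
Diarizer = Callable[[List[Segment]], List[Segment]]


@dataclass
class TranscriptionSettings:
    # Diarization is opt-in, so the default behavior stays unchanged.
    diarize: bool = False


def postprocess(
    segments: List[Segment],
    settings: TranscriptionSettings,
    diarizer: Optional[Diarizer] = None,
) -> List[Segment]:
    """Return segments, adding speaker labels only when diarization is enabled."""
    if not settings.diarize:
        return segments
    if diarizer is None:
        raise ValueError("diarize=True requires a diarizer")
    return diarizer(segments)


def fake_diarizer(segments: List[Segment]) -> List[Segment]:
    """Stand-in diarizer (assumption): labels every segment with one speaker."""
    return [{**seg, "speaker": "SPEAKER_00"} for seg in segments]


segments = [{"start": 0.0, "end": 1.5, "text": "hello"}]
plain = postprocess(segments, TranscriptionSettings())
labeled = postprocess(segments, TranscriptionSettings(diarize=True), fake_diarizer)
```

Keeping the flag off by default means users who do not need speaker labels are unaffected by diarization errors or its extra runtime.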