🐛 Describe the bug
When I try to load a 43-second .wav file, memory usage keeps growing until the session crashes. I have about 12 GB of RAM.
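For scale, here is a rough estimate of how much memory the clip itself needs. It assumes the clip ends up at 48 kHz after resampling and is stored as 4-byte float32 samples (what `torchaudio.load` returns by default) or as 8-byte float64 samples. The variable names are illustrative only.

```python
# Back-of-the-envelope size of a 43-second clip after resampling to 48 kHz.
DURATION_S = 43
SAMPLE_RATE = 48_000

samples_per_channel = DURATION_S * SAMPLE_RATE  # 2,064,000 samples
bytes_float32 = samples_per_channel * 4         # one channel stored as float32
bytes_float64 = samples_per_channel * 8         # one channel stored as float64

print(f"samples per channel: {samples_per_channel:,}")
print(f"float32, mono:   {bytes_float32 / 2**20:.1f} MiB")
print(f"float64, stereo: {2 * bytes_float64 / 2**20:.1f} MiB")
```

This prints 2,064,000 samples, about 7.9 MiB for mono float32 and about 31.5 MiB for stereo float64. Even allowing for several intermediate copies, that is far below 12 GB, which suggests the growth comes from a later step multiplying the data rather than from the clip's size.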
This is the code I am running:
```python
from transformers import ClapProcessor, ClapModel
import torchaudio
import torch

# Setup device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load CLAP model and processor
model = ClapModel.from_pretrained("laion/clap-htsat-unfused").to(device)
processor = ClapProcessor.from_pretrained("laion/clap-htsat-unfused")

# Load audio file
audio, sr = torchaudio.load("/content/temp_audio_6169.wav")

# Resample to 48kHz if needed
if sr != 48000:
    audio = torchaudio.transforms.Resample(sr, 48000)(audio)

# Convert stereo to mono
if audio.shape[0] > 1:
    audio = audio.mean(dim=0)

# Limit to 10 seconds (CLAP expects max 480000 samples at 48kHz)
audio = audio[:480000]

# Process audio
inputs = processor(audios=audio, sampling_rate=48000, return_tensors="pt")
inputs = {k: v.to(device) for k, v in inputs.items()}  # Move to GPU

# Extract audio embedding
with torch.no_grad():
    audio_embedding = model.get_audio_features(**inputs)
print("🔊 Audio embedding shape:", audio_embedding.shape)
```
Audio file (attached): temp_audio_6169.zip
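One detail in the code above may be worth checking. `torchaudio.load` returns a tensor shaped `[channels, samples]`. For a mono file, `audio.shape[0] > 1` is false, so `audio` stays 2-D with shape `[1, N]`, and `audio[:480000]` slices the channel axis instead of trimming the clip to 10 seconds. The processor then receives a 2-D array holding all 43 seconds. Whether this is what drives the memory growth is not confirmed here. The sketch below uses NumPy, which follows the same slicing rule as torch tensors, and a hypothetical helper `prepare_clap_waveform` (not part of CLAP or transformers) that always downmixes to 1-D before truncating.

```python
import numpy as np

MAX_SAMPLES = 480_000  # 10 s at 48 kHz, the limit used in the code above


def prepare_clap_waveform(audio: np.ndarray, max_samples: int = MAX_SAMPLES) -> np.ndarray:
    """Hypothetical helper: turn a [channels, samples] array into a 1-D
    float32 waveform with at most `max_samples` samples."""
    if audio.ndim == 2:
        # Average over channels unconditionally, so mono [1, N] also becomes [N].
        audio = audio.mean(axis=0)
    # With a 1-D array, slicing now runs along the sample axis.
    return audio[:max_samples].astype(np.float32)


# A mono clip shaped like torchaudio.load output: 43 s at 48 kHz.
mono = np.zeros((1, 43 * 48_000), dtype=np.float32)

print(mono[:MAX_SAMPLES].shape)           # (1, 2064000): the slice hit the channel axis
print(prepare_clap_waveform(mono).shape)  # (480000,)
```

In the torch code, the equivalent change would be to drop the `if audio.shape[0] > 1` guard so that `audio.mean(dim=0)` always runs, which turns a `[1, N]` tensor into `[N]` before the slice.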