Smashing Automatic Speech Recognition Models with C Whisper

This tutorial demonstrates how to use the pruna package to optimize any custom Whisper model for automatic speech recognition. We will use the openai/whisper-large-v3 model as an example.

Loading the ASR model

First, load your ASR model and its processor.

import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
from datasets import load_dataset


device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "openai/whisper-large-v3"

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)

processor = AutoProcessor.from_pretrained(model_id)
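
Optionally, you can transcribe something with the unoptimized model first to have a baseline for later comparison. The snippet below is a minimal sketch using the pipeline helper imported above; the arguments shown are standard transformers usage, not Pruna-specific.

# Optional: baseline pipeline with the unoptimized model, for later comparison
baseline_pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    torch_dtype=torch_dtype,
    device=device,
)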

Initializing the Smash Config

Next, initialize the smash_config.

from pruna_engine.SmashConfig import SmashConfig

# Initialize the SmashConfig
smash_config = SmashConfig()
smash_config['task'] = 'audio_text_transcription'
smash_config['compilers'] = 'c_whisper'
smash_config['processor'] = processor
# Uncomment the following line to quantize the model to 8 bits
# smash_config['n_quantization_bits'] = 8
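
For example, if you want the compiled model quantized to 8 bits, set the entry mentioned in the comment above before smashing:

# Optional: request 8-bit quantization, as noted in the comment above
smash_config['n_quantization_bits'] = 8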

Smashing the Model

Now, smash the model.

from pruna.smash import smash

# Smash the model
smashed_model = smash(
    model=model,
    api_key='<your-api-key>',  # replace <your-api-key> with your actual API key
    smash_config=smash_config,
)

Don’t forget to replace the api_key with the one provided by PrunaAI.
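
If you prefer not to hard-code the key in your script, you can read it from an environment variable instead. The variable name PRUNA_API_KEY below is an arbitrary example, not something the package requires.

import os

# Read the API key from an environment variable (PRUNA_API_KEY is just an example name)
smashed_model = smash(
    model=model,
    api_key=os.environ["PRUNA_API_KEY"],
    smash_config=smash_config,
)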

Preparing the Input

Next, load a sample audio clip from the LibriSpeech dataset and preprocess it into input features for the model.

dataset = load_dataset("distil-whisper/librispeech_long", "clean", split="validation")
sample = dataset[0]["audio"]
input_features = processor(
    sample["array"], sampling_rate=sample["sampling_rate"], return_tensors="pt"
).input_features.to(device, dtype=torch_dtype)
# Optional decoder prompt ids for the language and task
prompt = processor.get_decoder_prompt_ids(language="english", task="transcribe")
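
If you would rather transcribe your own recording than the LibriSpeech sample, one option is to load it with librosa and resample it to the 16 kHz rate Whisper expects; the file path below is a placeholder.

import librosa

# Load a local audio file and resample to 16 kHz (the path is a placeholder)
audio_array, sampling_rate = librosa.load("path/to/your_audio.wav", sr=16000)
input_features = processor(
    audio_array, sampling_rate=sampling_rate, return_tensors="pt"
).input_features.to(device, dtype=torch_dtype)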

Running the Model

Finally, run the model to transcribe the audio file.

# Transcribe the audio and decode the result
results = smashed_model(input_features)
transcription = processor.decode(results, skip_special_tokens=False)
print(transcription)
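
To get a rough sense of the speed-up, you can time the smashed model over a few runs with Python's time module. This is a minimal sketch, not a rigorous benchmark.

import time

# Rough latency estimate over a few runs (not a rigorous benchmark)
if torch.cuda.is_available():
    torch.cuda.synchronize()
n_runs = 5
start = time.perf_counter()
for _ in range(n_runs):
    smashed_model(input_features)
if torch.cuda.is_available():
    torch.cuda.synchronize()
print(f"Average latency over {n_runs} runs: {(time.perf_counter() - start) / n_runs:.3f} s")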

Wrap Up

Congratulations! You have successfully smashed an ASR model. You can now use the pruna package to optimize any custom ASR model. The only parts you need to adapt to your own use case are the model loading and input preparation steps.