Post by AIxBlock, Inc
Your ASR model hit 5% WER on the benchmark. Then 25% on real call-center audio.

Nothing was wrong with the model. The training data was collected the way most speech data still is: clean read speech, studio mic, quiet room. Production audio sounds nothing like that.

This is the gap most teams find after deployment, not before. How speech data is collected for ASR decides production accuracy more than model architecture does, and the decisions that matter happen at kickoff.

A few places collection quietly breaks:

- A model trained on read speech alone regresses 15 to 25 WER points the moment it meets real conversation
- Collecting on the wrong microphone class produces audio that sounds nothing like the deployment channel
- Clean rooms make strong benchmarks and weak production behavior

The expensive part is what most WER regressions actually are: not model failures, but collection-protocol failures that surface at evaluation, when fixing them means starting the data over.

We unpack the full collection process in our latest newsletter: scripted vs spontaneous speech, devices, environments, and the metadata that decides whether a corpus survives audit.

Read the full newsletter below.

#SpeechAI #ASR #VoiceAI #AIxBlock #TrainingData #SpeechRecognition #ConversationalAI #DataQuality