Enterprise AI company Cohere has launched its first voice model, Transcribe, an open-source automatic speech recognition (ASR) tool. Designed for tasks such as note-taking and transcription, it supports 14 languages, including some of the world's most widely spoken.
The model is lightweight, with only 2 billion parameters, which makes it suitable for self-hosting on consumer-grade GPUs. According to Cohere, Transcribe outperforms models such as Zoom Scribe v1 and IBM Granite 4.0 1B in accuracy, achieving a word error rate (WER, where lower is better) of 5.42.
However, Transcribe struggled with Portuguese, German, and Spanish. This suggests that although the model is robust across many languages, there is still room for improvement, possibly in handling more complex linguistic structures.
Cohere plans to integrate Transcribe into North, its enterprise agent orchestration platform. The model will be freely available through Cohere’s API and Model Vault, as the company aims to tap into the growing demand for note-taking and dictation apps such as Granola and Wispr Flow.
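
For readers unfamiliar with the accuracy metric quoted above, word error rate is conventionally computed as the number of word-level substitutions, deletions, and insertions needed to turn the model's transcript into the reference transcript, divided by the number of words in the reference. The Python sketch below illustrates that standard definition only. It is not Cohere's evaluation code, the function name `word_error_rate` and the sample sentences are made up for illustration, and real benchmarks typically normalize casing and punctuation first, which this sketch skips.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a fraction: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (classic Levenshtein dynamic programming).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


# Hypothetical example: one substituted word and one dropped word
# against a six-word reference.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"WER: {wer:.2%}")
```

If the 5.42 figure is a percentage, as is customary on ASR leaderboards, it would correspond to roughly five word errors per 100 reference words.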