Post by Kyutai
Real-time speech translation just got easier, better, faster, stronger. Today we're open-sourcing Hibiki-Zero, a real-time speech translation model that rethinks how to approach this problem.

The model translates 🇫🇷French, 🇪🇸Spanish, 🇵🇹Portuguese and 🇩🇪German to English: accurate, low-latency, high audio quality with voice transfer.

Building on our original Hibiki architecture, we've introduced a new training approach that eliminates the need for complex synthetic data heuristics. Instead of hand-crafted word-level translation alignments, Hibiki-Zero learns them automatically starting from sentence-level supervision followed by reinforcement learning. The result: simpler training, lower latency, and easy scaling to multiple input languages.

The RL reward function is elegantly simple, relying directly on the BLEU scores of intermediate translations. Translating early therefore increases the reward, while mistranslating decreases it. A single parameter lets us tune the final translation quality/latency trade-off learnt by the model.

As always, the model is open source, with easy-to-use inference code available. Links in comments.
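To make the reward idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not Kyutai's implementation: `simple_bleu` is a simplified, smoothed stand-in for a real BLEU scorer, scoring each partial translation against the full reference is an assumption, and `alpha` is a hypothetical weight standing in for the single trade-off parameter, whose actual form the post does not specify.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(hypothesis, reference, max_n=2):
    """Simplified sentence-level BLEU in [0, 1] with add-one smoothing.

    A stand-in for a real BLEU implementation, kept short for illustration.
    """
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp:
        return 0.0
    log_precision = 0.0
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        overlap = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        log_precision += math.log((overlap + 1) / (total + 1)) / max_n
    # Brevity penalty: hypotheses shorter than the reference score lower.
    brevity = min(1.0, math.exp(1 - len(ref) / len(hyp)))
    return brevity * math.exp(log_precision)


def intermediate_bleu_reward(partial_translations, reference, alpha=0.5):
    """Hypothetical reward mixing intermediate and final BLEU.

    partial_translations[i] is the text emitted so far at checkpoint i;
    the last entry is the complete translation. alpha (an assumption)
    weights early progress against final quality.
    """
    if not partial_translations:
        raise ValueError("need at least one checkpoint")
    scores = [simple_bleu(p, reference) for p in partial_translations]
    mean_intermediate = sum(scores) / len(scores)
    return alpha * mean_intermediate + (1 - alpha) * scores[-1]


reference = "the cat sits on the mat"
early = ["the cat", "the cat sits on", "the cat sits on the mat"]
late = ["", "the", "the cat sits on the mat"]
wrong = ["the dog", "the dog runs on", "the dog runs on the road"]

for name, run in [("early", early), ("late", late), ("wrong", wrong)]:
    print(name, round(intermediate_bleu_reward(run, reference), 3))
```

In this toy setup, the `early` run earns more than the `late` run even though both end with the same final translation, and the `wrong` run earns less than `early`. Setting `alpha` to 0 would reward final quality only; raising it puts more weight on translating sooner.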