Post by Bik Huy
Proven track record of building highly complex, hyper-optimized software products.
I got Kyutai's Hibiki-Zero, a 3B-parameter real-time speech-to-speech translation model, running 3x faster than real time on a MacBook with an M4 Pro.

The path was more interesting than I expected: an MPS port, a 4-bit (q4) MLX rewrite, four moshi-mlx runtime fixes, one missing depformer LayerNorm, and a CPU/GPU pipeline that stopped the two from waiting on each other.

Final result:
- Speed: 0.7x -> 3.0x real time
- LM footprint: 5.8 GB -> 2.2 GB
- Clean translated audio on Apple Silicon

#MachineLearning #AppleSilicon #MLX #SpeechTranslation #OnDeviceAI
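The CPU/GPU pipelining idea can be illustrated generically. This is not the author's code: `model_step` and `decode_step` are hypothetical stand-ins that just sleep, and `FRAME_SECONDS` and `STEP_COST` are assumed values. The sketch shows how overlapping two stages through a small bounded queue cuts wall-clock time compared with running them back to back, and how a real-time factor such as "3.0x" can be computed (seconds of audio handled per wall-clock second).

```python
import queue
import threading
import time

# Hypothetical numbers for illustration only; not measurements from the post.
FRAME_SECONDS = 0.08  # assumed audio duration covered by one model step
STEP_COST = 0.01      # assumed cost of each stage per frame, in seconds


def model_step(i: int) -> int:
    """Hypothetical stand-in for the GPU-side LM step that emits tokens."""
    time.sleep(STEP_COST)
    return i


def decode_step(tokens: int) -> int:
    """Hypothetical stand-in for CPU-side work (decoding, audio I/O)."""
    time.sleep(STEP_COST)
    return tokens


def run_serial(n_frames: int) -> tuple[list[int], float]:
    """Baseline: each stage waits for the other to finish before starting."""
    start = time.perf_counter()
    out = [decode_step(model_step(i)) for i in range(n_frames)]
    return out, time.perf_counter() - start


def run_pipelined(n_frames: int) -> tuple[list[int], float]:
    """Overlap the stages: frame i+1 is produced while frame i is decoded."""
    q: queue.Queue = queue.Queue(maxsize=2)  # small bound keeps latency low
    out: list[int] = []

    def producer() -> None:
        for i in range(n_frames):
            q.put(model_step(i))
        q.put(None)  # sentinel: no more frames

    start = time.perf_counter()
    worker = threading.Thread(target=producer)
    worker.start()
    while (item := q.get()) is not None:
        out.append(decode_step(item))
    worker.join()
    return out, time.perf_counter() - start


def real_time_factor(n_frames: int, elapsed: float) -> float:
    """Seconds of audio handled per wall-clock second (>1.0 beats real time)."""
    return n_frames * FRAME_SECONDS / elapsed


if __name__ == "__main__":
    _, t_serial = run_serial(50)
    _, t_pipe = run_pipelined(50)
    print(f"serial:    {real_time_factor(50, t_serial):.2f}x real time")
    print(f"pipelined: {real_time_factor(50, t_pipe):.2f}x real time")
```

With equal per-stage costs, the serial loop pays roughly two step costs per frame while the pipelined loop approaches one. Real runtimes differ (GPU kernels run asynchronously, and CPU-bound Python stages contend for the GIL, whereas `time.sleep` releases it), so treat this only as an illustration of the scheduling idea.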