Post by DeepInfra

3,501 followers

Today we’re adding Day 0 support for open NVIDIA Nemotron models on DeepInfra. The new Nemotron 3 models from NVIDIA are built specifically for agentic AI. They’re not just powerful, but the right model for each layer of the agent stack. We’re launching with two models today. Together, Nemotron 3 Ultra and Nemotron 3.5 Content Safety cover reasoning and safety layers of the agent stack from day 0: Nemotron 3 Ultra: 550B hybrid Transformer-Mamba MoE with 55B active parameters and up to 1M context delivers frontier-reasoning and orchestration to long-running agent workflows like coding agents, deep research, and complex planning. Designed for efficient agent execution, it delivers up to 5x faster inference and up to 30% lower cost for agentic workloads. Nemotron 3.5 Content Safety: a compact 4B multimodal, multilingual safety model covering 23 categories across text, images, and custom policies. Outputs a safe/unsafe classification with a reasoning trace. Designed to run as a guardrail layer without adding meaningful latency. Nemotron 3.5 ASR (coming soon): a 0.6B streaming ASR model covering 40 language locales with native punctuation and capitalization. Cache-aware architecture designed for high-concurrency real-time voice agents — not batch transcription. The voice layer for the agent stack The framing we find compelling: in agentic AI, the benchmark that matters isn’t just model quality. It’s speed of task completion at a given accuracy. Nemotron is built around that idea. Both models are live now on DeepInfra via our standard API. NVIDIA Read about it here: https://lnkd.in/gCyY8YRW