Post by Perplexity

We've published new research on how we post-train models for accurate search-augmented answers. Our pipeline combines supervised fine-tuning with on-policy reinforcement learning to improve search accuracy, citation quality, instruction following, and efficiency. With Qwen models, we match or beat GPT models on factuality at a lower cost.

We first fine-tune the model to follow instructions, stay within guardrails, and keep language consistent. Then we run on-policy RL to improve search accuracy and tool efficiency while preserving those behaviors.

Our reward design combines correctness, preference, and efficiency. This keeps the model from optimizing for better-sounding wrong answers.

This pipeline is why the same base model produces more accurate, better-cited, and more efficient answers inside Perplexity than out of the box.

Read our research: https://lnkd.in/gigAFUEb
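
To illustrate the reward design described above: the post does not say how correctness, preference, and efficiency are combined, so the following is only a minimal sketch under one assumed scheme, in which preference and efficiency count only when the answer is correct. The names `Rollout` and `combined_reward`, the weights, and the tool-call cap are all hypothetical and are not taken from the research.

```python
from dataclasses import dataclass


@dataclass
class Rollout:
    """One sampled answer and its scores (hypothetical structure)."""
    correct: bool      # whether the answer matched the reference
    preference: float  # preference score, assumed to lie in [0, 1]
    tool_calls: int    # number of search/tool calls the model made


def combined_reward(r: Rollout, max_tool_calls: int = 8,
                    w_pref: float = 0.3, w_eff: float = 0.2) -> float:
    """Assumed gating scheme: preference and efficiency are bonuses
    that apply only on top of a correct answer."""
    if not r.correct:
        # A wrong answer earns nothing, however polished or cheap it is.
        return 0.0
    # Efficiency is 1.0 with no tool calls and 0.0 at or above the cap.
    efficiency = 1.0 - min(r.tool_calls, max_tool_calls) / max_tool_calls
    return 1.0 + w_pref * r.preference + w_eff * efficiency


polished_but_wrong = Rollout(correct=False, preference=1.0, tool_calls=0)
plain_but_correct = Rollout(correct=True, preference=0.0, tool_calls=8)
```

Under this assumed gating, `polished_but_wrong` always scores below `plain_but_correct`, which is one way a reward can avoid favoring better-sounding wrong answers. A weighted sum without the gate would not give that guarantee.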