Post by Chip Huyen

AI x stuff

RLHF: Reinforcement Learning from Human Feedback
Link: https://lnkd.in/gWYZW8KA

This has been a really fun post to write! RLHF is one of the coolest things I've seen in NLP research in the last few years. RL has been notoriously difficult to work with and has therefore been mostly confined to gaming and simulated environments. It’s impressive to see it work in a new domain at a massive scale.

This post discusses the 3 phases of training a model like ChatGPT, the challenge each phase is supposed to solve, the intuition for why it works, and the mathematical formulation:
1. Pretraining for completion
2. Supervised finetuning for dialogue
3. RLHF

We'll also discuss the relationship between RLHF and hallucination, as well as hypotheses on why models hallucinate. RLHF is supposed to help with hallucination, but the InstructGPT experiment shows that RLHF actually made hallucination worse.

As always, feedback is much appreciated!

#llm #chatgpt #rlhf #mlops #mlengineering