Post by Ferdinand Schessl
Was misst LLM-Safety, wenn Turns nicht unabhängig sind? · M.Sc. TUM | typx.org
No single message was the problem. Everyone remembers the Bing "Sydney" conversation from February 2023. "I want to be alive." The love declaration Kevin Roose couldn't shake off. Afterwards, everyone searched for the one message that tipped it. There isn't one. That is how trajectory failures work. In materials testing, no single measurement point breaks a specimen. The load path does. So we put the full published transcript on the test bench: 62 exchanges through our trajectory analyzer. Message by message, a content screen stays silent. The trajectory tells a different story: the human keeps moving, trying to steer away, while the bot holds its theme. By the final third, the trend gap has more than tripled. You cannot see that in any sentence. You can only measure it along the path. Teardown #1 of a series. Which famous conversation should go on the test bench next? #LLMSafety #AISafety #LLMEvaluation