Post by Leah von der Heyde

polling & AI | data (e)quality | PhD Social Data Science & Research Methodology

🤔 Can LLMs be used for coding open-ended survey responses? Find out in our new preprint!

TL;DR: Well, it depends... We tested several prompting approaches on recent LLMs, compared the results to human expert codings of German responses, and found that

💡 overall performance differs greatly between LLMs, with GPT-4o outperforming the most recent multilingual open-source models.
💡 only a fine-tuned LLM achieves satisfactory levels of classification performance.
💡 LLMs perform better on some categories than on others, which results in different category distributions.

👉 Since fine-tuning still requires a substantial amount of human coding, using LLMs for coding open-ends may not be as efficient as one might hope.
👉 As the usability of LLMs for this task does not generalize across contexts, LLMs, and prompting approaches, each use case needs thorough validation before deployment.

Want to learn more? 🔗 Link to preprint in the comments!

🤓 I'll be presenting this paper at ESRA (European Survey Research Association) later this summer. See you there!

Super happy with the smooth collaboration on this project with Bernd Weiß and Jessica Daikeler at GESIS - Leibniz Institute for the Social Sciences and KODAQS, and with Caro Haensch at Ludwig-Maximilians-Universität München, as well as the support from the Munich Center for Machine Learning ✨