Post by OECD.AI
If the internet contains hundreds of billions of webpages, why are AI developers facing a data shortage?

A new AI Wonk blog from the #Inria Centre of the GPAI Expert Community examines the growing AI data paradox and what could follow large-scale scraping practices.

Key insights from the VIADUCT initiative:
🔹 AI still relies heavily on scraped public web data, but legal risk and declining quality are changing the landscape
🔹 Data is not a single resource: it spans copyrighted content, personal data, trade secrets, public sector data and open data
🔹 Sustainable AI requires moving from extraction to structured data-sharing transactions
🔹 Three principles underpin responsible access to training data: legal compliance, trust and fairness
🔹 Practical solutions include opt-out systems, attribution tools, privacy-enhancing technologies and new economic models for sharing

Future AI capability will depend on the ethics, scalability, and mutual benefit of data ecosystems.

Read the full blog post at the link in comments below.

Marie Langé, Yann Dietrich, Bertrand Monthubert, Aurélie Simard, Ph.D., Jean Constantin, Christian Reimsbach Kounatze, Clarisse Girot, Sergi Gálvez Duran

#TheAIWonk #AIData #DataGovernance #TrustworthyAI #GPAI #ArtificialIntelligence