Post by Abdullah Nadeem
Working Student – Digital Office / Automation @ Siemens Energy | Computer Engineering @ TU Berlin
I recently had the incredible opportunity to work on the DAPHNE project, a €6.6M EU Horizon 2020-funded system for large-scale data management, HPC, and machine learning! 🚀

The Challenge: Modern ML relies heavily on sparse matrices, but predicting their memory footprint while the compiler builds complex execution plans is a major bottleneck. Existing estimators often struggle with accuracy, which leads to massive memory overhead or poor optimization decisions.

What We Built: Our team integrated the Matrix Non-zero Count (MNC) sketch directly into the DAPHNE compiler. Our C++ architecture was inspired by the seminal SIGMOD ’19 research by Sommer, Boehm, Evfimievski, Reinwald, and Haas (IBM Research / TU Graz).

A few technical highlights from our implementation:
🔹 Compiler Integration: Engineered dynamic sparsity inference at compile time using the InferencePass.
🔹 Sketch Propagation: Implemented probabilistic rounding to accurately predict the output structure of deep matrix product chains.
🔹 Extended Operations: Implemented sparsity estimation for transpose and element-wise addition/multiplication, workloads often ignored in prior research.

The Impact: We benchmarked our implementation on 19 real-world datasets from the SuiteSparse Matrix Collection in a Dockerized environment. It reached final ratio errors of ~1.0, in line with the strict values established in the original research paper. By yielding these consistently lower errors, our solution allows the system to refine thresholds and significantly reduce its memory footprint.

💻 Check out our implementation here (our core C++ contributions are in the src, scripts, and test directories): https://lnkd.in/dz5G8SEN

Team & Acknowledgements: I’m incredibly grateful to have collaborated with Duong Tran and Khaled Guesmi. Their technical expertise and dedication were instrumental in navigating the DAPHNE codebase and ensuring our implementation met the project’s high standards.
🙌 I also want to extend my deep gratitude to Professor Patrick Damme at Technische Universität Berlin for his phenomenal guidance and mentorship throughout this project.

#MachineLearning #DataEngineering #Cplusplus #Compilers #Optimization #SoftwareEngineering