Zurich, Zurich, Switzerland
I work as a Deep Learning Engineer at NVIDIA on research and engineering problems related to LLM efficiency optimization. I completed a PhD in Computer Science at ETH Zurich, specializing in hierarchical document parsing, and have a background in Machine Learning, Computer Vision, LLMs, and Medical AI. I'm passionate about applying machine learning to real-world challenges and contributing my technical and interpersonal expertise to drive advances at the intersection of technology and industry.
Research and engineering for LLM and VLM efficiency optimization.
- Developed end-to-end ML systems for parsing of document renderings (e.g. PDF files, scans) that achieved SOTA performance by leveraging object detection methods, weak supervision, and novel large-scale datasets [Python, TensorFlow, PyTorch].
- Developed multi-modal transformer models to enable full end-to-end text recognition from document images [PyTorch].
- Managed industry collaborations and supervised students in ML research projects, and served as teaching assistant at ETH.
- PhD thesis: "Building end-to-end Systems for Hierarchical Document Parsing and OCR".
- Contributed an LLM decoder-based neural network for Optical Character Recognition (OCR) to an open-source library [PyTorch].
- Created a large-scale dataset of rendered documents for training and evaluation of LLM-based document OCR systems [Python].
- Developed a system for joint, end-to-end layout recognition and OCR on document images by leveraging a multi-modal transformer architecture for processing both images and text [PyTorch].
- Demonstrated how to extract information from Wikipedia articles with limited manual labels by leveraging data programming and document structure with Fonduer.
- Developed a live visualization for a novel tactile sensor technology.