Delhi, India
AI/ML Engineer with 26+ months of experience building scalable systems and Applied AI solutions. Proficient in GenAI workflows, RAG, and cloud infrastructure (AWS). Experienced in full-stack development, large-scale data pipelines, and model optimization. Recognized on Kaggle with top 10% global rankings for work in LLM retrieval and PII detection.
• Automated the KYC process for 75+ documents.
• Achieved 76.3% precision and 81.2% recall by building a document parser using PaddleOCR, DBSCAN, and UMAP.
• Delivered 91.4% intra-cluster cohesion and 90.8% inter-cluster separation by building an error categorization pipeline using TF-IDF, metadata features, and HDBSCAN.
• Reduced on-premise storage costs by over $1M annually by porting 1.2 PB of data to AWS S3 and optimizing retrieval pipelines.
• Built a reasoning pipeline for outlier alerting using LangChain, web scraping, and a pretrained LLM API, achieving 95% retrieval coverage and 88% reasoning accuracy, significantly reducing false positives.
• Designed an alerting system that identifies outliers using scheduled SQL queries, flagging anomalies in production data and notifying relevant teams.
• Mapped detected outliers to the correct stakeholders using ownership metadata and triggered alerts through internal notification systems.
• Achieved over 84% accuracy in detecting Regions of Interest using PaddleOCR and DistilBERT.
• Led a team of 6 to research Region of Interest detection in sports footage.
• Curated a publicly available dataset for model training by extracting data through web scraping, released under the Apache 2.0 license.