New York, New York, United States
Amazon Ads
Worked at Amazon AGI on multimodal information retrieval, multimodal LLM-based embeddings, LLM-based evaluation systems, and human evaluations.
• Developed a background verification tool by scraping 65 million court records from the National Judicial Data Grid with Selenium and storing them securely in an AWS S3 bucket.
• Managed Databricks clusters with PySpark for efficient storage and fast retrieval of court records.
• Significantly reduced retrieval time using NLP techniques, improving the efficiency of background verification.
• Expertise in scraper development, data synthesis, SQL, and the AWS Python SDK for data management.
• Developed a post-OCR correction model for the 'AksharAnveshini' project that corrects errors in OCRed Sanskrit texts.
• Experimented with various sequence-to-sequence models, including traditional RNN-based encoder-decoders and transformer-based text-to-text generation models.
• Proposed and implemented a novel encoding technique and a byte-level tokenization method that outperformed the baseline models.
• Contributed to research accepted at EMNLP (Empirical Methods in Natural Language Processing).
• Collaborated closely with a renowned professor and two postdoctoral researchers, gaining experience in project management and research methodology.
• Played a key role in digitizing Sanskrit books and improving the accuracy of OCR models.