San Francisco Bay Area
I work as a research scientist at Mistral AI. My professional background is in applying machine learning to solve real-world problems. Prior to this, I was at Databricks, during the tail end of which I was part of the Mosaic AI team.
My work can be seen at:
https://github.com/AviSoori1x
https://github.com/avisoori-databricks
https://www.databricks.com/blog/author/avinash-sooriyarachchi
https://avi-soori.medium.com/
https://avisoori1x.github.io
Now primarily focused on planning and reasoning for robotics.
Worked on post-training:
- Alignment
- Multi-turn instruction following
I led the US Applied Science team at Mistral AI before I moved to research full time. Before that, I co-created the Applied AI/FDE team at Mistral, which has since scaled to over 300 engineers, and served as the US Applied AI lead.
We work with Mistral AI's research team and strategic customers to train foundation models to solve specific business problems. The projects we engage in span pretraining, post-training, modality extension, and quantization/pruning for efficient deployment.
My current work includes:
- Mid-training and post-training of a large language model for creative tasks
- Adding modalities to text-only or bi-modal foundation models, i.e. omnimodal models
- Multimodal post-training for specific industries
I also work on improving our open source libraries and internal codebases for fine-tuning and certain product features:
- In particular, I've worked on incorporating vision language model fine-tuning and improved training metrics into the Mistral fine-tuning codebase, which powers the La Plateforme fine-tuning API and Mistral offerings in AWS Bedrock, Azure AI and GCP Vertex AI.
- Internal tooling in Applied AI Engineering for synthetic data generation, task-specific distillation and fine-tuning for agentic applications.
Joined as one of the first few US hires.
As the Generative AI Solutions Architect overlay for digital native and emerging verticals, I provide leadership and expertise to our customer-facing field team, supporting AI-related POCs (primarily with large language models and multi-modal models) and using open-source and third-party models to address customer needs. I represent our product at conferences, in webinars, and through blog posts, engaging with customers to convey our vision and gather feedback. While I don't have direct reports, I lead a team of specialists who implement Generative AI use cases on Databricks.
Engineering work: Member of the MLflow OSS team. Worked with the Notebook AI team on fine-tuning a 1B decoder-only LLM for code completion using parameter-efficient methods.
Field Engineering: Currently leading a number of Generative AI/Large Language Model based projects across multiple teams at retail and CPG organizations. This also includes work on leveraging multi-modal models for several high-priority enterprise applications. I also work with ISV partners on integrating Databricks Generative AI capabilities into their products and solutions. Some areas include scaling machine learning inference and training workloads. Experienced with TensorRT, vLLM, ONNX, quantization techniques and CUDA (to a very limited extent).
In addition, I have served as an interviewer and as a mentor to a few dozen Solutions Architects, helping grow and up-skill the technical field so that it is prepared for aggressive growth targets.
I function as a machine learning and data science specialist within the Databricks Field Engineering organization. My role includes:
- Working with data and ML teams within organizations to build systems that leverage ML to solve business problems.
- Reviewing the architecture of current ML solutions at large organizations and optimizing performance, scalability and reliability.
- Building end-to-end solutions that demonstrate the ML capabilities of the Databricks Lakehouse platform using open source libraries and frameworks such as PyTorch, Hugging Face Transformers, Keras/TensorFlow and Apache Spark. I regularly blog about these and present at conferences such as GitHub Universe.
- Collaborating with fellow Solutions Architects, Engineers and Product specialists in building a Deep Learning adoption strategy among Databricks customers.
My particular interests include Transformer-based ML systems for similarity search and recommenders, in-stream machine learning, and scaling model training, hyperparameter tuning, and inference with distributed compute.
I worked as a machine learning specialist architect with organizations across multiple verticals, including the Financial Services, Healthcare and Retail industries. My specific role involved building systems that showcased SAS AI and ML capabilities by combining SAS APIs, open source web and machine learning frameworks, and cloud services.
Some of my project work includes:
- Working with Microsoft Azure technologies to build machine learning based systems that leverage the Azure FHIR APIs. This was built to showcase SAS Machine Learning capabilities in the context of healthcare interoperability. Collaborated directly with Microsoft on SAS - Microsoft integration on healthcare interoperability.
- Building Retail and Financial Services oriented applications using a Python-based stack, Docker and SAS Viya APIs to create end-user-facing predictive applications that also facilitate explainable AI. Demonstrated these applications to multiple customers to expedite software sales cycles. These include recommender systems, ML systems that integrate with cloud services, and fraud detection applications.
- Creating custom interfaces to a multitude of SAS text analytics APIs and analytical routines using React.js + FastAPI to democratize NLP among analytically savvy domain experts.
Awards:
- 2021 Q2 SE of the Quarter
- 2020 Technical Innovation Excellence Award
- 2019 Customer Advisory Rookie of the Year
- 2019 Q3 Rookie of the Quarter
My main responsibility was to provide both strategic and technical guidance to mid-market clients (as well as Fortune 500 enterprise accounts in a machine learning specialist capacity) in solving business problems with Artificial Intelligence (AI) and Machine Learning (ML) while ensuring the highest levels of customer satisfaction. I extensively used Python, JavaScript, HTML, CSS and associated frameworks and libraries such as Pandas, NumPy, Bokeh, Flask, Bootstrap, Chart.js, scikit-learn and SQLAlchemy alongside SAS technologies (in-memory computing and REST APIs) to develop PoCs, customer-specific applications and demo assets that leverage Machine Learning to solve business problems. These projects spanned multiple verticals such as Financial Services, Healthcare, Retail and Manufacturing. I also functioned as the pre-sales technical lead on a new product development initiative in the Mid-Market Business Unit, focused on creating a low-footprint automated machine learning platform that allows developers to integrate ML functionality into consumer-facing applications.
Developed the machine learning based SaaS platform Fora™ (http://forametrics.com/) to enable marketing and finance professionals to mine social media data.
Selected to Cohort V of the Iowa State University Startup Factory, an NSF-funded startup accelerator program that provides access to resources and training for commercializing technology-based products. Conducted over 100 customer discovery interviews.
Extensively used Natural Language Processing (NLP) to develop a search-based AI analytics tool. Used Rasa, spaCy, NLTK and Gensim for intent recognition, entity extraction, dialog modeling and topic modeling tasks.
Deployed machine learning models to production in a scalable manner and used agile methodologies to iteratively improve the product based on customer feedback. Primarily used AWS and Docker for deployment.
Extensive development experience using the PyData stack, i.e. NumPy, Pandas, scikit-learn and Bokeh, as well as NLTK, TensorFlow (primarily for NLP applications using Recurrent Neural Nets (RNNs)) and Flask.
Followed software development best practices (version control using Git, agile development, A/B testing) and lean startup principles in managing resources and understanding the customer.
MEAM 302: Fluid Mechanics
Led recitation sections, Computational Fluid Dynamics training, and grading for a class of 77 undergraduates.
Programming 3D printing software using Python. Analysis and modeling of data from mechanical tests to characterize material properties, using Python and MATLAB. Robotic assembly of materials, fabrication of soft material architectures and mathematical modeling of rotational meta-material geometries.