Seattle, Washington, United States
Still GenAI, and still primarily evaluation!
Develop and support systems for foundation LLM and LLM-based agent evaluation, including the design and implementation of:
• A metrics framework to assess LLM agent performance (Python).
• End-to-end evaluation system telemetry (Java, RDS/MySQL).
• Standardized mechanisms for handling and evaluating multimodal foundation model output (Python).
• A system for hosting small-to-medium “judge” models used to evaluate LLM output (Python).
Developed and supported the Alexa Model Evaluation Service (MES), a large distributed system responsible for performing batch inference on Alexa models as part of a pre-release accuracy gating process. Responsibilities included:
• Design and implementation of service integrations enabling on-the-fly evaluation of new model types (Java, Python, AWS CDK)
• Design and implementation of an error propagation framework to enhance self-service evaluation debugging and minimize service code duplication and verbosity (Java)
• Migration of MES’s batch inference executor system to containerized deployments to reduce evaluation cost, latency, and operational overhead (Docker, ECS, AWS CDK)
• Design and implementation of improvements to MES’s UI to support new backend functionality and improve usability (JavaScript)
• Design and implementation of extensions to peripheral services (e.g., model packaging) as required to support new evaluation functionality (Java, AWS CDK)
• Refactoring, debugging, and support for existing Alexa model evaluation systems (Java, TypeScript, Python)
• Review of code submissions and design proposals from teammates and stakeholder teams
• Facilitated weekly discussion sessions and office hours
• Provided timely explanations, feedback, and responses to student questions
• Aided students in understanding course material and debugging project/lab code (C, RISC-V assembly)
• Led the establishment of a community on Duke’s East Campus, facilitating an environment promoting inclusive intellectual exchange and learning among residents
• Provided mentorship and leadership in a building of first-year residents
• Ensured safety by participating in the on-call duty rotation, responding to crisis situations, and managing conflict resolution within the community
• Effectively managed a constrained budget to program academic engagement, health & wellness, community, and awareness events
Contributed to the development and testing of the Alexa Model Evaluation Service, including:
• Implementation of a highly parallelized service traffic-shadowing test suite (Java) to improve integration test coverage
• Design and implementation of a request sampling API (TypeScript, AWS Lambda, AWS VPC, MySQL) to support the shadowing of service traffic, as well as an accompanying integration test suite
• Extension/refactoring of existing service code to support the additional testing functionality