Seattle, Washington, United States
Hello! I am Charles Tran, an AI/ML Engineer with experience building scalable, high-performance systems.
▪ Worked with a team of 15 engineers to develop scalable machine learning inference software, incorporating a REST API backend, Kubernetes auto-scaling, and real-time monitoring to streamline model deployment ▪ Collaborated with NVIDIA to integrate NVIDIA AI Enterprise and NVIDIA Inference Microservices (NIM), enabling seamless deployment of high-performance, enterprise-grade generative AI applications ▪ Extended the inference runtime to support vLLM, broadening the range of supported models ▪ Added support for Kubernetes priority classes in orchestration, optimizing resource allocation and improving model inference reliability for time-sensitive workloads ▪ Implemented automated resource validation for S3, Google Cloud, Docker Hub, and Hugging Face integrations, reducing model deployment failure rates ▪ Enabled cross-project resource referencing
▪ Integrated Google Cloud Platform with DeterminedAI, increasing parallel training speed ▪ Optimized a Keras computer vision model to train on half the number of GPUs ▪ Embedded Singularity and Enroot caches in the machine image, reducing experiment runtime by 8 minutes ▪ Reduced the cost of running cloud machines by optimizing the instance deletion routine ▪ Configured Ansible and Terraform to build GCP images that support the SLURM and PBS workload managers and the Enroot container runtime
▪ Designed CircleCI integration tests to support CI/CD development ▪ Assisted with implementing SLURM workload management in DeterminedAI, a deep learning training platform that performs distributed model training across a cluster ▪ Developed compatibility tests for CUDA- and CPU-based clusters used for distributed training ▪ Collaborated in an agile environment with 12 other software developers as part of HPE Cray