Pickering, Ontario, Canada
Cosmos World Foundation Models
Inference Time Scaling and Reasoning for Video Generation
Diffusion LLMs Pre-Training, Post-Training, and Inference Time Scaling
• Using the NVIDIA Megatron-LM library to distribute LLMs and large Transformer models over GPU clusters using data, model, and sequence parallelism
• Working with multi-modal models such as Open-Sora and Latte, parallelizing their training over GPU clusters using the Microsoft DeepSpeed library
• Wrote Transformers and Convolutional Neural Networks using PyTorch, and parallelized such models over GPU clusters using CUDA, Unity/FlexFlow, and Fully Sharded Data Parallel (FSDP), increasing throughput and reducing training time by over 35%
• Presented and wrote technical reports on over 15 research papers, including Megatron, Llama, Open-Sora, Stable Diffusion, Imagen, and ViT
• Using ASTRA-sim, SCALE-Sim, and other neural network simulators to model computation and communication costs of training and inference in distributed environments
• Used OpenMP, MPI, and Docker containers to simulate a multi-node environment for high-performance computing and AI workloads
- Conducted a literature review of image segmentation algorithms and prototyped unsupervised ML algorithms for segmenting contour maps
- Implemented an import feature for automating non-horizontal soil surface construction in Settle3
- Developed a generic load importer for placing loads with varying elevations in Settle3
- Built a multiple load importer for automatically placing piles and computing load combinations in RSPile
- Implemented a Eurocodes option for the interaction diagrams feature in RSPile