Pittsburgh, Pennsylvania, United States
Hi! I am Shuaiwei, currently pursuing a master's degree at CMU. I specialize in the co-design of AI software and hardware, focusing on the deployment and optimization of large language models across the cloud-edge continuum. My expertise spans the entire AI lifecycle:
🎢 Performance Optimization: Proficient in CUDA/HIP programming and deep-level optimization for NVIDIA/AMD GPUs.
🪢 System Development: Developed Torch2Needle, a framework for seamless PyTorch-to-AMD migration, achieving substantial inference speedup.
🚅 Architecture & Training: Experienced in building Transformers from scratch, implementing mixed-precision training, and optimizing low-level kernels.
🤖 Edge AI: Specialized in lightweight, efficient AI model design, compression, and deployment on embedded systems.
Apart from this expertise, I am also a (by courtesy) sci-fi writer and a huge lover of sci-fi! I am planning a new full-length sci-fi novel and am set to officially launch its serialization later this year. So stay tuned 😎
I am currently seeking Summer 2026 internships in MLE/ML Infra. If your team is looking for a builder who lives at the intersection of low-level optimization for AI infrastructure and serving frameworks for large language models, let’s connect!
Working on research in agentic AI and reinforcement learning
Fortunate to work with Prof. Sean and Dr. Weihua on AI agents for kernel optimization
Fortunate to work in Prof. Zhihao Jia's team and contribute to Mirage Persistent Kernel: https://github.com/mirage-project/mirage
Supervisor: Dr. Chun Zhao. Working on neuromorphic computing and its application to edge AI.
Supervisor: Dr. Dokyun Lee. Working on code representation learning on GitHub repositories to study the impact of big tech on the development of AI technologies over a decade.