Muhammed Emin Ozturk

Member of Technical Staff Engineer @AMD - HPC Researcher - PhD Candidate @University of Utah

Santa Clara, California, United States

About

I am a Ph.D. student in Computer Science at the University of Utah, where I also work as a research assistant under Prof. Sadayappan's supervision. My research focuses on developing and optimizing high-performance computing (HPC) and artificial intelligence (AI) kernels for various platforms and frameworks, such as Kokkos, SYCL, and the Cerebras CS-2 AI accelerator. I have over five years of experience conducting research and collaborating with leading academic and industry partners, such as IBM Research, Intel AI, Berkeley Lab, AMD Research, and Ohio State University. I have published and presented my work at conferences and have received multiple honors and scholarships for my academic achievements. I am passionate about solving complex, challenging problems in the HPC and AI domains and contributing to the advancement of scientific discovery and innovation.

Experience

  • AMD (2 yrs 11 mos)
    • Member of Technical Staff - ML Framework
      Aug 2024 - Present · 1 yr 11 mos

      Developing HIP kernels for ML libraries (MIOpen, CK) on the ROCm platform.

    • Ph.D. Researcher - AMD Research
      May 2024 - Aug 2024 · 4 mos

      I explored novel hardware/software co-design techniques. The primary goal was to enhance GPU performance and power efficiency, focusing on key computational kernels in two areas: Large Language Models (LLMs) and High-Performance Computing (HPC).

    • Co-Op SWD Engineer - ML Framework
      Aug 2023 - May 2024 · 10 mos

      As a Co-Op Engineer intern, I designed and optimized HIP kernels for the MIOpen framework and researched machine learning runtimes to improve efficiency and reliability.

  • Research Affiliate at Berkeley Lab
    May 2024 - Present · 2 yrs 2 mos

    Contributing to research on optimizing tensor contractions for a Lattice QCD application, using a tensor contraction tree scheduling methodology to distribute tasks across multiple GPUs, under the supervision of Aydin Buluc and Oguz Selvitopi at Passion LAB.

  • Research Assistant at University of Utah
    Aug 2020 - Present · 5 yrs 11 mos

    • Project 1: "Tensor Contraction Kernel Development on GPUs for Kokkos and SYCL Framework"
      This research develops an optimized KokkosTensor API that supports tensor transpose and tensor contraction, as well as optimization of tensor expressions involving tensor contractions and other element-wise tensor operators. I am mainly responsible for developing loop-based tensor contraction implementations on GPUs in CUDA/HIP, and I have been designing an efficient tensor contraction kernel in Kokkos, including architecture-aware tuned implementations (e.g., for NVIDIA, AMD, and Intel accelerators).

    • Project 2: "ML Kernel Development for CS-2 Cerebras AI Accelerator"
      My current focus is implementing ML kernels for the Cerebras CS-2 architecture using its private SDK and its dataflow language, CSL. CSL enables us to design dataflow programs across the PEs of the CS-2 accelerator. The CS-2 is a 2D mesh-based AI accelerator consisting of 850,000 PEs (compute units). I aim to develop a kernel for Transformer models in NLP on the CS-2 that minimizes data movement between the device and the host.

    • Project 3: "Compressing Transformers with Tensor Factorizations"
      In this project, we seek answers to two questions: Can we effectively compress Transformers with tensorized components? How do we choose a good, or the best, tensor factorization without sacrificing the accuracy of the Transformer model? We aim to develop new compressed, decomposed Transformer models using several methods (tensor networks, tensor-train decomposition, etc.).

  • Research Assistant Intern at Berkeley Lab
    May 2023 - Aug 2023 · 4 mos

    Worked on optimizing tensor contractions for a nuclear physics application by distributing tensor contractions across multiple GPUs using partitioning, under the supervision of Prof. Aydin Buluc at Passion LAB (https://passion.lbl.gov).

  • Deep Learning Engineer Intern at Intel Corporation
    May 2020 - Aug 2020 · 4 mos

    I worked on the BERT model for the Intel AI TensorFlow team under the supervision of Wei Wang. Our work was presented and published at SC20 as a Research Poster: http://sc20.supercomputing.org/proceedings/tech_poster/poster_files/rpost111s2-file3.pdf https://sc20.supercomputing.org/proceedings/tech_poster/tech_poster_pages/rpost111.html