Nick Hill

Inference Engineering

San Jose, California, United States

About

Senior research engineer, technical leader and architect. I design and build enterprise Machine Learning infrastructure at scale, with a focus on performance, concurrency, resiliency, automation and consumability. Expertise spans LLM inferencing and distributed systems – Cloud, Linux, containers, Kubernetes, serverless, microservices, K/V stores, networking, PyTorch. Skilled in Python, Rust, Java, Go. Prolific open source contributor. More than 15 years’ experience leading development and performance teams. Consistent track record of major technical contributions.

Experience

  • Member of Technical Staff at Inferact
    Dec 2025 - Present · 8 mos

  • Senior Principal Software Engineer, AI Engineering at Red Hat
    Nov 2024 - Dec 2025 · 1 yr 2 mos

  • IBM (20 yrs 3 mos)
    • Inferencing STSM, AI Platform Engineering, IBM Research
      Jan 2022 - Nov 2024 · 2 yrs 11 mos

      • Built the back-end LLM inferencing platform that was released as the flagship watsonx.ai Gen-AI cloud service in July 2023, lauded by the CEO and head of Research as unprecedented in terms of Research time-to-market.
      • Deployed and supported a highly successful IBM-internal LLM inferencing service used by thousands of IBMers as a vehicle for other research and services.
      • Early significant contributor to the first and most prominent open source LLM inferencing engines - Hugging Face TGI and vLLM.
      • vLLM committer. Represent IBM in the vLLM community, supporting and coordinating contributions by others.
      • Drive collaboration across IBM Research in support of common goals - bridging efforts/workstreams and connecting the dots between different projects and teams.
      • Work as a conduit between Research and product teams in IBM and Red Hat; assisted with early Gen AI customer engagements.
      • Lead ongoing Gen AI inference engineering and optimization efforts - setting technical strategy, educating and enabling other team members.

    • Senior Software Engineer, IBM Watson AI Infrastructure
      2013 - Dec 2021 · 9 yrs

      Conceived, designed and implemented much of the infrastructure underpinning IBM’s most successful AI Cloud offerings including IBM Watson Assistant and Natural Language Classifier services. Lead and mentor development teams. Build and support distributed, scalable, resilient production systems and components/libraries used within them. Consistent top performer and regular award recipient.
      • Designed and implemented the back-end Kubernetes-based engine for orchestrating self-service ML model lifecycles used by Watson Assistant, Natural Language Classifier and Discovery services – coordinates model training, retraining and deployment in a fault-tolerant manner.
      • Designed and implemented a general-purpose mesh-based model serving platform now central to most Watson AI cloud services – manages hundreds of thousands of production models.
      • Generalized components with abstractions to enable seamless migration between changing underlying platform services – container scheduler, data stores, etc.
      • Create and maintain various low-level libraries used across multiple Watson cloud products – including a service discovery and RPC framework, Java etcd client and utilities.

    • Watson Performance/Runtime Technical Lead
      Sep 2012 - 2015 · 2 yrs 5 mos

      Taking IBM Watson from game show to product. Applied performance analysis and engineering to parallelize and shrink a room-sized question-answering system down to a single server, taking everyone by surprise and driving a fundamental change in architectural direction. Created a high-performance concurrent runtime library for Apache UIMA NLP pipelines, on which all commercial Watson QA pipelines were based. Re-implemented the core Apache UIMA 2.x library internals with a new approach that became the basis for UIMA 3.0.

  • Extreme Blue Intern at IBM United Kingdom Ltd
    Jul 2003 - Sep 2003 · 3 mos

    One of eight students chosen nationally, working in a team of four on a research-based software project: an autonomic knowledge-management system based on Semantic Web concepts. Filed three patents.

  • Pre-university co-op at IBM United Kingdom Ltd
    Aug 2000 - Jul 2001 · 1 yr

    Software Engineer in a small team integrating Voice over IP into an existing telephony product. Personally took over coding of a significant proportion of the software (in C). Chosen for the IBM student bursary scheme. Returned for a subsequent summer internship (Jul 2002 - Sep 2002) and was made responsible for the design and development of new SIP (Session Initiation Protocol) functionality. Liaised with vendors while evaluating their protocol stacks and delivered a full solution with one of the offerings.