Jan Hartman

Machine Learning Engineer at Sourcegraph

Slovenia

About

I'm a highly motivated and practical engineer who loves working on interesting and hard problems. My favorite area is machine learning in production. My work has mostly been in AI developer tooling (agents for code search) and adtech, building models that compute over a billion predictions per second and everything around them (feature preprocessing, training, data pipelines, etc.). I have a master's degree in Computer & Data Science.

Experience

  • Sourcegraph (Remote)
    • Senior Machine Learning Engineer
      Nov 2025 - Present · 8 mos

      Leading the charge in AI code understanding

    • Machine Learning Engineer
      Mar 2024 - Dec 2025 · 1 yr 10 mos

      Sourcegraph is a developer tooling company at the forefront of code search and agentic coding. I work as a full-stack machine learning engineer, focusing on the quality of search results and the underlying infrastructure (evaluations, ML inference). Some of the projects I've tackled:
      - agentic code search (Deep Research for code, https://sourcegraph.com/blog/introducing-deep-search)
      - context retrieval (keyword search, embeddings, reranking)
      - synthetic data generation for evaluating search
      - intent detection (what the user wants to accomplish with a query)

  • Machine Learning Engineer at Outbrain
    Aug 2019 - Mar 2024 · 4 yrs 8 mos

    Full-stack machine learning engineer in an R&D department, worked on an ad recommendation & bidding system that handles several million RPS with latency under 100 milliseconds. My experience ranged from core ML algorithms to data pipelines and A/B testing. I brought many projects from research to production and have a thorough, end-to-end understanding of ML flows.
    - I led the initiative for implementing ML algorithms for tasks like click prediction. I added support for TensorFlow in the serving stack, implemented SOTA deep learning recommendation models and scaled them to over a billion predictions per second. These models brought massive increases in revenue.
    - Implemented and maintained an online learning pipeline capable of continuously retraining models on terabytes of data daily and bootstrapping them from scratch on past data seamlessly.
    - Reimplemented a data pipeline for producing model training data from raw data, migrated from AWS (S3, EC2, EMR) to on-premise (HDFS, Spark, Airflow) and enabled huge cost savings.
    - Point of contact for high-performance, low-latency ML pipelines and tooling, mentoring colleagues in these areas. Continuously contributed to planning the long-term evolution of the systems and setting high standards for the quality of our work, including publishing and presenting it at conferences.
    - Designed and implemented an efficient exploration system that leverages model uncertainty to improve prediction models, bringing an increase in revenue.
    Technologies: TensorFlow, Golang, Python, Java, BigQuery, Airflow, AWS, Hadoop

  • Software Developer at XLAB
    Jul 2016 - Aug 2019 · 3 yrs 2 mos

    Part of a research department, tackled a variety of projects in different areas:
    - Neural networks: worked on a deep learning model compiler for hardware-specific optimization, implemented support for compiling TensorFlow models.
    - Cryptography: worked on European Commission research projects in the field of functional encryption and zero-knowledge proofs. Translated a library of cryptographic algorithms from Python to open-source Golang (https://github.com/fentec-project/gofe) and C (https://github.com/fentec-project/cifer) versions, added CI, benchmarks and integration tests.
    - Distributed systems: worked on a microservice-based backend for a developer platform, implemented microservices for proxying and task scheduling.
    Technologies: Python, Node.js, Golang, C/C++, Ansible, RabbitMQ, Docker Swarm