Jan Hartman

Machine Learning Engineer at Sourcegraph

Slovenia

About

I'm a highly motivated and practical engineer who loves working on interesting and hard problems. My favorite area is machine learning in production. My work has mostly been in AI developer tooling (agents for code search) and adtech, building models that compute over a billion predictions per second and everything around them (feature preprocessing, training, data pipelines, etc.). I have a master's degree in Computer & Data Science.

Experience

Sourcegraph (Remote)
- Senior Machine Learning Engineer
  Nov 2025 - Present · 8 mos
  Leading the charge in AI code understanding
- Machine Learning Engineer
  Mar 2024 - Dec 2025 · 1 yr 10 mos
  Sourcegraph is a developer tooling company at the forefront of code search and agentic coding. My role is a full-stack machine learning engineer, focusing on the quality of search results and the underlying infrastructure (evaluations, ML inference). Some of the projects I've tackled:
  - agentic code search (Deep Research for code, https://sourcegraph.com/blog/introducing-deep-search)
  - context retrieval (keyword search, embeddings, reranking)
  - synthetic data generation for evaluating search
  - intent detection (what does the user want to accomplish with this query)
Machine Learning Engineer at Outbrain
Aug 2019 - Mar 2024 · 4 yrs 8 mos
Full-stack machine learning engineer in an R&D department, worked on an ad recommendation & bidding system that handles several million RPS with latency under 100 milliseconds. My experience ranged from core ML algorithms to data pipelines and A/B testing. I brought many projects from research to production and have a thorough, end-to-end understanding of ML flows.
- I led the initiative for implementing ML algorithms for tasks like click prediction. I added support for TensorFlow in the serving stack, implemented SOTA deep learning recommendation models and scaled them to over a billion predictions per second. These models brought massive increases in revenue.
- Implemented and maintained an online learning pipeline capable of continuously retraining models on terabytes of data daily and bootstrapping them from scratch on past data seamlessly.
- Reimplemented a data pipeline for producing model training data from raw data, migrated from AWS (S3, EC2, EMR) to on-premise (HDFS, Spark, Airflow) and enabled huge cost savings.
- Point of contact for high-performance, low-latency ML pipelines and tooling, mentoring colleagues in these areas. Continuously contributed to planning the long-term evolution of the systems and setting high standards for the quality of our work, including publishing and presenting it at conferences.
- Designed and implemented an efficient exploration system that leverages model uncertainty to improve prediction models, bringing an increase in revenue.
Technologies: TensorFlow, Golang, Python, Java, BigQuery, Airflow, AWS, Hadoop
Software Developer at XLAB
Jul 2016 - Aug 2019 · 3 yrs 2 mos
Part of a research department, tackled a variety of projects in different areas:
- Neural networks: worked on a deep learning model compiler for hardware-specific optimization, implemented support for compiling TensorFlow models.
- Cryptography: worked on European Commission research projects in the field of functional encryption and zero-knowledge proofs. Translated a library of cryptographic algorithms from Python to open-source Golang (https://github.com/fentec-project/gofe) and C (https://github.com/fentec-project/cifer) versions, added CI, benchmarks and integration tests.
- Distributed systems: worked on a microservice-based backend for a developer platform, implemented microservices for proxying and task scheduling.
Technologies: Python, Node.js, Golang, C/C++, Ansible, RabbitMQ, Docker Swarm