Kevin Klein

Senior GenAI Engineer

Tel Aviv-Yafo, Tel Aviv District, Israel

About

Senior ML/GenAI Engineer with 7+ years designing, deploying, and scaling ML and LLM systems in regulated, high-impact environments. Most recently built a production clinical AI platform from scratch: RAG over a 500K-word knowledge base, tool-calling agents over live patient data (FHIR, labs, vitals), and multi-model serving, taken from zero to hospital deployment in 5 months and used daily by 50+ clinicians. Also owned the full ML lifecycle for a transformer-based post-op cardiac risk predictor on a 20,000-patient cohort (recall >80%), enabling clinical intervention up to 6 hours earlier. Strong mathematical foundation: M.Sc. Applied Mathematics, ETH Zurich (Magna Cum Laude).

Experience

  • Senior AI Engineer at Logz.io
    Mar 2026 - Present · 4 mos

  • Machine Learning Engineer at x-cardiac GmbH
    Feb 2022 - Feb 2026 · 4 yrs 1 mo

    Clinical AI Platform (vLLM · RAG · Agentic AI · LLM Evals)
    • Built and deployed a production clinical LLM platform: RAG over a 500K-word clinical knowledge base, tool-calling agents over live patient data (FHIR, labs, vitals), and multi-model serving. Went from zero to production in 5 months, supporting daily clinical decision-making for 50+ clinicians.
    • Designed an eval framework covering tool-call accuracy, retrieval relevance, and clinical safety; chose RAG over fine-tuning to preserve source auditability and support weekly knowledge updates without retraining.
    • Achieved <300 ms TTFT and 3,000+ tok/s across a multi-model stack (GPT-OSS 120B · Kimi 2.5 · Qwen Coder) on 8× H200 GPUs via vLLM, using KV-cache tuning and continuous batching under concurrent clinical load.

    Post-Op Cardiac Risk Predictor (Transformer · PyTorch · Ray · MLflow)
    • Owned the full ML development cycle (label design, architecture, distributed training, deployment) for a transformer predicting post-op complications on a 20,000-patient cohort; recall >80%, precision 60–70%, enabling intervention up to 6h earlier.
    • Parallelized distributed training across an 8-GPU cluster via Ray (~10× wall-clock speedup), enabling 50+ Bayesian hyperparameter trials within 24h; reproducibility enforced via MLflow + DVC.

  • Machine Learning Engineer at Charité
    Feb 2022 - Jan 2026 · 4 yrs

    Clinical AI Platform (vLLM · RAG · Agentic AI · LLM Evals)
    • Built and deployed a production clinical LLM platform: RAG over a 500K-word clinical knowledge base, tool-calling agents over live patient data (FHIR, labs, vitals), and multi-model serving. Went from zero to production in 5 months, supporting daily clinical decision-making for 50+ clinicians.
    • Designed an eval framework covering tool-call accuracy, retrieval relevance, and clinical safety; chose RAG over fine-tuning to preserve source auditability and support weekly knowledge updates without retraining.
    • Achieved <300 ms TTFT and 3,000+ tok/s across a multi-model stack (GPT-OSS 120B · Kimi 2.5 · Qwen Coder) on 8× H200 GPUs via vLLM, using KV-cache tuning and continuous batching under concurrent clinical load.

    Post-Op Cardiac Risk Predictor (Transformer · PyTorch · Ray · MLflow)
    • Owned the full ML development cycle (label design, architecture, distributed training, deployment) for a transformer predicting post-op complications on a 20,000-patient cohort; recall >80%, precision 60–70%, enabling intervention up to 6h earlier.
    • Parallelized distributed training across an 8-GPU cluster via Ray (~10× wall-clock speedup), enabling 50+ Bayesian hyperparameter trials within 24h; reproducibility enforced via MLflow + DVC.

  • Machine Learning Engineer at 4flow
    Jun 2019 - Jun 2021 · 2 yrs 1 mo

    • Designed an object detection model that extracts structured data from logistics documents, reducing manual processing workload by 40%.
    • Led a previously stalled NLP project to production while managing 2 junior engineers: a multi-class intent classifier routing 300 emails/day cut manual triage by 25%, and an RNN-based auto-fill eliminated data entry for ~50% of incoming tickets.

  • Consultant Risk Analytics at BDO Switzerland
    Oct 2016 - Jun 2018 · 1 yr 9 mos

    • Led the firm's first international advisory engagement, advising the European Central Bank on a major bank's asset quality review; built a statistical loan selection algorithm that cut selection time by 50%.