Tel Aviv-Yafo, Tel Aviv District, Israel
Senior ML/GenAI Engineer with 7+ years designing, deploying, and scaling ML and LLM systems in regulated, high-impact environments. Most recently built a production clinical AI platform from scratch — RAG over a 500K-word knowledge base, tool-calling agents over live patient data (FHIR, labs, vitals), and multi-model serving — zero to hospital deployment in 5 months, used daily by 50+ clinicians. Also owned the full ML lifecycle for a transformer-based post-op cardiac risk predictor on a 20,000-patient cohort (recall >80%), enabling clinical intervention up to 6 hours earlier. Strong mathematical foundation — M.Sc. Applied Mathematics, ETH Zurich (Magna Cum Laude).
Clinical AI Platform (vLLM · RAG · Agentic AI · LLM Evals)
• Built and deployed a production clinical LLM platform: RAG over a 500K-word clinical knowledge base, tool-calling agents over live patient data (FHIR, labs, vitals), and multi-model serving. Zero to production in 5 months, supporting daily clinical decision-making for 50+ clinicians.
• Designed eval framework covering tool-call accuracy, retrieval relevance, and clinical safety; chose RAG over fine-tuning to preserve source auditability and support weekly knowledge updates without retraining.
• Achieved <300ms TTFT and 3,000+ tok/s across a multi-model stack (GPT-OSS 120B · Kimi 2.5 · Qwen Coder) on 8× H200 GPUs via vLLM, using KV-cache tuning and continuous batching under concurrent clinical load.
Post-Op Cardiac Risk Predictor (Transformer · PyTorch · Ray · MLflow)
• Owned full ML development cycle (label design, architecture, distributed training, deployment) for a transformer predicting post-op complications on a 20,000-patient cohort; recall >80%, precision 60–70%, enabling intervention up to 6h earlier.
• Parallelized distributed training across 8-GPU cluster via Ray (~10× wallclock speedup), enabling 50+ Bayesian hyperparameter trials within 24h; reproducibility enforced via MLflow + DVC.
• Designed an object detection model that extracts structured data from logistics documents, reducing manual processing workload by 40%.
• Led a previously stalled NLP project to production while managing 2 junior engineers: a multi-class intent classifier routing 300 emails/day cut manual triage by 25%, and an RNN-based auto-fill eliminated data entry for ~50% of incoming tickets.
• Led the firm's first international advisory engagement, advising the European Central Bank on a major bank's asset quality review; built a statistical loan selection algorithm that cut selection time by 50%.