Post by Dmitry Sadchikov

Combining AI with Enterprise Software

Our team has put Nebius AI Cloud to a rigorous engineering test while working on a complex AI solution. An architect who is also a domain expert and an engineer working with Claude Code/AWS Kiro are building a component in high demand: a document recognition subsystem. The work is part of the #NebiusServerlessChallenge, and we are designing the component to be public and reproducible from day one, so that anyone can set it up from scratch on their own account.

Core business requirements:
- Document recognition over plain HTTP, with no dependency on any specific backend, so that any client can use it.
- Structured field extraction against a versioned schema (Blueprint), with a confidence score for each field.
- Confidence-based routing: auto-accepted, needs review, or escalate to operator.
- Support for multiple document types (passport, residence permit, ID card), with auto-classification and multi-page packet handling.
- Full reproducibility and no PII: only the synthetic MIDV-2020 dataset is used, with a public repository and a self-contained README.

Key design decisions:
- Qwen2.5-VL-7B-Instruct served via vLLM on a single H100 SXM, which avoids multi-GPU complexity and leaves open the possibility of fine-tuning the model in the future.
- Guided decoding (`guided_json`) so that the model emits schema-valid JSON, eliminating fragile parsing.
- Confidence computed from token logprobs rather than self-reported values, giving an honest uncertainty signal for routing decisions.
- Blueprint schemas and their version catalog stored in S3-like storage (NOS), so that new document types can be added without rebuilding the container.
- Well-Architected Framework principles followed from the outset.

The results have exceeded our expectations, and we will share the full solution soon.
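To make the logprob-based confidence and the three routing outcomes more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the code from our repository: the function names, the geometric-mean aggregation, the "weakest field decides" rule, and both threshold values are hypothetical and would need tuning against labeled data.

```python
import math
from typing import Dict, List

# Hypothetical thresholds, chosen only for illustration.
AUTO_ACCEPT_THRESHOLD = 0.95
REVIEW_THRESHOLD = 0.80


def field_confidence(token_logprobs: List[float]) -> float:
    """Aggregate the logprobs of the tokens that make up one field value.

    Uses the geometric mean of token probabilities (exp of the mean logprob),
    an assumed aggregation that keeps long values from being penalized
    simply for having more tokens.
    """
    if not token_logprobs:
        return 0.0  # no evidence for this field: treat as zero confidence
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def route(field_confidences: Dict[str, float]) -> str:
    """Pick a routing outcome from per-field confidences.

    Assumption: the weakest field decides the outcome for the whole document.
    """
    worst = min(field_confidences.values(), default=0.0)
    if worst >= AUTO_ACCEPT_THRESHOLD:
        return "auto_accepted"
    if worst >= REVIEW_THRESHOLD:
        return "needs_review"
    return "escalate_to_operator"


if __name__ == "__main__":
    # Made-up logprobs for two fields of a hypothetical passport result.
    logprobs = {
        "surname": [-0.01, -0.02],
        "document_number": [-0.05, -0.30, -0.20],
    }
    confidences = {name: field_confidence(lp) for name, lp in logprobs.items()}
    print(confidences, route(confidences))
```

Routing on the weakest field is a conservative choice: one doubtful field is enough to send the document to review. A per-field policy, where only the uncertain fields are shown to a reviewer, would be an equally reasonable alternative.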