Faiqa Shabbir

AI Engineer | RAG Pipelines · LLM Agents · GPT-4 · GCP | Building AI that works in production, not just demos | Open to Data Scientist roles

Mandi Bahauddin, Punjab, Pakistan

About

I've spent the last 5 years building AI systems that work when real users hit them, not just when everything goes right in a notebook. Most AI projects fail between the prototype and production. I close that gap.

My focus is LLM applications: RAG pipelines, AI agents, document intelligence systems, and the evaluation frameworks that make them reliable. I work with GPT-4, LLaMA, LangChain, LlamaIndex, FAISS, Neo4j, and FastAPI. I don't use them to flex a tech stack; I use them because I've found they actually solve the problems I care about.

Here's what I've shipped:
→ RAG-based document intelligence system that improved insight-extraction accuracy by ~30% using GPT-4 + vector search over financial and research reports
→ Long-horizon GUI agent training framework at Turing: designed 850+ complex trajectories (100–200 actions each) to train agents for real software navigation
→ Parent-Child evaluation framework with 2,500+ scenarios to improve model safety, reduce reasoning failures, and catch edge cases before they hit users
→ QA and validation protocol that reduced engineering rework by 40% and improved team throughput by 15%
→ YOLO-based computer vision systems for construction safety, achieving 15–25% accuracy improvements across detection tasks

I've worked across NLP, Computer Vision, and full-stack development, but LLM systems and AI evaluation are where I do my best work. I care about AI that's measurable, testable, and honest about its failure modes. If you're building something that needs to actually work, not just impress in a demo, we'll probably get along.

Currently open to Data Scientist or AI Engineer roles focused on LLM applications, RAG, personalization, and AI automation.

Experience

  • Software Engineer at A.Team
    Mar 2026 - Present · 4 mos

    - Contributing to Terminal Bench 2.0, a premier agentic coding benchmark referenced by frontier AI labs, including Anthropic for Claude 4.5.
    - Designing realistic DevOps incident scenarios in air-gapped Kubernetes (k3s) environments with no internet access, involving Gitea CI/CD and Vault authentication.
    - Implementing Python-based functional graders and Docker evaluation environments to validate autonomous agent reasoning across 60+ microservice pods.
    - Hardening AI-native infrastructure for the Nebula platform to support large-scale task execution and model training.

  • Turing (Remote)
    • LLM - Backend Engineer
      Dec 2025 - Mar 2026 · 4 mos

      - Designed 850+ long-horizon agent training trajectories (100–200 actions each) to teach GUI-based AI agents complex software navigation across Windows, macOS, and Ubuntu, directly feeding model fine-tuning pipelines.
      - Built a 2,500-scenario Parent-Child evaluation framework covering Critical Mistakes, Side Effects, and Misunderstandings, improving model robustness and reducing safety-related failures.
      - Introduced a structured QA protocol (S.T.R.U.C.T.) that achieved 100% event-logging accuracy and cut engineering rework by 40%, improving weekly team throughput by 15%.

    • AI Engineer
      Feb 2025 - Dec 2025 · 11 mos

      - Built Creds Creator & Proposal Producer, an end-to-end AI automation suite using GPT-4 + Google Slides API to auto-generate compliant consulting proposals, including a Validator agent that enforces tone, formatting, and quality control before output delivery.
      - Developed Sales Knowledge Center, a real-time intelligence pipeline that aggregates company data and live web search results to surface executive-level business insights on demand, significantly reducing manual research time.
      - Engineered Advisor Insight at Scale, a production RAG platform using LangChain + FAISS that supports multi-format document ingestion (PDF, DOCX) and delivers context-aware financial Q&A with high retrieval accuracy.

    • Python Engineer
      Aug 2024 - Feb 2025 · 7 mos

      - Translated a full Swift codebase to Python while adapting its logic for modern LLM-integrated AI workflows, passing 100% of the original test cases.
      - Evaluated multi-modal AI model outputs and provided structured performance feedback to identify the best-performing architectures, directly influencing model selection decisions.
      - Integrated LLMs into real-time NLP applications, reducing inference latency and improving response coherence through prompt optimization and pipeline refactoring.

  • Data Scientist at Fehmida AI
    Oct 2024 - Aug 2025 · 11 mos

    - Built RAG-based summarization pipelines using LangChain + vector databases to extract targeted insights from financial, sustainability, and research reports, significantly reducing manual review time.
    - Designed and maintained end-to-end data engineering pipelines that automated cleaning and preprocessing, ensuring model-ready data at every stage of the AI workflow.
    - Developed topic modeling features to personalize the website experience, aligning AI behavior with real user needs through close stakeholder collaboration.

  • ML Engineer (Computer Vision & NLP) at UOG
    Jan 2023 - Aug 2024 · 1 yr 8 mos

    - Built real-time virtual try-on web apps by integrating Hugging Face vision models with a React frontend, enabling interactive, production-ready user experiences directly in the browser.
    - Automated document extraction and compliance workflows using YOLOv8 + Tesseract OCR + NER pipelines, achieving 95% OCR accuracy and 90% NER precision and replacing manual review processes end-to-end.
    - Delivered construction site analytics using YOLOv5 object detection + OpenCV depth estimation, improving monitoring accuracy and generating structured reporting for safety compliance teams.
    - Optimized a Flask microservices architecture, cutting API response time by 40% and significantly improving reliability under production load.
    - Deployed full systems on AWS using Docker containers and GitHub Actions CI/CD, enabling seamless iteration from dev to production across Python, FastAPI, React, and PostgreSQL stacks.

  • Client Engagement Lead at Freelancer
    Jan 2021 - Dec 2022 · 2 yrs

    - Designed and shipped multi-tenant Learning Management Systems (LMS) and food ordering platforms from scratch, with secure role-based access controls, optimized backend logic, and scalable REST APIs.
    - Translated client requirements into high-fidelity Figma prototypes and production UIs, shortening the feedback loop between design and development and improving client-reported satisfaction on every project.
    - Built backend services with Python, FastAPI, Flask, and PostgreSQL, architected for modularity and performance, supporting multiple concurrent client deployments without rework.
    - Deployed all projects on AWS with Docker + CI/CD pipelines (GitHub Actions), enabling rapid iteration and zero-downtime updates across active client platforms.
    - Applied Agile/XP practices across the full client lifecycle, from requirements gathering to delivery, consistently meeting deadlines and maintaining quality through rapid iteration and continuous feedback.