Shaik Gouse Pasha

AI Specialist | Shipping Production-Grade Agents | Multilingual Voice AI | RAG & Agentic Workflows | LLM Optimization | Enterprise Conversational AI — Voice, WhatsApp, Multilingual | On-Premise LLMs | MENA • Global

Hyderabad, Telangana, India

About

I architect high-performance AI infrastructure, specializing in ASR/TTS model training, sovereign LLMs, and multi-agent orchestration. As an ML researcher, I work on AI's hardest bottlenecks: latency, speech accuracy, and reasoning constraints. While much of the market relies on API wrappers, I build the underlying infrastructure.

🚀 THE CORE PROBLEMS I SOLVE
• Real-Time Voice Intelligence: Sub-300 ms latency voice agents powered by custom-trained speech models and high-speed LLMs, with natural "barge-in" handling so callers can interrupt mid-response.
• Multi-Agent Orchestration: Designing autonomous workflows where specialized LLM agents collaborate to resolve complex tasks (sales, support, logistics) without human intervention.
• Enterprise RAG Architectures: Building high-accuracy Retrieval-Augmented Generation pipelines (pgvector-backed) that ground answers in private business data and minimize hallucinations.
• Performance Engineering: Optimizing the stack from the ground up for high availability and low inference cost.

🧠 MODEL TRAINING & INFERENCE OPTIMIZATION
• Custom ASR & TTS (Core Expertise): Training speech models from scratch, including accurate acoustic and language models for complex dialects and noisy telephony audio.
• Advanced LLM Fine-Tuning: Using reinforcement learning and preference optimization (RLHF, DPO, GRPO) to align outputs with business logic and reduce hallucinations.
• Hardware-Level Optimization: Writing custom CUDA kernels and deploying via vLLM and TensorRT-LLM to maximize GPU utilization for streaming workloads.
• Sovereign & Private AI: Architecting secure on-premise LLMs and private cloud deployments for strict data residency (PCI-DSS/HIPAA ready).

🌍 GLOBAL & MULTILINGUAL REACH
Deep native linguistic optimization, not just translation:
• MENA & Global: Saudi Najdi, Arabic, US/UK English, and European dialects.

🤝 LET'S CONNECT
I partner with enterprise leaders and engineering teams to solve complex AI bottlenecks. If you are hitting the limits of standard APIs, struggling with custom ASR/TTS/LLM training, or scaling enterprise RAG and multi-agent workflows, let's talk architecture.

📩 DM to discuss AI strategy, infrastructure, or technical partnerships.

Experience

  • Artificial Intelligence Consultant at Stealth AI Startup
    May 2026 - Present · 2 mos

    Independent AI consultant specializing in building and deploying production AI systems at enterprise scale.
    • LLM fine-tuning: Unsloth, LoRA, QLoRA, PEFT, and full-parameter training on multi-GPU clusters; RLHF, DPO, and preference optimization; vision-language model fine-tuning.
    • Speech: Training custom ASR and TTS models for multilingual voice applications.
    • Inference and serving: vLLM, SGLang, TensorRT-LLM, and GGUF quantization for high-throughput, low-latency production deployments.
    • Private RAG: Agentic RAG and GraphRAG pipelines built with LangChain, LlamaIndex, ChromaDB, Pinecone, and Weaviate for enterprise document intelligence, using private LLMs, embedding models, and rerankers to keep data fully under client control.
    • Agents: Multi-agent and autonomous AI systems using Google ADK and custom orchestration frameworks; tool-calling, function-calling, and MCP-based agent architectures for complex enterprise workflows.
    • Prompting: DSPy-based prompt optimization and programmatic prompt engineering for reliable LLM outputs.
    • Applications: Generative UI development and text-to-analytics platforms for enterprise data visualization and decision-making.
    • Voice AI: Real-time voice agents with Pipecat, LiveKit, Asterisk, FreeSWITCH, and Twilio across WebRTC, SIP, and PSTN channels; end-to-end voice pipelines integrating STT, LLM, and TTS.
    • Edge AI: Deployment and optimization using LiteRT, ONNX Runtime, TensorRT, and quantized models for real-time inference on edge devices, including on-device LLMs; computer vision on edge cameras with VLM fine-tuning for anomaly detection, and geospatial mapping with PostGIS.
    • Training infrastructure: H100/A100 clusters using PyTorch, DeepSpeed, FSDP, Accelerate, and Slurm.
    • Research: JEPA and self-supervised learning architectures for next-generation representation learning.

  • Chief Technology Officer at LiteCompute AI
    Oct 2025 - May 2026 · 8 mos

    Led the development of next-generation AI services at litecompute.ai, moving beyond simple chatbots to build autonomous AI agents and custom machine learning systems that drive real business ROI.
    What we delivered:
    🚀 Custom AI Agents: Designing intelligent agents capable of reasoning, tool use, and executing multi-step tasks autonomously.
    🧠 Advanced ML Services: Delivering tailored ML solutions, including predictive analytics and computer vision integration.
    🔒 Secure GenAI Infrastructure: Implementing private RAG (Retrieval-Augmented Generation) stacks for clients who demand total data privacy and control.
    We turned "AI hype" into deployable, secure software.

  • Founder at Zingaro AI
    Dec 2024 - May 2026 · 1 yr 6 mos

    Built Zingaro AI, a platform that lets businesses answer every call and message in the customer's own language, across Phone, WhatsApp, and Web, with human-like voice agents.
    • Launched the product to paying customers; established a repeatable pilot-to-subscription motion for SMB and mid-market.
    • Defined product vision and roadmap around 30+ languages, "perfect memory," analytics, and enterprise readiness.
    • Opened key verticals (financial services/gold loans, aviation safety, retail surveys) with referenceable case studies.
    • Built the go-to-market engine: ICPs, pricing, outreach playbooks, and partner channels; steady MRR growth.
    • Set up customer success and onboarding to deliver value fast and keep retention high.
    • Drove partnerships and integrations (WhatsApp/Meta, CRM, and payments) to fit real business workflows.
    • Established a security/compliance posture suitable for enterprise procurement.
    • Led brand and narrative: positioning, website, one-pagers, pitch materials, and LinkedIn presence.
    • Hired and coached a lean founding team; installed OKRs, weekly reviews, and a data-driven operating cadence.
    • Represented the company with customers, partners, and investors; owned revenue, product quality, and culture.

  • AI/ML Studio Publisher at Lightning AI
    Mar 2024 - Oct 2024 · 8 mos

    Served as the technical backbone for Studio demos and content, owning speech/ASR experiments that powered on-camera showcases and partner pilots.
    • Trained and fine-tuned multilingual ASR models for Indian languages (Hindi, Telugu, Tamil, Odia, Bengali) and English; built evaluation harnesses (WER/CER, accent and noise stress tests).
    • Curated and cleaned speech datasets (call-quality, code-mixed, romanized text); implemented text normalization and QA workflows to lift label quality.
    • Prototyped streaming/telephony scenarios (VAD, chunking, diarization assumptions) to make demos robust to real phone audio.
    • Packaged models and simple APIs/notebooks so the Studio team could produce reliable, reproducible demos on deadline.
    • Benchmarked open-source and commercial baselines; tracked results in a standardized dashboard for topic selection and demo readiness.
    Impact: Enabled repeatable, multilingual speech demos that increased audience trust and accelerated partner conversations.
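
    For readers unfamiliar with the WER/CER metrics mentioned above, the following is a minimal illustrative sketch of how such scores are commonly computed (edit distance divided by reference length). It is not the harness used in this role; the function names and the choice to ignore spaces in CER are assumptions made for the example.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences (unit costs)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate. Assumption: spaces are ignored."""
    ref_chars = list(reference.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    hyp_chars = list(hypothesis.replace(" ", ""))
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


if __name__ == "__main__":
    print(wer("the cat sat on the mat", "the cat sit on mat"))
    print(cer("hello", "hallo"))
```

    In practice, both strings would first pass through the same text normalization (casing, punctuation, numerals) so that formatting differences are not counted as recognition errors.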

  • Machine Learning Consultant at ADQ Services
    Aug 2023 - Feb 2024 · 7 mos

    Led applied ML projects in LLM training and speech for Indic languages.
    • Orchestrated large-scale LLaMA training runs (7B/13B/70B); built instruction-tuning sets and evaluation loops.
    • Ran multi-node training on A100s with distributed strategies; reduced training time with attention/throughput optimizations.
    • Built a large corpus by converting ~1M PDFs with a vision-based parser; standardized preprocessing for downstream training.
    • Fine-tuned Whisper for Indic ASR (Tamil/Telugu/Kannada + others); created speech-text datasets and robust WER/CER harnesses; shipped lightweight CPU/mobile builds.
    • Set up a pragmatic RAG pipeline for "chat-with-data" prototypes used in stakeholder demos.
    • Streaming ASR for telephony (8 kHz): VAD, endpointing/chunking, barge-in handling, and latency budgeting; stress-tested accents/noise.
    • TTS (Text-to-Speech): Trained multi-speaker, multilingual TTS with controllable rate/pitch; curated prompt/phoneme lexicons and G2P rules for code-mixed Indic languages; ran vocoder experiments for low-latency synthesis; produced 8/16 kHz telephony-ready voices; built small-footprint/quantized variants for CPU/mobile; added a pronunciation editor and MOS/listening tests for QA; prototyped consent-based voice cloning and accent adaptation.
    • Prototyped diarization and forced alignment to enable redaction, highlights, and turn-level analytics.
    • Instrumented QoS dashboards (latency, stability, error codes) and a results logbook for reproducibility.
    Impact: Faster model experiments, stronger Indic ASR/TTS baselines, and repeatable LLM/RAG + voice demos used in partner conversations.
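
    As an illustration of the VAD and endpointing steps named in the streaming ASR item above, here is a minimal energy-based sketch for 8 kHz audio. It is a simplified stand-in, not the system built in this role: the 20 ms frame size, the RMS threshold, the hangover length, and all function names are assumptions chosen for the example, and production systems typically use a trained VAD model instead of a fixed energy threshold.

```python
import math
from typing import List, Optional, Sequence

SAMPLE_RATE_HZ = 8000                           # narrowband telephony rate
FRAME_MS = 20                                   # assumed frame size
FRAME_LEN = SAMPLE_RATE_HZ * FRAME_MS // 1000   # 160 samples per frame


def frame_rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one frame of samples in [-1.0, 1.0]."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def vad_flags(samples: Sequence[float], threshold: float = 0.02) -> List[bool]:
    """Mark each complete 20 ms frame as speech (True) or non-speech (False)."""
    flags = []
    for start in range(0, len(samples) - FRAME_LEN + 1, FRAME_LEN):
        frame = samples[start:start + FRAME_LEN]
        flags.append(frame_rms(frame) > threshold)
    return flags


def find_endpoint(flags: Sequence[bool], hangover_frames: int = 10) -> Optional[int]:
    """Return the index of the first silent frame that ends an utterance.

    An utterance ends once `hangover_frames` consecutive non-speech frames
    follow at least one speech frame. Returns None if no endpoint is found.
    """
    seen_speech = False
    silence_run = 0
    for i, is_speech in enumerate(flags):
        if is_speech:
            seen_speech = True
            silence_run = 0
        elif seen_speech:
            silence_run += 1
            if silence_run >= hangover_frames:
                return i - hangover_frames + 1
    return None


if __name__ == "__main__":
    silence = [0.0] * (10 * FRAME_LEN)
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE_HZ)
            for n in range(10 * FRAME_LEN)]
    flags = vad_flags(silence + tone + silence + silence)
    print(find_endpoint(flags))
```

    The hangover length is the main latency trade-off here: with 20 ms frames, 10 frames adds 200 ms before the end of speech is declared, which is why endpointing has to be counted in the overall latency budget.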