Chennai, Tamil Nadu, India
Data Scientist with 9 years of experience building production ML systems across banking, automotive, and energy. At Wells Fargo, I build agentic AI systems. I designed an LLM pipeline with code retrieval and automated validation that converts legacy Ab Initio pipelines (~20,000-line graph files) to SQL — it executes the generated SQL, compares outputs against reference data, and feeds discrepancies back to the model for iterative self-correction; the generated SQL was validated end-to-end to match legacy pipeline outputs. I also reconstructed a 600+ node execution DAG spanning 291 SQL stored procedures and 65 Ab Initio files, tracing every output column back to its leaf-level source variables — column-level lineage for model governance and regulatory audit. At Ford, I built a Generalized Nested Logit demand system producing 216 forecasts per vehicle per quarter (≤10% error) for the UK and France, and maximized enterprise profit by optimizing 864 decision variables under business constraints using IPOpt and Sequential Quadratic Programming. I also built an ML competitor-identification system ranking competitors at trim level across the five major European markets — UK, France, Germany, Spain, and Italy — delivered to business teams through an internal web application I built. At TCS, I built the optimization model behind a system running live in a 525MW thermal power plant since 2020 — constrained Differential Evolution over a 1,703-variable sensor space — generating ~₹21 Crores/year in projected savings. Indian patent granted: "System and Method for Efficient Model Selection and Hyper-Parameter Tuning" (Application No. 202021031790). Core expertise: Agentic AI & RAG systems, LLMs, Mathematical Optimization (IPOpt, SQP, Differential Evolution), Discrete Choice Econometrics (GNL / Nested Logit), Forecasting, NLP, Python, SQL. I write about machine learning on Medium: https://nitishkthakur.medium.com/
Promoted to Lead (P4) in August 2025. I build agentic AI systems for model migration and financial analysis: • Built an agentic LLM pipeline converting legacy Ab Initio pipelines (~20,000-line graph files) to SQL — code retrieval, LLM-driven generation, and an automated validation loop that executes generated SQL, compares outputs against reference data, and feeds discrepancies back for self-correction; generated SQL validated end-to-end to match legacy pipeline outputs. • Reconstructed a 600+ node execution DAG spanning 291 SQL stored procedures and 65 Ab Initio files, tracing every output column back to its leaf-level source variables — column-level lineage for model governance and regulatory audit. • Built an AI system to deconstruct complex Excel-based finance models (66 tabs, ~200K formulas) into standardized business requirements — designed a custom Excel MCP server with progressive-disclosure retrieval, generating interactive documentation and recursive upstream-source tracing.
Model governance, re-engineering, and first agentic AI systems for the bank's finance model estate (P3 — Senior band): • Re-engineered legacy Excel-based finance models (50–100 sheets, formulas/VBA) to run on a centralized model execution and versioning platform — input configuration, environment promotion (UAT/SIT/prod), and reporting. • Refactored finance models from offline procedures into a centralized Model Orchestration Framework, reducing execution time and improving reproducibility; designed and presented the execution architecture as DAGs to stakeholders. • Built the agentic backend of a conversational assistant for the model platform — an LLM agent orchestrating MCP servers to create data, execute finance models, retrieve results, and generate insights by grounding results in model documentation; implemented caching and backend services in collaboration with a senior software engineer. • Mentored an engineer end-to-end through building a multi-agent peer-earnings analysis system — intent-routed agents (question expansion, agentic RAG, database extraction, graph generation) with two-tier report generation — benchmarking company financials against industry peers for credit risk workflows. This mentorship was part of my promotion case. Promoted to Lead (P4) in August 2025.
Data Scientist in the Business Sales Planning and Analytics team (M&S). 1. Optimized enterprise profit by generating quarterly variable marketing strategies for selected vehicles in EU5 markets, using non-convex optimization methods constrained by business directives. 2. Forecasted orders, the distribution of customers across financing options, and key business KPIs; calculated product elasticities and captured cross-substitution across products using advanced discrete choice models. 3. Built an open-book Q&A NLP model powering semantic search in an internal search engine. 4. Automated competitor identification for vehicles in EU10 markets and built a web application to share results with stakeholders. 5. Built a vehicle recommendation system for online customers visiting the website. Most of the work was done in Python, SQL, Alteryx, Excel, and Qlik.
My responsibilities included generating and presenting analytics-based reports and building machine learning pipelines for predictive and prescriptive analytics use cases. Independently led development of several data science, machine learning, and analytics use cases. Key contributions: 1. Reduced power plant losses (DFG) by 6% by building machine learning models and performing optimization. 2. Detected customers with anomalous turbine operation by building an unsupervised anomaly detection system. 3. Designed a Python package that automates hyperparameter tuning for machine learning models using optimization and minimizes the time complexity of the machine learning pipeline. 4. Built an ML pipeline using ensemble models to predict the suction temperature of chillers to within 2% error. 5. Deduced the engine usage patterns that lead to frequent engine trips and shutdowns using association rule mining and clustering. 6. Reduced report generation time by 72 hours for a thermal power plant by building a machine learning model that predicts fuel quality with 88% accuracy.