Toronto, Ontario, Canada
Senior Data Engineer with end-to-end ownership of complex data ecosystems. I build secure, scalable data pipelines across diverse applications and APIs, ensuring reliable, well-modeled data for analytics, security, and product teams. I focus on strong data engineering fundamentals including automated testing, metadata standards, and pipeline observability. I also leverage AI-assisted development, building and using LLM-based tools and agents to streamline tasks such as schema design, documentation, and workflow optimization. I’ve led major refactoring and migration efforts, modernizing legacy systems and improving data models and storage efficiency while delivering maintainable, high-impact data solutions.
- Use Claude Code in daily development workflows with client agents and MCP server integrations (including MCP-Security) to interact programmatically with data systems, APIs, and security frameworks for automation and advanced analysis.
- Design and operate scalable data ingestion pipelines that process large volumes of structured and unstructured data, ensuring reliable data collection, normalization, and downstream usability.
- Build and maintain robust ETL/ELT pipelines and event-driven data workflows on cloud and serverless architectures to support high-throughput, fault-tolerant data processing.
- Improve CI/CD pipelines and infrastructure automation with Terraform, optimizing data workflows for performance, cost efficiency, and reliable deployment at scale.
- Design and build AI-powered engineering tools and automation frameworks using LangChain and OpenAI, enabling developers to rapidly create modular AI agents for data workflows and operational tasks (see the sketch below).
- Integrate AI-assisted development practices across the data platform, leveraging LLMs for schema generation, documentation, pipeline optimization, and data analysis.
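To illustrate the modular-agent pattern mentioned above, here is a minimal, hypothetical sketch of a LangChain tool-calling agent wrapping a single data-platform tool. The tool, table names, stub values, and model choice are placeholders, and it assumes langchain, langchain-openai, and an OPENAI_API_KEY are available.

```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.tools import tool
from langchain.agents import AgentExecutor, create_tool_calling_agent


@tool
def table_row_count(table_name: str) -> int:
    """Return the approximate row count for a warehouse table (stubbed for illustration)."""
    return {"orders": 1_204_331, "events": 87_559_002}.get(table_name, 0)


# The prompt must expose an agent_scratchpad placeholder for intermediate tool calls.
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a data-platform assistant. Use the tools when helpful."),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),
])

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)  # model name is a placeholder
agent = create_tool_calling_agent(llm, [table_row_count], prompt)
executor = AgentExecutor(agent=agent, tools=[table_row_count])

if __name__ == "__main__":
    print(executor.invoke({"input": "How many rows does the orders table hold?"}))
```

The same structure extends to additional tools (query runners, schema inspectors, documentation generators) without changing the agent wiring.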
- Collaborating with client companies to establish and enhance data infrastructure on the Snowflake platform, ensuring seamless data replication and transfer.
- Designing, engineering, and maintaining data transformation pipelines that consolidate diverse data sources into Snowflake, adhering to best practices in data modeling and ETL/ELT processes (see the ELT sketch below).
- Leveraging cloud data analytics services, with proficiency in Python and SQL for writing elegant, maintainable code.
- Contributing to the adoption of best practices in data system creation, test design, analysis, validation, and documentation, fostering a culture of data trustworthiness and product excellence.
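As a rough illustration of the Snowflake ELT pattern described above, the sketch below stages raw files and merges them into a modeled table. Connection parameters, the stage, and table names are hypothetical, and it assumes the snowflake-connector-python package.

```python
import os
import snowflake.connector  # pip install snowflake-connector-python

# Connection details are illustrative; real values would come from a secrets manager.
conn = snowflake.connector.connect(
    account=os.environ["SNOWFLAKE_ACCOUNT"],
    user=os.environ["SNOWFLAKE_USER"],
    password=os.environ["SNOWFLAKE_PASSWORD"],
    warehouse="TRANSFORM_WH",
    database="ANALYTICS",
    schema="RAW",
)

cur = conn.cursor()
try:
    # Load newly landed Parquet files from an external stage into a raw table.
    cur.execute("""
        COPY INTO RAW.ORDERS_RAW
        FROM @RAW.S3_LANDING_STAGE/orders/
        FILE_FORMAT = (TYPE = PARQUET)
        MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
    """)
    # Upsert into the modeled table so reruns stay idempotent.
    cur.execute("""
        MERGE INTO ANALYTICS.MODELED.ORDERS tgt
        USING RAW.ORDERS_RAW src ON tgt.ORDER_ID = src.ORDER_ID
        WHEN MATCHED THEN UPDATE SET tgt.STATUS = src.STATUS, tgt.UPDATED_AT = src.UPDATED_AT
        WHEN NOT MATCHED THEN INSERT (ORDER_ID, STATUS, UPDATED_AT)
            VALUES (src.ORDER_ID, src.STATUS, src.UPDATED_AT)
    """)
finally:
    cur.close()
    conn.close()
```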
- Leveraging Wi-Fi device data to generate actionable insights and tell customers' stories through a state-of-the-art BI module, harnessing Big Data from Databricks DDM, integrating it with Tableau, and deploying it on AWS for effective delivery.
- Devised an automated framework that reduced manual reporting effort through scheduled queries, AWS S3 storage, validation, and timely stakeholder notifications.
- Utilize PySpark and SQL on Databricks to collect and integrate data from diverse sources into a centralized Data Lake, preparing the data for rapid modeling with AutoML and routing results to downstream pipelines for data-driven decision-making with modern BI tools on AWS (see the PySpark sketch below).
- Applied a data-driven approach to define, prototype, and evaluate Wi-Fi metrics covering both technical performance and end-user quality of experience, using Python, PySpark, and SQL within the Databricks environment and AutoML for accelerated baseline modeling.
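The Databricks ingestion bullet above is illustrated by the minimal PySpark sketch below; the landing path, column names, derived metric, and Delta target table are hypothetical.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("wifi-ingest").getOrCreate()

# Read raw Wi-Fi device events from the landing zone (path is illustrative).
raw = spark.read.json("s3://example-landing/wifi_events/2024/*/")

# Normalize types and keep only usable records.
clean = (
    raw.withColumn("event_ts", F.to_timestamp("event_ts"))
       .withColumn("rssi", F.col("rssi").cast("double"))
       .filter(F.col("device_id").isNotNull())
)

# Aggregate into simple daily per-device metrics for BI and AutoML consumption.
daily = clean.groupBy(F.to_date("event_ts").alias("day"), "device_id").agg(
    F.avg("rssi").alias("avg_rssi"),
    F.count("*").alias("event_count"),
)

# Append the curated output to a Delta table in the data lake.
daily.write.format("delta").mode("append").saveAsTable("curated.wifi_daily_metrics")
```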
- Executed comprehensive data cleaning, organization, and analysis, building ETL workflows into cloud storage, databases, and a Feature Store with AWS services (mainly SageMaker and S3) and Airflow for orchestration, improving automation, efficiency, and accuracy (see the Airflow sketch below).
- Containerized and deployed machine learning applications using Docker and ECS, integrated with ECR for efficient image management and deployment.
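A minimal, hypothetical Airflow sketch of the kind of scheduled ETL described above; the bucket, key, schedule, and transform logic are placeholders, and it assumes Airflow 2.x with the TaskFlow API plus boto3 credentials in the environment.

```python
from datetime import datetime
import json

import boto3
from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def customer_etl():
    @task
    def extract() -> list:
        # In practice this would pull from source APIs or databases.
        return [{"customer_id": 1, "spend": 120.5}, {"customer_id": 2, "spend": 80.0}]

    @task
    def transform(rows: list) -> list:
        # Simple derivation step; real cleaning and validation would be richer.
        return [{**r, "spend_bucket": "high" if r["spend"] > 100 else "low"} for r in rows]

    @task
    def load(rows: list) -> None:
        # Write the curated output to S3 (bucket and key are illustrative).
        boto3.client("s3").put_object(
            Bucket="example-curated-bucket",
            Key="customers/daily.json",
            Body=json.dumps(rows).encode("utf-8"),
        )

    load(transform(extract()))


customer_etl()
```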
- Applied analytic techniques such as Predictive Modeling, Time Series Analysis, Regression, Classification, and Clustering to extract meaningful insights from customer data, creating inputs for marketing strategies and decision-making (see the sketch below).
- Authored and co-authored academic articles published by prestigious universities and earned an excellent paper award.
- Integrated data using NiFi to design, automate, and manage data flow between different systems.
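As a toy illustration of the customer-segmentation side of the analytics work above, the sketch below clusters customers with scikit-learn; the synthetic RFM-style features and the cluster count are purely hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic customer features (recency, frequency, monetary value), for illustration only.
rng = np.random.default_rng(42)
X = rng.normal(loc=[30, 5, 200], scale=[10, 2, 75], size=(500, 3))

# Standardize features, then segment customers into k clusters.
X_scaled = StandardScaler().fit_transform(X)
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42).fit(X_scaled)

# Cluster labels would feed downstream marketing segmentation and decision-making.
print(np.bincount(kmeans.labels_))
```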