United States
Results-driven Data Engineer with over 10 years of experience designing and implementing modern data platforms across Azure cloud services. Proven expertise in architecting scalable, secure, and high-performance ETL/ELT pipelines using Azure Data Factory, Databricks, Snowflake, and Delta Lake. Specialized in building event-driven, metadata-driven, and real-time processing architectures that power critical analytics and AI/ML initiatives. Deep hands-on experience with Markit EDM for enterprise data mastering, golden copy workflows, and data governance in the insurance and financial domains. Proficient in Python, PySpark, SQL, and dbt, with strong capabilities in dimensional modeling (Star/Snowflake schemas), CI/CD (Azure DevOps, Git), and orchestration (Control-M, Event Grid). Demonstrated success in hybrid and remote teams across retail, finance, and healthcare, driving operational excellence, data quality, and cross-functional collaboration.
Designed and maintained data pipelines supporting CDP initiatives by integrating CRM, behavioral, and third-party sources into a unified customer view. Configured XDM schemas and identity graphs in Adobe Experience Platform (AEP) to enable real-time personalization and activation across digital channels. Architected end-to-end ETL solutions for both batch and streaming use cases, integrated Palantir Foundry for complex data modeling, and built serverless workflows using Cloud Functions and Cloud Run. Collaborated cross-functionally with product owners, data scientists, and analytics teams to deliver robust data infrastructure supporting machine learning pipelines and enterprise-scale decision intelligence.
Designed and implemented scalable data pipelines using Dataflow, BigQuery, and Python to process high-volume financial and operational datasets. Developed ELT frameworks and hybrid real-time/batch workflows, integrating data from SAP sources into BigQuery using Cloud Storage, Pub/Sub, and Cloud Data Fusion with metadata-driven orchestration. Simulated master data management (MDM) processes by building data mastering pipelines using Python, dbt (BigQuery adapter), and Cloud Composer, aligning with enterprise golden record standards. Enhanced BigQuery performance through partitioning, clustering, and optimized query patterns, and designed centralized data models for cross-functional analytics across finance, inventory, and operations. Applied Terraform and Deployment Manager for infrastructure provisioning, developed automated data validation frameworks using Python and Cloud Functions, and supported audit and compliance requirements through lineage tracking and data cataloging in Data Catalog. Enabled self-service BI using Looker, Data Studio, and federated queries against external sources such as Cloud SQL and GCS, delivering timely insights to business teams.
I engineered cloud-native data solutions using AWS Glue, Redshift, and PySpark, focusing on clinical and IoT analytics. I developed ETL and streaming pipelines using Kafka and Kinesis to integrate real-time device data into Redshift. I implemented Python-based data validation scripts and built RESTful APIs with Flask to serve ML model predictions to clinical dashboards. I also migrated on-premises SQL data to AWS S3 and Redshift, built reusable data lakes, and automated infrastructure provisioning with Terraform and CloudFormation. In addition, I created Tableau dashboards, implemented CI/CD pipelines using Git and Jenkins, and integrated predictive models into SageMaker-powered analytics workflows.
At Cox Communications, I developed and managed large-scale data processing pipelines using PySpark and Azure HDInsight to streamline customer analytics and performance insights. I orchestrated ETL workflows using Azure Data Factory and implemented deduplication and cleansing rules in PySpark to ensure high data quality. I built and deployed Azure ML pipelines, integrating Python models for predictive analytics, and created RESTful services using Flask for real-time data access. I managed AWS infrastructure including EC2, S3, IAM, and VPC, and deployed serverless workflows using Lambda. I also containerized and orchestrated microservices using Docker and Kubernetes, and implemented CI/CD automation with Jenkins, Ansible, and Nexus. I monitored systems with CloudWatch, Splunk, and AppDynamics to maintain operational continuity in an agile delivery environment.
I developed and maintained robust ETL pipelines using Spark, SSIS, and Python, handling structured and semi-structured data from diverse enterprise systems. I built and orchestrated scalable Spark applications using PySpark and Spark SQL for transformation and aggregation tasks. I contributed to the design of Hadoop-based data platforms using Hive, Pig, and HDFS, and optimized workflows for performance and reliability. I implemented SSIS packages and managed orchestration using NiFi, Docker, and Kubernetes. I established CI/CD pipelines using Jenkins and Apache Airflow, and enforced data validation logic using Spark Streaming for real-time integrity checks. Additionally, I supported business teams with dashboard development using SQL Server and Informatica, and collaborated on implementing data governance best practices aligned with enterprise compliance standards.