Raritan, New Jersey, United States
I am a data science and engineering leader with 10+ years of experience transforming complex healthcare data into actionable insights. My work sits at the intersection of machine learning, real-world evidence, and data strategy, with a current focus on improving outcomes in neuroscience through advanced analytics and predictive modeling. In my current role, I lead initiatives that use Rx and medical claims, clinical data, and AI-driven insights to better understand provider behavior, treatment pathways, and patient outcomes in neuropsychiatric and neurodegenerative diseases. My team and I develop and deploy scalable predictive models to inform clinical decision-making, optimize patient engagement, and support evidence-based neuroscience strategies.
- Served as one of the data engineering leads who designed and built enterprise-level data pipelines to enable data analysis, machine learning, and artificial intelligence applications for a wide variety of business units at GSK (Talend, Azure Data Factory)
- Developed and implemented Spark (Databricks) jobs for data ingestion and transformation (Java, PySpark, SparkSQL)
- Designed and developed tailored data solutions for business units at GSK (Azure Functions, Azure EventHub, Azure SQL Data Warehouse, Python, Azure DevOps)
- Provisioned, integrated, and automated Azure SQL Data Warehouse (Azure CLI, T-SQL, Python, Azure DevOps, Jenkins)
- Enabled continuous integration and development using DevOps tools (Azure DevOps and Jenkins)
- Developed a fault-tolerant Apache Airflow architecture with multiple synchronized schedulers (Python)
- Designed and implemented a high-volume data pipeline to visualize price trends of stocks traded on the Deutsche Stock Exchange using AWS S3, Apache Spark, PostgreSQL, and Dash by Plotly
- Tested the robustness of the fault-tolerant Airflow architecture on the implemented data pipeline using real-life scheduler failure scenarios (Python, SQL)
- Contributed to the development of machine learning algorithms for Internet traffic classification.
- Contributed to the development of evacuation planning in indoor-fire scenarios using machine learning.
- Worked on free-space optical communications systems for high-speed trains.
- Contributed to the development of transport protocols and load balancing mechanisms for data center networks.
- Contributed to the development of Internet traffic classification using machine learning, achieving a classification accuracy of 92%.
- Implemented a per-packet load balancing approach for data center networks that improved flow completion time by 20%.
- Designed a free-space optical communications system for high-speed trains to provide a 1-Gbps data link, in collaboration with China National Railway Locomotive Company (CRRC).
- Developed an adaptive-divergence beam that provides an average received-power gain of 35 dB over a fixed-divergence beam in a free-space optical communications system for high-speed trains.