Berlin, Berlin, Germany
Modernized the BI landscape by migrating MSSQL data pipelines to Microsoft Fabric and Azure Synapse, implementing scalable PySpark and SQL workflows for analytics and reporting. Built reusable datasets and Power BI semantic models enabling data-driven decisions across departments. Integrated REST APIs, reduced ETL licensing costs, and introduced machine learning models for forecasting and data trend analysis.
Designed and built a Hadoop-based big data platform using Spark, HBase, and Airflow to centralize hospital data from 90+ clinical systems. Developed PySpark ELT pipelines for SAP data, automated testing and CI/CD, and optimized performance through parallelization. Delivered dashboards and data marts supporting data quality monitoring, governance, and large-scale analytics initiatives.
Working with PyTorch to build a CycleGAN for colour transformation of lung tissue images.
Building a deep neural network with Keras for classification of RNA molecules.