Martin Marušiak

Data Engineer

Prague, Czechia

About

Experience

  • Consultant at Profinit

  • Data Engineer at Komerční banka

    Data Engineer | Cloudera Platform. Design and implement ETL processes and data transformations, primarily using SQL (Impala). Lead Python developer for an internal package enabling application metadata management, data quality checks, and automation (similar in functionality to dbt). Occasional support for application operations and monitoring (covering for colleagues during their absence), including production issue resolution and release deployment.

  • Student Research Assistant - SoluProt at Loschmidt Laboratories

    Research on protein solubility. Designed SoluProt, a new tool for sequence-based prediction of soluble protein expression in E. coli. SoluProt was created using the gradient boosting machine technique with the TargetTrack database as a training set, and was evaluated against a balanced independent test set derived from the NESG database. Work included aggregating existing datasets, data cleansing in cooperation with domain experts, feature design, model training, and evaluation. Available at: https://loschmidt.chemi.muni.cz/soluprot Published in Bioinformatics: https://pubmed.ncbi.nlm.nih.gov/33416864 Thesis: https://dspace.vutbr.cz/handle/11012/85135

  • Student Research Assistant at CESNET, z.s.p.o.

    DDoS backscatter detection (C++, Python). Created a custom dataset from real flow data provided by CESNET by transforming, enriching, and cleansing the flows into a row-based feature dataset. Designed prediction features. Developed a C++ plugin for near-real-time feature extraction that uses optimized data structures and algorithms such as Bloom filters. Trained and evaluated machine learning models. The detection tool is integrated into NEMEA (a stream-wise, flow-based, modular detection system for network traffic analysis) in the form of two modules: 1. Near-real-time feature extractor written in C++: https://github.com/CESNET/Nemea-Modules/tree/master/backscatter 2. Python classifier: https://github.com/CESNET/Nemea-Detectors/tree/master/backscatter_classifier Thesis: https://dspace.vutbr.cz/handle/11012/200166