Jordan Tan

Data Engineer @ Shopee

Singapore

About

Motivated and driven individual interested in leveraging technology to help businesses and society.

Experience

  • Data Engineer at Shopee
    Jul 2022 - Present · 4 yrs

    • Technologies used: Apache Spark, Hadoop, Presto, Hive, Scala, Python
    • Designed and developed the data warehouse for local shop traffic (clicks, impressions, views, orders) on the Shopee platform to provide readily available data for business analysis
    • Optimised Spark applications so that they met the service-level agreement for the business analysts and minimised file storage in the Hadoop Distributed File System
    • Optimised business analysts' SQL queries by designing and developing an efficient data warehouse and pipelines
    • Created strict and robust data quality check (DQC) rules for the various tables in the data warehouse
    • Developed scripts to automate data pipeline and DQC rule deployment to the live environment through CI/CD
    • Wrote technical design documents explaining how the automation scripts work, as well as the business definitions of the tables in the data warehouse
    • Collaborated with business analysts to understand and deliver business requirements, and with upstream data engineering teams to resolve data anomalies
    • Took charge of migrating the data warehouse from SG data centers to US data centers, ensuring data quality was not affected
    • Developed a Grafana dashboard using PromQL that shows resource usage metrics of Spark applications

  • Software Engineer at DSO National Laboratories
    May 2021 - Jul 2021 · 3 mos

    • Developed from scratch a web application that detects anomalies in system log files
    • Used React.js for the frontend, Flask for the backend, and MongoDB for the database
    • Implemented and utilised React Hooks, Components and Props
    • Used JSX and CSS extensively to style the frontend UI
    • Used D3.js to develop interactive graphs for visualising anomaly results
    • Trained and optimised Autoencoder Neural Networks using TensorFlow and Keras
    • Created a K-Nearest Neighbours anomaly detection model using Scikit-learn
    • Worked independently on a solo project to complete the full-stack application on schedule

  • Data Scientist at GovTech Singapore
    May 2020 - Jul 2020 · 3 mos

    • Developed machine learning models to analyse the major business concerns of SMEs in Singapore
    • Worked with a dataset comprising more than 30,000 textual enquiries from SMEs all over Singapore
    • Used NLP concepts such as TF-IDF, Word2Vec, N-grams, Stemming, Lemmatization, and RegExp for data cleaning/processing
    • Developed models such as Latent Dirichlet Allocation, K-means Clustering, and K-Medoids Clustering to uncover topics of concern faced by Singapore SMEs
    • Used Topic Coherence and the Elbow method for parameter tuning
    • Regularly checked in with end users using an Agile approach to make sure that the project’s results met their needs

  • Data Analyst at Ministry of Defence of Singapore
    Dec 2019 - Jan 2020 · 2 mos

    • Developed a program to uncover deeper information concerning user web browsing behaviour
    • Used SQLite to read Google Chrome browsing information from the History file stored on the computer’s hard drive
    • Wrote a Python program to automate the process of deriving the total duration spent on a webpage over different periods, the total number of times the webpage was visited, etc.
    • The program was able to determine whether a webpage was visited by typing in the URL or by clicking on a link
    • Research contributed significantly to the area of Digital Forensics and Cyber Defence

  • Administrative Associate at Teknor Apex Asia Pacific
    Apr 2018 - Jun 2018 · 3 mos

    • Worked in the Environmental Health and Safety (EHS) Department
    • Used VBA to automate an Excel file for taking staff attendance and filling in staff particulars
    • Helped to draft an extensive checklist for the company’s audit
    • Carried out data entry and validation using Excel