Programming Languages: Proficient in Python, pytest, and PySpark, with a strong understanding of software engineering best practices.
OOP Concepts: Strong knowledge of object-oriented programming concepts.
Cloud Computing: Use Azure cloud data platforms, specifically Databricks and Delta Live Tables, for data engineering tasks, and make effective use of Azure storage, compute, and security services.
Data Pipelines: Design, build, and maintain robust, scalable, and automated data pipelines for batch and streaming data ingestion and processing (Databricks Workflows).
Data Architecture and Modeling: Design and implement robust data models and architectures that align with business requirements and support efficient data processing, analysis, and reporting.
Orchestration: Utilize workflow orchestration tools to automate data pipeline execution and dependency management.
Monitoring and Alerting: Integrate monitoring and alerting mechanisms to track pipeline health, identify performance bottlenecks, and proactively address issues.
Agile Principles: Apply Agile development methodologies and actively participate in sprint planning, daily stand-ups, sprint reviews, and retrospectives. Be flexible and adaptable to changing requirements and priorities throughout the project lifecycle.
Unity Catalog
Good to Have:
GitHub Actions
Datadog exposure
Data Quality: Implement data quality checks throughout the data pipeline, including profiling, validation, and root cause analysis, to ensure data accuracy, completeness, and consistency.
CI/CD: Implement continuous integration and continuous delivery (CI/CD) practices for automated testing and deployment of data pipelines.