SRE Kafka

Fulcrum Digital Inc

Dublin

Requirements

Responsibilities:

  • Implement observability practices using Splunk, Dynatrace, Prometheus, Grafana, Datadog, and Jaeger/Zipkin. Define SLIs/SLOs and build dashboards for actionable insights into system health.
  • Working knowledge of Splunk, Jenkins, BMC Remedy (including knowledge of certificate signing requests, or CSRs, for certificate installation and patching), Kafka, Linux, Chef, Dynatrace, Azure, and AWS is required.
  • Develop and maintain CI/CD pipelines with Jenkins, GitLab CI, or GitHub Actions to automate build, test, and deployment processes, including rollback strategies.
  • Diagnose and resolve production issues through logs, metrics, and debugging tools. Participate in incident management, perform root cause analysis (RCA), and contribute to blameless postmortems.
  • Implement security best practices: secrets management (Vault), zero-trust architectures, vulnerability management, and compliance standards (SOC 2, GDPR).
  • Manage and operate Apache Kafka (must-have skill): configure topics, manage partitions, ensure high availability, monitor metrics (e.g., consumer lag, throughput), and troubleshoot issues like message loss or latency.
  • Work with Axon Framework (must-have skill): design and maintain event-driven systems using CQRS/ES (Command Query Responsibility Segregation / Event Sourcing) patterns, integrate with Kafka for event streaming, and ensure scalability and resilience of distributed applications.
  • Manage and operate other messaging/streaming platforms such as NATS or MQ as needed.

Qualifications:

  • BS in Computer Science or a related technical field (e.g., Physics, Mathematics) OR equivalent practical experience.
  • 4–5 years of hands-on experience in software development, systems administration, and cloud infrastructure management.
  • Proven expertise in Apache Kafka and Axon Framework (must-have).