Implement observability practices using Splunk, Dynatrace, Prometheus, Grafana, Datadog, and Jaeger/Zipkin. Define SLIs/SLOs and build dashboards for actionable insights into system health.
Working knowledge of Splunk, Jenkins, BMC Remedy (including certificate signing requests (CSRs) for certificate installation, and patching), Kafka, Linux, Chef, Dynatrace, Azure, and AWS is required.
Develop and maintain CI/CD pipelines with Jenkins, GitLab CI, or GitHub Actions to automate build, test, and deployment processes, including rollback strategies.
Diagnose and resolve production issues through logs, metrics, and debugging tools. Participate in incident management, perform root cause analysis (RCA), and contribute to blameless postmortems.
Implement security best practices: secrets management (Vault), zero-trust architectures, vulnerability management, and compliance standards (SOC 2, GDPR).
Manage and operate Apache Kafka (must-have skill): configure topics, manage partitions, ensure high availability, monitor metrics (e.g., consumer lag, throughput), and troubleshoot issues like message loss or latency.
Work with Axon Framework (must-have skill): design and maintain event-driven systems using CQRS/ES (Command Query Responsibility Segregation / Event Sourcing) patterns, integrate with Kafka for event streaming, and ensure scalability and resilience of distributed applications.
Manage and operate other messaging/streaming platforms such as NATS or MQ as needed.
Qualifications:
BS in Computer Science or a related technical field (e.g., Physics, Mathematics) OR equivalent practical experience.
4–5 years of hands-on experience in software development, systems administration, and cloud infrastructure management.
Proven expertise in Apache Kafka and Axon Framework (must-have).